Summary
While DOI extraction is supported, the system does not currently distinguish between modern and legacy DOI citation formats.
Adding this distinction at the Reference level could improve downstream analysis and reporting.
Problem
DOIs may appear in different citation styles, for example:
Modern:
https://doi.org/10.xxxx/...
Legacy:
DOI: 10.xxxx/...
Both are valid and can be extracted, but there is currently no way to identify whether a DOI originated from a legacy citation format.
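To make the distinction concrete, here is a minimal sketch of how the two styles above could be told apart with regular expressions. The pattern names, the simplified DOI body regex and the helper `classify_doi_citation` are all hypothetical. They are not taken from backends.py, and the real extractor may use different patterns.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for the two citation styles named in this issue.
# The DOI body regex (10.<registrant>/<suffix>) is a simplified assumption,
# not the pattern backends.py actually uses.
_DOI_BODY = r'10\.\d{4,9}/[^\s"<>]+'
MODERN_RE = re.compile(r"https://doi\.org/(" + _DOI_BODY + r")", re.IGNORECASE)
LEGACY_RE = re.compile(r"\bdoi:\s*(" + _DOI_BODY + r")", re.IGNORECASE)


def classify_doi_citation(text: str) -> Optional[Tuple[str, bool]]:
    """Return (doi, is_legacy), preferring the modern form, or None."""
    # Check the modern URL form first; "doi.org" never matches "doi:",
    # so a URL citation cannot be misread as legacy.
    match = MODERN_RE.search(text)
    if match:
        # Simplifying assumption: trailing sentence punctuation is not part of the DOI.
        return match.group(1).rstrip(".,;"), False
    match = LEGACY_RE.search(text)
    if match:
        return match.group(1).rstrip(".,;"), True
    return None
```

Both styles yield the same DOI string. Only the boolean differs, which is the extra signal this issue asks for.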
Proposed Enhancement
Extend Reference (in backends.py) to:
Detect when a DOI originates from a legacy citation pattern
Store this as additional metadata (e.g., legacy_format=True)
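One possible shape for the extended Reference is sketched below. It assumes Reference can be modeled as a dataclass, which may not match the actual class in backends.py. Only `ref` and `reftype` come from the example in this issue. `legacy_format` is the proposed field, and its defaults are a design suggestion.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reference:
    """Hypothetical stand-in for Reference in backends.py.

    Only ref and reftype are taken from the issue's example; the real
    class may have other fields or a different structure.
    """

    ref: str
    reftype: str
    # Proposed optional metadata. default=False keeps every existing
    # Reference(ref=..., reftype=...) call valid. compare=False leaves it
    # out of __eq__ and __hash__, so deduplication of references is unaffected.
    legacy_format: bool = field(default=False, compare=False)
```

Excluding the flag from comparison is one way to honor "should not interfere with existing extraction logic". The same DOI cited in both styles still deduplicates to a single reference. If downstream code needs the two styles to be distinct, `compare=False` would be dropped.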
Example (conceptual)
Input: "DOI: 10.1007/s11089-018-0833-1"
Reference(
    ref="10.1007/s11089-018-0833-1",
    reftype="doi",
    legacy_format=True
)
Expected Behavior
DOI extraction remains unchanged
Additional metadata is available for distinguishing citation styles
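If the existing extraction code should stay completely untouched, the flag could instead be computed afterward from the source text and the DOI the extractor already returned. The sketch below assumes that approach. The function name `is_legacy_citation` is hypothetical.

```python
import re


def is_legacy_citation(text: str, doi: str) -> bool:
    """Hypothetical post-extraction check: was this DOI cited as "DOI: ..."?

    The DOI is assumed to have been found already by the existing
    extraction logic, which this function does not modify.
    """
    # re.escape treats the DOI literally, since '.' and other regex
    # metacharacters are common in DOIs. DOI names are case-insensitive,
    # so IGNORECASE on the whole pattern is acceptable.
    pattern = re.compile(r"\bdoi:\s*" + re.escape(doi), re.IGNORECASE)
    return pattern.search(text) is not None
```

The caller would then set `legacy_format=is_legacy_citation(text, ref)` when building the Reference, leaving the extracted `ref` value exactly as it is today.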
Notes
This is a feature enhancement, not a bug fix
Should not interfere with existing extraction logic
Can be extended later for reporting/UI use cases in downstream projects (e.g., rottingresearch)