
Add support for flagging legacy DOI citation styles in Reference classification #472

Description

@Aakash-pal

Summary

While DOI extraction is supported, the system does not currently distinguish between modern and legacy DOI citation formats.

Adding this distinction at the Reference level could improve downstream analysis and reporting.

Problem

DOIs may appear in different citation styles, for example:

Modern: https://doi.org/10.xxxx/...
Legacy: DOI: 10.xxxx/...

Both are valid and can be extracted, but there is currently no way to identify whether a DOI originated from a legacy citation format.
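One way to make this distinction is to record which surrounding pattern the DOI was matched in. The sketch below assumes two hypothetical regular expressions (`MODERN_DOI_RE`, `LEGACY_DOI_RE`) and a hypothetical helper `classify_doi_citation`. The actual extraction patterns in backends.py may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns; the real extraction regexes in backends.py may differ.
# Modern style: DOI embedded in a doi.org resolver URL.
MODERN_DOI_RE = re.compile(r"https?://(?:dx\.)?doi\.org/(10\.\d{4,9}/\S+)", re.IGNORECASE)
# Legacy style: a "DOI:" prefix followed by the bare DOI.
LEGACY_DOI_RE = re.compile(r"\bdoi:\s*(10\.\d{4,9}/\S+)", re.IGNORECASE)


def classify_doi_citation(text: str) -> Optional[Tuple[str, bool]]:
    """Return (doi, legacy_format), or None if no DOI is found.

    A doi.org URL is checked before a "DOI:" prefix, so text that
    contains both forms is reported as modern.
    """
    match = MODERN_DOI_RE.search(text)
    if match:
        # Drop trailing sentence punctuation (a simplification).
        return match.group(1).rstrip(".,;"), False
    match = LEGACY_DOI_RE.search(text)
    if match:
        return match.group(1).rstrip(".,;"), True
    return None


if __name__ == "__main__":
    print(classify_doi_citation("DOI: 10.1007/s11089-018-0833-1"))
    print(classify_doi_citation("https://doi.org/10.1007/s11089-018-0833-1"))
```

The extracted DOI string is identical for both styles. Only the boolean differs. Checking the URL form first is a design choice; the maintainers may prefer a different precedence.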

Proposed Enhancement

Extend Reference (in backends.py) to:

Detect when a DOI originates from a legacy citation pattern
Store this as additional metadata (e.g., legacy_format=True)
Example (conceptual)
Input: "DOI: 10.1007/s11089-018-0833-1"
Reference(
    ref="10.1007/s11089-018-0833-1",
    reftype="doi",
    legacy_format=True
)
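A minimal sketch of the data-model side follows. It assumes a dataclass-style `Reference` that mirrors only the fields shown in the example (`ref`, `reftype`) plus the proposed `legacy_format`. The real class in backends.py may be constructed differently, and `reference_from_legacy_text` is a hypothetical helper. Defaulting `legacy_format` to `False` keeps existing call sites working unchanged.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical legacy-style pattern ("DOI: 10.xxxx/...").
LEGACY_DOI_RE = re.compile(r"\bdoi:\s*(10\.\d{4,9}/\S+)", re.IGNORECASE)


@dataclass
class Reference:
    """Simplified stand-in for the Reference class in backends.py.

    Only ref and reftype come from the issue; legacy_format is the proposed
    new field. It defaults to False so existing constructors are unaffected.
    """
    ref: str
    reftype: str
    legacy_format: bool = False


def reference_from_legacy_text(text: str) -> Optional[Reference]:
    """Build a DOI Reference flagged as legacy if text uses the "DOI:" prefix."""
    match = LEGACY_DOI_RE.search(text)
    if match is None:
        return None
    return Reference(ref=match.group(1), reftype="doi", legacy_format=True)


if __name__ == "__main__":
    print(reference_from_legacy_text("DOI: 10.1007/s11089-018-0833-1"))
    # Reference(ref='10.1007/s11089-018-0833-1', reftype='doi', legacy_format=True)
```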

Expected Behavior

DOI extraction remains unchanged
Additional metadata is available for distinguishing citation styles

Notes

This is a feature enhancement, not a bug fix
Should not interfere with existing extraction logic
Can be extended later for reporting/UI use cases in downstream projects (e.g., rottingresearch)
