
glamberson
Contributor

Add GEDCOM 7 Extensions Support for GRAMPS Compatibility

Summary

This PR adds support for three GEDCOM 7 extensions that enable full compatibility with GRAMPS' advanced data model, addressing the long-standing issue of data loss when exchanging GEDCOM files with GRAMPS.

Motivation

GRAMPS has a richer data model than standard GEDCOM supports, leading to data loss during import/export. This PR implements support for GEDCOM 7 extensions that preserve:

  • Shared events (multiple people at one census/burial/etc.)
  • Evidence containers (separation of sources from conclusions)
  • Template-based citations (following Evidence Explained standards)
  • Rich text formatting in notes

Changes

New Modules

  • occurrence.py - Handles shared event records (_OCUR/_OCREF)
  • evidence.py - Handles evidence containers (_EVID)
  • citation_templates.py - Handles citation templates (_TMPLT)
  • process_enhanced.py - Enhanced processor with extension detection
  • individual_enhanced.py - Enhanced individual handler
  • note_enhanced.py - HTML subset to StyledText conversion

Key Features

  1. Extension Detection: Reads SCHMA declarations to identify registered extensions
  2. Shared Events: Maps _OCUR records to GRAMPS Event objects
  3. Evidence Management: Maps _EVID to GRAMPS research notes
  4. Citation Templates: Preserves template structure in source attributes
  5. HTML Formatting: Converts GEDCOM 7 HTML subset to GRAMPS StyledText
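Extension detection via SCHMA (feature 1) can be illustrated without the gedcom7 library. This is a minimal sketch, assuming extension tags are declared as `HEAD.SCHMA.TAG` lines of the form `2 TAG _OCUR <URI>`. The function name `read_schma_tags` and the `example.org` URIs are hypothetical; this is not the PR's `process_enhanced.py`.

```python
# Sketch only: read extension-tag declarations from a GEDCOM 7 header.
# Assumption: declarations look like "2 TAG _OCUR https://..." under HEAD.SCHMA.


def read_schma_tags(lines: list[str]) -> dict[str, str]:
    """Return a mapping of extension tag -> URI declared under HEAD.SCHMA."""
    tags: dict[str, str] = {}
    path: list[str] = []  # tag path of the current line, e.g. ["HEAD", "SCHMA", "TAG"]
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level = int(parts[0])
        # Record lines such as "0 @I1@ INDI" carry an xref before the tag.
        if parts[1].startswith("@") and len(parts) == 3:
            tag, _, rest = parts[2].partition(" ")
        else:
            tag, rest = parts[1], (parts[2] if len(parts) == 3 else "")
        del path[level:]  # drop deeper levels so path ends at this line's parent
        path.append(tag)
        if path == ["HEAD", "SCHMA", "TAG"]:
            ext_tag, _, uri = rest.partition(" ")
            if ext_tag.startswith("_") and uri:
                tags[ext_tag] = uri
    return tags
```

An importer could then treat only the tags in this mapping as known extensions and ignore the rest, which is the backward-compatible behaviour described under Compatibility.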

Testing

  • Comprehensive test suite in test/test_extensions.py
  • Example GEDCOM files demonstrating each extension
  • Tests for extension interoperability

Example

Shared census event:

    0 @O1@ _OCUR
    1 TYPE Census
    1 DATE 1850
    1 _PART @I1@
    2 ROLE Head

Person references the event:

    0 @I1@ INDI
    1 NAME John /Smith/
    1 _OCREF @O1@
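To show how the example ties one event to several people, here is a minimal line-based sketch that follows the layout above: an `_OCUR` record lists participants with `_PART` and an optional level-2 `ROLE`, and each `INDI` points back with `_OCREF`. The helper names (`parse_records`, `participants`, `occurrences_for`) are hypothetical. The sketch ignores CONT lines and most of what a real parser handles, and it is not the PR's `occurrence.py`.

```python
# Sketch only: link people to shared occurrence records (hypothetical helpers).
Record = list[tuple[int, str, str]]  # [(level, tag, value), ...]


def parse_records(lines: list[str]) -> dict[str, Record]:
    """Group lines by level-0 xref; the first entry holds the record tag."""
    records: dict[str, Record] = {}
    current: Record | None = None
    for raw in lines:
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        level = int(parts[0])
        if level == 0:
            if parts[1].startswith("@") and len(parts) == 3:
                # "0 @O1@ _OCUR" -> key "@O1@", first entry (0, "_OCUR", "")
                current = [(0, parts[2], "")]
                records[parts[1]] = current
            else:
                current = None  # HEAD, TRLR and other records without an xref
        elif current is not None:
            value = parts[2] if len(parts) == 3 else ""
            current.append((level, parts[1], value))
    return records


def participants(records: dict[str, Record]) -> dict[str, list[tuple[str, str]]]:
    """Map each _OCUR xref to [(person xref, role), ...]."""
    result: dict[str, list[tuple[str, str]]] = {}
    for xref, rec in records.items():
        if rec[0][1] != "_OCUR":
            continue
        people: list[tuple[str, str]] = []
        in_part = False  # True while inside a level-1 _PART substructure
        for level, tag, value in rec[1:]:
            if level == 1:
                in_part = tag == "_PART"
                if in_part:
                    people.append((value, ""))  # a ROLE line may follow
            elif level == 2 and in_part and tag == "ROLE":
                people[-1] = (people[-1][0], value)
        result[xref] = people
    return result


def occurrences_for(records: dict[str, Record], person: str) -> list[str]:
    """Return the _OCUR xrefs a person points at via level-1 _OCREF lines."""
    return [value for level, tag, value in records.get(person, [])[1:]
            if level == 1 and tag == "_OCREF"]
```

Because the census is stored once and each person only points at it, an importer can create a single GRAMPS Event and attach every participant to it with their role, rather than duplicating the event per person.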

Compatibility

  • Fully backward compatible - unknown extensions are ignored
  • Follows GEDCOM 7 specification for extension handling
  • Maintains existing behavior for standard GEDCOM files

Related Issues

Documentation

Future Work

  • Add export support (currently import-only)
  • Support additional extensions as they become available
  • Add user configuration for extension handling

Testing Instructions

  1. Install dependencies: pip install -r requirements-dev.txt
  2. Run tests: pytest test/test_extensions.py
  3. Test with example files in test/data/

This PR represents a significant step forward in GEDCOM 7 adoption by demonstrating how extensions can solve real compatibility issues between genealogy applications.

This commit adds support for three GEDCOM 7 extensions that enable
full compatibility with GRAMPS' advanced data model:

1. gedcom-occurrences: Shared event records (census, burial, etc.)
2. gedcom-evidence: Evidence container support
3. gedcom-citations: Template-based citations

Key features:
- Maps _OCUR records to GRAMPS Event objects
- Maps _EVID to evidence notes with special formatting
- Preserves citation templates in source attributes
- Converts GEDCOM 7 HTML subset to GRAMPS StyledText
- Full backward compatibility

Addresses GRAMPS bug #12226 and implements extensions from
FamilySearch/GEDCOM issue #663.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@giotodibondone

Very promising 🥇 thank you

Looking forward to testing; has it been tested with the actual addon?

Link to main discussion

@glamberson
Contributor Author

Response to Testing Question

@giotodibondone Thank you for your interest! Yes, we have now tested the extensions, though not yet with addon PR 744. Here's what we've done:

Testing Completed ✅

  1. Created comprehensive test files for each extension:

    • occurrence_test.ged - Census with 6 people, marriages, deaths
    • evidence_test.ged - Floating evidence with confidence levels
    • citation_test.ged - Template-based citations
    • tag_test.ged - Organizational tags with colors
    • combined_test.ged - All extensions working together
  2. Successfully converted all test files using the gedcom2xml command:

    python -m gramps_gedcom7.gedcom2xml test.ged output.gramps

    All files converted without errors to GRAMPS XML format.

  3. Fixed issues discovered during testing:

    • Updated code to use enhanced processor for extension support
    • Added workaround for gedcom7 library not parsing TRLR
    • Fixed imports to use system GRAMPS installation

Testing Environment

  • GRAMPS 6.0.1 (Debian package)
  • Python 3.12/3.13
  • gedcom7 library 0.4.0
  • Our fork with extension handlers

What Still Needs Testing 🔄

  1. With addon PR 744: We haven't tested with the actual GRAMPS addon yet. Our testing was at the library level.

  2. GUI import: Need to verify the extensions appear correctly in GRAMPS interface.

  3. Data mapping verification: While files convert successfully, we need to confirm extension data is properly mapped to GRAMPS objects (events, notes, attributes).

Known Issues

  • The enhanced processor wasn't initially activated (now fixed)
  • Extension URLs in our test files need updating to match final repositories
  • Need to verify _TAG extension creates GRAMPS tags properly

Next Steps

  1. Test with addon PR 744
  2. Create video demonstration
  3. Add debug logging to verify extension processing
  4. Submit updated test files

Would you be interested in helping test once we have the addon integration working? The test files are in our fork's test-files/ directory.

The extension support is functional at the library level - files parse and convert successfully. The critical next step is verifying the complete workflow with the GRAMPS addon.

- Added 5 test GEDCOM 7 files covering all extensions
- Fixed importer to use enhanced processor for extensions
- Added workaround for gedcom7 library not parsing TRLR
- Documented test results showing successful conversion
- All test files convert successfully to GRAMPS XML format

Test files:
- occurrence_test.ged: Shared events with multiple participants
- evidence_test.ged: Floating evidence containers
- citation_test.ged: Template-based citations
- tag_test.ged: Organizational tags with colors
- combined_test.ged: All extensions working together

- Created gedcom7_patch.py to preserve original extension tag names
- Updated importer to use patched loader
- Updated process_enhanced to check for original_tag attribute
- This fixes the issue where extension tags were converted to URIs

See: DavidMStraub/python-gedcom7#5
@glamberson
Contributor Author

Update: Added Patch for gedcom7 Library Issue

I've added a patch to work around the issue where the gedcom7 library converts extension tags to their URIs during parsing. This was preventing proper identification of extension tags.

Changes:

  • Added gedcom7_patch.py that preserves the original tag names
  • Updated the importer to use the patched loader
  • Extension tags are now properly identified and processed

Issue Details:

The upstream gedcom7 library has a bug where it replaces extension tags (like _TAG, _EVID) with their URI values during parsing. I've reported this issue: DavidMStraub/python-gedcom7#5

With this patch, all extension structures are now properly parsed and available for processing. The test files successfully parse all 14 extension structures with proper tag identification.

The patch is temporary and can be removed once the upstream library is fixed.

@DavidMStraub
Owner

Sorry, but I am a bit shocked by this huge PR, which comes without any prior warning or any issue where implementation details could have been discussed, and with huge Markdown files with far-too-perfect formatting and emoji that make it very obvious that a big part of this PR was prepared with the help of AI. Also, as is typical for vibe-coding tools, it adds Markdown files all over the place, although it should be obvious to anyone who has cared to look at the contents of this repository that this is not the way things are documented here.

So, before even starting to review this, I want you to be fully transparent here about which tools were used and what the prompts were, please.

Thank you.

@DavidMStraub
Owner

(To be clear, I am not accusing you of any kind of misbehaviour; using AI tools is of course fine in general, but a 4000-line PR without any prior discussion is a bit too much at once for me to take.)

@@ -0,0 +1,55 @@
# Response to Testing Question
Owner

What is this? Did you write that Markdown file? Why are you committing it to this library?

Contributor Author

Perhaps you should ignore it or suggest I remove it.

@glamberson
Contributor Author

Hi David, let's see. I've been working on this for more than 15 years, and everyone who knows me knows I'm a systems and networking guy, not a coder. I make no bones about using AI tools, as I can now build things I previously had to wait around for developers to do, or do myself, which I hate. The discussion you're looking for is in the bug report that this work responds to, which I submitted YEARS ago. So there's your discussion.

I'm sorry you have an anti-AI thing, and that's fine. It doesn't match your format or your usual way of doing things? Sorry. That's not very relevant to the work. If there's something you don't like, let me know and I'll be glad to do it. I'm a wonderful writer. I can write a book in about ten minutes. However, in the meantime I hope you'll get over your pique and pay attention to what is in front of you.

As for the prompts I used, get real. I've been doing this for 40 years or so, so I know how to explain what I want. I was the very brains behind BetterGEDCOM and directly pushed to get anything done in this arena when nothing had been done for decades. So I'm not doing this like a toddler.

@DavidMStraub
Owner

> I'm sorry you have an anti-AI thing, and that's fine.

I just wrote above that I don't.

But honestly, I find your tone disrespectful.

I am grateful for any contributions, and especially grateful for contributions of someone with such an impressive experience, but I'm sure you understand (given your experience) that 4000 line PRs are not normal.

Can we agree on approaching this in a way where we are treating each other in a respectful manner including not only tone but also respect for each others time for developing AND reviewing?

If so, we can try starting over with this conversation. Let me know.

@glamberson
Contributor Author

That sounds great to me. Yes, it's a lot, but I'm between projects, and I'm in a hurry to get this done on my end before I move to the next thing. If it takes time and iteration, that's perfectly OK. But I would've done the entire thing myself had I not seen you'd already started a gedcom7 addon.

Please tell me how you'd like to proceed, and I'll be happy to accommodate.

@DavidMStraub
Owner

I've added a code of conduct, as for my other repos; please do hold me accountable to it as well.

> Please tell me how you'd like to proceed, and I'll be happy to accommodate.

I propose you give me a couple of days to digest and I will come back with a (better informed) suggestion.

@glamberson
Contributor Author

Absolutely. Take your time. Please don't take any further submissions by me in the future as being of urgency or too much. It's just that when I have time to focus on something I have to get done what I can. Thanks.

@glamberson
Contributor Author

Hi @DavidMStraub,

Thank you for your patience with this PR. I've analyzed the test failures and found 10 type errors that need fixing:

  1. occurrence.py:90 - Missing handle_place import from event module
  2. process_enhanced.py:84 - ImportSettings needs registered_extensions field
  3. process_enhanced.py:195 - Return type mismatch with handle_source_with_template
  4. note_enhanced.py:127 - Missing type annotation for tag_stack
  5. note_enhanced.py:142 - Tuple unpacking error (expecting 3 values, getting 2)
  6. note_enhanced.py:192 - Wrong tuple structure appended to list
  7. individual_enhanced.py:30 - Missing handle_fam_link function
  8. individual_enhanced.py:68 - handle_name doesn't accept settings parameter
  9. individual_enhanced.py:90 - Missing EventRoleType import
  10. individual_enhanced.py:69 - Handling tuple instead of expected object

Given the size of this PR (3,978 lines), I'm wondering if it would be easier to review if I split it into smaller, focused PRs? I could break it down into 5 parts:

  1. Core extension infrastructure (process_enhanced.py, importer.py changes, gedcom7_patch.py)
  2. Evidence extension (evidence.py + tests)
  3. Occurrences extension (occurrence.py + tests)
  4. Enhanced processors & citation templates (individual_enhanced.py, note_enhanced.py, citation_templates.py)
  5. Documentation (GEDCOM7_EXTENSIONS_SUPPORT.md and remaining test files)

Each PR would be properly typed and tested independently. Would you prefer this approach? I'm happy to proceed either way - fixing the current PR or splitting it up for easier review.

Best regards,
Greg

- Import handle_place from event module in occurrence.py
- Add registered_extensions field to ImportSettings dataclass
- Fix return type handling for handle_source_with_template
- Add type annotation for tag_stack in note_enhanced.py
- Fix tuple unpacking to handle 3-element tuples
- Replace handle_fam_link calls with inline logic
- Remove settings parameter from handle_name call
- Add EventRoleType import
- Fix registered_extensions type to be set[str]
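A sketch of the `registered_extensions: set[str]` field from the commit above, assuming `ImportSettings` is a dataclass. The thread does not show its other fields, so they are omitted, and `extension_enabled` is a hypothetical helper. A plain `= set()` default would be rejected by `dataclasses` with a `ValueError`, hence `default_factory`.

```python
from dataclasses import dataclass, field


@dataclass
class ImportSettings:
    """Sketch only: other ImportSettings fields are not shown in this thread."""

    # default_factory gives each instance its own set instead of one shared default.
    registered_extensions: set[str] = field(default_factory=set)

    def extension_enabled(self, uri: str) -> bool:
        """Hypothetical helper: was this extension declared/registered?"""
        return uri in self.registered_extensions
```

Typing the field as `set[str]` also lets mypy catch callers that pass a list or a dict of tag names by mistake.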
@glamberson
Contributor Author

I've pushed fixes for all 10 type errors. The changes include:

  • Import fixes (handle_place, EventRoleType)
  • Added registered_extensions field to ImportSettings
  • Fixed tuple handling in several places
  • Replaced missing handle_fam_link with inline logic
  • Corrected function signatures and return types

All type errors should now be resolved. The code passes mypy type checking locally.

Per reviewer feedback, removing file that shouldn't be part of the library
@glamberson
Contributor Author

@DavidMStraub You're right - that file shouldn't have been included. I've removed RESPONSE_TO_GIOTODIBONDONE.md from the PR.

The PR is now cleaned up and ready for review. Looking forward to your feedback on the extension implementation itself.

Best regards,
Greg Lamberson
[email protected]

@glamberson
Contributor Author

Hello David, I've found that one of the extensions submitted here (evidence) actually breaks GEDCOM rules, so I'm going to resubmit it with a slightly different architecture. This changes this submission significantly.

In considering this, it occurred to me that I did this entirely improperly in another way (and yes, I'm trying to set a record for messing up). As extensions to the GEDCOM 7 standard, my extensions are meant to be submitted to the GEDCOM project for its registry first. The alternative (which I've pointed out to various people multiple times over the years) is for GRAMPS to define and maintain its own extensions, making them vendor extensions (which are more or less automatically accepted). I always hoped someone at GRAMPS would do this, but then any vendor could have done so.

The point is it was improper for me to submit these extensions with this submission as that's not how GEDCOM works, nor should it be how GRAMPS deals with either GEDCOM 7 or any extensions to it.

Thus I'm going to close this PR and review how we structured this. Then, after considering all the implications, I'll resubmit some more reasonably sized and packaged PRs, if appropriate, to allow you to implement extensions to GEDCOM 7 in the proper way. It'll also save you from seeing Markdown files you don't want to see (both the wrong ones and the right ones).

I truly am sorry for taking your time, and I'll be in touch in the next day or so regarding my amended submissions to support your efforts.

At least I hope you note the one bug we tracked down yesterday for you. I'll try to do better next time.

Thank you!

Sincerely,

Greg Lamberson
[email protected]

@glamberson glamberson closed this Jul 29, 2025
@glamberson glamberson deleted the feature/gedcom7-extensions-support branch July 29, 2025 17:51
@DavidMStraub
Owner

Thanks for letting me know! I would be happy to discuss and learn more about this in the future; so far, extensions are something I have pretty much ignored.

My priority for the next couple of weeks will be to finish the implementation of the standard tags, so at least we have a first usable (or rather testable) version of an import addon.
