Split checksum file into chunks #961

Closed
TamaraNaboulsi wants to merge 1 commit into Ensembl:release/114 from TamaraNaboulsi:xref/checksum_fix

@TamaraNaboulsi
Member

Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion

Requirements

  • Filling out the template is required.
  • Review the contributing guidelines for this repository; remember in particular:
    • do not modify code without testing for regression
    • provide simple unit tests to test the changes
    • if you change the schema you must patch the test databases as well, see Updating the schema
    • the PR must not fail unit testing

Description

Splitting the large checksum file into smaller ones.

Use case

The Checksum step of the pipeline has been hitting timeout errors because it runs the 'LOAD DATA INFILE' command on a very large file. This fix splits the large file into multiple smaller files and runs the command on each one. Once loading finishes, the code recombines the smaller files into a single file, restoring the original state.
This change is accompanied by another in the DB model (ensembl-py) that sets the engine of the checksum_xref table to MyISAM, which decreases the probability of hitting the error.
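
For reviewers who want a concrete picture of the approach, here is a minimal Python sketch of the split-then-load idea. It is not the code in this PR: the function names (`split_file`, `load_statement`, `load_in_chunks`), the `.partNNNN` chunk naming, and the `execute` callback are all hypothetical, and the chunk size is left to the caller. One simplification is also assumed: the sketch leaves the original file in place and deletes the chunks afterwards, rather than recombining them, which ends in the same single-file state.

```python
import os
from typing import Callable, List


def split_file(path: str, lines_per_chunk: int) -> List[str]:
    """Write `path` out as numbered chunk files and return their paths in order."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    chunk_paths: List[str] = []
    out = None
    with open(path, "r", encoding="utf-8") as src:
        for line_number, line in enumerate(src):
            # Start a new chunk file every `lines_per_chunk` lines.
            if line_number % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                chunk_path = f"{path}.part{len(chunk_paths):04d}"  # hypothetical naming
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths


def load_statement(chunk_path: str, table: str = "checksum_xref") -> str:
    """Build the per-chunk statement (path interpolated directly, for illustration only)."""
    return f"LOAD DATA INFILE '{chunk_path}' INTO TABLE {table}"


def load_in_chunks(path: str, lines_per_chunk: int,
                   execute: Callable[[str], object]) -> int:
    """Split `path`, run one load statement per chunk, then remove the chunks."""
    chunk_paths = split_file(path, lines_per_chunk)
    try:
        for chunk_path in chunk_paths:
            # `execute` stands in for whatever runs SQL against the xref database.
            execute(load_statement(chunk_path))
    finally:
        # Clean up even if a load fails, so only the original file remains.
        for chunk_path in chunk_paths:
            if os.path.exists(chunk_path):
                os.remove(chunk_path)
    return len(chunk_paths)
```

Passing `print` as `execute` gives a dry run that lists the statements without touching a database. Each statement covers only `lines_per_chunk` rows, so a single load is shorter and less likely to exceed a timeout.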

Benefits

Reduces the probability of timeout errors in the Checksum step.

Possible Drawbacks

If applicable, describe any possible undesirable consequence of the changes.

Testing

  • Have you added/modified unit tests to test the changes?
  • If so, do the tests pass?
  • Have you run the entire test suite and no regression was detected?
  • TravisCI passed on your branch

Dependencies

If applicable, define what code dependencies were added and/or updated.

TamaraNaboulsi marked this pull request as ready for review on October 1, 2024 15:35
TamaraNaboulsi deleted the xref/checksum_fix branch on July 7, 2025 10:21