Write higher level acceptance tests #734

Closed
lionel-nj opened this issue Feb 9, 2021 · 12 comments · Fixed by #848, #926, #1037 or #1038

@lionel-nj
Contributor

lionel-nj commented Feb 9, 2021

At present, most implemented tests are unit tests. These make sure that the individual units/components are tested in isolation.

Higher-level tests should be written to verify the execution of multiple modules together.

@lionel-nj
Contributor Author

lionel-nj commented Feb 9, 2021

After discussion with @barbeau, we think that executing the validator on sample GTFS datasets and analyzing the generated report (via parsing) could be a way to write such tests.
We could start by crafting GTFS datasets that lead to different system errors, and then extend the same process to validation errors.
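A rough sketch of what one of these tests could look like, assuming the validator is invoked as a CLI jar and writes a report.json with a top-level "notices" array carrying a "code" and "severity" per entry. The jar name, flags, paths, and report fields below are illustrative assumptions, not the actual interface:

```java
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.nio.file.Files;
import java.nio.file.Path;

public class AcceptanceTestSketch {

  public static void main(String[] args) throws Exception {
    // Hypothetical CLI invocation: jar name, flags, and paths are placeholders.
    Process validator = new ProcessBuilder(
            "java", "-jar", "gtfs-validator.jar",
            "--input", "src/test/resources/sample-feed.zip",
            "--output_base", "build/acceptance")
        .inheritIO()
        .start();
    int exitCode = validator.waitFor();
    System.out.println("validator exit code: " + exitCode);

    // Parse the generated report and count ERROR-severity notices.
    String json = Files.readString(Path.of("build/acceptance/report.json"));
    JsonObject report = JsonParser.parseString(json).getAsJsonObject();
    long errors = 0;
    for (JsonElement e : report.getAsJsonArray("notices")) {
      JsonObject notice = e.getAsJsonObject();
      if ("ERROR".equals(notice.get("severity").getAsString())) {
        errors++;
      }
    }
    // A crafted "bad" dataset would assert errors > 0; a known-good one would assert errors == 0.
    System.out.println("error notices: " + errors);
  }
}
```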

Any thoughts/advice, @aababilov?

@lionel-nj lionel-nj added this to the v2.0 milestone Feb 9, 2021
@derhuerst

From the related Issue public-transport/ideas#17, which in turn quotes georust/transitfeed#5 (comment):

reusability

Answering quite basic (but very relevant in practical usage of GTFS) questions like When does any vehicle depart at a bus stop? is surprisingly a lot of work: GTFS Time values are inherently timezone-dependent, frequencies.txt with exact_times=1 defines "stop times" as well, etc. With the ever-growing number of optional parts and extensions, doing GTFS processing right is a lot of work, so we should make the implementation in this project as reusable/flexible as possible.
Also, a project- and language-independent test suite, i.e. a set of fixtures per "question"/operation, would be very helpful for this. Those have been very successful in other areas, e.g. for WebSocket implementations or for HTML5 parsers.

validation

There are a bazillion validation (i.e. "semantic checks on the actual data") cases; the best practices page is long, and the GTFS issue tracker and mailing lists are full of edge cases. There are at least 20 libs across languages doing some form of validation, but none of them covers all the issues that we see with GTFS feeds in the wild.
I'd dare to say that people don't care which language a GTFS validator is written in, but they strongly prefer a certain language for "questions"/analysis. Like the "questions"/analysis mentioned above, validation lends itself to a project- and language-independent set of fixtures, maintained by the wider GTFS community. I hope this will push the overall quality of GTFS feeds, and reduce the amount of duplicated work poured into all those GTFS validation libs. I therefore propose not to put too much effort into validation in this project (I'm obviously just a random stranger telling what to do 😬).

@barbeau
Member

barbeau commented Feb 9, 2021

I see this being useful in the context of issues like #712 (comment) as well, where we want to ensure that we aren't generating errors on valid datasets (or at least know when we are, if the spec needs to be updated to reflect practice).

@nackko
Contributor

nackko commented Feb 11, 2021

GitHub workflows could be leveraged for those integration tests. Maybe a shortcut could be a binary diff? That could work if the validator is deterministic.

@barbeau
Member

barbeau commented Feb 11, 2021

GitHub workflows could be leveraged for those integration tests.

I agree - although the mechanism by which the results should be evaluated isn't clear to me yet. Is it a separate module that executes the validator, then parses and checks the JSON output? Or does it directly instantiate the validator and then check the data structure of notices without outputting JSON?

Parsing the JSON output would certainly be the most "end-to-end" as it would confirm JSON output also isn't broken.

Maybe a shortcut could consist in a binary diff?

If you mean a binary check of the JSON output (or data structure, if it's somehow done internally), yes, I think that could work as a preliminary shortcut after the initial check is run. Basically, if we already checked the JSON output previously and confirmed it was correct, and the new JSON output hash matches the old, then nothing changed and it's fine. Although it's likely that we could add INFO notices at some point that would change the file hash, which would then require parsing the contents and checking the number of errors, warnings, etc. And we'd need to figure out where to store the "last valid hash" for each feed output.
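A rough sketch of that hash shortcut, with placeholder file locations (where the reference hash actually lives is still the open question above):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class ReportHashCheck {

  // Compute a SHA-256 hex digest of the report so it can be compared to a stored reference.
  static String sha256Hex(Path file) throws Exception {
    byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
    StringBuilder hex = new StringBuilder();
    for (byte b : digest) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws Exception {
    // Placeholder paths for the freshly generated report and the stored reference hash.
    String current = sha256Hex(Path.of("build/acceptance/report.json"));
    String reference = Files.readString(Path.of("src/test/resources/report.sha256")).trim();
    if (current.equals(reference)) {
      System.out.println("Report unchanged; skipping full comparison.");
    } else {
      System.out.println("Report changed; fall back to parsing and comparing notice counts.");
    }
  }
}
```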

@nackko
Contributor

nackko commented Feb 11, 2021

I agree - although the mechanism by which the results should be evaluated isn't clear to me yet. Is it a separate module that executes the validator, then parses and checks the JSON output? Or does it directly instantiate the validator and then check the data structure of notices without outputting JSON?

IMHO, since these are integration tests, it would make sense for them to go all the way and check the JSON output itself. In terms of GH workflows, this could either be a new step in the end-to-end workflow or another workflow that grabs the previously produced artifacts.

If you mean binary check the JSON output (or data structure, if it's somehow done internally)

In my mind that was the JSON file itself, diffed against the output of a previous run that is deemed valid for the input data.
I didn't think of hashing, but it's an obvious thing to do. Since the code is multithreaded, I'm not sure it is deterministic, though (@aababilov?). You are right regarding the INFO notices, though would that happen frequently?
Thinking about it, maybe such fields should be trimmed, as their content can be dataset-independent (like a timestamp or execution-time output).
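A sketch of that trimming step, assuming the run-specific values sit in top-level JSON fields (the field names here are placeholders, not the real report schema):

```java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReportNormalizer {

  // Strip run-specific fields so two reports for the same dataset can be diffed or hashed.
  static String normalize(Path reportFile) throws Exception {
    JsonObject report =
        JsonParser.parseString(Files.readString(reportFile)).getAsJsonObject();
    // Placeholder field names; whatever timestamp/duration fields the real report
    // carries would be removed here instead.
    report.remove("timestamp");
    report.remove("executionTimeMs");
    return report.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(normalize(Path.of("build/acceptance/report.json")));
  }
}
```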

The reference reports, or their hashes, will have to be stored somewhere, but that could also potentially be handled as workflow artifacts, I think?

@barbeau
Member

barbeau commented Feb 11, 2021

The reference reports will have to be somewhere, or their hash, but that could also potentially be handled as workflow artifacts, I think?

I think so - although thinking about it more, these JSON files should be pretty small and fast to parse, so I'm not sure the complexity of the hash solution would be worth the performance gain. But we can see.

@barbeau
Member

barbeau commented Feb 11, 2021

As discussed in #742, when feeds fail to parse, the validator currently still finishes with Java exit code 0 and very little output. #742 added text to the system output to flag this condition for humans observing the output.

We should also handle this case in these types of tests - we can parse the JSON output and then intentionally fail the CI build on parse failures so we're automatically notified.
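A sketch of what that CI check could look like, assuming the parse failure shows up in the JSON output as an ERROR-severity notice (the file name and report fields are assumptions); the non-zero exit code is what actually fails the workflow:

```java
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.nio.file.Files;
import java.nio.file.Path;

public class FailBuildOnParseError {

  public static void main(String[] args) throws Exception {
    // Assumed output file name and layout: a "notices" array where parse
    // failures appear with ERROR severity.
    JsonObject report = JsonParser
        .parseString(Files.readString(Path.of("build/acceptance/system_errors.json")))
        .getAsJsonObject();
    for (JsonElement e : report.getAsJsonArray("notices")) {
      JsonObject notice = e.getAsJsonObject();
      if ("ERROR".equals(notice.get("severity").getAsString())) {
        System.err.println("Feed failed to parse: " + notice.get("code").getAsString());
        System.exit(1); // non-zero exit code makes the CI job fail
      }
    }
    System.out.println("No parse failures detected.");
  }
}
```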

@barbeau barbeau modified the milestones: v2.0, v2.1 Feb 15, 2021
@lionel-nj lionel-nj self-assigned this Feb 18, 2021
@barbeau barbeau changed the title Write higher level tests Write higher level integration tests Mar 1, 2021
@nackko
Contributor

nackko commented Mar 3, 2021

Being multithreaded, I'm not sure the code is deterministic though (@aababilov ?).

Got my answer. It is NOT deterministic, so I guess an automated process could reason about the number of notices of each type. Is there a specially crafted verybaddataset/verygooddataset out there that would be a good first case for such a CI/CD process?

@barbeau
Member

barbeau commented Mar 3, 2021

It is NOT deterministic so I guess an automated process could reason on the number of each type of Notices.

Correct, the order of notices isn't deterministic, but the number of each type of notice is.
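So instead of diffing raw files, the comparison could be done on a per-code count map, which is order-independent. A sketch, again assuming a "notices" array with a "code" field per entry (paths and fields are placeholders):

```java
import com.google.gson.JsonElement;
import com.google.gson.JsonParser;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class NoticeCountComparison {

  // Aggregate the report into {notice code -> count}; this is stable even if the
  // validator emits notices in a different order from run to run.
  static Map<String, Long> countsByCode(Path reportFile) throws Exception {
    Map<String, Long> counts = new TreeMap<>();
    for (JsonElement e :
        JsonParser.parseString(Files.readString(reportFile))
            .getAsJsonObject()
            .getAsJsonArray("notices")) {
      String code = e.getAsJsonObject().get("code").getAsString();
      counts.merge(code, 1L, Long::sum);
    }
    return counts;
  }

  public static void main(String[] args) throws Exception {
    Map<String, Long> reference = countsByCode(Path.of("reference/report.json"));
    Map<String, Long> current = countsByCode(Path.of("build/acceptance/report.json"));
    System.out.println(reference.equals(current) ? "Counts match." : "Counts differ: " + current);
  }
}
```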

@lionel-nj
Contributor Author

Per discussion with @barbeau and @carlfredl, this process will be automated. We are considering implementing a new module for testing the validator against various datasets (ultimately, all the datasets available via the Mobility Archives) to make sure that the implementation of a new rule does not make a large portion of existing datasets wrongfully invalid.

Before merging to the master branch, each snapshot (intermediate) version of the validator that includes modifications to existing rules and/or new rule implementations should be executed on the selected variety of GTFS schedule datasets.

We are considering using fixed versions of datasets in this process in order to obtain consistent results. After parsing the validation report, a dataset will be considered invalid if said report contains more than a certain number of errors, and the percentage of newly invalid datasets will be used to determine whether a new implementation is acceptable or not (numbers yet to be determined: that is a subject where we would be glad to have community feedback @aababilov 😉).

Our vision here comes from what we've heard from consumers: they need to trust that changes to the open source project will be incremental, e.g. a change which overnight makes half of the GTFS datasets already in use around the world invalid is too extreme to be usable.

Next steps
Begin implementation using a small number of datasets, get community feedback, and then extend the process to test the validator on a broader variety of GTFS schedule datasets leveraging the Mobility Archives project (@maximearmstrong ).

@carlfredl carlfredl changed the title Write higher level integration tests Write higher level integration / acceptance tests Apr 7, 2021
@lionel-nj lionel-nj mentioned this issue Apr 9, 2021
4 tasks
@lionel-nj lionel-nj changed the title Write higher level integration / acceptance tests Write higher level acceptance tests Jun 11, 2021
@isabelle-dr
Contributor

An update on this issue:
MobilityData is working on it in PR #933 and PR #848

The principle is the following: run two versions of the validator (the latest release and the snapshot version) on each GTFS dataset in the MobilityArchives and compare the number of errors per validation report (for each dataset). Finally, an addition or a change of code will be flagged if it makes more than 1% of the datasets invalid.
These tests rely on Kubernetes and GitHub Actions and are triggered on each push to a given pull request.
See the following diagram:
https://user-images.githubusercontent.com/35747326/130859517-56490d7a-de7d-4d91-b73a-f7bf209337aa.png
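One possible reading of that acceptance rule, sketched in code (how the per-dataset error counts are collected is left out, and "newly invalid" is interpreted here as: the snapshot reports errors where the release reported none):

```java
import java.util.Map;

public class AcceptanceRule {

  // A dataset is "newly invalid" if the snapshot reports errors where the latest
  // release reported none. The change is flagged if more than the allowed ratio
  // (1% per the description above) of datasets become newly invalid.
  static boolean isAcceptable(Map<String, Integer> releaseErrors,
                              Map<String, Integer> snapshotErrors,
                              double maxNewlyInvalidRatio) {
    long newlyInvalid = releaseErrors.keySet().stream()
        .filter(dataset -> releaseErrors.get(dataset) == 0
            && snapshotErrors.getOrDefault(dataset, 0) > 0)
        .count();
    return (double) newlyInvalid / releaseErrors.size() <= maxNewlyInvalidRatio;
  }

  public static void main(String[] args) {
    // Illustrative error counts per dataset for the two validator versions.
    Map<String, Integer> release = Map.of("feed-a", 0, "feed-b", 3, "feed-c", 0);
    Map<String, Integer> snapshot = Map.of("feed-a", 2, "feed-b", 3, "feed-c", 0);
    // 1 of 3 datasets became invalid, well over the 1% threshold, so the change is flagged.
    System.out.println(isAcceptable(release, snapshot, 0.01));
  }
}
```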

Needed:

  • Validator linked to the MobilityDatabase (running on all datasets present on the MobilityDatabase)
  • Add a step before committing/merging a PR to ensure there is no disruption in the datasets when a new rule is added.
