Importing is currently not user-friendly #479

Open
13 of 14 tasks
northdpole opened this issue Jan 9, 2024 · 0 comments · May be fixed by #482
Labels
enhancement New feature or request P2 Medium Priority python Pull requests that update Python code

northdpole commented Jan 9, 2024

Issue

What is the issue?

Any data change can currently only be made by the developers; the process is complicated and takes a very long time.

Expected Behaviour

There should be either a UX page or a command/script that allows full or partial data import and calculation.
There should be clear progress reporting and the ability to easily choose which database to target.
There should also be a list of which standards and other resources have been imported, which have had embeddings calculated, and which paths have had gap analysis calculated.
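One possible shape for the command/script option, sketched with Python's standard `argparse` module. Every flag name below is hypothetical and does not exist in the project; the sketch only shows how full vs. partial import, the target database and a status listing could be expressed on the command line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical importer CLI: none of these flags exist in the project today.
    parser = argparse.ArgumentParser(description="Import CRE data (sketch)")
    parser.add_argument(
        "--db-url",
        required=True,
        help="connection string of the database to import into",
    )
    parser.add_argument(
        "--resources",
        nargs="*",
        default=[],
        help="names of standards to import; empty means a full import",
    )
    parser.add_argument(
        "--skip-embeddings", action="store_true", help="do not calculate embeddings"
    )
    parser.add_argument(
        "--skip-gap-analysis", action="store_true", help="do not calculate gap analysis"
    )
    parser.add_argument(
        "--status",
        action="store_true",
        help="only list which resources are imported, embedded and gap-analysed",
    )
    return parser
```

For example, `build_parser().parse_args(["--db-url", "sqlite:///cre.db", "--resources", "ASVS"])` would request a partial import of a single standard into that database, while omitting `--resources` would mean a full import.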

Things to do:

  • Make importing progressive. First import the CRE structure, then import each standard.
  • Report a list of standards that are imported from spreadsheets
  • Parallelise importing (once the CRE structure has been imported, every standard can be imported in parallel, as can external projects that do not depend on existing standards)
  • Calculate gap analysis progressively for every new standard
  • Prioritise gap analysis for the largest standards first
  • Update the neo4j DB at every stage
  • Calculate embeddings for each standard in parallel or in the background
  • Allow re-importing of resources without structural changes (use case: a resource's name or hyperlink changed; implementation: update the DB table for that specific resource and regenerate embeddings only for the sections/subsections that were updated)
  • Allow re-importing of a resource with structural changes (use case: a data quality improvement or a resource version change; implementation: remove the resource's links and gap analysis, re-link, then recalculate gap analysis)
  • Add tests for the Prompt Client using mocks
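A minimal sketch of the progressive, parallel ordering described in the list, assuming that importing a standard is a callable that is safe to run concurrently and that gap analysis is computed per pair of standards. `Standard`, `depends_on`, `import_all` and `gap_analysis_order` are hypothetical names, not existing project code. The CRE structure is imported first; standards are then imported in "waves", where each wave contains every standard whose dependencies are already imported. Standard size is approximated by section count.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from itertools import combinations
from typing import Callable


@dataclass
class Standard:
    name: str
    sections: list[str]
    # Hypothetical: names of other standards this one links to.
    depends_on: list[str] = field(default_factory=list)


def import_all(
    standards: list[Standard],
    import_cre: Callable[[], None],
    import_one: Callable[[Standard], None],
    max_workers: int = 4,
) -> list[list[str]]:
    """Import the CRE structure first, then standards in dependency waves."""
    import_cre()  # everything else links to CREs, so this step runs alone
    done: set[str] = set()
    pending = {s.name: s for s in standards}
    waves: list[list[str]] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            ready = [s for s in pending.values() if set(s.depends_on) <= done]
            if not ready:
                raise ValueError(f"unresolvable dependencies: {sorted(pending)}")
            # Standards in one wave only depend on already-imported ones,
            # so they can be imported concurrently; list() surfaces errors.
            list(pool.map(import_one, ready))
            for s in ready:
                done.add(s.name)
                del pending[s.name]
            waves.append(sorted(s.name for s in ready))
    return waves


def gap_analysis_order(standards: list[Standard]) -> list[tuple[str, str]]:
    """Pairs of standards to analyse, largest combined section count first."""
    by_size = sorted(standards, key=lambda s: len(s.sections), reverse=True)
    ranked = sorted(
        combinations(by_size, 2),
        key=lambda p: len(p[0].sections) + len(p[1].sections),
        reverse=True,
    )
    return [(a.name, b.name) for a, b in ranked]
```

Ordering pairs by combined size is only one reading of "largest standards first"; weighting by number of links would be an equally valid choice.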

If importing works per resource, we can also report progress: since we know how many resources we support, we can work out how many of them are in each state (importing, linking, embedding calculation, gap analysis calculation).
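The per-resource progress report could look roughly like this sketch, assuming each resource's current stage is tracked in a dict keyed by resource name. The `Stage` enum and `progress_report` function are hypothetical, not part of the project.

```python
from collections import Counter
from enum import Enum


class Stage(Enum):
    # Hypothetical pipeline stages, in the order a resource moves through them.
    QUEUED = "queued"
    IMPORTING = "importing"
    LINKING = "linking"
    EMBEDDING = "embedding calculation"
    GAP_ANALYSIS = "gap analysis calculation"
    DONE = "done"


def progress_report(states: dict[str, Stage]) -> str:
    """Summarise how many resources are in each stage."""
    counts = Counter(states.values())
    done = counts[Stage.DONE]
    # Only mention stages that currently have at least one resource.
    parts = [f"{stage.value}: {counts[stage]}" for stage in Stage if counts[stage]]
    return f"{done}/{len(states)} resources done ({', '.join(parts)})"
```

For four resources where one is done, two are calculating embeddings and one is importing, this returns `1/4 resources done (importing: 1, embedding calculation: 2, done: 1)`.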

Implementation Considerations:

  • Since this is a big feature, create a design document
  • Use the workers architecture as much as possible
  • Expose as many of the new features as possible as API calls, for a possible future frontend
  • Write tests
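A sketch of a mocked Prompt Client test using the standard library's `unittest.mock`, so that no real AI backend is called during testing. `PromptClient`, `embed_sections` and `get_text_embeddings` are hypothetical stand-ins, not the project's actual class or method names.

```python
import unittest
from unittest import mock


class PromptClient:
    """Hypothetical stand-in for the project's prompt client."""

    def __init__(self, ai_client):
        # Assumed: any object exposing get_text_embeddings(text) -> list[float].
        self.ai_client = ai_client

    def embed_sections(self, sections: dict[str, str]) -> dict[str, list[float]]:
        # Only call the (slow, possibly paid) backend for non-empty sections.
        return {
            sid: self.ai_client.get_text_embeddings(text)
            for sid, text in sections.items()
            if text.strip()
        }


class TestPromptClient(unittest.TestCase):
    def test_backend_called_once_per_non_empty_section(self):
        backend = mock.Mock()
        backend.get_text_embeddings.return_value = [0.1, 0.2]
        client = PromptClient(backend)

        result = client.embed_sections({"s1": "Input validation", "s2": "   "})

        self.assertEqual(result, {"s1": [0.1, 0.2]})
        backend.get_text_embeddings.assert_called_once_with("Input validation")
```

Such a test can be run with `python -m unittest`; because the backend is a `Mock`, it is fast and deterministic.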
northdpole self-assigned this Jan 9, 2024
northdpole added enhancement New feature or request P2 Medium Priority python Pull requests that update Python code labels Jan 9, 2024
northdpole added this to the CRE v3 milestone Jan 9, 2024
northdpole linked a pull request Jan 14, 2024 that will close this issue