
Create endpoint for user data #1416

Closed
f1sh1918 opened this issue Apr 29, 2024 · 5 comments · Fixed by #1528

@f1sh1918
Contributor

f1sh1918 commented Apr 29, 2024

Is your feature request related to a problem? Please describe.
Create an HTTP PUT endpoint that receives user data.
Describe the solution you'd like

  • create a new table with the columns: userHash, startDate, endDate, invalid, lastUpdated (timestamp)
  • create an endpoint and function that receives the data as a CSV body
  • check data validity: hash length, valid date format
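
A minimal sketch of what this could look like, assuming a Kotlin backend with Exposed and Javalin; the table name, column names and route are placeholders, not a final design:

```kotlin
import io.javalin.Javalin
import org.jetbrains.exposed.sql.Table
import org.jetbrains.exposed.sql.javatime.date
import org.jetbrains.exposed.sql.javatime.timestamp

// Hypothetical table holding one row per imported user hash.
object UserEntitlements : Table("user_entitlements") {
    val userHash = varchar("user_hash", 200)
    val startDate = date("start_date")
    val endDate = date("end_date")
    val invalid = bool("invalid")
    val lastUpdated = timestamp("last_updated")
    override val primaryKey = PrimaryKey(userHash)
}

// Hypothetical PUT endpoint that receives the CSV payload in the request body.
fun registerUserImportEndpoint(app: Javalin) {
    app.put("/users") { ctx ->
        val csvBody = ctx.body()
        // parse the CSV, validate hash length and date format, then upsert rows
        ctx.status(200)
    }
}
```
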
@seluianova
Contributor

Input data example:

userHash, startDate, endDate, valid
dashj21sasd32, 12.05.2024, 12.10.2028, true
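
For illustration, a hedged sketch of parsing one data line of that format in Kotlin; the dd.MM.yyyy date format and the column order are taken from the example above, while the type and function names are made up:

```kotlin
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// One parsed row of the import CSV (names are hypothetical).
data class UserEntry(
    val userHash: String,
    val startDate: LocalDate,
    val endDate: LocalDate,
    val valid: Boolean,
)

private val dateFormat = DateTimeFormatter.ofPattern("dd.MM.yyyy")

fun parseCsvLine(line: String): UserEntry {
    val (userHash, start, end, valid) = line.split(",").map { it.trim() }
    return UserEntry(
        userHash = userHash,
        startDate = LocalDate.parse(start, dateFormat),
        endDate = LocalDate.parse(end, dateFormat),
        valid = valid.toBooleanStrict(),
    )
}
```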

@seluianova
Contributor

seluianova commented Jun 26, 2024

Questions:

  1. If userHash already exists, we update the entry; if not, we create a new one?
    Answer: yes.

  2. If userHash exists in the database, but not in the CSV, do we remove the entry?
    Answer: no. We will clean up such entries in the database later, probably using some kind of scheduled job.

  3. Do we want to have 'koblenz' in the name of the table? Like, koblenzusers ?
    Answer: no, we keep the generic names. But then the table must also contain the project_id column.

  4. How can we check userHash validity?
    Hash example: $argon2id$v=19$m=16,t=2,p=1$MTIzNDU2Nzg5QUJD$KStr3PVblyAh2bIleugv796G+p4pvRNiAON0MHVufVY
    What we could check:

  • $argon2id: which algorithm is used
  • v=19: which version of the algorithm is used
  • m=16: the memory cost
  • t=2: the number of iterations
  • p=1: the degree of parallelism
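
If we wanted to check that, a hedged sketch of a purely textual format check in Kotlin; it only inspects the encoded shape shown above, cannot verify the hash itself, and whether these parameters should even be part of the hashes is questioned in a later comment:

```kotlin
// Rough shape check against the encoding above; the exact rules are an assumption.
fun looksLikeArgon2idHash(hash: String): Boolean {
    // Expected parts: "", "argon2id", "v=19", "m=16,t=2,p=1", salt, hash value
    val parts = hash.split('$')
    if (parts.size != 6) return false
    if (parts[1] != "argon2id") return false
    if (!parts[2].startsWith("v=")) return false
    if (!Regex("m=\\d+,t=\\d+,p=\\d+").matches(parts[3])) return false
    return parts[4].isNotEmpty() && parts[5].isNotEmpty()
}
```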

@seluianova
Contributor

seluianova commented Jul 3, 2024

Might be nice to know how much data we expect in the CSV.
Processing 1000 lines takes about 2.82 seconds on my machine locally (before warming up).
After warming up it's about 1.35 sec.

UPD: We have a requirement about the supported data volume from our Leistungsbeschreibung for Koblenz:

In particular, the backend enables a one-time import of approx. 15,000 - 20,000 records or their hash values. The backend must also support the ongoing (weekly) import of 15,000 - 20,000 records.

Re-tested for 20000 entries:
29.13 sec before warming up
17.78 sec after warming up

We might want to think about performance optimization then?
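
Not decided in the thread, but one direction for such an optimization, sketched under the assumption of the Exposed table and UserEntry type from the sketches above (it ignores the update-if-exists behaviour from question 1):

```kotlin
import org.jetbrains.exposed.sql.batchInsert
import org.jetbrains.exposed.sql.transactions.transaction
import java.time.Instant

// Insert all parsed rows in one batched statement instead of one INSERT per row.
fun importEntries(entries: List<UserEntry>) = transaction {
    UserEntitlements.batchInsert(entries) { entry ->
        this[UserEntitlements.userHash] = entry.userHash
        this[UserEntitlements.startDate] = entry.startDate
        this[UserEntitlements.endDate] = entry.endDate
        this[UserEntitlements.invalid] = !entry.valid
        this[UserEntitlements.lastUpdated] = Instant.now()
    }
}
```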

@michael-markl
Member

Regarding hash validity checks: We could also require that the first entry is a dummy entry with a hash deduced from dummy data. I think for other hashes we can only check their length (as I've written in #1499, I think we should not add these parameters to the hashes).
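
A hedged sketch of those two checks in Kotlin; the expected length and the dummy hash below are placeholders, not values from the issue:

```kotlin
// Hypothetical values; the real expected length and dummy hash are not specified here.
const val EXPECTED_HASH_LENGTH = 64
const val EXPECTED_DUMMY_HASH = "<hash deduced from agreed dummy data>"

fun validateHashes(hashes: List<String>): Boolean {
    // Require that the first entry is the known dummy entry.
    if (hashes.firstOrNull() != EXPECTED_DUMMY_HASH) return false
    // For all other hashes we can only check the length.
    return hashes.drop(1).all { it.length == EXPECTED_HASH_LENGTH }
}
```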

@michael-markl
Member

Re-tested for 20000 entries:
29.13 sec before warming up
17.78 sec after warming up

We might want to think about performance optimization then?

I don't think 30s would be a problem. It should only run once a week, and we may require that it only runs at night, for example.
