can't add personal data db/collection to auth.json #1684

rxng · 2024-06-13T07:55:51Z

According to the instructions, we can add a database made with make_db.py to auth.json, but they don't specify exactly how to do this.

To make a new one for the user, fill `user_path_jon` with documents (they can be soft or hard linked to avoid duplicates across multiple users), then run:
```bash
python src/make_db.py --user_path=gptdocsdb/jon --collection_name=JonData --langchain_type=personal --hf_embedding_model=hkunlp/instructor-large --persist_directory=users/jon/db_dir_JonData
```

Then you'll have:

```
(h2ogpt) jon@pseudotensor:~/h2ogpt$ ls -alrt users/jon/db_dir_JonData/
total 264
drwx------ 13 jon jon   4096 Apr 16 12:28 ../
drwx------  2 jon jon   4096 Apr 16 12:28 d7ccacb6-93fe-4380-9340-b7f5edffb655/
-rw-------  1 jon jon 249856 Apr 16 12:28 chroma.sqlite3
-rw-------  1 jon jon     41 Apr 16 12:28 embed_info
drwx------  3 jon jon   4096 Apr 16 12:28 ./
```

You can add that database to the user's entry in auth.json (if using an auth.json-type file), and they will see it when they log in.
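A minimal sketch of the "soft or hard linked" suggestion above, in Python: it populates a per-user folder such as `gptdocsdb/jon` with links into one shared folder, so several users can index the same files without copies before running `make_db.py`. The helper name `build_user_path` and the shared folder are hypothetical; only the standard library is used.

```python
import os
from pathlib import Path


def build_user_path(shared_docs: Path, user_path: Path, filenames: list[str],
                    hard: bool = False) -> list[Path]:
    """Link each named file from shared_docs into user_path (hypothetical helper).

    Existing links are left alone, so re-running it after adding files is safe.
    """
    user_path.mkdir(parents=True, exist_ok=True)
    links = []
    for name in filenames:
        source = (shared_docs / name).resolve()
        if not source.is_file():
            raise FileNotFoundError(f"missing shared document: {source}")
        link = user_path / name
        if not (link.exists() or link.is_symlink()):
            if hard:
                os.link(source, link)    # hard link: same file, same filesystem only
            else:
                link.symlink_to(source)  # soft link: absolute path to the shared copy
        links.append(link)
    return links
```

For example, `build_user_path(Path("shared_docs"), Path("gptdocsdb/jon"), ["handbook.pdf"])`, followed by the `make_db.py` command with `--user_path=gptdocsdb/jon`. Symlinks may need extra privileges on Windows; `hard=True` uses `os.link` instead.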


I'm running h2oGPT as follows, and everything works except that it does not load the correct collection for the user:

```bash
python generate.py --base_model=mistral-7b-instruct-v0.2.Q8_0.gguf --score_model=None --prompt_type=instruct --auth_access=closed --auth=auth.json --guest_name='' --auth_freeze
```

I tried adding db parameters as shown below, but it does not work:

```json
{
  "jon": {
    "password": "jon1306",
    "userid": "acb8fef1a77d122b5e12b261202ada7a",
    "selection_docs_state": {
      "langchain_modes": [
        "JonData",
        "LLM",
        "Disabled"
      ],
      "langchain_mode_types": {
        "JonData": "personal"
      }
    },
    "dbs": "users/jon/db_dir_JonData",
    "load_db_if_exists": "users/jon/db_dir_JonData"
  }
}
```


How do we make it so that when a user logs in, their collection JonData is added automatically?
Alternatively, is there any way to simply specify a per-user user_path? That would be easiest.


pseudotensor · 2024-06-13T17:10:28Z

If you are trying this for a shared collection, did you try the CLI options?

https://github.com/h2oai/h2ogpt/blob/main/docs/README_LangChain.md#multiple-embeddings-and-sources

For example:

```bash
python generate.py --model_lock="[{'base_model': 'llama', 'model_path_llama': 'Phi-3-mini-4k-instruct-q4.gguf', 'tokenizer_base_model': 'microsoft/Phi-3-mini-4k-instruct'}]" --use_auth_token=$HUGGING_FACE_HUB_TOKEN --langchain_modes="['UserData', 'MyData', 'UserData2']"
```

This would show the two shared collections (UserData and UserData2) to all users by default.

Even a user who already had a db entry will be forced to see those CLI collections when they log in.
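The list-valued options in the command above are Python list literals wrapped in shell quotes, which is easy to get wrong by hand. A minimal sketch, assuming you launch h2oGPT from a Python wrapper: it builds the argument list for `subprocess.run` directly, so no shell quoting is involved, and uses `repr()` to produce the same literal format. `build_argv` and `langchain_modes_arg` are hypothetical helpers, and only flags that appear in this thread are used.

```python
def langchain_modes_arg(modes: list[str]) -> str:
    """Render --langchain_modes in the list-literal form shown in the thread."""
    # repr() of a list of plain strings gives e.g. "['UserData', 'MyData']"
    return f"--langchain_modes={modes!r}"


def build_argv(base_model: str, modes: list[str], *extra: str) -> list[str]:
    """Argument vector for subprocess.run(); no shell, so no extra quoting."""
    return [
        "python", "generate.py",
        f"--base_model={base_model}",
        *extra,
        langchain_modes_arg(modes),
    ]
```

Usage would be along the lines of `subprocess.run(build_argv("mistral-7b-instruct-v0.2.Q8_0.gguf", ["UserData", "MyData", "UserData2"], "--auth=auth.json"), check=True)`.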

While the system is running, there's currently no way to add a collection to all users at once without restarting, e.g. via some kind of global user settings. Is that what you are trying to achieve?

pseudotensor · 2024-06-13T17:17:15Z

For personal collections, there are no CLI options; they exist only in the db/json file. Newer h2oGPT uses an sqlite3 db by default to address speed issues with json, so one would have to edit the db using operations like those in src/db_utils.py.
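Before editing the sqlite3 auth db, it can help to see what `src/db_utils.py` actually created. A minimal read-only sketch using only the standard-library `sqlite3` module: it lists each table and its columns without assuming any particular schema. `describe_sqlite` is a hypothetical helper, and the filename is whatever you pass to `--auth`.

```python
import sqlite3


def describe_sqlite(db_filename: str) -> dict[str, list[str]]:
    """Return {table_name: [column names]} for every table in an SQLite file."""
    # Open read-only via a URI so the inspection cannot modify the file.
    conn = sqlite3.connect(f"file:{db_filename}?mode=ro", uri=True)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        return {
            table: [col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')]
            for table in tables
        }
    finally:
        conn.close()
```

For example, `print(describe_sqlite("auth.db"))`; actual changes are better made through the operations in `src/db_utils.py` than by hand.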

I'll think about how to handle this better; adding an option to do this via the admin page is probably best. Would that work for you?

rxng · 2024-06-14T04:44:09Z

Thanks for your quick response! Maybe my explanation was confusing. What I'm trying to achieve is that when a user logs in, their own collection is loaded for them automatically.

However, after trying every parameter, I found a way to do it via the auth.json file: add the line `"langchain_mode": "JonData",` above the `selection_docs_state` entry, like so:

```
"langchain_mode": "JonData",
"selection_docs_state": {
```

My only remaining question: if we later add more documents to the collection via make_db.py, do we have to restart the entire h2oGPT instance for it to use the updated collection?

It would definitely be great if there was an admin page where these things could easily be managed :)
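Until such an admin page exists, the `auth.json` change that worked here can be scripted. A minimal sketch, assuming the deployment still reads `auth.json` (per the maintainer, newer h2oGPT uses an sqlite3 `.db` by default, and edits to the `.json` do not reach an already-migrated `.db`). It sets `"langchain_mode"` and makes sure the collection is listed as personal in `selection_docs_state`, using only the keys shown in this thread. `set_default_collection` is a hypothetical helper; to be safe, run it while h2oGPT is stopped, since the server may also write to this file.

```python
import json
from pathlib import Path


def set_default_collection(auth_file: Path, username: str, collection: str) -> dict:
    """Make `collection` the selected personal collection for `username`."""
    auth = json.loads(auth_file.read_text())
    user = auth[username]  # KeyError if the user has no entry yet
    user["langchain_mode"] = collection
    state = user.setdefault("selection_docs_state", {})
    modes = state.setdefault("langchain_modes", [])
    if collection not in modes:
        modes.insert(0, collection)
    state.setdefault("langchain_mode_types", {})[collection] = "personal"
    # Write a temp file first, then replace the original in one step.
    tmp = auth_file.with_name(auth_file.name + ".tmp")
    tmp.write_text(json.dumps(auth, indent=2))
    tmp.replace(auth_file)
    return user
```

For example, `set_default_collection(Path("auth.json"), "jon", "JonData")`. Running it twice is harmless; the collection is only added to `langchain_modes` once.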

pseudotensor · 2024-06-28T03:07:41Z

rxng · 2024-06-28T03:50:17Z

that's so amazing @pseudotensor !!

pseudotensor · 2024-06-28T04:10:53Z

Note that if your auth file is `.json`, just pass the `.db` filename on the CLI instead, and it will be migrated once to the `.db` format that is required for this control.

From h2ogpt/src/db_utils.py, lines 80 to 101 in 3498b03:

```python
# Connect to an SQLite database (change the database path as necessary)
if auth_filename.endswith('.json'):
    json_filename = auth_filename
    db_filename = auth_filename[:-4] + '.db'
else:
    assert auth_filename.endswith('.db')
    db_filename = auth_filename
    json_filename = auth_filename[:-3] + '.json'
if os.path.isfile(db_filename) and os.path.getsize(db_filename) == 0:
    os.remove(db_filename)
if os.path.isfile(json_filename) and os.path.getsize(json_filename) == 0:
    os.remove(json_filename)
if os.path.isfile(json_filename) and not os.path.isfile(db_filename):
    # then make, one-time migration
    with open(json_filename, 'rt') as f:
        auth_dict = json.load(f)
    create_table(db_filename)
    upsert_auth_dict(db_filename, auth_dict, verbose=verbose)
    # Slow way:
    # [upsert_user(db_filename, username1, auth_dict[username1]) for username1 in auth_dict]
```
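For comparison, a minimal standalone sketch of the same one-time JSON to SQLite migration, using only the standard library. It derives both filenames with `os.path.splitext`, which treats either extension uniformly. Note that in the excerpt, `'auth.json'[:-4]` is `'auth.'`, so a `.json` argument maps to `auth..db`, while passing the `.db` name, as suggested above, gives the expected `auth.json`/`auth.db` pair. The `auth_users` table and both helper names are hypothetical; this is not the schema that h2oGPT's `create_table()` builds.

```python
import json
import os
import sqlite3


def paired_filenames(auth_filename: str) -> tuple[str, str]:
    """Return (json_filename, db_filename) for an auth file ending in .json or .db."""
    stem, ext = os.path.splitext(auth_filename)
    if ext not in (".json", ".db"):
        raise ValueError(f"expected a .json or .db auth file, got {auth_filename!r}")
    return stem + ".json", stem + ".db"


def migrate_once(auth_filename: str) -> str:
    """Copy users from the .json file into a new .db file once; return the db name."""
    json_filename, db_filename = paired_filenames(auth_filename)
    # Empty files are treated as absent, as in the excerpt.
    for name in (json_filename, db_filename):
        if os.path.isfile(name) and os.path.getsize(name) == 0:
            os.remove(name)
    if os.path.isfile(json_filename) and not os.path.isfile(db_filename):
        with open(json_filename, "rt") as f:
            auth_dict = json.load(f)
        conn = sqlite3.connect(db_filename)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "CREATE TABLE IF NOT EXISTS auth_users "
                    "(username TEXT PRIMARY KEY, data TEXT NOT NULL)")
                conn.executemany(
                    "INSERT OR REPLACE INTO auth_users VALUES (?, ?)",
                    [(user, json.dumps(data)) for user, data in auth_dict.items()])
        finally:
            conn.close()
    return db_filename
```

Because the migration only runs when the `.db` file is missing, later edits to the `.json` file are ignored once the `.db` exists.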

pseudotensor closed this as completed in 3498b03 Jun 28, 2024
