
speed up embeddings #11

Open
edoyango opened this issue Aug 26, 2024 · 3 comments

@edoyango
Collaborator

The embedding process is much slower than it could be. Breadcrumbs:

  • GPU compute utilisation is around 30%.
    • Other RAG apps like AnythingLLM max out the GPU when embedding.
  • Embedding is done through the LangChain Chroma API, which might not be implemented efficiently.
@edoyango
Collaborator Author

edoyango commented Sep 3, 2024

@edoyango
Collaborator Author

edoyango commented Sep 5, 2024

Can confirm that the embeddings are the bottleneck, maybe not the db (yet). Looks like the most time is spent:

  • embedding documents
    • Can be partially alleviated by using the Ollama Python API directly.
    • ...
  • processing/tokenizing documents
    • Not sure how to speed this up directly, but it can be parallelized.
  • it also looks like files that have already been parsed are being parsed again.
    • Perhaps some hashing and checking against the database is required.
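
The hashing idea in the last bullet could look something like the sketch below. It hashes file contents and skips anything already seen; `seen_hashes` is a hypothetical stand-in for hashes persisted in the database, and all names here are illustrative, not from the codebase:

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file's raw bytes; stable across runs."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_parse(paths, seen_hashes):
    """Return only files whose content hash is not already known.

    `seen_hashes` stands in for digests stored alongside the vector db.
    """
    new = []
    for p in paths:
        d = file_digest(Path(p))
        if d not in seen_hashes:
            seen_hashes.add(d)
            new.append(p)
    return new
```

Hashing contents rather than paths also catches the case where the same document appears twice under different filenames.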

See the cProfile stats below, viewed with pstats.

Thu Sep  5 12:18:57 2024    popdb_stats2

         1675229476 function calls (1649723988 primitive calls) in 3366.444 seconds

   Ordered by: cumulative time
   List reduced from 10062 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       15    0.044    0.003 1625.424  108.362 /usr/local/lib/python3.12/site-packages/langchain_community/embeddings/ollama.py:199(embed_documents)
       15    1.672    0.111 1625.380  108.359 /usr/local/lib/python3.12/site-packages/langchain_community/embeddings/ollama.py:183(_embed)
    70634    0.604    0.000 1614.783    0.023 /usr/local/lib/python3.12/site-packages/langchain_community/embeddings/ollama.py:147(_process_emb_response)
    70634    0.731    0.000 1558.543    0.022 /usr/local/lib/python3.12/site-packages/requests/api.py:103(post)
    70635    0.682    0.000 1558.191    0.022 /usr/local/lib/python3.12/site-packages/requests/api.py:14(request)
70637/70599    0.947    0.000 1548.910    0.022 /usr/local/lib/python3.12/site-packages/requests/sessions.py:500(request)
70638/70599    1.755    0.000 1475.243    0.021 /usr/local/lib/python3.12/site-packages/requests/sessions.py:673(send)
70638/70600    1.165    0.000 1455.706    0.021 /usr/local/lib/python3.12/site-packages/requests/adapters.py:613(send)
70638/70600    1.201    0.000 1423.626    0.020 /usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py:594(urlopen)
70638/70600    0.955    0.000 1411.269    0.020 /usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py:379(_make_request)
        1    0.011    0.011 1339.256 1339.256 populate_database.py:38(load_documents)
70638/70613    1.586    0.000 1329.856    0.019 /usr/local/lib/python3.12/site-packages/urllib3/connection.py:438(getresponse)
70638/70613    0.449    0.000 1320.127    0.019 /usr/local/lib/python3.12/http/client.py:1384(getresponse)
70638/70613    0.990    0.000 1291.867    0.018 /usr/local/lib/python3.12/http/client.py:324(begin)
218021/217947    2.289    0.000 1278.355    0.006 /usr/local/lib/python3.12/socket.py:694(readinto)
   218012 1275.932    0.006 1275.932    0.006 {method 'recv_into' of '_socket.socket' objects}
70638/70613    0.977    0.000 1275.525    0.018 /usr/local/lib/python3.12/http/client.py:291(_read_status)
565121/564917    0.895    0.000 1274.644    0.002 {method 'readline' of '_io.BufferedReader' objects}
      159    0.011    0.000 1195.850    7.521 /usr/local/lib/python3.12/site-packages/langchain_core/document_loaders/base.py:28(load)
      292    0.008    0.000 1136.305    3.891 /usr/local/lib/python3.12/site-packages/langchain_community/document_loaders/unstructured.py:105(lazy_load)
      146    0.002    0.000 1136.277    7.783 /usr/local/lib/python3.12/site-packages/langchain_community/document_loaders/html.py:30(_get_elements)
      146    0.002    0.000 1133.111    7.761 /usr/local/lib/python3.12/site-packages/unstructured/documents/elements.py:603(wrapper)
      146    0.011    0.000 1132.932    7.760 /usr/local/lib/python3.12/site-packages/unstructured/file_utils/filetype.py:704(wrapper)
      146    0.015    0.000 1132.677    7.758 /usr/local/lib/python3.12/site-packages/unstructured/file_utils/filetype.py:660(wrapper)
      146    0.004    0.000 1132.272    7.755 /usr/local/lib/python3.12/site-packages/unstructured/chunking/dispatch.py:69(wrapper)
      146    0.004    0.000 1132.223    7.755 /usr/local/lib/python3.12/site-packages/unstructured/partition/html/partition.py:23(partition_html)
     5679    0.010    0.000 1132.219    0.199 /usr/local/lib/python3.12/site-packages/unstructured/partition/lang.py:449(apply_lang_metadata)
     5679    0.006    0.000 1126.665    0.198 /usr/local/lib/python3.12/site-packages/unstructured/partition/html/partition.py:219(iter_elements)
     5679    0.032    0.000 1126.660    0.198 /usr/local/lib/python3.12/site-packages/unstructured/partition/html/partition.py:224(_iter_elements)
21893/5679    0.156    0.000 1122.432    0.198 /usr/local/lib/python3.12/site-packages/unstructured/partition/html/parser.py:350(iter_elements)
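
The stats above show ~70,600 separate HTTP POSTs to the embedding endpoint, one per chunk, with most of the wall time spent in socket reads. One way to cut the request count is to batch chunks per call; a minimal sketch, where `embed_batch` is a placeholder for whatever client call ends up being used and the batch size is an assumption:

```python
def embed_in_batches(texts, embed_batch, batch_size=64):
    """Embed `texts` with one call per batch instead of one per text.

    `embed_batch` is any callable taking a list of strings and returning
    a list of embedding vectors (e.g. a wrapper around an HTTP client).
    """
    vectors = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(embed_batch(texts[i : i + batch_size]))
    return vectors
```

At batch_size=64, the 70,634 requests in the trace above would drop to roughly 1,100, amortising the per-request HTTP overhead.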

@edoyango
Collaborator Author

edoyango commented Sep 9, 2024

Dunno how to enable parallel processing of embeddings using a single Ollama server. Could further investigate using Hugging Face embedding generation instead.

I've investigated (in combination with multiprocessing):

  • OLLAMA_NUM_PARALLEL doesn't change anything.
  • OLLAMA_MAX_LOADED_MODELS doesn't change anything.
    I surmise that models aren't loaded more than once, so I can't have another copy of the model producing embeddings in parallel.
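
For reference, if the server did accept concurrent requests, the client side could overlap them with threads rather than processes, since each call is I/O-bound. A sketch only: `embed_one` is a hypothetical stand-in for a single-request client call, and whether this helps at all depends on OLLAMA_NUM_PARALLEL actually taking effect server-side:

```python
from concurrent.futures import ThreadPoolExecutor


def embed_concurrently(texts, embed_one, workers=4):
    """Issue embedding requests in parallel threads, preserving order.

    Threads suit this case because each call spends its time waiting on
    an HTTP round trip, during which the GIL is released.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(embed_one, texts))
```

If the server serialises requests anyway (as the observations above suggest), this collapses back to sequential throughput.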
