
speed up embeddings #11

Open
edoyango opened this issue Aug 26, 2024 · 3 comments

@edoyango
Collaborator

The embedding process is much slower than it could be. Breadcrumbs:

  • GPU compute utilisation is around 30%.
    • Other RAG apps like AnythingLLM max out the GPU when embedding.
  • Embedding is done through the LangChain Chroma API, which might not be implemented efficiently.
@edoyango
Collaborator Author

edoyango commented Sep 3, 2024

@edoyango
Collaborator Author

edoyango commented Sep 5, 2024

Can confirm that the embeddings are the bottleneck, maybe not the db (yet). Looks like the most time is spent:

  • embedding documents
    • Can be partially alleviated by using the Ollama Python API directly.
    • ...
  • processing/tokenizing documents
    • Not sure how to speed this up directly, but it can be parallelized.
  • it also looks like files that have already been parsed are being parsed again.
    • Perhaps some hashing and checking against the database is required.
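
The hashing idea in the last bullet could look something like the sketch below. It hashes file contents and skips anything already seen; `seen_hashes` is a hypothetical stand-in for hashes persisted in the database, and all names here are illustrative, not from the codebase:

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file's raw bytes; stable across runs."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def files_to_parse(paths, seen_hashes):
    """Return only files whose content hash is not already known.

    `seen_hashes` stands in for digests stored alongside the vector db.
    """
    new = []
    for p in paths:
        d = file_digest(Path(p))
        if d not in seen_hashes:
            seen_hashes.add(d)
            new.append(p)
    return new
```

Hashing contents rather than paths also catches the case where the same document appears twice under different filenames.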

See the cProfile stats below, viewed with pstats.

Thu Sep  5 12:18:57 2024    popdb_stats2

         1675229476 function calls (1649723988 primitive calls) in 3366.444 seconds

   Ordered by: cumulative time
   List reduced from 10062 to 30 due to restriction <30>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       15    0.044    0.003 1625.424  108.362 /usr/local/lib/python3.12/site-packages/langchain_community/embeddings/ollama.py:199(embed_documents)
       15    1.672    0.111 1625.380  108.359 /usr/local/lib/python3.12/site-packages/langchain_community/embeddings/ollama.py:183(_embed)
    70634    0.604    0.000 1614.783    0.023 /usr/local/lib/python3.12/site-packages/langchain_community/embeddings/ollama.py:147(_process_emb_response)
    70634    0.731    0.000 1558.543    0.022 /usr/local/lib/python3.12/site-packages/requests/api.py:103(post)
    70635    0.682    0.000 1558.191    0.022 /usr/local/lib/python3.12/site-packages/requests/api.py:14(request)
70637/70599    0.947    0.000 1548.910    0.022 /usr/local/lib/python3.12/site-packages/requests/sessions.py:500(request)
70638/70599    1.755    0.000 1475.243    0.021 /usr/local/lib/python3.12/site-packages/requests/sessions.py:673(send)
70638/70600    1.165    0.000 1455.706    0.021 /usr/local/lib/python3.12/site-packages/requests/adapters.py:613(send)
70638/70600    1.201    0.000 1423.626    0.020 /usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py:594(urlopen)
70638/70600    0.955    0.000 1411.269    0.020 /usr/local/lib/python3.12/site-packages/urllib3/connectionpool.py:379(_make_request)
        1    0.011    0.011 1339.256 1339.256 populate_database.py:38(load_documents)
70638/70613    1.586    0.000 1329.856    0.019 /usr/local/lib/python3.12/site-packages/urllib3/connection.py:438(getresponse)
70638/70613    0.449    0.000 1320.127    0.019 /usr/local/lib/python3.12/http/client.py:1384(getresponse)
70638/70613    0.990    0.000 1291.867    0.018 /usr/local/lib/python3.12/http/client.py:324(begin)
218021/217947    2.289    0.000 1278.355    0.006 /usr/local/lib/python3.12/socket.py:694(readinto)
   218012 1275.932    0.006 1275.932    0.006 {method 'recv_into' of '_socket.socket' objects}
70638/70613    0.977    0.000 1275.525    0.018 /usr/local/lib/python3.12/http/client.py:291(_read_status)
565121/564917    0.895    0.000 1274.644    0.002 {method 'readline' of '_io.BufferedReader' objects}
      159    0.011    0.000 1195.850    7.521 /usr/local/lib/python3.12/site-packages/langchain_core/document_loaders/base.py:28(load)
      292    0.008    0.000 1136.305    3.891 /usr/local/lib/python3.12/site-packages/langchain_community/document_loaders/unstructured.py:105(lazy_load)
      146    0.002    0.000 1136.277    7.783 /usr/local/lib/python3.12/site-packages/langchain_community/document_loaders/html.py:30(_get_elements)
      146    0.002    0.000 1133.111    7.761 /usr/local/lib/python3.12/site-packages/unstructured/documents/elements.py:603(wrapper)
      146    0.011    0.000 1132.932    7.760 /usr/local/lib/python3.12/site-packages/unstructured/file_utils/filetype.py:704(wrapper)
      146    0.015    0.000 1132.677    7.758 /usr/local/lib/python3.12/site-packages/unstructured/file_utils/filetype.py:660(wrapper)
      146    0.004    0.000 1132.272    7.755 /usr/local/lib/python3.12/site-packages/unstructured/chunking/dispatch.py:69(wrapper)
      146    0.004    0.000 1132.223    7.755 /usr/local/lib/python3.12/site-packages/unstructured/partition/html/partition.py:23(partition_html)
     5679    0.010    0.000 1132.219    0.199 /usr/local/lib/python3.12/site-packages/unstructured/partition/lang.py:449(apply_lang_metadata)
     5679    0.006    0.000 1126.665    0.198 /usr/local/lib/python3.12/site-packages/unstructured/partition/html/partition.py:219(iter_elements)
     5679    0.032    0.000 1126.660    0.198 /usr/local/lib/python3.12/site-packages/unstructured/partition/html/partition.py:224(_iter_elements)
21893/5679    0.156    0.000 1122.432    0.198 /usr/local/lib/python3.12/site-packages/unstructured/partition/html/parser.py:350(iter_elements)
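
The stats above show ~70,600 separate HTTP POSTs to the embedding endpoint, one per chunk, with most of the wall time spent in socket reads. One way to cut the request count is to batch chunks per call; a minimal sketch, where `embed_batch` is a placeholder for whatever client call ends up being used and the batch size is an assumption:

```python
def embed_in_batches(texts, embed_batch, batch_size=64):
    """Embed `texts` with one call per batch instead of one per text.

    `embed_batch` is any callable taking a list of strings and returning
    a list of embedding vectors (e.g. a wrapper around an HTTP client).
    """
    vectors = []
    for i in range(0, len(texts), batch_size):
        vectors.extend(embed_batch(texts[i : i + batch_size]))
    return vectors
```

At batch_size=64, the 70,634 requests in the trace above would drop to roughly 1,100, amortising the per-request HTTP overhead.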

@edoyango
Collaborator Author

edoyango commented Sep 9, 2024

Dunno how to enable parallel processing of embeddings using a single Ollama server. Could further investigate using Hugging Face embedding generation instead.

I've investigated (in combination with multiprocessing):

  • OLLAMA_NUM_PARALLEL doesn't change anything.
  • OLLAMA_MAX_LOADED_MODELS doesn't change anything.
    I surmise that models aren't loaded more than once, so I can't have another copy of the model producing embeddings in parallel.
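
For reference, if the server did accept concurrent requests, the client side could overlap them with threads rather than processes, since each call is I/O-bound. A sketch only: `embed_one` is a hypothetical stand-in for a single-request client call, and whether this helps at all depends on OLLAMA_NUM_PARALLEL actually taking effect server-side:

```python
from concurrent.futures import ThreadPoolExecutor


def embed_concurrently(texts, embed_one, workers=4):
    """Issue embedding requests in parallel threads, preserving order.

    Threads suit this case because each call spends its time waiting on
    an HTTP round trip, during which the GIL is released.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(embed_one, texts))
```

If the server serialises requests anyway (as the observations above suggest), this collapses back to sequential throughput.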
