After reading the source code of DocumentSummaryIndex to better understand how it works under the hood, I found that summaries are created for each chunk (see ../../response_synthesizers/tree_summarize.py), and then recursion is applied to produce a single summary per Document.
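Roughly, the pattern looks like this (my own sketch for illustration, not the library's actual code; the group size is just an assumption, and `summarize` stands in for a single LLM call):

# Sketch of the chunk-then-merge pattern described above.
# GROUP_SIZE is an illustrative assumption, not a real parameter.
GROUP_SIZE = 10

def tree_summarize(chunks, summarize):
    """`summarize` wraps a single LLM call and returns a string."""
    summaries = [summarize(c) for c in chunks]  # one call per chunk
    while len(summaries) > 1:
        # merge groups of summaries; a few extra calls per round
        summaries = [
            summarize("\n".join(summaries[i : i + GROUP_SIZE]))
            for i in range(0, len(summaries), GROUP_SIZE)
        ]
    return summaries[0]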
By default, when ingesting Documents via SimpleDirectoryReader, a Document list is created where each document is 1024 tokens by default. So assume one applies sentence splitting where each sentence is about 50 tokens; that would lead to 20 chunks. When summarizing, an LLM is called 20 times, and then the chunk summaries are combined recursively, for at best around log(20) additional calls. So altogether, perhaps 30 calls to the LLM per Document. That is very expensive. If a single summary is required for the collection of nodes, why not simply do either: 1) concatenate the chunk contents into a single string and feed it to the LLM in a single call, or 2) summarize the original document before chunking and associate that summary with each node produced (a sketch of option 2 follows below)? That would be much faster and cost much less. If DocumentSummaryIndex is the useful tool many bloggers say it is, why is it so inefficient? Perhaps I completely misunderstand how it works.
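To make option 2 concrete, here is a minimal sketch of what I have in mind. Import paths assume a recent llama-index (>= 0.10) with the OpenAI integration installed; the model, data directory, and prompt are placeholders, and it assumes each document fits in the model's context window:

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")  # placeholder model
documents = SimpleDirectoryReader("./data").load_data()
splitter = SentenceSplitter(chunk_size=1024)

nodes = []
for doc in documents:
    # One LLM call per Document instead of ~30.
    summary = llm.complete(
        f"Summarize the following text:\n\n{doc.text}"
    ).text
    doc_nodes = splitter.get_nodes_from_documents([doc])
    for node in doc_nodes:
        node.metadata["summary"] = summary  # attach the doc-level summary
    nodes.extend(doc_nodes)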
I just noticed a code segment I missed:
# repack text_chunks so that each chunk fills the context window
text_chunks = self._prompt_helper.repack(
    summary_template, text_chunks=text_chunks
)
This code implies that the context window of the LLM used for summarization should be long enough; otherwise the number of LLM calls becomes too high. That means the context length used on the indexing (summarization) side should probably be fairly large, in contrast to the context length used for text generation in RAG, which can often be much smaller, thus saving computational time and cost.
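If that reading is right, one mitigation is to build the index with a large-context model on the summarization side. A minimal sketch, following the DocumentSummaryIndex construction pattern from the docs (the model name is illustrative, and `documents` is loaded as above):

from llama_index.core import DocumentSummaryIndex, get_response_synthesizer
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

summary_llm = OpenAI(model="gpt-4o-mini")  # illustrative large-context model
splitter = SentenceSplitter(chunk_size=1024)
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize", use_async=True
)
index = DocumentSummaryIndex.from_documents(
    documents,
    llm=summary_llm,
    transformations=[splitter],
    response_synthesizer=response_synthesizer,
    show_progress=True,
)

The larger the summarization model's context window, the more chunk summaries repack() can fit into each call, so the recursion bottoms out in fewer rounds.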
I have not found where these issues are discussed in the documentation.
Question: Is there a document somewhere that explains the low-level architecture of LlamaIndex? This framework has grown so large that this would be very beneficial to the LlamaIndex community. Overall the documentation is great, though.
Thanks for any feedback.