After reading the source code of DocumentSummaryIndex to better understand how it works under the hood, I found that summaries are created for each chunk (see ../../response_synthesizers/tree_summarize.py), and then recursion is applied to produce a single summary per Document.
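Roughly, the pattern looks like this (my own sketch for illustration, not the library's actual code; the group size is just an assumption, and `summarize` stands in for a single LLM call):

# Sketch of the chunk-then-merge pattern described above.
# GROUP_SIZE is an illustrative assumption, not a real parameter.
GROUP_SIZE = 10

def tree_summarize(chunks, summarize):
    """`summarize` wraps a single LLM call and returns a string."""
    summaries = [summarize(c) for c in chunks]  # one call per chunk
    while len(summaries) > 1:
        # merge groups of summaries; a few extra calls per round
        summaries = [
            summarize("\n".join(summaries[i : i + GROUP_SIZE]))
            for i in range(0, len(summaries), GROUP_SIZE)
        ]
    return summaries[0]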
By default, when ingesting Documents via SimpleDirectoryReader, a Document list is created where each document is 1024 tokens by default. So assume one applies sentence splitting where each sentence is about 50 tokens; that would lead to 20 chunks. When summarizing, an LLM is called 20 times, and then the chunk summaries are combined recursively, for at best around log(20) additional calls. So altogether, perhaps 30 calls to the LLM per Document. That is very expensive. If a single summary is required for the collection of nodes, why not simply do either: 1) concatenate the chunk contents into a single string and feed it to the LLM in a single call, or 2) summarize the original document before chunking and associate that summary with each node produced (a sketch of option 2 follows below)? That would be much faster and cost much less. If DocumentSummaryIndex is the useful tool many bloggers say it is, why is it so inefficient? Perhaps I completely misunderstand how it works.
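To make option 2 concrete, here is a minimal sketch of what I have in mind. Import paths assume a recent llama-index (>= 0.10) with the OpenAI integration installed; the model, data directory, and prompt are placeholders, and it assumes each document fits in the model's context window:

from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")  # placeholder model
documents = SimpleDirectoryReader("./data").load_data()
splitter = SentenceSplitter(chunk_size=1024)

nodes = []
for doc in documents:
    # One LLM call per Document instead of ~30.
    summary = llm.complete(
        f"Summarize the following text:\n\n{doc.text}"
    ).text
    doc_nodes = splitter.get_nodes_from_documents([doc])
    for node in doc_nodes:
        node.metadata["summary"] = summary  # attach the doc-level summary
    nodes.extend(doc_nodes)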
I just noticed a code segment I missed:
# repack text_chunks so that each chunk fills the context window
text_chunks = self._prompt_helper.repack(
    summary_template, text_chunks=text_chunks
)
This code implies that the context window of the LLM used for summarization should be long enough; otherwise the number of LLM calls becomes too high. That means the context length used on the indexing (summarization) side should probably be fairly large, in contrast to the context length used for text generation in RAG, which can often be much smaller, thus saving computational time and cost.
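If that reading is right, one mitigation is to build the index with a large-context model on the summarization side. A minimal sketch, following the DocumentSummaryIndex construction pattern from the docs (the model name is illustrative, and `documents` is loaded as above):

from llama_index.core import DocumentSummaryIndex, get_response_synthesizer
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI

summary_llm = OpenAI(model="gpt-4o-mini")  # illustrative large-context model
splitter = SentenceSplitter(chunk_size=1024)
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize", use_async=True
)
index = DocumentSummaryIndex.from_documents(
    documents,
    llm=summary_llm,
    transformations=[splitter],
    response_synthesizer=response_synthesizer,
    show_progress=True,
)

The larger the summarization model's context window, the more chunk summaries repack() can fit into each call, so the recursion bottoms out in fewer rounds.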
I have not found where these issues are discussed in the documentation.
Question: Is there a document somewhere that explains the low-level architecture of LlamaIndex? This framework has grown so large that this would be very beneficial to the LlamaIndex community. Overall the documentation is great, though.
Thanks for any feedback.