Code and data for performing GraphRAG with an example SEC EDGAR knowledge graph. The knowledge graph contains information about companies, their financial reports, and the relationships between them.
Two forms from the EDGAR database are used to construct a knowledge graph that mixes structured and unstructured data:
- Form 10-K: the annual report that publicly traded companies must file with the SEC. It provides a comprehensive summary of a company's financial performance.
- Form 13 (Form 13F): the quarterly report filed by institutional investment managers who manage $100 million or more in assets.
Form 13 is used as structured data about the investments made by institutional managers. The form contains information about the companies in which the manager has invested, the number of shares owned, and the value of the investment.
(:Manager)-[:OWNS_STOCK_IN]->(:Company)
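As a rough illustration, the ownership pattern above could be queried as follows. The property names (`name` on `Manager` and `Company`, `shares` and `value` on the relationship) and the `$companyName` parameter are assumptions made for this sketch; check the construction script for the names actually used.

```cypher
// Sketch only: property names and $companyName are assumed, not taken from the script.
// Lists the ten largest holdings in a given company, by investment value.
MATCH (m:Manager)-[o:OWNS_STOCK_IN]->(c:Company)
WHERE c.name = $companyName
RETURN m.name AS manager, o.shares AS shares, o.value AS value
ORDER BY value DESC
LIMIT 10
```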
Form 10-K is used as a source of unstructured data about the company's financial performance. The form contains sections such as "Risk Factors", "Management's Discussion and Analysis of Financial Condition and Results of Operations", and "Financial Statements and Supplementary Data".
(:Company)-[:FILED]->(:Form)
The form is divided into sections, and each section is split into chunks. Each chunk holds a portion of the form's text plus a vector embedding of that text, which enables vector similarity search.
(:Form)-[:SECTION]->(:Chunk) // first chunk of a section
(:Chunk)-[:PART_OF]->(:Form) // each chunk connects back up to the form
(:Chunk)-[:NEXT]->(:Chunk) // the chunks are connected in a linked list
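The patterns above support a typical GraphRAG retrieval: find the chunks most similar to a question, then walk the graph to collect surrounding context. The query below is a minimal sketch, not the repository's own retrieval query. It assumes a Neo4j 5 release with vector index support, a vector index named `chunk_text_embeddings` over the chunk embeddings, a `text` property on `Chunk`, a `name` property on `Company`, and a `$questionEmbedding` parameter holding the embedded question; all of these names are hypothetical.

```cypher
// Sketch only: the index name, property names, and $questionEmbedding are assumptions.
// 1. Vector search: the 5 chunks closest to the question embedding.
CALL db.index.vector.queryNodes('chunk_text_embeddings', 5, $questionEmbedding)
YIELD node AS chunk, score
// 2. Follow PART_OF up to the form, then FILED back to the company.
MATCH (chunk)-[:PART_OF]->(f:Form)<-[:FILED]-(c:Company)
// 3. Use the NEXT linked list to pull in the neighboring chunks, if any.
OPTIONAL MATCH (prev:Chunk)-[:NEXT]->(chunk)
OPTIONAL MATCH (chunk)-[:NEXT]->(next:Chunk)
RETURN c.name AS company,
       score,
       coalesce(prev.text, '') + ' ' + chunk.text + ' ' + coalesce(next.text, '') AS context
ORDER BY score DESC
```

Including the previous and next chunk is one way to give a language model more context than a single chunk provides; a wider window could be collected with a variable-length `NEXT` pattern instead.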
kg-construction.cypher is a multi-statement Cypher script that constructs the knowledge graph. To create the knowledge graph, run the script in the Neo4j Browser.
- Start a Neo4j database
- Set the OpenAI API Key at the top of the script
- Run the script
Note: If you are using Neo4j Aura, the query interface does not support client-side commands in multi-statement scripts. First run the :params statement by itself to set the query parameters, then run the rest of the script.
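After the script finishes, a quick sanity check is to count the nodes per label. This sketch uses only the labels described above and makes no assumptions about property names; the counts depend on the data loaded.

```cypher
// Count nodes per label; expect rows for Manager, Company, Form, and Chunk.
MATCH (n)
UNWIND labels(n) AS label
RETURN label, count(*) AS nodes
ORDER BY label
```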
To understand all the details about how the knowledge graph is constructed, follow the step-by-step guide in the kg-construction directory.