Knowledge Graph Constructor with Schema.org

An automatic knowledge graph constructor that uses LLM-based entity extraction and Schema.org type alignment.

Overview

This project provides a pipeline for automatically building knowledge graphs from unstructured text. It uses Large Language Models (LLMs) to extract entities and relationships, then aligns them with Schema.org types and properties to create standards-compliant knowledge graphs.

Key Features

LLM-based Entity Extraction: Automatically identifies entities (people, organizations, places, events, products) from text
Schema.org Alignment: Maps extracted entities to standard Schema.org types
Relationship Extraction: Discovers relationships between entities using Schema.org properties
Multiple Output Formats: Export as JSON-LD or RDF/Turtle
Queryable: Ask questions about the constructed knowledge graph

Architecture

Text Input
    ↓
[SchemaOrgEntityExtractor]
    ↓ (uses LLM)
Entities + Relations
    ↓
[KnowledgeGraphConstructor]
    ↓
Knowledge Graph
    ↓
JSON-LD / RDF Output
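The flow in the diagram can be sketched in a few lines. This is only an illustration under stated assumptions: `fake_extractor` is a hypothetical stand-in for the LLM call, and `build_graph` and the dictionary shapes are invented for this sketch, not taken from the project's code.

```python
from typing import Callable, Dict, List, Tuple

Entity = Dict[str, str]          # e.g. {"id": "e1", "type": "Person", "name": "Alice"}
Relation = Tuple[str, str, str]  # (subject id, Schema.org property, object id)
Extractor = Callable[[str], Tuple[List[Entity], List[Relation]]]


def fake_extractor(text: str) -> Tuple[List[Entity], List[Relation]]:
    """Hypothetical stand-in for the LLM step; returns canned results."""
    entities = [
        {"id": "e1", "type": "Person", "name": "Alice"},
        {"id": "e2", "type": "Organization", "name": "Google"},
    ]
    relations = [("e1", "worksFor", "e2")]
    return entities, relations


def build_graph(text: str, extract: Extractor) -> dict:
    """Text -> entities + relations -> JSON-LD-shaped dictionary."""
    entities, relations = extract(text)
    nodes = {
        e["id"]: {"@id": f"urn:kg:{e['id']}", "@type": e["type"], "name": e["name"]}
        for e in entities
    }
    # Attach each relation to its subject node as a reference to the object node.
    for subject, prop, obj in relations:
        nodes[subject][prop] = {"@id": f"urn:kg:{obj}"}
    return {"@context": "https://schema.org", "@graph": list(nodes.values())}


graph = build_graph("Alice works at Google.", fake_extractor)
```

Passing the extractor in as a function keeps the graph-building step testable without an API key.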

Components:

schema_loader.py: Loads and queries Schema.org type definitions
schemaorg_memory_entry.py: Pydantic models for entities, relations, and graph entries
schemaorg_entity_extractor.py: LLM-based entity and relation extraction
knowledge_graph_constructor.py: Main class for building and querying knowledge graphs
config.py: Configuration settings and constants
example_usage.py: Example script demonstrating usage
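To give a feel for the kind of models schemaorg_memory_entry.py might hold, here is a rough sketch. The class and field names are assumptions, and it uses standard-library dataclasses in place of Pydantic so that it runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Entity:
    id: str
    schema_type: str  # a Schema.org type name such as "Person"
    name: str


@dataclass(frozen=True)
class Relation:
    subject_id: str
    prop: str  # a Schema.org property name such as "worksFor"
    object_id: str


@dataclass
class GraphEntry:
    source_text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def dangling_relations(self) -> List[Relation]:
        """Relations whose subject or object is not among the entities."""
        known = {e.id for e in self.entities}
        return [
            r for r in self.relations
            if r.subject_id not in known or r.object_id not in known
        ]


entry = GraphEntry(
    source_text="Alice works at Google.",
    entities=[Entity("e1", "Person", "Alice")],
    relations=[Relation("e1", "worksFor", "e2")],
)
```

A check like `dangling_relations` is one way to catch relations that refer to entities the extractor never returned.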

Installation

Prerequisites

Python 3.8 or higher
OpenAI API key (or compatible LLM endpoint)

Install Dependencies

pip install -r requirements.txt

Dependencies:

rdflib>=6.0.0 - RDF graph manipulation
pydantic>=2.0.0 - Data validation and models
openai>=1.0.0 - LLM API client
numpy>=1.20.0 - Numerical operations

Environment Setup

Set your OpenAI API key:

export OPENAI_API_KEY='your-api-key-here'
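Once the variable is exported, a script can read it instead of hardcoding the key. A minimal sketch, assuming a helper named `load_api_key` (hypothetical, not part of this project):

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY, or raise a clear error if it is unset or empty."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    return key
```

The optional `env` argument exists only so the function can be exercised with a plain dictionary in tests.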

Usage

Basic Example

import os

from knowledge_graph_constructor import KnowledgeGraphConstructor

# Initialize the constructor with the key exported in Environment Setup
kg = KnowledgeGraphConstructor(
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4"
)

# Add text to extract entities and relations
kg.add_text(
    "Alice works at Google as a software engineer. She met Bob at "
    "TechConference 2025 in San Francisco."
)

# Finalize the knowledge graph
kg.finalize()

# Get statistics
stats = kg.get_stats()
print(f"Entities: {stats['entities']}, Relations: {stats['relations']}")

# Query the knowledge graph
answer = kg.query("Where does Alice work?")
print(answer)

# Export as JSON-LD
jsonld = kg.to_jsonld()

# Export as RDF/Turtle
rdf = kg.to_rdf()

Run the Example Script

python example_usage.py

Expected Output

The example script will:

Extract entities and relations from sample text
Print statistics about the knowledge graph
Answer a query about the extracted information
Export the knowledge graph in JSON-LD and RDF/Turtle formats
Save outputs to output.jsonld and output.ttl

Sample JSON-LD Output:

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@id": "urn:kg:entity-id-1",
      "@type": "Person",
      "name": "Alice",
      "jobTitle": "software engineer",
      "worksFor": {
        "@id": "urn:kg:entity-id-2"
      }
    },
    {
      "@id": "urn:kg:entity-id-2",
      "@type": "Organization",
      "name": "Google"
    },
    {
      "@id": "urn:kg:entity-id-3",
      "@type": "Person",
      "name": "Bob",
      "jobTitle": "CEO"
    }
  ]
}
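Output shaped like the sample above can be read back with the standard library alone. The sketch below follows an `@id` reference from one node to another; the `resolve` helper is hypothetical and only handles the flat `@graph` layout shown in the sample.

```python
import json

sample = """
{
  "@context": "https://schema.org",
  "@graph": [
    {"@id": "urn:kg:entity-id-1", "@type": "Person", "name": "Alice",
     "worksFor": {"@id": "urn:kg:entity-id-2"}},
    {"@id": "urn:kg:entity-id-2", "@type": "Organization", "name": "Google"}
  ]
}
"""


def resolve(doc: dict, subject_name: str, prop: str):
    """Return the value of `prop` on the node named `subject_name`.

    If the value is an {"@id": ...} reference to another node in the graph,
    return that node's name; otherwise return the value itself.
    """
    by_id = {node["@id"]: node for node in doc["@graph"]}
    for node in doc["@graph"]:
        if node.get("name") == subject_name and prop in node:
            value = node[prop]
            target = by_id.get(value.get("@id")) if isinstance(value, dict) else None
            return target["name"] if target else value
    return None


doc = json.loads(sample)
employer = resolve(doc, "Alice", "worksFor")
```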

Sample RDF/Turtle Output:

@prefix schema: <https://schema.org/> .
@prefix kg: <urn:kg:> .

kg:entity-id-1 a schema:Person ;
    schema:name "Alice" .

kg:entity-id-1 schema:worksFor kg:entity-id-2 .

kg:entity-id-2 a schema:Organization ;
    schema:name "Google" .

Supported Entity Types

The system supports the following Schema.org types:

Person: Individuals
Organization: Companies, institutions, corporations
Place: Locations, cities, countries
Event: Conferences, meetings, gatherings
Product: Products, software applications
CreativeWork: Articles, books, movies
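LLM output does not always use these names verbatim, so some normalization step is likely needed before alignment. The sketch below is one possible approach, not the project's actual logic: `normalize_type` is a hypothetical helper, and falling back to `Thing` (the root Schema.org type) is an assumption.

```python
SUPPORTED_TYPES = (
    "Person", "Organization", "Place", "Event", "Product", "CreativeWork",
)
_BY_LOWER = {t.lower(): t for t in SUPPORTED_TYPES}


def normalize_type(label: str) -> str:
    """Map a free-form label to a supported type, else fall back to 'Thing'."""
    cleaned = label.strip()
    # Accept prefixed forms such as "schema:Person" or "https://schema.org/Person".
    for sep in ("/", ":"):
        cleaned = cleaned.rsplit(sep, 1)[-1]
    return _BY_LOWER.get(cleaned.lower().replace(" ", ""), "Thing")
```

For example, `normalize_type("schema:person")` yields `"Person"`, while an unsupported label such as `"Vehicle"` yields `"Thing"`.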

Supported Relationships

Common Schema.org properties used for relationships:

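worksFor: Person is employed by Organization
location: Entity is located at Place
knows: Person knows Person
attendee: Person attended Event (Schema.org defines this property on the Event, pointing to the Person)
manufacturer: Organization produces Product (defined on the Product, pointing to the Organization)
founder: Person founded Organization (defined on the Organization, pointing to the Person)
memberOf: Person or Organization is a member of Organization
author: Person authored CreativeWork (defined on the CreativeWork, pointing to the Person)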
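Extracted relations can be sanity-checked against the types of their endpoints. The table below is a simplified sketch: subject and object types follow the direction in which Schema.org defines each property (for instance, `attendee` is a property of an Event), the allowed types are trimmed to the ones this README lists, and how the project itself orients or validates relations is not documented here.

```python
# property -> (allowed subject types, allowed object types), simplified.
PROPERTY_RULES = {
    "worksFor": ({"Person"}, {"Organization"}),
    "knows": ({"Person"}, {"Person"}),
    "memberOf": ({"Person", "Organization"}, {"Organization"}),
    "attendee": ({"Event"}, {"Person", "Organization"}),
    "founder": ({"Organization"}, {"Person"}),
    "author": ({"CreativeWork"}, {"Person", "Organization"}),
    "manufacturer": ({"Product"}, {"Organization"}),
}


def is_valid(subject_type: str, prop: str, object_type: str) -> bool:
    """Check a (subject type, property, object type) triple against the table."""
    rule = PROPERTY_RULES.get(prop)
    if rule is None:
        return False  # unknown property
    subjects, objects = rule
    return subject_type in subjects and object_type in objects
```

A relation that fails in one direction but passes when reversed, such as Person `attendee` Event, is a candidate for flipping rather than discarding.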

Advanced Usage

Custom LLM Configuration

# Use a different model
kg = KnowledgeGraphConstructor(
    api_key="your-api-key",
    model="gpt-3.5-turbo"
)

# Use a custom endpoint
kg = KnowledgeGraphConstructor(
    api_key="your-api-key",
    model="custom-model",
    base_url="https://your-custom-endpoint.com/v1"
)

Query the Knowledge Graph

# Ask questions about extracted information
answer = kg.query("Who is the CEO of StartupXYZ?")
answer = kg.query("Where is the conference located?")
answer = kg.query("What does Alice do?")

Multiple Text Inputs

kg.add_text("First piece of information...")
kg.add_text("Second piece of information...")
kg.add_text("Third piece of information...")
kg.finalize()
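When several texts mention the same entity, the graph needs some way to merge the duplicates. This README does not say how, or whether, finalize() does this; the sketch below shows one simple strategy (same type plus case-insensitive name), with all names hypothetical.

```python
from typing import Dict, List, Tuple


def merge_entities(batches: List[List[dict]]) -> Tuple[List[dict], Dict[str, str]]:
    """Merge entities sharing (type, normalized name).

    Returns the surviving entities and a map from every original id to the
    id of the entity that was kept, so relations can be rewritten.
    """
    canonical: Dict[Tuple[str, str], dict] = {}
    remap: Dict[str, str] = {}
    for batch in batches:
        for entity in batch:
            # Collapse whitespace and ignore case when comparing names.
            key = (entity["type"], " ".join(entity["name"].split()).casefold())
            kept = canonical.setdefault(key, entity)
            remap[entity["id"]] = kept["id"]
    return list(canonical.values()), remap


first = [{"id": "a1", "type": "Person", "name": "Alice"}]
second = [
    {"id": "b1", "type": "Person", "name": "alice "},
    {"id": "b2", "type": "Organization", "name": "Google"},
]
entities, remap = merge_entities([first, second])
```

Name matching this naive will wrongly merge two different people called Alice, so a real pipeline would likely need more context than the name alone.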

Project Structure

simplemem_schemaorg-kg-constructor/
├── schema_loader.py              # Schema.org type definitions
├── schemaorg_memory_entry.py     # Pydantic models
├── schemaorg_entity_extractor.py # LLM-based extraction
├── knowledge_graph_constructor.py # Main constructor class
├── config.py                      # Configuration settings
├── example_usage.py               # Example script
├── requirements.txt               # Python dependencies
├── README.md                      # This file
├── LICENSE                        # License information
└── .gitignore                     # Git ignore rules

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

See the LICENSE file for details.

References

Schema.org - Structured data vocabulary
JSON-LD - JSON-based linked data format
RDF - Resource Description Framework

Citation

If you use this project in your research, please cite:

@software{schemaorg_kg_constructor,
  title = {Automatic Knowledge Graph Constructor using LLM-based entity extraction and Schema.org type alignment},
  author = {Tom Young},
  year = {2026},
  url = {https://github.com/imadcat/simplemem_schemaorg-kg-constructor}
}
