Improve CLI speed with lazy imports #1319

Merged
merged 79 commits into from
Nov 16, 2024
7370bf6
extracted cli layer entry functions
KennyZhang1 Oct 14, 2024
858b1cf
added __init__ file
KennyZhang1 Oct 14, 2024
e67c926
Merge branch 'main' of github.com:microsoft/graphrag into reorganize-…
KennyZhang1 Oct 14, 2024
2e1b953
added top-level main file to hook into cli layer
KennyZhang1 Oct 15, 2024
36bf13d
migrated universal and indexing args
KennyZhang1 Oct 15, 2024
12ea874
migrated prompt-tune and querying args
KennyZhang1 Oct 15, 2024
2f1f1e5
added cli connection functions
KennyZhang1 Oct 15, 2024
4b14051
Merge branch 'main' of github.com:microsoft/graphrag into reorganize-…
KennyZhang1 Oct 15, 2024
b926ccb
tested initial cli reorg
KennyZhang1 Oct 16, 2024
d29278f
separated out init functionality from index cli
KennyZhang1 Oct 16, 2024
a6ef473
deleted old cli files
KennyZhang1 Oct 16, 2024
4f82894
semversioner and ruff fixes
KennyZhang1 Oct 17, 2024
c370ac4
unify help message across all arguments
jgbradley1 Oct 20, 2024
ca9d933
Merge branch 'main' into reorganize-cli-layer
jgbradley1 Oct 20, 2024
0e9e635
unify style of more cli arguments
jgbradley1 Oct 20, 2024
6f437fc
update docs
jgbradley1 Oct 20, 2024
5f4e5e8
define prog for better help menu output
jgbradley1 Oct 20, 2024
37e8124
Merge branch 'main' into reorganize-cli-layer
AlonsoGuevara Oct 21, 2024
0af7c59
convert CLI to Typer
jgbradley1 Oct 22, 2024
4e5d862
update docs
jgbradley1 Oct 22, 2024
88e169a
convert Enum class to str + Enum
jgbradley1 Oct 22, 2024
c06afc0
Merge branch 'main' into joshbradley/convert-cli-to-typer
jgbradley1 Oct 22, 2024
c36d860
update docs
jgbradley1 Oct 22, 2024
d96bb50
minor bug fixes in order to get the end-to-end tutorial to run
jgbradley1 Oct 22, 2024
ca4550d
ruff formatting
jgbradley1 Oct 22, 2024
d583d91
fix pytests
jgbradley1 Oct 22, 2024
b9d2825
update tests to remove calls to as_posix() - it breaks Windows tests
jgbradley1 Oct 22, 2024
92c1ef2
cast test filepaths to str
jgbradley1 Oct 22, 2024
1f6ec8c
Merge branch 'main' into joshbradley/convert-cli-to-typer
AlonsoGuevara Oct 22, 2024
6823295
move heavy imports into functions
jgbradley1 Oct 23, 2024
e913b92
move some imports inside cli functions
jgbradley1 Oct 23, 2024
f5093e1
more cleanup of unnecessary import statements
jgbradley1 Oct 23, 2024
a30d49f
remove import statements from init files
jgbradley1 Oct 24, 2024
70188d0
Merge branch 'main' into joshbradley/import-speedup
jgbradley1 Oct 24, 2024
c90c6d8
remove duplicate
jgbradley1 Oct 24, 2024
dca41ca
update more imports
jgbradley1 Oct 24, 2024
752755d
update cli parameter
jgbradley1 Oct 24, 2024
d112182
update
jgbradley1 Oct 24, 2024
252b7c0
update docs
jgbradley1 Oct 24, 2024
49e6f5d
ruff format fixes
jgbradley1 Oct 24, 2024
a93f925
update docstrings in some init files
jgbradley1 Oct 24, 2024
3d3743e
missed a few import statements that were uncovered during testing
jgbradley1 Oct 25, 2024
6bd6c3f
add semversioner file
jgbradley1 Oct 25, 2024
816fe80
Merge branch 'main' into joshbradley/import-speedup
jgbradley1 Oct 27, 2024
3bb835b
Merge branch 'main' into joshbradley/import-speedup
jgbradley1 Oct 30, 2024
b470d30
code cleanup after merge
jgbradley1 Oct 30, 2024
a0fb206
fix import
jgbradley1 Oct 30, 2024
5163b4b
fix auto-completion
jgbradley1 Oct 31, 2024
3550ec3
add semversioner file
jgbradley1 Oct 31, 2024
331df21
Merge branch 'main' into joshbradley/import-speedup
jgbradley1 Nov 1, 2024
9be3569
cleanup after merge
jgbradley1 Nov 1, 2024
ce50132
ruff fix
jgbradley1 Nov 1, 2024
c8b394d
fix import
jgbradley1 Nov 1, 2024
93e6fa3
fix import
jgbradley1 Nov 1, 2024
182c5d4
Merge branch 'joshbradley/import-speedup' into joshbradley/fix-cli-au…
jgbradley1 Nov 1, 2024
1e8924d
Merge branch 'main' into joshbradley/import-speedup
jgbradley1 Nov 6, 2024
95c1343
fix imports after merge from main
jgbradley1 Nov 6, 2024
9328955
Merge branch 'main' into joshbradley/import-speedup
jgbradley1 Nov 9, 2024
69bc1ba
Merge branch 'joshbradley/fix-cli-autocomplete' into joshbradley/impo…
jgbradley1 Nov 9, 2024
34ebb0c
ruff formatting fixes
jgbradley1 Nov 9, 2024
f882d9f
fix autocompleter
jgbradley1 Nov 9, 2024
90634ba
Merge branch 'main' into joshbradley/import-speedup
jgbradley1 Nov 13, 2024
9861f8e
merge main into branch
jgbradley1 Nov 13, 2024
81fd390
Merge branch 'main' into joshbradley/import-speedup
jgbradley1 Nov 14, 2024
ddddc33
fix merge from main
jgbradley1 Nov 14, 2024
020d5e8
fix merge from main
jgbradley1 Nov 14, 2024
3a9dc5d
update comment
jgbradley1 Nov 14, 2024
81c5af6
Merge branch 'main' into joshbradley/import-speedup
jgbradley1 Nov 14, 2024
e1a7dba
fix merge conflict
jgbradley1 Nov 14, 2024
a4aae95
ruff formatting
jgbradley1 Nov 14, 2024
b1a8613
Merge branch 'main' into joshbradley/import-speedup
jgbradley1 Nov 15, 2024
e1d7ca0
convert relative imports to absolute imports
jgbradley1 Nov 15, 2024
e89f680
fix import path issue
jgbradley1 Nov 15, 2024
5a311bb
fix import paths in init files
jgbradley1 Nov 15, 2024
83f309f
found more relative imports
jgbradley1 Nov 15, 2024
f8fbd49
update comment
jgbradley1 Nov 16, 2024
527f4ac
Merge branch 'main' into joshbradley/import-speedup
jgbradley1 Nov 16, 2024
0ac1285
fix import paths
jgbradley1 Nov 16, 2024
17a9133
fix import path issue
jgbradley1 Nov 16, 2024
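Several of the commits above move heavy imports out of module scope and into the functions that need them. A minimal sketch of that pattern follows. The function names are hypothetical, and the standard-library `statistics` module only stands in for a genuinely expensive dependency; it is not something this PR touches.

```python
import sys


def run_report(values: list[float]) -> float:
    """Hypothetical command body that needs a 'heavy' dependency."""
    # Deferred import: the module is loaded the first time this function
    # runs, not when the CLI module itself is imported.
    import statistics

    return statistics.mean(values)


def show_version() -> str:
    """Hypothetical lightweight command that needs no heavy imports."""
    return "0.0.0-example"


if __name__ == "__main__":
    print(show_version())  # this path never pays for the deferred import
    print(run_report([1.0, 2.0, 3.0]))
    print("statistics loaded:", "statistics" in sys.modules)
```

The trade-off is that the import cost moves to the first call of the function, and import errors surface at call time rather than at start-up.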
4 changes: 4 additions & 0 deletions .semversioner/next-release/patch-20241025031711368197.json
@@ -0,0 +1,4 @@
+{
+    "type": "patch",
+    "description": "move import statements out of init files"
+}
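One way to check whether a change like this actually shortens start-up is to time a bare import in a fresh interpreter. The helper below is a rough sketch and not part of the PR: `time_import` is a hypothetical name, and the modules timed are standard-library stand-ins. With the package installed, a module path such as `graphrag.cli.main` could be passed instead.

```python
import subprocess
import sys
import time


def time_import(module_name: str) -> float:
    """Return wall-clock seconds for a fresh interpreter to import a module."""
    start = time.perf_counter()
    # A new process is used so that nothing is already cached in sys.modules.
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    baseline = time_import("sys")  # approximates interpreter start-up alone
    for name in ("json", "decimal"):
        extra = time_import(name) - baseline
        print(f"{name}: {extra:+.3f}s relative to baseline")
```

Single measurements are noisy, so repeated runs give a fairer picture. For a per-module breakdown, CPython's `-X importtime` option is another possibility.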
4 changes: 4 additions & 0 deletions .semversioner/next-release/patch-20241031180003172666.json
@@ -0,0 +1,4 @@
+{
+    "type": "patch",
+    "description": "fix autocompletion of existing files/directory paths."
+}
6 changes: 3 additions & 3 deletions docs/prompt_tuning/auto_prompt_tuning.md
@@ -20,9 +20,9 @@ Before running auto tuning, ensure you have already initialized your workspace w
 You can run the main script from the command line with various options:

 ```bash
-graphrag prompt-tune [--root ROOT] [--domain DOMAIN] [--method METHOD] [--limit LIMIT] [--language LANGUAGE] \
+graphrag prompt-tune [--root ROOT] [--config CONFIG] [--domain DOMAIN] [--selection-method METHOD] [--limit LIMIT] [--language LANGUAGE] \
 [--max-tokens MAX_TOKENS] [--chunk-size CHUNK_SIZE] [--n-subset-max N_SUBSET_MAX] [--k K] \
-[--min-examples-required MIN_EXAMPLES_REQUIRED] [--no-entity-types] [--output OUTPUT]
+[--min-examples-required MIN_EXAMPLES_REQUIRED] [--discover-entity-types] [--output OUTPUT]
 ```

 ## Command-Line Options
@@ -49,7 +49,7 @@ graphrag prompt-tune [--root ROOT] [--domain DOMAIN] [--method METHOD] [--limit

 - `--min-examples-required` (optional): The minimum number of examples required for entity extraction prompts. Default is 2.

-- `--no-entity-types` (optional): Use untyped entity extraction generation. We recommend using this when your data covers a lot of topics or it is highly randomized.
+- `--discover-entity-types` (optional): Allow the LLM to discover and extract entities automatically. We recommend using this when your data covers a lot of topics or it is highly randomized.

 - `--output` (optional): The folder to save the generated prompts. Default is "prompts".

2 changes: 1 addition & 1 deletion examples/custom_input/run.py
@@ -5,7 +5,7 @@

 import pandas as pd

-from graphrag.index import run_pipeline_with_config
+from graphrag.index.run import run_pipeline_with_config

 pipeline_file = os.path.join(
     os.path.dirname(os.path.abspath(__file__)), "./pipeline.yml"
4 changes: 2 additions & 2 deletions examples/single_verb/run.py
@@ -5,8 +5,8 @@

 import pandas as pd

-from graphrag.index import run_pipeline, run_pipeline_with_config
-from graphrag.index.config import PipelineWorkflowReference
+from graphrag.index.config.workflow import PipelineWorkflowReference
+from graphrag.index.run import run_pipeline, run_pipeline_with_config

 # our fake dataset
 dataset = pd.DataFrame([{"col1": 2, "col2": 4}, {"col1": 5, "col2": 10}])
7 changes: 4 additions & 3 deletions examples/use_built_in_workflows/run.py
@@ -3,9 +3,10 @@
 import asyncio
 import os

-from graphrag.index import run_pipeline, run_pipeline_with_config
-from graphrag.index.config import PipelineCSVInputConfig, PipelineWorkflowReference
-from graphrag.index.input import load_input
+from graphrag.index.config.input import PipelineCSVInputConfig
+from graphrag.index.config.workflow import PipelineWorkflowReference
+from graphrag.index.input.load_input import load_input
+from graphrag.index.run import run_pipeline, run_pipeline_with_config

 sample_data_dir = os.path.join(
     os.path.dirname(os.path.abspath(__file__)), "../_sample_data/"
2 changes: 1 addition & 1 deletion graphrag/__main__.py
@@ -3,6 +3,6 @@

 """The GraphRAG package."""

-from .cli.main import app
+from graphrag.cli.main import app

 app(prog_name="graphrag")
3 changes: 2 additions & 1 deletion graphrag/api/__init__.py
@@ -8,14 +8,15 @@
 """

 from graphrag.api.index import build_index
-from graphrag.api.prompt_tune import DocSelectionType, generate_indexing_prompts
+from graphrag.api.prompt_tune import generate_indexing_prompts
 from graphrag.api.query import (
     drift_search,
     global_search,
     global_search_streaming,
     local_search,
     local_search_streaming,
 )
+from graphrag.prompt_tune.types import DocSelectionType

 __all__ = [ # noqa: RUF022
     # index API
5 changes: 3 additions & 2 deletions graphrag/api/index.py
@@ -10,13 +10,14 @@

 from pathlib import Path

-from graphrag.config import CacheType, GraphRagConfig
+from graphrag.config.enums import CacheType
+from graphrag.config.models.graph_rag_config import GraphRagConfig
 from graphrag.index.cache.noop_pipeline_cache import NoopPipelineCache
 from graphrag.index.create_pipeline_config import create_pipeline_config
 from graphrag.index.emit.types import TableEmitterType
 from graphrag.index.run import run_pipeline_with_config
 from graphrag.index.typing import PipelineRunResult
-from graphrag.logging import ProgressReporter
+from graphrag.logging.base import ProgressReporter
 from graphrag.vector_stores.factory import VectorStoreType


35 changes: 21 additions & 14 deletions graphrag/api/prompt_tune.py
@@ -15,25 +15,32 @@
 from pydantic import PositiveInt, validate_call

 from graphrag.config.models.graph_rag_config import GraphRagConfig
-from graphrag.index.llm import load_llm
-from graphrag.logging import PrintProgressReporter
-from graphrag.prompt_tune.generator import (
-    MAX_TOKEN_COUNT,
-    create_community_summarization_prompt,
-    create_entity_extraction_prompt,
-    create_entity_summarization_prompt,
-    detect_language,
+from graphrag.index.llm.load_llm import load_llm
+from graphrag.logging.print_progress import PrintProgressReporter
+from graphrag.prompt_tune.defaults import MAX_TOKEN_COUNT
+from graphrag.prompt_tune.generator.community_report_rating import (
     generate_community_report_rating,
+)
+from graphrag.prompt_tune.generator.community_report_summarization import (
+    create_community_summarization_prompt,
+)
+from graphrag.prompt_tune.generator.community_reporter_role import (
     generate_community_reporter_role,
-    generate_domain,
+)
+from graphrag.prompt_tune.generator.domain import generate_domain
+from graphrag.prompt_tune.generator.entity_extraction_prompt import (
+    create_entity_extraction_prompt,
+)
+from graphrag.prompt_tune.generator.entity_relationship import (
     generate_entity_relationship_examples,
-    generate_entity_types,
-    generate_persona,
 )
-from graphrag.prompt_tune.loader import (
-    MIN_CHUNK_SIZE,
-    load_docs_in_chunks,
+from graphrag.prompt_tune.generator.entity_summarization_prompt import (
+    create_entity_summarization_prompt,
 )
+from graphrag.prompt_tune.generator.entity_types import generate_entity_types
+from graphrag.prompt_tune.generator.language import detect_language
+from graphrag.prompt_tune.generator.persona import generate_persona
+from graphrag.prompt_tune.loader.input import MIN_CHUNK_SIZE, load_docs_in_chunks
 from graphrag.prompt_tune.types import DocSelectionType


6 changes: 3 additions & 3 deletions graphrag/api/query.py
@@ -24,12 +24,12 @@
 import pandas as pd
 from pydantic import validate_call

-from graphrag.config import GraphRagConfig
+from graphrag.config.models.graph_rag_config import GraphRagConfig
 from graphrag.index.config.embeddings import (
     community_full_content_embedding,
     entity_description_embedding,
 )
-from graphrag.logging import PrintProgressReporter
+from graphrag.logging.print_progress import PrintProgressReporter
 from graphrag.query.factories import (
     get_drift_search_engine,
     get_global_search_engine,
@@ -47,8 +47,8 @@
 from graphrag.query.structured_search.base import SearchResult # noqa: TCH001
 from graphrag.utils.cli import redact
 from graphrag.utils.embeddings import create_collection_name
-from graphrag.vector_stores import VectorStoreFactory, VectorStoreType
 from graphrag.vector_stores.base import BaseVectorStore
+from graphrag.vector_stores.factory import VectorStoreFactory, VectorStoreType

 reporter = PrintProgressReporter("")

11 changes: 5 additions & 6 deletions graphrag/callbacks/factories.py
@@ -8,17 +8,16 @@

 from datashaper import WorkflowCallbacks

-from graphrag.config import ReportingType
-from graphrag.index.config import (
+from graphrag.callbacks.blob_workflow_callbacks import BlobWorkflowCallbacks
+from graphrag.callbacks.console_workflow_callbacks import ConsoleWorkflowCallbacks
+from graphrag.callbacks.file_workflow_callbacks import FileWorkflowCallbacks
+from graphrag.config.enums import ReportingType
+from graphrag.index.config.reporting import (
     PipelineBlobReportingConfig,
     PipelineFileReportingConfig,
     PipelineReportingConfig,
 )

-from .blob_workflow_callbacks import BlobWorkflowCallbacks
-from .console_workflow_callbacks import ConsoleWorkflowCallbacks
-from .file_workflow_callbacks import FileWorkflowCallbacks
-

 def create_pipeline_reporter(
     config: PipelineReportingConfig | None, root_dir: str | None
3 changes: 1 addition & 2 deletions graphrag/callbacks/global_search_callbacks.py
@@ -3,10 +3,9 @@

 """GlobalSearch LLM Callbacks."""

+from graphrag.callbacks.llm_callbacks import BaseLLMCallback
 from graphrag.query.structured_search.base import SearchResult

-from .llm_callbacks import BaseLLMCallback
-

 class GlobalSearchLLMCallback(BaseLLMCallback):
     """GlobalSearch LLM Callbacks."""
2 changes: 1 addition & 1 deletion graphrag/callbacks/progress_workflow_callbacks.py
@@ -7,7 +7,7 @@

 from datashaper import ExecutionNode, NoopWorkflowCallbacks, Progress, TableContainer

-from graphrag.logging import ProgressReporter
+from graphrag.logging.base import ProgressReporter


 class ProgressWorkflowCallbacks(NoopWorkflowCallbacks):
14 changes: 7 additions & 7 deletions graphrag/cli/index.py
@@ -11,15 +11,15 @@
 from pathlib import Path

 import graphrag.api as api
-from graphrag.config import (
-    CacheType,
-    enable_logging_with_config,
-    load_config,
-    resolve_paths,
-)
+from graphrag.config.enums import CacheType
+from graphrag.config.load_config import load_config
+from graphrag.config.logging import enable_logging_with_config
+from graphrag.config.resolve_path import resolve_paths
 from graphrag.index.emit.types import TableEmitterType
 from graphrag.index.validate_config import validate_config_names
-from graphrag.logging import ProgressReporter, ReporterType, create_progress_reporter
+from graphrag.logging.base import ProgressReporter
+from graphrag.logging.factories import create_progress_reporter
+from graphrag.logging.types import ReporterType
 from graphrag.utils.cli import redact

 # Ignore warnings from numba
3 changes: 2 additions & 1 deletion graphrag/cli/initialize.py
@@ -6,7 +6,8 @@
 from pathlib import Path

 from graphrag.config.init_content import INIT_DOTENV, INIT_YAML
-from graphrag.logging import ReporterType, create_progress_reporter
+from graphrag.logging.factories import create_progress_reporter
+from graphrag.logging.types import ReporterType
 from graphrag.prompts.index.claim_extraction import CLAIM_EXTRACTION_PROMPT
 from graphrag.prompts.index.community_report import (
     COMMUNITY_REPORT_PROMPT,