Add ‘dimensions’ parameter for OpenAIEmbedding #1041

Open: wants to merge 7 commits into base: main


SliverBulle commented Aug 28, 2024

Description

This pull request introduces a dimensions option for GraphRAG's embeddings, allowing users to set the size of the embedding vectors. Leaving it unset keeps the current behavior, so existing setups that rely on the model's default dimensions are unaffected.

Related Issues

None.

Proposed Changes

  • Modified the GraphRAG embedding code to accept a dimensions parameter, enabling users to customize the embedding size.
  • Added error checks to ensure the input dimensions value is within reasonable limits.
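The description does not say which checks were added. As an illustration only, a range check could look like the sketch below. It assumes the upper bounds are the default output sizes OpenAI lists for its text-embedding-3 models (1536 for text-embedding-3-small, 3072 for text-embedding-3-large) and assumes a lower bound of 1; validate_dimensions and MAX_DIMENSIONS are hypothetical names, not code from this PR.

```python
from typing import Optional

# Hypothetical limits (assumption): the default output size of each
# text-embedding-3 model is treated as its maximum. Older models such as
# text-embedding-ada-002 do not accept a dimensions parameter at all.
MAX_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def validate_dimensions(model: str, dimensions: Optional[int]) -> Optional[int]:
    """Return dimensions unchanged if valid; raise ValueError otherwise.

    None means "use the model's default size" and is always accepted.
    """
    if dimensions is None:
        return None
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(dimensions, bool) or not isinstance(dimensions, int):
        raise ValueError(f"dimensions must be an int, got {type(dimensions).__name__}")
    if model not in MAX_DIMENSIONS:
        raise ValueError(f"model {model!r} is not known to support dimensions")
    upper = MAX_DIMENSIONS[model]
    # Lower bound of 1 is an assumption, not a documented API limit
    if not 1 <= dimensions <= upper:
        raise ValueError(f"dimensions must be between 1 and {upper} for {model}, got {dimensions}")
    return dimensions
```

For example, validate_dimensions("text-embedding-3-large", 1024) returns 1024, while asking text-embedding-3-small for 2048 dimensions raises a ValueError before any API call is made.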

Checklist

  • [√] I have tested these changes locally.
  • [√] I have reviewed the code changes.
  • [ ] I have updated the documentation (if necessary).
  • [ ] I have added appropriate unit tests (if applicable).

Additional Notes

According to https://openai.com/index/new-embedding-models-and-api-updates/, text-embedding-3-large returns 3072-dimensional embeddings by default, which is more than many use cases need.
Excessive embedding dimensions can add significant computational and storage overhead without yielding proportional performance improvements.
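To put the overhead in concrete terms: assuming each vector component is stored as a 4-byte float32 (a common choice, but an assumption here; real vector stores add index and metadata overhead on top), raw storage grows linearly with the number of dimensions. A quick estimate for one million vectors:

```python
# Back-of-the-envelope storage estimate for raw embedding vectors.
# Assumption: every component is a 4-byte float32; index/metadata overhead is ignored.
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Bytes needed to store num_vectors embeddings of the given size."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


for dims in (3072, 1024):
    gb = raw_vector_bytes(1_000_000, dims) / 1e9
    print(f"{dims} dims: {gb:.3f} GB per million vectors")
```

Under these assumptions, 1024 dimensions need one third of the raw storage of 3072 (about 4.1 GB versus 12.3 GB per million vectors), and the cost of each dot-product or cosine-similarity comparison shrinks by the same factor.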

How to use

Add dimensions: <your dimensions> to the embeddings llm section of settings.yaml after initializing the project:

embeddings:
  async_mode: threaded # or asyncio
  llm:
    api_key: ${EMBEDDING_KEY}
    type: azure_openai_embedding # or openai_embedding
    model: text-embedding-3-large
    api_base: ${EMBEDDING_BASE}
    api_version: "2024-02-01"
    # organization: <organization_id>
    deployment_name: text-embedding-3-large
    dimensions: 1024
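A minimal sketch of how the embeddings llm block above could be turned into request parameters, assuming the YAML has already been parsed into a dict. build_embedding_request is a hypothetical helper, not GraphRAG's code; the keyword names follow the OpenAI Python SDK's client.embeddings.create (model, input, dimensions). The point it illustrates is that dimensions is forwarded only when it is set, so projects that leave it out keep the model's default size.

```python
from typing import Any, Dict, List


def build_embedding_request(llm_config: Dict[str, Any], texts: List[str]) -> Dict[str, Any]:
    """Turn a parsed embeddings llm section into embeddings-call keyword arguments."""
    request: Dict[str, Any] = {"model": llm_config["model"], "input": texts}
    dimensions = llm_config.get("dimensions")
    # Only send dimensions when configured; omitting the key lets the API
    # return the model's default size (3072 for text-embedding-3-large).
    if dimensions is not None:
        request["dimensions"] = int(dimensions)
    return request


# Parsed form of the settings.yaml snippet above (secrets and endpoints omitted)
llm_config = {
    "type": "azure_openai_embedding",
    "model": "text-embedding-3-large",
    "deployment_name": "text-embedding-3-large",
    "dimensions": 1024,
}
print(build_embedding_request(llm_config, ["hello"]))
```

Keeping the "omit when unset" rule in one place means the default path sends exactly the same request as before the option existed.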

Pass dimensions=<your dimensions> to OpenAIEmbedding (in graphrag_local_search.ipynb or the global search notebook):

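from graphrag.query.llm.oai.embedding import OpenAIEmbedding
from graphrag.query.llm.oai.typing import OpenaiApiType

# api_key, azure_endpoint, api_version, embedding_model and deployment_name
# are defined earlier in the notebook
text_embedder = OpenAIEmbedding(
    api_key=api_key,
    api_base=azure_endpoint,
    api_version=api_version,
    api_type=OpenaiApiType.AzureOpenAI,
    model=embedding_model,
    deployment_name=deployment_name,
    max_retries=20,
    dimensions=1024,  # parameter added by this PR; omit it to keep the model default
)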
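The dimensions value used here should match the one used at indexing time: local search compares the query embedding against the entity description embeddings stored by the indexer, and a similarity score is only defined for vectors of equal length. A minimal sanity check, assuming the embedder is available as a callable that maps a string to a list of floats. check_embedding_size and fake_embed are hypothetical names; in the notebook the callable would presumably be text_embedder.embed, which is an assumption here.

```python
from typing import Callable, List


def check_embedding_size(
    embed: Callable[[str], List[float]],
    expected_dimensions: int,
    probe_text: str = "dimension check",
) -> int:
    """Embed a probe string and fail loudly if its length is unexpected."""
    vector = embed(probe_text)
    if len(vector) != expected_dimensions:
        raise ValueError(
            f"embedder returned {len(vector)} dimensions, expected {expected_dimensions}; "
            "query and index embeddings must have the same size"
        )
    return len(vector)


# Stand-in embedder for illustration only: returns a zero vector of fixed size
def fake_embed(text: str) -> List[float]:
    return [0.0] * 1024


print(check_embedding_size(fake_embed, 1024))  # prints 1024
```

Running a check like this once, before building the search engine, turns a silent size mismatch into an explicit error.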

SliverBulle requested review from a team as code owners August 28, 2024 09:32
SliverBulle (Author)

One more thing bothers me: once dimensions has been set in settings.yaml (or left unset, in which case it defaults to None and is not passed to OpenAI's client.embeddings.create), that value (None or a specific number) persists for the rest of the project, even after settings.yaml is changed. As a result, every time I want a different dimensions value I have to set up a new project: create a new directory, run python -m graphrag.index --init --root ./ragtest, and configure settings.yaml again.
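One possible explanation, offered as an assumption rather than something verified against GraphRAG's code, is that previously computed embeddings are reused from a cache whose key does not take dimensions into account. A sketch of a cache key that does, using a hypothetical embedding_cache_key helper: every parameter that changes the output is hashed together with the input text, so switching dimensions produces a cache miss instead of returning vectors of the old size.

```python
import hashlib
import json
from typing import Optional


def embedding_cache_key(text: str, model: str, dimensions: Optional[int]) -> str:
    """Hash the input together with every parameter that affects the embedding."""
    payload = {"input": text, "model": model, "dimensions": dimensions}
    # sort_keys makes the JSON, and therefore the hash, independent of key order
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


k_default = embedding_cache_key("hello", "text-embedding-3-large", None)
k_1024 = embedding_cache_key("hello", "text-embedding-3-large", 1024)
print(k_default != k_1024)  # prints True
```

Including None explicitly in the payload keeps "default size" and "explicit size" as distinct cache entries.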

AlonsoGuevara (Contributor)

Hi!

Is this a revision of #1020?

SliverBulle (Author)

Yes! Sorry, I forgot to close #1020.

SliverBulle changed the title from "Add dimensions option for embedding" to "Add ‘dimensions’ parameter for OpenAIEmbedding" on Sep 6, 2024
ZohebAbai
Hi @AlonsoGuevara, I'm following the progress of this PR. The dimensions parameter for embeddings is a useful enhancement, and I would like to have this feature in GraphRAG. Please let me know if there's anything I can do to help move this forward.
