
Calculation of "5:1 -- Cost Ratio of generation of text using GPT-3.5-Turbo vs OpenAI embedding" #18

Open
theomart opened this issue Aug 18, 2023 · 3 comments


@theomart

Hi, could you share the calculation for this one, and maybe add it to the footnote? I'm not sure I understand it.

Here are the numbers I find on the OpenAI pricing page:

GPT-3.5 Turbo text generation

Model         Input                  Output
4K context    $0.0015 / 1K tokens    $0.002 / 1K tokens
16K context   $0.003 / 1K tokens     $0.004 / 1K tokens

Embeddings

Model     Usage
Ada v2    $0.0001 / 1K tokens

You give the example of answering "What is the capital of Delaware?". If you had to answer that question with an LM that doesn't have the answer in its weights, you would have to embed every document of a corpus that contains it. You could choose an arbitrarily narrow scope, but it could also be the whole of Wikipedia, which is something like 5.6B tokens and would cost something like 5.6e9 / 1000 × $0.0001 = $560 just to index.

What am I missing?

Thanks!

@waleedkadous
Collaborator

Hey Theo,

Thanks for reaching out!

The point is that you don't use an LM for the lookup; you use a vector database. You embed facts like state capitals in a semantic search index. This includes open-source projects like FAISS and Chroma, and commercial products like Vectara, Pinecone, etc.

These are all examples of the general body of techniques called retrieval-augmented generation.
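
To make that concrete, here's a minimal sketch of the retrieval side, assuming FAISS and OpenAI's text-embedding-ada-002 endpoint; the two-document corpus and the helper name are illustrative, not from the original article:

```python
# Minimal retrieval sketch: embed a corpus once, then answer lookups with
# nearest-neighbor search. Assumes FAISS and OpenAI's text-embedding-ada-002.
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts):
    """Embed a batch of strings into float32 vectors."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data], dtype="float32")

# One-time indexing step -- this is where the embedding cost is paid.
docs = [
    "Dover is the capital of Delaware.",
    "Sacramento is the capital of California.",
]
vectors = embed(docs)
index = faiss.IndexFlatL2(vectors.shape[1])  # exact search; ada-002 vectors are 1536-dim
index.add(vectors)

# Per-query lookup: embed the question, then search. Only the short question
# is billed; the vector search itself is effectively free.
_, ids = index.search(embed(["What is the capital of Delaware?"]), 1)
print(docs[ids[0][0]])
```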

@theomart
Author

Thank you for your answer Waleed!

I understand the tools you mentioned help with the retrieval part. However, when doing retrieval-augmented generation you have a retriever and a generator, the generator being a language model.

I understand you could just retrieve the document without using a language model, but that would just be called document retrieval.

In both scenarios, creating embeddings, indexing, and performing semantic search are necessary steps. Regarding the 5:1 ratio, you mentioned vector lookup being considered free, but I'm curious about the calculations behind it, especially given potential expenses with large document sets.

Add to that the cost of the generator. I'm sure it would be cheaper than an API call to GPT-3.5 Turbo, since you don't need a model that big once you feed it the information it needs on a case-by-case basis, but it still requires infrastructure to run on.
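
For concreteness, this is the generator step I'm referring to, sketched with the OpenAI chat completions API (the passage here is a stand-in for a real retrieval result):

```python
# Sketch of the generator step of RAG: hand GPT-3.5 Turbo the retrieved
# passage and the question. Only the short prompt + answer are billed per query.
from openai import OpenAI

client = OpenAI()

def answer(question: str, passage: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context: {passage}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("What is the capital of Delaware?",
             "Dover is the capital of Delaware."))
```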

Could you please provide insight into the calculation for the 5:1 ratio?

Appreciate your help!

@christhomas412

Sure, here's the calculation breakdown:

For text generation:

GPT-3.5 Turbo with 4K context: $0.0015 per 1K tokens input, $0.002 per 1K tokens output.
GPT-3.5 Turbo with 16K context: $0.003 per 1K tokens input, $0.004 per 1K tokens output.
For embeddings:

Ada v2: $0.0001 per 1K tokens.

You're correct in your numbers for the example of embedding documents to answer questions. Say you need to index the whole of Wikipedia (around 5.6 billion tokens). The cost would be approximately:
Cost = (5.6e9 / 1000) * $0.0001 = $560.
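
Here's the same arithmetic as a quick script (the 5.6B-token corpus size is the estimate from earlier in this thread):

```python
# Back-of-the-envelope one-time indexing cost, using the Ada v2 price quoted above.
ADA_V2_PRICE_PER_1K = 0.0001   # USD per 1K tokens
WIKIPEDIA_TOKENS = 5.6e9       # assumed corpus size from this thread

index_cost = WIKIPEDIA_TOKENS / 1000 * ADA_V2_PRICE_PER_1K
print(f"One-time indexing cost: ${index_cost:,.0f}")  # -> $560
```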

If you have further questions or need clarification, feel free to ask!
