-
RAG in AC SDK
After evaluating potential approaches, only one viable solution emerged: introduce separate Instance and Session classes tailored for Embedding and Reranking models. An alternative idea involved enhancing the existing LLM Instance and Session classes instead.

Now we need to consider what we should do with vector databases. It's obvious that we should wrap them in our own classes. We solve that problem similarly to how we do it with the LLMs: create a base abstract class which will be inherited and implemented by specific database providers.

std::string model = "text-embedding-model.gguf";
EmbeddingInstance einst(model, EmbeddingInstanceOptions{});
EmbeddingSession es = einst.createSession(...);
ChromaVectorDatabase vd(ChromaOptions{}, model);

// documentURLs is a list of URLs to the documents which contain
// the text we need for the vector database
for (const auto& documentURL : documentURLs) {
    std::vector<std::string> chunks = getDocumentChunks(documentURL);
    std::vector<std::vector<float>> vectors = es.vectorize(chunks);
    // getMetadata stands in for a helper producing per-chunk metadata
    vd.insert(documentURL, getMetadata(chunks), vectors);
}

std::string query = "...";
// in the options we can set the number of top-K results we want to retrieve
std::vector<std::string> results = vd.query(query, QueryOptions{});

LLMInstance linst("text-generation-model.gguf", LLMInstanceOptions{});
LLMSession ls = linst.createSession(...);
ls.setInitialPrompt(results);
std::string result;
for (int i = 0; i < 100; i++) {
    result += ls.getToken();
}
std::cout << result << std::endl;
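As a rough illustration of the base abstract class mentioned above, here is a minimal sketch; the name VectorDatabase and the exact signatures are assumptions derived from the usage example, not a settled API:

struct ChromaOptions { /* provider-specific settings */ };
struct QueryOptions {
    int topK = 5; // number of top-K results to retrieve
};

// Hypothetical base class for vector database providers
class VectorDatabase {
public:
    virtual ~VectorDatabase() = default;
    // store a document's chunks together with their metadata and embeddings
    virtual void insert(const std::string& documentURL,
                        const std::vector<std::string>& metadata,
                        const std::vector<std::vector<float>>& vectors) = 0;
    // embed the query internally and return the best-matching chunks
    virtual std::vector<std::string> query(const std::string& query,
                                           const QueryOptions& options) = 0;
};

// ChromaVectorDatabase from the example above would be one concrete provider
class ChromaVectorDatabase : public VectorDatabase {
public:
    ChromaVectorDatabase(ChromaOptions options, std::string embeddingModel);
    void insert(const std::string& documentURL,
                const std::vector<std::string>& metadata,
                const std::vector<std::vector<float>>& vectors) override;
    std::vector<std::string> query(const std::string& query,
                                   const QueryOptions& options) override;
};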
-
Vector stores/database integration
Some of the most known VSs are listed below.
VSs are usually designed to be used through a server for scalability and ease of integration.
- FAISS - The only one which seems easy to integrate without a server.
- Milvus - Has a C++ Client SDK, so we can use it directly.
- Qdrant - Since it's written in Rust, it will be easy to make a C library for the client and integrate it.
- Lance - Same as Qdrant.
- Redis - It seems like a more enterprise-oriented VS, but since it's open source and written in C we might have to check it too.
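To make the Qdrant/Lance point concrete: a Rust client can be exposed through a C-compatible shim and consumed from C++. The sketch below is purely illustrative; none of these symbols exist in the Qdrant client today:

// Hypothetical C interface that a thin Rust cdylib wrapping the Qdrant
// client could export; all names here are made up for illustration.
#include <cstddef>
#include <cstdint>

extern "C" {
    typedef struct qdrant_client qdrant_client; // opaque handle

    qdrant_client* qdrant_connect(const char* url);
    void qdrant_disconnect(qdrant_client* client);

    // upsert a single point: id, embedding vector and its length, JSON payload
    int qdrant_upsert(qdrant_client* client, const char* collection,
                      uint64_t id, const float* vector, size_t dim,
                      const char* payloadJson);

    // search: fills outIds with up to topK ids, returns the number found
    int qdrant_search(qdrant_client* client, const char* collection,
                      const float* vector, size_t dim,
                      uint64_t* outIds, size_t topK);
}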
-
VS API
Since we're about to have different models with a big variety of data types (text, audio, video, etc.), we need to consider making the VSs more generic for developers. The following API is basic, still a WIP, and about to be enhanced with more functionality. For now we'll treat it as a sync API, as we do for AI models.

#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <vector>

using EmbeddingVector = std::vector<float>;
using Score = float;

// The embedding instance should be implemented for all models that are meant for embedding
class EmbeddingInstance {
public:
    virtual ~EmbeddingInstance() = default;
    virtual EmbeddingVector getEmbedding(const std::string& query) = 0;
};

// ========
// The following can be user-defined structs which are wrapped in Dicts
// Example structure of a text document chunk
struct DocumentRecord {
    std::vector<std::string> metadata;
    std::string text;
    EmbeddingVector embedding;
};

// Example structure of an image
struct ImageRecord {
    std::vector<std::string> metadata;
    std::vector<uint8_t> bytes; // raw image bytes
    EmbeddingVector embedding;
};
// ==========

struct VectorStoreOptions {
    uint16_t top; // sets the maximum number of returned results
    // search filter applied before the vector search;
    // the predicate signature is an assumption
    std::optional<std::function<bool(const ac::Dict&)>> filter;
};

class VectorStore {
public:
    virtual ~VectorStore() = default;
    // Add/remove records to the store
    // Requires a vector of records and their ids
    virtual void addRecords(ac::Dict records) = 0;
    virtual void removeRecords(ac::Dict ids) = 0;
    // Returns a record by id
    virtual ac::Dict get(ac::Dict param) = 0;
    // Returns a list of ids and scores
    // Requires a vector and a top K
    virtual ac::Dict search(ac::Dict params) = 0;
};
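A hypothetical usage sketch of this API; it assumes ac::Dict can be built from an initializer list the way a JSON-style dictionary can, and the factory function is made up:

// Hypothetical usage of the VectorStore interface above; createVectorStore
// and the Dict construction syntax are assumptions.
std::unique_ptr<VectorStore> store = createVectorStore(/*...*/);

// embedder is some concrete EmbeddingInstance implementation
// wrap a DocumentRecord-like structure in a Dict and add it
ac::Dict records = {
    {"records", {{
        {"id", 1},
        {"metadata", {"source.txt"}},
        {"text", "AC SDK supports local inference."},
        {"embedding", embedder.getEmbedding("AC SDK supports local inference.")}
    }}}
};
store->addRecords(records);

// search: a query vector plus a top K, as the interface comments require
ac::Dict params = {
    {"vector", embedder.getEmbedding("what does the SDK support?")},
    {"topK", 5}
};
ac::Dict hits = store->search(params); // list of ids and scores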
-
Design: Vector Stores vs Inference libraries
The decision to make inference libraries plugins was made because of a constraint we had: they share common libraries, which results in ODR violations. We don't have this problem with Vector Stores, which is why we can keep them as plain libraries and include them in the applications that need them. So to summarize what we'll need for the VS:
-
What is RAG
To support RAG (Retrieval-Augmented Generation) applications with AC we need the following components:
- LLM
- Embedding models
- Vector database
- Reranking models
Now we'll go through each of them and see what exactly we have to support.
LLM
The LLM is the generative component of a RAG application. It's responsible for text prediction (token generation). It might also be used to extract context, or to rephrase or enhance the query in order to improve retrieval.
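For example, the LLMSession API sketched in the first comment could be reused to rewrite the user query before retrieval. A hypothetical sketch; the prompt wording and the token cap are arbitrary:

// Hypothetical query rewriting reusing the LLM classes proposed earlier;
// names and signatures follow that sketch, not an existing API.
LLMInstance linst("text-generation-model.gguf", LLMInstanceOptions{});
LLMSession ls = linst.createSession(...);
ls.setInitialPrompt({"Rewrite this search query to be more specific: " + userQuery});

std::string rewrittenQuery;
for (int i = 0; i < 32; i++) { // cap the generated length
    rewrittenQuery += ls.getToken();
}
// rewrittenQuery is then embedded and sent to the vector database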
Embedding models
The embedding models are used for vectorization of the text. They convert the text into a vector representation, which is then used to compare the similarity between the query and the documents (Semantic Similarity Search).
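The comparison itself is typically a cosine similarity between the two embedding vectors; a minimal self-contained sketch:

#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity between two embedding vectors of equal length:
// 1 means same direction (very similar), 0 means orthogonal (unrelated).
float cosineSimilarity(const std::vector<float>& a, const std::vector<float>& b) {
    float dot = 0.f, normA = 0.f, normB = 0.f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (std::sqrt(normA) * std::sqrt(normB));
}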
Vector Database
The vector database is the essential component for efficiently storing and searching information based on semantic similarity. It stores the vector representations of the documents, which are compared against the query vector during retrieval. Some vector databases are capable of multi-modal data handling, i.e. they support different types of data. Along with the vector embedding, the vector database can store additional information about the document: metadata, which is used for filtering.
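Metadata filtering maps naturally onto the VectorStoreOptions::filter field from the VS API comment above. A hypothetical example, assuming ac::Dict offers JSON-style contains/value accessors:

// Hypothetical pre-search filter: only records whose metadata has
// a "lang" field equal to "en" take part in the vector search.
VectorStoreOptions opts;
opts.top = 5;
opts.filter = [](const ac::Dict& record) {
    return record.contains("metadata")
        && record["metadata"].value("lang", "") == "en";
};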
Reranking
The reranking models are used to enhance the quality and relevance of the retrieved results before they are passed to the language model for response generation. They can reduce noise and prioritize results that directly answer the query.
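Hooked into the flow from the first comment, reranking would sit between retrieval and generation. A hypothetical sketch with RerankingInstance/RerankingSession classes mirroring the proposed Embedding ones (none of these exist yet, and the score method is an assumption):

#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical reranking step between retrieval and generation.
RerankingInstance rinst("reranking-model.gguf", RerankingInstanceOptions{});
RerankingSession rs = rinst.createSession(...);

// results is the list of chunks returned by the vector database
std::vector<std::pair<float, std::string>> scored;
for (const auto& doc : results) {
    scored.emplace_back(rs.score(query, doc), doc);
}
// highest-scoring documents first; pass the top few to the LLM
std::sort(scored.begin(), scored.end(),
          [](const auto& a, const auto& b) { return a.first > b.first; });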