[Feature] Add support for hybrid search for pinecone vector database (#…
deshraj authored Feb 15, 2024
1 parent 0766a44 commit 38b4e06
Showing 18 changed files with 470 additions and 326 deletions.
2 changes: 1 addition & 1 deletion docs/_snippets/missing-vector-db-tip.mdx
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@


<p>If you can't find the specific vector database, please feel free to request through one of the following channels and help us prioritize.</p>
<p>If you can't find a specific feature or run into issues, please feel free to reach out through one of the following channels.</p>

<CardGroup cols={2}>
<Card title="Slack" icon="slack" href="https://embedchain.ai/slack" color="#4A154B">
2 changes: 1 addition & 1 deletion docs/components/data-sources/google-drive.mdx
@@ -25,4 +25,4 @@ app = App()

url = "https://drive.google.com/drive/u/0/folders/xxx-xxx"
app.add(url, data_type="google_drive")
```
```
57 changes: 29 additions & 28 deletions docs/components/data-sources/overview.mdx
@@ -5,34 +5,35 @@ title: Overview
Embedchain comes with built-in support for various data sources. We handle the complexity of loading unstructured data from these data sources, allowing you to easily customize your app through a user-friendly interface.

<CardGroup cols={4}>
<Card title="📰 PDF file" href="/components/data-sources/pdf-file"></Card>
<Card title="📊 CSV file" href="/components/data-sources/csv"></Card>
<Card title="📃 JSON file" href="/components/data-sources/json"></Card>
<Card title="📝 Text" href="/components/data-sources/text"></Card>
<Card title="📁 Directory/ Folder" href="/components/data-sources/directory"></Card>
<Card title="🌐 HTML Web page" href="/components/data-sources/web-page"></Card>
<Card title="📽️ Youtube Channel" href="/components/data-sources/youtube-channel"></Card>
<Card title="📺 Youtube Video" href="/components/data-sources/youtube-video"></Card>
<Card title="📚 Docs website" href="/components/data-sources/docs-site"></Card>
<Card title="📝 MDX file" href="/components/data-sources/mdx"></Card>
<Card title="📄 DOCX file" href="/components/data-sources/docx"></Card>
<Card title="📓 Notion" href="/components/data-sources/notion"></Card>
<Card title="🗺️ Sitemap" href="/components/data-sources/sitemap"></Card>
<Card title="🧾 XML file" href="/components/data-sources/xml"></Card>
<Card title="❓💬 Q&A pair" href="/components/data-sources/qna"></Card>
<Card title="🙌 OpenAPI" href="/components/data-sources/openapi"></Card>
<Card title="📬 Gmail" href="/components/data-sources/gmail"></Card>
<Card title="📝 Github" href="/components/data-sources/github"></Card>
<Card title="🐘 Postgres" href="/components/data-sources/postgres"></Card>
<Card title="🐬 MySQL" href="/components/data-sources/mysql"></Card>
<Card title="🤖 Slack" href="/components/data-sources/slack"></Card>
<Card title="💬 Discord" href="/components/data-sources/discord"></Card>
<Card title="🗨️ Discourse" href="/components/data-sources/discourse"></Card>
<Card title="📝 Substack" href="/components/data-sources/substack"></Card>
<Card title="🐝 Beehiiv" href="/components/data-sources/beehiiv"></Card>
<Card title="💾 Dropbox" href="/components/data-sources/dropbox"></Card>
<Card title="🖼️ Image" href="/components/data-sources/image"></Card>
<Card title="⚙️ Custom" href="/components/data-sources/custom"></Card>
<Card title="PDF file" href="/components/data-sources/pdf-file"></Card>
<Card title="CSV file" href="/components/data-sources/csv"></Card>
<Card title="JSON file" href="/components/data-sources/json"></Card>
<Card title="Text" href="/components/data-sources/text"></Card>
<Card title="Directory" href="/components/data-sources/directory"></Card>
<Card title="Web page" href="/components/data-sources/web-page"></Card>
<Card title="Youtube Channel" href="/components/data-sources/youtube-channel"></Card>
<Card title="Youtube Video" href="/components/data-sources/youtube-video"></Card>
<Card title="Docs website" href="/components/data-sources/docs-site"></Card>
<Card title="MDX file" href="/components/data-sources/mdx"></Card>
<Card title="DOCX file" href="/components/data-sources/docx"></Card>
<Card title="Notion" href="/components/data-sources/notion"></Card>
<Card title="Sitemap" href="/components/data-sources/sitemap"></Card>
<Card title="XML file" href="/components/data-sources/xml"></Card>
<Card title="Q&A pair" href="/components/data-sources/qna"></Card>
<Card title="OpenAPI" href="/components/data-sources/openapi"></Card>
<Card title="Gmail" href="/components/data-sources/gmail"></Card>
<Card title="Google Drive" href="/components/data-sources/google-drive"></Card>
<Card title="GitHub" href="/components/data-sources/github"></Card>
<Card title="Postgres" href="/components/data-sources/postgres"></Card>
<Card title="MySQL" href="/components/data-sources/mysql"></Card>
<Card title="Slack" href="/components/data-sources/slack"></Card>
<Card title="Discord" href="/components/data-sources/discord"></Card>
<Card title="Discourse" href="/components/data-sources/discourse"></Card>
<Card title="Substack" href="/components/data-sources/substack"></Card>
<Card title="Beehiiv" href="/components/data-sources/beehiiv"></Card>
<Card title="Dropbox" href="/components/data-sources/dropbox"></Card>
<Card title="Image" href="/components/data-sources/image"></Card>
<Card title="Custom" href="/components/data-sources/custom"></Card>
</CardGroup>

<br/ >
238 changes: 0 additions & 238 deletions docs/components/vector-databases.mdx
@@ -17,242 +17,4 @@ Utilizing a vector database alongside Embedchain is a seamless process. All you
<Card title="Weaviate" href="#weaviate"></Card>
</CardGroup>

## ChromaDB

<CodeGroup>

```python main.py
from embedchain import App

# load chroma configuration from yaml file
app = App.from_config(config_path="config1.yaml")
```

```yaml config1.yaml
vectordb:
  provider: chroma
  config:
    collection_name: 'my-collection'
    dir: db
    allow_reset: true
```
```yaml config2.yaml
vectordb:
  provider: chroma
  config:
    collection_name: 'my-collection'
    host: localhost
    port: 5200
    allow_reset: true
```
</CodeGroup>
## Elasticsearch
Install related dependencies using the following command:
```bash
pip install --upgrade 'embedchain[elasticsearch]'
```

<Note>
You can configure the Elasticsearch connection by providing either `es_url` or `cloud_id`. If you are using the Elasticsearch Service on Elastic Cloud, you can find the `cloud_id` on the [Elastic Cloud dashboard](https://cloud.elastic.co/deployments).
</Note>

You can authorize the connection to Elasticsearch by providing either `basic_auth`, `api_key`, or `bearer_auth`.

<CodeGroup>

```python main.py
from embedchain import App

# load elasticsearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: elasticsearch
  config:
    collection_name: 'es-index'
    cloud_id: 'deployment-name:xxxx'
    basic_auth:
      - elastic
      - <your_password>
    verify_certs: false
```
</CodeGroup>
## OpenSearch
Install related dependencies using the following command:
```bash
pip install --upgrade 'embedchain[opensearch]'
```

<CodeGroup>

```python main.py
from embedchain import App

# load opensearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: opensearch
  config:
    collection_name: 'my-app'
    opensearch_url: 'https://localhost:9200'
    http_auth:
      - admin
      - admin
    vector_dimension: 1536
    use_ssl: false
    verify_certs: false
```
</CodeGroup>
## Zilliz
Install related dependencies using the following command:
```bash
pip install --upgrade 'embedchain[milvus]'
```

Set the Zilliz environment variables `ZILLIZ_CLOUD_URI` and `ZILLIZ_CLOUD_TOKEN`, which you can find on their [cloud platform](https://cloud.zilliz.com/).

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['ZILLIZ_CLOUD_URI'] = 'https://xxx.zillizcloud.com'
os.environ['ZILLIZ_CLOUD_TOKEN'] = 'xxx'

# load zilliz configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: zilliz
  config:
    collection_name: 'zilliz_app'
    uri: https://xxxx.api.gcp-region.zillizcloud.com
    token: xxx
    vector_dim: 1536
    metric_type: L2
```
</CodeGroup>
## LanceDB
_Coming soon_
## Pinecone
Install Pinecone-related dependencies using the following command:
```bash
pip install --upgrade 'embedchain[pinecone]'
```

To use Pinecone as a vector database, set the environment variable `PINECONE_API_KEY`, which you can find on the [Pinecone dashboard](https://app.pinecone.io/).

<CodeGroup>

```python main.py
from embedchain import App

# load pinecone configuration from yaml file
app = App.from_config(config_path="pod_config.yaml")
# or
app = App.from_config(config_path="serverless_config.yaml")
```

```yaml pod_config.yaml
vectordb:
  provider: pinecone
  config:
    metric: cosine
    vector_dimension: 1536
    index_name: my-pinecone-index
    pod_config:
      environment: gcp-starter
      metadata_config:
        indexed:
          - "url"
          - "hash"
```
```yaml serverless_config.yaml
vectordb:
  provider: pinecone
  config:
    metric: cosine
    vector_dimension: 1536
    index_name: my-pinecone-index
    serverless_config:
      cloud: aws
      region: us-west-2
```
</CodeGroup>
<br />
<Note>
You can find more information about Pinecone configuration [here](https://docs.pinecone.io/docs/manage-indexes#create-a-pod-based-index).
You can also optionally provide `index_name` as a config param in yaml file to specify the index name. If not provided, the index name will be `{collection_name}-{vector_dimension}`.
</Note>
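
The naming rule in the note above can be sketched in a few lines of Python. This only illustrates the `{collection_name}-{vector_dimension}` pattern: the helper `default_index_name` is hypothetical and not part of Embedchain's API, and any extra normalization the library may apply is not shown.

```python
from typing import Optional


def default_index_name(collection_name: str, vector_dimension: int,
                       index_name: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the rule described in the note."""
    # An explicitly configured index_name takes precedence.
    if index_name:
        return index_name
    # Otherwise fall back to "{collection_name}-{vector_dimension}".
    return f"{collection_name}-{vector_dimension}"


print(default_index_name("my-collection", 1536))
print(default_index_name("my-collection", 1536, index_name="my-pinecone-index"))
```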

## Qdrant

In order to use Qdrant as a vector database, set the environment variables `QDRANT_URL` and `QDRANT_API_KEY` which you can find on [Qdrant Dashboard](https://cloud.qdrant.io/).

<CodeGroup>
```python main.py
from embedchain import App
# load qdrant configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: qdrant
  config:
    collection_name: my_qdrant_index
```
</CodeGroup>
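
Because the Qdrant setup depends on `QDRANT_URL` and `QDRANT_API_KEY` being present, a small pre-flight check can make a missing variable easier to spot. The sketch below uses only the standard library; `missing_env_vars` is a hypothetical helper, not an Embedchain function.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names from `names` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Hypothetical check to run before App.from_config(...).
required = ["QDRANT_URL", "QDRANT_API_KEY"]
missing = missing_env_vars(required)
if missing:
    print("Set these environment variables first: " + ", ".join(missing))
```

The same check could be reused for the other providers by swapping in their variable names.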

## Weaviate

In order to use Weaviate as a vector database, set the environment variables `WEAVIATE_ENDPOINT` and `WEAVIATE_API_KEY` which you can find on [Weaviate dashboard](https://console.weaviate.cloud/dashboard).

<CodeGroup>
```python main.py
from embedchain import App
# load weaviate configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: weaviate
  config:
    collection_name: my_weaviate_index
```
</CodeGroup>

<Snippet file="missing-vector-db-tip.mdx" />
35 changes: 35 additions & 0 deletions docs/components/vector-databases/chromadb.mdx
@@ -0,0 +1,35 @@
---
title: ChromaDB
---

<CodeGroup>

```python main.py
from embedchain import App

# load chroma configuration from yaml file
app = App.from_config(config_path="config1.yaml")
```

```yaml config1.yaml
vectordb:
  provider: chroma
  config:
    collection_name: 'my-collection'
    dir: db
    allow_reset: true
```
```yaml config2.yaml
vectordb:
  provider: chroma
  config:
    collection_name: 'my-collection'
    host: localhost
    port: 5200
    allow_reset: true
```
</CodeGroup>
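
To make the nesting explicit, the two YAML files can be written as Python dictionaries, assuming `provider` and `config` sit under `vectordb` and the settings sit under `config`. This is a sketch: the variable names are illustrative, and passing a dictionary via `App.from_config(config=...)` is an assumption about the installed Embedchain version, so that call is left commented out.

```python
# Mirrors config1.yaml (the variant that uses `dir`).
local_chroma = {
    "vectordb": {
        "provider": "chroma",
        "config": {
            "collection_name": "my-collection",
            "dir": "db",
            "allow_reset": True,
        },
    }
}

# Mirrors config2.yaml (the variant that uses `host` and `port`).
remote_chroma = {
    "vectordb": {
        "provider": "chroma",
        "config": {
            "collection_name": "my-collection",
            "host": "localhost",
            "port": 5200,
            "allow_reset": True,
        },
    }
}

# Assumed usage, not verified against a specific Embedchain release:
# from embedchain import App
# app = App.from_config(config=local_chroma)
```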
<Snippet file="missing-vector-db-tip.mdx" />
39 changes: 39 additions & 0 deletions docs/components/vector-databases/elasticsearch.mdx
@@ -0,0 +1,39 @@
---
title: Elasticsearch
---

Install related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[elasticsearch]'
```

<Note>
You can configure the Elasticsearch connection by providing either `es_url` or `cloud_id`. If you are using the Elasticsearch Service on Elastic Cloud, you can find the `cloud_id` on the [Elastic Cloud dashboard](https://cloud.elastic.co/deployments).
</Note>

You can authorize the connection to Elasticsearch by providing either `basic_auth`, `api_key`, or `bearer_auth`.

<CodeGroup>

```python main.py
from embedchain import App

# load elasticsearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: elasticsearch
  config:
    collection_name: 'es-index'
    cloud_id: 'deployment-name:xxxx'
    basic_auth:
      - elastic
      - <your_password>
    verify_certs: false
```
</CodeGroup>
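
The connection and authorization options described above can be checked with a short validation sketch. `check_es_config` is a hypothetical helper, not part of Embedchain, and it assumes your cluster requires authorization; whether Embedchain itself enforces these rules is not stated here.

```python
def check_es_config(config):
    """Return a list of problems found in an Elasticsearch `config` mapping."""
    problems = []
    # The connection needs either es_url or cloud_id.
    if not (config.get("es_url") or config.get("cloud_id")):
        problems.append("provide either es_url or cloud_id")
    # Authorization uses one of these three keys.
    auth_keys = ("basic_auth", "api_key", "bearer_auth")
    if not any(config.get(key) for key in auth_keys):
        problems.append("provide one of basic_auth, api_key, or bearer_auth")
    return problems


# Same values as the `config` section of config.yaml.
example = {
    "collection_name": "es-index",
    "cloud_id": "deployment-name:xxxx",
    "basic_auth": ["elastic", "<your_password>"],
    "verify_certs": False,
}
print(check_es_config(example))  # prints []
```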
<Snippet file="missing-vector-db-tip.mdx" />