Docs/issue 1661 add tip to source docs and update weaviate docs #1662

Merged
merged 9 commits into from
Aug 23, 2024
17 changes: 16 additions & 1 deletion docs/website/docs/dlt-ecosystem/destinations/lancedb.md
@@ -144,7 +144,22 @@ lancedb_adapter(
)
```

Bear in mind that you can't use an adapter on a [dlt source](../../general-usage/source.md), only a [dlt resource](../../general-usage/resource.md).
When using the `lancedb_adapter`, it's important to apply it directly to resources, not to the whole source. Here's an example:

```py
products_tables = sql_database().with_resources("products", "customers")

pipeline = dlt.pipeline(
    pipeline_name="postgres_to_lancedb_pipeline",
    destination="lancedb",
)

# apply adapter to the needed resources
lancedb_adapter(products_tables.products, embed="description")
lancedb_adapter(products_tables.customers, embed="bio")

info = pipeline.run(products_tables)
```

## Write disposition

19 changes: 17 additions & 2 deletions docs/website/docs/dlt-ecosystem/destinations/qdrant.md
@@ -106,10 +106,25 @@ qdrant_adapter(
)
```

:::tip
When using the `qdrant_adapter`, it's important to apply it directly to resources, not to the whole source. Here's an example:

A more comprehensive pipeline would load data from some API or use one of dlt's [verified sources](../verified-sources/).
```py
products_tables = sql_database().with_resources("products", "customers")

pipeline = dlt.pipeline(
    pipeline_name="postgres_to_qdrant_pipeline",
    destination="qdrant",
)

# apply adapter to the needed resources
qdrant_adapter(products_tables.products, embed="description")
qdrant_adapter(products_tables.customers, embed="bio")

info = pipeline.run(products_tables)
```

:::tip
A more comprehensive pipeline would load data from some API or use one of dlt's [verified sources](../verified-sources/).
:::

## Write disposition
16 changes: 16 additions & 0 deletions docs/website/docs/dlt-ecosystem/destinations/weaviate.md
@@ -116,6 +116,22 @@ weaviate_adapter(
tokenization={"title": "word", "description": "whitespace"},
)
```
When using the `weaviate_adapter`, it's important to apply it directly to resources, not to the whole source. Here's an example:

```py
products_tables = sql_database().with_resources("products", "customers")

pipeline = dlt.pipeline(
    pipeline_name="postgres_to_weaviate_pipeline",
    destination="weaviate",
)

# apply adapter to the needed resources
weaviate_adapter(products_tables.products, vectorize="description")
weaviate_adapter(products_tables.customers, vectorize="bio")

info = pipeline.run(products_tables)
```
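
The same per-resource pattern recurs in the LanceDB, Qdrant, and Weaviate examples in this PR. As a rough sketch only, it could be wrapped in a small helper. The helper name `apply_adapter_to_resources`, its `column_arg` parameter, and the stand-in objects below are hypothetical and not part of dlt; the sketch assumes that `source.resources` supports lookup by resource name and that an adapter applies its hints to the resource in place, as the examples above rely on.

```python
from types import SimpleNamespace
from typing import Any, Callable, Mapping


def apply_adapter_to_resources(
    source: Any,
    adapter: Callable[..., Any],
    columns_by_resource: Mapping[str, Any],
    column_arg: str = "embed",
) -> None:
    """Call `adapter` once per named resource, never on the source itself."""
    for name, columns in columns_by_resource.items():
        # Look up the individual resource; a missing name raises KeyError.
        resource = source.resources[name]
        # Pass the column(s) under the adapter's keyword, e.g. embed= or vectorize=.
        adapter(resource, **{column_arg: columns})


# Stand-ins so the sketch runs without dlt installed (these are NOT dlt objects).
def fake_adapter(resource: Any, vectorize: Any = None) -> Any:
    resource.hints["vectorize"] = vectorize
    return resource


fake_source = SimpleNamespace(
    resources={
        "products": SimpleNamespace(hints={}),
        "customers": SimpleNamespace(hints={}),
    }
)

apply_adapter_to_resources(
    fake_source,
    fake_adapter,
    {"products": "description", "customers": "bio"},
    column_arg="vectorize",
)
```

With real dlt objects, the call would presumably take `products_tables` and `weaviate_adapter` in place of the stand-ins, with `column_arg="embed"` for the LanceDB and Qdrant adapters.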

:::tip

20 changes: 20 additions & 0 deletions docs/website/docs/general-usage/source.md
@@ -187,6 +187,26 @@ Several data sources are prone to contain semi-structured documents with very de
MongoDB databases. Our practical experience is that setting the `max_nesting_level` to 2 or 3
produces the clearest and most human-readable schemas.

:::tip
The `max_table_nesting` parameter at the source level doesn't automatically apply to individual
resources when accessed directly (e.g., using `source.resources["resource_1"]`). To make sure it
works, either use `source.with_resources("resource_1")` or set the parameter directly on the resource.

Contributor: Please add an example of how the parameter could be set directly on the resource.

Collaborator (author): Done.

:::


You can directly configure the `max_table_nesting` parameter on the resource level as:

```py
@dlt.resource(max_table_nesting=0)
def my_resource():
    ...
```
or
```py
my_source = source()
my_source.my_resource.max_table_nesting = 0
```

Contributor (suggested change):

or
```py
my_source = my_source()
my_source.my_resource.max_table_nesting = 0
```

Collaborator (author): Done.

### Modify schema

The schema is available via `schema` property of the source.