1.0.0 announcement and release notes #1778

Closed
rudolfix opened this issue Sep 2, 2024 · 0 comments

rudolfix commented Sep 2, 2024

Why 1.0?

We are releasing version 1.0.0 of dlt. Over the last 2 years dlt has become quite stable (in terms of our API, internal migrations, and major bugs being rare) and feature complete. There are so many production deployments that, even with our obsessive approach to testing (you can always write more test cases!), we are pretty confident dlt is now "stable" and ready for production.

What is coming in the full release

  1. We are moving the sql database, filesystem/buckets and rest api sources to the core library to make them easily available, stabilize their APIs and run tons of additional tests.
  2. Our documentation gets a big update: additional tutorials on syncing databases, working with buckets and file readers, and using the rest api toolkit to declare pipelines that load data from REST APIs.

On top of that, we plan a few quick follow-up features:

  1. Define hints for nested tables/resources; currently only the root table can be conveniently hinted (Simplify schema modification of child tables #1647).
  2. Define cross-table references (allow to model table and column references in dlt schema #1713).
  3. The SQLAlchemy destination is coming with SQLite and MySQL fully tested (and optimized). You'll be able to bring your own settings to fine-tune other dialects (Sqlalchemy destination #1734 and implement sql alchemy destination #21).
  4. We will finally stabilize dlt traces and expose a core source and a data contract (schema), so loading dlt metadata is easy and predictable.

Deprecations and Breaking Changes

  1. Load packages with terminally failed jobs will be automatically aborted with an exception. Until now, users had to detect this in code (the previous behavior will still be available). See: abort load package and raise exception on terminal errors in jobs #1749
  2. To use the iceberg table format on the Athena destination, set table_format to iceberg on all your resources instead of setting the force_iceberg flag in the destination configuration. The flag is deprecated but will still be observed for backward compatibility.
  3. The complex data type is deprecated and superseded by json (Rename "complex" data type to "json" #1673).

Internal or obscure changes:

  1. A few column hints (foreign_key and index) that were not documented and have no real use will be removed.
  2. If a primary key was used in a nested table, linking was not created in relational.py. Now linking is skipped when a nested row is fitted into a table that is not nested (does not have a parent). This is a rare case of someone who does not want dlt linking.
  3. generate_dlt_id is removed from the json relational normalizer config.
  4. skip_complex_types in the dlt Pydantic config is deprecated; use skip_nested_types instead.
  5. When extracting a list of standalone resources, they will be grouped into the smallest possible number of sources. Previously, each resource, including transformers, was extracted in its own source (creates a single source in extract for all resource instances passed as list #1535).
  6. Secrets (TSecretValue and configs deriving from Credentials) won't be saved to trace dumps (dlt pipeline -v <pipeline> trace source password not redacted #1687).

dlt schema engine migration
If you run this version against an existing dataset in a destination, the schema in _dlt_version will be migrated to engine v10. The same applies to the local pipeline working dir. You can restore the old schema by deleting the migrated version from the version table.

New Versioning Scheme

We'll follow the classic major.minor.patch scheme, where:

  • major means breaking changes and removed deprecations
  • minor means new features and, sometimes, automatic migrations
  • patch means bug fixes
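
To illustrate how this scheme can be read when planning an upgrade, here is a small sketch that classifies the difference between two plain major.minor.patch versions. The helper names are hypothetical and not part of dlt, and pre-release tags such as 0.9.9a1 are deliberately not handled:

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split a plain 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def classify_upgrade(current: str, target: str) -> str:
    """Return which component differs first between two versions."""
    cur, new = parse_version(current), parse_version(target)
    if new[0] != cur[0]:
        return "major"  # breaking changes and removed deprecations
    if new[1] != cur[1]:
        return "minor"  # new features, sometimes automatic migrations
    if new[2] != cur[2]:
        return "patch"  # bug fixes
    return "none"
```

For example, `classify_upgrade("1.0.0", "1.1.0")` returns `"minor"`, which under this scheme signals new features and possibly an automatic migration.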

Version rollout plan

  1. 0.5.x will still be supported: docs will remain available and major bugs will be fixed.
  2. We plan an alpha release, with sources merged into the core and docs updates, early next week.
  3. We plan the 1.0.0 release for the second or third week of September.
  4. Each following week we'll release one of the follow-up features.
  5. Track our progress here: https://github.com/orgs/dlt-hub/projects/9/views/3

0.9.9a1 pre-release available

This pre-release brings the sql database, filesystem and rest_api sources to the core and introduces 95% of the breaking changes and deprecations. New documentation is not yet available.

⚠️ Do not deploy in production ⚠️ It will migrate existing schemas, so try it on fresh datasets.

Try

from dlt.sources.sql_database import sql_table

or

dlt init sql_database duckdb

to start a new project

breaking changes and warnings

Deprecations and Breaking Changes

  1. Load packages with terminally failed jobs will be automatically aborted with an exception. Until now, users had to detect this in code (the previous behavior will still be available). See: abort load package and raise exception on terminal errors in jobs #1749
  2. To use the iceberg table format on the Athena destination, set table_format to iceberg on all your resources instead of setting the force_iceberg flag in the destination configuration. The flag is deprecated but will still be observed for backward compatibility.
  3. Schemas will be migrated to engine v10. This is irreversible.

Internal or obscure features:

  1. A few column hints (foreign_key and index) that were not documented and have no real use will be removed.
  2. If a primary key was used in a nested table, linking was not created in relational.py. Now linking is skipped when a nested row is fitted into a table that is not nested (does not have a parent). This is a rare case of someone who does not want dlt linking.
  3. generate_dlt_id is removed from the json relational normalizer config.
  4. skip_complex_types in the dlt Pydantic config is deprecated; use skip_nested_types instead.
  5. If a list of resources is passed to the run method, they will be evaluated in a single ad-hoc source. Previously, each resource was evaluated separately (serialized). See: creates a single source in extract for all resource instances passed as list #1535

Other features

rudolfix changed the title from "1.0,0 announcement and release notes" to "1.0.0 announcement and release notes" on Sep 2, 2024
rudolfix added the sprint label (Marks group of tasks with core team focus at this moment) on Sep 3, 2024
rudolfix moved this from Todo to In Progress in dlt core library on Sep 3, 2024
rudolfix self-assigned this on Sep 9, 2024