
Releases: man-group/ArcticDB

v4.3.0

07 Feb 13:47

Version 4.3.0 was pulled from PyPI and conda-forge due to a regression, and builds for 4.3.0 are no longer provided.
The regression is fixed in the 4.3.1 release; please use 4.3.1 instead.

🚀 Features

  • Exposes existing regex filter in lib.list_symbols (#1123)
>>> from arcticdb import Arctic
>>> import pandas as pd
>>> ac = Arctic("lmdb://test")
>>> ac.create_library("test")
>>> lib = ac["test"]
>>> lib.write("sym0", pd.DataFrame())
>>> lib.write("sym1", pd.DataFrame())
>>> lib.list_symbols()
['sym0', 'sym1']
>>> lib.list_symbols(regex="1$")
['sym1']
  • Introduce jitter in symbol list compaction threshold (#1174)
  • Sorting speed improvements in SegmentInMemory (#1181)
  • Reduce log level from warn to debug for "Failed to find segment for key" message where appropriate (#1130)
  • Speed up writes by parallelising aggregator_set_data over data segments (#1065)
  • Support sortedness checks and maintenance with parallel writes and appends (#1251)
  • #1014 Introduce storage fixtures to easily test ArcticDB against various storage backends. See arcticdb.storage_fixtures package. (#1054)
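
The `regex` filter on `lib.list_symbols` listed above can be pictured as a search over symbol names. The following is a minimal sketch in plain Python, assuming search-style matching (which is consistent with `"1$"` selecting `sym1` in the example above). The helper name `filter_symbols` is hypothetical, it does not call ArcticDB, and ArcticDB's own regex engine may differ from Python's `re` in dialect details.

```python
import re


def filter_symbols(symbols, regex=None):
    """Return the symbols whose names match `regex` anywhere in the name.

    Hypothetical stand-in for illustration only; it does not call ArcticDB.
    """
    if regex is None:
        # No filter given: return every symbol unchanged.
        return list(symbols)
    pattern = re.compile(regex)
    # re.search matches anywhere in the string, so anchors such as "$" matter.
    return [s for s in symbols if pattern.search(s)]


symbols = ["sym0", "sym1"]
print(filter_symbols(symbols))        # every symbol
print(filter_symbols(symbols, "1$"))  # only names ending in "1"
```

Anchoring the pattern (for example with `^` or `$`) is what narrows the match to a prefix or suffix; an unanchored pattern would match anywhere in the name under this assumption.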

🐛 Fixes

  • Release the symbol list's storage lock if it has existed for longer than its TTL (#1134)
  • Ensure that the version chain is always updated atomically (#1104)
  • Return empty pd.DataFrame with MultiIndex if originally provided (#1126)
  • conda-build: Explicitly depend on openssl and libcurl (#1244)
  • Reintroduce attrs as a runtime dependency (#1272)
  • Speedup reading wide dataframes that have no empty columns (#1225)
  • Bugfix 1046: Prevent appending/updating numeric columns with non-identical types with static schema (#1205)
  • Bugfix 1173: Correctly apply sortedness checks when calling update with date_range argument (#1238)
  • Fix non-deterministic hashing in Linux conda builds (#1261)
  • Improve date range returned by get info for unordered and range indexed dataframes (#1241)
  • Bugfix 1248 and 1249: compact_incomplete reject incomplete segments that overlap each other, or existing segments in the case of append (#1255)
  • Detailed error in case of S3's libcurl network failure (#1265)

Uncategorized

  • [Aggregation tests] Replace non_zero_numeric_type_strategies with numeric_type_strategies (#968)
  • Fixes reuse_name for azure storage #1061 (#1115)
  • small getting-started-docs tweaks (#1103)
  • Improve fixture reliability (#1116)
  • maint: Define arcticdb::proto::logger in log.hpp (#1117)
  • maint: Remove unneeded includes (#1113)
  • [Column] Move some definitions to cpp file (#1100)
  • maint: Move implementations in memory_segment_impl.hpp to memory_segment_impl.cpp (#1092)
  • Update git blame file (#1118)
  • Flaky test hypothesis mean agg (#496) (#1125)
  • Use same region for S3 and EC2 to avoid data transfer costs (#1128)
  • build: Remove attrs from the dependencies (#1135)
  • Only build on pull request events (#1127)
  • More fixture robustness improvements (#1132)
  • Remove releasing docs as they are now in GitHub wiki (#1136)
  • Update README.md (#1141)
  • Remove test parallelism, and speed up test bottleneck (#1143)
  • Fix support for shared/unique S3 prefixes (#1140)
  • maint: Remove headers in types.hpp (#1121)
  • Skip flaky pytests which check log messages (#1161)
  • Update README.md (#1156)
  • Refactor: Move DataError method implementations into cpp (#1155)
  • Update .git-blame-ignore-revs for DataError implementation move (#1165)
  • Add MSVC 2022 preset. Tweak MSVC CMake settings. (#1133)
  • Build-time improvements: allocator.hpp, log.hpp, buffer.hpp (#1152)
  • Fix publish.yml workflow (#1167)
  • README - put third party tools in alphabetical order (#1172)
  • Fix persistent tests (#1147)
  • Introduce sorting and merging google benchmarks (#1138)
  • Skip array type tests due to occasional segfaults (#1187)
  • build: Remove some adherence to folly (#1144)
  • Add equity options notebook + data (#1178)
  • maint: Ignore some references (#1190)
  • Added equity options notebook to index (#1193)
  • Use vcpkg for gbench (#1189)
  • Forward port internal PR #1082 (#1180)
  • Bugfix 1191: Propagate storage failures in version map batch methods to calling code (#1194)
  • Link against python explicitly in order to make MSVS builds work (#1192)
  • Final version of equity opts notebook (#1196)
  • Bugfix 1182: Unskip test that is no longer flaky (#1197)
  • Docs that StorageFailureSimulator is not used in all stores (#1203)
  • Clean and reorganize OffsetString and StringPool (#1137)
  • build: Do not depend on protobuf-lite (#1212)
  • docs: Fix documentation links (#1038)
  • Fix recurse_segment forward declaration to match the signature of its implementation (#1217)
  • Update git blame file for OffsetString and StringPool implementation move (#1211)
  • Add frequently used items at the top level of arcticdb (#1219)
  • Switch from arcticdb to adb in the demo Notebooks (#1228)
  • Add a way to handle non-string values for index names (#1170)
  • Pass unmodified argument by const& to FieldCollection::add_field (#1234)
  • Switch from arcticdb to adb in python docstrings (#1236)
  • Remove obsolete test log level environment variable (#1231)
  • Update incorrect docs for validate_index (#1233)
  • Bugfix 1207: Use pandas.Timestamp.max - 1 day in test_read_ts. Remove pointless snapshot. Improve error message when index key reading fails. (#1235)
  • Bugfix invalid library name (#1206)
  • Enhancement/1253/skip temporary allocation when decoding dynamic schema columns (#1259)
  • Expose headers for consumers via arcticdb_core_static (#1257)
  • Update WarnVersionTypeNotHandled::warn() warning message (#1273)
  • Update README correcting spelling (#1275)
  • build: Adapt protobuf compilation (#1199)
  • Enable skipped test_partial_write_hashed (#1215)

The wheels are on PyPI. Below are for debugging:

v4.2.1

15 Dec 14:23

This is a patch release to version 4.2.0 which fixes Issue #1157 regarding the defragment_symbol_data method.

🐛 Fixes

  • Defragmenting a symbol no longer invalidates previous versions (#1163)


v4.2.0

12 Dec 14:08

🚀 Features

  • Remove python deps that are no longer needed (#1005)
  • New row_range argument on read and ReadRequest (#864)
>>> from arcticdb import Arctic
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"col1": np.arange(10), "col2": np.arange(100, 110)}, index=np.arange(10))
>>> ac = Arctic("lmdb://test")
>>> lib = ac.get_library("test_lib", create_if_missing=True)
>>> lib.write("test_symbol", df)
>>> lib.read("test_symbol", row_range=(3,7)).data
   col1  col2
3     3   103
4     4   104
5     5   105
6     6   106
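
The output above shows that `row_range=(3, 7)` returns rows 3 to 6, so the range behaves as a half-open interval: the start row is included and the end row is not. The following is a minimal sketch of that selection rule using plain Python lists. The helper `select_row_range` is hypothetical and only mirrors the behaviour seen in the example; it is not ArcticDB's implementation.

```python
def select_row_range(rows, row_range):
    """Return rows[start:end], i.e. start inclusive and end exclusive."""
    start, end = row_range
    return rows[start:end]


# Same data as the example above, as (col1, col2) tuples.
rows = [(i, 100 + i) for i in range(10)]

for col1, col2 in select_row_range(rows, (3, 7)):
    print(col1, col2)
```

Under this reading, the number of rows returned is `end - start`, which makes it straightforward to page through a symbol in fixed-size chunks.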

🐛 Fixes

  • Symbol list refactor (#796)
    Depending on timestamps being accurate in the symbol list has proved to be troublesome. Instead, we should use the most recent version id known to a client as an indication of the client's view of the world at the time a symbol list entry is written. That way, we can identify and correct symbol list entries that refer to conflicting writes.
  • Fixed aggregation on sparse grouping columns (#1068).

Notebooks

  • Added AWS blockchain notebook (#1040)
  • Added AWS blockchain to docs index (#1043)
  • Add Snapshot + Equity Notebooks (#1071)

Uncategorized

  • 744 extend real storage tests to run with large lifelike data and all api methods (#989)
  • Update BSL table for 4.1 (#1023)
  • Centralise the pytest marks (#1024)
  • Document the S3 backends that we have tested against and "un-beta" LMDB on Windows (#1016)
  • Sparse aggregation (#1007)
  • Docs versioning (#1008)
  • set-default after deploy so that 'latest' alias can be created first (#1029)
  • build: Remove old Cython configuration and adaptation (#1028)
  • Docs workflow fixes. (#1030)
  • build: Replace emilib with robin_hood (#995)
  • hot-fix: Use previous build of libmongocxx to avoid missing symbols (#1050)
  • Remove unused C++ Wangle dep (#1047)
  • Bugfix 1055: Unflake test_read_batch_time_stamp (#1058)
  • Change tag format in docs build (#1062)
  • Snapshot notebook typos (#1088)
  • Update vcpkg dep (#1091)
  • Enhancement/732/processing unit ecs model (#960)
  • Add xfail to flaky tests (#1087)
  • Add a mechanism to extend storage transaction lifetime to lifetime of… (#975)
  • Tweak release docs (#1019)
  • Change tmpdir to tmp_path (#1093)
  • Remove unnecessary xfails (#1097)
  • Add checks to see whether we should be validating version entries during compaction (#1099)
  • 941 self hosted runners for ci (#997)
  • Make dependency of pymongo optional in running (#1027)
  • Add preliminary change for slowdown error test (#1064)
  • Add mutex to ensure only single thread at pybind->c++ layer (#973)
  • Issue #1017 Only warn if the "base" LMDB env is opened twice (#1022)
  • Fix run-cmake action (#1034)
  • Add a fallback to free GH runners, when there is a problem with the self-hosted ones (#1063)
  • fix: Empty column handling improvements (#1049)
  • Pin all our Github actions deps (#1090)


v4.2.0rc0

13 Dec 18:01
Pre-release

🚀 Features

  • Remove useless python deps (#1005)
  • feat: Allow row_range to be treated as a clause (#864)
>>> from arcticdb import Arctic
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"col1": np.arange(10), "col2": np.arange(100, 110)}, index=np.arange(10))
>>> ac = Arctic("lmdb://test")
>>> lib = ac.get_library("test_lib", create_if_missing=True)
>>> lib.write("test_symbol", df)
>>> lib.read("test_symbol", row_range=(3,7)).data
   col1  col2
3     3   103
4     4   104
5     5   105
6     6   106

🐛 Fixes

  • Symbol list refactor (#796)
    Depending on timestamps being accurate in the symbol list has proved to be troublesome. Instead, we should use the most recent version id known to a client as an indication of the client's view of the world at the time a symbol list entry is written. That way, we can identify and correct symbol list entries that refer to conflicting writes.
  • Fixed aggregation on sparse grouping columns (#1068).

Notebooks

  • Added AWS blockchain notebook (#1040)
  • Added AWS blockchain to docs index (#1043)
  • Add Snapshot + Equity Notebooks (#1071)

Uncategorized

  • 744 extend real storage tests to run with large lifelike data and all api methods (#989)
  • Update BSL table for 4.1 (#1023)
  • Centralise the pytest marks (#1024)
  • Document the S3 backends that we have tested against and "un-beta" LMDB on Windows (#1016)
  • Sparse aggregation (#1007)
  • Docs versioning (#1008)
  • set-default after deploy so that 'latest' alias can be created first (#1029)
  • build: Remove old Cython configuration and adaptation (#1028)
  • Docs workflow fixes. (#1030)
  • build: Replace emilib with robin_hood (#995)
  • hot-fix: Use previous build of libmongocxx to avoid missing symbols (#1050)
  • Remove unused C++ Wangle dep (#1047)
  • Bugfix 1055: Unflake test_read_batch_time_stamp (#1058)
  • Change tag format in docs build (#1062)
  • Snapshot notebook typos (#1088)
  • Update vcpkg dep (#1091)
  • Enhancement/732/processing unit ecs model (#960)
  • Add xfail to flaky tests (#1087)
  • Add a mechanism to extend storage transaction lifetime to lifetime of… (#975)
  • Tweak release docs (#1019)
  • Change tmpdir to tmp_path (#1093)
  • Remove unnecessary xfails (#1097)
  • Add checks to see whether we should be validating version entries during compaction (#1099)
  • 941 self hosted runners for ci (#997)
  • Make dependency of pymongo optional in running (#1027)
  • Add preliminary change for slowdown error test (#1064)
  • Add mutex to ensure only single thread at pybind->c++ layer (#973)
  • Issue #1017 Only warn if the "base" LMDB env is opened twice (#1022)
  • Fix run-cmake action (#1034)
  • Add a fallback to free GH runners, when there is a problem with the self-hosted ones (#1063)
  • fix: Empty column handling improvements (#1049)
  • Pin all our Github actions deps (#1090)


v4.0.3

23 Nov 16:17

This is a patch release to version 4.0 that backports some changes from master.

🚀 Features

  • Add preliminary change for slowdown error test (#1070)

🐛 Fixes

  • Empty column handling improvements (#1079)
  • Use previous build of libmongocxx to avoid missing symbols (#1083)
  • Remove docs publish step so we don't overwrite the docs (#1025)
  • Remove Black and pre-commit setup (#1085)


v4.1.0

01 Nov 17:54

⭐ New APIs

In-memory Backend

You can now open ArcticDB with an in-memory backend:

from arcticdb import Arctic
ac = Arctic("mem://")
ac.create_library("test")
assert ac.list_libraries() == ["test"]
# Create libraries as normal. Each `Arctic` object manages its own in-memory storage, so the lifetime
# of your libraries and data is the same as the lifetime of the `Arctic` instance that owns them.

ac2 = Arctic("mem://")
assert ac2.list_libraries() == []  # ac2 is backed by different memory to ac so the "test" library is not returned

Query Builder

We now support a new "count" aggregator. You can invoke it with:

from arcticdb import QueryBuilder

q = QueryBuilder()
q = q.groupby("grouping_column").agg({"a": "count"})
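
To make the result of a grouped "count" concrete, here is a minimal pure-Python sketch of per-group counting. It assumes every row has a value in column `a`; how ArcticDB treats missing values is not covered here. The helper `count_by_group` and the sample rows are hypothetical and do not use ArcticDB.

```python
from collections import Counter


def count_by_group(rows, group_key):
    """Count how many rows fall into each distinct value of `group_key`."""
    counts = Counter(row[group_key] for row in rows)
    return dict(counts)


# Hypothetical sample data: three rows in two groups.
rows = [
    {"grouping_column": "x", "a": 1},
    {"grouping_column": "y", "a": 2},
    {"grouping_column": "x", "a": 3},
]

print(count_by_group(rows, "grouping_column"))
```

The result has one entry per distinct value of the grouping column, which is the same shape a grouped aggregation produces.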

⚠️ Breaking and API Changes

LMDB Backend

This release includes a fix for issue #850: LMDB libraries are now readable after being moved to a different location. As a consequence, LMDB libraries created with arcticdb>=4.1.0 are not readable by older clients, and those clients must be updated.

This is because the fix stops us from serializing the LMDB library path (instead we always prefer the one in the Arctic URI), but older clients still expect to see the LMDB path serialized. Older clients reading a new LMDB library will in fact ignore the path passed in to the Arctic constructor and instead read the current working directory.

When you exceed the LMDB map size, we now raise a custom exception arcticdb.exceptions.LmdbMapFullError that explains how to re-open LMDB with a larger map size, whereas previously we raised a less helpful arcticdb.exceptions.InternalException.
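
The following sketch shows one way to react to a full LMDB map, under stated assumptions. The exception name `arcticdb.exceptions.LmdbMapFullError` comes from the note above. The `map_size` query parameter and the `"5GB"` value format are assumptions based on the 4.0.0 note about specifying the LMDB map size in the Arctic URI. The helpers `lmdb_uri` and `write_or_report` and the path `/data/arctic` are hypothetical.

```python
from typing import Optional


def lmdb_uri(path: str, map_size: Optional[str] = None) -> str:
    """Build an LMDB Arctic URI, optionally carrying a map size such as "5GB"."""
    uri = f"lmdb://{path}"
    if map_size is not None:
        # Assumed query-parameter name; check the docs for your version.
        uri += f"?map_size={map_size}"
    return uri


def write_or_report(lib, symbol, data):
    """Write `data`; if the LMDB map is full, report it and re-raise."""
    # Imported lazily so the URI helper above works without ArcticDB installed.
    from arcticdb.exceptions import LmdbMapFullError

    try:
        return lib.write(symbol, data)
    except LmdbMapFullError as err:
        print(f"LMDB map is full while writing {symbol!r}: {err}")
        raise


print(lmdb_uri("/data/arctic", map_size="5GB"))
```

After such an error, the usual next step would be to re-open the storage with a larger map size, for example by passing `lmdb_uri(path, map_size=...)` to `Arctic(...)`.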

🚀 Features

  • Support count aggregator with groupby (#948)
  • Warning for LMDB when two Arctic instances open over the same storage (#1000)
  • Small LMDB Fixes: 2GiB map size for Windows, Validation before delete (#918)
  • Custom exception when LMDB map is full (#1006)
  • Memory backed API (#860)
  • Allow ampersand in symbol names (#900)
  • Add querybuilder notebook demo into the docs (#875)
  • Extended testing against real cloud storages (#789)
  • ASV benchmarking results are now published (#962 #970)
  • Preparatory work for RocksDB backend (#945)

🐛 Fixes

  • Fix LZ4 decoding error issues that occurred with a mix of empty and non-empty columns (#964)
  • Performance improvement for read_batch when called with many symbols (4-5x improvement) (#870)
  • Convert semimap to switch which has resolved some segmentation fault issues (#912)
  • Upgrade cURL to 8.4.0 (#977)
  • Cache open libraries in the LibraryManager (#990)
  • Enhancement 914: Improve error messaging when string column encoding fails due to the presence of a non-string object (#933)
  • Fix storage lock mutex implementation (#966)
  • Make azure sdk stick to winhttp if possible (#851)
  • Extra update checks (#539)
  • maint: Indicate the non-support of PyArrow (#882)
  • Add pymongo to the list of install dependencies (#891)
  • conda-build: Run python tests for macos-latest (#873)
  • fix: Change comparison in test_hypothesis_{sum,mean}_agg (#931)

Uncategorized

  • Fixes IFNDR issue due to mismatching inline/non-inline functions depending on the translation unit (fixes #943) (#949)
  • docs: Post 4.0.0 release documentation (#967)
  • Remove Black and pre-commit setup (#972)
  • Improve string writing performance (#969)
  • Skip docs in build.yml (#984)
  • Move api docs from sphinx to mkdocs (#897)
  • Document support for Mac on intel (#982)
  • Update PAT for publish to master (#988)
  • Fix ccache's non-existence exiting the workflow (#996)
  • Refactor get_descriptions lib methods to be more consistent (#994)
  • Fail docs build on Sphinx failure (#883)
  • docs: Better document publishing release candidates on conda-forge (#901)
  • conda-build: Unpin some dependencies (#888)
  • docs: Reword conda-forge section mentioning libevent-2.1.10 (#905)
  • Update readme to reflect supported/beta status of Windows PyPi/MacOS conda-forge builds (#907)
  • Skip flaky Mac test (#924)
  • docs: Improve section "Building using mamba and conda-forge" (#917)
  • fix interface in ManualClockVersionStore getter (#925)
  • docs: Add high-level documentation of abstractions (#628)
  • Update storage compatibility (#916)
  • Update copyright notice (#939)
  • Use new get_library argument in ArcticDB_demo_lmdb.ipynb (#930)


v4.0.2

02 Nov 12:51

This is a patch release to version 4.0 that backports some changes from master.

🐛 Fixes

  • Bugfix for a deadlock issue when using Python multithreading and batch_read (#1021)


v4.0.1

02 Nov 16:16

This is a patch release to version 4.0 that backports some fixes from master.

🚀 Features

  • Allow ampersand in symbol names (#952)

🐛 Fixes

  • Backport minor fixes to 4.0.x (#954), not user facing


v1.6.2

04 Oct 10:51

This release is a patch release, backporting bug-fixes to v1.6.1

🐛 Fixes

  • Fixed a bug in key data retrieval, which could lead to incorrect behavior and segmentation faults (#912)


v4.0.0

27 Sep 07:37

⚠️ API changes

For Library.get_description_batch, Library.read_metadata_batch and Library.write_batch, a DataError object is now returned at the position in the result list that corresponds to any symbol/version pair that could not be read or written. Existing code may need changes to handle this new error-reporting behaviour, so this is considered a breaking change.

  • get description batch method: method rationalisation (#814)
  • read metadata batch method: method rationalisation (#814)
  • Write batch method: method rationalisation (#814)
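
Because errors now arrive as items in the result list rather than as a raised exception, callers need to inspect each position. The following is a minimal sketch of that pattern. The helper `split_batch_results` and the class `StandInDataError` are hypothetical; with ArcticDB you would pass the real `DataError` class from your installed version as `error_type`.

```python
def split_batch_results(symbols, results, error_type):
    """Pair each symbol with its result and separate successes from errors."""
    ok, failed = {}, {}
    for symbol, result in zip(symbols, results):
        if isinstance(result, error_type):
            failed[symbol] = result
        else:
            ok[symbol] = result
    return ok, failed


class StandInDataError:
    """Hypothetical placeholder for the library's DataError type."""

    def __init__(self, message):
        self.message = message


symbols = ["sym_a", "sym_b"]
results = ["metadata for sym_a", StandInDataError("could not read sym_b")]

ok, failed = split_batch_results(symbols, results, StandInDataError)
print(sorted(ok), sorted(failed))
```

Keeping the failures keyed by symbol makes it easy to retry or log only the items that went wrong while still using the successful results.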

🚀 Features

  • Pandas 2.0 support (#343) (#540) (#804) (#846)
    • Modifications have been made to the normalisation and denormalisation processes for pandas.Series and pandas.DataFrame to match the new defaults in pandas 2.0.
    • Handling of 0-row DataFrames for improved correctness and usability.
    • Empty columns are now properly handled, especially regarding the change of defaults for empty collections between pandas 1.X and pandas 2.X.
    • Extended the tests to reflect changes in behaviour due to pandas 2.0's new defaults.
    • Please note, PyArrow remains unsupported in this integration.
  • conda-build: Bring support for Azure Blob Storage (#840) (#854) (#853) (#857)
  • Add uri support for mongodb (#761)
  • Code coverage analysis and report workflow (#783) (#784)
  • Add documentation with doxygen (#736)

🐛 Fixes

  • Update support status: Pandas DataFrame and Series backed by PyArrow are not supported (#882)
  • Added pymongo to the list of installation dependencies (#891)
  • Resolved dependency issues for the mergeability check step (#822)
  • Fixed issue where AWS authentication wasn't used, even though the option was enabled (#843)
  • Resolved issue of early read termination in 'has_symbol' (#836)
  • Test: Ensured that QueryBuilder is pickleable with all possible clauses (#861)
  • Fixed issue with the 'latest_only' option for the 'list_versions' method (#839)
  • Added the ability for users to specify LMDB map size in the Arctic URI (#811)
  • Fixed issue 767: Segfault in batch write with string columns has been resolved (#827) (#874)
  • Renamed ArcticNativeNotYetImplemented in a way that maintains backward compatibility, to fix issue #774 (#821)
  • Modified Azure SDK to favour winhttp over libcurl on Windows for improved SSL verification (#851)
  • Updated the maximum batch size for Azure operations (#878)

Uncategorized

  • Maintenance: Added a minimal Security Policy (#823)
  • Fixed documentation following an exception renaming (#824)
  • Resolved issues in the publish step (#825)
  • Added documentation for setting LMDB map size (#826)
  • Incorporated notebooks into the documentation (#844)
  • Maintenance: Removed unused definitions from protocol buffers (#856)
  • Enhanced error handling to fail document build on Sphinx errors (#883)
  • Maintenance: Replaced deprecated ZSTD_getDecompressedSize function (#855)
  • Refactored non-functional library manager, addressing Issue #812 (#828)
  • Made minor improvements to the documentation (#841)
  • Improved handling of the deprecated S3 option "force_uri_lib_config" (#833)
  • Corrected the release date of version 3.0.0 in README.md (#858)
