Getting ready for release 0.5.0
FrancescAlted committed Oct 12, 2022
1 parent 17d7102 commit d067d3a
Showing 7 changed files with 58 additions and 80 deletions.
14 changes: 5 additions & 9 deletions ANNOUNCE.rst
@@ -1,14 +1,10 @@
Announcing python-blosc2 0.4.1
Announcing python-blosc2 0.5.0
==============================

This is a major release introducing new `pack_array2()` and
`unpack_array2()` functions for packing NumPy arrays.
Also, there are new `SChunk.to_cframe()` and `blosc2.from_cframe()`
methods for serializing/deserializing `SChunk` instances.

Finally, we have added new `SChunk.get_slice()`, `SChunk.__getitem__()`
and `SChunk.__setitem__()` methods for getting/setting slices from/to
`SChunk` instances.
This is a major release introducing new `pack_tensor`, `unpack_tensor`,
`save_tensor` and `load_tensor` functions for serializing/deserializing
PyTorch and TensorFlow tensor objects. They also understand NumPy arrays,
so they are now the recommended functions for serialization.
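
As a minimal sketch, here is the in-memory round trip with a NumPy array
(a PyTorch or TensorFlow tensor would work the same way, provided the
corresponding package is installed):

```python
import numpy as np
import blosc2

arr = np.arange(1_000_000, dtype=np.int64)

# Serialize the array into an in-memory, compressed cframe...
cframe = blosc2.pack_tensor(arr)
# ...and deserialize it back.
arr2 = blosc2.unpack_tensor(cframe)
assert np.array_equal(arr, arr2)
```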

For more info, you can have a look at the release notes in:

11 changes: 11 additions & 0 deletions RELEASE_NOTES.md
@@ -1,5 +1,16 @@
# Release notes

## Changes from 0.4.1 to 0.5.0

* New `pack_tensor`, `unpack_tensor`, `save_tensor` and `load_tensor` functions for serializing/deserializing PyTorch and TensorFlow tensor objects. They also understand NumPy arrays, so they are now the recommended functions for serialization (see the sketch after this list).

* `pack_array2` no longer modifies the value of a `cparams` parameter passed to it.

* `pack_array2` / `save_array` have changed their serialization format to follow the new standard introduced in `pack_tensor`. In the future `pack_array2` / `save_array` will probably be deprecated, so please switch to `pack_tensor` / `save_tensor` as soon as you can.

* The new 'standard' for serialization relies on storing a `__pack_tensor__` attribute as a `vlmeta` (variable-length) metalayer.

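A minimal on-disk sketch of the same API (the file name is illustrative; `mode="w"` is assumed to behave as it does in `save_array`):

```python
import numpy as np
import blosc2

arr = np.linspace(0, 1, 1_000_000)

# Save the serialized array as a cframe file on disk...
blosc2.save_tensor(arr, urlpath="arr.b2frame", mode="w")
# ...and load it back.
arr2 = blosc2.load_tensor("arr.b2frame")
assert np.array_equal(arr, arr2)
# As noted above, the serialization info travels in a
# `__pack_tensor__` vlmeta entry inside the frame.
```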

## Changes from 0.4.0 to 0.4.1

* Add `msgpack` as a runtime requirement
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.4.1
0.5.0
2 changes: 1 addition & 1 deletion bench/pack_compress.py
@@ -16,7 +16,7 @@
import blosc2


NREP = 1
NREP = 3
N = int(2e8)
Nexp = np.log10(N)

8 changes: 3 additions & 5 deletions bench/pack_tensor.py
@@ -19,7 +19,9 @@


NREP = 1
# N = int(4e8 - 2**27) # larger than 2 GB
# N = int(5e8 + 2**27) # larger than 2 GB
# Using tensors > 2 GB makes TensorFlow serialization raise this error:
# [libprotobuf FATAL google/protobuf/io/coded_stream.cc:831] CHECK failed: overrun <= kSlopBytes:
N = int(1e8)

store = True
@@ -78,8 +80,6 @@
c = None
ctic = time.time()
for i in range(NREP):
# _in = np.asarray(memoryview(tt))
# c = blosc2.pack_array2(_in, cparams=cparams)
c = blosc2.pack_tensor(in_, cparams=cparams)
ctoc = time.time()
tc = (ctoc - ctic) / NREP
@@ -96,8 +96,6 @@
c = None
ctic = time.time()
for i in range(NREP):
#_in = np.asarray(th)
#c = blosc2.pack_array2(_in, cparams=cparams)
c = blosc2.pack_tensor(in_, cparams=cparams)
ctoc = time.time()
tc = (ctoc - ctic) / NREP
8 changes: 3 additions & 5 deletions examples/slicing_and_beyond.ipynb
@@ -102,7 +102,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"That's much better!\n",
"That looks better!\n",
"\n",
"## Setting data in a SChunk\n",
"\n",
@@ -194,6 +194,7 @@
"be mindful that you will have to keep a reference to it until you do not\n",
"want the SChunk anymore.\n",
"\n",
"\n",
"## Serializing NumPy arrays\n",
"\n",
"If what you want is to create a serialized, compressed version of a NumPy array, you can use the newer (and faster) functions to store it either in-memory or on-disk. The specification of such a contiguous compressed representation, aka **cframe** can be seen at: https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst.\n",
@@ -232,10 +233,7 @@
"source": [
"blosc2.save_array(np_array, urlpath=\"ondisk_array.b2frame\", mode=\"w\")\n",
"np_array2 = blosc2.load_array(\"ondisk_array.b2frame\")\n",
"np.array_equal(np_array, np_array2)\n",
"\n",
"# Remove it from disk\n",
"blosc2.remove_urlpath(\"ondisk_array.b2frame\")"
"np.array_equal(np_array, np_array2)"
]
},
{
93 changes: 34 additions & 59 deletions examples/tutorial-basics.ipynb
@@ -47,6 +47,7 @@
"storage = {\n",
" \"contiguous\": True,\n",
" \"urlpath\": \"myfile.b2frame\",\n",
" \"mode\": \"w\", # create a file anew\n",
" \"cparams\": cparams,\n",
" \"dparams\": dparams,\n",
"}"
@@ -56,37 +57,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"And let's remove a possible existing serialized super-chunk (frame):"
"Now, we can already create a SChunk instance:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"blosc2.remove_urlpath(\"myfile.b2frame\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we can already create a SChunk!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<blosc2.SChunk.SChunk at 0x7f71c83a67c0>"
]
"text/plain": "<blosc2.SChunk.SChunk at 0x110def6c0>"
},
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -100,7 +83,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Great! So you have created your first super-chunk with your desired compression codec and typesize, and it is going to be persistent on disk."
"Great! So you have created your first super-chunk with your desired compression codec and typesize, that is going to be persistent on-disk."
]
},
{
@@ -114,7 +97,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
@@ -123,15 +106,15 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 355 ms, sys: 0 ns, total: 355 ms\n",
"Wall time: 69.6 ms\n"
"CPU times: user 312 ms, sys: 790 ms, total: 1.1 s\n",
"Wall time: 333 ms\n"
]
}
],
@@ -144,14 +127,14 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-rw-r-- 1 faltet2 faltet2 11M jun 28 19:03 myfile.b2frame\r\n"
"-rw-r--r-- 1 francesc staff 10M Oct 3 18:29 myfile.b2frame\r\n"
]
}
],
@@ -170,7 +153,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
@@ -179,15 +162,15 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 65.9 ms, sys: 82 ms, total: 148 ms\n",
"Wall time: 39.5 ms\n"
"CPU times: user 200 ms, sys: 65.8 ms, total: 266 ms\n",
"Wall time: 77 ms\n"
]
}
],
@@ -199,7 +182,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
@@ -218,7 +201,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
@@ -228,24 +211,22 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 258 µs, sys: 348 µs, total: 606 µs\n",
"Wall time: 351 µs\n"
"CPU times: user 288 µs, sys: 839 µs, total: 1.13 ms\n",
"Wall time: 1.41 ms\n"
]
},
{
"data": {
"text/plain": [
"100"
]
"text/plain": "100"
},
"execution_count": 12,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -264,24 +245,22 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 116 µs, sys: 158 µs, total: 274 µs\n",
"Wall time: 173 µs\n"
"CPU times: user 280 µs, sys: 570 µs, total: 850 µs\n",
"Wall time: 687 µs\n"
]
},
{
"data": {
"text/plain": [
"101"
]
"text/plain": "101"
},
"execution_count": 13,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
@@ -311,16 +290,14 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'info1': 'This is an example', 'info2': 'of user meta handling'}"
]
"text/plain": "{b'info1': 'This is an example', b'info2': 'of user meta handling'}"
},
"execution_count": 14,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@@ -340,16 +317,14 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'info2': 'of user meta handling'}"
]
"text/plain": "{b'info2': 'of user meta handling'}"
},
"execution_count": 15,
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
Expand All @@ -363,7 +338,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"That's all for now. There are more examples in the examples directory for you to explore. Enjoy!"
"That's all for now. There are more examples in the `examples/` directory for you to explore. Enjoy!"
]
}
],
