Getting ready for release 0.5.0
FrancescAlted committed Oct 12, 2022
1 parent 17d7102 commit d067d3a
Showing 7 changed files with 58 additions and 80 deletions.
14 changes: 5 additions & 9 deletions ANNOUNCE.rst
@@ -1,14 +1,10 @@
Announcing python-blosc2 0.4.1
Announcing python-blosc2 0.5.0
==============================

This is a major release introducing new `pack_array2()` and
`unpack_array2()` functions for packing NumPy arrays.
Also, there are new `SChunk.to_cframe()` and `blosc2.from_cframe()`
methods for serializing/deserializing `SChunk` instances.

Finally, we have added new `SChunk.get_slice()`, `SChunk.__getitem__()`
and `SChunk.__setitem__()` methods for getting/setting slices from/to
`SChunk` instances.
This is a major release introducing new `pack_tensor`, `unpack_tensor`,
`save_tensor` and `load_tensor` functions for serializing/deserializing
PyTorch and TensorFlow tensor objects. They also understand NumPy arrays,
so they are now the recommended functions for serialization.
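
As a minimal sketch, here is the in-memory round trip with a NumPy array
(a PyTorch or TensorFlow tensor would work the same way, provided the
corresponding package is installed):

```python
import numpy as np
import blosc2

arr = np.arange(1_000_000, dtype=np.int64)

# Serialize the array into an in-memory, compressed cframe...
cframe = blosc2.pack_tensor(arr)
# ...and deserialize it back.
arr2 = blosc2.unpack_tensor(cframe)
assert np.array_equal(arr, arr2)
```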

For more info, you can have a look at the release notes in:

11 changes: 11 additions & 0 deletions RELEASE_NOTES.md
@@ -1,5 +1,16 @@
# Release notes

## Changes from 0.4.1 to 0.5.0

* New `pack_tensor`, `unpack_tensor`, `save_tensor` and `load_tensor` functions for serializing/deserializing PyTorch and TensorFlow tensor objects. They also understand NumPy arrays, so they are now the recommended functions for serialization (see the sketch after this list).

* `pack_array2` no longer modifies the value of a `cparams` parameter passed to it.

* `pack_array2` / `save_array` have changed their serialization format to follow the new standard introduced in `pack_tensor`. In the future `pack_array2` / `save_array` will probably be deprecated, so please switch to `pack_tensor` / `save_tensor` as soon as you can.

* The new 'standard' for serialization relies on storing a `__pack_tensor__` attribute as a `vlmeta` (variable-length) metalayer.

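A minimal on-disk sketch of the same API (the file name is illustrative; `mode="w"` is assumed to behave as it does in `save_array`):

```python
import numpy as np
import blosc2

arr = np.linspace(0, 1, 1_000_000)

# Save the serialized array as a cframe file on disk...
blosc2.save_tensor(arr, urlpath="arr.b2frame", mode="w")
# ...and load it back.
arr2 = blosc2.load_tensor("arr.b2frame")
assert np.array_equal(arr, arr2)
# As noted above, the serialization info travels in a
# `__pack_tensor__` vlmeta entry inside the frame.
```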

## Changes from 0.4.0 to 0.4.1

* Add `msgpack` as a runtime requirement
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.4.1
0.5.0
2 changes: 1 addition & 1 deletion bench/pack_compress.py
@@ -16,7 +16,7 @@
import blosc2


NREP = 1
NREP = 3
N = int(2e8)
Nexp = np.log10(N)

8 changes: 3 additions & 5 deletions bench/pack_tensor.py
@@ -19,7 +19,9 @@


NREP = 1
# N = int(4e8 - 2**27) # larger than 2 GB
# N = int(5e8 + 2**27) # larger than 2 GB
# Using tensors > 2 GB makes TensorFlow serialization raise this error:
# [libprotobuf FATAL google/protobuf/io/coded_stream.cc:831] CHECK failed: overrun <= kSlopBytes:
N = int(1e8)

store = True
@@ -78,8 +80,6 @@
c = None
ctic = time.time()
for i in range(NREP):
# _in = np.asarray(memoryview(tt))
# c = blosc2.pack_array2(_in, cparams=cparams)
c = blosc2.pack_tensor(in_, cparams=cparams)
ctoc = time.time()
tc = (ctoc - ctic) / NREP
@@ -96,8 +96,6 @@
c = None
ctic = time.time()
for i in range(NREP):
#_in = np.asarray(th)
#c = blosc2.pack_array2(_in, cparams=cparams)
c = blosc2.pack_tensor(in_, cparams=cparams)
ctoc = time.time()
tc = (ctoc - ctic) / NREP
8 changes: 3 additions & 5 deletions examples/slicing_and_beyond.ipynb
@@ -102,7 +102,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"That's much better!\n",
"That looks better!\n",
"\n",
"## Setting data in a SChunk\n",
"\n",
@@ -194,6 +194,7 @@
"be mindful that you will have to keep a reference to it until you do not\n",
"want the SChunk anymore.\n",
"\n",
"\n",
"## Serializing NumPy arrays\n",
"\n",
"If what you want is to create a serialized, compressed version of a NumPy array, you can use the newer (and faster) functions to store it either in-memory or on-disk. The specification of such a contiguous compressed representation, aka **cframe** can be seen at: https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst.\n",
@@ -232,10 +233,7 @@
"source": [
"blosc2.save_array(np_array, urlpath=\"ondisk_array.b2frame\", mode=\"w\")\n",
"np_array2 = blosc2.load_array(\"ondisk_array.b2frame\")\n",
"np.array_equal(np_array, np_array2)\n",
"\n",
"# Remove it from disk\n",
"blosc2.remove_urlpath(\"ondisk_array.b2frame\")"
"np.array_equal(np_array, np_array2)"
]
},
{
93 changes: 34 additions & 59 deletions examples/tutorial-basics.ipynb
@@ -47,6 +47,7 @@
"storage = {\n",
" \"contiguous\": True,\n",
" \"urlpath\": \"myfile.b2frame\",\n",
" \"mode\": \"w\", # create a file anew\n",
" \"cparams\": cparams,\n",
" \"dparams\": dparams,\n",
"}"
@@ -56,37 +57,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"And let's remove a possible existing serialized super-chunk (frame):"
"Now, we can already create a SChunk instance:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"blosc2.remove_urlpath(\"myfile.b2frame\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, we can already create a SChunk!"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<blosc2.SChunk.SChunk at 0x7f71c83a67c0>"
]
"text/plain": "<blosc2.SChunk.SChunk at 0x110def6c0>"
},
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -100,7 +83,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Great! So you have created your first super-chunk with your desired compression codec and typesize, and it is going to be persistent on disk."
"Great! So you have created your first super-chunk with your desired compression codec and typesize, that is going to be persistent on-disk."
]
},
{
@@ -114,7 +97,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
@@ -123,15 +106,15 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 355 ms, sys: 0 ns, total: 355 ms\n",
"Wall time: 69.6 ms\n"
"CPU times: user 312 ms, sys: 790 ms, total: 1.1 s\n",
"Wall time: 333 ms\n"
]
}
],
@@ -144,14 +127,14 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-rw-rw-r-- 1 faltet2 faltet2 11M jun 28 19:03 myfile.b2frame\r\n"
"-rw-r--r-- 1 francesc staff 10M Oct 3 18:29 myfile.b2frame\r\n"
]
}
],
@@ -170,7 +153,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
@@ -179,15 +162,15 @@
},
{
"cell_type": "code",
"execution_count": 9,
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 65.9 ms, sys: 82 ms, total: 148 ms\n",
"Wall time: 39.5 ms\n"
"CPU times: user 200 ms, sys: 65.8 ms, total: 266 ms\n",
"Wall time: 77 ms\n"
]
}
],
@@ -199,7 +182,7 @@
},
{
"cell_type": "code",
"execution_count": 10,
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
@@ -218,7 +201,7 @@
},
{
"cell_type": "code",
"execution_count": 11,
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
@@ -228,24 +211,22 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 258 µs, sys: 348 µs, total: 606 µs\n",
"Wall time: 351 µs\n"
"CPU times: user 288 µs, sys: 839 µs, total: 1.13 ms\n",
"Wall time: 1.41 ms\n"
]
},
{
"data": {
"text/plain": [
"100"
]
"text/plain": "100"
},
"execution_count": 12,
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
@@ -264,24 +245,22 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 116 µs, sys: 158 µs, total: 274 µs\n",
"Wall time: 173 µs\n"
"CPU times: user 280 µs, sys: 570 µs, total: 850 µs\n",
"Wall time: 687 µs\n"
]
},
{
"data": {
"text/plain": [
"101"
]
"text/plain": "101"
},
"execution_count": 13,
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
@@ -311,16 +290,14 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'info1': 'This is an example', 'info2': 'of user meta handling'}"
]
"text/plain": "{b'info1': 'This is an example', b'info2': 'of user meta handling'}"
},
"execution_count": 14,
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
@@ -340,16 +317,14 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'info2': 'of user meta handling'}"
]
"text/plain": "{b'info2': 'of user meta handling'}"
},
"execution_count": 15,
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
Expand All @@ -363,7 +338,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"That's all for now. There are more examples in the examples directory for you to explore. Enjoy!"
"That's all for now. There are more examples in the `examples/` directory for you to explore. Enjoy!"
]
}
],
