27 changes: 27 additions & 0 deletions docs/source/format/CanonicalExtensions.rst
@@ -544,6 +544,33 @@ Primitive Type Mappings
| UUID extension type | UUID |
+----------------------+------------------------+

.. _timestamp_with_offset_extension:

Timestamp With Offset
=====================
This type represents a timestamp column in which each value may carry a different timezone offset. Each timestamp is stored in UTC alongside its original timezone offset in minutes.
This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines.

* Extension name: ``arrow.timestamp_with_offset``.

* The storage type of the extension is a ``Struct`` with 2 fields, in order:

  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).

  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from UTC. Negative offsets represent time zones west of UTC, and positive offsets represent time zones east of UTC. Offsets range from -779 (-12:59) to +780 (+13:00).

* Extension type parameters:

  This type does not have any parameters.

* Description of the serialization:

  Extension metadata is an empty string.

.. note::

   It is also *permissible* for the ``offset_minutes`` field to be dictionary-encoded or run-end-encoded.
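
As a rough illustration of the storage layout described above, the following sketch uses only the Python standard library to split an offset-aware value into the two storage fields and to rebuild it. The helper names ``to_storage`` and ``from_storage`` are hypothetical and are not part of any Arrow API; the range bounds are taken from the ``offset_minutes`` description, and whether an implementation should reject out-of-range offsets is an assumption made here.

```python
from datetime import datetime, timedelta, timezone

# Bounds quoted from the offset_minutes description: -779 (-12:59) to +780 (+13:00).
MIN_OFFSET_MINUTES = -779
MAX_OFFSET_MINUTES = 780


def to_storage(local: datetime) -> tuple[datetime, int]:
    """Split an offset-aware datetime into (UTC timestamp, offset_minutes).

    Hypothetical helper for illustration; not an Arrow function.
    """
    offset = local.utcoffset()
    if offset is None:
        raise ValueError("datetime must be offset-aware")
    # divmod on timedeltas yields (int quotient, timedelta remainder).
    minutes, remainder = divmod(offset, timedelta(minutes=1))
    if remainder:
        raise ValueError("offset must be a whole number of minutes")
    # Assumption: out-of-range offsets are rejected rather than clamped.
    if not MIN_OFFSET_MINUTES <= minutes <= MAX_OFFSET_MINUTES:
        raise ValueError(f"offset_minutes out of range: {minutes}")
    return local.astimezone(timezone.utc), minutes


def from_storage(timestamp_utc: datetime, offset_minutes: int) -> datetime:
    """Rebuild the original wall-clock value from the two storage fields.

    Assumes timestamp_utc is offset-aware with tzinfo set to UTC.
    """
    tz = timezone(timedelta(minutes=offset_minutes))
    return timestamp_utc.astimezone(tz)


# 09:30 at -05:00 is stored as 14:30 UTC with offset_minutes == -300.
local = datetime(2024, 3, 10, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
ts, off = to_storage(local)
restored = from_storage(ts, off)
```

Storing the instant in UTC keeps values directly comparable and sortable, while the separate offset field preserves the wall-clock time that was originally recorded.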

Contributor

Do we want to say anything about what types of dictionary keys are allowed? Like is it ok to use Int8/Int16/Int32 keys -- while Int32 probably doesn't make sense as it would make the column larger, it might be necessary in some cases to encode the full range of Int16 values 🤔

Contributor

In the interest of brevity of the spec, I advised @serramatutu to not have recommendations here. I would assume anyone deciding to dictionary-encode an int16 array is considering the gains (or not) that they would have from that encoding. We are allowing but not recommending complicated encoding of an already compact 16-bit-valued array.

Contributor

makes sense


Community Extension Types
=========================
