@@ -13,8 +13,6 @@ In this tutorial, we describe
13
13
- What is ``RaggedShape``?
14
14
- What is ``row_splits``?
15
15
- What is ``row_ids``?
16
- - What is ``dim0 `` ?
17
- - What is ``tot_size `` ?
18
16
19
17
What are ragged tensors?
20
18
------------------------
@@ -29,7 +27,7 @@ tensors, i.e., regular tensors, look like.
29
27
:lines: 8-20
30
28
31
29
The shape of the 2-D regular tensor ``a`` is ``(3, 4)``, meaning it has 3
32
- rows and 4 columns. Each row has **exactly ** 4 elements, no more, no less .
30
+ rows and 4 columns. Each row has **exactly** 4 elements.
33
31
34
32
- 3-D regular tensors
35
33
@@ -38,8 +36,8 @@ tensors, i.e., regular tensors, look like.
38
36
:lines: 24-45
39
37
40
38
The shape of the 3-D regular tensor ``b`` is ``(3, 3, 2)``, meaning it has
41
- 3 planes. Each plane has **exactly ** 3 rows, no more, no less. Each row has
42
- ** exactly ** two entries, no more, no less.
39
+ 3 planes. Each plane has **exactly** 3 rows, and each row has **exactly** two
40
+ entries.
43
41
44
42
- N-D regular tensors (N >= 4)
45
43
@@ -89,7 +87,7 @@ tensors in ``k2``.
89
87
A ragged tensor in ``k2`` has ``N`` (``N >= 2``) axes. Unlike regular tensors,
90
88
the entries along an axis of a ragged tensor can have different numbers of elements.
91
89
92
- Ragged tensors are **the most important ** data structures in ``k2 ``. FSAs are
90
+ Ragged tensors are **the most important** data structure in ``k2``. FSAs are
93
91
represented as ragged tensors. There are also various operations defined on ragged
94
92
tensors.
95
93
@@ -113,7 +111,7 @@ Exercise 1
113
111
- Row 1 is empty, i.e., it has no elements.
114
112
- Row 2 has two elements: ``-1.5, 2 ``
115
113
116
- (Click ▶ to see it )
114
+ (Click ▶ to view the solution)
117
115
118
116
.. literalinclude:: code/basics/ragged-tensors.py
119
117
:language: python
@@ -130,11 +128,34 @@ Exercise 2
130
128
131
129
How to create a ragged tensor with only 1 axis?
132
130
133
- (Click ▶ to see it )
131
+ (Click ▶ to view the solution)
134
132
135
133
You **cannot** create a ragged tensor with only 1 axis. Ragged tensors
136
134
in ``k2`` have at least 2 axes.
137
135
136
+ dtype and device
137
+ ^^^^^^^^^^^^^^^^
138
+
139
+ Like tensors in PyTorch, ragged tensors in ``k2`` have the attributes ``dtype`` and
140
+ ``device``. The following code shows that you can specify the ``dtype`` and
141
+ ``device`` while constructing ragged tensors.
142
+
143
+ .. literalinclude:: code/basics/dtype-device.py
144
+ :language: python
145
+ :lines: 3-23
146
+
147
+ .. container:: toggle
148
+
149
+ .. container:: header
150
+
151
+ .. Note::
152
+
153
+ (Click ▶ to view the output)
154
+
155
+ .. literalinclude:: code/basics/dtype-device.py
156
+ :language: python
157
+ :lines: 25-50
158
+
138
159
Concepts about ragged tensors
139
160
-----------------------------
140
161
@@ -144,18 +165,18 @@ A ragged tensor in ``k2`` consists of two parts:
144
165
145
166
.. Caution::
146
167
147
- It is assumed that a shape within a ragged tensor in ``k2 `` is a constant.
168
+ It is assumed that a shape within a ragged tensor in ``k2`` is a constant.
148
169
Once constructed, you are not expected to modify it. Otherwise, unexpected
149
170
things can happen; you will be SAD.
150
171
151
- - ``data ``, which is an **array ** of type ``T ``
172
+ - ``values``, which is an **array** of type ``T``
152
173
153
174
.. Hint::
154
175
155
- ``data `` is stored ``contiguously `` in memory, whose entries have to be
176
+ ``values`` is stored ``contiguously`` in memory, and all of its entries must be
156
177
of the same type ``T``. ``T`` can be either a primitive type, such as
157
178
``int``, ``float``, or ``double``, or a user-defined type. For instance,
158
- ``data `` in FSAs contains ``arcs ``, which is defined in C++
179
+ ``values`` in FSAs contains ``arcs``, which are defined in C++
159
180
`as follows <https://github.com/k2-fsa/k2/blob/master/k2/csrc/fsa.h#L31>`_:
160
181
161
182
.. code-block:: c++
@@ -167,8 +188,83 @@ A ragged tensor in ``k2`` consists of two parts:
167
188
float score;
168
189
}
169
190
170
- In the following, we describe what is inside a ``shape `` and how to manipulate
171
- ``data ``.
191
+ Before explaining what ``shape`` and ``values`` contain, let us look at an example of
192
+ how to use a ragged tensor to represent the following
193
+ FSA (see :numref:`ragged_basics_simple_fsa_1`).
194
+
195
+ .. _ragged_basics_simple_fsa_1:
196
+ .. figure:: code/basics/images/simple-fsa.svg
197
+ :alt: A simple FSA
198
+ :align: center
199
+ :figwidth: 600px
200
+
201
+ A simple FSA that is to be represented by a ragged tensor.
202
+
203
+ The FSA in :numref:`ragged_basics_simple_fsa_1` has 3 arcs and 3 states.
204
+
205
+ +---------+--------------------+--------------------+--------------------+--------------------+
206
+ | | src_state | dst_state | label | score |
207
+ +---------+--------------------+--------------------+--------------------+--------------------+
208
+ | Arc 0 | 0 | 1 | 1 | 0.1 |
209
+ +---------+--------------------+--------------------+--------------------+--------------------+
210
+ | Arc 1 | 0 | 1 | 2 | 0.2 |
211
+ +---------+--------------------+--------------------+--------------------+--------------------+
212
+ | Arc 2 | 1 | 2 | -1 | 0.3 |
213
+ +---------+--------------------+--------------------+--------------------+--------------------+
214
+
215
+ When the above FSA is stored in a ragged tensor, its arcs are stored in a 1-D contiguous
216
+ ``values`` array containing ``[Arc0, Arc1, Arc2]``.
217
+ At this point, you might ask:
218
+
219
+ - Since we can reconstruct the original FSA from the ``values`` array alone,
220
+ what's the point of storing it in a ragged tensor?
221
+
222
+ Using the ``values`` array alone, it is not possible to answer the following questions in ``O(1)``
223
+ time:
224
+
225
+ - How many states does the FSA have?
226
+ - How many arcs does each state have?
227
+ - Where do the arcs belonging to state 0 start in the ``values`` array?
228
+
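To make this concrete, here is a minimal pure-Python sketch of answering these
questions from the flat ``values`` array alone. It does not use ``k2``; the
``Arc`` tuple and the variable names are illustrative assumptions, with fields
taken from the table above. Both answers require scanning every arc.

.. code-block:: python

    from collections import Counter, namedtuple

    # Hypothetical stand-in for an arc; the fields follow the table above.
    Arc = namedtuple("Arc", ["src_state", "dst_state", "label", "score"])

    values = [
        Arc(0, 1, 1, 0.1),   # Arc 0
        Arc(0, 1, 2, 0.2),   # Arc 1
        Arc(1, 2, -1, 0.3),  # Arc 2
    ]

    # Counting the arcs of each state touches every arc:
    # O(number of arcs), not O(1).
    arcs_per_state = Counter(arc.src_state for arc in values)

    # The number of states also has to be inferred by scanning all arcs.
    num_states = max(max(a.src_state, a.dst_state) for a in values) + 1

    print(dict(arcs_per_state))  # {0: 2, 1: 1}
    print(num_states)            # 3

State 2 has no outgoing arcs, so it never appears as a key in
``arcs_per_state``; its count has to be treated as an implicit zero.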
229
+ To answer the above questions, we introduce another 1-D array, called ``row_splits``.
230
+ ``row_splits[s] = p`` means that the first outgoing arc of state ``s`` is at position
231
+ ``p`` in the ``values`` array. As a side effect, it also indicates that the last outgoing
232
+ arc of state ``s-1`` ends at position ``p`` (exclusive) in the ``values`` array.
233
+
234
+ In our example, ``row_splits`` would be ``[0, 2, 3, 3]``, meaning:
235
+
236
+ - The first outgoing arc for state 0 is at position ``row_splits[0] = 0``
237
+ in the ``values`` array.
238
+ - State 0 has ``row_splits[1] - row_splits[0] = 2 - 0 = 2`` arcs.
239
+ - The first outgoing arc for state 1 is at position ``row_splits[1] = 2``
240
+ in the ``values`` array.
241
+ - State 1 has ``row_splits[2] - row_splits[1] = 3 - 2 = 1`` arc.
242
+ - State 2 has no arcs since ``row_splits[3] - row_splits[2] = 3 - 3 = 0``.
243
+ - The FSA has ``len(row_splits) - 1 = 3`` states.
244
+
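The lookups listed above can be sketched in plain Python. This is only an
illustration of the indexing arithmetic, not the ``k2`` API; the helper names
``num_arcs`` and ``arcs_of`` are hypothetical, and the strings in ``values``
are placeholders for the actual arcs.

.. code-block:: python

    row_splits = [0, 2, 3, 3]
    values = ["Arc0", "Arc1", "Arc2"]  # placeholders for the real arcs

    # Number of states: one fewer than the number of entries in row_splits.
    num_states = len(row_splits) - 1


    def num_arcs(state):
        """Number of outgoing arcs of `state`, using two array reads."""
        return row_splits[state + 1] - row_splits[state]


    def arcs_of(state):
        """Outgoing arcs of `state`, as a slice of the flat values array."""
        return values[row_splits[state]:row_splits[state + 1]]


    print(num_states)                                   # 3
    print([num_arcs(s) for s in range(num_states)])     # [2, 1, 0]
    print(arcs_of(0))                                   # ['Arc0', 'Arc1']
    print(arcs_of(2))                                   # []

Each count needs only two reads from ``row_splits``, regardless of how many
arcs the FSA has.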
245
+ We can construct a ``RaggedShape`` from a ``row_splits`` array:
246
+
247
+ .. literalinclude:: code/basics/ragged_shape_1.py
248
+ :language: python
249
+ :lines: 3-14
250
+
251
+ Pay attention to the string form of the shape, ``[ [x x] [x] [ ] ]``.
252
+ ``x`` means we don't care about the actual content inside a ragged tensor.
253
+ The above shape has 2 axes, 3 rows, and 3 elements. Row 0 has two elements as there
254
+ are two ``x`` inside the 0th ``[]``. Row 1 has only one element, while
255
+ row 2 has no elements at all. We can assign names to the axes. In our case,
256
+ we say the shape has axes ``[state][arc]``.
257
+
258
+ Combining the ragged shape and the ``values`` array, the above FSA can
259
+ be represented using a ragged tensor ``[ [Arc0 Arc1] [Arc2] [ ] ]``.
260
+
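As a rough illustration of this combination, the following plain-Python sketch
rebuilds the nested rows from ``row_splits`` and ``values`` and formats them in
the bracket notation used in this section. The helpers ``to_nested`` and
``to_string`` are hypothetical and are not part of ``k2``.

.. code-block:: python

    def to_nested(row_splits, values):
        """Split the flat `values` list into one sublist per row."""
        return [
            values[begin:end]
            for begin, end in zip(row_splits[:-1], row_splits[1:])
        ]


    def to_string(nested):
        """Format nested rows of strings in the bracket notation above."""
        rows = ["[" + " ".join(row) + "]" if row else "[ ]" for row in nested]
        return "[ " + " ".join(rows) + " ]"


    row_splits = [0, 2, 3, 3]

    nested = to_nested(row_splits, ["Arc0", "Arc1", "Arc2"])
    print(nested)             # [['Arc0', 'Arc1'], ['Arc2'], []]
    print(to_string(nested))  # [ [Arc0 Arc1] [Arc2] [ ] ]

    # Replacing every value with "x" gives the string form of the shape alone.
    print(to_string(to_nested(row_splits, ["x"] * 3)))  # [ [x x] [x] [ ] ]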
261
+ The following code displays the string form of the above FSA when represented
262
+ as a ragged tensor in ``k2``:
263
+
264
+ .. literalinclude:: code/basics/single-fsa.py
265
+ :language: python
266
+ :lines: 2-14
267
+
172
268
173
269
Shape
174
270
^^^^^