Commit bd5cea8

Finish fleshing out fold section.
Create opt LaTeX/MathJax command to represent optional types
1 parent eb0023d commit bd5cea8

File tree

7 files changed

+102
-51
lines changed

doc/ch_conceptual_design/algorithms.rst

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -34,11 +34,13 @@ For each of these cases, the data product itself remains immutable.
3434
A Python algorithm can receive a `phlex::handle` or a direct reference to the data product.
3535
There is no equivalent language support for read-only access, but it will be enforced where possible.
3636

37-
Whereas data products may to be copied, resources of type :cpp:`R` may not.
37+
Whereas data products may be copied, resources of type :cpp:`R` may not.
3838
The following types are therefore supported:
3939

4040
- :cpp:`R const&` — read-only access to a resource provided through a reference
4141
- :cpp:`R const*` — read-only access to a resource provided through a pointer
42+
- :cpp:`R&` — read-and-write access to a resource provided through a reference (if supported by resource)
43+
- :cpp:`R*` — read-and-write access to a resource provided through a pointer (if supported by resource)
4244

4345
Resources are described in more detail in :numref:`ch_conceptual_design/resources:Resources`.
4446

doc/ch_conceptual_design/hofs/observers.rst

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -32,17 +32,19 @@ Registration interface
3232
^^^^^^^^^^^^^^^^^^^^^^
3333

3434
The following shows how the :cpp:`histogram_hits` operator in :numref:`workflow` would be registered in C++.
35-
It uses the :cpp:`phlex::resource<histogramming>` interface to provide access to a putative histogramming resource (see :numref:`ch_conceptual_design/resources:Resources`).
35+
It uses the :cpp:`resource<histogramming>` interface to provide access to a putative histogramming resource (see :numref:`ch_conceptual_design/resources:Resources`).
3636

3737
.. code:: c++
3838

3939
class hits { ... };
40-
void histogram_hits(hits const&) { ... }
40+
void histogram_hits(hits const&, TH1F&) { ... }
4141

42-
PHLEX_REGISTER_ALGORITHMS(config)
42+
PHLEX_REGISTER_ALGORITHMS(m, config)
4343
{
44-
observe(histogram_hits, concurrency::unlimited)
45-
.sequence("GoodHits"_in("APA"), phlex::resource<histogramming>());
44+
auto h_resource = m.resource<histogramming>();
45+
46+
observe(histogram_hits, concurrency::serial)
47+
.sequence("GoodHits"_in("APA"), h_resource->make<TH1F>(...));
4648
}
4749

4850
.. rubric:: Footnotes

doc/ch_conceptual_design/hofs/partitioned_folds.rst

Lines changed: 82 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ Partitioned Folds
77
+========================================================+======================================================================+========================+
88
| :math:`d = \pfold{f}{\textit{init}}{\textit{part}}\ c` | :math:`f: D \times C \rightarrow D` | :math:`|d| \le |c|` |
99
| +----------------------------------------------------------------------+ |
10-
| | :math:`\textit{init}: \one \rightarrow D` | |
10+
| | :math:`\textit{init}: \opt{\iset{d}} \rightarrow D` | |
1111
| +----------------------------------------------------------------------+ |
1212
| | :math:`\textit{part}: \{\iset{c}\} \rightarrow \mathbb{P}(\iset{c})` | |
1313
+--------------------------------------------------------+----------------------------------------------------------------------+------------------------+
@@ -20,93 +20,138 @@ As mentioned in :numref:`ch_preliminaries/functional_programming:Sequences of Da
2020
where the user-defined operation :math:`f` is applied repeatedly between an accumulated value (initialized by :math:`init`) and each element of the input sequence.
2121

2222
In a framework context, however, multiple fold results are often desired in the same program for the same kind of computation.
23-
For example, consider a program that processes :math:`n` runs, each of which contains spills, identified by the tuple :math:`(R\ i, S\ j)`.
24-
The user may wish to create one histogram per run that contains the track multiplicity per spill.
25-
Instead of creating a single fold result, we thus use a *partitioned fold*:
23+
Consider the workflow in :numref:`workflow`, which processes `Spill`\ s, identified by the index :math:`j` or, more specifically, the tuple :math:`(S\ j)`.
24+
Each `Spill` is unfolded into a sequence of `APA`\ s, which are identified by the pair of indices :math:`jk` or, more specifically, the tuple :math:`(S\ j, A\ k)`.
25+
The energies of the :cpp:`"GoodHits"` data products in :numref:`workflow` are summed across `APA`\ s per `Spill` using the :math:`\textit{fold(sum\_energy)}` node.
26+
27+
Instead of creating one fold result, we thus use a *partitioned fold* to create one summed energy data-product per `Spill`:
2628

2729
.. math::
2830
:no-wrap:
2931
3032
\begin{align*}
31-
[h_{(R\ 1)}&,\ \dots,\ h_{(R\ n)}] \\
32-
&= \pfold{\textit{fill}}{\textit{init}}{\textit{into\_runs}}\ [m_{(R\ 1, S\ 1)},\ m_{(R\ 1, S\ 2)},\ \dots,\ m_{(R\ n, S\ 1)},\ m_{(R\ n, S\ 2)},\ \dots]
33+
[E_{(S\ 1)}&,\ \dots,\ E_{(S\ n)}] \\
34+
&= \pfold{\textit{sum\_energy}}{\textit{init}}{\textit{into\_spills}}\ [hs_{(S\ 1,\ A\ 1)},\ hs_{(S\ 1,\ A\ 2)},\ \dots,\ hs_{(S\ n,\ A\ 1)},\ hs_{(S\ n,\ A\ 2)},\ \dots]
3335
\end{align*}
3436
35-
where :math:`h_{(R\ i)}` denotes the histogram for run :math:`i`, and :math:`m_{(R\ i,\ S\ j)}` is the track multiplicity for spill :math:`j` in run :math:`i`.
37+
where :math:`E_{(S\ j)}` denotes the summed good-hits energy for `Spill` :math:`j`, and :math:`hs_{(S\ j,\ A\ k)}` is the good-hits data product :cpp:`"GoodHits"` for `APA` :math:`k` in `Spill` :math:`j`.
3638

3739
The above equation can be expressed more succinctly as:
3840

3941
.. math::
40-
[h_j]_{j \in \iset{\text{out}}} = \pfold{\textit{fill}}{\textit{init}}{\textit{into\_runs}}\ [m_i]_{i \in \iset{\text{in}}}
42+
[E_j]_{j \in \iset{\text{out}}} = \pfold{\textit{sum\_energy}}{\textit{init}}{\textit{into\_spills}}\ [hs_i]_{i \in \iset{\text{in}}}
4143
4244
where
4345

4446
.. math::
4547
:no-wrap:
4648
4749
\begin{align*}
48-
\iset{\text{in}} &= \{(R\ 1,\ S\ 1),\ (R\ 1,\ S\ 2),\ \dots,\ (R\ n,\ S\ 1),\ (R\ n,\ S\ 2), \dots\}, \text{and}\\
49-
\iset{\text{out}} &= \{(R\ 1),\ \dots, (R\ n)\}\ .
50+
\iset{\text{in}} &= \{(S\ 1,\ A\ 1),\ (S\ 1,\ A\ 2),\ \dots,\ (S\ n,\ A\ 1),\ (S\ n,\ A\ 2), \dots\}, \text{and}\\
51+
\iset{\text{out}} &= \{(S\ 1),\ \dots, (S\ n)\}\ .
5052
\end{align*}
5153
5254
Partitions
5355
^^^^^^^^^^
5456

55-
Factorizing a set of data into non-overlapping subsets that collectively span the entire set is called creating a set *partition*. [Wiki-partition]_
57+
Factorizing a set of data into non-overlapping subsets that collectively span the entire set is called creating a set *partition* [Wiki-partition]_.
5658
Each subset of the partition is called a *cell*.
57-
In the above example, the role of the :math:`\textit{into\_runs}` operation is to partition the input sequence into runs so that there is one fold result per run.
59+
In the above example, the role of the :math:`\textit{into\_spills}` operation is to partition the input sequence into `Spill`\ s so that there is one fold result per `Spill`.
5860
In general, however, the partitioning function is of the form :math:`\textit{part}: \{\iset{c}\} \rightarrow \mathbb{P}(\iset{c})`, where:
5961

6062
- the domain is the singleton set that contains only the index set :math:`\iset{c}` (i.e. :math:`\textit{part}` can only be invoked on :math:`\iset{c}`), and
61-
- the codomain is the set of partitions of the index set :math:`\iset{c}`.
63+
- the codomain is the set of partitions of :math:`\iset{c}`, denoted :math:`\mathbb{P}(\iset{c})`; note that the output index set :math:`\iset{d} \in \mathbb{P}(\iset{c})`.
6264

6365
The function :math:`\textit{part}` also establishes an equivalence relation on the index set :math:`\iset{c}`, where each element of the index set is mapped to a cell of the partition.
6466
The number of elements in the output sequence :math:`d` corresponds to the number of partition cells.
6567

68+
As of this writing, the only partitions supported are those that correspond to the names of data-product set categories.
69+
The partition :math:`\textit{into\_spills}` can thus be represented by the string :cpp:`"Spill"`, which denotes that there is one partition cell per `Spill`.
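
The grouping performed by :math:`\textit{into\_spills}` can be sketched in plain C++. This is only an illustration under stated assumptions: the aliases ``spill_id`` and ``apa_id`` and the free function ``into_spills`` are hypothetical names that are not part of the Phlex interface, where the partition is specified by the category name instead.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical index types (assumptions for illustration, not Phlex types).
using spill_id = std::size_t;                        // models (S j)
using apa_id = std::pair<std::size_t, std::size_t>;  // models (S j, A k)

// Sketch of into_spills: assign every input index to the cell of its Spill.
// Each input index lands in exactly one cell, so the cells do not overlap
// and together they cover the whole input index set.
std::map<spill_id, std::vector<apa_id>> into_spills(std::vector<apa_id> const& input_ids)
{
  std::map<spill_id, std::vector<apa_id>> cells;
  for (auto const& id : input_ids) {
    cells[id.first].push_back(id);  // id.first is the Spill number j
  }
  return cells;
}
```

The number of keys in the returned map corresponds to the number of partition cells and therefore to the number of fold results.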
70+
6671
Initializing the Accumulator
6772
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
6873

69-
.. todo::
70-
Change the domain type of :math:`\textit{init}`.
71-
7274
A crucial ingredient of the fold is the *accumulator*, which stores the fold result while it is being formed.
73-
Each accumulator is initialized by invoking a user-defined operation :math:`\textit{init}: \one \rightarrow D`, which returns an object that has the same type :math:`D` as the fold result. [#finit]_
74-
Instead of invoking a function, an accumulator is often initialized with a value.
75-
However, in functional programming, a value can be represented by invoking a function that always returns the same result.
76-
Expressing an initializer as a function thus supports value-initialization while retaining the flexibility that may occasionally be required through functions.
75+
Each accumulator is initialized by invoking a user-defined operation :math:`\textit{init}: \opt{\iset{d}} \rightarrow D`, which returns an object that has the same type :math:`D` as the fold result [#finit]_.
76+
The :math:`\opt{\iset{d}}` domain means that:
77+
78+
1. :math:`\textit{init}` can receive an argument corresponding to the identifier of a cell, which is a member of the output index set :math:`\mathcal{I}_d`.
79+
In the example above, the relevant identifier would be that of the `Spill`, i.e. :math:`(S\ j)`.
80+
2. :math:`\textit{init}` can be invoked with no arguments, thus producing the same value each time the accumulator is initialized.
81+
This is equivalent to initializing the accumulator with a constant value.
82+
83+
The implementation of :math:`\textit{init}` for the total good-hits energy fold results is to return the constant :math:`0`.
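
The two forms of :math:`\textit{init}` can be sketched as a pair of C++ overloads. The types ``spill_id`` and ``labeled_energy`` are hypothetical and exist only for this illustration; the second overload shows one conceivable use of the cell identifier, which the summed-energy example itself does not need.

```cpp
#include <cstddef>

// Hypothetical cell identifier modelling (S j); not a Phlex type.
struct spill_id {
  std::size_t number;
};

// Hypothetical accumulator that records which Spill it belongs to.
struct labeled_energy {
  std::size_t spill;
  double total;
};

// Form 2: no arguments, so every accumulator starts from the same constant.
double init() { return 0.; }

// Form 1: the cell identifier is supplied, so the initial value may depend on it.
labeled_energy init(spill_id const& id) { return {id.number, 0.}; }
```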
7784

7885
Fold Operation
7986
^^^^^^^^^^^^^^
8087

8188
A cell's fold result is obtained by repeatedly applying a fold operation to the cell's accumulator and each element of that cell's input sequence.
8289
The fold operation has the signature :math:`f: D \times C \rightarrow D`, where :math:`D` represents the type of the accumulator/fold result, and :math:`C` is the type of each element of the input sequence.
8390

84-
In the above example, the function :math:`\textit{fill}` receives a histogram :math:`h_{(R\ i)}` as the accumulator for run :math:`i` and "combines" it with a track multiplicity object :math:`m_{(R\ i,\ S\ j)}` that belongs to spill :math:`j` in run :math:`i`.
85-
This "combined" value is then returned by :math:`\textit{fill}` as the updated value of the accumulator.
86-
The function :math:`\textit{fill}` is repeatedly invoked to update the accumulator with each track multiplicity value.
87-
Once all track multiplcity values in run :math:`i` have been processed by :math:`\textit{fill}`, the accumulator's value becomes the fold result for that run.
91+
In the above example, the function :math:`\textit{sum\_energy}` receives a floating-point number :math:`E_{(S\ j)}`, representing the accumulated good-hits energy for `Spill` :math:`j`, and "combines" it with the good-hits object :math:`hs_{(S\ j,\ A\ k)}` that belongs to `APA` :math:`k` in `Spill` :math:`j`.
92+
This combination involves calculating the energy represented by the good-hits data product :math:`hs_{(S\ j,\ A\ k)}` and adding that to the accumulated value.
93+
This "combined" value is then returned by :math:`\textit{sum\_energy}` as the updated value of the accumulator [#feff]_.
94+
The function :math:`\textit{sum\_energy}` is repeatedly invoked to update the accumulator with each good-hits data product.
95+
Once all :cpp:`"GoodHits"` data products in `Spill` :math:`j` have been processed by :math:`\textit{sum\_energy}`, the accumulator's value becomes the fold result for that `Spill`.
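
A sequential, single-threaded sketch of this procedure is given here. It assumes a hypothetical ``hits`` type that stores per-hit energies and a hypothetical driver ``fold_per_spill``; in Phlex the framework, not user code, would perform the looping and initialization.

```cpp
#include <cstddef>
#include <map>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for the "GoodHits" data product.
struct hits {
  std::vector<double> energies;
};

// Fold operation: update the accumulator in place with the energy of one
// good-hits object.
void sum_energy(double& total_hit_energy, hits const& hs)
{
  total_hit_energy = std::accumulate(hs.energies.begin(), hs.energies.end(), total_hit_energy);
}

using apa_id = std::pair<std::size_t, std::size_t>;  // models (S j, A k)

// Hypothetical driver: one accumulator per Spill, initialized to 0 and
// updated once for each APA that belongs to that Spill.
std::map<std::size_t, double> fold_per_spill(std::map<apa_id, hits> const& good_hits)
{
  std::map<std::size_t, double> results;
  for (auto const& entry : good_hits) {
    // emplace leaves an existing accumulator untouched, so the initial
    // value 0 is only used the first time a Spill is encountered.
    auto it = results.emplace(entry.first.first, 0.).first;
    sum_energy(it->second, entry.second);
  }
  return results;
}
```

Each value in the returned map plays the role of :math:`E_{(S\ j)}` once all inputs for that `Spill` have been processed.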
96+
97+
Operator signatures
98+
^^^^^^^^^^^^^^^^^^^
99+
100+
.. table::
101+
:widths: 15 13 72
102+
103+
+-----------------------+--------------------------------------------------------------------------------------+
104+
| **Operator** | **Allowed signature** |
105+
+=======================+======================================================================================+
106+
| :math:`f` | :cpp:`void function_name(result_type&, P1, Pn..., Rm...) [qualifiers];` |
107+
+-----------------------+----------------+---------------------------------------------------------------------+
108+
| :math:`\textit{init}` | *as constant:* | :cpp:`result_type{...}` |
109+
| +----------------+---------------------------------------------------------------------+
110+
| | *as function:* | :cpp:`result_type function_name() [qualifiers];` |
111+
| +----------------+---------------------------------------------------------------------+
112+
| | *as function:* | :cpp:`result_type function_name( <cell identifier> ) [qualifiers];` |
113+
+-----------------------+----------------+---------------------------------------------------------------------+
114+
| :math:`\textit{part}` | *Name of data-set category* |
115+
+-----------------------+--------------------------------------------------------------------------------------+
116+
117+
The fold's :cpp:`result_type` must model the created data-product type described in :numref:`ch_conceptual_design/algorithms:Return Types`.
118+
A fold algorithm may also create multiple data products by using a :cpp:`result_type` of :cpp:`std::tuple<T1, ..., Tn>` where each of the types :cpp:`T1, ..., Tn` models a created data-product type.
119+
88120

89121
Registration interface
90122
^^^^^^^^^^^^^^^^^^^^^^
91123

92-
**Result type**: A fold algorithm may create multiple data products through its result by specifying an :cpp:`std::tuple<T1, ..., Tn>` where each of the types :cpp:`T1, ..., Tn` models a data-product created type.
124+
The :math:`\textit{fold(sum\_energy)}` node in :numref:`workflow` would be represented in C++ as:
93125

94-
.. table::
95-
:widths: 15 85
96-
97-
+-----------------------+-------------------------------------------------------------------------+
98-
| **Operator** | **Allowed signature** |
99-
+=======================+=========================================================================+
100-
| :math:`f` | :cpp:`void function_name(result_type&, P1, Pn..., Rm...) [qualifiers];` |
101-
+-----------------------+-------------------------------------------------------------------------+
102-
| :math:`\textit{init}` | :cpp:`result_type{...}` |
103-
+-----------------------+-------------------------------------------------------------------------+
104-
| :math:`\textit{part}` | *Name of data-set category* |
105-
+-----------------------+-------------------------------------------------------------------------+
126+
.. code:: c++
127+
128+
void sum_energy(double& total_hit_energy, hits const& hs) { ... }
129+
130+
PHLEX_REGISTER_ALGORITHMS(config)
131+
{
132+
products("TotalHitEnergy") =
133+
fold(
134+
"sum_hit_energy", // <= Node name for framework
135+
sum_energy, // <= Fold operation
136+
0., // <= Initializer for each fold result
137+
"Spill", // <= Partition level (one fold result per Spill)
138+
concurrency::unlimited // <= Allowed concurrency
139+
)
140+
.sequence("GoodHits"_in("APA"));
141+
}
142+
143+
In order for the user-defined algorithm :cpp:`sum_energy` to be safely executed concurrently, protections must be in place to avoid data races when updating the :cpp:`total_hit_energy` result object from multiple threads.
144+
Possible solutions include using :cpp:`std::atomic_ref<double>` [#fatomicref]_, placing a lock around the operation that updates :cpp:`total_hit_energy` (less desirable due to inefficiencies), or perhaps using :cpp:`std::atomic<double>` [#fatomic]_ instead of :cpp:`double` to represent the data product.
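
As a minimal sketch of the last option, the update can be written as a compare-and-exchange loop on a :cpp:`std::atomic<double>`. The helper name ``add_energy`` is hypothetical, and the use of relaxed memory ordering is an assumption that is reasonable only if nothing but the final sum is read after all updates have completed.

```cpp
#include <atomic>

// Hypothetical helper: atomically add `energy` to the shared total.
// If another thread modifies the total between the load and the exchange,
// compare_exchange_weak fails, refreshes `expected` with the current value,
// and the loop retries with the new sum.
void add_energy(std::atomic<double>& total_hit_energy, double energy)
{
  double expected = total_hit_energy.load(std::memory_order_relaxed);
  while (!total_hit_energy.compare_exchange_weak(
           expected, expected + energy, std::memory_order_relaxed)) {
    // `expected` now holds the latest value; try again.
  }
}
```

With C++20, the same effect should be obtainable more directly through the floating-point ``fetch_add`` member of :cpp:`std::atomic<double>` or :cpp:`std::atomic_ref<double>`.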
106145

107146
.. rubric:: Footnotes
108147

109148
.. [#finit] It is acceptable for :math:`\textit{init}` to return a type that is convertible to the accumulator's type.
149+
.. [#feff] Returning an updated accumulated value is generally not the most memory-efficient approach as it requires at least two copies of an accumulated value to be in memory at one time.
150+
The approach adopted by Phlex is to include a reference to the accumulated value as part of the fold operator's signature.
151+
The accumulator can then be updated in place, thus avoiding the extra copies of the data.
152+
.. [#fatomicref] https://en.cppreference.com/w/cpp/atomic/atomic_ref.html
153+
.. [#fatomic] https://en.cppreference.com/w/cpp/atomic/atomic.html
154+
110155
111156
.. only:: html
112157

doc/ch_conceptual_design/hofs/transforms.rst

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -30,8 +30,8 @@ Operator signature
3030
| :math:`f` | :cpp:`return_type function_name(P1, Pn..., Rm...) [qualifiers];` |
3131
+--------------+------------------------------------------------------------------+
3232

33-
- The :cpp:`return_type` must model the created data-product type described in :numref:`ch_conceptual_design/algorithms:Return Types`.
34-
An algorithm may also create multiple data products by returning a :cpp:`std::tuple<T1, ..., Tn>` where each of the types :cpp:`T1, ..., Tn` models a created data-product type.
33+
The :cpp:`return_type` must model the created data-product type described in :numref:`ch_conceptual_design/algorithms:Return Types`.
34+
An algorithm may also create multiple data products by returning a :cpp:`std::tuple<T1, ..., Tn>` where each of the types :cpp:`T1, ..., Tn` models a created data-product type.
3535

3636
Registration interface
3737
^^^^^^^^^^^^^^^^^^^^^^

doc/ch_preliminaries/functional_programming.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -121,7 +121,7 @@ Phlex will likely support other higher order functions as well.
121121
+-----------------------------------------------------------------------------------+----------------------------------------+-------------------------------------------------------------+---------------------------+
122122
| :ref:`Fold <ch_conceptual_design/hofs/partitioned_folds:Partitioned Folds>` | :math:`d = \pfold{f}{init}{part}\ c` | :math:`f: D \times C \rightarrow D` | :math:`|d| \le |c|` |
123123
| | +-------------------------------------------------------------+ |
124-
| | | :math:`init: \one \rightarrow D` | |
124+
| | | :math:`init: \opt{\iset{d}} \rightarrow D` | |
125125
| | +-------------------------------------------------------------+ |
126126
| | | :math:`part: \{\iset{c}\} \rightarrow \mathbb{P}(\iset{c})` | |
127127
+-----------------------------------------------------------------------------------+----------------------------------------+-------------------------------------------------------------+---------------------------+
