Merge pull request #112 from jpivarski/jpivarski/many-project-ideas
add 13 new project ideas from Jim
Showing 13 changed files with 596 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
--- | ||
name: Dates and strings in Awkward Array | ||
postdate: 2025-01-20 | ||
categories: | ||
- Analysis tools | ||
durations: | ||
- 3 months | ||
experiments: | ||
- Any | ||
skillset: | ||
- Python | ||
- C++ | ||
status: | ||
- Available | ||
project: | ||
- Any | ||
location: | ||
- Any | ||
commitment: | ||
- Any | ||
program: | ||
- Any | ||
shortdescription: "More date & string functions and NumPy's new varlen string in Awkward Array" | ||
description: > | ||
Awkward Array has a suite of string functions provided by Apache | ||
Arrow (in `ak.str.*`). However, it's missing a few string functions | ||
(see | ||
[awkward#2703](https://github.com/scikit-hep/awkward/issues/2703)) | ||
and it could also be useful to similarly wrap Arrow's date-handling | ||
functions (see | ||
[awkward#2702](https://github.com/scikit-hep/awkward/issues/2702)), | ||
taking care to translate between NumPy's date format (which Awkward | ||
uses) and Arrow's date format. In addition, NumPy has added a new | ||
variable-length string format that differs from all other such | ||
formats, and it would be useful to convert between it and Awkward Arrays | ||
(see | ||
[awkward#3170](https://github.com/scikit-hep/awkward/issues/3170)). Although | ||
most functionality can be added in Python, there's a slight chance | ||
that accessing NumPy's varlen strings would require C (not C++). | ||
contacts: | ||
- name: Jim Pivarski | ||
email: [email protected] | ||
|
||
mentees: # keep an empty list until the project has started or a student is identified | ||
# when that happens add a list with name: and link: attributes for each student |
# - name: Student's name |
# - link: #url for project page |
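One piece of the date-handling work described above is translating between NumPy's and Arrow's date representations. As a minimal sketch, assuming only NumPy: `datetime64` values are integer counts of a unit since the UNIX epoch, which line up with the integer layouts of Arrow's `date32` (32-bit days) and `date64` (64-bit milliseconds) types. The variable names are illustrative, and this is not Awkward Array's conversion code.

```python
import numpy as np

# NumPy stores datetime64 values as 64-bit integer counts of a unit
# (here: days) since the UNIX epoch.
dates = np.array(["1970-01-02", "2025-01-20"], dtype="datetime64[D]")

# Arrow's date32 layout: 32-bit integer days since the same epoch.
days_since_epoch = dates.astype(np.int64).astype(np.int32)

# Arrow's date64 layout: 64-bit integer milliseconds since the epoch.
ms_since_epoch = dates.astype("datetime64[ms]").astype(np.int64)

# Going back: reinterpret the day counts as datetime64[D].
round_trip = days_since_epoch.astype(np.int64).astype("datetime64[D]")
```

A real wrapper would also have to decide how to treat missing values (`NaT`) and units that Arrow does not support, which this sketch ignores.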
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
--- | ||
name: ML-ready Awkward Arrays | ||
postdate: 2025-01-20 | ||
categories: | ||
- Analysis tools | ||
durations: | ||
- 3 months | ||
experiments: | ||
- Any | ||
skillset: | ||
- Python | ||
- ML | ||
status: | ||
- Available | ||
project: | ||
- Any | ||
location: | ||
- Any | ||
commitment: | ||
- Any | ||
program: | ||
- Any | ||
shortdescription: "Helper functions to turn Awkward records into array dimensions and PyG indexes" | ||
description: > | ||
Awkward Array has functions to convert to and from TensorFlow and | ||
PyTorch, such as | ||
[ak.from_raggedtensor](https://awkward-array.org/doc/main/reference/generated/ak.from_raggedtensor.html) | ||
and the functions listed after it, with support for TensorFlow's RaggedTensor. However, | ||
there are format conversions that still have to be handled manually, | ||
such as turning an Awkward Array of records (e.g. muon with pT, eta, | ||
phi fields) into an array dimension (e.g. length-3 dimension in the | ||
tensor `shape`). NumPy has a function for this, | ||
[np.lib.recfunctions.structured_to_unstructured](https://numpy.org/doc/2.1/user/basics.rec.html#numpy.lib.recfunctions.structured_to_unstructured), | ||
though the Awkward equivalent can have a different name (since it | ||
has different submodules). The labor-intensive steps described in | ||
[this StackOverflow | ||
answer](https://stackoverflow.com/a/79215978/1623645) and [this | ||
tutorial](https://hsf-training.github.io/deep-learning-intro-for-hep/25-ragged-data-and-graphs.html#building-permutation-invariance-into-the-model) | ||
could be encapsulated as ready-to-use functions. Also, | ||
PyTorch-Geometric (PyG) expects ragged arrays to be represented as | ||
an external array of integers, which Awkward Array could generate | ||
with a function (see | ||
[awkward#3256](https://github.com/scikit-hep/awkward/issues/3256)). Yet | ||
another framework, [PyTorch | ||
Cluster](https://github.com/rusty1s/pytorch_cluster), expects | ||
raggedness to be expressed as a list of tensors (see | ||
[awkward#3265](https://github.com/scikit-hep/awkward/issues/3265)). All | ||
of these helper functions would simplify the conversion of Awkward | ||
Arrays into tensors for fixed-size NNs and GNNs. | ||
contacts: | ||
- name: Jim Pivarski | ||
email: [email protected] | ||
|
||
mentees: # keep an empty list until the project has started or a student is identified | ||
# when that happens add a list with name: and link: attributes for each student |
# - name: Student's name |
# - link: #url for project page |
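The two conversions named in the description can be illustrated with NumPy alone. This is a sketch under stated assumptions, not the proposed Awkward API: the first half uses NumPy's own `structured_to_unstructured` on a flat structured array, and the second half builds a per-element integer index from list lengths, which is one common way to express raggedness as an external array of integers. Whether that is the exact layout targeted by awkward#3256 is an assumption, and `muons`, `counts`, and `batch` are illustrative names.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A flat structured array: one record per muon with pt, eta, phi fields.
muons = np.array(
    [(50.0, 0.1, 1.0), (30.0, -1.2, 2.0), (20.0, 0.7, -0.5)],
    dtype=[("pt", "f8"), ("eta", "f8"), ("phi", "f8")],
)

# Turn the three record fields into a trailing length-3 dimension.
features = rfn.structured_to_unstructured(muons)

# Number of muons in each event (event 1 is empty), describing how the
# flat list of muons above is split into events.
counts = np.array([2, 0, 1])

# Per-muon event index: each event number repeated once per muon in it.
batch = np.repeat(np.arange(len(counts)), counts)
```

Here `features` has one row per muon and one column per field, and `batch` has one entry per muon, so the pair describes the ragged structure without nesting.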
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
--- | ||
name: Custom autodiff in Awkward Array | ||
postdate: 2025-01-20 | ||
categories: | ||
- Analysis tools | ||
durations: | ||
- 3 months | ||
- 1 year | ||
experiments: | ||
- Any | ||
skillset: | ||
- Python | ||
status: | ||
- Available | ||
project: | ||
- Any | ||
location: | ||
- Any | ||
commitment: | ||
- Any | ||
program: | ||
- Any | ||
shortdescription: "Replace JAX with custom autodiff in Awkward Array" | ||
description: > | ||
At an [Analysis Tools](https://indico.cern.ch/event/1387764/) | ||
meeting and in | ||
[awkward#3349](https://github.com/scikit-hep/awkward/discussions/3349), | ||
we've discussed the possibility of switching from JAX to a custom | ||
implementation of automatic differentiation (autodiff, | ||
also known as autograd). The problems with JAX are related to its | ||
interface, which is intended to do much more than just | ||
autodiff. Also, implementing eager autodiff is likely not a major | ||
project, especially if we take advantage of [complex-step | ||
differentiation](https://www.hedonisticlearning.com/posts/complex-step-differentiation.html). This | ||
project would either implement autodiff as a module within Awkward | ||
Array or as a new Scikit-HEP library (and possibly as a backend for | ||
Vector, too). | ||
contacts: | ||
- name: Jim Pivarski | ||
email: [email protected] | ||
|
||
mentees: # keep an empty list until the project has started or a student is identified | ||
# when that happens add a list with name: and link: attributes for each student |
# - name: Student's name |
# - link: #url for project page |
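Complex-step differentiation, mentioned in the description, evaluates a real-analytic function at `x + ih` and reads the derivative from the imaginary part, `f'(x) ≈ Im(f(x + ih)) / h`. Because no subtraction of nearly equal numbers is involved, `h` can be made extremely small. The following is a minimal standard-library sketch of the technique; the function names are hypothetical and this is not the design discussed in awkward#3349.

```python
import cmath
import math


def complex_step_derivative(f, x, h=1e-20):
    """Approximate f'(x) as Im(f(x + i*h)) / h for a real-analytic f."""
    return f(complex(x, h)).imag / h


def f(z):
    # Example function; it must be written with complex-aware operations.
    return cmath.exp(z) * cmath.sin(z)


x = 1.5
approx = complex_step_derivative(f, x)

# Analytic derivative of exp(x) * sin(x), for comparison.
exact = math.exp(x) * (math.sin(x) + math.cos(x))
```

The method only applies to functions that accept complex input and are analytic there, so operations such as `abs` or comparisons on the value would need special handling.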
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
--- | ||
name: Using std::maps in Awkward Array | ||
postdate: 2025-01-20 | ||
categories: | ||
- Analysis tools | ||
durations: | ||
- 3 months | ||
experiments: | ||
- Any | ||
skillset: | ||
- Python | ||
status: | ||
- Available | ||
project: | ||
- Any | ||
location: | ||
- Any | ||
commitment: | ||
- Any | ||
program: | ||
- Any | ||
shortdescription: "Implement sorted_map type in Awkward Array" | ||
description: > | ||
Awkward Array implements some data types as types with equivalent | ||
storage (e.g. lists of uint8 for strings) plus | ||
[ak.behavior](https://awkward-array.org/doc/main/reference/ak.behavior.html) | ||
to provide specialized functionality (e.g. printing as strings and | ||
broadcasting one string as one object). A basic type that has not | ||
been implemented is a key-value mapping, such as C++'s | ||
`std::map`. This is different from Awkward Array's "record" type, | ||
which has a fixed set of field names, each of which can have a | ||
different type. A key-value mapping has keys of one type (often but | ||
not always strings) and values of another, fixed type (not different | ||
for each key), like `std::map<std::string, int>`. When Uproot | ||
encounters C++ `std::map<K, V>` in a ROOT file, it produces an | ||
Awkward Array of lists of pairs of types `K` and `V` with name | ||
`"sorted_map"`. However, the "sorted map" behaviors that would make | ||
this data type useful have not yet been implemented in Awkward Array | ||
(see | ||
[awkward#780](https://github.com/scikit-hep/awkward/issues/780)). This | ||
project would add that functionality. | ||
contacts: | ||
- name: Jim Pivarski | ||
email: [email protected] | ||
|
||
mentees: # keep an empty list until the project has started or a student is identified | ||
# when that happens add a list with name: and link: attributes for each student |
# - name: Student's name |
# - link: #url for project page |
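To make the data layout concrete, here is a plain-Python sketch of what a "sorted map" lookup could look like when each entry is a list of key-value pairs sorted by key, as described above. It assumes nothing about Awkward Array's eventual API: `sorted_map_get` is a hypothetical helper, and nested Python lists stand in for the array.

```python
from bisect import bisect_left


def sorted_map_get(pairs, key, default=None):
    """Look up `key` in a list of (key, value) pairs sorted by key."""
    # The 1-tuple (key,) sorts immediately before any (key, value) pair,
    # so bisect_left finds the first pair whose key is >= the requested key.
    i = bisect_left(pairs, (key,))
    if i < len(pairs) and pairs[i][0] == key:
        return pairs[i][1]
    return default


# One sorted map per event, like a std::map<std::string, int> branch.
events = [
    [("a", 1), ("c", 3)],
    [],
    [("b", 2), ("c", 7)],
]

# Look up the same key in every event; missing keys give the default.
lookups = [sorted_map_get(pairs, "c") for pairs in events]
```

Because the pairs are sorted, each lookup is a binary search rather than a scan, which is the property a vectorized behavior could exploit.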
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,43 @@ | ||
--- | ||
name: Awkward Arrays with physical units | ||
postdate: 2025-01-20 | ||
categories: | ||
- Analysis tools | ||
durations: | ||
- 3 months | ||
experiments: | ||
- Any | ||
skillset: | ||
- Python | ||
status: | ||
- Available | ||
project: | ||
- Any | ||
location: | ||
- Any | ||
commitment: | ||
- Any | ||
program: | ||
- Any | ||
shortdescription: 'Adding "units" as Awkward Array metadata and conversions as behaviors' | ||
description: > | ||
Awkward Arrays already have an | ||
[ak.Array.attrs](https://awkward-array.org/doc/main/reference/generated/ak.Array.html#ak.Array.attrs) | ||
attribute that can carry arbitrary metadata (persistent or | ||
transient) and an | ||
[ak.behavior](https://awkward-array.org/doc/main/reference/ak.behavior.html) | ||
that attaches functionality to arrays. One, the other, or both of | ||
these could be used to implement physical units on arrays and | ||
convert between units when appropriate, such as putting two arrays | ||
into common units before adding | ||
them. [awkward#2468](https://github.com/scikit-hep/awkward/issues/2468) | ||
is a discussion of this feature and possible implementations. | ||
contacts: | ||
- name: Jim Pivarski | ||
email: [email protected] | ||
|
||
mentees: # keep an empty list until the project has started or a student is identified | ||
# when that happens add a list with name: and link: attributes for each student |
# - name: Student's name |
# - link: #url for project page |
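The idea of putting two arrays into common units before adding them can be sketched in plain Python. Everything here is hypothetical: nested lists stand in for ragged arrays, a unit string passed alongside the data stands in for metadata such as `attrs`, and `TO_METERS`, `convert`, and `add_with_units` are illustrative names rather than anything proposed in awkward#2468.

```python
# Conversion factors from each supported length unit to meters.
TO_METERS = {"m": 1.0, "cm": 0.01, "km": 1000.0}


def convert(values, from_unit, to_unit):
    """Rescale every number in a ragged list from one unit to another."""
    factor = TO_METERS[from_unit] / TO_METERS[to_unit]
    return [[v * factor for v in sub] for sub in values]


def add_with_units(a, a_unit, b, b_unit):
    """Add two ragged lists of equal shape, returning data in a's unit."""
    b_converted = convert(b, b_unit, a_unit)
    total = [
        [x + y for x, y in zip(sub_a, sub_b)]
        for sub_a, sub_b in zip(a, b_converted)
    ]
    return total, a_unit


lengths_m = [[1.0, 2.0], [], [3.0]]
lengths_cm = [[100.0, 50.0], [], [200.0]]

total, unit = add_with_units(lengths_m, "m", lengths_cm, "cm")
```

The design choice illustrated is that the result inherits the left operand's unit; a real implementation would also need to reject incompatible dimensions, such as adding a length to a time.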
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
--- | ||
name: Completing the Ragged library | ||
postdate: 2025-01-20 | ||
categories: | ||
- Analysis tools | ||
durations: | ||
- 3 months | ||
- 1 year | ||
experiments: | ||
- Any | ||
skillset: | ||
- Python | ||
status: | ||
- Available | ||
project: | ||
- Any | ||
location: | ||
- Any | ||
commitment: | ||
- Any | ||
program: | ||
- Any | ||
shortdescription: "Implement the remaining functions to make Ragged an Array-API compliant ragged array library" | ||
description: > | ||
Scikit-HEP's | ||
[Ragged](https://github.com/scikit-hep/ragged/discussions/6) library | ||
is an interface over Awkward Array that restricts it to ragged | ||
arrays only (no records, missing data, etc.) and satisfies | ||
the Consortium for Python Data API Standards' [Array API](https://data-apis.org/), which is rapidly | ||
becoming the standard interface for array libraries. As such, the | ||
requirements for Ragged are very precise: all required functions | ||
have already been stubbed out with full docstrings, and about half | ||
of them have been implemented. This project would be to complete it | ||
and promote it as a fully functional, Array API-compliant ragged | ||
array library. | ||
contacts: | ||
- name: Jim Pivarski | ||
email: [email protected] | ||
|
||
mentees: # keep an empty list until the project has started or a student is identified | ||
# when that happens add a list with name: and link: attributes for each student |
# - name: Student's name |
# - link: #url for project page |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
--- | ||
name: Solidify the Scikit-HEP GPU ecosystem | ||
postdate: 2025-01-20 | ||
categories: | ||
- Analysis tools | ||
durations: | ||
- 3 months | ||
- 1 year | ||
experiments: | ||
- Any | ||
skillset: | ||
- Python | ||
- CUDA | ||
status: | ||
- Available | ||
project: | ||
- Any | ||
location: | ||
- Any | ||
commitment: | ||
- Any | ||
program: | ||
- Any | ||
shortdescription: "Test and identify missing capabilities in the Scikit-HEP GPU ecosystem" | ||
description: > | ||
Awkward Array has CUDA kernels and Numba-CUDA support (see [this | ||
training](https://hsf-training.github.io/array-oriented-programming/5-gpu.html#awkward-array)), | ||
and | ||
[cuda-histogram](https://github.com/scikit-hep/cuda-histogram) exists as well, but | ||
these features haven't been heavily tested and have probably never | ||
been used in an analysis. This project would be to try using | ||
Scikit-HEP libraries (including Vector and any other relevant | ||
libraries) in an analysis using GPUs to find out what the pain | ||
points are, and then either fix them directly or raise awareness | ||
among the developers. | ||
contacts: | ||
- name: Jim Pivarski | ||
email: [email protected] | ||
|
||
mentees: # keep an empty list until the project has started or a student is identified | ||
# when that happens add a list with name: and link: attributes for each student |
# - name: Student's name |
# - link: #url for project page |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
--- | ||
name: Modifying existing TTrees in Uproot | ||
postdate: 2025-01-20 | ||
categories: | ||
- Analysis tools | ||
durations: | ||
- 3 months | ||
- 1 year | ||
experiments: | ||
- Any | ||
skillset: | ||
- Python | ||
status: | ||
- Available | ||
project: | ||
- Any | ||
location: | ||
- Any | ||
commitment: | ||
- Any | ||
program: | ||
- Any | ||
shortdescription: "Add new columns to existing TTrees (99% done) and/or new rows (new project) in Uproot" | ||
description: > | ||
Uproot can add new objects to existing ROOT files through | ||
[uproot.update](https://uproot.readthedocs.io/en/latest/uproot.writing.writable.update.html), | ||
but it would be even more useful if it could modify existing TTrees | ||
in place. Zoë Bilodeau implemented the ability to add new | ||
columns/TBranches, which is especially useful for backfilling data | ||
(e.g. adding an array of `False` for triggers that didn't exist at | ||
the time of data-taking). This implementation is nearly done (see | ||
[uproot#1155](https://github.com/scikit-hep/uproot5/pull/1155)), | ||
apart from a few corner cases that need to be tested and | ||
debugged. It would also be useful to be able to add rows/entries, | ||
which would be an entirely new project. Completing the | ||
adding-columns project would provide the experience necessary to | ||
tackle the adding-rows project. | ||
contacts: | ||
- name: Jim Pivarski | ||
email: [email protected] | ||
|
||
mentees: # keep an empty list until the project has started or a student is identified | ||
# when that happens add a list with name: and link: attributes for each student |
# - name: Student's name |
# - link: #url for project page |