Introduce FieldInfo#467

Open
mabruzzo wants to merge 12 commits intocholla-hydro:devfrom
mabruzzo:FieldNameMap
Collaborator

@mabruzzo mabruzzo commented Jan 28, 2026

This is a WIP PR. I need to revisit this to make sure it's coherent and to tweak a few things.


Overview

This PR introduces the FieldInfo type. The Grid3D type stores a FieldInfo instance called field_info.

This type does a few things:

  • it maps strings holding field names to the field's id if present (the field id is the value held by grid_enum)
  • it does the reverse mapping
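  • it also tracks a few other properties of each field (its field-kind and which field-data buffer it should be read from during serialization; both are described below)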

To help illustrate the benefits of this type, I partially refactored the Output_Data function.
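
As a rough illustration of the kind of simplification this enables, here is a minimal sketch of serialization code that loops over field ids and looks up each name, rather than repeating a block per field. Everything in it is a stand-in: MockFieldInfo, field_name, and write_all_fields are hypothetical names, the linear search replaces the real hash-map lookup, and a std::map plays the role of the output file. Only field_id(<name>) mirrors a method named in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for FieldInfo: the position of a name in `names` is
// its field id (the real type uses a dedicated hash map instead).
struct MockFieldInfo {
  std::vector<std::string> names;

  // Returns the field id for `name`, or -1 when the field is absent.
  int field_id(const std::string& name) const {
    for (std::size_t i = 0; i < names.size(); i++) {
      if (names[i] == name) return static_cast<int>(i);
    }
    return -1;
  }

  // Reverse mapping from field id to name (hypothetical method name).
  const std::string& field_name(int id) const {
    assert(id >= 0 && static_cast<std::size_t>(id) < names.size());
    return names[static_cast<std::size_t>(id)];
  }

  int n_fields() const { return static_cast<int>(names.size()); }
};

// Stand-in for a serialization routine. `buffer` holds one contiguous block of
// n_cells values per field, ordered by field id. The returned map plays the
// role of the output file (dataset name -> data).
std::map<std::string, std::vector<double>> write_all_fields(
    const MockFieldInfo& info, const std::vector<double>& buffer, std::size_t n_cells) {
  assert(buffer.size() == n_cells * info.names.size());
  std::map<std::string, std::vector<double>> out;
  for (int id = 0; id < info.n_fields(); id++) {
    const std::size_t offset = static_cast<std::size_t>(id) * n_cells;
    auto first = buffer.begin() + static_cast<std::ptrdiff_t>(offset);
    auto last = first + static_cast<std::ptrdiff_t>(n_cells);
    out[info.field_name(id)] = std::vector<double>(first, last);
  }
  return out;
}
```

Adding a new field in this sketch only means adding a name to the list; the writing loop itself does not change.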

Motivation

There are 2 main motivations for this type:

  1. More immediately, it simplifies reading/writing field data files. This logic needs to map data to field names.

    • By using the logic implemented in FieldInfo,
      • we can reduce preprocessor statements in serialization logic
      • we can replace the common pattern where code is repeated once for each field-name/ptr pair with a loop over field names
      • it's worth emphasizing that this is very useful when it comes to (de)serialization of passively advected scalars.
        • For context, I don't think any IO code, outside of the HDF5 (de)serialization logic, works with individual scalar fields.
        • Prior to this PR, if we wanted, say, the Slice logic to write out scalar fields individually, we would have to enumerate all of the slice fields (and update this logic each time we added a new slice field). Now, it's relatively straightforward to loop over slice fields.
    • To concretely illustrate the benefits of FieldInfo, I partially refactored the Output_Data function (as noted in the Overview).
    • This logic can also be used to help simplify the slice, projection, and rotated-projection logic.
  2. Longer-term, if we use field_info.field_id(<name>) to infer the field ids of passive scalars, this will help us move toward an architecture where physics modules that work with passive scalars are always compiled and fully controlled at runtime.

    • this becomes useful once we know that the GasEnergy field id is NEVER inferred from grid_enum::num_fields. At that point, we can make the passive scalars always have the highest field ids and the number of passive scalars could be configured at runtime.
    • Why is this useful? For concreteness, let's consider the Dust module (this also applies to chemistry).
      • If the number of passive scalars is settable at runtime and we eliminate grid_enum::dust_density from the code base, then the Dust module will be configurable at runtime.
      • How do we eliminate grid_enum::dust_density? As an example, consider Dust_Kernel. We could modify that function to accept an extra argument called dust_density_index, which would be determined on the host using field_info.field_id("dust_density"). Then, each occurrence of grid_enum::dust_density would be replaced by dust_density_index.
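
To make the Dust_Kernel idea a bit more concrete, here is a host-only sketch of passing the field index as a runtime argument. It is not the actual kernel: lookup_field_id is a hypothetical stand-in for field_info.field_id(<name>), dust_kernel_sketch is an ordinary C++ function rather than a GPU kernel, and the update rule (scaling by a factor) is a placeholder rather than real dust physics.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical host-side lookup standing in for field_info.field_id(<name>).
// Returns -1 when the field is absent.
int lookup_field_id(const std::vector<std::string>& names, const std::string& name) {
  for (std::size_t i = 0; i < names.size(); i++) {
    if (names[i] == name) return static_cast<int>(i);
  }
  return -1;
}

// Host-side stand-in for Dust_Kernel. `conserved` stores one contiguous block
// of n_cells values per field. The dust field is located through the runtime
// argument `dust_density_index` rather than through the compile-time constant
// grid_enum::dust_density.
void dust_kernel_sketch(std::vector<double>& conserved, std::size_t n_cells,
                        int dust_density_index, double factor) {
  assert(dust_density_index >= 0);  // the caller must check that the field exists
  const std::size_t offset = static_cast<std::size_t>(dust_density_index) * n_cells;
  assert(offset + n_cells <= conserved.size());
  for (std::size_t i = 0; i < n_cells; i++) {
    conserved[offset + i] *= factor;  // placeholder update, not real physics
  }
}
```

The lookup happens once on the host, so the string comparison never runs inside the per-cell loop.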

More about FieldInfo

As noted above, the FieldInfo type primarily maps strings holding field names to the field's id (if present) and vice versa.

The type also tracks whether a field should be read from the host field-data buffer or the device field-data buffer during serialization (see footnote 1).

Additionally, the FieldInfo type makes use of the idea of field-kinds.
The kinds are defined by enumerators of the field::Kind enumeration.
The enumerators include (these are always present, regardless of how Cholla is compiled):

  • field::Kind::HYDRO (it includes GasEnergy when Cholla is configured to use the DualEnergy formalism)
  • field::Kind::PASSIVE_SCALAR
  • field::Kind::MAGNETIC

There are 2 related pieces of functionality:

  1. FieldInfo can be used to loop over the field ids of all fields of a given field kind. The following snippet loops over the field ids of all field::Kind::HYDRO fields
      for (int field_id : field_info.get_id_range(field::Kind::HYDRO)) {
        // do-work ...
      }
  2. FieldInfo lets you query the number of fields with a given field-kind. For example, the following snippet queries the number of hydro fields:
    int n_hydro = field_info.n_fields(field::Kind::HYDRO);
    (This feature is present mostly because it was trivial to implement).
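
For readers wondering how a range like this can plug into a range-based for loop, here is one possible sketch. It assumes that field ids are assigned kind by kind, so that each kind occupies one contiguous block of ids; that assumption, along with the names IdRange and KindTable, is hypothetical and not taken from the PR. Only get_id_range and n_fields mirror the method names shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

enum class Kind { HYDRO = 0, PASSIVE_SCALAR = 1, MAGNETIC = 2 };

// Half-open range [first, last) of field ids, usable in a range-based for loop.
class IdRange {
 public:
  class Iterator {
   public:
    explicit Iterator(int v) : v_(v) {}
    int operator*() const { return v_; }
    Iterator& operator++() {
      ++v_;
      return *this;
    }
    bool operator!=(const Iterator& other) const { return v_ != other.v_; }

   private:
    int v_;
  };

  IdRange(int first, int last) : first_(first), last_(last) { assert(first <= last); }
  Iterator begin() const { return Iterator(first_); }
  Iterator end() const { return Iterator(last_); }
  int size() const { return last_ - first_; }

 private:
  int first_;
  int last_;
};

// Assumes each kind occupies one contiguous block of ids, in enumerator order.
class KindTable {
 public:
  // counts[k] is the number of fields whose kind has the integer value k.
  explicit KindTable(const std::vector<int>& counts) {
    int start = 0;
    for (int n : counts) {
      ranges_.push_back(IdRange(start, start + n));
      start += n;
    }
  }

  IdRange get_id_range(Kind k) const { return ranges_.at(static_cast<std::size_t>(k)); }
  int n_fields(Kind k) const { return get_id_range(k).size(); }

 private:
  std::vector<IdRange> ranges_;
};

}  // namespace sketch
```

With contiguous blocks, both queries reduce to storing two integers per kind, which is consistent with the remark that n_fields was trivial to add.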

The FieldWriter Type

A portion of this PR was dedicated to refactoring Write_Grid_HDF5 and the creation of the FieldWriter type in order to make use of FieldInfo.

At a high level, FieldWriter tracks all customization for the general field data output and is a callable that can be invoked to write the output.

It's worth mentioning that:

  • all customization remains controlled by compile-time ifdef statements (i.e. OUTPUT_MOMENTUM, OUTPUT_ENERGY, OUTPUT_METALS, OUTPUT_ELECTRONS).
  • I didn't touch the code related to recording Temperature, Gravitational Potential, or magnetic fields (these are all special cases and somewhat beyond the scope of this PR)
  • the output customization ONLY impacts hdf5 files (i.e. NOT binary outputs and NOT ASCII outputs)
  • this is all consistent with the historical behavior.

With that said:

  • the code is written so that it will be trivial to support runtime configuration. I chose not to convert the existing options because it's not obvious that they are the best choices (and it's not obvious that they are actually used). We might want a set of options that are more consistent with the float32 options.
  • it would be straightforward to make the ASCII format support configuration, but I had some questions about that first (and I think the plan is to eventually eliminate the binary format)

Implementation of FieldInfo: FrozenKeyIdxBiMap

The FieldInfo type is implemented in terms of FrozenKeyIdxBiMap. That type is a custom map type that allows fast mapping from string keys to indices and vice versa.

I expect it to be faster than std::map or std::unordered_map for our purposes (although I haven't benchmarked it). For some context:

  • std::map is typically implemented as a red-black tree. Its lookups take O(log n) time, which is the worst average-case complexity of the three options.
  • std::unordered_map is typically implemented as a hash table that uses linked-list chaining to handle key collisions.
  • FrozenKeyIdxBiMap is implemented with a hash table that uses open-addressing with linear probing for key collisions.

For clarity, I'll briefly recap the distinction between hash maps that use chaining and those that use linear probing. Let's consider the process of looking up a key. In both cases, keys are organized in a storage array, and a key's hash value always corresponds to an index of this array.

  • for chaining, each element of the storage array holds the head of a linked list specifying a "chain" of all known key-value pairs whose keys share the same hash value. During lookup, we need to search for the key through the linked list.
  • for linear probing, the storage array directly stores key-value pairs. During lookup, the hash-value tells us the starting location for performing a linear search through the storage array.

For our purposes (i.e. a relatively small map where the contents easily fit in cache and there is no need for resizing/deletion), the FrozenKeyIdxBiMap should have significantly better cache-locality than std::unordered_map. With that said, I haven't actually benchmarked this.
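
For anyone who wants to see the linear-probing lookup spelled out, here is a small sketch of a frozen, bidirectional key/index map. It is not the PR's FrozenKeyIdxBiMap: TinyFrozenBiMap and its methods are hypothetical names, std::hash is used for convenience, and the slot array stores indices into the key array rather than storing key-value pairs directly. The key array doubles as the reverse (index to key) mapping, so no second copy of the keys is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

constexpr int kEmptySlot = -1;  // marks an unused slot in the probe table

// Illustrative fixed-content bidirectional map (NOT the PR's FrozenKeyIdxBiMap).
class TinyFrozenBiMap {
 public:
  // The position of each key in `keys` becomes its index. Keys must be unique.
  explicit TinyFrozenBiMap(const std::vector<std::string>& keys)
      : keys_(keys), slots_(2 * keys.size() + 1, kEmptySlot) {
    for (std::size_t idx = 0; idx < keys_.size(); idx++) {
      std::size_t slot = start_slot(keys_[idx]);
      // Linear probing: step forward until an unused slot is found. The table
      // is never more than half full, so this always terminates.
      while (slots_[slot] != kEmptySlot) {
        assert(keys_[static_cast<std::size_t>(slots_[slot])] != keys_[idx]);
        slot = (slot + 1) % slots_.size();
      }
      slots_[slot] = static_cast<int>(idx);
    }
  }

  // key -> index. Returns -1 when the key is absent.
  int find(const std::string& key) const {
    std::size_t slot = start_slot(key);
    while (slots_[slot] != kEmptySlot) {
      const int idx = slots_[slot];
      if (keys_[static_cast<std::size_t>(idx)] == key) return idx;
      slot = (slot + 1) % slots_.size();  // collision: check the next slot
    }
    return -1;  // reached an empty slot, so the key was never inserted
  }

  // index -> key. Reuses the key storage, so no extra copy is required.
  const std::string& key_at(int idx) const {
    assert(idx >= 0 && static_cast<std::size_t>(idx) < keys_.size());
    return keys_[static_cast<std::size_t>(idx)];
  }

  std::size_t size() const { return keys_.size(); }

 private:
  // The hash value selects where the linear search starts.
  std::size_t start_slot(const std::string& key) const {
    return std::hash<std::string>{}(key) % slots_.size();
  }

  std::vector<std::string> keys_;  // index -> key
  std::vector<int> slots_;         // probe table holding indices into keys_
};
```

Because nothing is ever inserted or removed after construction, the sketch needs no tombstones and no resizing logic, which is where most of the complexity of a general-purpose open-addressing table comes from.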

It's also worth mentioning that:

  • std::unordered_map would probably require us to store a second copy of all keys if we wanted to make the reverse mapping efficient.
  • if we get rid of the internal usage of std::shared_ptr (see footnote 2), it would be easy to make FrozenKeyIdxBiMap work on GPUs (we probably don't want to do that, but it's worth mentioning the option).

Aside: Originally, I intended to work directly with a FrozenKeyIdxBiMap, but I decided to make FieldInfo as this PR progressed.

Note

A case could be made for re-implementing FrozenKeyIdxBiMap in terms of std::map or std::unordered_map, in the name of simplicity. I'm open to doing that if that's your preference.

Footnotes

  1. It's not entirely clear that this should be tracked as part of FieldInfo, but it is indeed a useful quantity to track.

  2. This is left over from when I implemented a similar data structure in Enzo-E. We don't really get much benefit from it.

#endif // SCALAR
#endif

// 3D case
Collaborator

I'm a little confused about why this conditional now only exists for the 3D case - is it because MHD writes only exist for that case?

Collaborator Author

I'm not entirely sure I understand what you are talking about. Can you be a little more specific?

I just checked back, and I don't think anything has changed.

Both before the change and after this PR:

  • hydro fields (density, momentum_[xyz], Energy, & GasEnergy) and passive scalars are written for 1D, 2D, & 3D sims
  • the temperature field can be written for 1D, 2D, and 3D sims
  • the gravitational potential can only be written for 3D sims
  • magnetic fields can only be written for 3D sims (previously, the logic conditionally enabled by #ifdef MHD was definitely nested within the if (H.nx > 1 && H.ny > 1 && H.nz > 1) { block)

As for why magnetic fields are only written in the 3D case: your guess is probably correct (but only Bob would know for sure).


/*! A callable method that writes a rotated projection of the grid data to file.
*/
void operator()(Grid3D &G, Parameters P, int nfile, const FnameTemplate &fname_template) const;
Collaborator

I'm confused by the choice of 'operator' for the name of this function, if it is just for writing rotated projections.

Collaborator Author

A class member function named operator() overloads the "function call operator."

  • Suppose we have an instance of the type (FieldWriter) called my_field_writer. We would invoke this method by calling my_field_writer(G, P, nfile, fname_template)
  • So, in a sense, instances of FieldWriter act just like configurable functions
  • For added context, you would implement the special __call__ method to get analogous behavior for a Python class.

We could name it something else. But the most scalable way to do that involves defining a base class with a virtual method, where each kind of output would implement a subclass that overrides the virtual method.
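
To make the comparison concrete, here is a small sketch of both options. None of these types come from the PR: PrefixWriter, OutputBase, SliceOutput, ProjectionOutput, and write_all are hypothetical, and each "write" just returns a file name instead of touching the disk.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Option 1: a configurable callable. The configuration lives in the data
// members and operator() makes instances usable like a function.
struct PrefixWriter {
  std::string prefix;  // hypothetical configuration option

  std::string operator()(int nfile) const { return prefix + std::to_string(nfile); }
};

// Option 2: a named virtual method declared on a base class. Each kind of
// output is a subclass that overrides it.
struct OutputBase {
  virtual ~OutputBase() = default;
  virtual std::string write(int nfile) const = 0;
};

struct SliceOutput : OutputBase {
  std::string write(int nfile) const override { return "slice_" + std::to_string(nfile); }
};

struct ProjectionOutput : OutputBase {
  std::string write(int nfile) const override { return "proj_" + std::to_string(nfile); }
};

// Driver that treats every output uniformly through the base-class interface.
std::vector<std::string> write_all(const std::vector<std::unique_ptr<OutputBase>>& outputs,
                                   int nfile) {
  std::vector<std::string> written;
  for (const auto& out : outputs) {
    assert(out != nullptr);
    written.push_back(out->write(nfile));
  }
  return written;
}
```

With option 1, PrefixWriter{"grid_"}(3) reads like a plain function call. Option 2 gives the method a descriptive name at the cost of an inheritance hierarchy and heap-allocated outputs.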
