hipscatalog-gen

This project was created following the LINCC Frameworks Python Project Template (https://lincc-ppt.readthedocs.io/en/latest/).

Overview

hipscatalog-gen is a Python package for building HiPS-compliant catalog hierarchies from large astronomical tables using Dask and LSDB. It is inspired by and extends the logic of the CDS Hipsgen-cat.jar tool, providing a scalable, parallelized Python implementation suitable for large-scale workflows.

Documentation: https://linea-it.github.io/hipscatalog_gen/

The pipeline supports three selection modes, configured in the YAML file under algorithm.selection_mode:

mag_global — global magnitude-complete selection.
score_global — global selection driven by an arbitrary score/expression.
score_density_hybrid — density-driven depths 1..density_up_to_depth (default 4) with score-based distribution afterwards.
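
As a rough illustration of how score_density_hybrid splits work by depth, the sketch below maps a depth to the strategy that would handle it. The function name and the returned labels are hypothetical and not part of the package; only the density_up_to_depth parameter and its default of 4 come from the description above.

```python
def strategy_for_depth(depth: int, density_up_to_depth: int = 4) -> str:
    """Return which strategy handles a given depth (illustrative only)."""
    if depth < 1:
        raise ValueError("depth must be >= 1")
    # Depths 1..density_up_to_depth are density-driven; deeper ones are score-based.
    return "density" if depth <= density_up_to_depth else "score"


if __name__ == "__main__":
    for depth in range(1, 8):
        print(depth, strategy_for_depth(depth))
```

With the default setting, depths 1 through 4 map to the density strategy and depth 5 onward to the score strategy.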

Quick Start (PyPI)

Install from PyPI into a fresh environment and run with a config file:

conda create -n hipscatalog-gen "python>=3.11"
conda activate hipscatalog-gen
pip install hipscatalog-gen

If you do not have Conda yet, install it first using the official docs:

Conda install guide: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
Miniconda install guide: https://www.anaconda.com/docs/getting-started/miniconda/install

Fetch the example template and adapt it to your catalog:

curl -O https://raw.githubusercontent.com/linea-it/hipscatalog_gen/main/examples/configs/config.template.yaml
cp config.template.yaml config.yaml

Run the pipeline:

hipscatalog-gen --config config.yaml
# or: python -m hipscatalog_gen.cli --config config.yaml

Developer Install

For local development (editable install + tooling):

git clone https://github.com/linea-it/hipscatalog_gen.git
cd hipscatalog_gen
conda create -n hipscatalog-gen-dev "python>=3.11"
conda activate hipscatalog-gen-dev
pip install -e .[dev]

Optionally expose the env as a Jupyter kernel:

python -m ipykernel install --user --name hipscatalog-gen --display-name "hipscatalog-gen"

Configuration

The pipeline is fully configured through a YAML file.

A complete annotated template is provided in the ./examples/configs folder as:

config.template.yaml

When installed from PyPI, download the template directly:

curl -O https://raw.githubusercontent.com/linea-it/hipscatalog_gen/main/examples/configs/config.template.yaml

To create your own configuration:

cp config.template.yaml config.yaml

Then edit config.yaml to match your input catalog and selection preferences. Additional examples are available under ./examples/configs/.

The selection mode is set under algorithm.selection_mode and must be one of mag_global, score_global, or score_density_hybrid. Mode-specific parameters live in the matching blocks algorithm.mag_global, algorithm.score_global, and algorithm.score_density_hybrid, with optional shared defaults in algorithm.selection_defaults.
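
To make the layout concrete, here is a minimal sketch that mirrors the algorithm section as a Python dictionary and resolves the parameters for the active mode. The helper name is hypothetical, the parameter values are placeholders, and the precedence rule (mode block overrides shared defaults) is an assumption rather than documented behavior.

```python
from typing import Any

# Hypothetical fragment mirroring the YAML layout described above.
# Values are placeholders, not recommended settings.
config: dict[str, Any] = {
    "algorithm": {
        "selection_mode": "score_density_hybrid",
        "selection_defaults": {"keep_invalid_values": False},
        "score_density_hybrid": {"density_up_to_depth": 4},
    }
}

VALID_MODES = ("mag_global", "score_global", "score_density_hybrid")


def effective_mode_params(cfg: dict[str, Any]) -> dict[str, Any]:
    """Merge shared defaults with the active mode's block."""
    algo = cfg["algorithm"]
    mode = algo["selection_mode"]
    if mode not in VALID_MODES:
        raise ValueError(f"unknown selection_mode: {mode!r}")
    merged = dict(algo.get("selection_defaults", {}))
    # Assumption: values in the mode-specific block override shared defaults.
    merged.update(algo.get(mode, {}))
    return merged


if __name__ == "__main__":
    print(effective_mode_params(config))
```

The annotated config.template.yaml remains the authoritative reference for which keys each block accepts.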

Cluster memory policy (current behavior):

The pipeline now uses fixed defaults optimized for large catalogs:
- large intermediate DataFrames are not persisted
- early, large compute materializations are avoided whenever possible

Deprecated options:
- cluster.low_memory_mode is deprecated (accepted with a warning, but has no effect).
- cluster.persist_ddfs and cluster.avoid_computes_wherever_possible are deprecated and ignored.

For streamed stage-2 writes (deeper depths), an active dask.distributed client is required.
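
When migrating an older configuration, it can help to scan the cluster block for the deprecated options listed above. The following is a minimal sketch that works on a plain dictionary; the helper is hypothetical and is not the check the package itself performs.

```python
import warnings
from typing import Any

# Key names come from the policy above; the helper itself is hypothetical.
DEPRECATED_CLUSTER_KEYS = (
    "low_memory_mode",
    "persist_ddfs",
    "avoid_computes_wherever_possible",
)


def find_deprecated_cluster_keys(cfg: dict[str, Any]) -> list[str]:
    """Return deprecated keys present in the cluster block, warning for each."""
    cluster = cfg.get("cluster", {})
    found = [key for key in DEPRECATED_CLUSTER_KEYS if key in cluster]
    for key in found:
        warnings.warn(
            f"cluster.{key} is deprecated and has no effect",
            DeprecationWarning,
            stacklevel=2,
        )
    return found
```

For example, passing {"cluster": {"mode": "slurm", "persist_ddfs": True}} reports persist_ddfs, which can then be removed from the YAML file.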

Running

The pipeline can be executed either as a Python library or from the command line.

Run as a library

from hipscatalog_gen.config import load_config, load_config_from_dict, display_available_configs
from hipscatalog_gen.pipeline.main import run_pipeline

# Load the YAML configuration, then run the full pipeline with it.
cfg = load_config("config.yaml")
run_pipeline(cfg)

Run from the command line

List available selection modes:

hipscatalog-gen --list-modes

Run with a config file:

hipscatalog-gen --config config.yaml
# or: python -m hipscatalog_gen.cli --config config.yaml

No dedicated sbatch wrapper script is required. For HPC usage, set cluster.mode: slurm in the YAML and run the same command above.

Validate a config without running:

hipscatalog-gen --check-config config.yaml

Enable JSON logs (process.jsonl) with the --json-logs flag when running the pipeline:

hipscatalog-gen --config config.yaml --json-logs

Summarize an existing telemetry.json:

hipscatalog-gen --telemetry /path/to/telemetry.json
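
If the pipeline is launched from another Python script, the commands above can be wrapped with the standard subprocess module. This is only a sketch: both helper names are hypothetical, and the only flags used are --config and --json-logs as documented above.

```python
import subprocess


def build_command(config_path: str, json_logs: bool = False) -> list[str]:
    """Build the argument list for a pipeline run."""
    cmd = ["hipscatalog-gen", "--config", str(config_path)]
    if json_logs:
        cmd.append("--json-logs")
    return cmd


def run_pipeline_cli(config_path: str, json_logs: bool = False) -> int:
    """Run the CLI and return its exit code; requires the package to be installed."""
    completed = subprocess.run(build_command(config_path, json_logs), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting issues when the config path contains spaces.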

Output Structure

Each run generates a HiPS-compliant directory structure under output.out_dir:

Norder*/Dir*/Npix*.tsv → Per-depth tiles.
Norder*/Allsky.tsv → Optional all-sky tables.
densmap_o.fits → Density maps for all depths up to level_limit.
Moc.fits / Moc.json → Multi-Order Coverage maps.
properties / metadata.xml → HiPS metadata descriptors.
process.log / arguments → Run logs and configuration snapshot (optional process.jsonl when --json-logs).
telemetry.json → Run summary with per-stage durations and input/output counts.

Note: an existing output.out_dir causes an error; set output.overwrite: true to clear it before writing.
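
A quick way to sanity-check a finished run is to count the tiles written at each order. The sketch below relies only on the Norder*/Dir*/Npix*.tsv layout listed above; the function name is hypothetical and the package does not necessarily ship such a helper.

```python
from collections import Counter
from pathlib import Path


def count_tiles_per_order(out_dir: Path) -> dict[str, int]:
    """Count Npix*.tsv tiles under each Norder* directory."""
    counts: Counter[str] = Counter()
    for tile in Path(out_dir).glob("Norder*/Dir*/Npix*.tsv"):
        # tile.parents[1] is the Norder* directory two levels above the tile.
        counts[tile.parents[1].name] += 1
    # Keys are sorted as strings, so "Norder10" sorts before "Norder2".
    return dict(sorted(counts.items()))
```

Allsky.tsv files sit directly under Norder* and are therefore not matched by the glob pattern.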

Mode Summary

mag_global: magnitude-complete slices across all depths.
mag_global hist_peak default bounds: when adaptive_range=hist_peak and mag_min/mag_max are not provided, the histogram range is the global min/max clipped to [-2, 40]: mag_min is clipped to >= -2, and mag_max is taken from the histogram peak within [-2, min(global_max, 40)].
score_global: score-based slices across all depths.
score_density_hybrid: density-driven tiles for depths 1..density_up_to_depth (default 4), then score slices for deeper levels.
For deeper streamed depths, bucket processing runs on Dask workers (Client.submit) and keeps the driver lightweight (orchestration only).
Stream merge uses bounded fan-in (auto-tuned from worker concurrency + RLIMIT_NOFILE) to reduce EMFILE (Too many open files) risk.
Ordering and ties: order_desc controls ascending/descending (default ascending); optional tie_column breaks ties before falling back to RA/DEC.
Invalids: keep_invalid_values (per mode or in selection_defaults) can map NaN/Inf to a sentinel when adaptive_range=complete, sending them to the last slice; rejected for hist_peak.
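
The ordering, tie-breaking, and invalid-value rules above can be sketched on plain Python rows. This is an illustration under stated assumptions, not the package's implementation: the function name and the "ra"/"dec" column names are hypothetical, tie-breakers are assumed to stay ascending even when order_desc is true, and kept invalid rows are simply appended after all valid rows as a stand-in for a sentinel that lands in the last slice.

```python
import math
from typing import Any, Optional


def rank_rows(
    rows: list[dict[str, Any]],
    value_column: str,
    order_desc: bool = False,
    tie_column: Optional[str] = None,
    keep_invalid_values: bool = False,
) -> list[dict[str, Any]]:
    """Order rows by value, break ties, and place non-finite values last."""
    valid = [r for r in rows if math.isfinite(r[value_column])]
    invalid = [r for r in rows if not math.isfinite(r[value_column])]

    def tie_key(row: dict[str, Any]) -> tuple:
        # Optional tie_column first, then RA/DEC as the final fallback.
        lead = (row[tie_column],) if tie_column else ()
        return lead + (row["ra"], row["dec"])

    # Two stable passes: tie-breakers first, then the main value.
    ordered = sorted(valid, key=tie_key)
    ordered.sort(key=lambda r: r[value_column], reverse=order_desc)

    # Kept invalid rows go after every valid row; otherwise they are dropped.
    if keep_invalid_values:
        ordered.extend(sorted(invalid, key=tie_key))
    return ordered
```

Sorting in two stable passes keeps the tie-breaking rule independent of the main sort direction, which makes the assumption about ascending tie-breakers easy to change.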
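
The bounded fan-in idea mentioned above for the stream merge can be illustrated with a multi-pass merge that never reads more than fan_in sorted inputs at once. The sketch works on in-memory lists rather than files, and both function names and the fan-in heuristic (including the reserve of 64 descriptors) are hypothetical; the package's auto-tuning may differ.

```python
import heapq
from typing import Iterable


def fan_in_from_fd_limit(soft_limit: int, workers: int, reserve: int = 64) -> int:
    """Pick a per-worker fan-in from an open-file limit (hypothetical heuristic)."""
    per_worker = (soft_limit - reserve) // max(workers, 1)
    return max(2, per_worker)


def bounded_merge(runs: Iterable[list[int]], fan_in: int) -> list[int]:
    """Merge sorted runs while reading at most fan_in inputs at a time."""
    if fan_in < 2:
        raise ValueError("fan_in must be >= 2")
    current = [list(run) for run in runs]
    while len(current) > 1:
        # Each pass merges groups of at most fan_in runs into one longer run.
        current = [
            list(heapq.merge(*current[i : i + fan_in]))
            for i in range(0, len(current), fan_in)
        ]
    return current[0] if current else []
```

On Unix systems the soft limit can be read with resource.getrlimit(resource.RLIMIT_NOFILE)[0]. With files instead of lists, each group in a pass would hold at most fan_in input handles open, which is what keeps the process below the EMFILE threshold.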

Development and Contributing

This project follows the LINCC Frameworks Python Project Template.

To set up a development environment:

pip install -e .[dev]
pre-commit install
pytest

Contributions, bug reports, and pull requests are welcome. Please report problems via GitHub Issues: https://github.com/linea-it/hipscatalog_gen/issues

Acknowledgments

This project acknowledges the foundational work of the CDS HiPS Catalog Tool (Hipsgen-cat.jar) developed by the Strasbourg Astronomical Data Center (Unistra/CNRS, 2016), which inspired aspects of the software design. More information: https://aladin.cds.unistra.fr/hips/HipsCat.gml.

The mag_global mode builds on an idea originally suggested by Julia Gschwend.

Citation

If you use this package in your research, please cite:

Silva, L. L. C., et al. (2026). hipscatalog-gen: A Python HiPS Catalog Pipeline. LIneA – Laboratório Interinstitucional de e-Astronomia. Available at: https://github.com/linea-it/hipscatalog_gen

License

This project is licensed under the MIT License. See the LICENSE file for details.
