Roadmap towards GNU Radio 4.0 Beta release @ FAIR #311

Open
5 of 20 tasks
RalphSteinhagen opened this issue Apr 11, 2024 · 0 comments
Labels
documentation Improvements or additions to documentation

Meeting Summary: Short- to Medium-Term Roadmap towards official GNU Radio 4.0 (GR4) Beta release @ FAIR

participants: @wirew0rm, @drslebedev, @RalphSteinhagen
overarching goal: getting GR4 in shape by the end of May (Beta0) and EU GR Day Workshop (BetaN | official release)

CALL#5 see also Etherpad

Main topics to be covered:

  • Role-Based Access Control (RBAC): flesh out the operational integration into GR4 & OpenCMW
    • Start with using Keycloak
    • CIT has a local Keycloak instance that we may use
    • CI/CD and local development need a generic instance to be set up for testing/debugging
    • implement role retrieval using OAuth 2.0 and/or OpenID Connect
    • Important: GR4/services do not need to authenticate (i.e. handle personal data <-> EU GDPR) but just validate the RBAC token (i.e. signed role name, token expiration time).
  • Integration of a UI testing framework using ImGui's test engine
    focus on the unit- (individual widget components) to system-integration level (main UI views and GitHub bot diff screenshot integration)
  • Follow up and continue development of backlog items that had to be de-prioritised during CALL#4.
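
The "validate only, do not authenticate" idea from the RBAC item above can be sketched as follows. This is a minimal illustration, not the GR4 or OpenCMW API: `RoleToken`, `SignatureVerifier`, `TokenStatus`, and `validateToken` are hypothetical names, the token is assumed to be already decoded, and the cryptographic signature check is left as a caller-supplied placeholder.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, already-decoded token payload; field names are illustrative only.
struct RoleToken {
    std::string                           role;      // signed role name
    std::chrono::system_clock::time_point expiresAt; // token expiration time
    std::string                           signature; // opaque signature over the payload
};

// Placeholder for the real cryptographic check (e.g. against the identity provider's public key).
using SignatureVerifier = std::function<bool(const RoleToken&)>;

enum class TokenStatus { Valid, BadSignature, Expired, RoleNotAllowed };

// The service never sees user credentials: it only checks signature, expiry, and role.
inline TokenStatus validateToken(const RoleToken& token, const std::vector<std::string>& allowedRoles,
                                 std::chrono::system_clock::time_point now, const SignatureVerifier& verify) {
    if (!verify || !verify(token)) {
        return TokenStatus::BadSignature;
    }
    if (now >= token.expiresAt) {
        return TokenStatus::Expired;
    }
    if (std::find(allowedRoles.begin(), allowedRoles.end(), token.role) == allowedRoles.end()) {
        return TokenStatus::RoleNotAllowed;
    }
    return TokenStatus::Valid;
}
```

A service would call `validateToken(token, {"operator"}, std::chrono::system_clock::now(), verifier)` before accepting a settings change; the role name "operator" is just an example.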

Time-Series DB integration

N.B. This will be a separate EU-wide tender process outside of the framework agreement due to different government funding sources.

Main topics to be covered:

  • Time-Series DB integration of VictoriaMetrics (VM) and/or InfluxDB
    • VM is slightly preferred because of licensing and better value for money w.r.t. horizontal scaling (lacks 'ns'-level timestamps though)
  • focus on writing and retrieving high-frequency and high-volume streaming data needed for:
    • 'replay' option for improving algorithms and debugging of beam-based services
    • OP crew training (notably during shutdown periods)
    • re-training, re-analysis, and improving algorithms of AI-based models
  • PRE-CHECK (GSI/FAIR internal): => ACTION: Alex|Semen|Ralph
    • CIT kindly provided a VM test setup on a (very) high-performance machine
    • should (a formality) verify as a proof-of-concept solution that it is possible to:
      a) sustainably write > 100 MB/s of streaming data into the DB (e.g. 8 digitizer channels @20 MS/s, one sample: 16-bit integer)
      b) achieve >1-4 GB/s read-performance (multiple users, faster than real-time processing, ...)
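
As a quick sanity check of the write-rate figure above, the raw data rate is simply channels × sampling rate × sample size. The helper name `dataRateBytesPerSecond` is made up for this sketch, and the sample size is treated as a free parameter: 2-byte (16-bit) samples give 320 MB/s for the quoted digitizer example, whereas 16-byte samples would give 2.56 GB/s.

```cpp
#include <cstdint>

// Sustained raw data rate in bytes per second for a set of identical digitizer channels.
constexpr std::uint64_t dataRateBytesPerSecond(std::uint64_t channels, std::uint64_t samplesPerSecond,
                                               std::uint64_t bytesPerSample) {
    return channels * samplesPerSecond * bytesPerSample;
}

// 8 channels @ 20 MS/s with 2-byte (16-bit) samples: 320 MB/s
static_assert(dataRateBytesPerSecond(8, 20'000'000, 2) == 320'000'000);
// Same setup with 16-byte samples: 2.56 GB/s
static_assert(dataRateBytesPerSecond(8, 20'000'000, 16) == 2'560'000'000);
```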

EU GNU Radio Days '24 Workshop, 27–31 Aug 2024

Some organisational things & ToDos:

  • confirm/invite (notably 'key-note') speakers & abstracts (invited/contributed talks) => ACTION: GR Days programme committee
  • start with a rough tutorial structure (focus on didactics)
    • N.B. Semen will do the kick-off with a pre-training for our SYS department
  • Need to get the UI into shape (waiting on Ivan and Frank's progress here) -> items below need to be finished by mid-May
    • internal code structure and usability should be improved so that we can widen the implementation and add high-level/specific UI widgets, notably:
      • UI support for => ACTION: Ivan & Frank
        • block connection for non-float streaming data types
          • Check for port compatibility for type, sampling rate, quantity, unit, etc.
          • Select appropriate block class instantiation, e.g. source block produces float -> destination block should select float-compatible class instance
          • differentiation of 'edge' and actual 'connection' (i.e. buffer) -> setting edge preferred buffer size, priority, ... (Graph::init() actually establishes the connection)
        • Sub-Graph (aka. 'hier-blocks' in GR3) - creation and introspection
          • GR4 integration => ACTION: Alex|Semen|Ralph?
        • Optional: management of remote flow graphs in the service (nice as a short-term goal, but needs to be finished by GR Days Workshop)
      • Additional plot types => ACTION: Alex, Semen, Ralph
        • X-Y with markers/indicators
        • mountain-range
        • contour plots
        • history plots
        • ...
        • (all small but relying on new code-structure to avoid/minimise refactoring/rewrite after Ivan/Frank are finished)
      • FAIR context selector => ACTION: Alex|Semen|Ralph
      • LSA integration - parameter tree selector (scalars, functions, timing-structure/selectors) => ACTION: Alex|Semen|Ralph
        • short-term: polling of LSA DB may need service-local caching -> singleton (for further details see below)
        • long-term: LSA drives settings to services/blocks (for further details, see below)
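
The port-compatibility check listed under "block connection" above (type, sampling rate, quantity, unit) could look roughly like this. It is only a sketch: `PortInfo` and `incompatibility` are hypothetical names and do not mirror GR4's actual port meta-information, and treating an empty or non-positive field as "unspecified, matches anything" is an assumption.

```cpp
#include <cmath>
#include <optional>
#include <string>

// Hypothetical port description; GR4's real port meta-information may be structured differently.
struct PortInfo {
    std::string dataType;   // e.g. "float", "int16"
    double      sampleRate; // [Hz], <= 0 means "unknown/any" (assumption)
    std::string quantity;   // e.g. "voltage", empty means "unspecified" (assumption)
    std::string unit;       // e.g. "V", empty means "unspecified" (assumption)
};

// Returns std::nullopt if the ports may be connected, otherwise a human-readable reason for the UI.
inline std::optional<std::string> incompatibility(const PortInfo& src, const PortInfo& dst) {
    if (src.dataType != dst.dataType) {
        return "data type mismatch: " + src.dataType + " -> " + dst.dataType;
    }
    if (src.sampleRate > 0 && dst.sampleRate > 0 && std::abs(src.sampleRate - dst.sampleRate) > 1e-9 * src.sampleRate) {
        return std::string("sampling rate mismatch");
    }
    if (!src.quantity.empty() && !dst.quantity.empty() && src.quantity != dst.quantity) {
        return std::string("quantity mismatch");
    }
    if (!src.unit.empty() && !dst.unit.empty() && src.unit != dst.unit) {
        return std::string("unit mismatch");
    }
    return std::nullopt;
}
```

Returning the reason rather than a plain bool lets the UI explain why an edge is rejected.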

Towards official GNU Radio 4.0 'Beta 0' Version [to be finished by mid-May]

Smaller half- to one-day items to be followed up:

  • rename 'graph-prototype' to 'gnuradio4' & create/update landing README.md (logo, intro, text, ....) => ACTION: Alex (GH actions) & Bailey + keep Josh & Jeff in the loop
    • fair-acc repo will be mirrored/forked by GR's GitHub organisation and kept synchronised (but with disabled issues)
    • initially, GR4 issues will be tracked in the fair-acc repo until contributions and GR4 usage have picked up, also to avoid GR3 vs. GR4 confusion
  • some general code clean-up/hygiene tasks:
    • move from gcc13->gcc14 & clang17->clang18 (std::print(..), std::format(..), modules, ... support), mostly CI/CD => ACTION: ALEX
      • fix of gcc14 related warnings and errors => ACTION: Ralph
    • revise and minimise/eliminate ToDos in the code base, either:
      • Fix simple ToDos on the spot
      • fix deprecated work(..)-implementing blocks
      • remove and move the content of larger ToDos to proper issues
    • [ ] eliminate {fmt} and move to systematically using std::print(..), std::format(..) (may need additional unit-tests and helpers, especially for pmt/range formatting)
      N.B. need to wait for P2216R3 being available for gcc/clang, since some format-strings need to be constexpr evaluated.
    • put binary code-sizes on an optional diet
      • generated binaries mostly contain block documentation and meta-information -> OK and needed for desktop and, notably UI users
      • add optional EMBEDDED compile flag/target that minimises the binary size (& eliminates documentation)
        • goal: make simple flow-graphs fit in < 1 MB flash (e.g. RP Pico (RP2040, 2 MB) or Arduino Nano (RP2040, 16 MB))
        • excellent real-world demonstrator for industry, embedded users, and educators that GR4 is efficient, fast and could even be used on a $5 micro-controller
    • improved clang-format definition, needs draft for further discussions and evaluation => ACTION: Semen
      • focus on keeping intentional line breaks/long-lines & being vertically compact otherwise
    • evaluate boost-ext/reflect replacing refl-cpp & magic enum => ACTION: Alex, Semen, Ralph
      • basic example + refl-cpp (https://godbolt.org/z/oY68x11ox) and boost-ext/reflect-based starter examples (note ASM code)
      • N.B. not intended to early-adopt this because it is new, but it eliminates the refl-cpp MACRO annotation that is hard to teach and a common source of programming errors for new users
      • to check in particular: handling of templated classes and inheritance
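
One possible way to realise the optional EMBEDDED flag mentioned in the binary-size item above is to strip documentation strings at the preprocessor level. This is a sketch of the idea only: `GR4_EMBEDDED`, `GR4_DOC`, and `GainBlockInfo` are hypothetical names, not existing GR4 macros or types.

```cpp
#include <string_view>

// GR4_EMBEDDED is a hypothetical macro name standing in for the proposed EMBEDDED flag/target.
#ifdef GR4_EMBEDDED
#define GR4_DOC(text) "" // the literal is dropped by the preprocessor, so it cannot end up in flash
#else
#define GR4_DOC(text) text
#endif

// Illustrative block meta-information: the name is always kept, the description only on desktop builds.
struct GainBlockInfo {
    static constexpr std::string_view name        = "Gain";
    static constexpr std::string_view description = GR4_DOC("Multiplies every input sample by a constant factor.");
};
```

Building with `-DGR4_EMBEDDED` would leave `description` empty while the rest of the code stays unchanged.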

Medium-Term Plans and Open Modelling Questions [to be finalised until GR Workshop]

Ensure that the following conceptual dimensions are handled adequately by the existing GR4 design:

Runtime Expression Evaluation

Rationale: while many expressions can be composed using graphs of basic blocks, some topologies can become rather large and clumsy (example: f(x, y, mu) = 1/(sqrt(2*pi)*|y|)*exp(-0.5*pow((x-mu)/|y|, 2))), and are not always known at compile-time. For more complex problems, we'd opt for a proper scripting language such as Python, Cling (C++), SYCL, or another JIT-compiled solution. However, these are not necessarily fast and carry some runtime and build dependencies that are not necessarily compatible with all applications (e.g. embedded platforms, security/safety aspects, ...). Still, we need to allow the user to express basic math and notably filter expressions that are often only known at runtime (for example, in a chunking block). We could write our own expression evaluation engine, but this quickly becomes unwieldy when going beyond basic math and bracket operations and may affect medium- to long-term maintenance.

  • Evaluate whether we/GR4 should onboard ExprTk as an external dependency, to which level (i.e., 'basic math + brackets', 'functions', 'if-else', ...), and how to integrate this => ACTION: Alex, Semen, Ralph, and others interested (John?).
  • model filter syntax for the event-matching use-case, which needs to allow specifying
    • trigger (name + context + ...) matches -> yes|no|ignore
    • matching level: exact match ... partial match (see here for examples)
    • -> resulting matcher string??? Needs syntax proposals.
  • other options?
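
For reference, the example expression from the rationale above is easy to write when it is known at compile time. The hard-coded version below (function name `gaussianLike` chosen for this sketch) is what any runtime expression evaluator would have to reproduce from a user-supplied string.

```cpp
#include <cmath>

// f(x, y, mu) = 1/(sqrt(2*pi)*|y|) * exp(-0.5*((x-mu)/|y|)^2)
// Undefined for y == 0 (division by zero), exactly like the original expression.
inline double gaussianLike(double x, double y, double mu) {
    constexpr double kPi   = 3.14159265358979323846;
    const double     sigma = std::abs(y);
    const double     z     = (x - mu) / sigma;
    return std::exp(-0.5 * z * z) / (std::sqrt(2.0 * kPi) * sigma);
}
```

Expressed as a graph of basic blocks, the same formula needs separate subtract, divide, square, scale, exp, and multiply blocks, which illustrates why such topologies get clumsy.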

Compile-Time Performance & Optimisations

GR4 is fine w.r.t. runtime performance (see benchmarks, SIMD, planned SYCL integration), but recent CI/CD experience showed that the whole project requires ~1h on a single core to compile. A recent PR optimised this quite a bit, but with thousands of blocks as the target this needs to be improved further. Which path should we pursue:

  • [3pt] graph-prototype: Optimize compilation time #123
    • pre-compiled headers (PCH)?
    • C++ modules? N.B. support in recent gcc14 and clang18 improved a lot
    • Optimise templating structure (repeated instantiation overhead, code-optimisation on the back-end, ...)
    • pmt-optimisations (one of the major compile-time hogs due to the std::visit patterns being used)
    • use of 'ccache' - doesn't improve overall compilation performance but caching improves when changing isolated block implementations or changing code back-and-forth
  • Target: < 20 seconds per block to be compiled

Modelling of Timing

  • invest in integrating WR, GPS, Net, and simulated timing sources
    • goal: same external block interface for White Rabbit (WR), GPS, Net, Simulated, ... (i.e. similar to existing block Clock interface)
  • The WR src block needs to be modelled/narrowed/simplified regarding IO capabilities and triggering models (WR IO can be used as inputs and outputs, optional: reference clock generation, etc.).
  • LSA & UI integration, i.e. displaying a BPC cycle with its chain-sequence-process sub-structure + LSA-defined timing events, and selecting the required start-stop event combinations
  • Standardise event structure and filter definitions, i.e., all events must be defined, and a validation function should enforce this for debugging/testing purposes => ACTION:??
    • trigger_name - std::string
    • trigger_time - uint64_t [ns] (UTC or similar)
    • trigger_offset - float [s] (offset/delay of triggering edge w.r.t. generating trigger)
    • trigger_meta_info - pmt-map carrying (for GR4 optional) meta info, default for FAIR:
      • WR_RAW_PAYLOAD - 256 bits of raw WR timing data
      • context - std::string e.g. "FAIR.SELECTOR.C=:S=:P=:T="
      • LSA_context - std::string corresponding LSA context for that event (needs to be injected a posteriori, not part of the WR pay-load)
      • C - int8_t chain ID
      • S - int8_t sequence ID
      • P - int8_t beam process ID
      • T - int8_t timing-group ID (often optional)
      • BPCTS - uint64_t [UTC ns] beam-production-chain-time-stamp (encodes unique beam ID when it was created)
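
The event structure and validation function proposed above might be modelled as follows. This is a sketch under assumptions: `TriggerEvent`, `MetaValue`, `MetaMap`, and `validate` are illustrative names, a `std::map` of `std::variant` stands in for the pmt-map, and treating `trigger_time == 0` as "not set" is an assumption of this example.

```cpp
#include <cmath>
#include <cstdint>
#include <map>
#include <string>
#include <variant>
#include <vector>

// Stand-in for the pmt-map; the real pmt type supports more alternatives.
using MetaValue = std::variant<std::string, std::int8_t, std::uint64_t>;
using MetaMap   = std::map<std::string, MetaValue>;

struct TriggerEvent {
    std::string   trigger_name;
    std::uint64_t trigger_time   = 0;   // [ns], UTC or similar
    float         trigger_offset = 0.f; // [s], offset/delay of the triggering edge
    MetaMap       trigger_meta_info;    // optional for GR4
};

// Returns a list of problems; an empty list means the event is well-formed.
inline std::vector<std::string> validate(const TriggerEvent& ev) {
    std::vector<std::string> problems;
    if (ev.trigger_name.empty()) {
        problems.emplace_back("trigger_name is empty");
    }
    if (ev.trigger_time == 0) { // assumption: 0 means "not set"
        problems.emplace_back("trigger_time is not set");
    }
    if (!std::isfinite(ev.trigger_offset)) {
        problems.emplace_back("trigger_offset is not finite");
    }
    // FAIR default meta keys C/S/P/T are optional but must be int8_t when present.
    for (const char* key : {"C", "S", "P", "T"}) {
        const auto it = ev.trigger_meta_info.find(key);
        if (it != ev.trigger_meta_info.end() && !std::holds_alternative<std::int8_t>(it->second)) {
            problems.emplace_back(std::string(key) + " must be an int8_t");
        }
    }
    return problems;
}
```

In debug or test builds, a block could run `validate` on every emitted event and report the collected problems.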

Modelling of LSA (FAIR's setting supply Mgmt. System) Integration

Long-term: LSA will push settings and configurations to the service and GR4 flow-graph blocks. However, this will likely not be in place before 2025. As an intermediate solution, we may thus need to poll (periodic and/or via SSE) LSA and mimic that behaviour.

  • read settings interface for scalars and functions -> needs good abstraction
  • basic trim-interface PoC -> needs good abstraction and RBAC (!!!)
  • open question on multiplexing modelling (both settings and state):
    1. within the block
      • PRO: Existing Transactions.hpp setting implementation -> should we make this the default??
      • CON: Does not handle block state (e.g. history), which should probably be treated differently to settings
    2. within and creating new (Sub)-Graph, i.e. each block in the sub-graph contains settings for a given context
      • PRO: would handle block state more easily/intuitively and potentially allow for different per-multiplexing-context logic
      • CON: needs UI/service integration for creating new sub-graphs
    3. Do both?
  • polling may need some local caching to detect whether settings changed and to emit only new scalar or DataSet values on change
  • the blocks would define the parameters (e.g. 'SIS18BEAM/Energy') to be monitored plus optional FAIR selector context and emit them as unpacked data streams (ints, floats, DataSet)
  • re-packing should be done with separate blocks to generate <key, value> pmt pairs that could be used to drive the settings of other blocks.
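
The local caching needed for polling, as described above, boils down to remembering the last value per parameter and emitting only on change. A minimal sketch for scalar values, with `ChangeFilter` being a hypothetical helper rather than an existing GR4 class:

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical service-local cache that turns periodic polling into "emit on change only".
class ChangeFilter {
    std::map<std::string, double> _last; // last emitted value per parameter name

public:
    // Returns the polled value if it is new or has changed, std::nullopt if it is unchanged.
    std::optional<double> update(const std::string& parameter, double polled) {
        const auto it = _last.find(parameter);
        if (it != _last.end() && it->second == polled) {
            return std::nullopt;
        }
        _last[parameter] = polled;
        return polled;
    }
};
```

A polling block would call `update("SIS18BEAM/Energy", value)` after each poll and publish a sample only when a value is returned. DataSet values would need an equivalent comparison.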

Modelling of SYCL Integration

We eventually need to integrate GPU and FPGA support into GR4 since some algorithms cannot be efficiently handled on the CPU alone for throughput (->GPU) or latency (-> FPGA) reasons. There are already existing attempts to use CUDA or vendor-specific FPGA integration ... all have in common that they require quite a bit of boiler-plate, specific non-Python/C++ programming expertise, and are often quite volatile (on time scales of 3-5 years) and vendor-specific. We cannot afford to integrate and reliably maintain such a zoo of solutions long-term with the available manpower and commitment. However: There is SYCL, a vendor-neutral abstraction for integrating heterogeneous accelerator platforms such as CPUs (SIMD, OpenMP, ...), GPUs, FPGAs, and TPUs using high-level C++ standard-driven abstractions.

  • SYCL substantially simplifies portability, development, and the learning curve for all. Can we onboard more support/developers/help with this?
  • We should evaluate whether we/the GNU Radio community are willing to invest in this stack as an optional dependency (i.e., disabled on embedded platforms) => ACTION: Alex, Josh, Jeff, Semen, Ralph + GR architecture group.
  • contacted Vincent Heuveline and Aksel Alpay (EMCL, Uni Heidelberg) for advice and invited them to collaborate w.r.t. SYCL integration into GR4 (both are core-contributor/developers to SYCL/AdaptiveCpp)
  • for info: SYCL via AdaptiveCpp (formerly hipSYCL) intro material:
  • While digging into the examples, the things we should focus on:
    • learn, play, and have fun
    • see how this could be smartly integrated and used as an optional dependency in Block<T>, merged sub-graphs, or elsewhere (N.B. SYCL uses a JIT compiler) to support heterogeneous computing on the CPU (+ distributed machines), GPU, and eventually FPGA
    • see how this could be integrated and unit-tested as an optional GR4 CI/CD dependency (i.e. keeping C++'s 'don't pay for what you don't use' mantra).
      N.B. (AdaptiveCpp: ~65 MB + compiler, CUDA (full): ~7 GB)
    • check which dependencies are pulled in and linked with the user code (Alex: watch out for 'boost'-related deps)
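
The "optional dependency" idea above can be sketched with a compile-time switch: the SYCL path is only compiled when explicitly enabled, and a plain CPU loop is used otherwise. `GR4_ENABLE_SYCL` and `applyGain` are hypothetical names for this illustration. The SYCL branch uses standard SYCL 2020 buffers and accessors and has not been tuned in any way.

```cpp
#include <cstddef>
#include <vector>

// GR4_ENABLE_SYCL is a hypothetical build flag; without it no SYCL header or runtime is needed.
#if defined(GR4_ENABLE_SYCL)
#include <sycl/sycl.hpp>
#endif

// Multiplies every sample by 'gain', on an accelerator if SYCL is enabled, otherwise on the CPU.
inline std::vector<float> applyGain(const std::vector<float>& in, float gain) {
    std::vector<float> out(in.size());
    if (in.empty()) {
        return out;
    }
#if defined(GR4_ENABLE_SYCL)
    sycl::queue queue; // default device selection
    {
        sycl::buffer<float, 1> inBuf(in.data(), sycl::range<1>(in.size()));
        sycl::buffer<float, 1> outBuf(out.data(), sycl::range<1>(out.size()));
        queue.submit([&](sycl::handler& h) {
            sycl::accessor src(inBuf, h, sycl::read_only);
            sycl::accessor dst(outBuf, h, sycl::write_only);
            h.parallel_for(sycl::range<1>(in.size()), [=](sycl::id<1> i) { dst[i] = src[i] * gain; });
        });
    } // buffer destructors wait for the kernel and copy the result back into 'out'
#else
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * gain;
    }
#endif
    return out;
}
```

Callers see the same function signature in both configurations, so embedded builds pay nothing for the accelerator path.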