Skip to content

Releases: guillaumeast/42_bsq

v4.0.0 - Standard Input (stdin) handling

02 Nov 23:49

Choose a tag to compare

Highlights

  • Reimplemented basic stdin handling.
    (No on-the-fly parsing β€” thus no handling of potential stdin flooding edge cases β€”
    to focus on optimizing execution time when parsing from files.
    Real-time stdin parsing would require more granular parsing steps,
    which would reduce overall performance and conflict with the primary optimization goal.)
  • Added fix for BSQ detection inside row 0 / column 0.
  • Added tests for:
    • BSQ inside row 0 / column 0
    • height = INT_MAX + 1
    • height = SIZE_MAX + 1

Performance

  • 10 000Γ—10 000 map processed in πŸš€ ~77 ms

Measured on macOS / Apple M4 / <time.h> / clock_gettime()
stdout redirected to /dev/null to eliminate potential shell or terminal I/O bottlenecks

v3.2.1

29 Oct 13:25

Choose a tag to compare

Highlights

  • Adaptive I/O buffer: grows dynamically to read headers, then resizes to load the full map in one go
  • Simplified benchmark logic and removed redundant time tracking fields
  • Updated print_result() for cleaner in-place map filling
  • Simplified runtime structures (t_buffer, t_str, t_run) and removed unnecessary pointers
  • Major refactor of the project structure for clarity and maintainability
    • Moved parse_rules.c and parse_map.c into a single parse/ module
    • Moved read.c and read_rules.c and read_map.c into a single read/ module
    • Replaced multiple headers with unified parse.h and objects.h
    • Introduced dedicated object initializers (init_buffer, init_str, init_rules, etc.)

Performance

  • 10 000Γ—10 000 map processed in 🫠 ~200 ms

⚠️ The adaptive I/O buffer improves no-opt builds by β‰ˆ36% (435 ms vs 595 ms) but causes a β‰ˆ150% slowdown in optimized builds (~200 ms vs ~80 ms).
Measured on macOS / Apple M4 / <time.h> / clock_gettime()
stdout redirected to /dev/null to eliminate potential shell or terminal I/O bottlenecks

v3.2.0 - Single-row array DP

26 Oct 22:20

Choose a tag to compare

Highlights

  • Implemented single-row array dp for faster updates:
    • Up-left = dp->prev
    • Up = dp->tab[col]
    • Left = dp->tab[col - 1]
    • dp->prev is set to dp->tab[col] before modification
  • Added comments above each function

Performance

  • 10 000Γ—10 000 map processed in πŸš€ ~77 ms

Measured on macOS / Apple M4 / <time.h> / clock_gettime()
stdout redirected to /dev/null to eliminate potential shell or terminal I/O bottlenecks

v3.1.0 - Faster parsing of the first col of each row

26 Oct 20:31

Choose a tag to compare

Highlights

  • Implemented parse_col_0() to speed up parsing and solving of the first col of each row
  • Fine-tuned parse_map.c for 42 Norm compliance

Performance

  • 10 000Γ—10 000 map processed in πŸš€ ~87 ms

Measured on macOS / Apple M4 / <time.h> / clock_gettime()
stdout redirected to /dev/null to eliminate potential shell or terminal I/O bottlenecks

v3.0.0 - Code cleanup, tests integration, bug fixes, two-row dp implementation…

23 Oct 00:26

Choose a tag to compare

Optimizations

  • Converted full size int * array into two little arrays of size row_len to optimize cache usage
  • Optimized if statements order
  • Removed run_set_width() and added parse_row_0() to avoid double read of row 0
  • Added (commented) bitmask-based and xor-based versions of the original if-based solve_cell()
    • The goal was to reduce branch mispredictions using a branchless comparison method
    • All three versions compile down to a single csel instruction with -O1 or higher
    • if-based and xor-based versions run in similar time without optimization flags
    • bitmask-based version runs about 35 % slower without optimization (due to the extra mask variable)
    • For code readability, the if-based version remains the one used in the project
    • Added bit_masks.md to document the bitmask-based and xor-based approaches

Bug Fixes

  • Fix multiple incorrect rules, maps and file path handling
    -Changed malloc(sizeof(type)) to malloc(sizeof *p) to prevent type mismatch errors and simplify code maintainability
  • Fix memory leaks (run->map was leaked in some map error cases)

Tests and benchmark mode updates

  • Added make test command to automatically run tests
  • Added make bench rule to Makefile to make bench running easier
    • Automatically runs make test before starting the bench
  • Updated command used to run benchmark to improve timings accuracy
    • Old command = ./bsq --bench tests/test_10000 > /dev/null
    • New command = sudo caffeinate nice -n -20 ./bsq --bench tests/test_10000 > /dev/null
  • Removed individual run timings for a more readable output
  • Added CLOCK_BENCH conditionnal definition to bench.h to improve timings accuracy
    • Changed CLOCK_MONOTONIC to CLOCK_UPTIME_RAW for macOS
    • Changed CLOCK_MONOTONIC to CLOCK_MONOTONIC_RAW for Linux

Refactorizations

  • Moved the BUFFER_SIZE definition from types.h to read.h
  • Moved RULES_MIN_LEN and RULES_CHARSET_LEN definitions from types.h to parse_rules.h
  • Added a VERSION definition to bsq.h
  • Split parse.c into parse_rules.c and parse_map.c for 42 norm compliance (5 functions max per file)

Performance

  • 10 000Γ—10 000 map processed in πŸš€ ~100 ms

Measured on macOS / Apple M4 / <time.h> / clock_gettime()
stdout redirected to /dev/null to eliminate potential shell or terminal I/O bottlenecks

v2.4.0 - Parser micro-optimizations and benchmark upgrade

20 Oct 05:01

Choose a tag to compare

Highlights

  • Reordered parser condition checks to reduce branch mispredictions
  • Implemented precomputation of all possible values
  • Minimized dereferencing in hot loops
  • Increased integrated benchmark from 10 to 100 iterations

Performance

  • 10 000Γ—10 000 map processed in πŸš€ ~100 ms

Measured on macOS / Apple M4 / <time.h> / clock_gettime()
stdout redirected to /dev/null to eliminate potential shell or terminal I/O bottlenecks

v2.3.0 - Optimized DP minimum computation

20 Oct 02:37

Choose a tag to compare

Highlights

  • Optimized DP minimum computation to reduce miss-branches

Performance

  • 10 000Γ—10 000 map processed in πŸš€ ~140 ms

Measured on macOS / Apple M4 / <time.h> / clock_gettime()
stdout redirected to /dev/null to eliminate potential shell or terminal I/O bottlenecks

v2.2.0 - Reduced memory accesses during buffer reallocation

19 Oct 22:43

Choose a tag to compare

Highlights

  • Reworked str_grow() to dereference pointer once before the loop
  • Reduced redundant memory accesses during buffer reallocation

Performance

  • 10 000Γ—10 000 map processed in πŸš€ ~190 ms

Measured on macOS / Apple M4 / <time.h> / clock_gettime()
stdout redirected to /dev/null to eliminate potential shell or terminal I/O bottlenecks

v2.1.1 - Benchmark mode

19 Oct 20:40

Choose a tag to compare

Highlights

  • Added integrated benchmark mode
  • Added average timings to the displayed performance metrics

Important benchmark modifications

  • The benchmark is now run with stdout redirected to /dev/null instead of a file to eliminate potential shell or terminal I/O bottlenecks
  • This change alone results in a ~50 ms improvement

Performance

  • 10 000Γ—10 000 map processed in πŸš€ ~200 ms

Measured on macOS / Apple M4 / <time.h> / clock_gettime()
stdout redirected to /dev/null to eliminate potential shell or terminal I/O bottlenecks

v2.1.0 - Optimized output

19 Oct 17:15

Choose a tag to compare

Highlights

  • Modifies the initial map (char *) instead of creating a new one for output
  • Avoids full map copies β€” only updates required characters

Performance

  • 10 000Γ—10 000 map processed in ✈️ ~250 ms (vs ~320 ms in v2.0.0)

Measured on macOS / Apple M4 / <time.h> / clock_gettime()
stdout redirected to a file