Releases: guillaumeast/42_bsq
v4.0.0 - Standard Input (stdin) handling
Highlights
- Reimplemented basic `stdin` handling.
  (No on-the-fly parsing, and thus no handling of potential `stdin` flooding edge cases, to focus on optimizing execution time when parsing from files.
  Real-time `stdin` parsing would require more granular parsing steps, which would reduce overall performance and conflict with the primary optimization goal.)
- Added fix for BSQ detection inside row 0 / column 0.
- Added tests for:
  - BSQ inside row 0 / column 0
  - height = `INT_MAX` + 1
  - height = `SIZE_MAX` + 1
Performance
- 10 000×10 000 map processed in ~77 ms
  Measured on macOS / Apple M4 / `<time.h>` / `clock_gettime()`
  `stdout` redirected to `/dev/null` to eliminate potential shell or terminal I/O bottlenecks
v3.2.1
Highlights
- Adaptive I/O buffer: grows dynamically to read headers, then resizes to load the full map in one go
- Simplified benchmark logic and removed redundant time tracking fields
- Updated `print_result()` for cleaner in-place map filling
- Simplified runtime structures (`t_buffer`, `t_str`, `t_run`) and removed unnecessary pointers
- Major refactor of the project structure for clarity and maintainability
  - Moved `parse_rules.c` and `parse_map.c` into a single `parse/` module
  - Moved `read.c`, `read_rules.c` and `read_map.c` into a single `read/` module
  - Replaced multiple headers with unified `parse.h` and `objects.h`
  - Introduced dedicated object initializers (`init_buffer`, `init_str`, `init_rules`, etc.)
Performance
- 10 000×10 000 map processed in ~200 ms
  ⚠️ The adaptive I/O buffer improves no-opt builds by ~36% (435 ms vs 595 ms) but causes a ~150% slowdown in optimized builds (~200 ms vs ~80 ms).
  Measured on macOS / Apple M4 / `<time.h>` / `clock_gettime()`
  `stdout` redirected to `/dev/null` to eliminate potential shell or terminal I/O bottlenecks
v3.2.0 - Single-row array DP
Highlights
- Implemented single-row array `dp` for faster updates:
  - Up-left = `dp->prev`
  - Up = `dp->tab[col]`
  - Left = `dp->tab[col - 1]`
  - `dp->prev` is set to `dp->tab[col]` before modification
- Added comments above each function
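
The single-row update order can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the `t_dp` layout, `min3()` and `largest_square()` are hypothetical, and the map is assumed to be an array of row strings with a caller-supplied obstacle character.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical DP state: one row of square sizes plus the saved up-left value. */
typedef struct s_dp
{
    int *tab;   /* tab[col] holds the previous row's value until overwritten */
    int prev;   /* previous row's value at col - 1 (up-left neighbour) */
}   t_dp;

static int min3(int a, int b, int c)
{
    int m = a;

    if (b < m)
        m = b;
    if (c < m)
        m = c;
    return (m);
}

/* Returns the side of the largest obstacle-free square, or -1 if allocation fails. */
int largest_square(const char *const *map, int rows, int cols, char obstacle)
{
    t_dp    dp;
    int     best = 0;
    int     row;
    int     col;
    int     up;

    assert(map != NULL && rows > 0 && cols > 0);
    dp.tab = calloc((size_t)cols, sizeof *dp.tab);
    if (!dp.tab)
        return (-1);
    for (row = 0; row < rows; row++)
    {
        dp.prev = 0;
        for (col = 0; col < cols; col++)
        {
            up = dp.tab[col];   /* read before overwriting: still the row above */
            if (map[row][col] == obstacle)
                dp.tab[col] = 0;
            else if (col == 0)
                dp.tab[col] = 1;
            else
                dp.tab[col] = 1 + min3(dp.prev, up, dp.tab[col - 1]);
            dp.prev = up;       /* becomes the up-left value for col + 1 */
            if (dp.tab[col] > best)
                best = dp.tab[col];
        }
    }
    free(dp.tab);
    return (best);
}
```

Only one row of `cols` integers is live at any time, so the working set stays small regardless of the map height.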
Performance
- 10 000×10 000 map processed in ~77 ms
  Measured on macOS / Apple M4 / `<time.h>` / `clock_gettime()`
  `stdout` redirected to `/dev/null` to eliminate potential shell or terminal I/O bottlenecks
v3.1.0 - Faster parsing of the first col of each row
Highlights
- Implemented `parse_col_0()` to speed up parsing and solving of the first col of each row
- Fine-tuned `parse_map.c` for 42 Norm compliance
Performance
- 10 000×10 000 map processed in ~87 ms
  Measured on macOS / Apple M4 / `<time.h>` / `clock_gettime()`
  `stdout` redirected to `/dev/null` to eliminate potential shell or terminal I/O bottlenecks
v3.0.0 - Code cleanup, tests integration, bug fixes, two-row dp implementation…
Optimizations
- Converted full size `int *` array into two little arrays of size row_len to optimize cache usage
- Optimized `if` statements order
- Removed `run_set_width()` and added `parse_row_0()` to avoid double read of row 0
- Added (commented) bitmask-based and xor-based versions of the original if-based `solve_cell()`
  - The goal was to reduce branch mispredictions using a branchless comparison method
  - All three versions compile down to a single `csel` instruction with `-O1` or higher
  - if-based and xor-based versions run in similar time without optimization flags
  - bitmask-based version runs about 35 % slower without optimization (due to the extra `mask` variable)
  - For code readability, the if-based version remains the one used in the project
- Added `bit_masks.md` to document the bitmask-based and xor-based approaches
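
For readers unfamiliar with the three styles, here is a minimal sketch of a two-value minimum written each way. The function names are hypothetical and this is not the project's `solve_cell()`; it only illustrates the comparison techniques named above, assuming two's complement integers.

```c
#include <assert.h>

/* if-based: plain comparison, the most readable form. */
int min_if(int a, int b)
{
    if (b < a)
        return (b);
    return (a);
}

/* bitmask-based: mask is all ones when b < a, all zeros otherwise. */
int min_mask(int a, int b)
{
    int mask = -(b < a);

    return ((b & mask) | (a & ~mask));
}

/* xor-based: a ^ (a ^ b) gives b when the mask is all ones, a ^ 0 gives a otherwise. */
int min_xor(int a, int b)
{
    return (a ^ ((a ^ b) & -(b < a)));
}

/* Sanity check: both branchless variants must agree with the if-based one. */
void check_min(int a, int b)
{
    assert(min_mask(a, b) == min_if(a, b));
    assert(min_xor(a, b) == min_if(a, b));
}
```

Since an optimizing compiler can turn all three into the same conditional-select instruction, the choice mostly affects readability and unoptimized builds.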
Bug Fixes
- Fixed multiple incorrect rules, maps and file path handling
- Changed `malloc(sizeof(type))` to `malloc(sizeof *p)` to prevent type mismatch errors and simplify code maintainability
- Fixed memory leaks (`run->map` was leaked in some `map error` cases)
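
The allocation idiom and the kind of error-path cleanup mentioned above might look like the following. This is a sketch only: the `t_run` fields, `run_new()` and `run_free()` are hypothetical and do not reflect the project's actual structures.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical run structure, reduced to the fields needed for the example. */
typedef struct s_run
{
    char    *map;
    size_t  len;
}   t_run;

/* Allocates a run with a map buffer of len bytes. Returns NULL on failure. */
t_run   *run_new(size_t len)
{
    t_run   *run;

    assert(len > 0);
    run = malloc(sizeof *run);  /* size follows the pointer's type automatically */
    if (!run)
        return (NULL);
    run->map = malloc(len * sizeof *run->map);
    if (!run->map)
    {
        free(run);              /* do not leak the struct on the error path */
        return (NULL);
    }
    run->len = len;
    return (run);
}

/* Releases the map buffer before the struct that owns it. */
void    run_free(t_run *run)
{
    if (!run)
        return ;
    free(run->map);
    free(run);
}
```

With `sizeof *run`, changing the declared type of `run` later cannot leave a stale type name inside the `malloc` call.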
Tests and benchmark mode updates
- Added `make test` command to automatically run tests
- Added `make bench` rule to `Makefile` to make bench running easier
  - Automatically runs `make test` before starting the bench
- Updated command used to run benchmark to improve timings accuracy
  - Old command = `./bsq --bench tests/test_10000 > /dev/null`
  - New command = `sudo caffeinate nice -n -20 ./bsq --bench tests/test_10000 > /dev/null`
- Removed individual run timings for a more readable output
- Added `CLOCK_BENCH` conditional definition to `bench.h` to improve timings accuracy
  - Changed `CLOCK_MONOTONIC` to `CLOCK_UPTIME_RAW` for macOS
  - Changed `CLOCK_MONOTONIC` to `CLOCK_MONOTONIC_RAW` for Linux
Refactoring
- Moved the `BUFFER_SIZE` definition from `types.h` to `read.h`
- Moved `RULES_MIN_LEN` and `RULES_CHARSET_LEN` definitions from `types.h` to `parse_rules.h`
- Added a `VERSION` definition to `bsq.h`
- Split `parse.c` into `parse_rules.c` and `parse_map.c` for 42 Norm compliance (5 functions max per file)
Performance
- 10 000×10 000 map processed in ~100 ms
  Measured on macOS / Apple M4 / `<time.h>` / `clock_gettime()`
  `stdout` redirected to `/dev/null` to eliminate potential shell or terminal I/O bottlenecks
v2.4.0 - Parser micro-optimizations and benchmark upgrade
Highlights
- Reordered parser condition checks to reduce branch mispredictions
- Implemented precomputation of all possible values
- Minimized dereferencing in hot loops
- Increased integrated benchmark from 10 to 100 iterations
Performance
- 10 000×10 000 map processed in ~100 ms
  Measured on macOS / Apple M4 / `<time.h>` / `clock_gettime()`
  `stdout` redirected to `/dev/null` to eliminate potential shell or terminal I/O bottlenecks
v2.3.0 - Optimized DP minimum computation
Highlights
- Optimized DP minimum computation to reduce branch mispredictions
Performance
- 10 000×10 000 map processed in ~140 ms
  Measured on macOS / Apple M4 / `<time.h>` / `clock_gettime()`
  `stdout` redirected to `/dev/null` to eliminate potential shell or terminal I/O bottlenecks
v2.2.0 - Reduced memory accesses during buffer reallocation
Highlights
- Reworked `str_grow()` to dereference pointer once before the loop
- Reduced redundant memory accesses during buffer reallocation
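
The idea of dereferencing once before the loop can be illustrated with the sketch below. The `t_str` layout and this `str_grow()` body are hypothetical; the real function's signature and fields are not shown in these notes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable string; the real t_str layout is an assumption here. */
typedef struct s_str
{
    char    *data;
    size_t  len;
    size_t  cap;
}   t_str;

/* Grows str to new_cap bytes. Returns 0 on success, -1 on allocation failure. */
int str_grow(t_str *str, size_t new_cap)
{
    char    *src;
    char    *dst;
    size_t  len;
    size_t  i;

    assert(str != NULL && new_cap > 0 && new_cap >= str->len);
    src = str->data;        /* read the struct fields once, before the loop */
    len = str->len;
    dst = malloc(new_cap);
    if (!dst)
        return (-1);
    for (i = 0; i < len; i++)
        dst[i] = src[i];    /* the loop body touches only local variables */
    free(src);
    str->data = dst;
    str->cap = new_cap;
    return (0);
}
```

Because writes through a `char *` may alias any object, a compiler cannot always prove that `str->data` stays unchanged inside the loop, so caching it in a local can avoid repeated loads.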
Performance
- 10 000×10 000 map processed in ~190 ms
  Measured on macOS / Apple M4 / `<time.h>` / `clock_gettime()`
  `stdout` redirected to `/dev/null` to eliminate potential shell or terminal I/O bottlenecks
v2.1.1 - Benchmark mode
Highlights
- Added integrated benchmark mode
- Added average timings to the displayed performance metrics
Important benchmark modifications
- The benchmark is now run with `stdout` redirected to `/dev/null` instead of a file to eliminate potential shell or terminal I/O bottlenecks
- This change alone results in a ~50 ms improvement
Performance
- 10 000×10 000 map processed in ~200 ms
  Measured on macOS / Apple M4 / `<time.h>` / `clock_gettime()`
  `stdout` redirected to `/dev/null` to eliminate potential shell or terminal I/O bottlenecks
v2.1.0 - Optimized output
Highlights
- Modifies the initial map (`char *`) instead of creating a new one for output
- Avoids full map copies: only updates required characters
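
In-place output of this kind can be sketched as follows. The function `fill_square()`, its parameters, and the assumed buffer layout (rows of `cols` characters, each followed by a newline) are hypothetical and serve only to illustrate the approach.

```c
#include <assert.h>
#include <stddef.h>

/*
** Overwrites the cells of the solution square directly in the map buffer.
** map:  rows of cols characters, each row followed by '\n' (assumed layout)
** top, left: upper-left corner of the square; side: its edge length
** full: character used to draw the square
*/
void    fill_square(char *map, size_t cols, size_t top, size_t left,
            size_t side, char full)
{
    size_t  stride = cols + 1;  /* +1 for the newline ending each row */
    size_t  r;
    size_t  c;

    assert(map != NULL && left + side <= cols);
    for (r = top; r < top + side; r++)
        for (c = left; c < left + side; c++)
            map[r * stride + c] = full;
}
```

Only `side * side` bytes are written, and the same buffer that was read from the input can then be sent to the output as is.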
Performance
- 10 000×10 000 map processed in ~250 ms (vs ~320 ms in v2.0.0)
  Measured on macOS / Apple M4 / `<time.h>` / `clock_gettime()`
  `stdout` redirected to a file