This is a specialized family of algorithms that requires the list s to be partitioned (see also: set partition) by an unknown, arbitrary predicate, such that the following conditions hold:
- The predicate function must not discriminate between identical values; that is, it must be deterministic (not necessarily pure).
- All occurrences of each value must be grouped into runs of consecutive copies, like this:
  - Good example: [0,0,1,1] (2 distinct runs)
  - Bad example: [0,1,1,0] (3 runs; 0 is duplicated across 2 separate runs)
There are many ways to reword this:
- Each run must "consolidate" its representative value
- There cannot be more than 1 partition with the same mode
- There's a bijection between modes and partitions.
Note
A single partition with N modes (or N arbitrary-length runs) is "the same" as N mono-modal partitions. So by asserting injectivity, we get bijectivity for free!
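To make the condition concrete, here is a minimal Python sketch (the helper name is_partitioned is mine, purely for illustration) of the check it implies: no value may start a new run after its run has already ended.

```python
def is_partitioned(s) -> bool:
    """Return True if every value in s occurs as exactly one contiguous run.

    This only verifies the precondition; the search algorithms themselves
    never need to scan the whole list like this.
    """
    closed = set()          # values whose run has already ended
    previous = object()     # sentinel that compares unequal to everything
    for value in s:
        if value != previous:       # a new run starts here...
            if value in closed:     # ...but we already closed a run of it
                return False
            closed.add(previous)
            previous = value
    return True

assert is_partitioned([0, 0, 1, 1])         # good example: 2 distinct runs
assert not is_partitioned([0, 1, 1, 0])     # bad example: 0 spans 2 separate runs
```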
Unlike most bisection methods, this algorithm doesn't need to know the predicate, as long as equality is total. IOW, it is predicate-agnostic.
Note
There is an alternative formulation that uses approximate-equality, but since ≈ isn't transitive, the list must be sorted (or at least, partitioned by approximate comparison, which groups "similar" partitions).
The set of all sorted lists is a strict subset of the set of all partitioned lists, assuming the comparison-function is standard numeric (scalar, not vectorial) comparison.
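Continuing the illustrative is_partitioned sketch from above, the inclusion is easy to see on a concrete pair of lists:

```python
# Sorted implies partitioned, but not the other way around.
assert is_partitioned([0, 0, 1, 2, 2])   # sorted, therefore partitioned
assert is_partitioned([2, 2, 0, 0, 1])   # partitioned, yet not sorted
```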
These algorithms exploit the following lemmas (theorems?):
- Multiplication is faster than repeated addition
- Bin exponentiation is faster than repeated multiplication
- Iterating over all elements of a list is O(n), but finding a value in a sorted list can be as fast as O(log(n)) (bin-search)
See reference implementations here. Those are designed to "complement each other", because I want them to be examples of various use-cases.
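For a taste of the core idea, here is my own minimal Python sketch (not one of the reference implementations; it also assumes a fully sorted list so that the standard bisect module can stand in for a predicate-agnostic boundary search). It counts the distinct runs by jumping straight past each one:

```python
from bisect import bisect_right

def count_runs_sorted(s) -> int:
    """Count distinct runs in a *sorted* list by jumping over each run.

    Each iteration handles one whole partition, so the loop runs
    part_count times, and each jump costs at most O(lb(n)).
    """
    runs = 0
    i = 0
    while i < len(s):
        # Jump to the first index past the current run,
        # instead of stepping through every copy.
        i = bisect_right(s, s[i], lo=i)
        runs += 1
    return runs

print(count_runs_sorted([0, 0, 1, 1, 1, 1, 1]))  # -> 2
```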
I hope that this proof-of-concept can help optimize many other programs that deal with grouped values of any type, such as a specialized compressor.
Most discussion about partition-jumping was on this disbloat guild, in the #cs-theory channel. I don't like disbloat, but it was my only choice at the time.
TLDR: average time is O(part_count * lb(n)), but it's more nuanced than that. Best-case is O(1). Worst-case is O(n * lb(n)) (bin-search), or O(n) (exp-search).

The runtime of these algorithms is dominated by the partition count. So for the overhead to be worthwhile, the number of unique values should be low (relative to the length of s).
It's worth noting that the lb(n) factor above (lb = bin logarithm) is somewhat misleading. The 1st bisection is O(lb(n)), but the next is O(lb(n - part_len_0)), then O(lb(n - part_len_0 - part_len_1)), and so on, until it becomes O(lb(part_len_last)) (if we ignore the target == s[-1] check, which simplifies the last bisection into O(1)). That's only true for bin-search; for exp-search it's O(lb(part_len_i)) (worst-case), or O(1) (best-case) with the target == s[-1] check.
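For reference, this is roughly what an exp-search boundary jump can look like (my own sketch; the reference implementations may differ). Galloping forward before bisecting keeps the probes near the current run, which is where the O(lb(part_len_i)) per-partition cost comes from:

```python
def run_end_exponential(s, lo: int) -> int:
    """Return the index one past the run of s[lo], via galloping + bisection.

    Costs O(lb(run_length)) probes, independent of len(s),
    because the gallop never strays far beyond the run's boundary.
    """
    value = s[lo]
    step = 1
    # Gallop: double the step until we overshoot the run (or hit the end).
    while lo + step < len(s) and s[lo + step] == value:
        step *= 2
    # The boundary now lies in (lo + step//2, lo + step]; bisect that window.
    left, right = lo + step // 2 + 1, min(lo + step, len(s))
    while left < right:
        mid = (left + right) // 2
        if s[mid] == value:
            left = mid + 1
        else:
            right = mid
    return left

s = [0, 0, 1, 1, 1, 1, 1]
print(run_end_exponential(s, 0))  # -> 2 (the run of 0 ends at index 2)
print(run_end_exponential(s, 2))  # -> 7 (the run of 1 ends at the list's end)
```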
As for the space complexity, it's O(1) for fixed-precision numbers. The basic implementation doesn't allocate auxiliary memory, but I'm working on one that does (see below).
The impl with aux-mem will use a data-structure to track the known "sub-partitions" that it encounters while bisecting (I call those "witnesses" or "bystanders"). For example:
s := [0,0,1,1,1,1,1]
target := 0, and a probe overshoots to index 3 (which holds 1). After finding the partition-point ("boundary", as I call them) at index 2, we can remember that there are at least 2 instances of 1, even before we set it as our target, just because we happen to visit it while searching for 0.
If you only want to track one bystander at a time, your aux-mem will be O(1). But my plan is to remember all bystanders, so I need something like a hash-map. I'm not sure if it's worth it, as maps have overhead.
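As a rough illustration of what that could look like (speculative on my part; the names bystanders and record_bystander are mine, and the planned implementation may differ), a plain hash-map keyed by value is enough:

```python
# Speculative sketch of bystander tracking, not the planned implementation.
# bystanders maps a value to the minimum number of copies proven to exist,
# based solely on indices visited while searching for *other* targets.
bystanders: dict[int, int] = {}

def record_bystander(value, proven_count: int) -> None:
    """Remember that at least proven_count copies of value exist."""
    if proven_count > bystanders.get(value, 0):
        bystanders[value] = proven_count

# While searching for target 0 in s = [0,0,1,1,1,1,1]:
# a probe at index 3 sees 1, and the boundary is later found at index 2,
# so indices 2..3 both hold 1 -> at least 2 copies of 1 are proven.
record_bystander(1, 2)
print(bystanders)  # {1: 2}
```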