Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expanding the internal iteration API #99

Open
tcbrindle opened this issue Jul 13, 2023 · 5 comments
Open

Expanding the internal iteration API #99

tcbrindle opened this issue Jul 13, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@tcbrindle
Copy link
Owner

Currently internal iteration is achieved by a sequence customisation point for_each_while(predicate) -> cursor_t, which instructs the sequence to iterate over all of its elements until such time as the predicate returns false.

Unfortunately there are some cases where we can't use internal iteration even though we'd like to. Algorithms like fold_first (#98), min and max need to inspect the initial element before iterating over the rest of the sequence; more generally, any time we take a slice of a sequence (such as with the elements of chunk, chunk_by, slide etc) then we can't implement for_each_while() on the slice in terms of an equivalent call on the base sequence, and so have to fall back to external iteration.

A solution to this would be to replace the existing for_each_while() customisation point with a new pair of functions taking extra cursor arguments to indicate where iteration should begin and end:

auto iterate_while(auto& seq, auto&& pred, cursor_t from) -> cursor_t;
auto iterate_while(auto& seq, auto&& pred, cursor_t from, cursor_t to) -> cursor_t;

The slightly odd parameter order is to allow some sequence impls to provide just a single overload, with the to parameter defaulted to flux::last(). Implementations wouldn't need to provide both of these functions: in particular, most single-pass sequences aren't going to be able to provide the second one.

With this approach, algorithms like fold_first could examine the first element before proceeding with a call to iterate_while(); and iterate_while() on slices could be implemented in terms of calls to the equivalent function on their parent sequence. The existing flux::for_each_while(seq, pred) wrapper function could be implemented as a call to iterate_while(seq, pred, first(seq)).

This would add (yet) another customisation point that authors would have to worry about when writing a sequence implementation, which I'm somewhat reluctant to do, but I think the codegen benefits of internal iteration are great enough that it's justified in this case.

@brycelelbach @brevzin @jnytra any thoughts?

@brevzin
Copy link
Contributor

brevzin commented Jul 14, 2023

Yeah I ran into the same problem when I was trying to do this, and instead came up with a janky semi-stateful internal iteration solution? Not amazing.

So I guess the implementation of min would look something like... this?

template <sequence Seq>
auto min(Seq& seq) -> Optional<value_t<Seq>> {
    auto cursor = first(seq);
    if (is_last(seq, cursor)) {
        return {};
    }

    Optional<value_t<Seq>> best = read_at(seq, cursor);
    inc(seq, cursor);
    iterate_while(seq,
        [&](auto&& elem){ *best = std::min(*best, elem); return true; },
        cursor);
    return best;
}

@tcbrindle
Copy link
Owner Author

tcbrindle commented Jul 14, 2023

I think min() itself wouldn't need to be changed, as it just calls fold_first(). But we could implement that as something like:

template <sequence Seq, typename Func, typename V = value_t<Seq>>
auto fold_first(Seq&& seq, Func func) -> flux::optional<V>
{
    auto cur = first(seq);

    if (is_last(seq, cur)) {
        return std::nullopt;
    }

    V init(read_at(seq, cur));
    inc(seq, cur);

    iterate_while(seq, [&](auto&& elem) {
        init = std::invoke(func, std::move(init), FLUX_FWD(elem));
        return true;
    }, std::move(cur));

    return optional<V>(std::in_place, std::move(init));
}

(Which is pretty much exactly what you just wrote....)

@brevzin
Copy link
Contributor

brevzin commented Jul 15, 2023

I think this seems like a pretty cool idea. Little awkward having the lambda not last I guess, so maybe do iterate_while(seq, cur, f) and iterate_while(seq, cur, last, f)? Definitely curious to see how this pans out!

@jnytra
Copy link
Contributor

jnytra commented Jul 17, 2023

I think too many customization points can confuse users. I prefer a simpler API, even if it means a small performance loss. But if the compiler can optimize much better, it would be a pity not to use it. Maybe we can do some micro benchmarks to prove the speedup. I also agree with @brevzin that we should use lambda as the last argument.

@tcbrindle
Copy link
Owner Author

I think too many customization points can confuse users.

I agree in general. In this case though I think the argument in favour of adding one more -- in particular, that it would enable internal iteration of slices -- it persuasive enough that it's worth investigating.

I also agree with @brevzin that we should use lambda as the last argument.

As noted above, the odd parameter order is to allow some implementations to combine both overloads into a single function by defaulting the last parameter. For example, the implementation for C arrays could be

template <typename T, index_t N>
struct sequence_traits<T[N]> {
     // ...

    static constexpr auto iterate_while(auto& self, auto&& pred, index_t from, index_t to = N) -> index_t
    {
         for (; from < to; ++from) {
            if (!std::invoke(pred, self[from])) {
                break;
            }
         }

        return from;
    }
};

tcbrindle added a commit that referenced this issue Jul 27, 2023
tcbrindle added a commit that referenced this issue Jul 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants