Make transpose call stack size safe #83

rogeriochaves · 2017-09-25T00:01:48Z

With 10000 elements, the transpose function starts failing with "Maximum call stack size exceeded".

Chadtech · 2017-11-25T23:08:06Z

This looks pretty good, but I am benchmarking it as being ~8x slower: https://ellie-app.com/kqXGQNgMBa1/1

I'm not sure if it's worth the trade-off, but also, to be honest, I am not sure how to weigh the performance hit against the stack safety.

zwilias · 2017-11-26T11:02:06Z

Here's an alternative, based on @rogeriochaves' implementation, that seems to outperform the current List.Extra implementation and is stack-safe.

https://ellie-app.com/68qqkSvkta1/3

The benchmarks test lists of size 10, 100, 1000 and 2000 where each sublist has between 0 and 200 integers. Flipping the comments on the two mains will run a very basic fuzz test.

At @eeue56's suggestion, here is an attempt to explain why there are 3 functions involved, how they make this fast, and how they make this stack-safe.

From a bird's-eye perspective, the implementation works somewhat like this:

- take the first element of every sublist, consing the resulting list onto an accumulator
- keep doing this until we run out of elements
- when every sublist is empty, reverse the accumulator

Why an accumulator?

By using an accumulator rather than building up the result directly, as was done in the original code, we can put the recursive call in tail position, meaning that the final expression to be evaluated is a call to the same function. When the Elm compiler sees a recursive call in tail position, it can compile the function into a while loop. The resulting code doesn't actually recurse, which means the call-stack depth doesn't increase and the implementation is stack-safe.
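A minimal sketch of the difference, using a hypothetical list sum rather than transpose itself. The names `sumNaive`, `sumAcc` and `sumHelp` are made up for illustration and do not appear in the PR.

```elm
module TailCallSketch exposing (sumAcc, sumNaive)

-- Hypothetical example: not code from this PR.


-- Not tail-recursive: the addition runs after the recursive call returns,
-- so every element adds a stack frame.
sumNaive : List Int -> Int
sumNaive list =
    case list of
        [] ->
            0

        x :: xs ->
            x + sumNaive xs


-- Accumulator version: delegates to a helper that carries the running total.
sumAcc : List Int -> Int
sumAcc list =
    sumHelp list 0


-- The recursive call is the last expression evaluated in its branch,
-- so it is in tail position and can be compiled to a loop.
sumHelp : List Int -> Int -> Int
sumHelp list acc =
    case list of
        [] ->
            acc

        x :: xs ->
            sumHelp xs (acc + x)
```

Both functions return the same result. Only the second keeps the call-stack depth constant.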

Why not appending so there's no need to `reverse`?

Appending to the end of a List has 2 drawbacks:

- it needs to traverse the entire list
- it can't do any structural sharing

By consing our accumulated results to the front of the accumulator, we only need 1 traversal: reversing the result at the end. Every iteration of the accumulator is a structurally shared list made up of a new head and the existing accumulator as the tail.

This saves both time (fewer traversals) and memory (structural sharing), which in turn saves more time (less garbage collection interference).

Why that `headsTails` function?

The original code used List.filterMap head list and List.filterMap tail list. This is perfectly fine, and perhaps even preferable from a code-reuse perspective, but it also means doing 2 traversals where one would suffice. Furthermore, both head and tail return Nothing under the same condition (an empty list), so a single pattern x :: xs tells us that a sublist has both a head and a tail. In other words, this saves a traversal (one rather than two), avoids pattern-matching every sublist twice, and avoids filtering the results.
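The three-function structure described above could look roughly like the sketch below. This is an illustration under assumptions, not the code from the Ellie link: the names `transposeHelp` and the exact shape of `headsTails` are guesses, and `headsTails` is written with `List.foldr` for brevity, so its stack safety depends on the `List.foldr` of the Elm version in use.

```elm
module TransposeSketch exposing (transpose)

-- Hypothetical sketch of the accumulator-based approach; names are illustrative.


transpose : List (List a) -> List (List a)
transpose rows =
    transposeHelp rows []


-- Tail-recursive loop: peel one column off per iteration and cons it
-- onto the accumulator; reverse once at the very end.
transposeHelp : List (List a) -> List (List a) -> List (List a)
transposeHelp rows acc =
    case headsTails rows of
        ( [], _ ) ->
            -- No sublist had a head, so every sublist is empty.
            List.reverse acc

        ( heads, tails ) ->
            transposeHelp tails (heads :: acc)


-- Single traversal that splits every non-empty sublist into head and tail.
-- Empty sublists contribute nothing, mirroring the filterMap behaviour.
headsTails : List (List a) -> ( List a, List (List a) )
headsTails rows =
    List.foldr
        (\row ( heads, tails ) ->
            case row of
                x :: xs ->
                    ( x :: heads, xs :: tails )

                [] ->
                    ( heads, tails )
        )
        ( [], [] )
        rows
```

With this sketch, short rows are skipped rather than truncating the result, so `transpose [ [ 10, 11 ], [ 20 ], [], [ 30, 31, 32 ] ]` evaluates to `[ [ 10, 20, 30 ], [ 11, 31 ], [ 32 ] ]`.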

gilbertkennen · 2017-11-27T16:47:14Z

Has there been discussion about the proper way to handle non-rectilinear matrices? I dislike the solution of collapsing missing values, because the position of an element in the new list no longer reflects the position of the list it came from.

[ [ 1, 2, 3 ]
, [ 4, 5 ]
, [ 6, 7, 8 ]
]

-- transposes to
[ [ 1, 4, 6 ]
, [ 2, 5, 7 ]
, [ 3, 8 ]
]

-- transposes back to
[ [ 1, 2, 3 ]
, [ 4, 5, 8 ] -- the 8 has moved up a row
, [ 6, 7 ]
]

1. We could wrap the whole thing in a Maybe which fails on different-length lists.
2. We could wrap individual elements in Maybe, which would be a bit awkward to deal with.
3. We could truncate the matrix to the shortest list size, which seems a bit surprising, but less surprising than the current functionality, and it gives us the nicer type that wouldn't need to change.
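To make the first and third options concrete, here is a hedged sketch of what each could look like. The names `transposeStrict` and `transposeTruncating` are hypothetical and are not part of list-extra. Both assume the `List.foldr` and `List.map2` from elm/core.

```elm
module JaggedSketch exposing (transposeStrict, transposeTruncating)

-- Hypothetical sketches of two of the options; names are illustrative.


-- Whole result wrapped in Maybe: Nothing unless every row has the same length.
transposeStrict : List (List a) -> Maybe (List (List a))
transposeStrict rows =
    case rows of
        [] ->
            Just []

        first :: rest ->
            let
                width =
                    List.length first
            in
            if List.all (\row -> List.length row == width) rest then
                Just (List.foldr (List.map2 (::)) (List.repeat width []) rows)

            else
                Nothing


-- Truncating variant: cut every row down to the length of the shortest row.
transposeTruncating : List (List a) -> List (List a)
transposeTruncating rows =
    let
        width =
            rows
                |> List.map List.length
                |> List.minimum
                |> Maybe.withDefault 0
    in
    -- List.map2 stops at the shorter of its two arguments, which does the truncation.
    List.foldr (List.map2 (::)) (List.repeat width []) rows
```

For the jagged example above, `transposeStrict` returns `Nothing`, while `transposeTruncating` returns `[ [ 1, 4, 6 ], [ 2, 5, 7 ] ]`, dropping the 3 and the 8.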

pzp1997 · 2017-11-28T03:26:18Z

I believe I have a tail-recursive solution that performs between 3x (for 10 elements) and 8x (for all other sizes) better than @zwilias's. Regarding @gilbertkennen's comment, my implementation goes with the third approach, truncating to the shortest row length, but could easily be adapted to wrap the entire result in a Maybe as per approach 1.

Here's the code:

transposePalmer : List (List a) -> List (List a)
transposePalmer listOfLists =
    List.foldr (List.map2 (::)) (List.repeat (rowsLength listOfLists) []) listOfLists

rowsLength : List (List a) -> Int
rowsLength listOfLists =
    case listOfLists of
        [] ->
            0

        x :: _ ->
            List.length x

and the benchmark: https://ellie-app.com/pw7k2sR8ma1/1. Whipped this up rather quickly, so let me know if I'm missing something.

EDIT

I didn't really look at the benchmarking tests before. @gilbertkennen is correct that when comparing the performance only on rectilinear lists, the performance is about 1.5x better.

Also, I said above that the solution is tail-recursive. That's not the proper terminology though because List.foldr and List.map2 are not even implemented recursively in Elm (they're both Native/Kernel functions). What I really meant is that the implementation is stack safe.

I thought about the possible methods of handling jagged lists and I think that wrapping the entire result in a Maybe makes the most sense.

gilbertkennen · 2017-11-28T12:49:21Z

While this is an improvement, the test is a bit unfair: it compares a truncating implementation against a non-truncating one on very uneven row lengths. Benchmarks on equal-length rows give us a ~1.5x improvement, which is still very good.

https://ellie-app.com/6Rs9Tnr7Ka1/1

jvoigtlaender · 2017-11-28T16:35:19Z

Concerning List.foldr being a Native/Kernel function in Elm: that is the case for the currently released version, but will not be for much longer (assuming Elm 0.19 is released). See https://github.com/elm-lang/core/blob/master/src/List.elm.

So that part of the benchmarking here is really against a moving target. Who knows whether with List.foldr as implemented in the next release of the core library, the 1.5x improvement will stand, or whether for the specific use of List.foldr above, the non-native implementation makes things worse.

zwilias · 2017-11-28T16:40:01Z

For what it's worth, the implementation currently in the dev tree of elm-lang/core is pretty fast, with a bit of a drop after 2k elements (due to falling back to foldl + reverse). Once there is a 0.19 alpha, though, I'm hopeful that we can start checking how the codegen and resulting size turn out for the foldr and map implementations here: https://github.com/zwilias/elm-faster-map

Your point still stands, of course, but figured I'd add to it.

gilbertkennen · 2017-11-28T17:03:44Z

Even with a naive implementation of foldr, it's still ~1.4x faster than Ilias's and ~4x faster than list-extra (both using core code for any folds they use).

https://ellie-app.com/6Rs9Tnr7Ka1/1

jvoigtlaender · 2017-11-28T17:28:07Z

Why "Even"? I assume that naive implementation is not stack safe? A stack-safe version might be slower. So a non-naive version is not necessarily faster than the naive one, and we cannot conclude that, because "even" the naive version of foldr makes another function fast, any non-naive version of foldr would make that function faster still.

(I don't remember what is the case for the foldr expected to be shipped with Elm 0.19: whether it is "just" stack safe, but possibly slower than a naive, non stack safe version, or is both stack safe and faster than the naive implementation.)

gilbertkennen · 2017-11-28T17:36:00Z

This naive foldr should be stack safe. It reverses the list and runs a TCO foldl.
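For reference, a foldr of the kind described here can be sketched in a couple of lines. This is an assumption about its shape based on the description above, not the code from the linked Ellie, and the name `foldrViaFoldl` is made up.

```elm
module FoldrSketch exposing (foldrViaFoldl)

-- Hypothetical sketch: a right fold expressed as reverse followed by a left fold.


-- List.foldl walks the list front to back, so reversing first makes it
-- visit the elements in the order foldr would combine them.
foldrViaFoldl : (a -> b -> b) -> b -> List a -> b
foldrViaFoldl step initial list =
    List.foldl step initial (List.reverse list)
```

The cost is one extra traversal and one intermediate list for the reversal.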

jvoigtlaender · 2017-11-28T17:38:24Z

Ah, that's not what I would call "naive". 😄

gilbertkennen · 2017-11-28T19:37:59Z

I have been informed that I previously linked to the wrong code, which didn't include the foldr implementation.

https://ellie-app.com/bQBtPcMvwa1/0

Perhaps that will make things clearer.

rogeriochaves · 2017-11-29T02:32:38Z

@pzp1997 your method fails on this test:

test "short rows are skipped" <|
    \() ->
        Expect.equal
            (transposePalmer [ [ 10, 11 ], [ 20 ], [], [ 30, 31, 32 ] ])
            [ [ 10, 20, 30 ], [ 11, 31 ], [ 32 ] ]

Instead it returns []; in other words, it truncates to the shortest row (see https://ellie-app.com/ntkpKfbFCa1/0).

This would be @gilbertkennen's third suggestion, right? I'm fine with that (personally, I have never needed to transpose an uneven matrix).

Is this the behavior we want from now on?

pzp1997 · 2017-12-18T20:27:16Z

Last I recall, there was no consensus on how the function should behave on non-rectangular inputs. I opened #86 so that we can continue discussing that matter.

make transpose call stack size safe

a1bc604

use @pzp1997's faster implementation of transpose

017e824

rogeriochaves force-pushed the master branch from 8868d00 to 017e824 Compare November 29, 2017 02:36

Chadtech merged commit ec6c312 into elm-community:master Dec 18, 2017

pzp1997 mentioned this pull request Dec 18, 2017

Transpose of jagged array #86

Open

lue-bird mentioned this pull request Aug 15, 2021

added transpose elm-community/array-extra#9

Closed
