bufio.Scanner: token too long #299

@chapmanjacobd

I'm running out of vespene gas or something

$ wget https://files.pushshift.io/reddit/submissions/RS_2022-08.zst
$ unzstd --memory=2048MB --stdout RS_2022-08.zst | octosql "SELECT count(*) FROM stdin.json" -o csv
...
Error: couldn't run query: couldn't run source: couldn't run source: bufio.Scanner: token too long

sad :'(

the great octopus god is able to work with this other, smaller, file in 110.6s:

$ unzstd --memory=2048MB --stdout RS_2021-08.zst | octosql "SELECT count(*) FROM stdin.json" -o csv
count
28384220

It does not use much RAM with either file, so I'm not sure what's up :? Both files are similar-ish in size: 7.8 GB vs 10 GB compressed, maybe 200 GB uncompressed.
