-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Streaming parser #205
base: master
Are you sure you want to change the base?
Streaming parser #205
Conversation
me <- lift CL.head | ||
case me of | ||
Just (EventSequenceStart _ _ a) -> parseArrayStreaming 0 a | ||
_ -> liftIO $ throwIO $ UnexpectedEvent me $ Just $ EventSequenceStart NoTag AnySequence Nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some values here for the "expected" event, hoping that would be more helpful that using Nothing
.
-- In combination with 'Y.decodeFile', it can be used to consume | ||
-- a yaml file that consists in a very large list of values. | ||
parseStream :: ReaderT JSONPath (ConduitM Event Value Parse) () | ||
parseStream = do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change the name of this if there's a better name for it.
@@ -5,7 +5,7 @@ cabal-version: 1.12 | |||
-- see: https://github.com/sol/hpack | |||
|
|||
name: yaml | |||
version: 0.11.7.0 | |||
version: 0.11.8.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't change this myself. I don't know why it kept happening.
I don't think this would make a good addition here at this time. Such an approach would make more sense as a separate library to allow the API design to evolve instead of trying to promote it to a stable package immediately. One thought is that it seems to me like the |
OK. It might be a good idea, but I don't have the time right now to create a separate project. Perhaps another time. For now, I will just use my fork and merge any updates from your repository. The reason I added it here instead of on my own code is that it uses internal functions that are not exported, and I also thought it might be useful to other people. I simply copied Thanks for the prompt reply! |
I added the ability to consume yaml files incrementally using conduits. This is exposed by the
parseStream
function in the moduleData.Yaml.Internal
.The motivation was simply wanting to consume large files without having to have them fully in memory. I don't know what the demand for something like this is, but I wanted it.
I was thinking maybe to define a wrapper in
Data.Yaml
that would be easier to use and discover. However, I wasn't sure what that would look like. In any case, exporting the conduit works well enough for myself.I also had to export
ParseState
or otherwise I wasn't able to run the parser.To try it out, I wrote the following code:
And then I created a 1,5Gb file (
big.yaml
). The memory still goes up as the program runs, but does so very slowly and the program finishes before it is a problem (it ends up taking close to 3Gb though). Trying to parse the entire file quickly used all of my available memory (31,2Gb). I was expecting the conduit version to take a roughly constant amount of memory. I wonder what causes the slow build up. Could it be related to the parser state?Thanks in advance for your review.