How do I transform a stream of records? #20

Closed
pudo opened this issue Aug 22, 2015 · 6 comments

pudo commented Aug 22, 2015

Yesterday I tried to find a simple way to do something like SchemaModel(schema).convert(row), but couldn't figure out how. What is the recommended usage?

pwalsh (Member) commented Aug 22, 2015

JTSKit doesn't have methods for working with data - only with schemas.

However, this is obviously desirable, and would be one of several things we need for a v1 release.

@rgrp and I discussed the same issue in a different context here.

jazzido commented Aug 22, 2015

+1 — I hacked around this limitation with a combination of preprocessing my data and using the cast() method in JTSType.

Would love to have a proper API for this.

pwalsh added this to the Current milestone Aug 23, 2015
pwalsh self-assigned this Aug 23, 2015

pwalsh (Member) commented Aug 23, 2015

If we implement a row converter, how would you expect error handling to work?

As mentioned here, I'll be replacing the currently inconsistent return values of cast with two separate methods:

  • one that just tells you whether the value is castable
  • one that actually casts

So it makes sense to raise an error when cast is actually called and fails, rather than just return False, since the hint or test method will return some kind of true/false value.

When iterating over a stream, as a consumer, I'd want to be able to control whether an error is thrown, and possibly even control what happens if a value can't be cast (retain the raw value, something else, etc.).

So we won't want to return just the cast row. We'll probably want an object with:

  • row as cast
  • an errors object that maps errors to cells that threw them

We might also want a "retain the raw value if a cell can't be cast" option, or something else that lets the calling code control what happens to values that can't be cast.
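For illustration, the result shape described above might look roughly like this. It is only a sketch: RowResult, convert_row and retain_raw are made-up names, not part of JTSKit, and the casters are plain Python callables standing in for schema field types.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical names throughout; none of this is JTSKit API.
Caster = Callable[[Any], Any]


@dataclass
class RowResult:
    row: List[Any]  # values after casting
    errors: Dict[int, Exception] = field(default_factory=dict)  # cell index -> error


def convert_row(casters: List[Caster], raw: List[Any], retain_raw: bool = True) -> RowResult:
    """Cast each cell and record failures per cell instead of stopping."""
    result = RowResult(row=[])
    for index, (cast, value) in enumerate(zip(casters, raw)):
        try:
            result.row.append(cast(value))
        except (TypeError, ValueError) as exc:
            result.errors[index] = exc
            # Policy for uncastable cells: keep the raw value, or fall back to None.
            result.row.append(value if retain_raw else None)
    return result


result = convert_row([int, float, str], ["1", "n/a", "x"])
print(result.row)             # [1, 'n/a', 'x']
print(sorted(result.errors))  # [1]
```

The caller can then decide what to do based on whether result.errors is empty, which keeps the policy out of the converter itself.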

Any thoughts?

pudo (Author) commented Aug 23, 2015

That's an impressive set of cases you've distinguished. I struggle to imagine an instance in which failed type conversion should just return silently with the unconverted values. With virtually any type of storage, this is going to screw you (or you're using MongoDB, so you're screwed already).

But I don't think that all of these cases need to be handled fully. If you just had a convert() function that raised some sort of grouped exception, that would be perfectly fine in terms of building out all of these scenarios. By grouped exception, I mean what colander does for form validation: validate each part of a model and then return one composite exception which details the list of failures.

The exception could also include the values which have been successfully converted, so that a user can continue based on a partial conversion.
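A minimal sketch of that grouped-exception idea, assuming casters are plain callables keyed by field name. RowCastError and this convert() are hypothetical and are neither colander's nor JTSKit's API.

```python
class RowCastError(Exception):
    """Hypothetical composite error: one exception for all failed fields in a row."""

    def __init__(self, errors, converted):
        self.errors = errors        # field name -> error message
        self.converted = converted  # field name -> successfully cast value
        super().__init__("could not cast fields: " + ", ".join(sorted(errors)))


def convert(casters, row):
    """Cast every field first, then raise once if anything failed."""
    converted, errors = {}, {}
    for name, cast in casters.items():
        try:
            converted[name] = cast(row[name])
        except (KeyError, TypeError, ValueError) as exc:
            errors[name] = str(exc)
    if errors:
        raise RowCastError(errors, converted)
    return converted


casters = {"id": int, "price": float}
try:
    convert(casters, {"id": "7", "price": "cheap"})
except RowCastError as exc:
    partial = exc.converted      # the fields that did convert
    failed = sorted(exc.errors)  # the fields that did not
    print(partial, failed)
```

Because the exception carries both the failures and the partial result, a caller who wants to continue with a partial conversion can do so in the except branch.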

FWIW, the two methods are called test() and cast() in messytables & typecast. For type inference it is useful not to count null-valued rows as full failures, so test() returns an integer: -1 for failure, 0 for null/empty values, and 1 for a successful conversion.
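To show why the three-valued result helps with inference, here is a small sketch. It is not the messytables or typecast implementation; score_int and column_is_int are hypothetical names, and the rule used to accept a column is an assumption.

```python
def score_int(value):
    """Return 1 if value casts to int, 0 if it is null/empty, -1 on failure."""
    if value is None or str(value).strip() == "":
        return 0
    try:
        int(value)
    except (TypeError, ValueError):
        return -1
    return 1


def column_is_int(values):
    """Assumed rule: no value fails, and at least one value actually converts."""
    scores = [score_int(v) for v in values]
    return -1 not in scores and 1 in scores


print(column_is_int(["1", "", None, "42"]))  # True: empties are not failures
print(column_is_int(["1", "abc"]))           # False: one real failure
print(column_is_int(["", None]))             # False: nothing to infer from
```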

pwalsh (Member) commented Aug 24, 2015

@pudo there is not really a "set of use cases" there.

But anyway, in GoodTables we have a use case that has nothing to do with passing the data on to another storage: there, I want to be able to catch cell errors, add them to the report, and keep going. I'm just highlighting that the use case you have mentioned is not the only one.
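As a rough sketch of that "catch cell errors, add them to the report, and keep going" flow over a stream of rows: cast_stream and the report entry layout below are assumptions for illustration, not the GoodTables report format.

```python
def cast_stream(casters, rows, report):
    """Lazily yield cast rows; log cell errors to `report` and keep going."""
    for row_number, row in enumerate(rows, start=1):
        out = []
        for column, (cast, value) in enumerate(zip(casters, row), start=1):
            try:
                out.append(cast(value))
            except (TypeError, ValueError) as exc:
                # Record where the failure happened, then carry on with the raw value.
                report.append({"row": row_number, "column": column, "message": str(exc)})
                out.append(value)
        yield out


report = []
rows = [["1", "2.5"], ["x", "3"], ["4", "y"]]
cast_rows = list(cast_stream([int, float], rows, report))
print(cast_rows)  # [[1, 2.5], ['x', 3.0], [4, 'y']]
print([(e["row"], e["column"]) for e in report])  # [(2, 1), (3, 2)]
```

Since cast_stream is a generator, the report fills in as rows are consumed, so it is only complete once the stream has been exhausted.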

Also, let's see what it would take to replace jsontableschema.types with typecast, as the two are too similar. The main issue, as I see it, is how far we can support JTS in typecast. Let's discuss here.

pwalsh modified the milestones: Backlog, Current Sep 30, 2015

pwalsh (Member) commented Oct 19, 2015

@pudo @jazzido We'll be adding this. Track it in #29.

Now is a good time for API suggestions, so feel free to comment there.

pwalsh closed this as completed Oct 19, 2015
pwalsh mentioned this issue Oct 19, 2015
roll removed this from the Backlog milestone Mar 29, 2016