You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We should stop considering lists to be typed arrays, because they're not.
Currently, a list of strings is treated as a typed array of strings. This is difficult as:
Every function needs to scan the list to check that, indeed, the list only contains strings.
Every function also needs to scan the list to check whether the list contains None values to represent missing strings.
It also introduces ambiguity, e.g., is a list of strings to be interpreted as a typed array that can only ever contain strings or as an unstructured list of arbitrary objects? What should we guess [] to be? (This has consequences for singledispatch.)
So I propose that all arrays of strings should now use numpy.array with the string type, which is closer to R's character vectors than Python list. This includes, e.g., the row and column names of the BiocFrame, the levels of the Factor, and so on.
The case is even easier to make for numeric types.
The text was updated successfully, but these errors were encountered:
We should stop considering lists to be typed arrays, because they're not.
Currently, a list of strings is treated as a typed array of strings. This is difficult as:
None
values to represent missing strings.[]
to be? (This has consequences forsingledispatch
.)So I propose that all arrays of strings should now use
numpy.array
with the string type, which is closer to R's character vectors than Pythonlist
. This includes, e.g., the row and column names of theBiocFrame
, the levels of theFactor
, and so on.The case is even easier to make for numeric types.
The text was updated successfully, but these errors were encountered: