Parsec combinators? #15

Open
alanz opened this issue Oct 22, 2014 · 7 comments

Comments

@alanz
Contributor

alanz commented Oct 22, 2014

I am writing some Parsec combinators for my own use on top of this.

Do you want a pull request for them when I am done? I am not sure if they belong in this library.

@qrilka
Owner

qrilka commented Oct 22, 2014

I guess that's what PRs are for and they could get accepted or not or could lead to some discussion.
Could I take a look at them already now?

@alanz
Contributor Author

alanz commented Oct 22, 2014

It's a work in progress, currently in my private project. I will generate a PR when it stabilises.


@qrilka
Owner

qrilka commented Oct 22, 2014

But perhaps a small gist showing their API?
In any case, thanks.

@alanz
Contributor Author

alanz commented Oct 22, 2014

parseSheet :: Worksheet -> [DetailLine2]
parseSheet sh = catMaybes $ map (parseRow sh) [3 .. 3]

parseRow :: Worksheet -> RowNum -> Maybe DetailLine2
parseRow sh row = r
    `debug` ("parseRow:cells= " ++ show cells)
  where
    cells = map (\col -> cellsh sh (row,col)) [1..11]
    r = case parseDl cells of
      Left err -> Nothing `debug` ( "parseRow " ++ show row ++ ":" ++ show err)
      Right dl -> Just dl

-- parse :: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a

parseDl :: [Maybe CellValue] -> Either ParseError DetailLine2
parseDl ss = parse p "source"  ss

type P a = Parsec [Maybe CellValue] () a

p :: P DetailLine2
p = pDLHeading

pDLHeading :: P DetailLine2
pDLHeading = do
  many1 pEmpty
  name <- pText
  many1 pEmpty
  pLabel "Date: Statement For: "
  pEmpty
  date <- pNumber
  return (DLHeading name date)

-- | Return the text from a cell
pText :: P T.Text
pText = tokenPrim show nextPos getMaybeText
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeText mt = case mt of
      Just (CellText str) -> Just (T.fromStrict str)
      _                   -> Nothing

-- | Return the value of a cell
pNumber :: P Rational
pNumber = tokenPrim show nextPos getMaybeNumber
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeNumber mt = case mt of
      Just (CellDouble d) -> Just (double2Rational d)
      _                   -> Nothing

-- | Parse an empty cell
pEmpty :: P ()
pEmpty = tokenPrim show nextPos getMaybeCell
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeCell mt = case mt of
      Just _ -> Nothing
      _      -> Just ()

-- | Match a cell with a specific label
pLabel :: T.Text -> P ()
pLabel label = tokenPrim show nextPos matchText
  where
    nextPos pos _ _ = incSourceColumn pos 1

    matchText mt = case mt of
      Just (CellText str) -> if label == T.fromStrict str
                               then Just ()
                               else Nothing
      _                   -> Nothing
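The four cell parsers above differ only in the function that accepts or rejects a cell, so that part could be factored out. The following is a minimal sketch of that idea, not the code from the private project: `CellValue` here is a two-constructor stand-in for the library's type, `cell`, `pHeading` and `parseHeading` are hypothetical names, strict `Text` is used directly instead of `T.fromStrict`, and `toRational` stands in for `double2Rational`, whose definition is not shown in this thread.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Text.Parsec (ParseError, Parsec, many1, parse, tokenPrim)
import Text.Parsec.Pos (incSourceColumn)

-- Stand-in for the library's cell value type (assumption: simplified).
data CellValue = CellText Text | CellDouble Double
  deriving (Eq, Show)

-- One row is a list of cells; Nothing marks an empty cell.
type P a = Parsec [Maybe CellValue] () a

-- | Hypothetical helper: consume one cell, succeeding when the
-- function returns Just. Each cell advances the column by one.
cell :: (Maybe CellValue -> Maybe a) -> P a
cell = tokenPrim show nextPos
  where
    nextPos pos _ _ = incSourceColumn pos 1

-- | Return the text from a cell
pText :: P Text
pText = cell $ \c -> case c of
  Just (CellText t) -> Just t
  _ -> Nothing

-- | Return the numeric value of a cell
pNumber :: P Rational
pNumber = cell $ \c -> case c of
  Just (CellDouble d) -> Just (toRational d)
  _ -> Nothing

-- | Parse an empty cell
pEmpty :: P ()
pEmpty = cell $ \c -> case c of
  Nothing -> Just ()
  Just _ -> Nothing

-- | Match a cell with a specific label
pLabel :: Text -> P ()
pLabel label = cell $ \c -> case c of
  Just (CellText t) | t == label -> Just ()
  _ -> Nothing

-- | Same shape as pDLHeading, returning a plain pair because
-- DetailLine2 is not defined in this thread.
pHeading :: P (Text, Rational)
pHeading = do
  _ <- many1 pEmpty
  name <- pText
  _ <- many1 pEmpty
  pLabel "Date: Statement For: "
  pEmpty
  date <- pNumber
  return (name, date)

-- | Run the heading parser over one row of cells.
parseHeading :: [Maybe CellValue] -> Either ParseError (Text, Rational)
parseHeading = parse pHeading "row"
```

With this shape, adding a parser for another kind of cell is a single `cell $ \c -> ...` definition, and the position bookkeeping lives in one place.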

@qrilka
Owner

qrilka commented Oct 22, 2014

GitHub is a bit strange: I guess it did not apply Markdown to your email. I think your message should look like this:

parseSheet :: Worksheet -> [DetailLine2]
parseSheet sh = catMaybes $ map (parseRow sh) [3 .. 3]

parseRow :: Worksheet -> RowNum -> Maybe DetailLine2
parseRow sh row = r
    `debug` ("parseRow:cells= " ++ show cells)
  where
    cells = map (\col -> cellsh sh (row,col)) [1..11]
    r = case parseDl cells of
      Left err -> Nothing `debug` ( "parseRow " ++ show row ++ ":" ++ show err)
      Right dl -> Just dl

-- parse :: Stream s Identity t => Parsec s () a -> SourceName -> s -> Either ParseError a

parseDl :: [Maybe CellValue] -> Either ParseError DetailLine2
parseDl ss = parse p "source"  ss

type P a = Parsec [Maybe CellValue] () a

p :: P DetailLine2
p = pDLHeading

pDLHeading :: P DetailLine2
pDLHeading = do
  many1 pEmpty
  name <- pText
  many1 pEmpty
  pLabel "Date: Statement For: "
  pEmpty
  date <- pNumber
  return (DLHeading name date)

-- | Return the text from a cell
pText :: P T.Text
pText = tokenPrim show nextPos getMaybeText
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeText mt = case mt of
      Just (CellText str) -> Just (T.fromStrict str)
      _                   -> Nothing

-- | Return the value of a cell
pNumber :: P Rational
pNumber = tokenPrim show nextPos getMaybeNumber
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeNumber mt = case mt of
      Just (CellDouble d) -> Just (double2Rational d)
      _                   -> Nothing

-- | Parse an empty cell
pEmpty :: P ()
pEmpty = tokenPrim show nextPos getMaybeCell
  where
    nextPos pos _ _ = incSourceColumn pos 1

    getMaybeCell mt = case mt of
      Just _ -> Nothing
      _      -> Just ()

-- | Match a cell with a specific label
pLabel :: T.Text -> P ()
pLabel label = tokenPrim show nextPos matchText
  where
    nextPos pos _ _ = incSourceColumn pos 1

    matchText mt = case mt of
      Just (CellText str) -> if label == T.fromStrict str
                               then Just ()
                               else Nothing
      _                   -> Nothing

@alanz
Contributor Author

alanz commented Oct 22, 2014

yep


@olafklinke

Sorry to revive this rather old issue, but what is the preferred way of parsing spreadsheets? The traditional stream-based parsers are a bit limiting here, since xlsx already provides us with a nice CellMap we can traverse at will. If for some reason we were to shoehorn a Worksheet into a stream-based parser, the first problem would be to define a suitable stream type. My first attempt is something like:

data XlsToken = EndOfRow | C Cell
fromSheet :: CellMap -> [XlsToken]
instance Stream [XlsToken] where

(see also this comment)
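To make the stream idea a bit more concrete, here is a minimal sketch of how a `fromSheet` along these lines might flatten a sheet into tokens. It is only an illustration under stated assumptions: `Cell` is a stand-in newtype rather than the library's type, and `CellMap` is assumed to be a map keyed by `(row, column)`. Note that this naive version loses gaps: empty cells and empty rows produce no tokens at all, so a real stream type would probably need to carry positions as well.

```haskell
import Data.List (groupBy)
import qualified Data.Map.Strict as M

-- Stand-in for the library's cell type (assumption: simplified).
newtype Cell = Cell String
  deriving (Eq, Show)

-- Assumption: cells are keyed by (row, column).
type CellMap = M.Map (Int, Int) Cell

data XlsToken = EndOfRow | C Cell
  deriving (Eq, Show)

-- | Flatten a sheet in row-major order, ending each non-empty row
-- with EndOfRow. Keys are (row, column) pairs, so the ascending key
-- order of the map is already row-major.
fromSheet :: CellMap -> [XlsToken]
fromSheet cells = concatMap rowTokens (groupBy sameRow (M.toAscList cells))
  where
    sameRow ((r1, _), _) ((r2, _), _) = r1 == r2
    rowTokens row = map (C . snd) row ++ [EndOfRow]
```

With Parsec, a plain list like `[XlsToken]` already has a `Stream` instance, so no hand-written instance would be needed there.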
Forgoing the stream approach, we could use ReaderT CellMap (Either ParseError) as the parser monad, which allows us to jump freely across the sheet as we see fit. The only compelling reason to use the stream-based approach is that we can build on libraries with excellent error reporting, like Megaparsec.
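As a rough illustration of the reader-based alternative, the sketch below reads cells by coordinate instead of consuming a stream. Everything in it is an assumption made for the example: `CellValue` and `CellMap` are simplified stand-ins, `SheetError` is a hypothetical error type used in place of a real `ParseError`, and the coordinates in `heading` are arbitrary.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)
import qualified Data.Map.Strict as M

-- Stand-ins for the library's types (assumption: simplified).
data CellValue = CellText String | CellDouble Double
  deriving (Eq, Show)

type CellMap = M.Map (Int, Int) CellValue

-- Hypothetical error type: where the parse failed and why.
data SheetError = SheetError (Int, Int) String
  deriving (Eq, Show)

-- The whole sheet is the read-only environment; failures short-circuit
-- through Either.
type SheetParser = ReaderT CellMap (Either SheetError)

-- | Look at any cell, in any order.
cellAt :: (Int, Int) -> SheetParser (Maybe CellValue)
cellAt pos = asks (M.lookup pos)

-- | Abort with an error tagged by the offending position.
failAt :: (Int, Int) -> String -> SheetParser a
failAt pos msg = lift (Left (SheetError pos msg))

textAt :: (Int, Int) -> SheetParser String
textAt pos = do
  c <- cellAt pos
  case c of
    Just (CellText t) -> return t
    _ -> failAt pos "expected text"

numberAt :: (Int, Int) -> SheetParser Double
numberAt pos = do
  c <- cellAt pos
  case c of
    Just (CellDouble d) -> return d
    _ -> failAt pos "expected number"

-- | Read a name and a date from fixed (arbitrary, illustrative) cells.
heading :: SheetParser (String, Double)
heading = do
  name <- textAt (3, 2)
  date <- numberAt (3, 7)
  return (name, date)

-- | Run a parser against a sheet.
runSheet :: SheetParser a -> CellMap -> Either SheetError a
runSheet = runReaderT
```

The trade-off mentioned above shows up directly: jumping around is trivial, but the error type is hand-rolled and much poorer than what a parser library would report.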

If stream-based spreadsheet parsing turns out to be of general interest, I could release an xlsx-megaparsec library. I think this has no place in the xlsx library itself.
