Parse chunks from adi file. #467

Open
CutePotatoDev opened this issue Jun 20, 2023 · 2 comments
@CutePotatoDev

I'm trying to parse an .adi file as chunked data.

I'm using the parser like this:
let parser = new AdifParser(adifilechunk)
let content = parser.parseTopLevel()

This way I have access to parser.adi and parser.cursor, which lets me append new content and read the parsed records on the fly.

Now the issue.
Because the data is chunked, the ADIF structure is occasionally slightly corrupt: a chunk can end with an incomplete line or something similar (no closing bracket or eor in the last line).

When this happens, here:

const endTag = this.adi.indexOf('>', startTag);

the indexOf result is -1, and line 75 then returns nonsense. This situation isn't handled by any checks or exceptions.
As a result, parser.cursor becomes NaN and the last element of the array returned by parseTopLevel() is corrupted.

It would be nice to have exceptions or some other approach that allows reading imperfectly chunked ADIF data.
Thank you. I hope my explanation is understandable.
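A minimal sketch of the kind of defensive check being described (`findCompleteTag` is a hypothetical helper, not part of the library): it returns null when a tag is cut off at the chunk boundary, instead of letting indexOf's -1 poison the cursor arithmetic.

```javascript
// Return the bounds of the next complete <...> tag, or null if the
// remainder of the buffer contains no complete tag (e.g. a chunk that
// ends mid-tag). The caller can then stop and wait for more data.
function findCompleteTag(adi, cursor) {
  const startTag = adi.indexOf('<', cursor);
  if (startTag === -1) return null;   // no tag in the remainder
  const endTag = adi.indexOf('>', startTag);
  if (endTag === -1) return null;     // tag truncated by the chunk edge
  return { startTag, endTag };
}
```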

@CutePotatoDev CutePotatoDev changed the title Parse chunksfrom adi file. Parse chunks from adi file. Jun 20, 2023
@xylo04 xylo04 self-assigned this Jun 21, 2023
@xylo04
Member

xylo04 commented Jun 21, 2023

Ok, interesting. Obviously I didn't design for parsing chunked ADIF data like this. I can certainly attempt to make the parser more defensive and throw exceptions in cases like this.

I'm curious, what is your use case for needing to parse chunked ADIF data? Are you streaming a download, and so you don't have the entire file in memory? Are you trying to parse large files that potentially don't fit all in memory at the same time? My initial feeling is that you might do better with a little pre-parsing in your code to make sure you break cleanly after a field value and before a tag (or better, between records). Even if I add exceptions, you'll still be doing a lot of book-keeping if you don't do any pre-parsing. But that's just a gut feeling without knowing your use case.
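A minimal sketch of the pre-parsing idea (`splitAtLastRecord` is a hypothetical helper; it assumes `<eor>` never appears inside a field value): hand the parser everything up to and including the last complete `<eor>`, and carry the incomplete tail over into the next chunk.

```javascript
// Split a buffer at the last complete record terminator so the parser
// only ever sees cleanly terminated records; the remainder is prepended
// to the next chunk before parsing again.
function splitAtLastRecord(buffer) {
  const idx = buffer.toLowerCase().lastIndexOf('<eor>');
  if (idx === -1) return { complete: '', remainder: buffer };
  const cut = idx + '<eor>'.length;
  return { complete: buffer.substring(0, cut), remainder: buffer.substring(cut) };
}
```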

@CutePotatoDev
Author

I want to split ADIF files, process them, and send them to my API.
I want to avoid loading the full file into memory, because I have no idea what size of file an end user might load.
Also, it's always nice to have resource-friendly code.

This is an example of my current solution.

Receiving a chunk of file data and processing it:

let fr = new FileReader()
let parser = new AdifParser("")

fr.onload = (evt) => {
    parser.adi += evt.target.result  // append the newly read chunk

    let rec = parser.parseContent().records
    parser.adi = parser.adi.substring(parser.cursor)  // keep only the unparsed tail
    parser.cursor = 0

    console.log(rec)
}

fr.readAsText(file.slice(0, 5 * 1024))

And here's my modified parser:

    parseContent() {
        const parsed = {}

        if (this.adi.length === 0)
            return parsed

        if (!this.headerparsed) {
            const header = {}
            header["text"] = this.parseHeaderText()

            while (this.cursor < this.adi.length) {
                const endOfHeader = this.parseTagValue(header)
                if (endOfHeader)
                    break
            }
            parsed.header = header
            this.headerparsed = true
        }

        // QSO Records
        const records = new Array()
        let recordstart = undefined

        while (this.cursor < this.adi.length) {
            recordstart = this.cursor
            const record = this.parseRecord()

            if (record !== undefined && Object.keys(record).length > 0) records.push(record)

            if (record === undefined) {
                this.cursor = recordstart
                break
            }
        }

        if (records.length > 0) parsed.records = records
        return parsed
    }

    parseRecord() {
        const record = {}

        while (this.cursor < this.adi.length) {
            if (this.parseTagValue(record))
                return record
        }
    }

    parseTagValue(record) {
        const startTag = this.adi.indexOf("<", this.cursor)
        if (startTag === -1) {
            this.cursor = this.adi.length
            return false
        }

        const endTag = this.adi.indexOf(">", startTag)
        if (endTag === -1) {
            // Incomplete tag at the chunk edge: stop here so the caller can roll back.
            this.cursor = this.adi.length
            return false
        }

        const tagParts = this.adi.substring(startTag + 1, endTag).split(":")

        if (tagParts[0].toLowerCase() === "eor" || tagParts[0].toLowerCase() === "eoh") {
            this.cursor = endTag + 1
            return true
        } else if (tagParts.length < 2) {
            if (this.adi.substring(startTag + 1, endTag) === "APP_LoTW_EOF") {
                this.cursor = endTag + 1
                return true
            }

            throw new Error(
                "Encountered field tag without enough parts near char " +
                startTag +
                ": " +
                this.adi.substring(startTag + 1, startTag + 80) +
                "\n"
            )
        }

        const fieldName = tagParts[0].toLowerCase()
        const width = +tagParts[1]
        record[fieldName] = this.adi.substr(endTag + 1, width)
        this.cursor = endTag + 1 + width
        return false
    }

I don't know whether this mock-up code is robust in all situations, but for reading chunked files it works for me.
The code saves the cursor position at the start of a record. It then parses the record, and the parse succeeds only if an eor is found.
Otherwise, the cursor is rolled back to the saved position and processing stops.
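The rollback technique described here, boiled down to a standalone sketch (`parseChunkRecords` is a hypothetical helper and only handles simple name:length tags): if a chunk ends before a record's eor, the cursor is rolled back to the start of that record and the partial record is returned as leftover for the next chunk.

```javascript
// Parse as many complete records as the buffer contains; a record that is
// cut off by the chunk boundary is rolled back and returned as leftover.
function parseChunkRecords(adi) {
  const records = [];
  let cursor = 0;

  outer: while (cursor < adi.length) {
    const recordStart = cursor;           // remember where this record began
    const record = {};

    while (true) {
      const startTag = adi.indexOf('<', cursor);
      const endTag = startTag === -1 ? -1 : adi.indexOf('>', startTag);
      if (endTag === -1) {                // tag truncated by the chunk edge:
        cursor = recordStart;             // roll back, retry with the next chunk
        break outer;
      }
      const [name, widthStr] = adi.substring(startTag + 1, endTag).split(':');
      if (name.toLowerCase() === 'eor') { // record completed successfully
        cursor = endTag + 1;
        records.push(record);
        break;
      }
      if (!widthStr) {                    // tag without a length (e.g. eoh): skip
        cursor = endTag + 1;
        continue;
      }
      const width = Number(widthStr);
      const value = adi.substring(endTag + 1, endTag + 1 + width);
      if (value.length < width) {         // value truncated by the chunk edge
        cursor = recordStart;
        break outer;
      }
      record[name.toLowerCase()] = value;
      cursor = endTag + 1 + width;
    }
  }

  return { records, leftover: adi.substring(cursor) };
}
```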

Basically, I can keep using my own code regardless, but I wanted to ask whether this library could offer similar capabilities. That would let me receive updates and improvements to the library without reapplying my changes every time.

If that's not possible, then no problem; at some point I'll probably implement line detection so the data is prepared for parsing without corruption.

Thank you.

@xylo04 xylo04 added the enhancement New feature or request label Aug 24, 2023