Generate epubcfi from unique text #1430

Open
orhnk opened this issue Nov 16, 2024 · 4 comments
Labels
enhancement New feature or request

Comments


orhnk commented Nov 16, 2024

My attempt: https://github.com/orhnk/Annot2CFI

There are some problems with miscalculated CFIs, for example:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <title>Bir Dinozorun Anıları</title>
  <link href="../Styles/main.css" rel="stylesheet" type="text/css" />
</head>

<body class="part" epub:type="part">
  <h1 class="bolum"><span epub:type="pagebreak" id="page7" title="7"></span>BİRİNCİ BÖLÜM</h1>

  <h2 id="sigil_toc_id_1">Yaşlılık ve Ölüm</h2>
</body>
</html>
EXPECTED: epubcfi(/6/8!/4/2,/3:0,/3:7)
FOUND: epubcfi(/6/8!/4/2,/1:0,/1:7)

The library I use cannot find that last node's position correctly.
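
For readers following along, here is a minimal sketch of why the expected step is /3 rather than /1. It assumes the numbering rule from the EPUB CFI spec: element children get even indices (2, 4, ...), and the text chunk after the n-th element gets the odd index 2n + 1. In the `h1` above, the text follows the `span` element, so it lands on /3. The function name `cfiStepIndex` and the plain objects standing in for DOM nodes are hypothetical, and the sketch does not handle the offset adjustment needed when adjacent text nodes merge into one chunk.

```javascript
// Node type constants, matching the DOM's Node.ELEMENT_NODE and Node.TEXT_NODE.
const ELEMENT_NODE = 1
const TEXT_NODE = 3

// Hypothetical helper: returns the CFI step index of `target` among `children`.
// Elements are numbered 2, 4, 6, ...; a non-element node that follows
// n elements is numbered 2n + 1.
function cfiStepIndex(children, target) {
    let elementsSeen = 0
    for (const child of children) {
        if (child === target) {
            return child.nodeType === ELEMENT_NODE
                ? 2 * (elementsSeen + 1)
                : 2 * elementsSeen + 1
        }
        if (child.nodeType === ELEMENT_NODE) elementsSeen++
    }
    throw new Error('target is not a child')
}

// Stand-ins for the children of the <h1> in the example document.
const span = { nodeType: ELEMENT_NODE, name: 'span' }
const text = { nodeType: TEXT_NODE, data: 'BİRİNCİ BÖLÜM' }
const h1Children = [span, text]

console.log(cfiStepIndex(h1Children, span)) // 2
console.log(cfiStepIndex(h1Children, text)) // 3
```

A generator that always emits /1 for text is only right when the text comes before the first element child.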

I'm going to use this program to fetch annotations from my Kobo device (and hopefully a lot of people are going to use it :D)

What should I do?

Any integration ideas for your epubcfi.js? @johnfactotum

Or maybe Foliate could fix up some CFIs by itself when the mismatch is easy to detect, as in the unique-text example above?

A lot of the text people highlight from the book is unique. It's like... Why do you think they highlight it then?

Yours truly

orhnk added the enhancement label Nov 16, 2024
Author

orhnk commented Nov 16, 2024

I should also say that this was just my solution. I'm open to anything that makes more sense.

@johnfactotum
Owner

Yes, it does make sense to use the text to correct the CFI. (There's actually something similar to this in the CFI spec. It's called a text location assertion, though such assertions contain not the highlighted text itself, but rather the text that precedes or follows it.)

Anyway, to implement this, the main issue is not really related to CFI, which can be straightforwardly generated once you have the correct nodes and offsets. The problem mainly depends on how you would like to match the text, and that depends to some extent on how the text is generated from the book in the first place. foliate-js provides a search.js module, which is used to implement "Find in Book" in Foliate, and one can use it for this purpose as well.
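
As a rough illustration of the matching step, the sketch below finds a highlighted string in a section's text nodes and maps it back to node and offset pairs, accepting the match only when the text is unique. This is not how search.js works. The names `locateUniqueText` and `resolveOffset` are hypothetical, plain objects stand in for DOM text nodes, and the sketch assumes the highlight matches the book text exactly (no whitespace or diacritic normalization).

```javascript
// Hypothetical helper: map an absolute offset in the concatenated text back to
// a (node index, offset within node) pair. A range end may sit at the very end
// of a node, so `isEnd` makes the comparison inclusive.
function resolveOffset(textNodes, offset, isEnd) {
    let start = 0
    for (let i = 0; i < textNodes.length; i++) {
        const end = start + textNodes[i].data.length
        if (isEnd ? offset <= end : offset < end) {
            return { node: i, offset: offset - start }
        }
        start = end
    }
    return null
}

// Hypothetical helper: returns { start, end } when `query` occurs exactly once
// in the concatenated text, otherwise null (not found, or ambiguous).
function locateUniqueText(textNodes, query) {
    if (!query) return null
    const full = textNodes.map(n => n.data).join('')
    const first = full.indexOf(query)
    if (first === -1) return null
    if (full.indexOf(query, first + 1) !== -1) return null
    return {
        start: resolveOffset(textNodes, first, false),
        end: resolveOffset(textNodes, first + query.length, true),
    }
}

// The heading text, split across two text nodes for the sake of the example.
const nodes = [{ data: 'Yaşlılık ' }, { data: 've Ölüm' }]
console.log(locateUniqueText(nodes, 'lık ve'))
// { start: { node: 0, offset: 5 }, end: { node: 1, offset: 2 } }
```

With real DOM nodes, the two positions would feed `range.setStart()` and `range.setEnd()`, and the resulting Range is what a CFI generator needs.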

Author

orhnk commented Nov 22, 2024

@johnfactotum I have completed my project, but the library it depends on is not fully capable of generating CFIs. It puts /1 for every text node location, but the correct step is sometimes different. For a fully capable program, I need to use Foliate's own CFI generator to produce a JSON file that lists every paragraph with its CFI. What do you suggest?

Also, your library is hard for me to understand. I don't know JS; I use ChatGPT and common sense. Thank you.

Owner

johnfactotum commented Nov 23, 2024

Ah, I see. Well, you're using epub-cfi-generator, which, according to the readme, simply uses readium-cfi-js. So the bug you're seeing is readium/readium-cfi-js#23, which has been fixed upstream. So you can either ask them to update their copy of readium-cfi-js, or just use readium-cfi-js yourself.

If you want to use foliate-js, it can indeed be a bit more involved. But here's an example:

import { makeBook } from './foliate-js/view.js'
import * as CFI from './foliate-js/epubcfi.js'

const file = /* a Blob or File object, or a string (a URL to the file) */;
const book = await makeBook(file)

const index = /* the index of the spine item, e.g. `0` for the first section */;
const range = /* a DOM Range object of the highlighted text */;

const baseCfi = book.sections[index].cfi ?? CFI.fake.fromIndex(index)
const cfi = CFI.joinIndir(baseCfi, CFI.fromRange(range))
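
To make the last step more concrete, here is a string-level sketch of how the parts of the expected CFI from the original report fit together: a spine-level part, the indirection step `!`, and a section-local range written as `parent,start,end`. This is an illustration of the CFI syntax only, not the foliate-js implementation. The helper names `unwrapCfi`, `joinWithIndirection`, and `formatRange` are hypothetical.

```javascript
// Hypothetical helper: strip the "epubcfi(...)" wrapper, if present.
function unwrapCfi(cfi) {
    const match = /^epubcfi\((.*)\)$/.exec(cfi)
    return match ? match[1] : cfi
}

// Hypothetical helper: join a spine-level CFI and a section-local CFI
// with the indirection step "!".
function joinWithIndirection(baseCfi, localCfi) {
    return `epubcfi(${unwrapCfi(baseCfi)}!${unwrapCfi(localCfi)})`
}

// Hypothetical helper: format a section-local range CFI as "parent,start,end".
function formatRange(parent, start, end) {
    return `epubcfi(${parent},${start},${end})`
}

// /4/2 is the <h1>; /3:0 and /3:7 are offsets in the text after the <span>.
const local = formatRange('/4/2', '/3:0', '/3:7')
console.log(joinWithIndirection('epubcfi(/6/8)', local))
// epubcfi(/6/8!/4/2,/3:0,/3:7)
```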

Alternatively, you can simply use the foliate-view element, though I'm not entirely sure whether this would work in Node.js. With this you can also use view.search() to find text and get CFIs directly.

import './foliate-js/view.js'

const document = /* a DOM Document, which if you're in Node.js, you need to create with e.g. JSDOM */;

const view = document.createElement('foliate-view')
const file = /* a Blob or File object, or a string (a URL to the file) */;
await view.open(file)

// get CFI from index and range (same as the previous example)
view.getCFI(index, range)

// or search in the book
const generator = view.search({
    query: 'your text here',
    matchCase: true,
    matchDiacritics: true,
    matchWholeWords: false,
})
for await (const result of generator) {
    console.log(result)
}
