Generate epubcfi from unique text #1430

orhnk · 2024-11-16T12:47:21Z

My attempt: https://github.com/orhnk/Annot2CFI

It sometimes produces miscalculated CFIs. For example, with this file:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <title>Bir Dinozorun Anıları</title>
  <link href="../Styles/main.css" rel="stylesheet" type="text/css" />
</head>

<body class="part" epub:type="part">
  <h1 class="bolum"><span epub:type="pagebreak" id="page7" title="7"></span>BİRİNCİ BÖLÜM</h1>

  <h2 id="sigil_toc_id_1">Yaşlılık ve Ölüm</h2>
</body>
</html>

EXPECTED: epubcfi(/6/8!/4/2,/3:0,/3:7)
FOUND: epubcfi(/6/8!/4/2,/1:0,/1:7)

The library I use cannot find that last node's position correctly.

I'm going to use this program to fetch annotations from my kobo device (and hopefully a lot of people are going to use it :D)

What should I do?

Any integration ideas for your epubcfi.js? @johnfactotum

Or maybe Foliate could fix up some CFIs by itself when the target is easy to detect, as in the unique-text example above?

A lot of the text people highlight from the book is unique. It's like... Why do you think they highlight it then?

Yours truly

orhnk · 2024-11-16T13:03:56Z

I should also say that this was my solution. Anything which makes more sense, makes more sense.

johnfactotum · 2024-11-21T11:07:42Z

Yes, it does make sense to use the text to correct the CFI. (There's actually something similar to this in the CFI spec. It's called a text location assertion, though such assertions contain not the highlighted text itself, but rather the text that precedes or follows it.)

Anyway, to implement this, the main issue is not really related to CFI, which can be straightforwardly generated once you have the correct nodes and offsets. The problem mainly depends on how you would like to match the text, and that depends to some extent on how the text is generated from the book in the first place. foliate-js provides a search.js module, which is used to implement "Find in Book" in Foliate, and one can use it for this purpose as well.
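
As a rough illustration of the matching step (a minimal sketch only, not how search.js works internally): assuming the text nodes of a section have already been collected in document order into an array of strings, the hypothetical helper `locateText` below looks for a unique query in the concatenated text and maps the match back to a text-node index and character offset for each end. It does no whitespace or diacritic normalization, which text exported from a reading device may well need.

```javascript
// Hypothetical helper, not part of foliate-js.
// `texts` is assumed to hold the contents of a section's text nodes, in document order.
function locateText(texts, query) {
    if (!query) return null
    const joined = texts.join('')
    const start = joined.indexOf(query)
    if (start === -1) return null
    // Refuse ambiguous matches: the approach relies on the text being unique
    if (joined.indexOf(query, start + 1) !== -1) return null
    const end = start + query.length

    let pos = 0 // offset of the current node within the joined text
    let startLoc = null
    let endLoc = null
    for (let i = 0; i < texts.length; i++) {
        const next = pos + texts[i].length
        // The start belongs to the first node that extends past it
        if (startLoc === null && start < next)
            startLoc = { node: i, offset: start - pos }
        // The end may sit exactly at the end of a node
        if (end <= next) {
            endLoc = { node: i, offset: end - pos }
            break
        }
        pos = next
    }
    return { start: startLoc, end: endLoc }
}

console.log(locateText(['Hello ', 'wor', 'ld!'], 'o wo'))
// → { start: { node: 0, offset: 4 }, end: { node: 1, offset: 2 } }
```

In a browser or JSDOM, those node indices and offsets could then be passed to `setStart()` and `setEnd()` on a Range created with `document.createRange()`, which gives the Range a CFI is generated from.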

orhnk · 2024-11-22T15:26:51Z

@johnfactotum I have completed my project, but the library it depends on cannot generate CFIs correctly in all cases. It puts /1 for every text node location, but the correct step is sometimes different. For a fully capable program, I need to use foliate's own epubcfi generator, which I would use to generate a JSON file that lists every paragraph together with its CFI. What do you suggest?

Also, your library is hard for me to understand. I don't know JS; I use ChatGPT and common sense. Thank you.

johnfactotum · 2024-11-23T01:37:54Z

Ah, I see. Well, you're using epub-cfi-generator, which, according to the readme, is simply using readium-cfi-js. So the bug you're seeing is readium/readium-cfi-js#23, which has been fixed upstream. So you can either ask them to update their copy of readium-cfi-js, or just use readium-cfi-js yourself.
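
For reference, the /1 versus /3 difference in the original report follows from how CFI numbers child nodes: elements get even steps (/2, /4, ...), and the text between them gets odd steps, so text that comes after the first child element is /3, not /1. Below is a minimal sketch of that numbering, using a hypothetical `cfiSteps` function that only distinguishes elements from text (it is not taken from either library, and it ignores comments and other node types).

```javascript
// Hypothetical illustration of CFI child numbering, not code from readium-cfi-js or foliate-js.
// `kinds` lists a parent's child nodes in order, each either 'element' or 'text'.
function cfiSteps(kinds) {
    const steps = []
    let elements = 0 // number of element children seen so far
    for (const kind of kinds) {
        if (kind === 'element') {
            elements++
            steps.push(elements * 2) // elements: 2, 4, 6, ...
        } else {
            // Text after n elements gets step 2n + 1;
            // adjacent text nodes therefore share the same step
            steps.push(elements * 2 + 1)
        }
    }
    return steps
}

// <h1><span/>BİRİNCİ BÖLÜM</h1>: the text follows the span
console.log(cfiSteps(['element', 'text'])) // → [ 2, 3 ]
// <h2>Yaşlılık ve Ölüm</h2>: the text is the only child
console.log(cfiSteps(['text'])) // → [ 1 ]
```

So a generator that always emits /1 for text is only right when the text comes before any child element.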

If you want to use foliate-js, it can indeed be a bit more involved. But here's an example:

import { makeBook } from './foliate-js/view.js'
import * as CFI from './foliate-js/epubcfi.js'

const file = /* a Blob or File object, or a string (a URL to the file) */;
const book = await makeBook(file)

const index = /* the index of the spine item, e.g. `0` for the first section */;
const range = /* a DOM Range object of the highlighted text */;

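// the CFI of the spine item itself; fall back to one derived from the index
const baseCFI = book.sections[index].cfi ?? CFI.fake.fromIndex(index)
// combine it with the CFI of the range inside that section
const cfi = CFI.joinIndir(baseCFI, CFI.fromRange(range))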
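
To make the last two lines more concrete, here is a standalone sketch of how the two parts fit together. The functions `fakeFromIndex` and `joinIndirect` are hypothetical stand-ins written for this illustration, not the library's code. They assume the spine is the third child element of the package document (hence /6) and that spine items are numbered /2, /4, ... in order.

```javascript
// Hypothetical stand-ins for illustration only; use foliate-js's epubcfi.js in real code.
const wrap = inner => `epubcfi(${inner})`
const unwrap = cfi => cfi.replace(/^epubcfi\((.*)\)$/, '$1')

// Assumption: spine is at /6, and spine item `index` (0-based) is its (index + 1)-th child element
const fakeFromIndex = index => wrap(`/6/${(index + 1) * 2}`)

// Join the parts with `!`, the CFI indirection marker that steps into the referenced document
const joinIndirect = (...cfis) => wrap(cfis.map(unwrap).join('!'))

const base = fakeFromIndex(3)
const inner = 'epubcfi(/4/2,/3:0,/3:7)' // range CFI within the section
console.log(joinIndirect(base, inner))
// → epubcfi(/6/8!/4/2,/3:0,/3:7)
```

With spine index 3, this reproduces the EXPECTED value from the original report.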

Alternatively, you can simply use the foliate-view element, though I'm not entirely sure whether this would work in Node.js. With this you can also use view.search() to find text and get CFIs directly.

import './foliate-js/view.js'

const document = /* a DOM Document, which if you're in Node.js, you need to create with e.g. JSDOM */;

const view = document.createElement('foliate-view')
const file = /* a Blob or File object, or a string (a URL to the file) */;
await view.open(file)

// get CFI from index and range (same as the previous example)
view.getCFI(index, range)

// or search in the book
const generator = view.search({
    query: 'your text here',
    matchCase: true,
    matchDiacritics: true,
    matchWholeWords: false,
})
for await (const result of generator) {
    console.log(result)
}

orhnk added the enhancement (New feature or request) label on Nov 16, 2024
