Feature: Image occlusion #1

Draft: wants to merge 8 commits into base: main

@krmanik (Owner) commented Jul 25, 2022

For keeping track of commit hashes.

@glutanimate commented Sep 2, 2022

Hey Mani,

Appreciate the work you're doing here! I think it's nice that we're finally moving towards a native and cross-platform implementation of Image Occlusion, and this looks like a promising effort.

However, IMHO, I fear that the approach chosen here is on the wrong track because it follows the add-on's footsteps too closely. Don't get me wrong, I think you've done a great job working within the bounds of the add-on's faulty design. But given that you're still hard at work on this, I thought I'd chime in early, so as to perhaps talk about what direction I was hoping an eventual native implementation could take in order to surpass the add-on's design and not repeat its mistakes.

(apologies in advance for the large text ahead)

Would naturally also very much be interested in your thoughts, @dae and @hgiesel

The Four Fundamental Design Flaws in Image Occlusion Enhanced

As third-party solutions, Image Occlusion Enhanced and its predecessors were forced to make a lot of questionable design decisions that I would hope a first-party solution could side-step. In particular I'd say that the add-on suffers from four fundamental flaws which accordingly are also present in this implementation:

1. Storing Machine-Readable Data in a User-Editable Space

One of the most frequent bug reports I've gotten over the years (literally hundreds of times) is users ending up with an invalid IOE note due to either accidentally editing the ID field and/or one of the mask fields. Given that these fields live among other fields like Extra 1 and Remarks that are meant to be edited by humans, this happens quite easily. Similarly, users are typically unfamiliar with note types that contain data that's meant to be programmatically parsed rather than manually edited. So edits out of not knowing what they're doing happen very frequently.

(IOE does take precautions to at least hide the ID field via hacky edits to the editor, but this doesn't prevent users from happily introducing accidental changes on mobile, or doing so when using cards from other users without IOE installed. And of course it doesn't protect against other types of destructive edits.)

To paint a scenario which I think we'll be seeing quite frequently going forward: Imagine random user Bob creating occlusions on a fairly large screenshot, only to realize that it's unwieldy to review on a smaller screen. So he heads into the note editor to resize the base image, as he's used to doing for his regular notes. Now suddenly the masks no longer line up (and only for one of the shapes) and Bob is confused and sad.

Stories like these are super common. If not about this particular error flow, then about deleting a field by accident, or messing with the templates, and so on and so forth. The more data meant for machines we expose to users, the more frequently they run into situations where they can make mistakes that are not easy to recover from.

I would imagine this problem to be particularly severe with the <embed> approach you mentioned (despite potentially having many advantages in other areas). As elaborated on in the next section, IOE uses the ID field to group multiple notes together. So an accidental edit breaks that link, making the note uneditable which is pretty bad to begin with.

But in a note type where the ID field carries rendering the template on its shoulders, a single accidental character will completely break the note with no easy way to recover.

2. Generating Multiple Notes Rather than One

Making IO notes editable was one of my key contributions to the original IO implementations that Tiago Barroso and Steve AW came up with back in 2012-2013. But to do so I had to use this crutch of grouping multiple notes and files together by correlating them using an ID. As mentioned above, this is an inherently fickle design choice that brings a lot of complications with it, both on the UX and development side of things:

First off, it fundamentally breaks Anki's concept of keeping card-generating information closely grouped together within atomic records. Regular notes and cards are always in sync (after checking for empty cards): If the data to generate a card is there, the card will be present in the collection. Users can't delete a card on its own right.

Not with IOE: Due to being split across multiple notes, users can easily accidentally delete a single note, and then they suddenly end up with a set of cards where Anki no longer asks them to identify one of the image labels. Recovering from this state is not an obvious path: Performing a database check, or looking through the note for an accidentally emptied field won't help as it does with any other note type. Rather, users have to be aware that IOE is treated specially, and think of finding one of the constituent notes, manually draw it up in the IOE editor, and then prompt the add-on to regenerate the note. Needless to say, this is very complicated, especially for users who might already be struggling with Anki's complexity to begin with.

Secondly, splitting card-generating information across multiple notes fundamentally breaks the algorithm's ability to space overlapping cards that would otherwise hint towards each other. Now this is a point that has some nuance to it because depending on how you occlude your image and which card generation mode you choose, sibling burying & spacing might either be desirable or not. I would argue that for Hide All, Guess One, which is the most frequently used occlusion mode, sibling burying will not be desirable in most cases. However, as the image equivalent to text clozes, Hide One, Guess One notes can very much benefit from sibling burying. Alas, with IOE's note fragmentation the scheduler has no idea that it's dealing with cards that give each other away.

This is yet another shortcoming that users have filed feature requests about quite frequently over the years.

Third, with information split across multiple notes, users have to use the IOE editor to even perform simple edits like giving the cards a different title or updating a remark. If they update the information in one note, the changes will not propagate to the others, which is a very obscure behavior compared to any other note type.

From an implementation standpoint, it also introduces a fairly tough conundrum: How do you treat information going out of sync between different notes in the same group? When editing a single note where manual edits have been made outside the IOE editor, should these changes be propagated to all other notes? Or should the content in the most recently edited note take precedence?

In its current implementation, IOE offers two fields that are not synced during edits (Extra 1 and Extra 2), so as to allow users to add label-specific information. But the fact that these fields are treated specially is not immediately obvious and based on the IO notes I've seen in the wild, most users aren't even aware that this feature exists.

Taken together, all of this adds a lot of opaqueness and confusion to how the note type works, and introduces a lot of additional cognitive load that a better implementation (with first-party support) could easily abstract away.

It seems like the implementation here does not support editing existing occlusions, yet. Believe me when I say this, @krmanik, implementation-wise I truly believe this will be annoying to tackle, given the multi-note set-up.

3. Using SVGs to Store Mask Data

(I realize that this section is alleviated somewhat by the clever <embed> solution you proposed, @krmanik, but bear with me for a minute, as I think it's important to recap the status quo for IOE because some of the faults of an SVG-based approach still apply.)

When Tiago decided to go with an SVG-based masking approach back in 2012, it was probably due to three reasons (and please correct me if I'm wrong @tmbb):

  1. It would allow creating a card template that would not rely on JS, thus making it reliably work across all platforms in an ecosystem where JS capabilities vastly differed between apps and where they were just much poorer in general compared to today (e.g. <canvas> not being supported in Android web views until Android 5.0).
  2. It would allow using a powerful existing vector graphics editor like SVG-Edit to implement the editing UI.
  3. There's beauty in the simplicity of just overlaying a transparent mask over a base image.

However, ten years down the line, I would argue that none of these issues still reasonably apply:

  1. Anki desktop, AnkiMobile, AnkiDroid, and AnkiWeb all now offer first-class support for modern JS and add-ons like Closet and Enhanced Cloze have proven that it can be used to reliably design dynamic card templates.
  2. <canvas> and the many helpful libraries assembled around it make it possible to reasonably build a simple graphics editor that's custom-tailored to IO's needs (as you did in this PR!)
  3. Simplicity also comes at the cost of inflexibility, and is crucial to why the capabilities of IO's card design haven't evolved too much over the years

At the same time, ten years of SVG-based image occlusions have made it quite clear that this approach comes with a lot of disadvantages:

  1. Probably the single-most frequent support request I've seen for IOE over the years is users struggling with "broken" cards due to incomplete media syncing between clients. It's understandable given how many small files you end up with as you use IOE.
  2. Once you've used IOE for a bit, these files can actually become a major performance issue on platforms like AnkiDroid where filesystem access is slower (from what I understand this has been even more of a problem since the introduction of Scoped Storage)
  3. Using SVG files makes it tough to implement features like click-to-reveal on single shapes, or scrolling to the prompted-for mask, which have been among the most requested feature additions over the years.
  4. Similarly, SVG files have made it tough for template authors to iterate on IO's card design to enable more interesting card types. Easier access to the underlying mask data and the ability to actually manipulate it during card display could open up a lot of new card designs.

In summary, the current approach of using SVG files, and in particular one SVG file per prompted-for mask, was a suitably solid solution for its time, but not an approach I would be following if I were to rewrite the add-on today.

4. Separate UI Flows for IO and Regular Notes

Let me show you a quick dramatic re-enactment of a UI flow that I've seen users get wrong hundreds of times over the years (using IOE as it's easier to show what's happening):

[Video: Screencast-2022-09-02_06.16.14.mp4]

So what happened here? Being used to adding notes via the Add button, and expecting a modal on top of a modal not to be its own thing, Bob was assuming he'd have to press Add to actually save the changes he made in the IOE editor. It didn't work, so now Bob is sad and confused.

If you're used to how IO's UI flow works, this seems very hard to get wrong, especially with all the nice confirmation messages that the dialog presents. Yet this is one of the most frequent support requests I see. To me this makes it abundantly clear that this is not intuitive UI design. Don't get me wrong, it's an understandable design choice for an add-on to make, as a separate interface is much easier to maintain than one that's injected into an Anki view. But it's not good UX.

Let me expand more on this as – beyond the confusion that some new users experience – there are also very real shortcomings to the IOE workflow that even experienced long-term users run into every day:

  • If I'm alternating between adding text notes and image occlusions, my stickied fields do not transfer back and forth (without closing the IOE dialog). In fact, at the moment, I can't even sticky a field in IOE
  • If I want to bold a term or introduce other formatting, I can't do so because IOE's editor does not support that
  • If I want to drop in an image in an Extra field, I can't do so, either, because drag-and-drop does not work in the IOE fields tab
  • If I added a tag in AddCards by accident rather than in IOE's tag editor, that tag won't be part of the generated note

All of this is to say: A separate dialog is a reasonable design choice for an add-on, but I would really like us to avoid its faults in a first party solution.

How a First Party Solution Could Do Things Better

So this was a lot of rambling on why IOE's design is bad, but what would a better first-party implementation actually look like? Given the luxury of being able to modify Anki's code, what decisions would we take differently? And what foundations would we have to lay to enable these decisions?

Turns out we can get quite far with Anki's existing tools and some small pinpoint changes! Let's dive into them.

Building a First-Party IO Note Type

At its core, an IO note type needs two features to work:

  • A way to store data about masked areas of an image
  • A way to generate an arbitrary number of cards based on that data

As any avid Anki user can tell, both of these are already feasible because fields allow arbitrary data to be stored as strings, and clozes enable generating a dynamic number of cards per note.

Let's put the two together to think through how a Cloze-derived IO note type could work. In this example we'll be following an SVG-based data storage approach, but as elaborated in the next section, all of what's shown here would also work with text-only, using <canvas>.

Prototyping an Image Cloze

Imagine the following field set-up and content:

| Hide One, Guess One | Hide All, Guess One |
| --- | --- |
| [Image: Screenshot_20220902_085613] | [Image: Screenshot_20220902_091221] |

Where each field is defined as follows

| Name | Type of data | Protected |
| --- | --- | --- |
| Occlusions | Cloze string generating an IO card. Each line contains a mask_id referring to an SVG entity, and an enum-like Cloze set-up with three states that a particular mask_id can be in: shown, hidden, or prompted for. This is consumed by a template script in order to render the resulting occlusions. | yes |
| Base | <img> embed of base image (use your imagination ;) ) | yes |
| Mask | <img> embed of full SVG mask (same) | yes |
| Extra | Extra info (I know, surprising!) | no |

What would the generated cards for each occlusion mode look like? Well, something like this:

| Hide One, Guess One | Hide All, Guess One |
| --- | --- |
| [Image: Screenshot_20220902_092027] | [Image: Screenshot_20220902_092046] |

Look at that – all the information we need to determine which shapes to cover, prompt for, and reveal contained within a simple text cloze, ready to digest by whatever mask renderer we choose to implement. Neat!

(naturally the format above is just an example. I'm sure there are more nicely machine-readable formats that one could come up with. Also, specifying a hint is not necessary to determine the current cloze, but beyond being a nicer illustration than ..., it also showcases that cloze hints could be used to provide, well, label-specific hints)

Rendering the Image Cloze

So where do we go from here to get to a masked picture? Well, it depends on the approach we want to choose, but fundamentally it's the same story in both cases:

  1. Identify the masks to draw, either by correlating something like a mask ID to a referenced SVG, or by looking at actual inline shape data
  2. Identify their state by looking at the rendered cloze text
  3. Render everything!
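
As a rough illustration of the second step, here is a minimal sketch of how a template script might turn rendered cloze text into a mask state. It assumes the illustrative hidden/shown/prompted vocabulary from the example above, and that the active cloze shows up as "[hint]" on the question side and as its plain text on the answer side. `resolveMaskState`, `fillForMask`, the extra "revealed" state, and the colors are all hypothetical, not part of any existing implementation.

```javascript
// Hypothetical mapping from mask state to a fill color (null = draw nothing).
const MASK_FILL = {
  hidden: "#ffeba2",
  prompted: "#ff8e8e",
  shown: null,
  revealed: null,
};

// renderedText: the text Anki rendered for one cloze deletion.
// isActiveCloze: whether that deletion belongs to the card being shown
// (an assumption: the caller determines this, e.g. from a "cloze" class).
function resolveMaskState(renderedText, isActiveCloze) {
  const text = renderedText.trim();
  if (isActiveCloze) {
    // Question side: "[hint]". Answer side: the cloze text itself.
    const bracketed = text.startsWith("[") && text.endsWith("]");
    return bracketed ? "prompted" : "revealed";
  }
  // Inactive clozes carry their state as plain text.
  return text === "shown" ? "shown" : "hidden";
}

function fillForMask(renderedText, isActiveCloze) {
  return MASK_FILL[resolveMaskState(renderedText, isActiveCloze)];
}
```

Keeping this decision in one pure function would leave the actual drawing code identical for the SVG and canvas variants.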

SVG-based Approach

For the SVG-based approach we would use @krmanik's <embed> solution or svg-inject

<canvas>-based Approach

For the canvas-based approach (and this one's my favorite due to the issues with SVG files stated above), we'd have to slightly alter the cloze-generating text to contain the actual shape data.

This could look as follows for an occlusion that only contains rectangles:

rect(100,200,300,400): {{c1::hidden::prompted}}
rect(34,45,230,400): {{c2::hidden::prompted}}
rect(62,44,677,400): {{c3::hidden::prompted}}
rect(345,200,678,676): {{c4::hidden::prompted}}
rect(234,23,435,34): {{c5::hidden::prompted}}
rect(345,234,300,400): {{c6::hidden::prompted}}

Where rect's would be specified using <canvas> args, i.e. (x, y, w, h). For other shapes we'd use a different identifier and other arguments accordingly. Any entity we need to draw, we'll find a way to stringify.
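
To show that this format is easy to consume, here is a minimal sketch of a parser for the raw field text. It assumes exactly the illustrative line format above (`shape(args): {{cN::text::hint}}`); `parseOcclusionLine` and `parseOcclusions` are hypothetical names, and a real format would likely differ.

```javascript
// Matches e.g. "rect(100,200,300,400): {{c1::hidden::prompted}}".
const LINE_RE =
  /^\s*([a-z]+)\(([^)]*)\):\s*\{\{c(\d+)::([^:}]*)(?:::([^}]*))?\}\}\s*$/;

// Parse one line into a plain object, or return null if it does not match.
function parseOcclusionLine(line) {
  const match = LINE_RE.exec(line);
  if (!match) return null;
  const [, shape, rawArgs, ordinal, text, hint] = match;
  const args = rawArgs.split(",").map((part) => Number(part.trim()));
  if (args.some(Number.isNaN)) return null;
  return { shape, args, ordinal: Number(ordinal), text, hint: hint ?? null };
}

// Parse the whole field, silently skipping lines that are not shape lines.
function parseOcclusions(fieldText) {
  return fieldText
    .split("\n")
    .map(parseOcclusionLine)
    .filter((entry) => entry !== null);
}
```

For a rect, `args` holds `[x, y, w, h]`, so a renderer could pass it straight to `ctx.fillRect(...args)`.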

The key takeaway is this: Using the pre-existing set of tools that Anki's Cloze note type offers, we can easily build an Image Occlusion type that follows our desired single-note paradigm. No hacks involved, and no overhauls to Anki's template system needed.

However, there are a few tiny adjustments to Anki's note type system that we do need to make. Let's jump into the next section to see what they are.

Laying the Right Foundations for IO Note Types

So far we've eliminated two key shortcomings of IOE: Generating multiple notes, and storing mask data as a myriad tiny SVG files. One existing pain point still remains open in making too much data user-accessible, and we also managed to introduce a new one by switching to a Cloze note type. Both are easily alleviated:

Pain Point 1: Preventing Accidents

In the first part of this post I put particular emphasis on the issues that come with exposing data to users that's not meant to be modified by users. As stated before, this becomes even more important in a note type where the actual mask generation is dependent on data contained in fields. So let's protect it! Here's how:

I would propose that we introduce a new boolean under Notetype.fields.protected which allows setting a field as protected. Protected fields would behave in the following way:

  • They would either be excluded from Editor instances, or added as an immutable UI element, indicating that they are protected via their styling and e.g. a lock icon
  • (less important) They would be impossible to rename, delete, or move via the fields editor
  • They would remain writable via programmatic means, and of course also be present in deck exports

In the note type design proposed above, all three fields Occlusions, Base, and Mask would be marked as protected because the only way they should be editable is via the IOE interface. Edits outside of that do not seem desirable in any normal usage scenario.
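
To illustrate the intended behavior, here is a minimal sketch of how a `protected` flag could gate edits. Everything in it is hypothetical: the field objects, the `setField` helper, and the `programmatic` option are stand-ins for whatever the real note type and editor code would look like.

```javascript
// Hypothetical field definitions for the proposed IO note type.
const ioFields = [
  { name: "Occlusions", protected: true },
  { name: "Base", protected: true },
  { name: "Mask", protected: true },
  { name: "Extra", protected: false },
];

// Write a field value. User edits to protected fields are refused,
// while programmatic callers (e.g. the IO editor) may still write.
function setField(note, fields, name, value, { programmatic = false } = {}) {
  const field = fields.find((candidate) => candidate.name === name);
  if (!field) throw new Error(`Unknown field: ${name}`);
  if (field.protected && !programmatic) return false;
  note[name] = value;
  return true;
}
```

With this shape, the regular editor would call `setField(note, ioFields, "Mask", value)` and get `false` back, while the IO interface would pass `{ programmatic: true }`.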

Introducing a flag like that seems conceivably feasible and would eliminate a whole category of support requests with just one fell swoop.

Pain Point 2: Controlling Sibling-Burying and -Spacing Per Note

This is something we've talked about before in other contexts @dae, but I think with the advent of native IO in Anki, it's time that we finally tackle it.

As mentioned before, what's unique about IO is that, depending on the occlusion mode chosen, sibling spacing can either be harmful or helpful:

Hide All, Guess One notes cover all image labels on the front and only reveal one on the back, so answering one card does not immediately hint towards the answer of another. For the large majority of users, especially those used to IO's existing behavior, sibling spacing would be undesirable here.

Hide One, Guess One on the other hand acts exactly like an image version of a regular Cloze: All labels except the one prompted for are visible on the question side, and so switching between different cards of the same note within a session can give answers away. This is where sibling spacing would shine.

So how do we account for both scenarios?

By adding a flag that allows callers to disable sibling spacing on a per-note basis. We could use the notes → data column for this, encoding the flag as a JSON property, e.g. {"siblings": false}. We would then alter the scheduler's logic to exclude notes flagged in this way both from sibling burying (inter day) and sibling spacing (intra day). I.e., cards generated by a note marked as not containing siblings would be reviewable in sequence with no spacing or burying active whatsoever.
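
As a sketch of what reading such a flag could look like, assuming the `{"siblings": false}` encoding suggested above: `hasSiblingSpacing` is a hypothetical helper, and it treats an empty, malformed, or flag-less data value as "siblings enabled" so that existing notes keep their current behavior.

```javascript
// Return false only when the data column explicitly says {"siblings": false}.
function hasSiblingSpacing(dataColumn) {
  if (!dataColumn) return true; // empty column: default behavior
  try {
    const parsed = JSON.parse(dataColumn);
    return parsed?.siblings !== false;
  } catch {
    return true; // not valid JSON: fall back to the default
  }
}
```

Defaulting to `true` means only notes that opt out explicitly would skip burying and spacing.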

The End Result

By making use of Anki's existing toolset in a new way and adding only an ever-so-tiny amount of additional accommodations for image occlusions to Anki's backend, we've come out with an IO note type that is vastly superior to the IOE note type in every conceivable way, setting up a stable foundation for the next ten years of IO's existence.

Why Getting Things Right from the Start Matters

As you might have noticed, I didn't address any UI concerns in the proposal above. This was by design. While I truly believe that IOE's UX issues are a massive pain point for users, it's a deficit that we can easily tackle iteratively. However, I don't think this applies to a couple of other things:

Design Decisions that Need to Be Made Early

Defining the New IO Note Type

Six odd years or so ago when I took over maintenance of IO from Tiago, I was faced with the decision on whether or not to break compatibility with the existing note type in order to implement editing and other essential features. I ended up doing so and this turned out to be exactly the right decision. However, I was patently aware that this is not something you want to do frequently, so getting things right from the beginning was crucial.

I would see us in the exact same spot here. We have to break away from the design decisions of the past, but we also have to make sure that whatever new decisions we settle on are correct from the get-go. We do not want to be in a situation where we have to maintain multiple legacy versions of a new Image Occlusion note type.

It's exactly for that reason that I think it's important not to fall victim to moving through the motions here and actually take some time to re-evaluate whether the design decisions made ten years ago are still valid.

Maintaining Backwards Compatibility

As for maintaining backwards compatibility with the existing IOE note type, I would say that we don't have to. The beauty with image occlusions is that they work without the add-on being present. So the only use case for backwards compatibility, once the new note type lands, would be to edit legacy IO notes. But compared to reviewing cards, this is a much less frequent action. In the instances where it's necessary, the add-on would still be at users' disposal to perform a quick edit.

Should we want to go a step beyond that, then we could extend either Anki or the add-on with a (scheduling-preserving) note porting feature. I did the same for the switch from IO to IOE back in the day and it ended up working out well.

The important thing to me is that we do not feel encumbered by a supposed need to continue supporting legacy IOE note types, when even the IO add-ons themselves broke compatibility with their older counterparts multiple times in the past.

Defining an Add-on API

Clearly I'm biased here, but given the history of Image Occlusion and it being a mainstay of the add-on community, it would be very important to me to design the native variant in a way that's extensible from the get-go. I would see this as absolutely crucial for the community to keep innovating on the idea of IO.

This is a bit of a personal pain point of mine in general as I feel like the increasing move towards a web frontend + Rust backend architecture has put many areas of Anki out of reasonable reach from add-on developers (consider that to this day, I think there might at most be one or two add-ons that target the new stats screen, or the new deck options).

Let us please avoid this with IO, so that key ways to extend the feature are accessible to add-ons from day 1 (e.g. adding new tools to the toolbar, new occlusion mode buttons, getting and setting the current masks). And I'm of course happy to put in work on my part to help design these interfaces.

Design Decisions that Can Come Later

As mentioned before, I think that switching away from the standalone dialog paradigm could have one of the biggest positive impacts to people's workflows. However, it's something that we can deliver at any point in the future and improve upon iteratively.

There are a lot of ideas I have here as well that I would be happy to share, but I think that's a post for another day, given how long this one's already got.

If we are to focus on UI tasks now, then I would say that ironing out the edges of the new IO editor comes first. The work spent here would not go to waste as even if we are going to switch to a single-dialog design, the editor component is likely going to stay as is.

But generally speaking, to me any UI-related work has to come second to building the foundation of the add-on in terms of a new note type.

tl;dr

All of the above basically comes down to the following takeaway IMHO:

It would be a major shame not to utilize the lessons learned from IO's ten year tenure to look at the feature with a fresh pair of eyes. Settling for an implementation based on the current state of the add-on would be setting us up for complacency and place Anki in a disadvantageous spot next to the Anki clones that release every other month. We have the unique opportunity here to take a bit more time to re-evaluate, re-think, and re-design, so let's use it to build something better than the hacks add-on developers like me came up with over the years. Compared to the benefits that this would offer, I think the extra work we'd have to spend is surprisingly small and well worth the effort.

@glutanimate

And just to make sure this meaning didn't get lost along the way of that way-too-long comment: @krmanik I think you've done a great job setting up a very solid foundation here, so nicely done!

@krmanik (Owner, Author) commented Sep 2, 2022

Thanks for the detailed issues and suggestions from the existing IOE add-on. I will consider and implement the suggestions. I have re-read the comment multiple times to reply to each of them, and I will comment again if I missed something.

1. Storing Machine-Readable Data in a User-Editable Space

The ID (hidden) field should be avoided. The Cloze Template you mentioned should be used. I will check the implementation to integrate it into the feature.

2. Generating Multiple Notes Rather than One
> First off, it fundamentally breaks Anki's concept of keeping card-generating information closely grouped together within atomic records. Users can't delete a card on its own right.
> Third, with information split across multiple notes, users have to use the IOE editor to even perform simple edits like giving the cards a different title or updating a remark.

I think the implementation should work like this: when the Edit button is clicked, it checks the note type. If the note type matches Image Occlusion, the Image Occlusion (SVG) editor should open and load the occlusions in the window with the matching image. When the edits are finished, it updates the matching ids of react and notes.

> Secondly, splitting card-generating information across multiple notes fundamentally breaks the algorithm's ability to space overlapping cards that would otherwise hint towards each other.

An id shared by the occlusions of the same image should be stored in the card template (not modifiable), and the scheduler should show the cards in order.

3. Using SVGs to Store Mask Data

I closely followed the IOE add-on and so used SVG, but <canvas> should be used instead. Again, this has to be checked, implemented, and integrated. I will look into the <canvas> implementation.

4. Separate UI Flows for IO and Regular Notes

Again I followed the IOE addon, but I will use a different approach. Maybe the feature should use a different name other than Image Occlusion
In Anki, Tools -> Add Image Occlusion Notes
In AnkiDroid, Options Menu -> Add Image Occlusion Notes

Then it asks for an image to import (on mobile devices it should offer to capture an image or select one from a folder), and the editor window is loaded with that image.

Building a First-Party IO Note Type

The suggested note type, based on the Cloze note type with Occlusion, Image, Mask and Extra (maybe Notes) fields, will be implemented. The benefit of Cloze is that only one note with clozes needs to be sent to the backend, and the backend will generate all the cards for us. This solution is better.

Rendering the Image Cloze

I have used <embed> and noticed some issues when loading the SVG mask. On first load the mask does not scale and transform, but on the next load it gets corrected. So I think I should consider <canvas> (needs checking).
With canvas, the coordinates and sizes are stored in the cloze note, so JS may be needed to re-scale them to fit smaller screens.
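
If re-scaling does turn out to be needed, it could be as small as the following sketch. It assumes the stored x, y, w, h values are in the pixel space of the original image; `scaleRect` and the two size objects are hypothetical names.

```javascript
// Scale a stored [x, y, w, h] rectangle from the image's natural size
// to the size the canvas is actually drawn at.
function scaleRect([x, y, w, h], natural, displayed) {
  const scaleX = displayed.width / natural.width;
  const scaleY = displayed.height / natural.height;
  return [x * scaleX, y * scaleY, w * scaleX, h * scaleY];
}
```

For example, a mask stored as `[0, 362, 160, 104]` on an 800x600 image would become `[0, 181, 80, 52]` on a 400x300 canvas. If the canvas keeps the image's natural width and height and is only shrunk via CSS, no such step should be required.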

Pain Point 1: Preventing Accidents

In AnkiDroid, during reviewing, if we edit a note and change its note type, all edit fields are disabled. So I think it is better to have a protected note fields feature to prevent the deletion of note fields in the note type.

Pain Point 2: Controlling Sibling-Burying and -Spacing Per Note

With the backend code available on Anki, AnkiDroid, and AnkiMobile, the issue can be resolved (maybe with a feature request to show connected notes of the same image occlusion in order).

> By making use of Anki's existing toolset in a new way and adding only an ever-so-tiny amount of additional accommodations for image occlusions to Anki's backend, we've come out with an IO note type that is vastly superior to the IOE note type in every conceivable way, setting up a stable foundation for the next ten years of IO's existence.

This will be considered when implementing the feature.

Defining the New IO Note Type

I think the fields tab of the SVG editor window will show the list of fixed fields, which cannot be modified, only renamed, along with an Add button for users who want to add more fields. So the new note types with the new fields will be added to Anki. The fixed fields, which cannot be removed, are Occlusion, Image, Mask and Extra.

Maintaining Backwards Compatibility

This is a new feature, and I think the Occlusion, Image, Mask and Extra fields are more than enough for basic image occlusion notes. If users need more fields, they can add them in the editor window. Notes with added fields are still built on top of the [ Occlusion, Image, Mask, and Extra ] fields of the note type and will not break it.

Defining an Add-on API

I also think that the add-on API should be better, so that users only have to work on the front-end part, which is easier. I am new to Rust and find it hard to figure out the backend code. But if an API is exposed, add-ons could be implemented as front-end-only projects, as I proposed in the feature request [Ideas/Features] Cross platform js addons support using Rust backend and Svelte frontend

Design Decisions that Can Come Later

I tried to integrate svg-edit, but it does not work well on touch-based devices. So I started with paper.js to create an SVG editor from scratch and then switched to fabric.js, which works well.

I think an SVG editor using fabric.js will be better because of the greater control over the tools and generated data, and the support for touch-based devices.

The current progress is a prototype of the feature that will be implemented, and your suggestions have greatly improved it. I will do my best to consider all the suggestions in the next commits in the PR.

@glutanimate Thanks again.

@dae commented Sep 3, 2022

Indeed, thank you very much for the detailed feedback and suggestions @glutanimate, it all sounds reasonable. Just one note for now: the queue building code can't touch the notes table, as that will tank performance. We'd need some other way of designating cards to exclude from sibling spacing, like either a per-card property, or separate notetypes for reveal-one and reveal-all.

@krmanik
Copy link
Owner Author

krmanik commented Sep 3, 2022

I have generated notes using the following data, and I think the canvas approach is better. They are generated using the Cloze note type.
This is the generated deck, Image Cloze.zip; extract the deck from the zip file to test it.

In the Occlusions field, the x, y, w, h and shape values are easy to get from fabric.js, and the div data is created from them.

<div id="io_cloze">
    <div id="c1" shape="rect" ioxywh="0,362,160,104">
        {{c1::hidden::prompted}}
    </div>
    <div id="c2" shape="rect" ioxywh="61,128,245,82">
        {{c2::hidden::prompted}}
    </div>
</div>
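
As a rough illustration of how the editor could produce this field, here is a minimal sketch that turns plain shape records into the markup above. The function name and the record layout ({ shape, x, y, w, h, ordinal }) are assumptions for illustration only; the records are assumed to have already been extracted from the fabric.js canvas.

```javascript
// Hypothetical helper, not part of the PR: builds the Occlusions field
// markup from plain shape records.
function shapesToOcclusionHtml(shapes) {
    const rows = shapes.map((s) => {
        // Round to whole pixels, matching the integer values in the example.
        const xywh = [s.x, s.y, s.w, s.h].map(Math.round).join(",");
        return (
            `    <div id="c${s.ordinal}" shape="${s.shape}" ioxywh="${xywh}">\n` +
            `        {{c${s.ordinal}::hidden::prompted}}\n` +
            `    </div>`
        );
    });
    return `<div id="io_cloze">\n${rows.join("\n")}\n</div>`;
}

// The two rectangles from the example field.
const html = shapesToOcclusionHtml([
    { shape: "rect", x: 0, y: 362, w: 160, h: 104, ordinal: 1 },
    { shape: "rect", x: 61, y: 128, w: 245, h: 82, ordinal: 2 },
]);
console.log(html);
```

Keeping the generator a pure function of the shape records would make it straightforward to test without a canvas.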

In the Image field, the path to the image is stored. There is no need for other files such as SVGs for the question and answer masks.
human_brain.png

The card template is the same for the front and back sides, and it works for both shown and hidden masks.

<div style="display:none;">{{cloze:Occlusions}}</div>

<canvas id="canvas" style="background-size: 100% 100%; max-width: 100%; max-height: 90vh;">
</canvas>

<div style="display:none;">{{Image}}</div>

<script>
    var canvas = document.getElementById("canvas");
    var ctx = canvas.getContext("2d");

    var img = new Image();

    img.onload = function () {
        canvas.width = img.width;
        canvas.height = img.height;
        ctx.drawImage(img, 0, 0);
        drawShapes(ctx);
    };

    img.src = document.getElementById("img").src;

    function drawShapes(ctx) {
        var ioCloze = document.getElementById("io_cloze");
        for (const cloze of ioCloze.children) {
            var shape = cloze.getAttribute("shape");
            var xywh = cloze.getAttribute("ioxywh");
            xywh = xywh.split(',').map(Number);

            if (cloze.children.length) {
                if (cloze.children[0].className == "cloze" && cloze.children[0].innerText == "[prompted]") {
                    draw(shape, "red", xywh);
                }
            } else {
                if (cloze.innerText.trim() == "shown") {
                    draw(shape, "green", xywh);
                }
            }
        }
    }

    function draw(shape, color, xywh) {
        ctx.fillStyle = color;

        switch (shape) {
            case "rect":
                ctx.fillRect(xywh[0], xywh[1], xywh[2], xywh[3]);
                break;
        }
    }

</script>
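
The branch logic inside drawShapes can be read as a small decision function. The sketch below restates it outside the DOM so it can be checked in isolation; maskColor is a hypothetical name, and the plain objects stand in for the rendered child element.

```javascript
// Hypothetical restatement of the checks in drawShapes.
// firstChild: the first child element of the mask div (or null if none),
// text: the mask div's own text content.
function maskColor(firstChild, text) {
    if (firstChild) {
        // A child element is only drawn when it is a cloze span
        // whose text is the "[prompted]" hint.
        const prompted =
            firstChild.className === "cloze" &&
            firstChild.innerText === "[prompted]";
        return prompted ? "red" : null;
    }
    // No child element: draw only when the plain text is "shown".
    return text.trim() === "shown" ? "green" : null;
}

console.log(maskColor({ className: "cloze", innerText: "[prompted]" }, ""));
console.log(maskColor(null, " shown "));
console.log(maskColor(null, "hidden"));
```

A null result means nothing is drawn for that mask, which is how the same template can serve both sides of the card.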

Demo

@glutanimate
Copy link

glutanimate commented Sep 4, 2022

You're very welcome, guys! I'm glad the suggestions were helpful. IO has always been particularly dear to me, so I'm passionate about making sure its native implementation is built on top of the best foundation yet. Happy to lend my hand where I can to achieve that :)

@krmanik thanks for jumping right on this! The note type looks great. I took it for a spin on Anki desktop, AnkiDroid, and AnkiMobile, and it seems to be working perfectly across all. I also love the simplicity of its design – it really speaks for the canvas solution being the right approach, I feel.

A couple of notes on the notetype:

I think it's clever that you went with HTML for the Occlusions data structure, as it's in line with the assumption that fields should contain HTML, while also making it very easy to consume by the template script. It does hide the underlying shape data away a bit when looking at the raw note, but that might be a good thing rather than a disadvantage. Given that the field won't be manually editable, at the end of the day its presentation in the editor doesn't matter anyway.

However, one concern I would have is whether these custom attributes could end up being mangled at some point in their lifecycle from creating the note, syncing it, editing it across multiple platforms, and reviewing the cards. I.e. @dae, is there any step along this way where either now or in the future you could imagine something like an HTML cleaning logic dropping unrecognized attributes? Perhaps in that case we could use data-* attributes.

Beyond that, my only recommendation for now would be to move the <div id="io_cloze"></div> into the template as it is static across notes (if you didn't have a particular reason for placing it inside the field that is, e.g. to encode additional data on a per-note level).

I was also thinking about whether there is some way we might be able to reuse the front template on the back, given that it's the same on both sides, but it seems like there isn't really a {{FrontSide}} tag equivalent that works for clozes (or is there @dae? From my tests, {{FrontSide}} will render clozes in the front-side state).

And a few comments on some of your other points @krmanik:

I think the implementation should be like this if Edit button clicked then it check for note types, if note types matched with Image Occlusion then Image Occlusion (SVG) editor should open

I think that directing users directly into the IO editor is the smoothest approach. However, we have to keep in mind that the Fields tab might not have the same capabilities as the regular editor, and so users might still need to access the fields in the regular editor for now (e.g. to insert images or apply text formatting).

So I would vote for opening the regular editor for now when clicking on the Edit button. If we do tackle merging the IO editor into the regular editor (which would be amazing), then that would no longer apply of course.

An id for same images occlusion should be stored in card template (not modifiable) and scheduler should show the card in the order.

Sorry, I think my original comment might have been ambiguous there. Allowing card templates to specify a custom review order is a very cool idea, and something that would benefit quite a few card designs, but not something we need in IO for now IMHO. I would say that all we need to do for now is to make sure that sibling spacing is active for "Hide all", and disabled for "Hide One".

Again I followed the IOE addon, but I will use a different approach. Maybe the feature should use a different name other than Image Occlusion. In Anki, Tools -> Add Image Occlusion Notes

I think that's a reasonable approach to eliminate confusion in the UX paths, but the key disadvantage is that it would make the workflow when alternating between text notes and IO notes much more janky as users would have to switch between the editor and Tools menu. I think that, on desktop at least, most users will typically switch back and forth between creating image occlusions and normal notes (on mobile this might be different). So any additional friction introduced here would not be good.

For now I would vote for sticking with the UX flow that IOE had, even though it's not great. If we introduce a new flow now, only to then switch away from it again once IO is integrated with the regular editor, that would probably be more confusing to users.

This is a new feature, and I think the Occlusion, Image, Mask and Extra fields are more than enough for basic image occlusion notes

👍, but I would recommend including a Header field, so that users have a space on the front that they can add free-form text to. This and the multiple different fields on the back (which can just be merged into one Extra field) are probably the most used auxiliary fields in IOE.

I also think that the Add-on API should be better, so that users only have to work on the front-end part, which is easier. I am new to Rust and it is hard to figure out the backend code. But if an API is exposed, then add-ons would only need to be implemented as front-end projects, as described in the feature request I created: [Ideas/Features] Cross platform js addons support using Rust backend and Svelte frontend

I agree that building out a great JS API should be our first and foremost priority here. It's the most natural way for desktop add-ons to extend a web view like IO's, and any progress made here would also help set up a foundation for a future cross-platform add-on system.

A few early thoughts on what I think we'd need to tackle to make sure that native IO has first-class support for add-ons:

  • Make sure that add-ons can inject HTML, CSS, and JS into the web view
    • This should already be feasible by hooking into gui_hooks.webview_did_inject_style_into_page
  • Make sure that Python callers can receive messages from the web view
    • Should also already be possible
  • Make sure that add-ons can hook into multiple stages of the editor's lifecycle, e.g.:
    • After the UI components are fully mounted, but before the image and masks are loaded / before the view is visible to the user
    • After UI, image, and masks are set
    • Before an occlusion is saved
  • During the appropriate stages, make sure that add-ons can do things like:
    • Extend the following areas with additional UI elements:
      • SideToolbar
      • StickyFooter
    • Query for the current mask data
    • Access the fabric.js canvas to be able to arbitrarily draw and remove shapes

@dae:

the queue building code can't touch the notes table, as that will tank performance. We'd need some other way of designating cards to exclude from sibling spacing, like either a per-card property, or separate notetypes for reveal-one and reveal-all.

Ah, that's good to know. Hmm, I was going to vote for the per-card property so that we don't create too many new note types, but now that I think of it, a per-notetype approach might be the right choice to start with: Crucially it would make switching between different occlusion modes incredibly easy as field content and mask presentation would be fully decoupled. I.e., we would not need to update the Occlusions field to change the clozes to mark masks as hidden on the front. That behavior would be fully encoded in the different card templates.

@krmanik
Copy link
Owner Author

krmanik commented Sep 4, 2022

Perhaps in that case we could use data-* attributes.

The data-* attributes will be used.
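
For illustration, reading such attributes in the template script could look like the sketch below. The attribute names data-shape and data-xywh are assumptions, since the thread does not fix them; the plain object stands in for a DOM element so the snippet also runs outside a browser.

```javascript
// Hypothetical reader for a mask element carrying data-shape and
// data-xywh attributes (assumed names). In a browser, el.dataset
// exposes data-* attributes as camelCased properties.
function readMask(el) {
    const { shape, xywh } = el.dataset;
    const [x, y, w, h] = xywh.split(",").map(Number);
    return { shape, x, y, w, h };
}

// Stand-in for <div data-shape="rect" data-xywh="61,128,245,82">.
const fakeEl = { dataset: { shape: "rect", xywh: "61,128,245,82" } };
const mask = readMask(fakeEl);
console.log(mask);
```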

Beyond that, my only recommendation for now would be to move the <div id="io_cloze"></div> into the template as it is static across notes.

I will move the <div id="io_cloze"></div> into templates.

I was also thinking about whether there is some way we might be able to reuse the front template on the back, given that it's the same on both sides, but it seems like there isn't really a {{FrontSide}} tag equivalent that works for clozes (or is there @dae? From my tests, {{FrontSide}} will render clozes in the front-side state).

I tried but it does not work. So on the front side only the {{Occlusion}} field will be shown and on the back, all fields including {{Occlusion}} will be shown.

For now I would vote for sticking with the UX flow that IOE had, even though it's not great. If we introduce a new flow now, only to then switch away from it again once IO is integrated with the regular editor, that would probably be more confusing to users.

The UX implementation will be similar to IOE. If it needs to change later, it can be changed, since it is just a menu option.

So I would vote for opening the regular editor for now when clicking on the Edit button. If we do tackle merging the IO editor into the regular editor (which would be amazing), then that would no longer apply of course.

This will be kept in consideration and may be implemented later. Until then, the regular editor will open when the Edit button is clicked.

👍, but I would recommend including a Header field,

Then only these fields will be added in the Image Cloze note type.

  • Header
    For showing the header for the notes
  • Occlusions (hidden)
    For masking question and answer
  • Image
    For getting image src, sync and include in apkg file when export
  • Extra
    For everything else

I agree that building out a great JS API should be our first and foremost priority here. It's the most natural way for desktop add-ons to extend a web view like IO's, and any progress made here would also help set up a foundation for a future cross-platform add-on system.

These suggestions will greatly help add-on developers. For now, I am developing this feature along the lines of csv-import, without considering add-ons.

Fabric js editor

fabric.js provides some basic shapes that will be used: Circle, Ellipse, Line, Polygon, Polyline, Rect, and Triangle. These shapes will be added to the editor.
Cloze generation is also easy for notes with grouped shapes: if two rects are grouped together, the same cloze number is used for both. For example:

    <div shape="rect" ioxywh="0,362,160,104">
        {{c1::hidden::prompted}}
    </div>
    <div shape="rect" ioxywh="61,128,245,82">
        {{c1::hidden::prompted}}
    </div>
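
The grouping rule above can be sketched as follows, assuming the editor hands over an array of groups, each an array of shape records. The function name and record layout are hypothetical; an ungrouped shape is simply a group of one.

```javascript
// Hypothetical sketch: every shape in the same group gets the same
// cloze number, numbered by the group's position (starting at 1).
function groupsToClozeDivs(groups) {
    return groups
        .flatMap((group, i) =>
            group.map(
                (s) =>
                    `<div shape="${s.shape}" ioxywh="${s.xywh.join(",")}">\n` +
                    `    {{c${i + 1}::hidden::prompted}}\n` +
                    `</div>`
            )
        )
        .join("\n");
}

const grouped = groupsToClozeDivs([
    // Two rects grouped together share c1.
    [
        { shape: "rect", xywh: [0, 362, 160, 104] },
        { shape: "rect", xywh: [61, 128, 245, 82] },
    ],
    // A single shape forms its own group and gets c2.
    [{ shape: "rect", xywh: [10, 10, 50, 50] }],
]);
console.log(grouped);
```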

@tmbb
Copy link

tmbb commented Sep 7, 2022

When Tiago decided to go with an SVG-based masking approach back in 2012, it was probably due to three reasons (and please correct me if I'm wrong @tmbb):

It would allow creating a card template that would not rely on JS, thus making it reliably work across all platforms in an ecosystem where JS capabilities vastly differed between apps and where they were just much poorer in general compared to today (e.g. not being supported in Android web views until Android 5.0).

True, this was one of the reasons.

It would allow using a powerful existing vector graphics editor like SVG-Edit to implement the editing UI.

This was another reason. The initial prototype was actually some code (not even written in Python, which I didn't know very well at the time) that read image files created with Inkscape (a desktop app to edit SVG files) and create cards out of that.

There's beauty in the simplicity of just overlaying a transparent mask over a base image.

Yes, simplicity was important at the time. If it were much more complex than it was it probably wouldn't have gotten off the ground.

This is a bit of a personal pain point of mine in general as I feel like the increasing move towards a web frontend + Rust backend architecture has put many areas of Anki out of reasonable reach from add-on developers

As much as I dislike Python as a language and as an application development platform, the extensibility that it allows for something like Anki is unparalleled, as is the quick iteration without a heavyweight compile step, which would be the case with Rust. I can assure you that Image Occlusion would definitely not have happened if Anki were written in Rust (or C++ or whatever the equivalent would be for Rust at that time). The JavaScript part will probably allow for more extensibility, but AFAIK it seems to run in a sandbox, and the ability we had to bridge JavaScript into Python was very important for the add-on's development. I understand that Rust probably brings enormous benefit in maintainability, though.

@krmanik
Copy link
Owner Author

krmanik commented Sep 13, 2022

@dae I have implemented the frontend with very basic shapes and three fields: Occlusions, Image and Notes.
On the first run, when the notetype Image Cloze - Anki Ecosystem does not exist yet, the notes are added without errors. But on the next run it gives the following error.

anki.errors.CardTypeError: Card template ⁨1⁩ in notetype '⁨Image Cloze - Anki Ecosystem⁩' has a problem.<br>Expected to find a field replacement on the front of the card template.

I have used the add_or_update_notetype_with_existing_id method to add or update the notetype. My implementation will check whether the notetype exists; if it does, it is left alone, otherwise it is added. It will also check whether the templates have been modified.

pub fn add_or_update_io_cloze_note_type(&mut self, note_type_name: &String) -> Result<()> {
    let nt = &mut Notetype {
        id: NotetypeId(9358342513643),
        name: note_type_name.into(),
        config: NotetypeConfig::new_cloze(),
        ..Default::default()
    };
    let occlusions = "Occlusions";
    nt.add_field(occlusions);
    let image = "Image";
    nt.add_field(image);
    let notes = "Notes";
    nt.add_field(notes);
    let qfmt = DEFAULT_IO_FRONT_TEMPLATE;
    let afmt = DEFAULT_IO_BACK_TEMPLATE;
    nt.add_template(nt.name.clone(), qfmt, afmt);
    return self.add_or_update_notetype_with_existing_id(nt, false);
}

Also, what is the best way to get the currently selected deck and notetype here?

pub fn get_image_cloze_metadata(&mut self, path: &str) -> Result<ImageClozeMetadata> {
    let mut metadata = ImageClozeMetadata {
        ..Default::default()
    };
    metadata.data = read(path)?;
    // Hard-coded for now; should be the deck selected in the Add window.
    metadata.deck_id = 1;
    // metadata.notetype = self.notetype_id()
    Ok(metadata)
}

@dae
Copy link

dae commented Sep 14, 2022

I've pushed a fix here: ankitects@1d0b236

Also, what is the best way to get the currently selected deck and notetype here?

col.defaults_for_adding() provides the default notetype and deck when the add window is opened, but what we really want is the currently selected deck in the add window. The routine in mediasrv.py will need to locate the open Add window, and extract the deck id from it

Since we hard-code the notetype name, I think we only need the deck id, and not the notetype?


message AddImageOcclusionNotesRequest {
string image_path = 1;
bytes notes_data = 2;

This feels more complicated than it needs to be - instead of packing a bunch of items into a JSON string, why not list out the various fields like occlusion and notes in this proto message? That way there's no need to encode/decode them into JSON, and they can be accessed safely directly from the proto message.

@krmanik
Copy link
Owner Author

krmanik commented Sep 14, 2022

I've pushed a fix here: ankitects@1d0b236

Thanks, I will use the code to implement further.

Since we hard-code the notetype name, I think we only need the deck id, and not the notetype?

I was thinking about letting the user add more fields to the notes, but then decided to have the user enter all other data in the Notes field. So only the deck id will be used.

This feels more complicated than it needs to be - instead of packing a bunch of items into a JSON string, why not list out the various fields like occlusion and notes in this proto message? That way there's no need to encode/decode them into JSON, and they can be accessed safely directly from the proto message.

I will change the request to use separate fields and tags, so there will be no need to JSON stringify and parse. Thanks

@krmanik
Copy link
Owner Author

krmanik commented Sep 15, 2022

The routine in mediasrv.py will need to locate the open Add window, and extract the deck id from it

I tried to get the deck id from the Add window using the line below in imageocculusion.py. This is not the right way to locate the Add window, because it opens a new Add window. So what would be the approach for the above suggestion?

deck_id = aqt.addcards.AddCards(self.mw).deck_chooser.selected_deck_id

@krmanik krmanik force-pushed the image-occlusion branch 5 times, most recently from e404994 to e6e2999 Compare January 11, 2023 14:56
* implemented proto
  - getting image data with path
  - returning bytes, file name and current deck id to frontend
  - frontend returns generated occlusions and notes to backend

* created image occlusion dialog to serve image-occlusion
* added image occlusion to editor toolbar to open image occlusion page
* deck selection when adding a note
* fields - header, occlusions, images, notes and tags
* basic tools implementation using fabric.js
  - rectangle
  - ellipse
  - group
  - delete
  - multiple select
  - undo, redo
  - triangle
  - circle
  - square
  - fill shape color
* moved bottom tools to top bar
* added tool
  - ungroup grouped shapes
* optimized the process of storing and generating shapes data
@dae
Copy link

dae commented Feb 5, 2023

Sorry, I missed that you asked a question above. Are you still looking for an answer for that?

@krmanik
Copy link
Owner Author

krmanik commented Feb 5, 2023

Sorry, I missed that you asked a question above. Are you still looking for an answer for that?

The bind: directive was missing from the input tag for deck selection in the Svelte page, so the drop-down value was not set. I have now fixed that by using bind:.

This is the current view of the mask editor and note editor. The note editor does not have the advanced features (bold, italic, underline, etc.) of the note editor provided in Anki. I tried to include them, but there are errors because it relies on the pycmd bridge command, which is only available in Anki desktop (PyQt5/PyQt6). To make it available on mobile devices, I think I should re-implement a lighter version of the note editor from scratch. More tools and features will be added to the mask editor.

@dae
Copy link

dae commented Feb 5, 2023

Given the demand for this feature, I'd rather get a basic solution into the hands of users than delay a release for months more trying to make it perfect. :-)

@krmanik
Copy link
Owner Author

krmanik commented Feb 5, 2023

Then I will create a pull request after adding basic tools to the note editor window.

@krmanik krmanik marked this pull request as draft February 6, 2023 17:37
@krmanik
Copy link
Owner Author

krmanik commented Feb 6, 2023

A PR has been created against the upstream repository.
