-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature: Image occlusion #1
base: main
Are you sure you want to change the base?
Conversation
e756a9c
to
18ddefe
Compare
Hey Mani, Appreciate the work you're doing here! I think it's nice that we're finally moving towards a native and cross-platform implementation of Image Occlusion, and this looks like a promising effort. However, IMHO, I fear that the approach chosen here is on the wrong track because it follows the add-on's footsteps too closely. Don't get me wrong, I think you've done a great job working within the bounds of the add-on's faulty design. But given that you're still hard at work on this, I thought I'd chime in early, so as to perhaps talk about what direction I was hoping an eventual native implementation could take in order to surpass the add-on's design and not repeat its mistakes. (apologies in advance for the large text ahead) Would naturally also very much be interested in your thoughts, @dae and @hgiesel The Four Fundamental Design Flaws in Image Occlusion EnhancedAs third-party solutions, Image Occlusion Enhanced and its predecessors were forced to make a lot of questionable design decisions that I would hope a first-party solution could side-step. In particular I'd say that the add-on suffers from four fundamental flaws which accordingly are also present in this implementation: 1. Storing Machine-Readable Data in a User-Editable SpaceOne of the most frequent bug reports I've gotten over the years (literally hundreds of times) is users ending up with an invalid IOE note due to either accidentally editing the (IOE does take precautions to at least hide the To paint a scenario which I think we'll be seeing quite frequently going forward: Imagine random user Bob creating occlusions on a fairly large screenshot, only to realize that it's unwieldy to review on a smaller screen. So he heads into the note editor to resize the base image, as he's used to doing for his regular notes. Now suddenly the masks no longer line up (and only for one of the shapes) and Bob is confused and sad. Stories like these are super common. If not about this particular error flow, then about deleting a field by accident, or messing with the templates, and so on and so forth. The more data meant for machines we expose to users, the more frequent they run into situations where they can make mistakes that are not easy to recover from. I would imagine this problem to be particularly severe with the But in a note type where the 2. Generating Multiple Notes Rather than OneMaking IO notes editable was one my key contributions to the original IO implementations that Tiago Barroso and Steve AW came up with back in 2012-2013. But to do so I had to use this crutch of grouping multiple notes and files together by correlating them using an First off, it fundamentally breaks Anki's concept of keeping card-generating information closely grouped together within atomic records. Regular notes and cards are always in sync (after checking for empty cards): If the data to generate a card is there, the card will be present in the collection. Users can't delete a card on its own right. Not with IOE: Due to being split across multiple notes, users can easily accidentally delete a single note, and then they suddenly end up with a set of cards where Anki no longer asks them to identify one of the image labels. Recovering from this state is not an obvious path: Performing a database check, or looking through the note for an accidentally emptied field won't help as it does with any other note type. Rather, users have to be aware that IOE is treated specially, and think of finding one of the constituent notes, manually draw it up in the IOE editor, and then prompt the add-on to regenerate the note. Needless to say, this is very complicated, especially for users who might already be struggling with Anki's complexity to begin with. Secondly, splitting card-generating information across multiple notes fundamentally breaks the algorithm's ability to space overlapping cards that would otherwise hint towards each other. Now this is a point that has some nuance to it because depending on how you occlude your image and which card generation mode you choose, sibling burying & spacing might either be desirable or not. I would argue that for Hide All, Guess One, which is the most frequently used occlusion mode, sibling burying will not be desirable in most cases. However, as the image equivalent to text clozes, Hide One, Guess One notes can very much benefit from sibling burying. Alas, with IOE's note fragmentation the scheduler has no idea that it's dealing with cards that give each other away. This is yet another shortcoming that users have filed feature requests about quite frequently over the years. Third, with information split across multiple notes, users have to use the IOE editor to even perform simple edits like giving the cards a different title or updating a remark. If they update the information in one note, the changes will not propagate to the others, which is a very obscure behavior compared to any other note type. From an implementation standpoint, it also introduces a fairly tough conundrum: How do you treat information going out of sync between different notes in the same group? When editing a single note where manual edits have been made outside the IOE editor, should these changes be propagated to all other notes? Or should the content in the most recently edited note take precedence? In its current implementation, IOE offers two fields that are not synced during edits ( Taken together, all of this adds a lot of opaqueness and confusion to how the note type works, and introduces a lot of additional cognitive load that a better implementation (with first-party support) could easily abstract away. It seems like the implementation here does not support editing existing occlusions, yet. Believe me when I say this, @krmanik, implementation-wise I truly believe this will be annoying to tackle, given the multi-note set-up. 3. Using SVGs to Store Mask Data(I realize that this section is alleviated somewhat by the clever When Tiago decided to go with an SVG-based masking approach back in 2012, it was probably due to three reasons (and please correct me if I'm wrong @tmbb):
However, ten years down the line, I would argue that none of these issues still reasonably apply:
At the same time, ten years of SVG-based image occlusions have made it quite clear that this approach comes with a lot of disadvantages:
In summary, the current approach of using SVG files, and in particular one SVG file per prompted-for mask, was a suitably solid solution for its time, but not an approach I would be following if I were to rewrite the add-on today. 4. Separate UI Flows for IO and Regular NotesLet me show you a quick dramatic re-enactment of a UI flow that I've seen users get wrong hundreds of times over the years (using IOE as it's easier to show what's happening): Screencast-2022-09-02_06.16.14.mp4So what happened here? Being used to adding notes via the Add button, and expecting a modal on top of a modal not to be its own thing, Bob was assuming he'd have to press Add to actually save the changes he made in the IOE editor. It didn't work, so now Bob is sad and confused. If you're used to how IO's UI flow works, this seems very hard to get wrong, especially with all the nice confirmation messages that the dialog presents. Yet this is one of the most frequent support requests I see. To me this makes it abundantly clear that this is not intuitive UI design. Don't get me wrong, it's an understandable design choice for an add-on to make, as a separate interface is much easier to maintain than one that's injected into an Anki view. But it's not good UX. Let me expand more on this as – beyond the confusion that some new users experience – there are also very real shortcomings to the IOE workflow that even experienced long-term users run into every day:
All of this is to say: A separate dialog is a reasonable design choice for an add-on, but I would really like us to avoid its faults in a first party solution. How a First Party Solution Could Do Things BetterSo this was a lot of rambling on why IOE's design is bad, but what would a better first-party implementation actually look like? Given the luxury of being able to modify Anki's code, what decisions would we take differently? And what foundations would we have to lay to enable these decisions? Turns out we can get quite far with Anki's existing tools and some small pinpoint changes! Let's dive into them. Building a First-Party IO Note TypeAt its core, an IO note type needs two features to work:
As any avid Anki user can tell, both of these are already feasible because fields allow arbitrary data to be stored as strings, and clozes enable generating a dynamic number of cards per note. Let's put the two together to think through how a Cloze-derived IO note type could work. In this example we'll be following an SVG-based data storage approach, but as elaborated in the next section, all of what's shown here would also work with text-only, using Prototyping an Image ClozeImagine the following field set-up and content:
Where each field is defined as follows
What would the generated cards for each occlusion mode look like? Well, something like this:
Look at that – all the information we need to determine which shapes to cover, prompt for, and reveal contained within a simple text cloze, ready to digest by whatever mask renderer we choose to implement. Neat! (naturally the format above is just an example. I'm sure there are more nicely machine-readable formats that one could come up with. Also, specifying a hint is not necessary to determine the current cloze, but beyond being a nicer illustration than Rendering the Image ClozeSo where do we go from here to get to a masked picture? Well, it depends on the approach we want to choose, but fundamentally it's the same story in both cases:
SVG-based ApproachFor the SVG-based approach we would use @krmanik's
|
And just to make sure this meaning didn't get lost along the way of that way-too-long comment: @krmanik I think you've done a great job setting up a very solid foundation here, so nicely done! |
Thanks for providing detailed issues and suggestions from existing IOE addon. I will implement and consider the suggestions. I have re-read the comments multiple times to reply to the suggestions. I will comments again if I missed something.
The
I think the implementation should be like this if
An id for same images occlusion should be stored in card template (not modifiable) and scheduler should show the card in the order.
I closely followed the IOE addon so used SVG but
Again I followed the IOE addon, but I will use a different approach. Maybe the feature should use a different name other than Then it asks for an image to import (on mobile devices it should ask for image capture or select from folder) then the editor window will be loaded with the image.
The suggested note type with Cloze Note Type and
I have used
In AnkiDroid, during reviweing, if we edit the notes and change the note type then all edit fields are disabled. So, I think it is better to have
With backend code available on Anki, AnkiDroid, AnkiMobile, the issues can be resolved, (maybe feature request to show connected notes of same image occlusion in order).
This will be considered in implementing this features.
I think in svg editor window on fields tab, the list of fixed fields will be shown and can not modified only can renamed. And
This is new features and I think the
I also think that the Add-on API should be better, so that user only have to work on front-end part which is easier. I am new to rust and it is hard to figure out the backend code. But if API exposed then only front end projects as addon should be implemented as I have created feature request [Ideas/Features] Cross platform js addons support using Rust backend and Svelte frontend
I have tried to integrate I think the SVG editor using fabric js will be better because of more controls over the tools, generated data, and supports on touch-based devices. The current progress is prototype of the feature that will be implemented and your suggestions greatly improved the features. I will do my best to considered all the suggestions in next commits in the PR. @glutanimate Thanks again. |
Indeed, thank you very much for the detailed feedback and suggestions @glutanimate, it all sounds reasonable. Just one note for now: the queue building code can't touch the notes table, as that will tank performance. We'd need some other way of designating cards to exclude from sibling spacing, like either a per-card property, or separate notetypes for reveal-one and reveal-all. |
I have generated notes using following data and I think canvas approach is better. It is generated using Cloze Note Type. In Occlusions field, the <div id="io_cloze">
<div id="c1" shape="rect" ioxywh="0,362,160,104">
{{c1::hidden::prompted}}
</div>
<div id="c2" shape="rect" ioxywh="61,128,245,82">
{{c2::hidden::prompted}}
</div>
</div> In Image field, path to the image, also there is no need of other files like svg for question and answer mask. The Card Template for front and back, it is same for both sides, also it works for both <div style="display:none;">{{cloze:Occlusions}}</div>
<canvas id="canvas" style="background-size: 100% 100%; max-width: 100%; max-height: 90vh;">
</canvas>
<div style="display:none;">{{Image}}</div>
<script>
var canvas = document.getElementById("canvas");
var ctx = canvas.getContext("2d");
var img = new Image();
img.onload = function () {
canvas.width = img.width;
canvas.height = img.height;
ctx.drawImage(img, 0, 0);
drawShapes(ctx);
};
img.src = document.getElementById("img").src;
function drawShapes(ctx) {
var ioCloze = document.getElementById("io_cloze");
for (cloze of ioCloze.children) {
var shape = cloze.getAttribute("shape")
var xywh = cloze.getAttribute("ioxywh")
xywh = xywh.split(',').map(Number);
if (cloze.children.length) {
if (cloze.children[0].className == "cloze" && cloze.children[0].innerText == "[prompted]") {
draw(shape, "red", xywh);
}
} else {
if (cloze.innerText.trim() == "shown") {
draw(shape, "green", xywh);
}
}
}
}
function draw(shape, color, xywh) {
ctx.fillStyle = color;
switch (shape) {
case "rect":
ctx.fillRect(xywh[0], xywh[1], xywh[2], xywh[3]);
break;
}
}
</script> Demo |
You're very welcome, guys! I'm glad the suggestions were helpful. IO has always been particularly dear to me, so I'm passionate about making sure its native implementation is built ontop of the best foundation yet. Happy to lend my hand where I can to achieve that :) @krmanik thanks for jumping right on this! The note type looks great. I took it for a spin on Anki desktop, AnkiDroid, and AnkiMobile, and it seems to be working perfectly across all. I also love the simplicity of its design – it really speaks for the canvas solution being the right approach, I feel. A couple of notes on the notetype: I think it's clever that you went with HTML for the However, one concern I would have is whether these custom attributes could end up being mangled at some point in their lifecycle from creating the note, syncing it, editing it across multiple platforms, and reviewing the cards. I.e. @dae, is there any step along this way where either now or in the future you could imagine something like an HTML cleaning logic dropping unrecognized attributes? Perhaps in that case we could use Beyond that, my only recommendation for now would be to move the I was also thinking about whether there is some way we might be able to reuse the front template on the back, given that it's the same on both sides, but it seems like there isn't really a And a few comments on some of your other points @krmanik:
I think that directing users directly into the IO editor is the smoothest approach. However, we have to keep in mind that the Fields tab might not have the same capabilities as the regular editor, and so users might still need to access the fields in the regular editor for now (e.g. to insert images or apply text formatting). So I would vote for opening the regular editor for now when clicking on the Edit button. If we do tackle merging the IO editor into the regular editor (which would be amazing), then that would no longer apply of course.
Sorry, I think my original comment might have been ambiguous there. Allowing card templates to specify a custom review order is a very cool idea, and something that would benefit quite a few card designs, but not something we need in IO for now IMHO. I would say that all we need to do for now is to make sure that sibling spacing is active for "Hide all", and disabled for "Hide One".
I think that's a reasonable approach to eliminate confusion in the UX paths, but the key disadvantage is that it would make the workflow when alternating between text notes and IO notes much more janky as users would have to switch between the editor and Tools menu. I think that, on desktop at least, most users will typically switch back and forth between creating image occlusions and normal notes (on mobile this might be different). So any additional friction introduced here would not be good. For now I would vote for sticking with the UX flow that IOE had, even though it's not great. If we introduce a new flow now, only to then switch away from it again once IO is integrated with the regular editor, that would probably be more confusing to users.
👍, but I would recommend including a
I agree that building out a great JS API should be our first and foremost priority here. It's the most natural way for desktop add-ons to extend a web view like IO's, and any progress made here would also help set up a foundation for a future cross-platform add-on system. A few early thoughts on what I think we'd need to tackle to make sure that native IO has first-class support for add-ons:
@dae:
Ah, that's good to know. Hmm, I was going to vote for the per-card property so that we don't create too many new note types, but now that I think of it, a per-notetype approach might be the right choice to start with: Crucially it would make switching between different occlusion modes incredibly easy as field content and mask presentation would be fully decoupled. I.e., we would not need to update the |
The
I will move the
I tried but it does not work. So on the front side only the
The UX implementation will be similar to IOE, if it needs to change it will be changed because it is just a menu option.
It will be in consideration and will/may be implemented later, till then the regular editor will be opened on the Edit button click.
Then only these fields will be added in the
The suggestions will greatly help addon developers in developing addons. I am developing this feature similar to Fabric js editorThere are some basic shapes in fabric js which will be used in <div shape="rect" ioxywh="0,362,160,104">
{{c1::hidden::prompted}}
</div>
<div shape="rect" ioxywh="61,128,245,82">
{{c1::hidden::prompted}}
</div> |
18ddefe
to
c14d093
Compare
True, this was one of the reasons.
This was another reason. The initial prototype was actually some code (not even written in Python, which I didn't know very well at the time) that read image files created with Inkscape (a desktop app to edit SVG files) and create cards out of that.
Yes, simplicity was important at the time. If it were much more complex than it was it probably wouldn't have gotten off the ground.
As much as I dislike Python as a language and as an application development platform, the extensibility that it allows for something like Anki is unparallelled, as well as quick iteration without a major language compiler, which would be the case with Rust. I can assure you that Image Occlusion would definitely not have happened if Anki were written in Rust (or C++ or whatever the equivalent would be for Rust at that time). The javascript part will probably allow for more extensibility, but AFAIK it seems to run in a sandbox, and the ability we had to bridge javascript into python was very important for the addon's development. I understand that Rust probably brings enormous benefit in maintainability, though |
@dae I have implemented the frontend with very basic shapes and three fields
I have used anki/rslib/src/image_occlusion/imagedata.rs Lines 99 to 118 in f8e4418
Also what is best way to know current active selected decks and notetypes here. anki/rslib/src/image_occlusion/imagedata.rs Lines 15 to 24 in f8e4418
|
I've pushed a fix here: ankitects@1d0b236
col.defaults_for_adding() provides the default notetype and deck when the add window is opened, but what we really want is the currently selected deck in the add window. The routine in mediasrv.py will need to locate the open Add window, and extract the deck id from it Since we hard-code the notetype name, I think we only need the deck id, and not the notetype? |
proto/anki/image_occlusion.proto
Outdated
|
||
message AddImageOcclusionNotesRequest { | ||
string image_path = 1; | ||
bytes notes_data = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels more complicated than it needs to be - instead of packing a bunch of items into a JSON string, why not list out the various fields like occlusion
and notes
in this proto message? That way there's no need to encode/decode them into JSON, and they can be accessed safely directly from the proto message.
Thanks, I will use the code to implement further.
I was thinking about letting the user to add more fields to the notes but then decided to let the user input every other data into
I will update the requests to fields and tags so there will no need to json stringify and parse. Thanks |
I have tried to get deck id from add window using this in deck_id = aqt.addcards.AddCards(self.mw).deck_chooser.selected_deck_id |
e404994
to
e6e2999
Compare
* implemented proto - getting image data with path - returning bytes, file name and current deck id to frontend - frontend return generated occlusion and notes to backend * Created image occlusion dialog to serve image-occlusion * Added image occlusion in editor toolbar to open image occlusion page
* deck selection during addin note * fields - header, occlusions, images, notes and tags * basic tools implementation using fabric.js - rectangle - ellipse - group - delete - multiple select - undo, redo
e6e2999
to
d1589ea
Compare
- triangle - circle - square - fille shape color * move bottom tools to top bar
* added tool - ungroup grouped shapes
optimized the process of stroring and generating shapes data
Sorry, I missed that you asked a question above. Are you still looking for an answer for that? |
The This is current view of mask editor and note editor. The note editor have not advance feature (bold/italic/underline...etc) similar to note editor provided in anki. I have tried to include it but there are errors because it bridge |
Given the demand for this feature, I'd rather get a basic solution into the hands of users than delay a release for months more trying to make it perfect. :-) |
Then I will create pull request after adding basic tools to note editor window. |
PR is create to upstream repository. |
For keeping tracks of commit hashes.