Example for output_visualizer is missing #351

Open
vmcj opened this issue Nov 22, 2024 · 24 comments

@vmcj commented Nov 22, 2024

I would expect an example for the output_visualizer in the maximal problem. This is currently missing. Can this be added before the draft becomes official?

@niemela commented Mar 14, 2025

@RagnarGrootKoerkamp Could you add an example of visualizers?

@RagnarGrootKoerkamp commented:

I'll pass it on to either @mpsijm or @mzuenni; I don't have sufficient time at the moment.

@mpsijm commented Mar 15, 2025

Coincidentally, I was chatting about this with @mzuenni last week 😄 I think I'd like to have a working implementation in BAPCtools as well before the draft becomes final (see also RagnarGrootKoerkamp/BAPCtools#433), but we weren't 100% sure about some things. I forgot to actually ask the people here, but let's do that now then 😛 I'll just write off the top of my head what BAPCtools should do with visualizers (both input and output), and then I'd like to hear from others whether that's indeed what they expect tooling to do, based on the spec:

  • The input visualizer will work roughly the same as the visualizer: key we currently have in generators.yaml. The spec does not prescribe how this program should be called, so for the simplest implementation, we can keep the same interface, but remove the visualizer: key from generators.yaml and always look for a visualizer Program in input_visualizer/ instead.
    • The input visualizer is run as part of bt generate, after generating <tc>.in and <tc>.ans files, and the resulting images (<tc>.png/<tc>.svg/etc.) also end up in data/.
    • Q1: Note that currently, the input visualizer also has access to the answer file. Is that reasonable? @mzuenni thought that in that case, it would be more like a test_case_visualizer rather than an input_visualizer.
  • The output visualizer will work roughly the same as a combination of the input visualizer and the output validator, and it should generate {team,judge}image.<ext> files.
    • When running bt run, the image files stay in the feedback directory of $(bt tmp)/runs/<submission>/<test_case>/, and do not end up in data/.
    • Q2: if we want to use the output visualizer to generate images for submission output, then we may as well use the same program to generate an image for the canonical jury answer (i.e. the .ans file) during bt generate. If so, then we would like this to end up in data/ as well, but that potentially gives name clashes with images generated by the input visualizer. Even when the input visualizer does not have access to the .ans file, it may still be the case that we want to visualize both the input and the answer/output. If so, then how do we resolve the name conflict of the image files in data/? For example, we could have both <tc>.in.<ext> and <tc>.ans.<ext> (with <ext> from png/svg/etc.)?
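
To make Q2 concrete: the naming suggested there would give a data/ layout roughly like the following (purely illustrative; nothing like this is specified yet):

  data/secret/003.in
  data/secret/003.ans
  data/secret/003.in.png     # from the input visualizer
  data/secret/003.ans.png    # from the output visualizer run on the jury answer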

@niemela commented Mar 15, 2025

Coincidentally, I was chatting about this with @mzuenni last week 😄

Great!

I think I'd like to have a working implementation in BAPCtools as well before the draft becomes final

I'm aiming to declare the draft feature final at the end of the month. (We'll see how that goes...)

(see also RagnarGrootKoerkamp/BAPCtools#433), but we weren't 100% sure about some things. I forgot to actually ask the people here, but let's do that now then 😛 I'll just write off the top of my head what BAPCtools should do with visualizers (both input and output), and then I'd like to hear from others whether that's indeed what they expect tooling to do, based on the spec:

  • The input visualizer will work roughly the same as the visualizer: key we currently have in generators.yaml. The spec does not prescribe how this program should be called,

We should definitely specify how it should be called. My strong preference would be to follow the pattern from the input/static/output validators, so probably something like:
<input_visualizer_program> feedback_dir [additional_arguments] < inputfile
?
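
For concreteness, an invocation following that pattern could look like this (program name and flag are purely illustrative):

  input_visualizer/run feedback_dir/ --some_flag < data/secret/003.in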

  • so for the simplest implementation, we can keep the same interface, but remove the visualizer: key from generators.yaml and always look for a visualizer Program in input_visualizer/ instead.
    • The input visualizer is run as part of bt generate, after generating <tc>.in and <tc>.ans files, and the resulting images (<tc>.png/<tc>.svg/etc.) also end up in data/.

👍

    • Q1: Note that currently, the input visualizer also has access to the answer file. Is that reasonable? @mzuenni thought that in that case, it would be more like a test_case_visualizer rather than an input_visualizer.

Is there any other reason why an input visualizer would need the answer file, other than that it wants to produce a visualization that indicates the correct solution as well?

My thought was always that the input visualizer would not indicate the solution, since you could (should?) run the output visualizer on the answer file (or .out or .ans.statement ?) to get a visualization of the expected solution.

That said, the intended audience for input visualizations is judges, teachers, coaches, spectators, and such, not contestants and students, so it's not obviously broken for it to be more of a "test case visualization" and include the solution.

I would prefer a pure input visualization over a test case visualization though.

  • The output visualizer will work roughly the same as a combination of the input visualizer and the output validator, and it should generate {team,judge}image.<ext> files.

    • When running bt run, the image files stay in the feedback directory of $(bt tmp)/runs/<submission>/<test_case>/, and do not end up in data/.

👍

    • Q2: if we want to use the output visualizer to generate images for submission output, then we may as well use the same program to generate an image for the canonical jury answer (i.e. the .ans file) during bt generate. If so, then we would like this to end up in data/ as well, but that potentially gives name clashes with images generated by the input visualizer. Even when the input visualizer does not have access to the .ans file, it may still be the case that we want to visualize both the input and the answer/output. If so, then how do we resolve the name conflict of the image files in data/? For example, we could have both <tc>.in.<ext> and <tc>.ans.<ext> (with <ext> from png/svg/etc.)?

Wouldn't it make more sense for the tool running the output visualizer to move the resulting files from feedback_dir/judgeimage.<ext> (would we also use the team image?) to data/? The feedback_dir is always temporary anyway, it's just "where the output goes".
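
Concretely, the tool could then do something like this after running the output visualizer on the jury answer (paths purely illustrative):

  mv feedback_dir/judgeimage.png data/secret/003.png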

@mzuenni commented Mar 15, 2025

Is there any other reason why an input visualizer would need the answer file, other than that it wants to produce a visualization that indicates the correct solution as well?

The question is: what do we want? I think visualizing a test case can very much mean visualizing the solution, and in the past this has proven to be very useful. The same goes for the "output" visualizer: do we want it to have access to the jury solution? I would again think that this can be useful.

IMO it would be nicer to have:

  • a testcase_visualizer that gets .in and .ans and produces one image. It is then up to the implementation if it visualizes the solution or just the input.
  • an output_visualizer that gets access to .in, ans and the team output. Again, the visualizer can decide what should be shown.

@niemela commented Mar 15, 2025

Is there any other reason why an input visualizer would need the answer file, other than that it wants to produce a visualization that indicates the correct solution as well?

The question is: what do we want?

Agreed, that is the more important question, and I was discussing it in the section after what you quoted :). My point here was "even if we want a pure input visualization, would there still be a reason for the input visualizer to have the answer file". But, you're right, we should answer the other question first.

I think visualizing a test case can very much mean visualizing the solution and in the past this has shown to be very useful.

Makes sense.

Same goes for the "output" visualizer, do we want it to have access to the jury solution? I would again think that this can be useful.

It already does; it uses the same arguments as the output validator, and I agree.

IMO it would be nicer to have:

  • a testcase_visualizer that gets .in and .ans and produces one image. It is then up to the implementation if it visualizes the solution or just the input.
  • This could still be called an input visualizer. (And I think it should be)
  • I don't think we should just leave the expected purpose unspecified. If we think it's reasonable/acceptable/useful for this to visualize the solution as well, we should say so.
  • The output visualizer can produce both a team and a judge image; could that be the difference between "with solution and without"?
  • So you're suggesting something like <input_visualizer_program> input_file answer_file feedback_dir [additional_arguments]? Mirroring the output validator rather than the input validator?

What would the difference be between what is generated by this input visualizer that gets the solution, versus running the output visualizer on the correct solution?

  • an output_visualizer that gets access to .in, ans and the team output. Again, the visualizer can decide what should be shown.

That is the current spec for the output visualizer, which includes what you say.

@mzuenni commented Mar 15, 2025

So you're suggesting something like <input_visualizer_program> input_file answer_file feedback_dir [additional_arguments]? Mirroring the output validator rather than the input validator?

Yes. (Except that the output validator is also called with < team output, so it's more like something in between.)

What would the difference be between what is generated by this input visualizer that gets the solution, versus running the output visualizer on the correct solution?

Their purpose is entirely different. The input_visualizer wants to tell me something about the test case. The other one is more focused on the submission output.

A random example would be that the input_visualizer could give me a plot of the degree sequence of the graph (just some random statistic) and the output_visualizer shows me the diff between jury and team output.
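
As a purely hypothetical sketch of such an input/test case visualizer (assuming a graph input of the form "n m" followed by m edges, and the testcase.<ext> output convention that comes up later in this thread):

  #!/usr/bin/env python3
  # Hypothetical test case visualizer: plot the degree sequence of a graph read
  # from stdin. The input format and output file name are assumptions, for illustration only.
  import sys
  from collections import Counter

  import matplotlib
  matplotlib.use("Agg")  # render without a display
  import matplotlib.pyplot as plt

  def main():
      data = sys.stdin.read().split()
      n, m = int(data[0]), int(data[1])
      deg = Counter()
      for i in range(m):
          u, v = int(data[2 + 2 * i]), int(data[3 + 2 * i])
          deg[u] += 1
          deg[v] += 1
      seq = sorted((deg[v] for v in range(1, n + 1)), reverse=True)
      plt.plot(range(1, n + 1), seq)
      plt.xlabel("vertex rank")
      plt.ylabel("degree")
      plt.savefig("testcase.png")  # written to the current working directory

  if __name__ == "__main__":
      main()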

@niemela commented Mar 15, 2025

Ok, that works.

Let me verify something else. We all agree that "what the input visualizer produces" is exactly the "Illustrations" described in the test data section, right? Meaning that the expectations for the latter should apply to the output of the former. Specifically:

  • Illustration files are meant to be privileged information.
  • An illustration provides a visualization of the associated test case.

If we have more expectations (for example: "an illustration could contain information indicating the expected solution"), we should add that. Furthermore, since we are discussing this, it is obviously not obvious, so we should definitely (I wanted to say "obviously") specify it one way or the other.

@RagnarGrootKoerkamp commented:

So to clarify, there are three types of generated illustrations:

  • input/testcase illustrations, stored as e.g. data/secret/*.png, not shown to teams. I don't really care what we call it, but clarifying that it illustrates the testcase as a whole, possibly including an answer, sgtm.
  • output illustrations, written to {team,judge}image.png. One is shown to teams, one is not.

I'm a bit confused now about public illustrations:

  • We do not have public input/testcase illustrations.
  • We do have public output illustrations.

But what is the expected use case of teamimage.png? Should it only be generated for samples? Or always generated and only shown for samples? Can input/testcase illustrations be shown for samples?

It seems that the public/private distinction hinges more on whether the testcase is a sample than on what is rendered in the figure.

  • Is there a use case for showing teamimage.png for secret cases?
  • Can we instead consolidate teamimage.png and juryimage.png into one and only show it for samples (if enabled/supported)?
  • Are there cases where we want to not leak the juryimage.png for the samples? I think there can be, e.g. when we draw some hints towards the solution.

@niemela commented Mar 15, 2025

I'm a bit confused now about public illustrations:

Sorry if I caused this confusion.

  • We do not have public input/testcase illustrations.

Correct.

  • We do have public output illustrations.

Also correct. This would be the teamimage.<ext>.

But what is the expected use case of teamimage.png? Should it only be generated for samples?

It should be for all full_feedback cases, which by default is the samples.

Or always generated and only shown for samples?

If a tree falls...?

Does it matter whether it's generated, if it's not used? This seems like an implementation detail of a system using the problem package.

Can input/testcase illustrations be shown for samples?

Great question, that we have not answered (or even thought that much about). I would prefer if the answer is "yes". But a good argument can certainly be made for "no".

  • Is there a use case for showing teamimage.png for secret cases?

To judges/teachers? Maybe a choice could be made to show teamimage.<ext> (but not judgeimage.<ext>) to spectators?

  • Can we instead consolidate teamimage.png and juryimage.png into one and only show it for samples (if enabled/supported)?

Good point. 👍

  • Are there cases where we want to not leak the juryimage.png for the samples? I think there can be, e.g. when we draw some hints towards the solution.

But this would be hints that are not conveyed by getting both .in and .ans (or .out?). What kind of information could that be? Why would we want that in judgeimage.<ext>?

@RagnarGrootKoerkamp commented:

But this would be hints that are not conveyed by getting both .in and .ans (or .out?). What kind of information could that be? Why would we want that in judgeimage.<ext>?

Maybe something like: the figure illustrates the steps of Dijkstra's algorithm towards finding a shortest path, rather than just showing the shortest path. So you would show the solution to the team, but not the way the solution is obtained.

It should be for all full_feedback cases, which by default is the samples.

Makes sense 👍

I don't have strong opinions on this, but I think it would be good to make a little list of some use cases we want to support and what exactly their requirements are. E.g., does one ever want to actually generate both teamimage and judgeimage at the same time? Otherwise, this could also be a setting in problem.yaml (output_visualizations_are_private: bool). (I don't really think that's better, but who knows.)
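
If that route were taken, it would presumably just be a single (hypothetical) boolean key in problem.yaml, something like:

  output_visualizations_are_private: true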

@niemela commented Mar 15, 2025

This is also somewhat related to my comment in #393.

If output validators get access to full_feedback, then input/output visualizers should as well. Then they could choose not to leak privileged information in those cases. What I'm saying is that if visualizers know when a case is full_feedback, we might not need separate team and judge images?

@mpsijm commented Mar 16, 2025

To aid in the discussion, RagnarGrootKoerkamp/BAPCtools#438 implements the visualizers as they are currently written in the draft:

  • The input visualizer only reads the .in file from stdin.
    • It writes testcase.<ext>. This is currently not specified, but it is the closest thing to the implementation that we already had.
  • The output visualizer is executed in the same way as the output validator: in/ans/feedbackdir as args, output is passed to stdin.
    • It writes image files to the feedbackdir, which are then left untouched (but from the discussion above, we may want to do something for the visualizations of the answer, a.k.a. the canonical jury output).
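
In other words, the draft PR invokes the two programs roughly as follows (program and file names are illustrative):

  input_visualizer/run < testcase.in                                           # writes testcase.<ext> to the working directory
  output_visualizer/run testcase.in testcase.ans feedbackdir/ < team_output    # writes image files into feedbackdir/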

@niemela commented Mar 17, 2025

  • The input visualizer only reads the .in file from stdin.

Could we specify this as "same interface as input validators"? Would it make sense to give it the same arguments as the input validator?

    • It writes testcase.<ext>. This is currently not specified, but the closest thing to the implementation that we already had.

Where does it write them? In the same directory?

  • The output visualizer is executed in the same way as the output validator: in/ans/feedbackdir as args, output is passed to stdin.

👍

    • It writes image files to the feedbackdir, which are then left untouched (but from the discussion above, we may want to do something for the visualizations of the answer, a.k.a. the canonical jury output).

👍

@mpsijm commented Mar 17, 2025

Could we specify this as "same interface as input validators"? Would it make sense to give it the same arguments as the input validator?

Sure, we can make it have the same interface (it's already the same in BAPCtools, except input_validator_flags are not passed to the visualizer yet). For passing the input_validator_args to the input visualizer, that makes just as much sense as it does to pass the output_validator_args to the output visualizer 🙂

We could draw even more parallels and allow any input validator to write image files, just like the output validator is allowed to do so.

Additionally, we could split up {in,out}put_v{alidato,isualize}r_flags, if we think it doesn't make sense to pass the same flags to both.

Where does it write them? In the same directory?

BAPCtools creates and validates the test cases in $(bt tmp)/<problem_shortname>/data/<hash>/ before they're copied to data/. The images generated by the input visualizer follow the same flow.

@RagnarGrootKoerkamp commented Mar 17, 2025

Where does it write them? In the same directory?

We went over this before, but it's probably good to quickly state again how we use it. I think it's as follows, but I'm probably wrong at least once:

  • current working directory: /tmp/.../<hash>/
  • feedbackdir: /tmp/.../<hash>/feedbackdir
  • path to .in and .ans: /tmp/.../<hash>/testcase.{in,ans}. Note that testcase here is literal, and fixed for every testcase. (Not sure if passed as absolute or relative path, but doesn't really matter.)

So visualizers would write to testcase.{png,..} in the current working directory.

@mpsijm commented Mar 17, 2025

Close! 😄 I also double-checked again, and apparently, we don't have a feedbackdir for input visualizers. The third point is correct though, as is the fact that the input visualizer writes testcase.<ext> images to the working directory 🙂

Note that the above three points only hold for the input visualizer; the output visualizer (currently) follows the same flow as the output validator:

  • Working directory: $(bt tmp)/<problem_name>/runs/<submission_rel_path>/<test_case_rel_path>/, which contains:
    • Test case input as testcase.in, symlinked to save space
    • Feedback dir: testcase.feedbackdir/ (this is where the images should be written to)
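
So, if I understand correctly, the two layouts are roughly as follows (illustrative sketch; extensions may vary):

  # generate stage (input visualizer):
  $(bt tmp)/<problem_shortname>/data/<hash>/
      testcase.in
      testcase.ans
      testcase.png            # written by the input visualizer to the working directory

  # run stage (output validator/visualizer):
  $(bt tmp)/<problem_name>/runs/<submission_rel_path>/<test_case_rel_path>/
      testcase.in             # symlink to the test case input
      testcase.feedbackdir/   # the output visualizer writes its images here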

@RagnarGrootKoerkamp commented:

Huh, input visualizers write to CWD, and output visualizers write to the feedbackdir? That's weird/inconsistent.

@mpsijm commented Mar 17, 2025

In the input stage, there is no feedback_dir yet, because the concept of "feedback" doesn't really exist there, right? In the output stage, the visualization is actually "feedback" 🙂

@niemela commented Mar 31, 2025

@mpsijm @RagnarGrootKoerkamp Where are we with this? Could we get a PR maybe, or is there still disagreement?

@mpsijm commented Mar 31, 2025

My PR at BAPCtools is still not finished (other things keep popping up 😅). There's a list of TODOs at RagnarGrootKoerkamp/BAPCtools#438. I tried to write a summary of my current thoughts in the list below. Most of these are implementation details, but some things should be updated in the spec.

For completeness, the points that I also commented at https://github.com/RagnarGrootKoerkamp/BAPCtools/pull/438:
  • In BAPCtools, the interface of the input visualizer can be the same as that of the input validator, except that it writes a file rather than returning exit code 42/43 (this is the current behaviour of my PR at BAPCtools). The spec does not have to specify that it should write the image to testcase.<ext> in the CWD, because the spec does not have a generators framework. In fact, I'm not even sure if it needs to mention that it has the same interface as the input validator, because the spec does not prescribe where the resulting images should be stored anyway. In fact, I'm thinking of changing testcase.<ext> to image.<ext> or judgeimage.<ext>.

  • In BAPCtools, we may want to allow input validators to write image files, similar to the output validator in the spec. The spec does not prescribe where the input illustrations should be stored, so we have this freedom in BAPCtools. (Note that the input visualizer is singular and there can be multiple input validators, so perhaps this is not the best idea 😛 But in theory, this is the same check as when both the output validator and the output visualizer write an image file, so perhaps it's not too bad.)

  • In BAPCtools, we want the <feedback_dir>/judgeimage.<ext> files that result from the output visualizer/validator run on the canonical jury solution to end up in data/, provided that there is no input illustration already (because there can be at most one image per test case). Now that I think about it, this is the same conflict as when multiple input validators would write test case illustrations.

    (I think the implementation details in the points above should be discussed at BAPCtools, rather than here, hence the <details> block.)

  • In the spec, we may want to split up {in,out}put_v{alidato,isualize}r_flags, if we think it doesn't make sense to pass the same flags to both. I'd be in favour of this, because the output validator flags are typically about float tolerance and case-/whitespace-sensitivity, but the output visualizer may need completely different flags.
  • In the spec, we indeed still need to add visualizers to the maximal problem. I think this only depends on the outcome of the previous point.

@thorehusfeldt commented Apr 11, 2025

After some discussion with BAPCtools people and playing around with the developing implementation on the draft-visualizer branch at https://github.com/RagnarGrootKoerkamp/BAPCtools/tree/draft-visualizer, let me voice my opinion on two smaller issues that are mentioned upthread:

Test case visualiser

There should be two visualisers. One is called the test case visualizer; it resides in <problem_dir>/test_case_visualizer. It is run during problem development (during “development-time” of the problem life cycle); it can be used to populate data/{sample, secret}**/<test_case>.{png, pdf, jpg, etc} with informative images of a test case. It has access to both <testcase>.in and <testcase>.ans, and can (for instance) be used to illustrate an intended solution, or a problem instance.

Here’s an example of a test case with a possible solution for BAPC2024:levellinglocks. It is, in fact, data/sample/001.pdf.

Image

Here’s an example of a test case “without solution” for WCFD24:alcohol. In fact, data/secret/alcohols/23-glycerol-dimethylester.png.

Image

Output visualiser

The other is called the output visualizer and is placed in <problem_dir>/output_visualizer. It can be run at submission-time by the judge program. It requires submission output and typically shows the result of running a submission (faulty or not) on a test case. It makes a lot of sense to put the result of this visualizer below data/valid_output**/<testcase>.<extension> and data/invalid_output**/<testcase>.<extension>, and a problem development framework can populate data/{sample, secret} with its images when there is no test case visualizer.

Arguments

The files <testcase>.yaml and test_group.yaml (by whatever names they will get) should allow the keys

test_case_visualizer_args?: *"" | string
output_visualizer_args?: *"" | string

This is particularly useful for flags like --disable, to selectively disable the visualizer for test cases that cannot be visualised easily.
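
For example, a test case that cannot be drawn could then ship a <testcase>.yaml along the lines of (key names as proposed above; the values are made up):

  test_case_visualizer_args: "--disable"
  output_visualizer_args: "--disable"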

@mpsijm commented Apr 11, 2025

Agreed with all of the above, thanks for the summary! 😄

Probably, your mentions of populating data/ with the image files are BAPCtools-specific details. In the spec, the input visualizer has no specified interface, and the images that are created by the output visualizer stay in the feedback directory. The fact that the bt generate command uses the two visualizers to generate images in data/ is then BAPCtools-specific 🙂

@thorehusfeldt commented:

… are BAPCtools-specific details.

Yes, my formulations

it can be used to populate

and

a problem development framework can populate

were sufficiently broad to allow a problem development framework (be it parts of BAPCtools or testdatatools or your mom’s bash scripts) to do what they want. In extremis, I am fine with not mentioning test case visualisers at all in the problem package format specification.
