rubyy-y/BachelorThesis

Highlighting Differences in Data Visualizations

"This project aims to identify and annotate differences between two data visualizations created using the Vega-Lite grammar. Vega-Lite uses a JSON syntax to define mappings from data to properties of graphical marks. Given two Vega-Lite specifications, the project will compare the JSON and resulting visualizations and highlight any differences by annotating them within the visualization itself, making it easy to spot and understand the changes. [...]"

Research Question: How can updates to data or specifications of visualizations be explained?

Implementation: Web Application using React, JavaScript, Python, HTML and CSS.

The (Altair) visualization offers an option called "View Source". This source code will later be used to compare two visualizations.
Note to self: create a file documenting the exploration of the JSON files and any special characteristics, to include in the thesis later.

Project Must-Haves

  • Data randomization Function
    • Input: JSON file, probability of change
    • Output: same JSON, some datapoints different (added, removed, skewed, recolored)
    • Saved as [dataset name][probability of change].json
      • example: cars30.json is the cars dataset where every datapoint has a 30% probability of being altered (with the Modification Function).
  • Datapoint Modification Function
    • Input: JSON datapoint
    • Output: one of the following, each chosen with probability 1/3:
      • add: the same datapoint and a second, random datapoint
      • remove: None
      • skew: a datapoint with similar values (maybe based on the density of all datapoints)
  • JSON Comparison Function
    • Input: two compiled Vega JSON files
    • Output:
      • general statistics: how many datapoints in each dataset, amount of difference (in %)
      • datapoints missing
      • datapoints added
      • datapoints differently colored (= Label changed)
  • Usable Web Application
    • Showing Dataset
    • Showing randomized Version of Dataset
    • Third visualization that clearly highlights the differences
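
The Data Randomization Function and the Datapoint Modification Function listed above could be sketched roughly as follows. This is only a minimal sketch under several assumptions that are not part of the project description: the dataset is a JSON array of flat records, "skew" scales every numeric field by up to 10%, and the "random datapoint" for "add" is built by drawing each field from a randomly chosen existing record. Recoloring is left out because the Modification Function lists only three actions. All function and parameter names are hypothetical.

```python
import random


def skew_datapoint(point, rng, max_rel_shift=0.1):
    """Return a copy of `point` with numeric fields scaled by up to +/- max_rel_shift."""
    skewed = dict(point)
    for key, value in point.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            skewed[key] = value * (1 + rng.uniform(-max_rel_shift, max_rel_shift))
    return skewed


def random_datapoint(template, data, rng):
    """Assumption: build a random datapoint by drawing each field from a random record."""
    return {key: rng.choice(data).get(key, value) for key, value in template.items()}


def modify_datapoint(point, data, rng):
    """Return the datapoints that replace `point` (an empty list means removed)."""
    action = rng.choice(["add", "remove", "skew"])  # each with probability 1/3
    if action == "add":
        return [point, random_datapoint(point, data, rng)]
    if action == "remove":
        return []
    return [skew_datapoint(point, rng)]


def randomize_dataset(data, probability, rng=None):
    """Alter each datapoint with the given probability (0.0 to 1.0)."""
    if rng is None:
        rng = random.Random()
    result = []
    for point in data:
        if rng.random() < probability:
            result.extend(modify_datapoint(point, data, rng))
        else:
            result.append(point)
    return result


def output_filename(dataset_name, probability):
    """Follow the [dataset name][probability of change].json naming scheme."""
    return f"{dataset_name}{round(probability * 100)}.json"
```

With these helpers, reading a file with `json.load`, calling `randomize_dataset(data, 0.3)`, and writing the result to `output_filename("cars", 0.3)` would produce `cars30.json`. Passing a seeded `random.Random` makes a run reproducible, which may help when comparing visualizations later.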

Project Nice-To-Haves

  • User chooses the dataset
  • User chooses the randomization
  • User decides if color counts as a difference -> Boolean value in input
  • In comparison function:
    • Recognize datapoints that might have been shifted, based on some threshold (hyperparameter) of distance between a deleted and an added datapoint.
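
One possible shape for the JSON Comparison Function, including the optional color check and the shift threshold, is sketched below. It assumes the datapoints have already been extracted from the two compiled Vega files into lists of records, that a datapoint is identified by its position fields (for example x and y), and that "amount of difference" means the share of datapoints without an unchanged counterpart. None of these choices comes from the project description, and all names are hypothetical.

```python
import math


def position(point, fields):
    """Position of a datapoint as a tuple of the chosen field values."""
    return tuple(point[f] for f in fields)


def compare_datasets(old, new, fields, color_field=None, shift_threshold=None):
    """Compare two lists of datapoints and report missing, added, recolored, shifted."""
    # Index the new datapoints by position; lists handle duplicate positions.
    unmatched_new = {}
    for point in new:
        unmatched_new.setdefault(position(point, fields), []).append(point)

    missing, recolored, unchanged = [], [], 0
    for point in old:
        candidates = unmatched_new.get(position(point, fields))
        if not candidates:
            missing.append(point)
            continue
        match = candidates.pop()
        # Color only counts as a difference when a color field is given.
        if color_field is not None and point.get(color_field) != match.get(color_field):
            recolored.append((point, match))
        else:
            unchanged += 1
    added = [p for points in unmatched_new.values() for p in points]

    # Optional: greedily pair each missing point with its nearest added point
    # and treat the pair as one shifted datapoint if they are close enough.
    shifted = []
    if shift_threshold is not None:
        for point in list(missing):
            if not added:
                break
            here = position(point, fields)
            nearest = min(added, key=lambda c: math.dist(here, position(c, fields)))
            if math.dist(here, position(nearest, fields)) <= shift_threshold:
                shifted.append((point, nearest))
                missing.remove(point)
                added.remove(nearest)

    total = max(len(old), len(new))
    return {
        "count_old": len(old),
        "count_new": len(new),
        "difference_pct": 100 * (1 - unchanged / total) if total else 0.0,
        "missing": missing,
        "added": added,
        "recolored": recolored,
        "shifted": shifted,
    }
```

Leaving `color_field` as `None` corresponds to the Boolean "color does not count as a difference", and leaving `shift_threshold` as `None` disables shift detection. The greedy nearest-neighbour pairing is a simple choice for a sketch; it needs numeric position fields and does not guarantee an optimal overall matching.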
