xangregg/data

data

AmericanMoviesByYear.csv

Billion-Dollar Distasters.csv

Cape Town rainfall averages

crossword_times.csv

  • Four years of my own results for the New York Times daily crossword. Includes time spent and whether I successfully completed the puzzle.
  • I made some graphs in a Twitter thread
  • Latest graphs with all four years of data

Du Bois Georgia Map 1899

greenland_surface_melt.csv

internet_speeds_summer_2018.csv

journal.pone.0216362.s003.csv

  • Data from Battle for the thermostat: Gender and the effect of temperature on cognitive performance
  • Converted from Stata DTA format to CSV
  • Added a few derived fields
    • Added rounded version of temp field, so "19.1000003814697" is "19.1". I'm guessing the Stata file was compressed to use single precision floating point numbers.
    • Computed wordscore for the word problem, since the participants were rewarded by total score rather than number complete.
    • Added Gender=F|M from Male=0|1
    • Added "Temp Category" to combine indicator variables: cool, normal, warm, hot
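
The derived fields above can be sketched roughly as follows. This is not the script used to build the CSV: the column names (`temp`, `Male`, `cool`, `normal`, `warm`, `hot`) are assumptions for illustration, and the wordscore field is left out because its scoring rule isn't given here. The first few lines also show why a single precision value prints as "19.1000003814697".

```python
import struct

def to_float32(x: float) -> float:
    """Round-trip a Python float through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

def derive_fields(row: dict) -> dict:
    """Add derived fields to one record. Column names are hypothetical."""
    out = dict(row)
    # Rounded version of the temperature, e.g. 19.1000003814697 -> 19.1
    out["temp_rounded"] = round(row["temp"], 1)
    # Gender=F|M from Male=0|1
    out["Gender"] = "M" if row["Male"] == 1 else "F"
    # Collapse the indicator variables into one category (first one set wins)
    for cat in ("cool", "normal", "warm", "hot"):
        if row.get(cat) == 1:
            out["Temp Category"] = cat
            break
    return out

# 19.1 stored as single precision picks up the trailing digits seen in the file
stored = to_float32(19.1)
row = derive_fields(
    {"temp": stored, "Male": 0, "cool": 1, "normal": 0, "warm": 0, "hot": 0}
)
```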

MosquitoTrendsData.csv

pubdelays100.csv

scented candle reviews.csv

  • Selected features from 50,000 scented candle reviews across 20+ products
  • Features: 0 or 1 indicating whether review title or text included selected terms
    • no smell narrow: no smell|scent
    • no smell wide: no|zero|0|nothing smell|scent|fragrance|aroma
    • bad smell: bad|awful|terrible smell|scent|fragrance|aroma
    • broken: broken|shattered
    • overwhelm: overwhelm|overpower
    • artificial: artificial|synthetic
  • Related tweeted charts
  • Note: Amazon merges reviews for some candles.
    • Yankee Candle Balsam & Cedar includes White Christmas
    • Yankee Candle Vanilla Cupcake includes Apple Pumpkin
    • Yankee Candle Pineapple Cilantro includes Alfresco Afternoon
    • Yankee Candle Sage & Citrus includes Honey Clementine
    • Yankee Candle Pink Sands includes Calm & Quiet Place
    • Chesapeake Bay Balance+Harmony includes Strength+Energy
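
For reference, the 0/1 term features listed above could be computed along these lines. The regular expressions are my reading of the term lists (a qualifier word followed directly by a scent word, case-insensitive, matched as stems), not necessarily the exact patterns used to build the file, and `flag_review` is a hypothetical helper name.

```python
import re

# Scent words shared by the "wide" and "bad smell" features
TARGET = r"(smell|scent|fragrance|aroma)"

# Assumed patterns: qualifier word, one space, then a scent word
PATTERNS = {
    "no smell narrow": r"\bno (smell|scent)",
    "no smell wide": r"\b(no|zero|0|nothing) " + TARGET,
    "bad smell": r"\b(bad|awful|terrible) " + TARGET,
    "broken": r"broken|shattered",
    "overwhelm": r"overwhelm|overpower",
    "artificial": r"artificial|synthetic",
}

def flag_review(title: str, text: str) -> dict:
    """Return a 0/1 flag per feature for one review's title and text."""
    combined = f"{title} {text}".lower()
    return {
        name: int(re.search(pattern, combined) is not None)
        for name, pattern in PATTERNS.items()
    }

# Made-up review used only to exercise the patterns
flags = flag_review("No scent at all", "Arrived shattered and the wax looks synthetic.")
```

Requiring the qualifier to sit directly before the scent word keeps the patterns simple but misses phrasings such as "no real scent".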

shirt_sponsorship.csv

  • Inferred data on Premier League shirt sponsorship over time from this FT article. Use for experimenting with alternates to streams.

travel_motivation_recoded_stacked.csv

  • Recoded subset of survey responses from /r/travel.
  • Includes a few demographic fields and responses to the question "What motivates you to travel?"
  • This was a free-form text field. I recoded the responses into common terms and expanded multiple responses into multiple rows (with same respondent id).
  • For instance, "new cultures" could have been originally "different cultures", "see new cultures", "other lifestyles", ...
  • Surely, some bias or error was introduced during the recoding process.
  • I made a slopegraph of the responses by gender.
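
The recode-and-stack step can be illustrated with a small sketch. The mapping entries come from the "new cultures" example above. Splitting on commas is an assumption made for the example, and `stack_responses` is a hypothetical name; the real free-form answers likely needed more manual judgment than a lookup table.

```python
# Original phrasing -> common term (entries from the example above)
RECODE = {
    "different cultures": "new cultures",
    "see new cultures": "new cultures",
    "other lifestyles": "new cultures",
}

def stack_responses(responses):
    """Expand (respondent_id, free_text) pairs into one row per recoded term."""
    rows = []
    for respondent_id, answer in responses:
        # Assumption: multiple motivations are separated by commas
        for part in answer.split(","):
            term = part.strip().lower()
            if term:
                # Unmapped terms pass through unchanged
                rows.append((respondent_id, RECODE.get(term, term)))
    return rows

# Made-up responses, for illustration only
stacked = stack_responses([(1, "See new cultures, food"), (2, "other lifestyles")])
```

Respondent 1 ends up with two rows sharing the same id, which is the stacked layout the file name refers to.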

tukey_EDA_7.8_coal.csv

  • Coal production data from John Tukey's 1977 book Exploratory Data Analysis, for time series smoothing
  • Includes two other overlapping sources: a coal census (through 1956) and EIA data (from 1949 onward).
  • The "combined" column uses data from the other sources for years where Tukey's table doesn't match his graphs

vaccine_safety_expressing, vaccine_safety_tcell_secretions

  • Data shown in Figure 2 of Covid-19 vaccine safety experiment paper: https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)31208-3/fulltext
  • Scraped from PDF after converting to SVG, so should be high resolution and complete, but...
  • For the secretions scatter plots, each combination should have 36 dots, but the Day 0 groups are missing a few. Most values for those groups are clustered at the bottom of the scale so possibly they are clipped or omitted for space. Or maybe I made a mistake.
  • The pies show 7 of the 8 possible combinations of expressions. Not sure if the all negative category is omitted because it's empty or because it's uninteresting.
  • Proportions multiplied by 36 do not generally match integers. Seems like they should.
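
The integer check in the last bullet can be sketched as below. The group size of 36 comes from the notes above; the tolerance and the sample proportions are made up for illustration, and `implied_count` is a hypothetical helper.

```python
def implied_count(proportion: float, n: int = 36, tol: float = 0.05):
    """Return the nearest whole count for a proportion of n, and whether it is close."""
    scaled = proportion * n
    nearest = round(scaled)
    return nearest, abs(scaled - nearest) <= tol

# A proportion that corresponds to a whole number of participants
clean = implied_count(0.25)   # 0.25 * 36 = 9.0
# A proportion that does not (9.36 is far from any integer)
suspect = implied_count(0.26)
```

Running each scraped pie proportion through a check like this gives a quick list of the slices that do not correspond to a whole number of participants.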

vitamin d survival.csv.csv

wiseparatext.csv

World rail line colors
