class: center, middle, inverse, title-slide

# Lec01: Welcome!
## Stat41: Data Viz
### Prof Amanda Luby
### Swarthmore College
### 2021/01/04 (updated: 2021-01-04)

---
class: center, middle

# Today:

## 1. Overview of the class
## 2. Basics of Data Viz
## 3. Meet your group!

---
class: inverse, center, middle

# Overview of the class

---

# Syllabus

--

Preliminary/visual version and full "rules and regulations" version available [on the course website](https://aluby.domains.swarthmore.edu/stat041/syllabus.html).

--

Please read **both** in detail at some point this week

--

We'll only hit the highlights today

--

Please interrupt me with any questions!

---

# Three main pieces of this course:

--

**Lecture/Discussion:**

+ M-Th 11a-12p ET
+ synchronous (exceptions for extreme time zone differences)
+ attendance expected

--

**Labs:**

+ M-Th 1p-2p ET
+ asynchronous option
+ I'll be on Slack (and maybe Zoom)

--

**Projects:**

+ Weekly mini-projects
+ Final project

--

Everything will be posted on the [course website](https://aluby.domains.swarthmore.edu/stat041), except for a few readings, which will be posted on our private drive. You will turn things in via Moodle.

---

# Lecture/Discussion

It's *extremely* easy to fall behind in these intense class experiences. I think it's important to have regular meeting time together to check in, stay on track, and get help if needed.

.pull-left[
My goals for our lecture/discussion time:

1. Talk to other humans we don't live with
2. Synthesize readings
3. Share cool stuff with each other
4. Apply concepts + discuss in small groups
5. Ask questions about the lab
]

.pull-right[
Your responsibilities:

**Before "class"**: Short reading or video, prepare any Q's

**During "class"**: Ask Q's, share visualizations you've come across, keep track of discussion in Jamboards

**After "class"**: Do the lab, keep track of questions
]

---

# Labs

The best way to learn is by doing.
The daily labs will give you guided practice making graphs in R, and most will include an open-ended portion to challenge yourself as much or as little as you like.

--

If you want to build a large portfolio, the labs are a great place to do so!

--

While we'll have a lab each class day, they are not "due" until Sunday, so if you run out of steam it is perfectly OK to finish them later.

--

The labs will often involve replicating existing graphs or tweaking them in some way. My goal is that you all learn skills for making graphs *after* this class, which often involves searching for and reading others' work. This is a hard skill to learn, so we're going to practice it!

--

**A quick note on time commitment:** My expectation is that you will spend 3-4 hours on class days, plus 6-10 hours on projects per week, so somewhere in the 20-25 hours per week range. If you're spending much more or less time than that, please let me know.

---

# Projects

.pull-left[
### Weekly Mini-Projects

(1) Replicating a published graph

  + Improving it
  + Ruining it

(2) "Blog post"-style walk-through of fitting a model

(3) Explanation of a statistical concept from a previous class using interactivity
]

.pull-right[
### Final Project

Build data visualizations of a dataset of your choice! Will include *both* a collection of visualizations and a 10-ish page paper.

Each Sunday, you'll have a "milestone" due:

1. Find topic + data, making sure you can load it into R
2. EDA + introduction written
3. Rough Draft (minus interactivity)

More details to come on Thursday.

**Please reserve Thurs, Jan 28 11a-2:30p ET for presentations/demos**
]

---

# Contacting me

The best way to get your questions answered is to reach out to me via **Slack** (you should have received an invitation this morning):

+ R Help
+ Clarifications about labs or projects
+ Logistical questions
+ Setting up a meeting

--

If you have questions about the course material, there's a good chance others do too!

--

Sensitive/private topics, or anything you want a more formal record of, can still come to my email (it just might take me longer to respond)

---

### Other guidelines:

We're *still* in a pandemic

--

Almost all of you are learning from home

--

You are all in different places when it comes to bandwidth you can commit to this class

--

### **THAT'S OKAY**

--

We're going to normalize **not apologizing** for things that aren't our fault:

--

- Computer/Internet/Server problems

--

- Sick family members

--

- Helping kids with virtual school

--

Instead, we'll say *thank you for understanding*.

--

I will sometimes get over-excited about what we are doing in this course - **please don't mistake this excitement for pressure or expectation.**

---
class: center, middle

# Questions?

---
class: inverse, center, middle

# Basics of Data Viz

---

# Let's "look" at some data:

---

# This is called "Anscombe's Quartet":

All four datasets have the **exact same** fitted linear model (only the residuals differ).

Source: Tufte's *The Visual Display of Quantitative Information* (a Data Viz classic)

---

--

+ **Always graph your data before you calculate a statistic**
+ **Understand the impact of outliers on statistical properties**
+ **Computers are dumb!**

---

# More gems from Tufte

* induce the viewer to think about substance, not graphical methodology

--

* avoid **distorting** the data or letting **decoration** get in the way

--

* make large, complicated datasets more coherent

--

* encourage comparison of different pieces of data

--

* reveal data at several levels of detail

--

* describe, explore, tabulate, identify relationships

--

* be closely integrated with statistical/verbal descriptions

--

* use **consistent graph design**

--

**Avoid graphs that lead viewers to misleading conclusions**

---

# Example 1

<center>

---

# Example 2

Source: GA Department of Health

---

# Example 3

---

# Example 4

---

# Example 5

real graphic from a talk ([`crabs` dataset](https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/crabs.html) in `MASS` package)

<!-- -->

---
class: inverse, center, middle

# Meet your group!

---

# Your to-do list:

### 1. Introductions: name, year, favorite class (so far 😉)

### 2. Class expectations (first page on Jamboard)

### 3. Talk through the first 4 examples using the *Five Qualities of Great Visualizations* and Tufte's guidelines on your [jamboard](https://drive.google.com/drive/folders/10Qc2mvHq8SHw2JsXWAUU5a0u-9X0YAmZ?usp=sharing)

---

# Looking forward:

* [Lab 01](https://aluby.domains.swarthmore.edu/stat041/labs.html) this afternoon, due on Moodle on Sunday
    - Come back here at 1pm ET to work synchronously with your group
    - Ask Q's on Slack

--

* Tomorrow:
    - Historical data viz
    - Grammar of Graphics

--

* [Project 1 prompt](https://aluby.domains.swarthmore.edu/stat041/projects.html) on course website