class: center, middle, inverse, title-slide

# Lec03: Map Day!
## Stat41: Data Viz
### Prof Amanda Luby
### Swarthmore College

---
class: center, middle

# Today:

(1) Historical examples

(2) Map basics

(3) Critiques in your groups

(4) Time for Q's

(5) `ggplot2` examples

---
# John Snow's Cholera map

1854: *Cholera* outbreak killed 127 people in 3 days in a London neighborhood

--

At the time, people thought that cholera was an airborne disease (now, we know it's waterborne)

--

John Snow was a physician who was critical of the airborne theory

--

Let's think about what hospital data might have looked like:

--

```
## # A tibble: 3 x 6
##   date         last_name first_name address        age cause_of_death
##   <chr>        <chr>     <chr>      <chr>        <dbl> <chr>
## 1 Aug 31, 1854 Jones     Thomas     26 Broad St.    37 cholera
## 2 Aug 31, 1854 Jones     Mary       26 Broad St.    11 cholera
## 3 Sept 1, 1854 Warwick   Martin     14 Broad St.    23 cholera
```

(more in [MDSR](https://mdsr-book.github.io/mdsr2e/ch-spatial.html#motivation-whats-so-great-about-geospatial-data))

---



---
class: center, middle

The genius comes *not* from simply plotting the cases, but from *combining* two sets of data that share common features.

--

(1) Dataset 1: Cholera cases from hospital

--

(2) Dataset 2: London water pumps

--

(3) Common features: Latitude and Longitude

---
class: center

John Snow's map (and water pump) are now "famous" among epidemiologists and statisticians.

--



---
class:center

## W.E.B. Du Bois: *Visualizing Black America*



---

When I think about **Insightful** and **Enlightening** data visualizations, these are what come to mind.



---
class: center



---
class: inverse, center, middle

# Basics of Maps
---
# Important things to keep in mind:

--

(1) Coordinate system and projection onto Euclidean space

--

(2) How data is represented (e.g. `aesthetics`)

--

(3) Level of aggregation (e.g. state vs district vs county vs individuals)

---
class: center, middle

Geospatial data exists on the globe and is generally described with a *latitude* and *longitude*.

Any *projection* from the globe to Euclidean space (X-Y plane) is going to cause some distortion.



---
class: center, middle



---



---
# How data is represented

+ Choropleth

+ Proportional symbol

+ Dot density

---
# Choropleth (or Thematic maps)

Color or shading shows statistical data on previously-defined regions (e.g. states, countries), not on regions derived from the data.

--

* Best used for densities or rates, not totals

--

* Limit number of color classes; higher data values should be "more intense"

--

* May mislead: large but low-value areas may stand out more than small but high-value areas

--

* Mapping uncertainty is difficult

--

Example to keep in mind: election maps

--

You'll make choropleth maps in lab today!


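---
# Choropleth: a `ggplot2` sketch

A minimal sketch of the choropleth guidelines, not the lab solution. It assumes the `maps` and `mapproj` packages are installed and uses R's built-in `USArrests` data (murder arrests per 100,000 residents, so a rate rather than a total). The join on state name mirrors the idea of combining two datasets through a shared feature; the Albers projection parameters are one common choice for the lower 48 states, not the only one.

```r
library(ggplot2) # map_data() additionally needs the `maps` package
library(dplyr)

# Built-in data: arrests per 100,000 residents, one row per state
arrests <- USArrests %>%
  tibble::rownames_to_column("region") %>%
  mutate(region = tolower(region)) # map_data() uses lowercase state names

# Polygon outlines for the lower 48 states (plus DC)
states <- map_data("state")

# Attach the rate to every polygon vertex; left_join keeps the vertex order
state_rates <- left_join(states, arrests, by = "region")

choropleth <- ggplot(state_rates,
                     aes(x = long, y = lat, group = group, fill = Murder)) +
  geom_polygon(color = "white") +
  # Sequential palette: higher rates get more intense color
  scale_fill_distiller(palette = "Reds", direction = 1) +
  # Equal-area conic projection (rendering requires `mapproj`)
  coord_map("albers", lat0 = 29.5, lat1 = 45.5) +
  labs(title = "Murder arrests per 100,000 residents", fill = "Rate") +
  theme_void()
```

Printing `choropleth` draws the map. DC has no row in `USArrests`, so it is filled with the default `NA` color.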
---
# Proportional Symbol

Add regions defined by the data to an existing map using their lat/long values

--

* Better for totals

--

* Need to be **proportional** (be careful with circular areas)

--

* Pay attention to symbol placement (town/city vs center of state)

--

* Sometimes called a "bubble map"

--

Example to keep in mind: museums (example code at the end of these slides)

---
class:center, middle

<!-- -->

---
# Dot density maps

Add points representing individuals or groups of a fixed size (from the data) to an existing map using their lat/long values

--

* Can be very beautiful

--

* Need to be careful with point size (not so big that points cover each other up; not so small that you have a mostly-empty graph)

--

* Choosing what each point represents can mislead

--

Example to keep in mind: Evergreen forest dot plot (from the Cairo book)

---



---
class:center, middle



---
class: inverse, center, middle

# Group Time!

---

```r
library(tidyverse) # provides read_csv() and filter()
library(ggmap)

# Read museum data from GitHub
museums <- read_csv("https://raw.githubusercontent.com/mateyneykov/315_code_data/master/data/museums.csv")

# Define lat/long bounding box for the US map
us <- c(left = -125, bottom = 25.75, right = -67, top = 49)

# Open-source map based on the lat/long coordinates we just defined.
# Note (assumption about your setup): ggmap 4.0.0 and later serve these tiles
# through get_stadiamap(), which needs an API key.
map <- get_stamenmap(us, zoom = 4, maptype = "toner-lite")

# Filter museum data to natural history museums with nonzero income
nat_hist_museums <- filter(museums, Type == "NATURAL HISTORY MUSEUM", Income != 0)

ggmap(map) + # Plot the open-source map
  geom_point(aes(x = Longitude, y = Latitude, size = Income),
             data = nat_hist_museums, alpha = .5, col = "darkblue") + # Layer the points from the museum data
  labs(title = "Natural History Museums by Income") +
  theme_void() # Since we're using a pre-existing image of a map, we don't want any theme elements
```

<!-- -->
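---
# Keeping symbols proportional: a sketch

The "be careful with circular areas" point from the proportional symbol slide can be checked numerically. In this sketch the city coordinates are approximate and the `total` values are invented for illustration only. For a circle's area to be proportional to a value, its radius has to grow with the square root of that value; `scale_size_area()` in `ggplot2` applies that kind of scaling and maps a value of 0 to a point of size 0.

```r
library(ggplot2)

# Hypothetical totals, invented for illustration; coordinates are approximate
cities <- data.frame(
  name  = c("Philadelphia", "Chicago", "Los Angeles"),
  long  = c(-75.2, -87.6, -118.2),
  lat   = c(40.0, 41.9, 34.1),
  total = c(100, 400, 900)
)

# Radius that makes the circle's area equal to the value
cities$radius <- sqrt(cities$total / pi)
cities$area   <- pi * cities$radius^2 # recovers `total`

# A 9x larger total needs only a 3x larger radius
radius_ratio <- cities$radius[3] / cities$radius[1]

bubbles <- ggplot(cities, aes(x = long, y = lat, size = total)) +
  geom_point(alpha = 0.5, color = "darkblue") +
  scale_size_area(max_size = 20) + # point area proportional to `total`
  coord_quickmap() +               # quick approximation that keeps lat/long aspect sensible
  theme_void()
```

Scaling the radius directly by the value instead would make the largest city's circle 81 times the area of the smallest, overstating a 9-fold difference.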