class: center, middle, inverse, title-slide

# Data Visualization in R
## Tidying your data and model into a nice plot
### Tiago Ventura
### University of Maryland

---
class: center, middle

# My approach

**Always better to present your results with a graph**

And keep all (or most of) your models and descriptive tables in the appendix.

---

# Main Goals for Today

By the end of the workshop, I expect you to know how to:

- Prepare your data for easy visualization.
- Run and extract information from statistical models.
- Use ggplot2 to build an informative visualization.

---

## How to get there?

- Basics of ggplot
- Tidy Data with tidyr
- Broom for Model Quantities
- Case studies (from my own work)

---
class:inverse, center, middle

# Basics of ggplot2

---

# How ggplot works

- Data visualization involves connecting (mapping) variables from your data to graphical representations.

- ggplot2 provides you with a language to map data to a plot.

- ggplot2 works by connecting data and visual components through a function called __aesthetic mapping__ (`aes`).

- Every graph is built layer by layer, starting with your: 1) data, 2) aesthetic mappings, 3) geometric decisions, and then 4) embellishment of the plot.

---

# Summary

.pull-left[
<img src="ggplot_flow1.png" width="70%" />
]

.pull-right[
<img src="ggplot_flow2.png" width="62%" />
]

---
class:inverse, center, middle

# Tidy your data

---

# Starting some coding

Let's first call our packages. I am using the package [pacman](http://trinker.github.io/pacman/vignettes/Introduction_to_pacman.html) to help me manage my libraries.
```r
pacman::p_load(tidyverse, gapminder, kableExtra, tidyr, ggthemes, patchwork, broom)
```

This is a tidy dataset:

```r
knitr::kable(head(gapminder), format = 'html')
```

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> country </th>
   <th style="text-align:left;"> continent </th>
   <th style="text-align:right;"> year </th>
   <th style="text-align:right;"> lifeExp </th>
   <th style="text-align:right;"> pop </th>
   <th style="text-align:right;"> gdpPercap </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Afghanistan </td>
   <td style="text-align:left;"> Asia </td>
   <td style="text-align:right;"> 1952 </td>
   <td style="text-align:right;"> 28.801 </td>
   <td style="text-align:right;"> 8425333 </td>
   <td style="text-align:right;"> 779.4453 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Afghanistan </td>
   <td style="text-align:left;"> Asia </td>
   <td style="text-align:right;"> 1957 </td>
   <td style="text-align:right;"> 30.332 </td>
   <td style="text-align:right;"> 9240934 </td>
   <td style="text-align:right;"> 820.8530 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Afghanistan </td>
   <td style="text-align:left;"> Asia </td>
   <td style="text-align:right;"> 1962 </td>
   <td style="text-align:right;"> 31.997 </td>
   <td style="text-align:right;"> 10267083 </td>
   <td style="text-align:right;"> 853.1007 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Afghanistan </td>
   <td style="text-align:left;"> Asia </td>
   <td style="text-align:right;"> 1967 </td>
   <td style="text-align:right;"> 34.020 </td>
   <td style="text-align:right;"> 11537966 </td>
   <td style="text-align:right;"> 836.1971 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Afghanistan </td>
   <td style="text-align:left;"> Asia </td>
   <td style="text-align:right;"> 1972 </td>
   <td style="text-align:right;"> 36.088 </td>
   <td style="text-align:right;"> 13079460 </td>
   <td style="text-align:right;"> 739.9811 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Afghanistan </td>
   <td style="text-align:left;"> Asia </td>
   <td style="text-align:right;"> 1977
</td>
   <td style="text-align:right;"> 38.438 </td>
   <td style="text-align:right;"> 14880372 </td>
   <td style="text-align:right;"> 786.1134 </td>
  </tr>
</tbody>
</table>

---

# Tidy Data

There are three interrelated rules which make a [dataset tidy](https://r4ds.had.co.nz/tidy-data.html).

- Each variable must have its own column.

- Each observation must have its own row.

- Each value must have its own cell.

<img src="tidy.png" width="100%" />

---

# Getting our own data

```r
data <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vQ56fySJKLL18Lipu1_i3ID9JE06voJEz2EXm6JW4Vh11zmndyTwejMavuNntzIWLY0RyhA1UsVEen0/pub?gid=0&single=true&output=csv")
```

#### Is this data tidy? (You can tell just by checking the column names.)

```
##  [1] "state"                  "pollster"               "sponsor"
##  [4] "start.date"             "end.date"               "entry.date.time..et."
##  [7] "number.of.observations" "population"             "mode"
## [10] "biden"                  "trump"                  "biden_margin"
## [13] "other"                  "undecided"              "url"
## [16] "include"                "notes"
```

---

# Pivoting to make the data tidy (Pivot Longer)

```r
data_long <- data %>%
* pivot_longer(cols=c(biden,trump),
*              names_to="Vote_Choice",
*              values_to="Vote") %>%
  mutate(end.date=as.Date(str_replace_all(end.date, "/", "-"), format = "%m-%d-%Y"))
```

We have three inputs in `pivot_longer`:

- `cols`: the variables you want to convert from wide to long.

- `names_to`: the name of the new variable that holds the column names from the wide data.

- `values_to`: the name of the new variable that holds the values from the wide data.

---

## And now:

```r
data_long %>%
  select(Vote_Choice, Vote) %>%
  slice(1:5)
```

```
## # A tibble: 5 x 2
##   Vote_Choice  Vote
##   <chr>       <dbl>
## 1 biden          45
## 2 trump          50
## 3 biden          52
## 4 trump          40
## 5 biden          47
```

<br>
<br>

### .center[With this tidy dataset, we can start our visualizations.]

---
class:inverse, center, middle

# Basics of ggplot

---

## Linking the Data to Visuals (Mapping).

Mapping is how you connect your data and variables with the visual representations of a graph. We will do this in three steps.
--

- **The Data Step**: Tell ggplot what your data is.

--

- **The Mapping Step:** Tell ggplot **what** variables -> visuals you want to see.

--

- **The Geom Step:** Tell ggplot **how** you want to see them.

--

---

# An Abstract Example of ggplot

```r
knitr::include_graphics("gg-syntax.png")
```

<img src="gg-syntax.png" width="579" />

---

# Polls Over Time (Ugly but with the basics)

```r
ggplot(data=national_polls,          # the data step
       aes(x=end.date, y=Vote)) +    # the map step
  geom_point()                       # the geom step
```

<img src="slides_files/figure-html/unnamed-chunk-11-1.png" width="60%" />

---

### Which aesthetics can I use?

![](aes.png)

---

# Aesthetics: Color

```r
ggplot(data=national_polls,          # the data step
       aes(x=end.date, y=Vote,
           color=Vote_Choice)) +     # the map step
  geom_point()
```

<img src="slides_files/figure-html/unnamed-chunk-12-1.png" width="60%" />

---

# Aesthetics: Shape

```r
ggplot(data=national_polls,          # the data step
       aes(x=end.date, y=Vote,
           color=Vote_Choice,
           shape=Vote_Choice)) +     # the map step
  geom_point()
```

<img src="slides_files/figure-html/unnamed-chunk-13-1.png" width="60%" />

---

# Aesthetics: Alpha

```r
ggplot(data=national_polls,          # the data step
       aes(x=end.date, y=Vote,
           color=Vote_Choice,
           shape=Vote_Choice,
           alpha=end.date)) +        # the map step
  geom_point()
```

<img src="slides_files/figure-html/unnamed-chunk-14-1.png" width="60%" />

---

# Aesthetics: Linetype

```r
ggplot(data=national_polls,          # the data step
       aes(x=end.date, y=Vote)) +    # the map step
  geom_smooth(aes(linetype=Vote_Choice)) +
  geom_point(aes(color=Vote_Choice), alpha=.2)
```

<img src="slides_files/figure-html/unnamed-chunk-15-1.png" width="60%" />

---

# Some notes

- You can use multiple aesthetics together.

- One variable for each aesthetic (that's why your data should be tidy).

- Outside of `aes()`, an aesthetic takes a single fixed value rather than a variable that links the data to the geoms.
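---

# Mapping vs. setting an aesthetic

The last note above can be sketched with a small example. This is only an illustration under stated assumptions: it uses the built-in `mtcars` data rather than the poll data, and the object names `p_mapped` and `p_fixed` are made up for this slide.

```r
library(ggplot2)

# Mapping: colour varies with a variable, so it goes inside aes().
# cyl is wrapped in factor() so that it is treated as a discrete variable.
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

# Setting: colour is one fixed value for every point, so it goes outside aes().
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "steelblue", alpha = 0.5)
```

Printing `p_mapped` should give one colour per cylinder group plus a legend, while `p_fixed` draws every point in the same colour and needs no legend.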
---
class:inverse, center, middle

# Geoms

---

# Smoothing the data

```r
ggplot(data=national_polls,
       aes(x=end.date, y=Vote,
           color=Vote_Choice, fill=Vote_Choice)) +
  geom_point(alpha=.2) +
  geom_smooth()
```

<img src="slides_files/figure-html/unnamed-chunk-16-1.png" width="60%" />

---

# Density

```r
ggplot(data=national_polls,
       aes(x=end.date)) +
  geom_density(fill="steelblue")
```

<img src="slides_files/figure-html/unnamed-chunk-17-1.png" width="60%" />

---

# Nice Trick: facet wrap

```r
ggplot(data=state_polls %>% filter(state%in%swing_states)) +
  geom_density(aes(x=end.date, fill=state), alpha=.3) +
  facet_wrap(~state, ncol=3)
```

<img src="slides_files/figure-html/unnamed-chunk-18-1.png" width="60%" />

---

# Bars

```r
ggplot(data=data_long,
       aes(x=end.date, fill=mode)) +
  geom_bar() +
  scale_fill_brewer(palette = "Set3")
```

<img src="slides_files/figure-html/unnamed-chunk-19-1.png" width="60%" />

---

# Box Plot

```r
# Another example
ggplot(state_polls %>% filter(state%in%swing_states),
       aes(x=Vote, y=fct_rev(state), fill=Vote_Choice)) +
  geom_boxplot() +
  scale_fill_manual(values=c("biden"="blue","trump"="red"))
```

<img src="slides_files/figure-html/unnamed-chunk-20-1.png" width="60%" />

---
class:inverse, center, middle

# And many, many, many more options.

### See [here](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf)

---

### Exercise: What do you see?

<img src="res_no_band.png" width="100%" />

---

## Part 4: Adjust scales, labels, titles, and more.

After you are set on the mapping and geoms, the next step is to adjust the scales of your graph. These functions usually take the form `scale_aesthetic_type`.

- `scale_x_log10`: To convert the numeric axis to the log scale
- `scale_y_reverse`: To reverse the scale
- `scale_fill_manual`: To create your own discrete set of fill colours.
- `scale_colour_brewer()`: To change the colour palette

<br>

*After you understand the first three steps, the last one is all about googling.*

---

# Putting it all together

```r
p <- ggplot(data=national_polls,
            aes(x=end.date, y=Vote,
                color=Vote_Choice, shape=Vote_Choice, fill=Vote_Choice)) +
  geom_point(alpha=.2) +
  geom_smooth() +
  scale_shape_manual(values =c(21, 23)) +
  scale_fill_manual(values=c("red", "blue")) +
  scale_color_manual(values=c("red", "blue"),
                     labels=c("Biden", "Trump"),
                     name= "Vote Choice") +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %d") +
  # "none" hides a legend; passing FALSE here is deprecated in recent ggplot2
  guides(fill="none", shape="none") +
  labs(x = "End of the Poll",
       y = "Results",
       title = "Polls US Presidential Election",
       subtitle = "",
       caption = "Source: The Economist")
```

---
class: center, middle

![](slides_files/figure-html/unnamed-chunk-23-1.png)<!-- -->

---

# Part 5: Embellishment as a consistent workflow

- Most of the adjustments you can make to your plot go inside the `theme` function.

- When working on a paper, you should be consistent across your graphs.

- Create your own theme, and apply it to all your plots.
---

# Build Your Theme

```r
# Set up my theme ------------------------------------------------------------
my_font <- "Palatino Linotype"
my_bkgd <- "#f5f5f2"
pal <- RColorBrewer::brewer.pal(9, "Spectral")

my_theme <- theme(text = element_text(family = my_font, color = "#22211d"),
                  rect = element_rect(fill = my_bkgd),
                  plot.background = element_rect(fill = my_bkgd, color = NA),
                  panel.background = element_rect(fill = my_bkgd, color = NA),
                  panel.border = element_rect(color="black"),
                  strip.background = element_rect(color="black", fill="gray85"),
                  legend.background = element_rect(fill = my_bkgd, color = NA),
                  legend.key = element_rect(size = 6, fill = "white", colour = NA),
                  legend.key.size = unit(1, "cm"),
                  legend.text = element_text(size = 10, family = my_font),
                  legend.title = element_text(size=10),
                  plot.title = element_text(size = 22, face = "bold", family=my_font),
                  plot.subtitle = element_text(size=16, family=my_font),
                  axis.title= element_text(size=14),
                  axis.text = element_text(size=8, family=my_font),
                  axis.title.x = element_text(hjust=1),
                  strip.text = element_text(family = my_font, color = "#22211d",
                                            size = 10, face="italic"))
```

```r
# This sets up for all your plots
*theme_set(theme_bw() + my_theme)
```

---

## Or by hand for each plot

.pull-left[

```r
p +
  theme_bw() +
  my_theme
```
]

--

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-28-1.png" width="100%" />
]

--

---

# Pre-built themes

ggplot2 itself and several other R packages ship with pre-built themes.
Some examples from `ggplot2`, `ggthemes`, and `hrbrthemes`:

- theme_minimal()
- theme_economist()
- theme_fivethirtyeight()
- theme_ipsum()

---

.pull-left[

```r
p +
  theme_minimal(base_size=12)
```

![](slides_files/figure-html/unnamed-chunk-29-1.png)<!-- -->
]

.pull-right[

```r
p +
  theme_fivethirtyeight()
```

![](slides_files/figure-html/unnamed-chunk-30-1.png)<!-- -->
]

---
class:inverse, center, middle

# Modelling with Broom and Purrr

---

# Broom

- We have discussed how to go from your raw data to informative visualizations.

- From this section forward, we will use the same logic to go from your statistical model outputs to plots.

- We will use David Robinson’s `broom` package to help us out, and the tidyverse package `purrr` to run the same thing multiple times (without loops).

---

### A Simple Model

```r
# Separate the data
biden <- national_polls %>%
  filter(Vote_Choice=="biden") %>%
  mutate(first_day=min(end.date, na.rm=TRUE),
         days=as.numeric(end.date-first_day))

# simple linear model
lm_time <- lm(Vote~ days, data=biden)
summary(lm_time)
```

```
## 
## Call:
## lm(formula = Vote ~ days, data = biden)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.2236 -1.7612  0.0033  1.7480  7.5289 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 46.671697   0.317704   146.9   <2e-16 ***
## days         0.016358   0.001558    10.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.706 on 493 degrees of freedom
## Multiple R-squared:  0.1828,	Adjusted R-squared:  0.1811 
## F-statistic: 110.2 on 1 and 493 DF,  p-value: < 2.2e-16
```

---

# Extract Quantities with Broom

- `tidy`: to extract the main model parameters (one row per coefficient).

- `augment`: to extract observation-level statistics (predictions).

- `glance`: to extract model-level statistics.
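---

# The three verbs on a toy model

Before applying these to the polls, here is a minimal sketch of the three functions on a model anyone can run. It assumes the built-in `mtcars` data and a made-up object name `fit`; it is not the model used in the rest of the workshop.

```r
library(broom)

# A small linear model on a built-in dataset (illustration only)
fit <- lm(mpg ~ wt, data = mtcars)

coefs <- tidy(fit)     # one row per coefficient
obs   <- augment(fit)  # one row per observation, with .fitted and .resid
stats <- glance(fit)   # a single row of model-level statistics
```

Each result is a tibble, so it can be piped straight into `dplyr` verbs or `ggplot()`.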
---

## Tidy: Extract Quantities

```r
# a data frame
results <- tidy(lm_time)
results
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)  46.7      0.318       147.  0       
## 2 days          0.0164   0.00156      10.5 2.06e-23
```

---

# Augment: Predicted Values

```r
augment(lm_time)
```

```
## # A tibble: 495 x 8
##     Vote  days .fitted .resid .std.resid    .hat .sigma   .cooksd
##    <dbl> <dbl>   <dbl>  <dbl>      <dbl>   <dbl>  <dbl>     <dbl>
##  1    52   305    51.7  0.339      0.126 0.00653   2.71 0.0000519
##  2    52   304    51.6  0.355      0.132 0.00645   2.71 0.0000564
##  3    51   304    51.6 -0.645     -0.239 0.00645   2.71 0.000185 
##  4    52   306    51.7  0.323      0.120 0.00661   2.71 0.0000476
##  5    48   304    51.6 -3.64      -1.35  0.00645   2.70 0.00593  
##  6    50   305    51.7 -1.66      -0.616 0.00653   2.71 0.00125  
##  7    53   305    51.7  1.34       0.497 0.00653   2.71 0.000810 
##  8    53   306    51.7  1.32       0.491 0.00661   2.71 0.000800 
##  9    53   305    51.7  1.34       0.497 0.00653   2.71 0.000810 
## 10    50   306    51.7 -1.68      -0.622 0.00661   2.71 0.00129  
## # … with 485 more rows
```

---

# Glance: model-level statistics.

```r
glance(lm_time)
```

```
## # A tibble: 1 x 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.183         0.181  2.71      110. 2.06e-23     1 -1194. 2394. 2407.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

### Why is this so cool?

- Tidyverse approach
- It can be combined with a whole set of packages from tidyverse.
- Returns a clean tibble.

---

# Example: Plot The Predicted Values

```r
# Plot
augment(lm_time, se_fit = TRUE) %>%
  mutate(lb=.fitted - 1.96*.se.fit,
         ub=.fitted + 1.96*.se.fit) %>%
  ggplot(data=.) +
  geom_ribbon(aes(y=.fitted, ymin=lb, ymax=ub, x=days), alpha=.2) +
  geom_line(aes(y=.fitted, x=days), color="blue") +
  geom_point(aes(y = Vote, x=days), alpha=.2)
```

<img src="slides_files/figure-html/unnamed-chunk-35-1.png" width="50%" />

---

## Running Multiple Models

What if I want to run the same model for multiple subgroups?
Or multiple different models?

Use `purrr` for functional programming. This is where R and the tidyverse get really beautiful.

The logic is simple. We will nest our data, run models in the subgroups, tidy the results, and unnest everything into a tidy dataset.

---

## Nest the Data

```r
# Step 1: Nest your data
nested_data <- state_polls %>%
  filter(Vote_Choice=="biden") %>%
  mutate(first_day=min(end.date,na.rm=TRUE),
         days=as.numeric(end.date-first_day)) %>%
* group_by(state) %>%
* nest()
```

---

## Nest the Data

.pull-left[

```r
nested_data
```

```
## # A tibble: 47 x 2
## # Groups:   state [47]
##    state data                    
##    <chr> <list>                  
##  1 MT    <tibble[,18] [18 × 18]> 
##  2 ME    <tibble[,18] [19 × 18]> 
##  3 IA    <tibble[,18] [32 × 18]> 
##  4 WI    <tibble[,18] [95 × 18]> 
##  5 PA    <tibble[,18] [112 × 18]>
##  6 NC    <tibble[,18] [104 × 18]>
##  7 MI    <tibble[,18] [112 × 18]>
##  8 FL    <tibble[,18] [105 × 18]>
##  9 AZ    <tibble[,18] [88 × 18]> 
## 10 MN    <tibble[,18] [34 × 18]> 
## # … with 37 more rows
```
]

.pull-right[

- The data column is called a [list-column](https://jennybc.github.io/purrr-tutorial/ls13_list-columns.html) because it works as a list in which every element holds an entire dataset.

- With a list of datasets, we can use functional programming in `purrr` to run the same models for each dataset.
]

---

## Run the Models

```r
nested_data <- nested_data %>%
* mutate(model=map(data, ~ lm(Vote~days, .x)))

nested_data
```

```
## # A tibble: 47 x 3
## # Groups:   state [47]
##    state data                     model 
##    <chr> <list>                   <list>
##  1 MT    <tibble[,18] [18 × 18]>  <lm>  
##  2 ME    <tibble[,18] [19 × 18]>  <lm>  
##  3 IA    <tibble[,18] [32 × 18]>  <lm>  
##  4 WI    <tibble[,18] [95 × 18]>  <lm>  
##  5 PA    <tibble[,18] [112 × 18]> <lm>  
##  6 NC    <tibble[,18] [104 × 18]> <lm>  
##  7 MI    <tibble[,18] [112 × 18]> <lm>  
##  8 FL    <tibble[,18] [105 × 18]> <lm>  
##  9 AZ    <tibble[,18] [88 × 18]>  <lm>  
## 10 MN    <tibble[,18] [34 × 18]>  <lm>  
## # … with 37 more rows
```

---

## Unnest (All back to normal)

```r
nested_data <- nested_data %>%
  mutate(results=map(model, tidy)) %>%
* unnest(results)

nested_data
```

```
## # A tibble: 94 x 8
## # Groups:   state [47]
##    state data             model  term     estimate std.error statistic   p.value
##    <chr> <list>           <list> <chr>       <dbl>     <dbl>     <dbl>     <dbl>
##  1 MT    <tibble[,18] [1… <lm>   (Interc… 35.5       1.58       22.5   1.59e- 13
##  2 MT    <tibble[,18] [1… <lm>   days      0.0369    0.00725     5.09  1.09e-  4
##  3 ME    <tibble[,18] [1… <lm>   (Interc… 50.6       2.73       18.5   1.06e- 12
##  4 ME    <tibble[,18] [1… <lm>   days      0.00517   0.0127      0.407 6.89e-  1
##  5 IA    <tibble[,18] [3… <lm>   (Interc… 42.6       1.53       27.8   5.68e- 23
##  6 IA    <tibble[,18] [3… <lm>   days      0.0172    0.00690     2.50  1.81e-  2
##  7 WI    <tibble[,18] [9… <lm>   (Interc… 44.8       0.603      74.2   6.76e- 84
##  8 WI    <tibble[,18] [9… <lm>   days      0.0259    0.00294     8.83  6.60e- 14
##  9 PA    <tibble[,18] [1… <lm>   (Interc… 46.4       0.486      95.4   7.18e-107
## 10 PA    <tibble[,18] [1… <lm>   days      0.0172    0.00224     7.68  7.20e- 12
## # … with 84 more rows
```

---

## Outputs from Unnest

```r
# first, remove the intercept
to_plot <- nested_data %>%
  filter(term!="(Intercept)") %>%
  mutate(ub=estimate+1.96*std.error,
         lb=estimate-1.96*std.error) %>%
  drop_na()

# graph
ggplot(to_plot, aes(x=fct_rev(state), y=estimate, ymin=lb, ymax=ub)) +
  geom_pointrange(shape=21, fill="blue", color="black", alpha=.8) +
  geom_hline(yintercept = 0, linetype="dashed",
             color="gray") +
  coord_flip() +
  theme_minimal() +
  labs(x = "Linear Time Trend by State",
       y= "Biden Support in the Polls")
```

---
class: center, middle

![](slides_files/figure-html/unnamed-chunk-41-1.png)<!-- -->

---
class:inverse, center, middle

# Case Study:

## Partisanship, Covid and Risk Perceptions in Brazil.

---

# An Example of my Workflow

--

- To conclude our workshop, I will show you the code from my recent paper (co-authored with Ernesto Calvo), forthcoming in Latin American Politics and Society.

- The paper is about partisanship and risk perceptions about COVID-19. I will focus on the descriptive analysis and the simple regression models we use to show partisan differences in risk perceptions in Brazil.

- The paper and replication files can be found [here](https://github.com/TiagoVentura/Calvo_Ventura_LAPS_2021).

- **Our Goal**: A model of partisanship on three different outcomes.

--

---

## Step 1: Tidy Your Data

```r
load("CV_data.Rdata")
library(tidyverse)
library(tidyr)

# Untidy
d %>%
  select(covid_job, covid_health, covid_government)
```

```
## # A tibble: 2,362 x 3
##    covid_job         covid_health      covid_government      
##    <fct>             <fct>             <fct>                 
##  1 Very unlikely     Somewhat unlikely Somewhat Unappropriate
##  2 Very unlikely     Somewhat Likely   Somewhat Appropriate  
##  3 Very Likely       Very Likely       Very Appropriate      
##  4 Very Likely       Very Likely       Somewhat Unappropriate
##  5 Somewhat Likely   Somewhat Likely   Somewhat Appropriate  
##  6 Somewhat Likely   Somewhat unlikely Somewhat Appropriate  
##  7 Very unlikely     Somewhat Likely   Very Appropriate      
##  8 Somewhat unlikely Somewhat Likely   Very Appropriate      
##  9 Very unlikely     Somewhat Likely   Somewhat Unappropriate
## 10 Very Likely       Somewhat unlikely Somewhat Unappropriate
## # … with 2,352 more rows
```

---

## Make it tidy

```r
d_pivot <- d %>%
  pivot_longer(cols=c(covid_job, covid_health, covid_government),
               names_to="covid",
               values_to="covid_values")
```

---

## What do I have now?
```
## # A tibble: 7,086 x 2
##    covid            covid_values          
##    <chr>            <fct>                 
##  1 covid_job        Very unlikely         
##  2 covid_health     Somewhat unlikely     
##  3 covid_government Somewhat Unappropriate
##  4 covid_job        Very unlikely         
##  5 covid_health     Somewhat Likely       
##  6 covid_government Somewhat Appropriate  
##  7 covid_job        Very Likely           
##  8 covid_health     Very Likely           
##  9 covid_government Very Appropriate      
## 10 covid_job        Very Likely           
## # … with 7,076 more rows
```

---

## Nest and Models

```r
data_nested <- d_pivot %>%
  group_by(covid) %>%
  nest() %>%
  mutate(model=map(data, ~
*                    lm(as.numeric(covid_values) ~
*                         runoff_haddad +
*                         runoff_bolsonaro +
*                         income + gender + work +
*                         as.numeric(education) + age , data=.x)),
         res=map(model,tidy)) %>%
  unnest(res) %>%
  mutate(lb=estimate - 1.96*std.error,
         up= estimate + 1.96*std.error)
```

**Everything we need is here: group_by, nest, model, unnest.**

.center[
![](https://media.giphy.com/media/wbcMnfHqOJX9K/giphy.gif)
]

---

# Next (Important Steps)

- Fix the labels.

- **Get your labels correct before plotting**.

- By correct I mean: names and order.

--

```r
to_plot <- data_nested %>%
  filter(str_detect(term, "runoff")) %>%
  mutate(labels_iv=fct_recode(term,
                              "Haddad Voters"="runoff_haddadOn",
                              "Bolsonaro Voters"="runoff_bolsonaroOn")) %>%
  mutate(outcome= ifelse(covid=="covid_job",
                         "How likely is it that you \n could lose your job? 
", ifelse(covid=="covid_health", "How likely will your health \n be affected by COVID-19?", "Has the government response \n been appropriate ?"))) ``` --- # Final Plot ```r #pick my colors pal <- RColorBrewer::brewer.pal(9, "Spectral") #graph ggplot(to_plot, aes(y=estimate, x=labels_iv, ymin=up, ymax=lb, color=labels_iv)) + geom_pointrange(shape=21, fill="white", size=2) + labs(x="", y="Point Estimates", title = "\nPartisanship, Risk Perceptions and Government Responses to Covid in Brazil", subtitle = "Regression Estimates with Controls by Income, Gender, Age, Education, and Occupation.", caption ="Note: Baseline are Independent Voters") + geom_hline(yintercept = 0, linetype="dashed", color="darkred") + scale_color_manual(values=c("Bolsonaro Voters"=pal[9], "Haddad Voters"=pal[1]), name="Who would you vote for?") + facet_wrap(~outcome) + theme_bw() + theme(strip.text = element_text(size=7), axis.text.x = element_blank()) ``` --- class: center, middle ![](slides_files/figure-html/unnamed-chunk-48-1.png)<!-- -->