class: center, middle, inverse, title-slide # .large[Robust data analysis] ## An introduction to R ### Sina Rüeger & Allie Burns ### 2019-09-05 --- <!--------- https://www.helpessentials.com/2017/08/02/sprinkle-some-google-fonts-into-your-projects/ ---------> <!---------------------- ToC ------------------------------> --- class: left, middle # Who we are .pull-left-large[ ## Sina - **Background** in Data Science and Biostatistics - Works with genomic & clinical **data**
<small>@sinarueeger</small><br />
<small>@sinarueeger</small><br /> <!-----
<small>sinarueeger.github.io</small> --------> ] .pull-right-small[ ## Allie - **Background** in Biology and Bioinformatics - Works with genomic & experimental **data** <!----- **Uses R** to run the statistics and visualize experimental data ----->
<small>@imallieburns</small><br />
<small>@allie-burns</small><br /> ] <!---------------------- SLIDE info ------------------------------> <!--- adding a footer https://github.com/yihui/xaringan/wiki/Footer-and-header-lines ------> --- class: left, middle ## Slides ### http://bit.ly/rds-slides ## Handbook ### http://bit.ly/rds-handbook ## RStudio Cloud ### http://bit.ly/rds-rstudio <!---------------------- ToC ------------------------------> --- class: left, middle <div class="figure"> <img src="https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/openscapes/starwars-rey-rstats.png" width="600" alt="Cover image" /> <p class="caption">Artwork by <a href="https://github.com/allisonhorst/stats-illustrations">@allison_horst</a> for <a href="https://www.openscapes.org/">openscapes</a></p> </div> --- class: left, middle ## 1. [Data Analysis](#part-1-data-analysis) <!------- What makes a Data Analysis robust?------> ## 2. [Introducing R](#part-2-r) <!--------How can R help with that?-----------> ## 3. [Workshop](#part-3-workshop) ## 4. [Beyond the Basics](#part-4-beyond-the-basics) ## 5. [Learning R](#part-5-learning-r) <!-------------------------------------------------------> <!---------------------- PART 1 -------------------------> <!-------------------------------------------------------> --- class: left, middle, inverse # Part 1 # <a name="part-1-data-analysis"></a>Data Analysis --- class: left, middle # Data Analysis Workflow <div class="figure"> <img src="img/workflow/workflow.001.jpeg" width="800" alt="R 4 DS image" /> <caption>Adapted from <a href="https://r4ds.had.co.nz/introduction.html">R for Data Science by Grolemund & Wickham, 2017.</a></caption> </div> --- exclude: true # Emojification of Data Analysis Workflow ### 🤔
<!--- question > getting data > analysing / distill knowledge > do soemthing with that --> <!------`Question > Data > Analyse data with a tool > Distill knowlege from data > Feel enlightened :-) > Take decisions` -------> --- exclude: true <div class="figure"> <img src="https://github.com/allisonhorst/stats-illustrations/raw/master/openscapes/environmental-data-science-r4ds.png" width="700" alt="Cover image" /> <p class="caption">Artwork by <a href="https://github.com/allisonhorst/stats-illustrations">@allison_horst</a> for <a href="https://www.openscapes.org/">openscapes</a></p> </div> --- class: left, middle # Principles - Tidy data - Distinct "modules"" - Module dependency - Flow: Entrance → Iteration → Exit - Communication & sharing of results - Isolated box --- class: left, middle # In reality <div class="figure"> <img src="img/workflow/workflow.003.jpeg" width="800" alt="R 4 DS image" /> </div> ??? _Revisiting modules many many times: robust DA when this works at easy_. _Draw figure on whiteboard_. --- class: left, middle # Role of Excel <div class="figure"> <img src="img/workflow/workflow.002.jpeg" width="800" alt="R 4 DS image" /> </div> --- class: left, middle # Role of Excel - Use Excel for _data entry & storage_ - Organise data for humans and computer programs - [Data Organization in Spreadsheets](https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989) by Broman & Woo (2018) --- class: left, middle # Requirements to Software - Translation of modules to scripts (high-level language) - Easy updating of modules - Sharing ready results + scripts - Used by others - Continuously developed & improved <!-------------------------------------------------------> <!--------------- Thought experiment --------------------> <!-------------------------------------------------------> --- exclude: true # Thought experiment --- exclude: true ## Q1: ### What is the most annoying & repetitive task at work? ## Q2: ### How could you automate it? --- exclude: true # Automation Programming is essentially an automation of a process - for common tasks - scalable - complicated tasks - where precision is needed - to reuse frameworks - to redo something from the past Side effect: reproducibility. --- class: left, middle, inverse <!-------------------------------------------------------> <!---------------------- PART 2 -------------------------> <!-------------------------------------------------------> --- class: left, middle, inverse # Part 2 # <a name="part-2-r"></a>R --- class: left, middle > <p><small>As developers, “tidyevaluations” help us make sure the user does as little typing as possible and can express really rich ideas [for analysis]. This is what underlies ggplot2 and some of our other libraries.</p></small> The idea is to get things out of your head and on to the computer as quickly as possible. [_Interview with Hadley Wickham, Quartz, 2019_](https://qz.com/1661487/hadley-wickham-on-the-future-of-r-python-and-the-tidyverse/) --- class: left, middle # R - is "a programming language for statistical computing" - is free - has a webpage: [r-project.org/](https://www.r-project.org/) - just celebrated its [25 year anniversary](https://rss.onlinelibrary.wiley.com/doi/10.1111/j.1740-9713.2018.01169.x) - comes with *basic*/*default* packages, but there are over 13'000 R-packages that can be by users. --- class: left, middle # What is R used for? --- class: middle # Quite a few things Data science, data analysis & stats stuff ??? _sdflkjfdksjdf_. --- class: left, middle # Capabilities of R --- class: left, middle ## Communication of results - through literate programming - through a web application ## Data visualisation ## Accessing other tools or data sources through APIs ??? _Beyond being a calculator and being able to estimate statistical models, R can help us to organise the data analysis workflow better. _ --- class: left, middle # R is constantly evolving & improving ??? _Other biologists_ --- class: left, middle # Some terms - Function - Object - Package - Script --- class: left, middle ## Object - A "physical" object. - For example, a dataset or a number. --- class: left, middle ## Function - A pre-written [recipe](https://twitter.com/IsabellaGhement/status/1070691303485128704). - Functions can be applied to objects. - For example, `mean()` is a function in R. - Function = action = verb, applied to object = noun. --- class: left, middle ## Script - Chain of functions + objects create a _programming script_. --- class: left, middle ## Package - Aackage is a **collection of functions**. - Anyone can contribute a package. --- class: left, middle # R vs RStudio --- class: left, middle # Programming language vs. natural language - Similar to natural languages: tool to communicate with others. - Programming: communication with computer (+colleagues). --- exclude: true # Hierarchy of best practices 1. documentation (`lintr`) 1. run all script (`usethis`) 1. version control (`gitr`) 1. unit tests and sanity checks (`testthat`, `assertr`) 1. write functions, package them and tell everyone (`devtools`, `blogdown`) 1. continuous integration (`devtools`, `blogdown`) 1. makefile (caching) (`drake`) 1. binder (`holepunch`) --- class: left, middle, inverse <!-------------------------------------------------------> <!---------------------- PART 3 -------------------------> <!-------------------------------------------------------> --- class: left, middle, inverse # Part 3 # <a name="part-3-workshop"></a>Workshop --- class: left, middle # Goal <figure> <img src="img/workflow/workflow.001.jpeg" width="300" alt="R 4 DS image" /> </figure> 1. Make a data visualisation. 2. Import data into R. 3. Make the data analysis-ready. 4. Create a report that gives insights into meteorite falls around the world. ??? _two steps: make changes + add things_. --- class: left, middle # Get started 1. Go to RStudio Cloud: http://bit.ly/rds-rstudio. 2. Log in (this might take a while). 3. Once logged in, RStudio Cloud will tell you that you have been added to a workspace. 4. Click on `Project`. 5. Choose `Workshop` 6. Click `Make a permanent copy`. --- class: left, middle, inverse <!-------------------------------------------------------> <!---------------------- PART 4 -------------------------> <!-------------------------------------------------------> --- class: left, middle, inverse # Part 4 # <a name="part-4-beyond-the-basics"></a>Beyond the Basics --- class: center, middle <div class="figure"> <img src="https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/r_first_then.png" height="500" alt="Cover image" /> <p class="caption">Artwork by <a href="https://github.com/allisonhorst/stats-illustrations">@allison_horst</a></p> </div> --- class: left, middle <div class="figure"> <img src="https://happygitwithr.com/img/watch-me-diff-watch-me-rebase-smaller.png " width="700" alt="Cover image" /> <p class="caption">Source: <a href="https://happygitwithr.com/">Happy Git with R by Jenny Bryan</a></p> </div> --- class: left, middle # Git & R -- 1. Use git. -- 2. Read [*Happy Git with R*](https://happygitwithr.com/). -- 3. Use the handy *git interface* in RStudio. -- 4. 🎉 -- Tipp: Use [gist.github.com](https://gist.github.com/) to share & store code snippets, notes, thoughts and blogposts. --- class: left, middle # R in the wild --- class: left, middle ## Research compendium <div class="figure"> <a href="https://github.com/seabbs/DirectEffBCGPolicyChange"> <img border="0" alt="mouse" src="img/bcg.png" width="600"> </a> <p class="caption">Source: <a href="https://github.com/seabbs/DirectEffBCGPolicyChange">Effect of BCG policy change in England.</a></p> </div> --- class: left, middle ## Blogpost <div class="figure"> <a href="https://notstatschat.rbind.io/2019/06/16/analysing-the-mouse-autism-data/"> <img border="0" alt="mouse" src="img/mouse.png" width="400"> </a> <p class="caption">Source: <a href="https://notstatschat.rbind.io/2019/06/16/analysing-the-mouse-autism-data/">Analysing mouse autism data</a></p> </div> --- class: left, middle ## Data journalism <div class="figure"> <a href="https://www.srf.ch/static/srf-data/data/2018/federer/#/en"> <img border="0" alt="SRF" src="img/datajournalism.png" width="700"> </a> <p class="caption">Source: <a href="https://www.srf.ch/static/srf-data/data/2018/federer/#/en">SRF</a></p> </div> --- ## Data journalism <div class="figure"> <a href="https://srfdata.github.io/2018-01-roger-federer/#load_data"> <img border="0" alt="SRF" src="img/datajournalism_code.png" width="800"> </a> <p class="caption">Source: <a href="https://www.srf.ch/static/srf-data/data/2018/federer/#/en">SRF</a></p> </div> --- ## Web application with Shiny <div class="figure"> <a href="https://kevinrue.shinyapps.io/isee-shiny-contest/"> <img border="0" alt="SRF" src="https://d33wubrfki0l68.cloudfront.net/8e0d1f45ed43ea538143b4b3cc7ea1cd7ad17e69/33284/images/2019-04-05-isee.png" width="800"> </a> <p class="caption">Source: <a href="https://blog.rstudio.com/2019/04/05/first-shiny-contest-winners/">Winner of the Shiny Contest 2019</a></p> </div> --- ## Animations <div class="figure"> <a href="https://raw.githubusercontent.com/gadenbuie/tidy-animated-verbs/master/images/anti-join.gif"> <img src="https://raw.githubusercontent.com/gadenbuie/tidy-animated-verbs/master/images/anti-join.gif" width="400"> </a> <p class="caption">Source: <a href="https://github.com/gadenbuie/tidy-animated-verbs#tidy-animated-verbs">Animated verbs by @gadenbuie</a></p> </div> --- ## Fun things: Memes <div class="figure"> <a href="http://djnavarro.net/post/2018-05-03-valid-social-commentary/"> <img src="http://djnavarro.net/post/2018-05-03-valid-social-commentary_files/figure-html/unnamed-chunk-2-1.png" width="500"> </a> <p class="caption">Source: <a href="http://djnavarro.net/post/2018-05-03-valid-social-commentary/">Danielle Navaro</a></p> </div> --- class: left, middle # Neat packages & functions --- class: left, middle ## Data import - [`data.table`](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html): (Fast) alternative to `readr`. - [`datapasta`](https://twitter.com/dataandme/status/1148548556850114561): Copy + paste unconventional tables. - [`fabricatr`](https://rviews.rstudio.com/2019/07/01/imagine-your-data-before-you-collect-it/): Imagine your data before you collect it. - Practical recommendations for organizing spreadsheet data in a way that both humans and computer programs can read: [Data Organization in Spreadsheets](https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989) by Broman & Woo (2018). --- class: left, middle ## Data manipulation - [`data.table`](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html): (Fast) alternative to `dplyr`. - [`janitor`](http://sfirke.github.io/janitor/): data cleaning. --- class: left, middle ## Data visualisation - [`esquisse`](https://github.com/dreamRs/esquisse) to interactively create a plot. - Publication ready plots with [`cowplot`](https://github.com/wilkelab/cowplot) and [`ggpubr`](https://github.com/kassambara/ggpubr). - Pick a chart type with [R Graph Gallery](https://www.r-graph-gallery.com/). --- class: left, bottom ## ggpubr <div class="figure"> <img src="https://rpkgs.datanovia.com/ggpubr/tools/README-ggpubr-box-plot-dot-plots-strip-charts-2.png" width="350" alt="Cover image" /> </div> --- class: left, middle ## holepunch Make your package binder-ready with [holepunch](https://github.com/karthik/holepunch). <div class="figure"> <img src="https://camo.githubusercontent.com/fd4274e8efa5ef6e2e096176bf75465c4746c667/68747470733a2f2f692e696d6775722e636f6d2f6f71576c3531322e706e67" width="600" alt="Cover image" /> <p class="caption">Source: <a href="https://github.com/karthik/holepunch">github/holepunch</a></p> </div> --- class: left, middle, inverse <!-------------------------------------------------------> <!---------------------- PART 5 -------------------------> <!-------------------------------------------------------> --- class: left, middle, inverse # Part 5 # <a name="part-5-learning-r"></a>Learning R ??? _Any ideas?_. <!-----------------------------------------------> --- class: left, middle # Learning strategies <!------- - Learn with isolated & digestible examples - Sources of examples: tidytuesday, advent of code / tidies of march (irene steves) - Surround yourself with the language: embed R into your life - Listen to podcasts or watch videos: https://www.rstats.nyc/2019/nyr/ / community calls: https://ropensci.org/commcalls/ - Make use of pen + paper --------> --- class: left, middle ## 1. Learn with isolated & digestible examples --- class: left, middle ## 2. Look for a steady stream of data or exercises [TidyTuesday](https://github.com/rfordatascience/tidytuesday)! --- class: left, middle ## 3. Watch recordings - Screencasts - New York R conference [recordings](https://www.rstats.nyc/2019/nyr/) - R conference [recordings](https://www.youtube.com/channel/UC_R5smHVXRYGhZYDJsnXTwg) - RStudio conference [recordings](https://resources.rstudio.com/rstudio-conf-2019) - rOpenSci community call [archive](https://ropensci.org/commcalls/) <!------- or attend a conference, screen casts ---------> --- class: left, middle ## 4. Read blogposts ![](https://camo.githubusercontent.com/42ba3f035a08ef82195b24b4f39452b63ecbb469/68747470733a2f2f727765656b6c792e6f72672f696d616765732f69636f6e732f69636f6e2d323536783235362e706e67) Weekly supply of blogposts by [rweekly.org/](https://rweekly.org/). <!-----------------------------------------------> --- class: left, middle # Embrace imperfection --- class: left, middle ### Programming is an iterative process ### 1 Problem → 1+ solutions --- class: left, middle ## Exploit imperfection - Look at & review each others code - Rewrite your code - Look at open source code <!-----------------------------------------------> --- class: left, middle # How to ask for help --- class: left, middle ## 1. Check if question has been ask before - [Stack Overflow](https://stackoverflow.com/) (SO) - [RStudio Community](https://community.rstudio.com/) --- class: left, middle <div class="figure"> <img src="https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/reprex.png" height="500" alt="Cover image" /> <p class="caption">Artwork by <a href="https://github.com/allisonhorst/stats-illustrations">@allison_horst</a></p> </div> --- class: left, middle ## 2. Create a reproducible example 1. create a reproducible example using a small R available dataset (e.g. iris) 1. `install.packages("reprex")` 1. select code and run [reprex](https://reprex.tidyverse.org/) <!---- this will often already guide you to the solution -------> --- class: left, middle ## 3. Ask your question - Ask within an online community - [RStudio community](https://community.rstudio.com/) - [Stack Overflow](https://stackoverflow.com/) - Twitter using the `#rstats` hashtag <!------- More: https://masalmon.eu/2018/07/22/wheretogethelp/ - If its a bug: file an issue on github Why not writing directly to the maintainer? Because online helps -----> <!-----------------------------------------------> <!------------ - blogposts (rweekly!) - R user group (RUG, R-Ladies) - list: https://jumpingrivers.github.io/meetingsR/r-user-groups.html (if you know a meetup, make a PR) - CoC - R-Ladies - French [slack](https://r-grrr.slack.com/join/shared_invite/enQtMzI4MzgwNTc4OTAxLWZlOGZiZTBiMWU0NDQ3OTYzOGE1YThiODgwZWNhNWEyYjI4ZDJiNmNhY2YyYWI5YzFiOTFkNDYxYzkwODUwNWM) + [online ressources](https://github.com/frrrenchies/frrrenchies) - other languages - R for DS (slack, tidytuesday) - ropensci community calls / shinyapp (https://ropensci.shinyapps.io/contributr/) - engage on twitter ([R for the rest of us](https://twitter.com/rfortherest) + [weareRLadies](https://twitter.com/WeAreRLadies/status/1154698236583698432) + [Mara Averick](https://twitter.com/dataandme) + [maelle Salmon](https://twitter.com/ma_salmon)) -----> --- class: center, middle # Become part of the community <div class="figure"> <img src="https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/welcome_to_rstats_twitter.png" height="400" alt="Cover image" /> <p class="caption">Artwork by <a href="https://github.com/allisonhorst/stats-illustrations">@allison_horst</a></p> </div> --- class: left, middle ## [R-Ladies](https://rladies.org/) <div class="figure"> <img src="img/rladies-logo.png" width="400" alt="Cover image" /> </div> --- class: left, middle ### 47 countries, 167 chapters <div class="figure"> <img src="img/rladies-global.png" width="600" alt="Cover image" /> </div> --- class: left, middle .pull-left[ <div class="figure"> <img src="https://secure.meetupstatic.com/photos/event/5/3/a/8/600_480981416.jpeg" width="600" alt="Cover image" /> </div> ] .pull-right[ <div class="figure"> <img src="https://pbs.twimg.com/media/DbUq6zqU0AE2_Ls.jpg" width="600" alt="Cover image" /> </div> ] --- class: left, middle ### R-Ladies at useR!2019 <div class="figure"> <img src="https://pbs.twimg.com/media/D_HYayHWsAAWO7t?format=jpg&name=4096x4096" width="700" alt="Cover image" /> </div> --- class: left, middle ### What does R-Ladies do? R-Ladies has a [Code of Conduct](https://rladies.org/code-of-conduct/). For women and gender minorities R-Ladies offers: - [Slack](https://rladies-community-slack.herokuapp.com/) - [Abstract / Scholarship review process](tinyurl.com/rladiesabstracts) - [R-Ladies directory](http://rladies.org/directory/) --- class: left, middle ### Want to start a chapter? <blockquote class="twitter-tweet"><p lang="en" dir="ltr">Interested in starting an <a href="https://twitter.com/hashtag/RLadies?src=hash&ref_src=twsrc%5Etfw">#RLadies</a> meetup in your city? We'd love to help! Send us an email at info@rladies.org to get in touch. <a href="https://twitter.com/hashtag/rstats?src=hash&ref_src=twsrc%5Etfw">#rstats</a></p>— R-Ladies Global (@RLadiesGlobal) <a href="https://twitter.com/RLadiesGlobal/status/770662914365808640?ref_src=twsrc%5Etfw">August 30, 2016</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> --- class: left, middle ## Online Communities - [R for Data Science](https://www.rfordatasci.com/) online learning community (e.g. [Slack](https://rfordatascience.slack.com/join/shared_invite/enQtMzA1Nzk1MjIzNDczLTY0OTVlMzM3ZTU5ZjA3NWE5ZDkxOGVmNjRjODQ2YmRjMzg4NWQxMDAxZTcwNzViZTczOThiNzBhYWJhZDM2ZTU), [TidyTuesday](https://github.com/rfordatascience/tidytuesday)) - French-speaking [r-grrr Slack](https://r-grrr.slack.com/join/shared_invite/enQtMzI4MzgwNTc4OTAxLWZlOGZiZTBiMWU0NDQ3OTYzOGE1YThiODgwZWNhNWEyYjI4ZDJiNmNhY2YyYWI5YzFiOTFkNDYxYzkwODUwNWM) (+ [online resources](https://github.com/frrrenchies/frrrenchies)) --- class: left, middle ## [rOpenSci](https://ropensci.org) - Non-profit initiative founded in 2011. - **Builds software** with a community of users and developers. - **Educates scientists** about transparent research practices. - Check [their handbook](https://devguide.ropensci.org/) on _Development, Maintenance, and Peer Review_. - Attend or re-watch a community call ([archive](https://ropensci.org/commcalls/)). --- class: left, middle ## Join an R meetup --- class: left, middle .pull-left[ ### Switzerland - [R-Ladies Lausanne](https://www.meetup.com/rladies-lausanne/) - [Geneve RUG](https://www.meetup.com/Geneve-R-User-Group) - [R lunches](http://use-r-carlvogt.github.io/prochains-lunchs/) - [adminR](https://www.meetup.com/adminR/) - [Zurich RUG](https://www.meetup.com/Zurich-R-User-Group/) ] .pull-right[ ### [R Ladies Remote](https://twitter.com/RLadiesRemote) - Coffee chats - Journal club - For women & gender minorities ### Around the world Full list of RUG's [here](https://jumpingrivers.github.io/meetingsR/r-user-groups.html) and for R-Ladies [here](https://gqueiroz.shinyapps.io/rshinylady/). ] --- exclude: true ## Write blogposts - Start with [gist.github.com](https://gist.github.com/) - Move to [`blogdown`](https://bookdown.org/yihui/blogdown/) - Submit your post to [rweekly](https://rweekly.org/)! --- class: left, middle ## Engage on Twitter Lots of cool #rstats folks on twitter. For example: - [Mara Averick: @dataandme](https://twitter.com/dataandme) - [Maëlle Salmon: @ma_salmon](https://twitter.com/ma_salmon) - [WeAreRLadies: @WeAreRLadies](https://twitter.com/WeAreRLadies) - [R for the rest of us: @rfortherest](https://twitter.com/rfortherest) --- class: left, middle <div class="figure"> <img src="https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/openscapes/starwars-teamwork.png" width="600" alt="Cover image" /> <p class="caption">Artwork by <a href="https://github.com/allisonhorst/stats-illustrations">@allison_horst</a> for <a href="https://www.openscapes.org/">openscapes</a></p> </div> <!-----------------------------------------------> --- class: center, middle # .large[Questions?]