Chapter 1 Data Analysis

Data analysis is the process of turning raw data into knowledge. It involves data import, data wrangling, statistical analysis, and communicating the findings.

Grolemund & Wickham (2017) presented this process as a diagram.

The diagram makes clear that the data analysis workflow consists of different modules, each depending on the previous one, and that the flow is directional: an entrance, an iteration, and an exit. It also emphasises that tidy data is important for the analysis, that understanding is an iterative process, and that the analysis is a portable and isolated box. Lastly, communicating and sharing the results is key.

If you would like to read more about the different modules, have a look at the R for Data Science book.

1.1 Data analysis in reality

In reality, data wrangling is rarely a linear process; it is a back and forth between modules. It is therefore important that the software lets you move flexibly between these modules and automate them. This can be achieved with written programs that can be re-executed at any time.

[Figure: image from R for Data Science]
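To make the idea of a written, re-executable program more concrete, here is a minimal sketch. It is written in Python purely for illustration, and the same structure carries over to a script in any other language. The data, the column names (`site`, `measurement`) and all function names are hypothetical and are not taken from R for Data Science. Each function stands for one module of the workflow, and `run_analysis` chains them so that the whole analysis can be repeated at any time.

```python
import csv
import io
import statistics

# Hypothetical raw data, inlined so the sketch runs without external files.
RAW_CSV = """site,measurement
A,1.2
A,
B,3.4
B,2.6
"""


def import_data(text):
    """Import: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def tidy(rows):
    """Tidy: drop rows with a missing measurement and convert text to numbers."""
    return [
        {"site": row["site"], "measurement": float(row["measurement"])}
        for row in rows
        if row["measurement"].strip()
    ]


def analyse(rows):
    """Analyse: compute the mean measurement per site."""
    by_site = {}
    for row in rows:
        by_site.setdefault(row["site"], []).append(row["measurement"])
    return {site: statistics.mean(values) for site, values in by_site.items()}


def communicate(summary):
    """Communicate: format the result as a short plain-text report."""
    lines = [f"{site}: mean = {mean:.2f}" for site, mean in sorted(summary.items())]
    return "\n".join(lines)


def run_analysis(text):
    """Run every module in order; re-running repeats the whole analysis."""
    return communicate(analyse(tidy(import_data(text))))


if __name__ == "__main__":
    print(run_analysis(RAW_CSV))
```

Because every step lives in code, going back to an earlier module (for example, changing how missing values are handled in `tidy`) only requires editing that function and running the script again. No manual steps need to be repeated.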

1.2 Role of Excel

If you have previously used another tool, for example Excel, it will still be relevant for data entry and storage.

The article Data Organization in Spreadsheets by Broman & Woo (2018) provides recommendations for organizing spreadsheet data in a way that both humans and computer programs can read.

[Figure: image from R for Data Science]

1.3 Requirements for data analysis software

Apart from moving through modules with ease, what other criteria are important when choosing data analysis software?

The software should be used by others (otherwise, sharing code becomes difficult), be continuously developed and improved (new data formats, new communication formats), and be easy to use (high-level language).

1.4 More

If you are interested in the data analysis workflow in general, have a look at the R for Data Science book.

If you are interested in another formalisation of data analysis, have a look at recent work by Hicks & Peng (2019). Elements & Principles lays out the elements that make up a data analysis and the principles for assembling them.