Ten Simple Rules for Initial Data Analysis
Initial data analysis (IDA) is a task common to all quantitative research. Do not underestimate the effort needed to carry out all phases of an effective IDA, nor the return on that investment. IDA is a crucial step that promotes transparency and integrity: it provides an analysis-ready data set and reliable information about its properties, enabling you to perform the statistical analyses responsibly and to interpret the results. We explain or illustrate each of the following rules with examples.
- Develop an IDA plan that supports the research objective
- IDA takes time and resources
- Make IDA reproducible
- Context matters: know your data
- Avoid sneak peeks - IDA does not touch the research question
- Visualize your data
- Check for what is missing
- Communicate the findings and consider the consequences
- Report IDA findings in research papers
- Be proactive and rigorous
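As a small illustration of the rule "Check for what is missing", the following is a minimal Python sketch rather than a method prescribed by the rules themselves. The function names, the example records, and the set of tokens treated as missing (`MISSING_TOKENS`) are all assumptions made for illustration; in practice the missing-value codes should come from the data dictionary of the study.

```python
import math

# Assumption: these tokens encode "missing" in the raw data.
# Replace them with the codes documented for your own data set.
MISSING_TOKENS = {"", "NA", "N/A", "NaN", "-999"}


def is_missing(value, tokens=MISSING_TOKENS):
    """Return True if a raw value should be counted as missing."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return str(value).strip() in tokens


def missing_summary(rows, columns):
    """Count missing values per column in a list of dict records."""
    n = len(rows)
    summary = {}
    for col in columns:
        # A record that lacks the key entirely is also counted as missing.
        count = sum(1 for row in rows if is_missing(row.get(col)))
        summary[col] = {
            "n_missing": count,
            "fraction": count / n if n else 0.0,
        }
    return summary


# Hypothetical records, invented purely for this example.
example_rows = [
    {"id": 1, "age": 34, "sbp": "120"},
    {"id": 2, "age": None, "sbp": "NA"},
    {"id": 3, "age": 51, "sbp": ""},
    {"id": 4, "age": -999, "sbp": "135"},
]

summary = missing_summary(example_rows, ["id", "age", "sbp"])
for column, stats in summary.items():
    print(column, stats["n_missing"], f"{stats['fraction']:.0%}")
```

A per-variable summary like this describes the data without involving the research question, which keeps it within the scope of IDA; the counts can then be reported alongside the other IDA findings.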
Following the Ten Simple Rules can also help your future self to reliably reuse your data and research outputs, by making the often-hidden decisions of data analysis more transparent, by helping to separate the phases of IDA from the final data analysis, and by publishing all relevant research materials, including metadata, code, and IDA reports.
The ten rules are based on extensive experience with research projects, collaborations with domain experts, and discussions among an international group of applied statisticians. To appear in PLOS Computational Biology, Ten Simple Rules Series.