Initial Data Analysis
Initial Data Analysis (IDA) aims to provide reliable knowledge about the data so that statistical analyses and their interpretation can be carried out responsibly. IDA consists of all steps performed on the data of a study between the end of data collection and the start of the statistical analyses that address the research questions. If IDA is performed in an informal or unstructured way, it can have a large and non-transparent impact on the results and conclusions presented in publications, and it contributes to reproducibility problems.
- A framework for IDA consisting of six steps. The focus is on primary data collections, in which data are obtained to address a predefined set of research questions under an analysis plan. IDA is also often performed in more complex studies, which raise additional issues such as implementing IDA processes during ongoing data collection so that data problems are detected while they can still be remedied.
- Examples of IDA workflows in open-access data sets, with R code. Appropriate graphical and analytical tools enable a researcher to perform IDA and so avoid misinterpretation, poor presentation, and analysis errors. These necessary preparations are too often forgotten, even by experienced data analysts.
No prior knowledge of statistical software is required, but some R code and functions will be pointed out for those interested.
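To give a flavour of the kind of R code involved, here is a minimal sketch of a few screening-style checks. It is not the six-step framework itself, and it assumes the `airquality` data set that ships with base R purely as a stand-in for an open-access data set. The object names (`dat`, `missing_counts`, and so on) and the range rule for `Month` are illustrative choices.

```r
# Minimal IDA-style sketch in base R (illustrative only).
# Assumption: `airquality` stands in for an open-access study data set.
data(airquality)
dat <- airquality

# Structure: number of rows and the type of each variable.
n_rows <- nrow(dat)
var_types <- vapply(dat, function(x) class(x)[1], character(1))

# Missing values: count per variable and number of complete rows.
missing_counts <- colSums(is.na(dat))
n_complete <- sum(complete.cases(dat))

# Univariate summaries (min, quartiles, mean, max, NA count).
univariate <- summary(dat)

# Plausibility check (hypothetical rule): the data cover May to September,
# so any Month outside 5..9 would be flagged for review.
implausible_month <- sum(!(dat$Month %in% 5:9))

print(var_types)
print(missing_counts)
print(univariate)
cat("Rows:", n_rows, " Complete rows:", n_complete,
    " Implausible months:", implausible_month, "\n")

# Graphical check of one variable's distribution; drawn only in an
# interactive session so that batch runs do not create plot files.
if (interactive()) {
  hist(dat$Ozone, main = "Ozone (raw)", xlab = "Ozone (ppb)")
}
```

Nothing here touches the research question. The checks only describe the data (structure, missingness, distributions, plausibility), which is the point of doing IDA before the main analysis.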
Here is a 5-minute video discussing IDA: https://mediaspace.msu.edu/media/Only+fools+rush+in%21+-+Initial+data+analysis+is+equired+for+developing+and+validating+prediction+models/1_1n07g3j0