Exploratory Data Analysis: A Growing Toolbox Illustrated by Long-Term Ecosystem Monitoring Data from the UK Environmental Change Network
Datasets are collected as part of designed experiments, observational studies, or automatic recordings, or are composed from other datasets. Exploratory data analysis (EDA) is an essential first phase that sets the scene for deciding which downstream analyses would best address your research questions.
There is no universally applicable recipe for how EDA should be carried out, because this depends heavily on the structure of the data, the scientific context, and the research questions, but there is broad agreement on its goals. In this talk we shed light on the evolving range of available tools, taking into account the origin, structure, size, and quality of the data. This is particularly important prior to the application of machine learning techniques, where raw data characteristics can have a large impact, potentially more than intended. We look at EDA from a historical perspective, focussing on its beginnings under the leadership of the father of data science, John Tukey. We also consider it through a modern lens, addressing the requirements triggered by the use of data in conjunction with the development of artificial intelligence (AI). Furthermore, we discuss how AI could potentially be used to support this work.