Regression without regrets
Abstract:
Statistical model building for prediction in the life sciences must consider properties of the collected data that cannot easily be anticipated. Unexpected data properties that are overlooked, unsystematic exploratory data analysis, or a lack of transparent reporting may threaten the validity and reproducibility of prediction models. We developed a general strategy to screen data before fitting a prediction model. Our approach relies on criteria for data screening that can be integrated into electronic laboratory notebooks to improve the transparency and reproducibility of methodological decisions made during prediction modelling. Such an initial data analysis supports the analyst by suggesting modifications to the original statistical analysis plan and by guiding interpretation and presentation without compromising modelling results. We demonstrate the utility of our proposal in an application involving diagnostic prediction of bacteraemia with 50 laboratory variables. Time permitting, an extension of the data screening checklist to longitudinal studies may also be discussed. This work is based on a collaboration with an international group of applied statisticians with extensive experience in observational studies, and on discussions with domain experts.
Zoom info: https://stt.natsci.msu.edu/calendar/colloquium-marianne-huebner2/