Garbage In, Garbage Out: Is your data worth the paper it is not written on?
Objectives:
In our data-driven world, where decisions are increasingly based on data, data quality is absolutely key. However, while significant resources are invested in developing sophisticated AI algorithms, data quality is often neglected. For example, in the research community, many articles and publications detail the statistical methodology, but rarely does one find details of any data checks. Why is this? Is data checking assumed to be “taken as read”, or is it just thought of as too boring to be worth a mention? The aim of this talk is to show that methods for checking data quality can be fun and just as interesting as the analysis itself, and to make the case for how important they are.
Methods:
After introducing the principles of data quality dimensions and comparing the different terminologies used by the database and statistical communities, I shall give examples of methods for checking data and for identifying suspicious patterns. These will mainly be from studies I have carried out myself, in which I was (unwittingly) transformed from a data scientist/statistician into a “data quality detective”. I shall focus on multivariate methods, as these are well suited to the data used in data science and AI applications.
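As an illustration of what a multivariate data check can look like, here is a minimal sketch that flags records whose combination of values is unusual even though each value looks plausible on its own. It is not taken from the talk: the choice of Mahalanobis distance, the function name `mahalanobis_flags`, the synthetic data, and the 0.001 cut-off are all assumptions made for this example.

```python
import numpy as np

def mahalanobis_flags(X, threshold):
    """Return squared Mahalanobis distances and a boolean flag per row.

    Hypothetical helper for illustration: distances are measured from the
    sample mean using the sample covariance of X itself.
    """
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return d2, d2 > threshold

# Synthetic data (assumed for the example): two strongly correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.9 * x + rng.normal(scale=0.2, size=200)
data = np.column_stack([x, y])

# Plant one suspicious record: each value is within two standard deviations
# of its own column, but the pair contradicts the correlation.
data[0] = [2.0, -2.0]

# For two variables, the chi-square quantile at level p is -2 * ln(1 - p);
# p = 0.999 is an arbitrary choice for this sketch.
threshold = -2.0 * np.log(0.001)

d2, flags = mahalanobis_flags(data, threshold)
print("flagged rows:", np.flatnonzero(flags))
```

A column-by-column range check would pass the planted record, which is why a multivariate view is useful here. In practice the mean and covariance are themselves distorted by bad records, so a robust estimator may be preferable to the plain sample estimates used above.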
*This seminar is available for RECR Credit (1.0 hours). Attendance will be verified, and a survey must be completed afterwards with well-thought-out responses to receive RECR Credit.