I am more a user of statistics than a producer of statistics. Digging out data and comparing statistics from various sources can take up a good part of my workday. Communicating data and analyses to a wider audience forms a more critical part of my role, and this sometimes includes visualizations of uncertainty.
The main reason I try to communicate statistical uncertainty is to help people better understand a topic through the sound use of trustworthy data.
For example, I recently worked with walking and cycling data published by the Department for Transport. I reported on statistics for Oxford with Cambridge and England as its comparators. Since the data tables included the confidence intervals, I included these in my figures so we could draw conclusions on whether the differences between areas were statistically significant.
Emphasizing uncertainty is not something I do every time I cite a statistic. I could do a whole post on examples of scenarios. For now, here are three of many reasons why uncertainty should be communicated. None of this is mind-blowing, but sometimes a light read on statistical uncertainty is just what an analyst needs on a Friday afternoon. Enjoy!
1. Statistics can be mistaken for the truth
Perhaps this has also happened to you. The inevitable “revised” dataset is released, and the “provisional” data needs to be updated. Or, a dataset has been corrected, and the original figures are no longer accurate. Either way, there is a need to explain why the statistics reported three months ago now need to be changed.
In cases such as these, it is usually a good idea to double-check the language used to describe the reliability of the data. Keywords like “estimated”, “provisional”, “modelled”, “likely”, and “approximately” should be used when appropriate to help people understand statistics better. Statistics are not untruths, but an estimate will usually sit some distance from the true value. It is up to the analyst to convey the message around estimation.
2. Differences can be statistically significant (or not), or practically different (or not)
The slope index of inequality in life expectancy at birth for men in Oxford from the most to least deprived areas has decreased from 9.3 years to 8.8 years, or around six months of life. Is this the beginning of a downward trend? Does this mean that men in the most deprived areas can be expected to live six months longer? For many reasons, the answer to both questions is most likely no.
The reason, in the context of statistical uncertainty, is that once we look at the confidence intervals (the margins of error around each estimate) for the recorded time periods, it is unclear whether there has been a significant change over time. The confidence intervals overlap, which suggests that there may be no statistically significant difference, although overlap alone does not settle the question: two estimates with overlapping intervals can still differ significantly. As statistical practitioners, we should keep an eye out for CIs before saying something about trends or differences.
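For anyone who wants to see the overlap point in numbers, here is a minimal Python sketch. It assumes the two estimates are independent and roughly normal, with symmetric 95% intervals, so that a standard error can be recovered from each interval's width. The 9.3 and 8.8 figures are from the text; every interval below is made up for illustration, and the function names are my own.

```python
import math

Z_95 = 1.96  # normal critical value for a 95% interval


def intervals_overlap(ci_a, ci_b):
    """Return True if two (lower, upper) intervals share any values."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def difference_z(est_a, ci_a, est_b, ci_b):
    """Approximate z statistic for est_a - est_b.

    Assumes independent, roughly normal estimates with symmetric
    95% intervals, so each standard error is width / (2 * 1.96).
    """
    se_a = (ci_a[1] - ci_a[0]) / (2 * Z_95)
    se_b = (ci_b[1] - ci_b[0]) / (2 * Z_95)
    return (est_a - est_b) / math.sqrt(se_a**2 + se_b**2)


# Estimates from the text; the intervals are HYPOTHETICAL placeholders.
earlier, earlier_ci = 9.3, (7.1, 11.5)
later, later_ci = 8.8, (6.7, 10.9)

overlap = intervals_overlap(earlier_ci, later_ci)
z = difference_z(earlier, earlier_ci, later, later_ci)
significant = abs(z) > Z_95

# A second, entirely made-up pair: the intervals overlap,
# yet the difference still passes the 5% threshold.
a, a_ci = 10.0, (8.0, 12.0)
b, b_ci = 13.2, (11.2, 15.2)
overlap_2 = intervals_overlap(a_ci, b_ci)
significant_2 = abs(difference_z(a, a_ci, b, b_ci)) > Z_95

print(overlap, significant)      # overlapping, not significant
print(overlap_2, significant_2)  # overlapping, but significant
```

With the placeholder intervals, the first pair overlaps and the difference is nowhere near significant. The second pair also overlaps but is significant, which is why overlap is a prompt to look closer rather than a verdict.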
3. Transparency in statistics promotes statistical literacy and builds trust, and might get people to use them more often
I was recently at a conference where the audience was shown an article on the BBC website that was based on Office for National Statistics unemployment figures. The article headline announced a fall of 3,000 in UK unemployment. We were then shown the original statistical bulletin by the ONS, where a note at the bottom stated that the “small fall” had a confidence interval of plus or minus 77,000 and that the “fall” was not actually statistically significant.
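The arithmetic behind that footnote is simple enough to sketch in a few lines of Python. The two figures are the ones quoted above; the helper names are my own, and treating "an interval that includes zero" as "not statistically significant" is the usual rule of thumb rather than anything specific to the ONS bulletin.

```python
def interval_from_margin(estimate, margin):
    """Turn an estimate and a plus-or-minus margin into (lower, upper)."""
    return (estimate - margin, estimate + margin)


def excludes_zero(interval):
    """True if the whole interval sits on one side of zero."""
    lower, upper = interval
    return lower > 0 or upper < 0


change = -3_000   # reported fall in unemployment
margin = 77_000   # plus or minus, as quoted from the bulletin

ci = interval_from_margin(change, margin)
print(ci)                 # (-80000, 74000)
print(excludes_zero(ci))  # False
```

The interval runs from a fall of 80,000 to a rise of 74,000, so the data cannot tell us whether unemployment went down or up.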
“Once bitten, twice shy.” If we make genuine efforts to be transparent in our use of data by being straightforward and upfront about uncertainty, we might improve the likelihood that people use data with confidence. Even when we have to make the inevitable corrections or updates, when we are honest and open about why they have been made and take steps to prevent the avoidable ones, then we are helping our end users in more ways than one.
Let there be uncertainty because there is
In this day and age of big data, open data, and data protection policies, our work and learning as analysts seem to be ever-evolving. Reminders about some basic principles of data and analysis can be helpful in providing us with the structures and tools we need to grow and get on with our work.