On the second day of Christmas, my true love sent to me: Anscombe's Quartet
One of the first steps in data analysis is to plot your data to get an intuitive feel for it. Jo Young and Jan Wessnitzer at the Scientific Editing Company wrote us a chapter about data visualisation in R. I love a good graph (just ask my indulgent co-authors), so I was particularly interested to read their description of Anscombe's quartet.
"Anscombe’s quartet consists of four datasets with almost identical descriptive statistics, including mean (x̄ = 9.0, ȳ = 7.50, to a minimum of 2 decimal places in the case of y), sample variance (var(x) = 11.0, var(y) = 4.12), correlation between x and y in each case (0.816 to 3 decimal places), and linear regression line in each case (y = 3.00 + 0.500x, to 2 and 3 decimal places respectively).
However, graphing Anscombe’s quartet (see below) makes it clear that a linear regression is probably a reasonable fit for set 1, whereas a polynomial regression is more appropriate for set 2. Plotting sets 3 and 4 clearly demonstrates the effect of outliers on descriptive statistics: in both cases, the fitted regression line is “skewed” by a single outlier. The outliers could be genuine, or they could be erroneous data points, e.g., typos during data entry, but either way they highlight the importance of screening your data visually."
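If you want to check those numbers yourself, here is a minimal sketch in plain Python (the chapter itself works in R, so this is my own illustration rather than the authors' code). It assumes the standard published values of Anscombe's quartet; the names `QUARTET`, `summarise`, `SUMMARIES` and `SET3_NO_OUTLIER` are made up for this example. It computes the mean, sample variance, correlation and least-squares line for each set, then refits set 3 with its single high point removed to show how much that one observation pulls the line.

```python
from math import sqrt
from statistics import fmean

# Standard published values of Anscombe's quartet (sets 1-3 share the same x).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(x, y):
    """Mean, sample variance, Pearson r and least-squares line for one dataset."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)            # sum of squares of x
    syy = sum((b - my) ** 2 for b in y)            # sum of squares of y
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": sxx / (n - 1),                    # sample variance (n - 1 denominator)
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


SUMMARIES = {k: summarise(x, y) for k, (x, y) in QUARTET.items()}

for k, s in SUMMARIES.items():
    print(f"set {k}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
          f"var_x={s['var_x']:.2f} var_y={s['var_y']:.2f} r={s['r']:.3f} "
          f"y = {s['intercept']:.2f} + {s['slope']:.3f}x")

# Refit set 3 without its single outlier (the point with the largest y).
x3, y3 = QUARTET[3]
drop = y3.index(max(y3))
SET3_NO_OUTLIER = summarise(
    [a for i, a in enumerate(x3) if i != drop],
    [b for i, b in enumerate(y3) if i != drop],
)
print(f"set 3 without outlier: r={SET3_NO_OUTLIER['r']:.3f} "
      f"y = {SET3_NO_OUTLIER['intercept']:.2f} + {SET3_NO_OUTLIER['slope']:.3f}x")
```

The four summaries come out nearly the same, which is exactly why the plots matter: nothing in these numbers hints at the curve in set 2 or the lone points driving sets 3 and 4.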
So, next time you get a net full of fresh data, graph it before you get too carried away with your number crunching!
Modern Statistical Methods in Human Computer Interaction (edited by Judy Robertson and Maurits Kaptein) will be published by Springer in early 2016. Here is the table of contents:
Modern Statistical Methods for HCI
1. An introduction to Modern Statistical Methods for HCI.
J. Robertson & M.C. Kaptein
Section 1: Getting Started With Data Analysis.
2. Getting started with [R]: a brief introduction
3. Descriptive statistics, Graphs, and Visualization.
J. Young & J. Wessnitzer
4. Handling missing data
T. Baguley & M. Andrews
Section 2: Classical Null Hypothesis Significance Testing done properly
5. Effect sizes and Power in HCI
6. Using R for repeated and time-series observations
D. Fry & K. Wazny
7. Non-parametric Statistics in Human-Computer Interaction
J.O. Wobbrock & M. Kay
Section 3: Bayesian Inference
8. Bayesian Inference
9. Bayesian Testing of Constrained Hypothesis
Section 4: Advanced modeling in HCI
10. Latent Variable Models
A. Beaujean & G. Morgan
11. Using Generalized Linear (Mixed) Models in HCI
12. Mixture models: Latent profile and latent class analysis
Section 5: Improving statistical practice in HCI
13. Fair Statistical Communication in HCI
14. Improving statistical practice in HCI
J. Robertson & M.C. Kaptein
Online supplementary materials