*On the second day of Christmas, my true love sent to me: Anscombe's Quartet*

One of the first steps in data analysis is to plot your data to get an intuitive feel for it. Jo Young and Jan Wessnitzer at the Scientific Editing Company wrote us a chapter about data visualisation in R. I love a good graph (just ask my indulgent co-authors), so I was particularly interested to read their description of Anscombe's quartet.

"Anscombe’s quartet consists of four datasets with almost identical[1] descriptive statistics, including mean (*x̄* = 9.0, *ȳ* = 7.50, to a minimum of 2 decimal places in the case of *y*), sample variance (*SD*(*x*) = 11.0, *SD*(*y*) = 4.12), correlation between *x* and *y* in each case (0.816 to 3 decimal places), and linear regression line in each case (*y* = 3.00 + 0.500*x*, to 2 and 3 decimal places respectively).

However, by graphing Anscombe’s quartet (see below), it becomes clear that a linear regression is probably a reasonable fit for set 1, whereas a polynomial regression fit is more appropriate for set 2. By plotting sets 3 and 4, the effect of outliers on descriptive statistics is clearly demonstrated. In both cases, the fitted regression line is “skewed” by a single outlier. The outliers could be genuine outliers or they could be erroneous data points, e.g., typos during data entry, but they highlight the importance of screening your data visually."

So, next time you get a net full of fresh data, graph it before you get too carried away with your number crunching!

[1] to at least two decimal places

Modern Statistical Methods in Human Computer Interaction (edited by Judy Robertson and Maurits Kaptein) will be published by Springer in early 2016. Here is the table of contents:

Modern Statistical Methods for HCI

Preface

1. An introduction to Modern Statistical Methods for HCI.

J. Robertson & M.C. Kaptein

Section 1: Getting Started With Data Analysis.

2. Getting started with [R]: a brief introduction

L. Ippel

3. Descriptive statistics, Graphs, and Visualization.

J. Young & J. Wessnitzer

4. Handling missing data

T. Baguley & M. Andrews

Section 2: Classical Null Hypothesis Significance Testing done properly

5. Effect sizes and Power in HCI

K. Yatani

6. Using R for repeated and time-series observations

D. Fry & K. Wazny

7. Non-parametric Statistics in Human-Computer Interaction

J.O. Wobbrock and M. Kay

Section 3: Bayesian Inference

8. Bayesian Inference

M. Tsikerdekis

9. Bayesian Testing of Constrained Hypothesis

J. Mulder

Section 4: Advanced modeling in HCI

10. Latent Variable Models

A. Beaujean & G. Morgan

11. Using Generalized Linear (Mixed) Models in HCI

M.C. Kaptein

12. Mixture models: Latent profile and latent class analysis

D. Oberski

Section 5: Improving statistical practice in HCI

13. Fair Statistical Communication in HCI

P. Dragicevic

14. Improving statistical practice in HCI

J. Robertson & M.C. Kaptein

Online supplementary materials
