EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. Exploratory data analysis, robust statistics, nonparametric statistics, and the development of statistical programming languages facilitated statisticians’ work on scientific and engineering problems. Such problems included the fabrication of semiconductors and the understanding of communications networks, both of which concerned Bell Labs.
EDA techniques are also taught to young students as a way to introduce them to statistical thinking. A number of tools are useful for EDA, but EDA is characterized more by the attitude taken than by particular techniques. Findings from EDA are often orthogonal to the primary analysis task. To illustrate, consider an example from Cook et al., where the analysis task is to find the variables that best predict the tip a dining party will give to the waiter.
The primary analysis task is approached by fitting a regression model where the tip rate is the response variable. However, exploring the data reveals other interesting features not described by this model. The distribution of tip amounts is skewed right and unimodal, as is common in distributions of small, non-negative quantities. An interesting phenomenon is also visible: peaks occur at the whole-dollar and half-dollar amounts, caused by customers picking round numbers as tips. This behavior is common to other types of purchases too, like gasoline.
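The effect of bin width on what a histogram reveals can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tip amounts are simulated rather than taken from the Cook et al. data, and the names `simulate_tips`, `histogram`, and `share_on_half_dollars`, along with the gamma parameters and the 60% rounding share, are hypothetical choices made for the example.

```python
import numpy as np


def simulate_tips(n=2000, round_share=0.6, seed=0):
    """Return simulated (not real) tip amounts in dollars.

    A share of the tips is rounded to the nearest $0.50 to mimic
    customers who pick round numbers.
    """
    rng = np.random.default_rng(seed)
    # Gamma draws are non-negative, unimodal, and skewed right.
    tips = rng.gamma(shape=4.0, scale=0.75, size=n)
    rounded = rng.random(n) < round_share
    tips[rounded] = np.round(tips[rounded] * 2) / 2
    return np.round(tips, 2)


def histogram(tips, bin_width):
    """Count tips in bins of `bin_width` dollars, starting at $0."""
    # Work in integer cents so bin edges are exact.
    cents = np.round(tips * 100).astype(int)
    width_cents = int(round(bin_width * 100))
    return np.bincount(cents // width_cents)


def share_on_half_dollars(tips):
    """Fraction of tips that are an exact multiple of $0.50."""
    cents = np.round(tips * 100).astype(int)
    return float(np.mean(cents % 50 == 0))


tips = simulate_tips()
coarse = histogram(tips, 1.00)  # $1 bins: overall shape only
fine = histogram(tips, 0.10)    # $0.10 bins: round-number peaks appear
round_share = share_on_half_dollars(tips)

print("counts per $1 bin:", coarse.tolist())
print("share of tips on whole or half dollars:", round(round_share, 3))
```

With wide bins the counts show only the skewed, unimodal shape. With narrow bins, every fifth bin (the one starting at a whole-dollar or half-dollar amount) collects a disproportionate share of the observations, a feature that a regression on tip rate would not describe.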