Exploratory Data Analysis (EDA)

Math 4027/5027 - Fall 2004

 

How do you analyze data?  When faced with data from various sources, of various types, what questions should we ask, and what clues can we find in the data to further our understanding?

 

Statistics, broadly defined, is the science and art of analyzing data.  Many statistics courses assume formal probability model structures with parameters, and statistical methods offer tools for estimating those model parameters.  Sometimes the assumptions governing those models can be verified.  But if they cannot be, or if we desire analyses that are less sensitive to these assumptions, what should we do?

 

Exploratory data analysis (sometimes called 'Data Mining') is a philosophy of analyzing data.  In this course, we will learn many different tools for data analysis.  Some knowledge of quantitative methods is useful.  Those who have had formal statistics courses can take the course at a higher level, where connections between EDA tools and mathematical statistical methods will be developed.  This course is valuable to anyone who has data to analyze.  It is also a lot of fun; students learn a lot.  No formal statistical methods prerequisites are required, though at least one prior course is highly beneficial.

 

Course objectives:

Introduce the philosophy of exploratory data analysis
Teach tools for the analysis of data
Provide opportunities for analyzing data
Demonstrate the value of oral/written communication skills
Offer experience in preparing oral and written reports of data analyses

 

Time: Monday and Wednesday, 2:30-3:45

 

Texts:

D.C. Hoaglin, F. Mosteller and J.W. Tukey,

   Understanding Robust & Exploratory Data Analysis

F. Mosteller and J.W. Tukey,

   Data Analysis and Regression: A Second Course in Statistics

 

Topics:

 

The philosophy of exploratory versus confirmatory data analysis
Summarizing batches of data: stem-and-leaf diagrams, boxplots, Q-Q plots
Data transformations (ladder of re-expressions)
Jackknife and bootstrap
Two-way and three-way analyses (median polish)
Standardization
Fitting robust-resistant lines (least absolute deviations)
Analyzing count data

 

Instructor:

Professor Karen Kafadar, 303-556-2547, kk@math.cudenver.edu