Exploratory Data Analysis (EDA)
Math 4027/5027 - Fall 2004
How do you analyze data? When faced with data of various types from various sources, what questions should we ask, and what clues can we find in the data to further our understanding?
Statistics, broadly defined, is the science and art of analyzing data.
Many statistics courses assume formal probability models with parameters, and statistical methods offer tools for estimating those parameters. Sometimes the assumptions governing those models can be verified. But if they cannot be, or if we want analyses that are less sensitive to these assumptions, what should we do?
Exploratory data analysis (sometimes called 'Data Mining') is a philosophy of analyzing data. In this course, we will learn many different tools for data analysis. Some knowledge of quantitative methods is useful. Those who have had formal statistics courses can take the course at a higher level, where connections between EDA tools and mathematical statistical methods will be developed. This course is valuable to anyone who has data to analyze. It is also a lot of fun, and students learn a lot. No formal prerequisites in statistical methods are required, though at least one prior course in the subject is highly beneficial.
Course objectives:
Introduce the philosophy of exploratory data analysis
Teach tools for the analysis of data
Provide opportunities for analyzing data
Demonstrate the value of oral and written communication skills
Offer experience in preparing oral and written reports of data analyses
Time: Monday and Wednesday, 2:30-3:45
Texts:
D.C. Hoaglin, F. Mosteller and J.W. Tukey, Understanding Robust & Exploratory Data Analysis
F. Mosteller and J.W. Tukey, Data Analysis and Regression: A Second Course in Statistics
Topics:
The philosophy of exploratory versus confirmatory data analysis
Summarizing batches of data: stem-and-leaf diagrams, boxplots, Q-Q plots
Data transformations (ladder of re-expressions)
Jackknife and bootstrap
Two-way and three-way analyses (median polish)
Standardization
Fitting robust-resistant lines (least absolute deviations)
Analyzing count data
Instructor:
Professor Karen Kafadar, 303-556-2547, kk@math.cudenver.edu