Teaching Data Analytics from probability to prediction

Getting to teach data analysis to undergraduate and graduate students at the same time is rare. I have that opportunity this semester and am very much looking forward to it. My two courses, Civil Engineering Data Analysis (CE 264) for sophomores and Advanced Data Analysis (CE 1101) for seniors and graduate students, are designed to help students understand and model the big data often encountered in engineering.

I like how CE 264 is designed to teach the concepts of probability and statistics to civil engineers. We start with the various types of data that civil engineers often encounter: for instance, loadings on beams and columns and stress-strain data for structural engineers; flow data and water quality data for water resources and environmental engineers; and ridership or traffic data for transportation engineers. We look at how one can understand the data in the context of developing models for civil engineering design. We discuss the standard parametric methods available in the textbook. We also learn non-parametric techniques, which are not discussed in the book. Non-parametric methods are especially useful when identifying a probability distribution is difficult and when sample sizes are small, as is the case with many civil engineering datasets. The homework problems, lab exercises, and the project are also designed using civil engineering data from real projects. One of the unique features of this class is its weekly computer lab time using RStudio. The labs are developed to help students better understand the concepts from the lectures through actual data and simulations.
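
To make the non-parametric idea concrete, here is a minimal sketch of estimating an exceedance probability directly from a small sample, without fitting any distribution. It is written in Python rather than the R used in the labs, it is not taken from the course material, and the flow values and the function name `empirical_exceedance` are made up purely for illustration.

```python
import statistics


def empirical_exceedance(sample, threshold):
    """Fraction of observations strictly greater than the threshold.

    This is one minus the empirical CDF evaluated at the threshold,
    so no probability distribution has to be assumed or fitted.
    """
    return sum(1 for x in sample if x > threshold) / len(sample)


# Hypothetical annual peak flows in m^3/s (illustrative values only).
peak_flows = [112.0, 98.5, 143.2, 87.1, 120.4, 101.9, 156.7, 93.3]

# Three of the eight values exceed 120 m^3/s.
p_exceed = empirical_exceedance(peak_flows, 120.0)

# The sample median is another distribution-free summary.
median_flow = statistics.median(peak_flows)

print(f"Estimated P(flow > 120) = {p_exceed}")
print(f"Sample median flow = {median_flow}")
```

With only eight observations the estimate is coarse (it can only move in steps of 1/8), which is one reason small samples call for care however they are analyzed.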

In the graduate class, we have in-depth coverage of exploratory data analysis, including sampling issues, measurement error, estimation of frequency/probability distributions, resampling, and bootstrap confidence or prediction intervals for a process. We also learn dependence measures, trends in space and time, dimension reduction techniques, and frequency domain models. Finally, we build cross-validated predictive models using linear models and basis functions, as well as model-free, nonparametric nearest-neighbor methods.
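
As an illustration of the resampling idea mentioned above, the following sketch computes a percentile bootstrap confidence interval for a sample mean. It is an assumption-laden example in Python (the course itself is not described as using it); the data, the function name `bootstrap_ci`, and the defaults of 2000 resamples and a 95% level are all hypothetical choices.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000,
                 alpha=0.05, seed=1):
    """Approximate percentile bootstrap confidence interval for `stat`.

    Draws `n_resamples` samples with replacement, each the same size as
    the original, and returns the approximate alpha/2 and 1 - alpha/2
    percentiles of the resampled statistic.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical daily ridership counts (illustrative values only).
ridership = [412.0, 388.0, 455.0, 401.0, 430.0, 397.0, 468.0, 420.0, 409.0, 441.0]

lower, upper = bootstrap_ci(ridership)
print(f"Sample mean: {statistics.mean(ridership):.1f}")
print(f"Approximate 95% bootstrap CI for the mean: ({lower:.1f}, {upper:.1f})")
```

Passing a different `stat`, such as `statistics.median`, gives an interval for that statistic instead, which is part of the appeal of the bootstrap when no convenient formula is available.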

I am hopeful that these courses will instill in students an interest in data analysis.
