# Teaching Data Analytics: From Probability to Prediction

Getting to teach data analysis to undergraduate and graduate students at the same time is rare. I have that opportunity this semester and am very much looking forward to it. My two courses, Civil Engineering Data Analysis (CE 264) for sophomores and Advanced Data Analysis (CE 1101) for seniors and above, are designed to help students understand and model the large data sets often encountered in engineering.

I like how CE 264 is designed to teach the concepts of probability and statistics to civil engineers. We start with the types of data that civil engineers often encounter: loadings on beams and columns and stress-strain data for structural engineers; flow and water quality data for water resources and environmental engineers; and ridership or traffic data for transportation engineers. We look at how to understand the data in the context of developing models for civil engineering design. We discuss the standard parametric methods available in the textbook. We also learn non-parametric techniques, which are not discussed in the book. Non-parametric methods are especially useful when identifying a probability distribution is difficult and the sample sizes are small, as with many civil engineering data sets. The homework problems, lab exercises, and the project are also designed using civil engineering data from real projects. One of the unique features of this class is its weekly computer lab time using RStudio. The labs are developed to reinforce the concepts from the lectures through actual data and simulations.
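To make the parametric versus non-parametric contrast concrete, here is a minimal sketch. It is written in Python purely for illustration (the class labs use RStudio), and the peak-flow values, the threshold, and the function names are all made up for this example. It estimates the probability of exceeding a threshold in two ways: by counting observations directly, and by fitting a normal distribution first.

```python
import statistics
from statistics import NormalDist

# Hypothetical annual peak flows (m^3/s); invented for illustration only.
peak_flows = [112.0, 95.0, 140.0, 88.0, 310.0, 101.0, 125.0, 97.0, 150.0, 105.0]


def empirical_exceedance(sample, threshold):
    """Non-parametric estimate: fraction of observations above the threshold."""
    return sum(x > threshold for x in sample) / len(sample)


def normal_exceedance(sample, threshold):
    """Parametric estimate: exceedance probability under a fitted normal."""
    fitted = NormalDist(statistics.mean(sample), statistics.stdev(sample))
    return 1.0 - fitted.cdf(threshold)


threshold = 145.0  # hypothetical design threshold
p_empirical = empirical_exceedance(peak_flows, threshold)
p_normal = normal_exceedance(peak_flows, threshold)

print(f"empirical exceedance probability: {p_empirical:.3f}")
print(f"normal-fit exceedance probability: {p_normal:.3f}")
```

With only ten observations and one large value in the sample, the two estimates can differ noticeably. The empirical estimate assumes no distribution, while the normal fit depends on whether a normal shape is reasonable for the data at all.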

In the graduate class, we cover exploratory data analysis in depth, including sampling issues, measurement error, estimation of frequency and probability distributions, resampling, and bootstrap confidence or prediction intervals of a process. We also learn dependence measures, trends in space and time, dimension reduction techniques, and frequency domain models. Finally, we build cross-validated predictive models using linear models and basis functions, as well as model-free, non-parametric nearest-neighbor methods.
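As a rough illustration of the resampling idea mentioned above, the sketch below computes a percentile bootstrap confidence interval for the median. It is an assumption-laden example, not course material: it uses Python's standard library rather than R, the strength values are invented, and `bootstrap_ci` is a hypothetical helper name.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.median, n_resamples=2000,
                 level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a statistic of the sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    alpha = (1.0 - level) / 2.0
    # Clamp the indices so they always stay inside the list.
    lo_idx = max(0, int(alpha * n_resamples))
    hi_idx = min(n_resamples - 1, int((1.0 - alpha) * n_resamples) - 1)
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical concrete compressive strengths (MPa); invented for illustration.
strengths = [31.2, 28.7, 35.1, 30.4, 29.9, 33.6, 27.8, 32.5, 30.1, 34.0]

lower, upper = bootstrap_ci(strengths)
print(f"sample median: {statistics.median(strengths):.2f} MPa")
print(f"95% bootstrap interval for the median: [{lower:.2f}, {upper:.2f}] MPa")
```

The percentile interval is only one of several bootstrap interval constructions. It is used here because it is the simplest to read, not because it is the best choice for small samples.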

I am hopeful that these courses will instill in students an interest in data analysis.

# Resolution: Data Analysis Made Easy for a Million People

I will create a platform to make data analysis easy for at least a million people.

I am not an expert in statistics by any means. I have a Civil Engineering degree with a background in water resources and hydroclimatology. Out of necessity, I picked up statistics and data analysis concepts from my mentors and several experts in the field. I use them regularly, and at a fairly advanced level, in my investigations and research. Over time, I have gained the confidence and reasonable expertise that enable me to teach data analysis to undergraduate and graduate students in an engineering school. I have also been successful in my teaching and somewhat popular among students. I feel well trained and equipped for this battle.

I clearly understand that there may not be a million people who are interested in data analysis. If my mission succeeds, I will have created that interest in more than a million people. Whether I succeed or not is up to Time. If I succeed, the world will be a better place with more analytical people. If I fail, at least I will fail spectacularly.

Over the next few blog posts, I will reveal more details about the platform.