Teaching Data Analytics from probability to prediction

Getting to teach data analysis to undergraduate and graduate students at the same time is rare. I have that opportunity this semester and am very much looking forward to it. My two courses, Civil Engineering Data Analysis (CE 264) for sophomores and Advanced Data Analysis (CE 1101) for seniors and higher, are designed to help students understand and model the big data often encountered in engineering.

I like how CE 264 was intended to teach the concepts of probability and statistics for Civil Engineers. We start with the various types of data that Civil Engineers often encounter: for instance, loadings on beams and columns and stress-strain data for structural engineers; flow data and water quality data for water resources and environmental engineers; and ridership or traffic data for transportation engineers. We look at how one can understand the data in the context of developing models for civil engineering design. We discuss the standard parametric methods available in the textbook. We also learn non-parametric techniques, which are not discussed in the book. Non-parametric methods are especially useful when identifying a probability distribution is difficult and when the sample sizes are small, as with many civil engineering data sets. The homework problems, lab exercises, and the project are also designed using civil engineering data from real projects. One of the unique features of this class is its weekly computer lab time using RStudio. The labs are developed to reinforce the concepts from the lectures through actual data and simulations.

In the graduate class, we have in-depth coverage of exploratory data analysis, including sampling issues, measurement error, estimation of frequency/probability distributions, resampling, and bootstrap confidence or prediction intervals of a process. We also learn dependence measures, trends in space and time, dimension reduction techniques, and frequency domain models. Finally, we build cross-validated predictive models using linear models and basis functions, and model-free nonparametric nearest-neighbor methods.
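As a rough illustration of the resampling idea mentioned above, here is a minimal Python sketch of a percentile bootstrap confidence interval for a sample mean. It is not taken from the course material: the function name `bootstrap_ci`, its parameters, and the flow values are all hypothetical and chosen only to show the mechanics.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for `stat` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    # Read off the alpha/2 and 1 - alpha/2 percentiles of the estimates.
    lo_idx = round((alpha / 2) * n_boot)
    hi_idx = round((1 - alpha / 2) * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up annual peak flows, for illustration only.
flows = [112.0, 98.5, 143.2, 87.9, 120.4, 101.3, 156.8, 93.1, 110.7, 132.5]
lo, hi = bootstrap_ci(flows)
print(f"sample mean = {statistics.mean(flows):.1f}")
print(f"95% bootstrap interval = ({lo:.1f}, {hi:.1f})")
```

The percentile method is only one of several bootstrap interval constructions. It is used here because it needs no distributional assumption, which is the appeal of resampling when the sample is small.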

I am hopeful that these courses will instill in students an interest in data analysis.

When, where and how

to make optimal decisions is what my students from CE 316, Civil Engineering Decision and Systems Analysis, learned during Fall 2016. They are now proficient in linear, nonlinear, integer, mixed-integer and multiobjective programming/optimization. They also know how to solve network problems like shipping goods, finding shortest paths, and maximizing flow. They can use the same network structures to make sequential decisions under uncertainty, and to optimally schedule construction jobs and complete them under budget and ahead of schedule. They know the time value of money and can tell you if a project is beneficial or not in the long run. I am happy to present a few snippets of their term projects, which were identified independently and completed successfully with minimal supervision.
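To make the time-value-of-money idea concrete, here is a small Python sketch of a net present value check. The cash flows, the discount rates, and the function name `net_present_value` are hypothetical, and this is one common way to judge whether a project pays off, not necessarily the one used in the class.

```python
def net_present_value(rate, cash_flows):
    """Discount each year's cash flow back to year 0 and add them up.

    cash_flows[0] is the amount at year 0 (usually a negative investment),
    cash_flows[t] is the amount at the end of year t.
    """
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Made-up project: invest 1000 now, receive 300 at the end of each of 4 years.
project = [-1000.0, 300.0, 300.0, 300.0, 300.0]

npv_low = net_present_value(0.05, project)   # 5% discount rate
npv_high = net_present_value(0.10, project)  # 10% discount rate

# A positive value suggests the project is worthwhile at that rate.
print(f"NPV at 5%:  {npv_low:.2f}")
print(f"NPV at 10%: {npv_high:.2f}")
```

With these numbers the same project looks beneficial at a 5% discount rate and not at 10%, which is why the choice of rate matters as much as the cash flows themselves.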

Bidding for Projects: Is your company in constant confusion about which projects to bid for? Are you worried that the projects cannot be completed on time? Soon-to-be engineers from CCNY have a solution for your problem. Based on the planned duration of any project, they can help you select an appropriate and optimal number of projects that will maximize your expected profit under various uncertainties.

Domestic and International Procurements: Do you know which source companies can best supply material of the required quality (construction or otherwise) at the least cost? Do you want to hire a third party to verify the quality of the material? Don’t worry. We have sequential-decision software to help you pick the best company to procure material from and an associated testing company for quality control.

Operating Water System: Whether you are living in New York City or in the Catskill area, you can relax, sit back and enjoy the best quality water, even during a drought. Our specialists are at work satisfying all our competing needs.

CCNY is Starving: With limited food options around the campus, have you ever wondered what to eat to stay healthy and get enough calories to complete the homework, all at a low price? You can do it for under $10 per day.

Meal Plan: Are you a high school in the city? Do you know if your daily meal plan is the best? We can give you a nutrition-optimized meal plan for high school lunches based on federal regulations and food items approved by the New York City Department of Education.

Where is that food coming from: Do you know which places are best for producing various crops under climate, water, economic and market limitations? Whether you are a farmer or a public planner making water, agriculture, and energy policy decisions, we can make your life easier by providing this knowledge in an adaptive framework.

Our secret weapon: Two other secret projects are underway for our design competitions. I will reveal the details of these weapons when we win the competition next semester.


Resolution: Data Analysis Made Easy for a Million People

I will create a platform to make data analysis easy for at least a million people.

I am not an expert in statistics by any means. I have a Civil Engineering degree with a water resources and hydroclimatology background. Out of necessity, I picked up statistics and data analysis concepts from my mentors and several experts in the field. I use them regularly, and at a fairly advanced level, in my investigations and research. Over time, I have gained the confidence and reasonable expertise that enable me to teach data analysis to undergraduate and graduate students in an engineering school. I have also been successful in my teaching and somewhat popular among students. I feel well trained and equipped for this battle.

I clearly understand that there may not be a million people who are interested in data analysis. If my mission succeeds, I will have created interest in more than a million people. Whether I succeed or not is up to Time. If I succeed, the world will be a better place with more analytical people. If I fail, at least I will fail spectacularly.

Over the next few blog posts, I will reveal more details about the platform.

‘Dam’n Floods

December 2016: San Francisco – Kary, who I shared an Uber ride with, thinks that Pineapple Express is a funny name for a storm.

November 2016: Vietnam – Ha Tinh, Quang Tri and Quang Binh provinces, which experienced rainfall in October, are hit by another wave of heavy rainfall events. Tens of thousands of people displaced.

October 2016: Argentina – Long-term flooding in Buenos Aires affects rural and farming areas. Persistent rainfall leads to an overflowing Quinto River. Agricultural emergency declared.

July 2016: China – Yangtze River overflows. Around 40,000 houses destroyed. More than 1.5 million hectares of cropland damaged.

June 2016: Texas – Heavy rain increases river levels. President declares disaster for 12 counties.

A common thread in all these events is that the floods lasted for more than 30 days and were associated with repeated rainfall over the region. These are colloquially called long duration floods. Understanding the causes of these types of floods and using that information for managing water infrastructure is a recent area of research.

Nasser Najibi is working in this field and has recently published an article in Advances in Water Resources Journal on the atmospheric teleconnections of long duration floods. Large dams along the main stem of the Missouri River Basin were selected for this investigation. For each dam, we differentiate long duration floods from short duration floods and identify which hydrological, climatological and atmospheric conditions cause the long duration floods. Nasser derived a precursor index that shows an incipient condition for long duration floods. There is an organized atmospheric structure (a spatial arrangement of high-pressure and low-pressure areas) that draws the storm tracks repeatedly into the region, causing recurrent rainfall events. These repeated waves of rainfall events fill up the dams and cause river overflows. We are now developing reservoir operation models that use this prognostic information to manage flood hazards better. More information can be found in the journal article. We welcome any comments.

Oh, and Pineapple Express is not just a 2008 comedy film or a funny name for a storm. It is also a common term for a strong and persistent flow of atmospheric moisture that causes heavy precipitation in mid-latitudes. Its discovery initiated this new field of climate-informed flood risk research.

No Rate Hike Before Elections

At 2 pm today, the Federal Reserve will come out with its statement on economic projections, and Chairwoman Janet Yellen will hold a press conference. Investors are keenly waiting for the Fed’s signal on the rate hike and borrowing costs for the near to long term. Janet Yellen will come out and speak for “hike”, but the Fed will not raise the rates now because of the proximity to the election. Any rate hike now will only lead to pricking the stock market bubble, and that will look bad. Nobody wants to see a repeat of the 2008 pre-election months. So that puts the hike off the table at least till 2017. Who knows if she will continue as the chairwoman after the election. But the reality is that the rates have to be higher. Prepare yourself.

The History of New York City Water

Arun Ravindranath has published his work on the history of New York City Water supply and the Delaware River Basin Compacts in Water Policy Journal. His work is focused on understanding water risks and how the reservoir systems perform under changing climate and political and institutional constraints. He is developing a framework to assess the dynamics of natural and human systems to inform water allocations and policy. We welcome any comments. Here is a quick summary of the work.

The Delaware River is the longest continuous river in the Eastern United States. The river basin encompasses four states, New York, New Jersey, Pennsylvania and Delaware, covers roughly 13,000 square miles, and supplies more than 15 million people with water for drinking, agriculture and industrial use. The Delaware water release policies are constrained by the dictates of two U.S. Supreme Court Decrees, 1931 and 1954, and the need for unanimity among four states and New York City. Critical stakeholder groups include New York City, a variety of environmental interests, and key water organizations from the four states. The reliance of several entities on upstream water sources has led to competing interests, conflicts, and disputes over the years. Arun, through this investigation, has explored important changes in the allocation rules, key implementation issues surrounding drinking water supply and environmental impacts on the downstream ecosystem, wildlife, and fisheries, and provided context for social value changes.

Image - courtesy of nyc.gov.

A New Demand Drought Index


Elius Etienne has published his work on droughts in the Journal of Hydrology. He led the entire project, from data collection on agriculture, climate and water use, to quality control, to developing the drought indices and validating them. The indices that he designed are an improvement over the standardized indices, which do not consider water demand. This Demand-Sensitive Drought Index can be used with aggregate demand (like all agriculture) or as a disaggregated index for a particular sector’s water demand. He also derived drought resilience and recovery estimates for the United States. In the context of the current droughts in various parts of the country, this work can aid policy experts in mapping the potential duration, severity, and recovery of a drought to proposed changes in demand, such as agricultural water use changes and domestic supply restrictions. He has developed a website for sharing these findings. The Project App provides the background, databases, tools and simulation modules for public understanding. He also created a quick five-minute audio slideshow on the demand-sensitive drought index and its utility.