If you find this useful, please like, share and subscribe to my data analysis classroom. You can also follow me on Twitter @realDevineni for updates on new lessons.

After last week’s conversation with Devine about Gamma distribution, an inspired Joe wanted to derive the probability density function of the Gamma distribution from the exponential distribution using the idea of convolution.

But first, he has to understand convolution. So he called upon Devine for his usual dialog.

J: Hello D, I can wait no longer, nor can I move on to a different topic when this idea of convolution is not clear to me. I feel anxious to know at least the basics that relate to our lesson last week.

D: It is a good anxiety to have. Will keep you focused on the mission. Where do we start?

J: We are having a form of dialog since the time we met. Why don’t you provide the underlying reasoning, and I will knit the weave from there.

Their discussion continues in Lesson 46. Devine explains the basics of convolution and Joe applies it to derive the pdf of Gamma distribution.


On the x-axis, I am showing 80 years starting with some reference. On the y-axis, I am plotting the counts or the number of extreme rainfall days each year.

Do you think there is an increasing trend in this data? Is there an overall increase in the number of extreme rainfall days?

Perhaps. Let’s add a trend line and see.

Once we superimpose a linear trend line on the plot, it becomes apparent. We still see variability from year to year, but in the long run, we see that the number of extreme rainfall days each year is increasing.

Can you guess why there is a trend in the data? In other words, can you think of some reasons for this trend?

Wait. Hold on to your reasons till I unveil how I created this data.

Look at this chart. It plots time against a constant number of extreme rainfall events, 10 in each year.

Now, look at this chart, which is a pure sine wave with a periodicity of 5 years and amplitude of 1.

Another sine wave with a periodicity of 10 years and amplitude of 1.

This one is with a periodicity of 20 years and amplitude of 1.

And, finally, take a look at this sine wave with a periodicity of 100 years and amplitude of 1.

If we add these sine waves to the constant and then add some noise to the resultant number, we get the original data.
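The construction described above can be sketched in a few lines of code (shown here in Python for illustration; the periods and amplitudes follow the text, while the noise level and random seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(80)                        # 80 years on the x-axis

constant = 10                                # baseline: 10 extreme rainfall days per year
periods = [5, 10, 20, 100]                   # periodicities in years, amplitude 1 each
waves = sum(np.sin(2 * np.pi * years / p) for p in periods)
noise = rng.normal(0, 0.5, size=years.size)  # noise level is an assumption

# The "observed" series: constant + sum of sine waves + noise
counts = constant + waves + noise
```

Plotting `counts` against `years` and fitting a line reproduces the apparent upward trend, even though every ingredient except the noise is purely periodic.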

Did you see the pattern? The trend we initially observed is due to a combination of four different periodic sine waves. Were these periodic oscillations in your reasons?

If not, why not?

Saman Armal, a Ph.D. student in the Department of Civil Engineering and NOAA CREST at the City College of New York, CUNY, working on extreme rainfall events, was also asking this question.

“We find trends in the data. What can we attribute these trends to?”

We started with anthropogenic influence, but anthropogenic forcing alone cannot explain the trend. Climate has a cyclical nature. In a particular region, its manifestation can be entirely different for a given decade or century.

For instance, if we suppose that rainfall in a given area is influenced by interannual to decadal to multidecadal climate oscillations (like the periodic sine waves we saw before), any given decade or a block of time can manifest as runs of wet or dry years.

If the region's observed records are long enough to capture these cyclicities, periods of wet years will alternate with periods of dry years, and the resulting long-term trend due to climate cycles in rainfall will be nonexistent. On the contrary, if the region has limited observed records, one can detect a long-term increasing or decreasing trend in the data depending on whether the climate manifests as wet or dry years.

The effect of natural climate variability in rainfall patterns including the impact of El Niño–Southern Oscillation (ENSO), the interdecadal Pacific oscillation (IPO), the Pacific decadal oscillation (PDO), the North Atlantic Oscillation (NAO), and the Atlantic multidecadal oscillation (AMO) is well documented. Hence, we wanted to understand the influence of anthropogenic forcing and natural climate variability on the occurrence of extreme events in an integrated framework.

This objective motivated Armal’s recent work which got published in the Journal of Climate. The paper provides a hypothesis-driven methodology to understand the association of trends in extreme rainfall event frequency to anthropogenic forcing and natural climate variability over the contiguous United States.

In our analysis, we consider two hypotheses:

The monotonic trend in the annual frequency of extreme rainfall events is solely attributed to anthropogenic forcing, and

The monotonic trend in the annual frequency of extreme rainfall events is attributed to anthropogenic forcing and cyclical climate variability.

The models get information from global near-surface temperature and climate indices, and the residual trends for each hypothesis are examined. The choice of the best alternative hypothesis is made based on the Watanabe–Akaike information criterion, a Bayesian pointwise predictive accuracy measure.

Statistically significant time trends are observed in 742 of the 1244 stations in the continental United States. Trends in 409 of these stations, predominantly found in the U.S. Southeast and Northeast climate regions, can be attributed to changes in global surface temperature anomalies. The trends in 274 of these stations, mainly found in the U.S. Northwest, West, and Southwest climate regions, can be attributed to El Niño–Southern Oscillation, the North Atlantic Oscillation, the Pacific decadal oscillation, and the Atlantic multidecadal oscillation, along with changes in global surface temperature anomalies.

Please read the paper and let us know what you think. You can get the paper from the AMS website here. If you need a copy of it, please write to me. I will be happy to share. We welcome any comments and critiques.

In lesson 45, Joe and Devine meet again, for the eighth time, to discuss Gamma distribution.

Joe summarizes how to derive the probability density function for the exponential distribution. He identifies that it is the continuous analog of the Geometric distribution.

Being a curious kid, he asks the right question.

Does the exponential distribution also have a related distribution that measures the wait time till the ‘r’th arrival?

Devine says that there is a related distribution that can be used to estimate the time to the ‘r’th arrival. It is called the Gamma distribution.

They both discuss how to derive the probability density function for the Gamma distribution using convolution.

The Gamma distribution has two control parameters: the scale parameter (lambda) and the shape parameter (r).

The Gamma distribution is frequently used to fit data with significant skewness, such as rainfall and insurance claims data.
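The convolution result can also be checked numerically: the sum of r independent exponential wait times should follow a Gamma distribution with shape r and the same rate. A quick sketch (in Python for illustration, since the lessons themselves use R; the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
lam, r = 2.0, 3                 # rate lambda and shape r (arbitrary example values)

# Sum of r independent Exponential(lambda) wait times, 100,000 replicates
sums = rng.exponential(scale=1 / lam, size=(100_000, r)).sum(axis=1)

# Compare against Gamma(shape=r, rate=lambda): mean should be r/lambda = 1.5
print(sums.mean())

# Goodness-of-fit check of the simulated sums against the Gamma CDF
print(stats.kstest(sums, stats.gamma(a=r, scale=1 / lam).cdf).pvalue)
```

The sample mean lands near r/lambda and the Kolmogorov–Smirnov test finds no evidence against the Gamma fit, which is exactly what the convolution derivation predicts.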




The function I asked you to solve last week is a Beta distribution function. It is defined on the 0 to 1 range; the Beta distribution is a bounded distribution, and the function is 0 everywhere else.

In lesson 42, we learn the family of Beta distribution and how it relates to the uniform distribution. We solve the function from last week which leads us to the basics of Beta distribution.
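The relation to the uniform distribution is easy to verify numerically: Beta(1, 1) has a flat density of 1 on the 0 to 1 range. A small check (in Python with SciPy for illustration; the lessons use R):

```python
import numpy as np
from scipy import stats

uniform_like = stats.beta(a=1, b=1)          # Beta(1, 1) is the Uniform(0, 1) distribution
print(uniform_like.pdf([0.1, 0.5, 0.9]))     # flat density of 1 inside [0, 1]
print(uniform_like.pdf([-0.5, 1.5]))         # 0 outside the bounded support

skewed = stats.beta(a=2, b=5)                # a different member of the Beta family
print(skewed.cdf(1.0) - skewed.cdf(0.0))     # all the probability lives on [0, 1]
```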

Learn more here. There is also a cool animation to make things clear.


The characteristic feature in all the discrete distributions is that the random variable X is discrete. The possible outcomes are distinct numbers, which is why we called them discrete probability distributions.

Have you asked yourself, “what if the random variable X is continuous?” What is the probability that X can take any particular value x on the real number line which has infinite possibilities?

For a continuous random variable, the number of possible outcomes is infinite, hence,

P(X = x) = 0.

For continuous random variables, the probability is defined in an interval between two values. It is computed using continuous probability distribution functions.


Today’s temperature in New York is below 30F — a cold November day.

Do you want to know what the probability of a cold November day is?

Do you want to know what the return period of such an event is?

Do you want to know how many such events happened in the last five years?

Get yourself some warm tea. Let the room heater crackle. We are diving into the rest of the discrete distributions in R. The lesson with complete code is here. Happy coding.
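To give a flavor of the questions above: if a cold November day occurs with some probability p in a given year, its return period is 1/p, and the count of such events over n years follows a Binomial distribution. A sketch with a made-up p (in Python for illustration; the lesson itself works these out in R):

```python
from scipy import stats

p = 0.2                                  # assumed probability of the event in a year (illustrative)

return_period = 1 / p                    # expected wait between events, in years
print(return_period)                     # 5.0

# Number of such events in the last five years: Binomial(n=5, p)
print(stats.binom.pmf(2, n=5, p=p))      # P(exactly 2 events in 5 years)

# Wait (in years) until the first event: Geometric(p); its mean is the return period
print(stats.geom.mean(p))                # 5.0
```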


Today’s lesson includes a journey through Bernoulli trials and Binomial distribution in R.

I use data from New York City’s parking violations. Since we are learning discrete probability distributions, the violation tickets data can serve as a neat example.
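As a taste of the Bernoulli-to-Binomial idea the lesson covers, here is a simulation with a made-up ticket probability (in Python for illustration; the lesson itself uses R and the real NYC violations data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p = 0.3                                      # assumed daily chance of a parking ticket (illustrative)
n = 30                                       # a month of independent Bernoulli trials

# One Bernoulli trial per day; the monthly total is Binomial(n, p)
days = rng.random(n) < p
print(days.sum())                            # tickets in this simulated month

# Probability of exactly 9 tickets, near the mean n*p = 9
print(stats.binom.pmf(9, n=n, p=p))
```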

We also learn how to create GIFs in R. We first save the plots as “.png” files and then combine them into a GIF using the “animation” and “magick” packages.

The lesson with complete code is here. Happy coding.