Humans like regularity. Our brains drive us to seek out that which is predictable. If we could have our way, then everything would fall into neat categories. Unfortunately, the universe does not work this way. Nature is chaotic and scattered. Scientists must constantly work with randomness and try to extract the order. In our desire for consistency, humans often find patterns that are not there.
During the Scientific Revolution, many people studying nature became aware of its inherent uncertainty. This led to the development of probability theory and eventually modern statistics. However, uncertainty is almost always harder to deal with than we think. Despite making considerable progress in dealing with the unknown, we still tend to gravitate toward methods that provide clean and straightforward solutions.
In this article, I am going to talk about the most common technique for dealing with randomness: the normal distribution. While the normal distribution certainly has its uses, it has become far more widespread than its accuracy warrants. I am also going to give multiple examples of where this method goes wrong and provide some alternatives.
The Normal Distribution

If you ever took a statistics course, then you've certainly heard of the normal distribution. It is the ideal probability distribution for beginners since it can be described so simply. You only need to know the mean (μ) and the standard deviation (σ). The mean tells us the average measurement, which is also the peak of the curve, and the standard deviation tells us how "spread out" the data are.
If you've never seen a probability distribution before, it can be a little confusing. The x-axis just tells us the thing being measured, like human height, and the y-axis gives the relative likelihood (the probability density) of each value. The total area under the curve, otherwise known as the integral, must equal 1 to represent the total range of probabilities.
We can apply it to the height distribution of people, which is known to follow the normal distribution fairly well. The mean height of men in the United States is 70 inches with a standard deviation of 3 inches. If I wanted to know the percentage of men who are 74 inches tall, then I would integrate under this curve over a small range on either side of 74 (say, 73.5 to 74.5 inches).

To get this value, I need to integrate the region shown in red. Luckily, this is not too difficult. The normal distribution has a precise equation that was first described by Carl Friedrich Gauss in the early 1800s (shown below). It looks somewhat complicated, but it is straightforward to integrate numerically.

f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²))
Sometimes standard deviation is replaced with variance (σ²) like in the example plots shown above. It is a fascinating mathematical fact that the number π shows up in this equation!
Using the statistics above, and the fact that height follows a normal distribution, I calculated that around 5% of men in the United States are 74 inches tall. The ease of making these kinds of estimates is what makes the normal distribution so appealing. It is not difficult for me to compare different percentages and integrate over a range of measurements. Also appealing is that the distribution requires only two quantities to define it.
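If you want to check that figure yourself, here is a minimal sketch in Python (assuming SciPy is available, and treating "74 inches tall" as anything between 73.5 and 74.5 inches):

```python
# Check the ~5% estimate, assuming heights follow Normal(mu=70, sigma=3)
from scipy.stats import norm

height_dist = norm(loc=70, scale=3)   # mean 70 inches, standard deviation 3 inches

# Area under the curve between 73.5 and 74.5 inches
p_74 = height_dist.cdf(74.5) - height_dist.cdf(73.5)
print(f"Fraction of men around 74 inches tall: {p_74:.3f}")  # ~0.055, about 5%
```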
The normal distribution is a great way to represent a variety of natural phenomena. Just like height, the sizes of many different species also follow the normal distribution. Sums of idealized random number generators, like the total of many dice rolls, also follow this pattern. In physics, the diffusion of particles is approximately normal. We usually assume that uncertainty due to measurement error follows the normal distribution. While it can be applied to a variety of things in nature, there is one more application of the normal distribution which has contributed to its success.
The Central Limit Theorem
This is perhaps one of the most astonishing theorems in mathematics. It has to do with a bunch of averages taken from the same dataset. Say you take multiple sets of measurements (from the same population) and calculate their averages to get μ₁, μ₂, μ₃, …, the mean of each set. Then, these averages will form a normal distribution! This will happen regardless of the variable you are measuring, even if it is not itself normally distributed!

This theorem is part of the reason why the normal distribution has become so popular. If you take a variety of samples from a dataset and then perform statistics on their averages, you can assume the normal distribution. Of course, there are some big caveats. Depending on the data, it can take many sets of measurements before the distribution of means begins to look normal. A common benchmark in statistics is that each sample should contain at least 30 measurements. The samples must also be independent, meaning they are not affected by each other. This is a difficult criterion to meet in some fields of science.
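To see the theorem in action, here is a small simulation (a sketch using NumPy and SciPy): the raw measurements come from an exponential distribution, which is heavily skewed, yet the means of repeated samples of size 30 are already close to symmetric and bell-shaped.

```python
# Central limit theorem demo: skewed raw data, nearly normal sample means
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
n_samples, sample_size = 5000, 30

# 5000 independent samples of 30 skewed (exponential) measurements each
data = rng.exponential(scale=1.0, size=(n_samples, sample_size))
sample_means = data.mean(axis=1)          # one average per sample

print("skewness of the raw data:    ", round(skew(data.ravel()), 2))   # ~2.0, very lopsided
print("skewness of the sample means:", round(skew(sample_means), 2))   # ~0.4, nearly symmetric
```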
The independence requirement is a big one. The Central Limit Theorem also has a philosophical interpretation: in a system where many small, uncorrelated effects add up, the resulting range of values is normally distributed. There are some systems with this property, like an idealized box of particles bumping into each other or many different genes contributing to one output quantity (like height).
The central limit theorem is clearly a very powerful statement about probability and randomness. Its discovery catapulted the normal distribution into the mainstream, where it has remained ever since. Unfortunately, the assumptions required for the central limit theorem to apply are rarely met in the field. Many scientists also forget that it applies only to the averages of multiple samples, and they simply assume that their data are normal. Again, the simplicity and ubiquity of the normal distribution is hard to resist.
The Rest of Everything
You may have noticed that my earlier list of applications of the normal distribution was rather short. This is because most observed quantities simply do not follow this pattern! Indeed, the vast majority of measurements follow one of the many other varieties of probability distributions.
I stated that the normal distribution arises in situations where many small independent factors determine an outcome. In our complex and interconnected world, this is an extremely rare occurrence. Reality is typically shaped by a massive number of interactions that are not independent and have nonlinear effects. Many quantities observed in both the natural world and human activity show the outsized impact of just a few outliers.
The most common way to describe these types of distributions is with something called a "power law." This distribution was first brought to the mainstream by Vilfredo Pareto, an Italian economist working around the turn of the 20th century. He observed that 20% of people in Italy owned 80% of the land, a massive imbalance in ownership! This has been dubbed the 80–20 rule and is a good heuristic for thinking about power law distributions.

As you can see, this structure is vastly different from the normal distribution. Power laws seem to be a trademark of complex, interconnected systems. Unlike the world of normal distributions, in which all effects are independent and small, power laws live in the world of interaction. As humanity's communication and economics become more entwined, the importance of the power law is only going to increase.
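Pareto's 80–20 observation is easy to reproduce with a toy simulation (a sketch, not real wealth data; the shape parameter of roughly 1.16 is the value that, in theory, gives the top 20% about 80% of the total):

```python
# Toy 80-20 simulation: draw "wealth" from a Pareto distribution and measure
# what share of the total the richest 20% hold.
import numpy as np

rng = np.random.default_rng(1)
alpha = 1.16                                      # smaller alpha = heavier tail
wealth = rng.pareto(alpha, size=100_000) + 1      # classical Pareto, minimum value 1

wealth.sort()
top_20 = wealth[int(0.8 * len(wealth)):]          # richest 20% of the population
print(f"Share held by the top 20%: {top_20.sum() / wealth.sum():.0%}")  # roughly 80%
```

Because the tail is so heavy, the exact share bounces around noticeably from run to run, which is itself a small lesson about power laws.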
The equation for a power law is implied by its name: the probability of x is given by x raised to a negative power, −k. There is usually also a multiplier a in front to set the proper scale.

p(x) = a · x^(−k)
It is astonishing how many things follow a power law distribution. The relationship between animal metabolism and size, called Kleiber's Law, follows one. Cyclone sizes follow a power law. Both the sizes of cities and incomes are distributed according to a power law. Power laws are all over physics too; the Stefan–Boltzmann law, for example, says that the energy radiated by an object grows as the fourth power of its temperature.
The distinctive feature of power laws is that they are "heavy-tailed," meaning outliers are far more common than in other distributions (although still rare). In a normal distribution, events more than 5 standard deviations away from the mean are so rare as to be practically impossible. Not so in power law distributions, where such extreme values still turn up regularly. Only a distribution with this property could explain the presence of billionaires in the modern economy.
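Here is a quick comparison of how differently the two worlds treat a "5 sigma" event (a sketch using SciPy; the Pareto shape of 2.5 is an arbitrary choice that keeps the mean and variance finite):

```python
# How likely is a value more than 5 standard deviations above the mean?
from scipy.stats import norm, pareto

print("Normal:", norm.sf(5))                 # ~2.9e-7, essentially never happens

heavy = pareto(2.5)                          # a heavy-tailed distribution (minimum value 1)
mu, sigma = heavy.mean(), heavy.std()
print("Pareto:", heavy.sf(mu + 5 * sigma))   # ~4e-3, thousands of times more likely
```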
In the world of power laws, defining an "average" is difficult and doesn't always tell you much. The average salary of five employees making minimum wage and Bill Gates is not a very interesting value. It is also often difficult to determine the exponent k for a given distribution, because fitting data with an exponent is hard! We have to move past the simplicity of the normal distribution to describe the real world more accurately.
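If the data really do follow a clean power law above some cutoff x_min, the exponent can be estimated by maximum likelihood rather than by fitting a straight line on a log–log plot (which tends to be biased). A minimal sketch, using synthetic data so the true answer is known:

```python
# Estimate the exponent k of a power law p(x) ~ x^(-k) for x >= x_min
import numpy as np

rng = np.random.default_rng(2)
true_k, x_min, n = 2.5, 1.0, 50_000

# Generate synthetic power-law data by inverse transform sampling
u = rng.uniform(size=n)
x = x_min * (1 - u) ** (-1 / (true_k - 1))

# Maximum likelihood estimate: k_hat = 1 + n / sum(ln(x_i / x_min))
k_hat = 1 + n / np.sum(np.log(x / x_min))
print(f"true k = {true_k}, estimated k = {k_hat:.3f}")
```

Real data are messier: the cutoff x_min is usually unknown and has to be estimated too, which is a large part of why fitting power laws is hard.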
Just like the normal distribution, there are caveats here. Most of the phenomena described above only follow a power law within a range of values, and they often break down at very large and small values due to physical constraints. There are also many distributions that are similar to a power law, but it can be hard to distinguish between them.
More recently, some research has been published suggesting that the importance of power laws is overstated. Some quantities instead follow the log-normal distribution, shown below.

This structure can be thought of as a mix between the power law and the normal distribution. It shows up in many interesting areas such as the length of internet comments, time taken in a chess game, the concentration of rare elements in minerals, and the number of citations on a paper.
The log-normal distribution has a similar philosophy to that of the normal distribution, with a few caveats. It usually comes about when many independent, positive values are multiplied together rather than added. The typical size and spread of those factors (more precisely, of their logarithms) determine the shape of the curve. The story only gets more complicated from here!
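A small simulation shows where this multiplicative story comes from (a sketch in which each value is the product of 50 small, independent, positive factors):

```python
# Multiply many small positive factors together and the result is log-normal:
# the logarithm of the product is a sum, which the central limit theorem makes normal.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(3)
factors = rng.uniform(0.9, 1.1, size=(10_000, 50))   # 50 small positive multipliers per value
products = factors.prod(axis=1)

print("skewness of the products:       ", round(skew(products), 2))          # clearly skewed right
print("skewness of log of the products:", round(skew(np.log(products)), 2))  # ~0, bell-shaped
```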
The Wild West of Probability Distributions
As you can probably guess, there are many more distributions to choose from. The Wikipedia page lists nearly 100 different types, and each distribution has multiple subtypes. So how do you know which one you are supposed to use?

The best way to begin this process is to first write down some properties of your data. Is it discrete or continuous? Is it skewed one way or the other? Is there a natural limit (like 0) to your data?
Then, go through a list of possible distributions and identify which ones match the data you are working with. I've provided some links at the end of this article which are easier to look through than the Wikipedia page described earlier. Some of them will visually resemble your data as well. However, even if they look similar, you aren't done yet!
Like many things in statistics, you next need to perform a test. Perhaps the most common way to test whether a distribution represents your data is the Kolmogorov–Smirnov test. It measures the largest gap between your data and the candidate distribution and returns a p-value telling you how plausible it is that the data actually came from that distribution. To make matters more confusing, there are multiple variants of this test that apply to different situations.
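Here is a sketch of what that looks like in practice (using SciPy; the data are synthetic and secretly log-normal, so the normal fit should be rejected while the log-normal fit should not). Fitting the parameters from the same data technically calls for a corrected variant of the test, so treat the p-values as a rough guide:

```python
# Fit two candidate distributions and compare them to the data with a KS test
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
data = rng.lognormal(mean=0.0, sigma=0.5, size=1000)   # synthetic, secretly log-normal

for label, dist in [("normal", stats.norm), ("log-normal", stats.lognorm)]:
    params = dist.fit(data)                            # fit the candidate's parameters
    statistic, p_value = stats.kstest(data, dist.name, args=params)
    print(f"{label:>10}: KS statistic = {statistic:.3f}, p-value = {p_value:.3g}")
```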
You can see how complicated this is becoming. No wonder so many people just default to using the normal distribution! However, it is so important that we move beyond this one description of the world around us. Things are so much more complicated than that. While it may seem daunting to work through all the additional steps, you will be rewarded with a more accurate and interesting description of your data.
Going Further

I hope you learned something about probability and the world we live in. While the normal distribution certainly has its uses, it often obscures the interconnected and nonlinear nature of that world. It is important for us to question the statistical tools we use and how they can impact our assumptions about reality. If you are interested in going further, I have provided a few resources to learn more about the topics described in this article.
- The book that first got me interested in this topic is The Black Swan by Nassim Taleb. It's a fascinating book about the dangers of assuming a normal distribution, especially in the world of finance. Taleb's style can come off as unnecessarily arrogant and blunt, but if you can get past that, this book is full of good information.
- If you want a comprehensive overview of why the normal distribution is dangerous from a philosophical and mathematical perspective, I highly recommend this article called "Why Are Normal Distributions Normal?" It can be technical at times, but it is well worth the read.
- My favorite way to look through different probability distributions is this map of Univariate Distribution Relationships. It provides a map of 79 different distributions and lists the properties of each one. You can easily sort by different properties (you may need to look up some of their names).
- If you want more information about choosing the correct probability distribution, I really like this book chapter about it. It provides the equations and properties of many distributions, although, unfortunately, it lacks graphs! This alternative website has some good information with graphs as well.
- This website has a good, detailed guide to the Kolmogorov–Smirnov test. The other pages there are also very helpful when doing statistics! Further links at the bottom of that page describe other types of goodness-of-fit tests and when to use them. If you want a more casual description, this Stack Exchange response is fantastic!
If you liked this article, then consider clapping for it! You may also consider following me for more stories like this! I publish weekly about math and science.