Demystifying 3 Mathematical Laws Every Data Science Professional Needs to Know

Implementing mathematical ideas from scratch is an ideal way to understand how they work. Mathematics is the core foundation for starting a career in data science.

How much mathematics you need depends on the role you’ve chosen in the data science field. However, every data science professional needs an in-depth understanding of statistics and probability theory.

Your next question might be: what about the other branches of mathematics, don’t we need them?

The short answer is that it depends on how much machine learning research you’ll be involved in, so there is no single answer. The data science field comprises multiple job roles, and each role has its own set of mathematics requirements.

For instance, if your role is inclined toward developing ETL pipelines or building data infrastructure, then you might not need math at all. However, if the role is more inclined toward implementing machine learning and deep learning techniques, you should master mathematical concepts such as vector calculus, linear algebra, probability theory, and more.

Moving further, this post will talk about three major mathematical laws every data scientist must know. Let’s dive right into it.

  1. The Law of Large Numbers (LLN)

The law of large numbers (LLN) is a major theorem in probability theory.

Here’s a simple definition:

As the number of trials of a random process increases, the average of the results is likely to get closer to the expected value.

Example 1:

A classic example is a random variable that follows a Bernoulli distribution (source: Kelly).

For instance, suppose we flip a fair coin and want to track how often it lands on heads. For each flip, we can define the random variable X by

X = 1, if the coin lands on heads
X = 0, if the coin lands on tails

The expected value of X is 0.5, so by the LLN the proportion of heads gets closer to 0.5 as the number of flips grows.
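The coin example can be tried out with a short simulation. This is only a minimal sketch: the function name `coin_flip_mean`, the fixed seed, and the chosen sample sizes are assumptions made for illustration and are not part of the original example.

```python
import random


def coin_flip_mean(n_flips, seed=0):
    """Average of n_flips draws of X (1 = heads, 0 = tails) for a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    flips = [1 if rng.random() < 0.5 else 0 for _ in range(n_flips)]
    return sum(flips) / n_flips


# The proportion of heads should drift toward 0.5 as n grows.
for n in (10, 100, 10_000, 100_000):
    print(n, coin_flip_mean(n))
```

With only a handful of flips the average can sit far from 0.5, while the larger runs should land close to it.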

Example 2:

When rolling a six-sided die, the possible outcomes are 1, 2, 3, 4, 5, and 6, each with probability 1/6. The expected value of a roll is therefore (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5. Any single roll gives a whole number between 1 and 6, never 3.5 itself. However, as we keep rolling the die, the average of the results gets closer to the expected value of 3.5. This is what the Law of Large Numbers indicates.

Having an in-depth understanding of the LLN is important for a data science specialist.
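The die example can be checked in two steps: compute the expected value exactly, then compare it with the average of simulated rolls. The sketch below assumes a fair die. The names `expected_value` and `sample_mean`, the seed, and the roll counts are illustrative choices.

```python
import random
from fractions import Fraction

FACES = range(1, 7)

# Exact expected value: each face has probability 1/6, so E = 21/6 = 7/2.
expected_value = sum(Fraction(face, 6) for face in FACES)


def sample_mean(n_rolls, seed=42):
    """Average result of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(rng.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls


print("expected value:", float(expected_value))
for n in (1, 10, 1_000, 100_000):
    print(f"{n:>7} rolls -> sample mean {sample_mean(n):.3f}")
```

A single roll can never equal 3.5, but the sample mean over many rolls should settle near it.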

  2. Benford’s Law or Law of the First Digit

Benford’s law is also referred to as the first-digit law, the Newcomb–Benford law, and the law of anomalous numbers. It describes how the first digits of numbers are distributed in real-world datasets.

Simply put, the first digits of the numbers in many collections of records (drawn from widely varied sources) are not uniformly distributed. Instead, the digit “1” is the most frequent, followed by 2, 3, and so on, with the frequency decreasing down to 9.

Benford’s law can be applied in multiple scenarios and to a wide variety of data sets such as stock prices, street addresses, electricity bills, death rates, lengths of rivers, and population numbers.
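The decreasing pattern can be made concrete. In its usual statement (not spelled out above), Benford’s law gives the probability of a leading digit d as log10(1 + 1/d), which is about 30.1% for the digit 1 and about 4.6% for the digit 9. The sketch below compares those values with the leading digits of the first 1,000 powers of 2. That dataset is an assumption chosen only because it is easy to generate. The function names are hypothetical.

```python
import math
from collections import Counter


def benford_probability(digit):
    """Probability that the leading digit equals `digit` (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)


def leading_digit_frequencies(values):
    """Observed share of each leading digit among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Illustrative dataset (an assumption, not from the article): 2^1 .. 2^1000.
powers_of_two = [2 ** n for n in range(1, 1001)]
observed = leading_digit_frequencies(powers_of_two)

for d in range(1, 10):
    print(d, f"expected {benford_probability(d):.3f}", f"observed {observed[d]:.3f}")
```

The same `leading_digit_frequencies` helper could be pointed at any column of positive integers, such as invoice amounts, to see how closely it tracks the expected shares.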

  3. Zipf’s Law

Zipf’s law is named after the linguist George Kingsley Zipf. He intended to derive a relationship between word frequencies in document collections.

For example, suppose the words in a document collection are ordered by frequency, and let y denote the number of times the x-th word in that order appears. Zipf’s observation can then be captured by a power-law formula, i.e. $y = c x^{-1/2}$, where c is a constant.

In other words, item frequency falls as item rank rises. In the most commonly quoted form of the law the exponent is -1, so the item frequency is inversely proportional to the item rank.
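To see the rank and frequency relationship on actual text, the words can be counted, ranked, and set beside the value a power law predicts. This is a rough sketch under stated assumptions: the toy sentence, the function names `rank_frequency` and `power_law`, and the choice to calibrate c on the top-ranked word are all illustrative. The exponent is left as a parameter, where 1.0 corresponds to frequency inversely proportional to rank and 0.5 to the exponent in the formula quoted above.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    ordered = Counter(words).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]


def power_law(rank, c, exponent):
    """Predicted frequency y = c * rank**(-exponent)."""
    return c * rank ** (-exponent)


# Toy text (an assumption); a real test needs a large document collection.
sample = "the cat and the dog and the bird saw the fish"
table = rank_frequency(sample)
c = table[0][2]  # calibrate c so the prediction matches the top-ranked word

for rank, word, count in table:
    print(rank, word, count, round(power_law(rank, c, exponent=1.0), 2))
```

A sentence this short cannot confirm or refute the law. The point is only to show how the ranking is built and how a predicted curve would be compared with the observed counts.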

An aspiring data science specialist needs to understand the fact that they will need some amount of mathematics in their learning journey.

This applies especially to professionals coming from backgrounds such as hardware engineering, medicine, healthcare, and business management, who need to build significant knowledge of mathematics. Numerical calculations and spreadsheet experience will not suffice to get into the data science domain.

This is one major reason why we see professionals upgrading their skills by obtaining data science certifications from credible certification bodies available online.

Grasping the fundamentals of these mathematical laws is the bedrock for getting into data science. Building a strong foundation in mathematics helps you:

  • Debug an existing algorithmic approach
  • Create newer machine learning solutions to solve complex problems that are domain-specific
  • Better understand machine learning models to further build efficient learning systems

Many mathematical laws exist; however, gaining a solid understanding of the three covered above can be a valuable tool for your data science career.
