Errors in Linear regression ⚠️

Linear Regression - Day 3

How do we know if our predictive model is good or bad?

I will explain 🔽

🧵
If you missed the first two threads, you can read them here:

twitter.com/i/events/1590049300582735876?s=20

1/11
We have 2 different approaches to measure how good our model is.

1️⃣ Absolute error

2️⃣ Square error

Let's see how they work 👇

2/11
1️⃣ Absolute error

A good Linear Regression model looks for a line that is close to the points.

The absolute error is the sum of the vertical distances between the data points and the line, i.e. the gaps between the actual and the predicted values.

So to find the best line we need to minimize the absolute error.

3/11
This method is called absolute error because we are calculating with absolute values.

Why?

When a point is below the line, the distance (actual value minus predicted value) comes out negative, but we want a positive number, so we take its absolute value.
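
A tiny Python sketch of why the sign matters. The distances below are made-up numbers, not data from this thread: without absolute values, points above and below the line cancel out and a poor fit can look perfect.

```python
# Hypothetical signed distances (actual minus predicted) for four points.
# Positive = point above the line, negative = point below the line.
signed_distances = [2.0, -2.0, 1.5, -1.5]

# Summing the raw distances lets the errors cancel each other out.
plain_sum = sum(signed_distances)

# Taking absolute values first keeps every error positive.
absolute_sum = sum(abs(d) for d in signed_distances)

print(plain_sum)     # 0.0
print(absolute_sum)  # 7.0
```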

4/11
Perform these steps to calculate the absolute error:

1. Measure the distances between the points and the line

2. Take absolute values

3. Sum the distances
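
The three steps above can be sketched in Python like this. The data points and the line (`slope`, `intercept`) are invented for illustration, and `absolute_error` is just a name I picked for the helper.

```python
# Hypothetical data points and a candidate line y = slope * x + intercept.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 3.0, 8.0, 9.0]
slope, intercept = 2.0, 1.0


def absolute_error(xs, ys, slope, intercept):
    total = 0.0
    for x, y in zip(xs, ys):
        # 1. Measure the distance between the point and the line.
        distance = y - (slope * x + intercept)
        # 2. Take the absolute value, 3. add it to the running sum.
        total += abs(distance)
    return total


print(absolute_error(xs, ys, slope, intercept))  # 3.0
```

Trying other values for `slope` and `intercept` and keeping the pair with the smallest result is the "minimize the absolute error" idea in its simplest form.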

5/11
2️⃣ Square error

This method takes a similar approach, but here we use the squares of the distances.

Why?

Because if you square a negative number, it becomes positive.

Again we try to minimize the error for the best results.

6/11
Perform these steps to calculate the square error:

1. Measure the distances between the points and the line

2. Square the distances

3. Sum the squared distances
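
Here is the same kind of Python sketch for the square error. Again, the data and the line are made-up values and `square_error` is a hypothetical helper name.

```python
# Hypothetical data points and a candidate line y = slope * x + intercept.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 3.0, 8.0, 9.0]
slope, intercept = 2.0, 1.0


def square_error(xs, ys, slope, intercept):
    total = 0.0
    for x, y in zip(xs, ys):
        # 1. Measure the distance between the point and the line.
        distance = y - (slope * x + intercept)
        # 2. Square it (always non-negative), 3. add it to the running sum.
        total += distance ** 2
    return total


print(square_error(xs, ys, slope, intercept))  # 5.0
```

On this data the distances are 0, -2, 1 and 0, so the single largest miss contributes 4 of the 5: squaring weighs big misses more heavily than taking absolute values does.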

7/11
🚨 But we have an issue.

Let's consider this example:

We want to compare two models.

A with 100 data points & B with 1 million data points.

The sums of both the absolute and the square errors will probably be larger for B, simply because it has more points, not because it is a worse model.

8/11
To avoid this issue we use the mean absolute error and mean square error.

These are calculated by averaging the absolute (or squared) distances instead of summing them, so we can compare models across datasets of different sizes.
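
A small Python sketch of the difference between sums and means. The distances are invented, and the "large" dataset is just the small one repeated 1,000 times so that both models are equally good per point.

```python
def mean_absolute_error(distances):
    # Average of the absolute distances.
    return sum(abs(d) for d in distances) / len(distances)


def mean_square_error(distances):
    # Average of the squared distances.
    return sum(d ** 2 for d in distances) / len(distances)


# Hypothetical distances (actual minus predicted).
small = [0.0, -2.0, 1.0, 0.0]   # 4 points
large = small * 1000            # 4,000 points with the same error pattern

# The sums grow with the number of points...
print(sum(abs(d) for d in small))  # 3.0
print(sum(abs(d) for d in large))  # 3000.0

# ...but the means stay comparable.
print(mean_absolute_error(small), mean_absolute_error(large))  # 0.75 0.75
print(mean_square_error(small), mean_square_error(large))      # 1.25 1.25
```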

9/11
🚨 Another issue alert!

In our example, we predicted house prices.

If prices are in $, the squared error is measured in squared $, which is hard to interpret.

For this reason, we have the root mean square error (RMSE).

10/11
RMSE takes the square root of the mean square error, so its units match the units of the prices.

If the RMSE is $100, that means that our model is typically off by around $100 per prediction.
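
A minimal Python sketch of RMSE. The house prices and predictions are made up so that every prediction misses by exactly $100.

```python
import math

# Hypothetical actual and predicted house prices, in $.
actual = [200_000.0, 310_000.0, 150_000.0, 420_000.0]
predicted = [200_100.0, 309_900.0, 150_100.0, 419_900.0]

# Mean square error: average of the squared distances (unit: squared $).
squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
mse = sum(squared) / len(squared)

# Root mean square error: back to plain $.
rmse = math.sqrt(mse)

print(mse)   # 10000.0
print(rmse)  # 100.0
```

Here the mean square error is 10,000 squared $, which says little on its own, while the RMSE of $100 reads directly as the typical size of a miss.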

11/11
That's it for today.

I hope you've found this thread helpful.

Like/Retweet the first tweet below for support and follow @levikul09 for more.

Tomorrow I will show you how to run Linear Regression in Excel!

You don't want to miss that 😉
