How to pick the best training & testing points?

Thread on Cross Validation.

🧵
We cannot use all of the data for model training. Nothing would be left to test the model on, so we could not tell whether it overfits.

We can of course split the data randomly into training and testing points, but there is a better option:

Cross Validation.
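The random selection mentioned above is usually called a hold-out split. Here is a minimal sketch in plain Python. `holdout_split` is a hypothetical helper, and the 20% test fraction and the fixed seed are arbitrary choices for illustration:

```python
import random

def holdout_split(data, test_fraction=0.2, seed=0):
    """Randomly split data into a training part and a testing part (hypothetical helper)."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is reproducible
    n_test = int(len(data) * test_fraction)
    test = [data[i] for i in indices[:n_test]]   # first shuffled indices -> testing data
    train = [data[i] for i in indices[n_test:]]  # the rest -> training data
    return train, test

data = list(range(10))
train, test = holdout_split(data)
print(len(train), len(test))  # 8 2
```

With a single split like this, the error estimate depends on which points happened to land in the testing set.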

1/5
The steps of Cross Validation:

1. Divide the data into groups (folds) of roughly equal size.

2. Iterate through the groups. In each iteration:

- Use all groups but one as training data.

- Use the remaining group as testing data.
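These steps can be sketched in plain Python. `k_fold_indices` is a hypothetical helper that works on sample indices. For simplicity it builds contiguous groups without shuffling, although in practice the data is often shuffled first. scikit-learn's `KFold` class provides a ready-made version of this split.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    The samples are divided into k contiguous groups of (almost) equal size.
    In each iteration one group is the testing data and the others form the training data.
    """
    # Group sizes: the first n_samples % k groups get one extra sample.
    base, extra = divmod(n_samples, k)
    sizes = [base + 1 if i < extra else base for i in range(k)]

    # Step 1: divide the indices into groups.
    groups, start = [], 0
    for size in sizes:
        groups.append(list(range(start, start + size)))
        start += size

    # Step 2: iterate through the groups, holding one out as testing data each time.
    for i, test in enumerate(groups):
        train = [idx for j, g in enumerate(groups) if j != i for idx in g]
        yield train, test

for train, test in k_fold_indices(6, 3):
    print("train:", train, "test:", test)
# train: [2, 3, 4, 5] test: [0, 1]
# train: [0, 1, 4, 5] test: [2, 3]
# train: [0, 1, 2, 3] test: [4, 5]
```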

Let's see an example!

2/5
With 3 groups, the first iteration can be:

- Group 1 & 2 as training data

- Group 3 as testing data

Of course, every iteration trains on different data, so every iteration results in a different model.

In this case we will have 3 models, each with a different testing set.
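The three-iteration example can be sketched in plain Python. The toy data and the model are hypothetical. The model is a straight line fitted by least squares and scored by mean squared error, chosen only to keep the sketch self-contained. The loop holds out group 1 first, then group 2, then group 3. The iteration from the thread (Group 3 as testing data) is the last one, and the order does not matter.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mse(model, xs, ys):
    """Mean squared prediction error of a (slope, intercept) model."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical toy data, already divided into 3 groups of 3 points (x values, y values).
groups = [
    ([1, 2, 3], [2.1, 3.9, 6.2]),
    ([4, 5, 6], [8.1, 9.8, 12.2]),
    ([7, 8, 9], [13.9, 16.1, 18.0]),
]

models, errors = [], []
for test_idx in range(len(groups)):
    # Every group except the testing group becomes training data.
    train_x = [x for i, (xs, _) in enumerate(groups) if i != test_idx for x in xs]
    train_y = [y for i, (_, ys) in enumerate(groups) if i != test_idx for y in ys]
    test_x, test_y = groups[test_idx]

    model = fit_line(train_x, train_y)  # a different model in each iteration
    models.append(model)
    errors.append(mse(model, test_x, test_y))  # scored on its own testing group

for i, (model, err) in enumerate(zip(models, errors), start=1):
    print(f"model {i}: slope={model[0]:.2f}, intercept={model[1]:.2f}, test MSE={err:.3f}")
```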

Why is it good?

3/5
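Each model is tested on data it did not see during training, and every data point ends up in a testing set exactly once.

The prediction errors will differ from model to model, so we average them. This gives a more reliable estimate of performance than a single random split.

With Cross Validation you can compare different models this way and select the best performing one.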
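A common way to use the fold results is to average the testing errors over all folds and compare that average across candidate models. Here is a minimal sketch with hypothetical choices throughout: two candidates (a constant mean predictor and a least-squares line), a small made-up data set, and 3 contiguous folds without shuffling.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def cross_val_error(fit, xs, ys, k):
    """Average testing MSE over k folds (contiguous groups, no shuffling)."""
    n = len(xs)
    fold_errors = []
    for i in range(k):
        # Boundaries of the i-th group; i * n // k spreads samples as evenly as possible.
        start, stop = i * n // k, (i + 1) * n // k
        test = range(start, stop)
        train = [j for j in range(n) if not (start <= j < stop)]
        predict = fit([xs[j] for j in train], [ys[j] for j in train])
        fold_errors.append(sum((predict(xs[j]) - ys[j]) ** 2 for j in test) / len(test))
    return sum(fold_errors) / k

# Hypothetical toy data that roughly follows y = 2x.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0]

scores = {name: cross_val_error(fit, xs, ys, k=3)
          for name, fit in [("mean", fit_mean), ("line", fit_line)]}
best = min(scores, key=scores.get)  # candidate with the lowest average error
print(scores, "best:", best)
```

On this toy data the line has the lower average error, so `best` is `"line"`. Once a winner is chosen, it is common to retrain it on all of the data before using it.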

4/5
That's it for today.

I hope you've found this thread helpful.

Like/Retweet the first tweet below for support and follow @levikul09 for more Data Science threads.

Thanks 😉

5/5
