How to pick the best training & testing points?

Thread on Cross Validation.

We cannot use all the data for model training, because then nothing is left to test on and overfitting would go unnoticed.

We can of course split off a random testing set once, but there is a better option:

Cross Validation.

The steps of Cross Validation:

1. Divides the data into groups.

2. Iterates through the groups.

- Uses all groups but one as training data.

- Uses the remaining group as testing data.
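
The two steps above can be sketched in plain Python. This is only an illustration: the function name k_fold_indices, the shuffling and the seed are assumptions, not something the thread specifies.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per group."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")

    indices = list(range(n_samples))
    # Assumption: shuffle first so that group membership is random.
    random.Random(seed).shuffle(indices)

    # Step 1: divide the data into k groups of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    # Step 2: iterate through the groups, holding one out each time.
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=9, k=3))
for train, test in splits:
    print("train:", sorted(train), "test:", sorted(test))
```

Every data point lands in a testing group exactly once and is never in the training and testing data of the same iteration.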

Let's see an example!

The first iteration can be:

- Group 1 & 2 as training data

- Group 3 as testing data

Of course every iteration will result in a different model.

In this case we will have 3 models.

Each with a different testing set.
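
A small sketch of the 3 iterations, at the level of group names only. The order of the later iterations is an assumption; the thread only gives the first one.

```python
groups = ["Group 1", "Group 2", "Group 3"]

rounds = []
# Hold out one group per iteration, starting with Group 3 as in the example.
for test_group in reversed(groups):
    train_groups = [g for g in groups if g != test_group]
    rounds.append((train_groups, test_group))

for number, (train_groups, test_group) in enumerate(rounds, start=1):
    print(f"Iteration {number}: train on {' & '.join(train_groups)}, test on {test_group}")

# Iteration 1: train on Group 1 & Group 2, test on Group 3
# Iteration 2: train on Group 1 & Group 3, test on Group 2
# Iteration 3: train on Group 2 & Group 3, test on Group 1
```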

Why is it good?

Different results mean that we can compare them.

Using different testing datasets, the prediction errors will differ for each model.

With Cross Validation you can average these errors for a fairer performance estimate, and compare candidate models to select the best performing one.
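
A minimal sketch of model selection with Cross Validation, under stated assumptions: the data is synthetic, and the two candidates (fit_mean and fit_line) are hypothetical stand-ins for whatever models you want to compare. Each candidate gets one error per group, and the candidate with the lowest average error is picked.

```python
import random


def fit_mean(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate 2: least-squares straight line through the training data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def cv_errors(fit, xs, ys, k=3):
    """Return one mean squared error per group (interleaved groups, no shuffling)."""
    folds = [list(range(i, len(xs), k)) for i in range(k)]
    errors = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((model(xs[i]) - ys[i]) ** 2 for i in test) / len(test)
        errors.append(mse)
    return errors


# Synthetic data (an assumption for illustration): a noisy straight line.
rng = random.Random(0)
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line)]:
    errors = cv_errors(fit, xs, ys)
    results[name] = sum(errors) / len(errors)
    print(name, "errors per group:", errors, "average:", results[name])

best = min(results, key=results.get)
print("lowest average error:", best)
```

The per-group errors differ within each candidate, which is why the comparison uses their average rather than any single group.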

That's it for today.

I hope you've found this thread helpful.

Like/Retweet the first tweet below for support and follow @levikul09 for more Data Science threads.

Thanks 😉

