A poorly tuned model learns too slowly, overfits or underfits, or ends up less accurate than it could be.

Hyper-parameter tuning by trial and error is impractical.

Grid Search and Random Search turn this tedious chore into an automated task, saving your time and sanity.
🔹Parameter

A parameter is what the model learns from the data during training.

For ex: the weights of Linear Regression.

It's NOT the same as a Hyperparameter!
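
A minimal sketch of those learned parameters, assuming scikit-learn and a toy make_regression dataset:

# The learned weights ARE the parameters
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=0)
model = LinearRegression().fit(X, y)

print(model.coef_)       # one weight per feature -- learned, not set by you
print(model.intercept_)  # the bias term -- also learned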
🔹Hyperparameter

A hyperparameter is a 'configuration' that is set before the model is trained.

You then evaluate different configurations by calculating a metric like RMSE (error) or accuracy.

Finally, you pick the best-performing configuration of hyper-parameter(s).
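
A minimal sketch of that loop, assuming scikit-learn's Ridge (its alpha is the hyperparameter; the candidate values are just illustrative):

# Manually trying hyperparameter values and scoring each with RMSE
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=5, noise=15, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for alpha in [0.01, 0.1, 1.0, 10.0]:      # set BEFORE training
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    rmse = mean_squared_error(y_val, model.predict(X_val)) ** 0.5
    print(alpha, rmse)                    # keep the alpha with the lowest RMSE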
🔹> 1 Hyperparameter

A model usually has many hyper-parameters.

For ex, in a Neural Network:
· number of layers
· number of neurons in a layer
· learning rate
· activation function, batch size etc.

Clearly, an automated way to pick the best values would save time.
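
To see how fast the combinations pile up, here's an illustrative search space (the parameter names are real scikit-learn MLPClassifier arguments; the candidate values are made up):

# Illustrative search space for a small Neural Network (MLPClassifier)
param_grid = {
    "hidden_layer_sizes": [(32,), (64,), (64, 32)],  # layers & neurons
    "learning_rate_init": [0.001, 0.01, 0.1],        # learning rate
    "activation": ["relu", "tanh"],                  # activation function
    "batch_size": [32, 64],                          # batch size
}
# 3 x 3 x 2 x 2 = 36 combinations -- from a tiny grid!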
🔹Grid Search

In Grid Search, every possible combination of the hyper-parameter values is evaluated.

As you can imagine,
more hyper-parameters ➡️ exponentially more combinations.

This quickly makes Grid Search computationally expensive.
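
A minimal Grid Search sketch with scikit-learn's GridSearchCV (toy data; the grid values are just examples):

# Grid Search: train & score every combination in the grid
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, None],
}  # 3 x 3 = 9 combinations

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)  # the winning combination
print(search.best_score_)   # its mean cross-validated accuracy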
🔹Random Search

Random Search evaluates only a pre-defined # of random combinations:

· You select that # yourself -> yielding faster results than Grid Search.

Also, each dataset has different 'important' hyper-parameters:

· While Grid Search spreads its budget evenly across all of them, Random Search tries more distinct values of each, so it's likelier to land on good values for the important ones.

A win-win!
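
The same problem with scikit-learn's RandomizedSearchCV, sampling just 10 combinations (the distributions are illustrative):

# Random Search: a fixed budget of randomly sampled combinations
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, random_state=0)

param_distributions = {
    "n_estimators": randint(50, 300),  # sampled, not exhaustively enumerated
    "max_depth": randint(2, 10),
}

search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_distributions,
                            n_iter=10,  # <- the pre-defined # of searches
                            cv=5, random_state=0)
search.fit(X, y)

print(search.best_params_)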
🔹Cross Validation

Hyper-parameters are always evaluated on data the model didn't train on -> the validation set (or Cross-Validation folds).

The Test set is reserved to evaluate the final, tuned model as a whole.
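
A sketch of that split discipline, assuming scikit-learn (cv=5 runs 5-fold cross-validation on the training portion only):

# Tune on cross-validated training folds; touch the Test set exactly once
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

search = GridSearchCV(RandomForestClassifier(random_state=0),
                      {"max_depth": [3, 5, None]}, cv=5)
search.fit(X_train, y_train)         # hyper-params picked on CV folds only

print(search.score(X_test, y_test))  # final verdict from the Test set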
🔹Conclusion

Hyper-parameter tuning should not be like looking for a needle in a haystack.

Many libraries, including scikit-learn, have easy APIs implementing both methods.

Generally:
· use Random Search if the # of hyper-parameters is high. Else, use Grid Search.
Thank you for reading. If you like this thread, please leave a like on the 1st tweet and follow me @farazmunshi for more on ML and AI.

See you!
