We define medium-sized data by the training time of a top predictive model being on the order of hours to a day. The data usually still fits into RAM, but we may need to run experiments on servers beyond the laptop. Overfitting is relatively unimportant, and hyperparameter selection starts to be guided by computational criteria. The range of applicable techniques is restricted (e.g., kernel methods are out), and only a small number of models and hyperparameter combinations can be tried. On the other hand, we are in the sweet spot of deep learning.
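When each training run costs hours, hyperparameter selection reduces to spending a small fixed budget of trials as well as possible. One common approach is budget-limited random search; the following is a minimal sketch, where the search space, the `budgeted_random_search` helper, and the toy scoring function are all hypothetical stand-ins (a real scoring function would be an expensive training-plus-validation run):

```python
import random

# Hypothetical search space: with medium-sized data, each fit costs
# hours, so we can only afford a handful of combinations from it.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128],
    "num_layers": [2, 4, 8],
}

def sample_config(space, rng):
    """Draw one random hyperparameter combination from the space."""
    return {name: rng.choice(values) for name, values in space.items()}

def budgeted_random_search(score_fn, space, budget, seed=0):
    """Evaluate at most `budget` random configurations; keep the best."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(budget):
        config = sample_config(space, rng)
        score = score_fn(config)  # in practice: train + validate (hours)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

# Toy stand-in for a validation score; higher is better.
def fake_validation_score(config):
    return -abs(config["learning_rate"] - 1e-3) - config["num_layers"] * 1e-5

best, score = budgeted_random_search(fake_validation_score, SEARCH_SPACE, budget=5)
```

With a budget this small, the choice of search space matters far more than the search strategy, which is why computational criteria end up driving hyperparameter decisions at this data scale.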