Abstract

Nowadays, huge quantities of data are collected and analyzed to deliver deep insights into biological processes and human behavior. This chapter assesses the use of big data for prediction and estimation through statistical machine learning and its applications in agriculture and genetics in general, and specifically for genome-based prediction and selection. First, we point out the importance of data and how the use of data is reshaping our way of living. We also provide the key elements of genomic selection and its potential for plant improvement. In addition, we analyze elements of modeling with machine learning methods applied to genomic selection and stress their importance as a predictive methodology. Two cultures of model building, prediction and inference, are analyzed and discussed; by understanding model building, researchers will be able to select the best model or method for each circumstance. Within this context, we explain the differences between nonparametric models (predictors are constructed according to information derived from data) and parametric models (all the predictors take predetermined forms with the response), as well as their types of effects: fixed, random, and mixed. Basic elements of linear algebra are provided to facilitate understanding of the contents of the book. This chapter also contains examples of the different types of data using supervised, unsupervised, and semi-supervised learning methods.