Practical Foundations of Machine Learning for Addiction Research. Part II. Workflow and use cases.
On a continuum with applied statistics, machine learning offers a wide variety of tools to explore, analyze, and understand addiction data. These tools include algorithms that can leverage useful information from data to build models capable of addressing different scientific problems. In this second part of a two-part review of machine learning, we describe how to apply machine learning methods, explain their main limitations, and outline ways to address them. Like other analytical tools, machine learning methods require careful implementation to support a reproducible and transparent research process with reliable results. This review describes a workflow to guide the application of machine learning, consisting of five steps: study design, data collection, data pre-processing, modeling, and communication. Key issues when applying machine learning include how to train, validate, and test a model; how to detect and characterize overfitting; and how to determine an adequate sample size. We also illustrate the process and its particular nuances with examples of how addiction researchers have applied machine learning techniques with different goals, study designs, or data sources.