Contemporary metabolomics experiments generate complex, high-dimensional
data. Consequently, there have been concurrent efforts to develop methodological standards and analytical
workflows that streamline the derivation of meaningful biochemical and clinical inferences from raw
data acquired on analytical platforms such as mass spectrometry. While such considerations have been
frequently addressed in untargeted metabolomics (i.e., the broad survey of all distinguishable metabolites
within a sample of interest), this methodological scrutiny has seldom been applied to data generated
using commercial, targeted metabolomics kits. We suggest that this gap may, in part, account for
the incomplete replication of previously specified biomarker panels in both past and more recent studies. Herein, we identify common
impediments to the analysis of raw, targeted metabolomic abundance data from a commercial
kit and review methods to remedy them. In doing so, we propose an analytical pipeline
suitable for the pre-processing of data for downstream biomarker discovery. Operational and statistical
considerations for integrating targeted data sets across experimental sites and analytical batches are discussed,
as are best practices for developing predictive models relating pre-processed metabolomic data
to associated phenotypic information.