Abstract
Laguna Lake is an economically important resource in the Philippines, but its water quality is reportedly declining because of fecal pollution. Current monitoring methods rely on counting fecal indicator bacteria, which provides no information on potential sources of contamination. In this study, we predicted sources of Escherichia coli in lake stations and tributaries by establishing a fecal source library composed of rep-PCR DNA fingerprints of isolates from human, cattle, swine, poultry, and sewage samples (n = 1,408). We also evaluated three statistical methods for predicting fecal contamination sources in surface waters. Random forest (RF) outperformed k-nearest neighbors and discriminant analysis of principal components in average rate of correct classification for two-way (84.85%), three-way (82.45%), and five-way (74.77%) categorical splits. Overall, RF gave the most balanced predictions across source categories, which is crucial for libraries in which sources are unequally represented. Source tracking of environmental isolates (n = 332) using RF identified sewage as the dominant source (47.59%), followed by human (29.22%), poultry (12.65%), swine (7.23%), and cattle (3.31%). This study demonstrates the promise of a library-dependent method for augmenting current monitoring systems with source attribution of fecal contamination in Laguna Lake. To our knowledge, it is also the first report of microbial source tracking using rep-PCR in surface waters of the Laguna Lake watershed.
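The comparison above rests on the average rate of correct classification and on how balanced predictions are across sources. The following is a minimal sketch of that metric, assuming it is the unweighted mean of the per-source rates of correct classification; the function names and the toy labels are hypothetical and are not taken from the study's data or code. It illustrates why a classifier can reach the same overall accuracy as another while being far less balanced on an unequally represented library.

```python
from collections import Counter


def rates_of_correct_classification(true_labels, predicted_labels):
    """Per-source fraction of isolates assigned to their true source."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {source: correct[source] / totals[source] for source in totals}


def average_rate_of_correct_classification(true_labels, predicted_labels):
    """Unweighted mean of the per-source rates (assumed definition)."""
    rates = rates_of_correct_classification(true_labels, predicted_labels)
    return sum(rates.values()) / len(rates)


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of all isolates classified correctly, ignoring source."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Toy, made-up library with unequal representation: 8 human, 2 cattle isolates.
true_labels = ["human"] * 8 + ["cattle"] * 2

# Hypothetical classifier A: assigns every isolate to the majority source.
majority_pred = ["human"] * 10

# Hypothetical classifier B: 6 of 8 human and 2 of 2 cattle isolates correct.
balanced_pred = ["human"] * 6 + ["cattle"] * 2 + ["cattle"] * 2

for name, pred in [("A (majority)", majority_pred), ("B (balanced)", balanced_pred)]:
    acc = overall_accuracy(true_labels, pred)
    arcc = average_rate_of_correct_classification(true_labels, pred)
    print(f"{name}: overall accuracy = {acc:.3f}, average rate = {arcc:.3f}")
```

In this toy case both classifiers classify 8 of 10 isolates correctly, but classifier A never identifies a cattle isolate, so its average rate is much lower than that of classifier B. This is the sense in which a balanced method matters when some sources contribute far fewer fingerprints than others.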