AbstractQuantum machine learning models have the potential to offer speedups and better predictive accuracy compared to their classical counterparts. However, these quantum algorithms, like their classical counterparts, have been shown to also be vulnerable to input perturbations, in particular for classification problems. These can arise either from noisy implementations or, as a worst-case type of noise, adversarial attacks. In order to develop defense mechanisms and to better understand the reliability of these algorithms, it is crucial to understand their robustness properties in the presence of natural noise sources or adversarial manipulation. From the observation that measurements involved in quantum classification algorithms are naturally probabilistic, we uncover and formalize a fundamental link between binary quantum hypothesis testing and provably robust quantum classification. This link leads to a tight robustness condition that puts constraints on the amount of noise a classifier can tolerate, independent of whether the noise source is natural or adversarial. Based on this result, we develop practical protocols to optimally certify robustness. Finally, since this is a robustness condition against worst-case types of noise, our result naturally extends to scenarios where the noise source is known. Thus, we also provide a framework to study the reliability of quantum classification protocols beyond the adversarial, worst-case noise scenarios.