Large-scale educational surveys, including PISA, often collect student ratings to assess teaching quality. Because of PISA's sampling design, student ratings must be aggregated at the school level rather than the classroom level. To what extent does school-level aggregation of student ratings yield reliable and valid measures of teaching quality? We investigate this question for six PISA 2015 scales measuring classroom management, emotional support, inquiry-based instruction, teacher-directed instruction, adaptive instruction, and feedback. The sample consisted of 503,146 students from 17,678 schools in 69 countries/regions. Multilevel confirmatory factor analysis (CFA) and structural equation modeling (SEM) were conducted for each scale in each country/region to evaluate school-level reliability (intraclass correlations ICC(1) and ICC(2)), factorial validity, and predictive validity. In most countries/regions, school-level reliability was adequate for the Classroom Management scale but only low to moderate for the other scales. Examination of factorial and predictive validity indicated that the Classroom Management, Emotional Support, Adaptive Instruction, and Teacher-directed Instruction scales capture meaningful differences in teaching quality between schools. In contrast, the Inquiry-based Instruction scale exhibited poor validity in almost all countries/regions. These findings suggest that student ratings in PISA can be used to investigate some aspects of school-level teaching quality in most countries/regions.
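
To illustrate the two reliability indices named above (intraclass correlations 1 and 2, often written ICC(1) and ICC(2)), the following is a minimal sketch based on the classical one-way ANOVA estimators. It is not the multilevel CFA procedure used in the study; the function name `icc1_icc2` and the toy data in the usage note are hypothetical and serve only to show how the reliability of a single student's rating, ICC(1), relates to the reliability of the school mean, ICC(2).

```python
from collections import defaultdict


def icc1_icc2(ratings, schools):
    """Estimate ICC(1) and ICC(2) with one-way ANOVA mean squares.

    ratings: list of numeric student ratings
    schools: list of school identifiers, same length as ratings
    Assumes at least two schools and nonzero between-school variance.
    """
    # Group ratings by school.
    groups = defaultdict(list)
    for rating, school in zip(ratings, schools):
        groups[school].append(rating)

    n_total = len(ratings)
    n_groups = len(groups)
    grand_mean = sum(ratings) / n_total

    # Between-school and within-school sums of squares.
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in values)

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)

    # Adjusted average group size, which handles unequal school sizes.
    sum_sq_sizes = sum(len(v) ** 2 for v in groups.values())
    k = (n_total - sum_sq_sizes / n_total) / (n_groups - 1)

    # ICC(1): share of rating variance attributable to schools.
    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    # ICC(2): reliability of the aggregated school mean.
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2
```

For example, `icc1_icc2([1, 2, 3, 4, 5, 6, 7, 8, 9], ["A"] * 3 + ["B"] * 3 + ["C"] * 3)` returns the pair (26/29, 26/27). Because ICC(2) rises with the number of raters per school, a scale with a modest ICC(1) can still yield a reasonably reliable school mean when many students per school are sampled.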