AbstractChest X-ray (CXR) images have been one of the important diagnosis tools used in the COVID-19 disease diagnosis. Deep learning (DL)-based methods have been used heavily to analyze these images. Compared to other DL-based methods, the bag of deep visual words-based method (BoDVW) proposed recently is shown to be a prominent representation of CXR images for their better discriminability. However, single-scale BoDVW features are insufficient to capture the detailed semantic information of the infected regions in the lungs as the resolution of such images varies in real application. In this paper, we propose a new multi-scale bag of deep visual words (MBoDVW) features, which exploits three different scales of the 4th pooling layer’s output feature map achieved from VGG-16 model. For MBoDVW-based features, we perform the Convolution with Max pooling operation over the 4th pooling layer using three different kernels: $$1 \times 1$$
1
×
1
, $$2 \times 2$$
2
×
2
, and $$3 \times 3$$
3
×
3
. We evaluate our proposed features with the Support Vector Machine (SVM) classification algorithm on four CXR public datasets (CD1, CD2, CD3, and CD4) with over 5000 CXR images. Experimental results show that our method produces stable and prominent classification accuracy (84.37%, 88.88%, 90.29%, and 83.65% on CD1, CD2, CD3, and CD4, respectively).