Aiming to address the currently low accuracy of domestic industrial defect detection, this paper proposes a Two-Stage Industrial Defect Detection Framework based on Improved-YOLOv5 and Optimized-Inception-ResnetV2, which completes positioning and classification tasks through two specific models. In order to make the first-stage recognition more effective at locating insignificant small defects with high similarity on the steel surface, we improve YOLOv5 from the backbone network, the feature scales of the feature fusion layer, and the multiscale detection layer. In order to enable second-stage recognition to better extract defect features and achieve accurate classification, we embed the convolutional block attention module (CBAM) attention mechanism module into the Inception-ResnetV2 model, then optimize the network architecture and loss function of the accurate model. Based on the Pascal Visual Object Classes 2007 (VOC2007) dataset, the public dataset NEU-DET, and the optimized dataset Enriched-NEU-DET, we conducted multiple sets of comparative experiments on the Improved-YOLOv5 and Inception-ResnetV2. The testing results show that the improvement is obvious. In order to verify the superiority and adaptability of the two-stage framework, we first test based on the Enriched-NEU-DET dataset, and further use AUBO-i5 robot, Intel RealSense D435 camera, and other industrial steel equipment to build actual industrial scenes. In experiments, a two-stage framework achieves the best performance of 83.3% mean average precision (mAP), evaluated on the Enriched-NEU-DET dataset, and 91.0% on our built industrial defect environment.