Unsupervised Crack Detection in Large Stamped Metal Products Via Spatial Transformer and U2-Net with Masked Attention and Patch Positional Encoding
Abstract
Crack detection is crucial for the quality assessment of metal products. While supervised learning methods are commonly employed to automate this process, their performance is constrained by the limited availability of large and diverse crack datasets. Unsupervised learning has shown exceptional performance in anomaly detection by relying solely on healthy data during training. For large products, cracks are small relative to the entire object, making image patches a more suitable representation. However, the large diversity of image patches introduces significant challenges for unsupervised learning methods. This study proposes a spatial transformer network that utilizes binary masks from the Segment Anything Model to reduce the complexity of image patches. To perform crack detection, we propose a U2-Net-based model with neighbor masked attention at multiple scales to learn the distribution of healthy data from discrete representations extracted by a Vector Quantized Variational Autoencoder (VQ-VAE). Additionally, patch positional encoding is incorporated to enhance the model's ability to match patches to the learned distribution. Features that deviate from this distribution are identified as cracks. We replicate a real manufacturing setting and capture images from large stamped metal panels. Comprehensive experiments are conducted, and the results indicate the effectiveness of the proposed method and its individual modules.
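The abstract mentions neighbor masked attention, in which a token is prevented from attending to its own spatial neighborhood so that the model cannot trivially copy a patch feature when reconstructing healthy data. As an illustration only (the paper's exact formulation, grid size, and neighborhood radius are not given here), a minimal sketch of building such a mask for tokens laid out on an H×W grid could look like:

```python
import numpy as np

def neighbor_attention_mask(h, w, radius=1):
    """Build an (h*w, h*w) boolean mask where True marks pairs a query
    token must NOT attend to: its own position and any token within
    `radius` steps on the h-by-w token grid. Hypothetical sketch; the
    paper's actual mask construction may differ."""
    n = h * w
    rows = np.arange(n) // w          # grid row of each flattened token
    cols = np.arange(n) % w           # grid column of each flattened token
    dr = np.abs(rows[:, None] - rows[None, :])
    dc = np.abs(cols[:, None] - cols[None, :])
    # Mask everything inside the (2*radius+1)^2 neighborhood, self included.
    return (dr <= radius) & (dc <= radius)

mask = neighbor_attention_mask(4, 4, radius=1)
```

The resulting boolean matrix can be passed (with True converted to a large negative bias) to a standard attention layer; masking the local window forces each token's reconstruction to come from non-neighboring context, so anomalous regions such as cracks reconstruct poorly and stand out.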
DOI
10.12783/shm2025/37400