- Volume: 2
- Issue: 4
- Citations: 41

Abstract:
Generative AI models can now produce artwork that is virtually indistinguishable from human-created art, posing critical verification challenges for art galleries, educational institutions, and digital content platforms. Existing detection approaches face notable limitations: handcrafted feature–based methods offer interpretability but achieve limited accuracy, while deep learning approaches typically require substantial computational resources and provide minimal explanation for their predictions. We propose an attention-guided fusion framework that integrates Discrete Cosine Transform (DCT)–based frequency-domain features with deep-learned representations through a learned attention mechanism, enabling both improved detection performance and interpretable decision-making grounded in signal-processing theory. The attention module dynamically weights each feature modality based on input-specific reliability patterns, allowing adaptive fusion across diverse artistic styles and generation methods. We evaluate the proposed framework using two convolutional backbones: the lightweight MobileNetV2 for efficiency-critical deployment scenarios and the higher-capacity ResNet50 for settings where computational resources permit stronger feature extraction. Experiments are conducted on 18,288 artwork images spanning traditional paintings, digital illustrations, and outputs from multiple generative models, using stratified train–validation–test splits and five random seeds for statistical robustness. Under frozen-backbone settings, MobileNetV2-based attention fusion achieves an F1-score of 90.1%, while ResNet50-based attention fusion reaches 90.5%. When backbones are fine-tuned end-to-end, incorporating handcrafted features via attention fusion yields consistent performance gains: MobileNetV2's F1-score improves from 93.9% to 94.5% (+0.6 percentage points), and ResNet50's from 94.2% to 95.1% (+0.9 percentage points). Feature importance analysis further reveals that low-frequency DCT energy is the most discriminative handcrafted feature, confirming that frequency-domain characteristics effectively distinguish AI-generated from human-created artwork across diverse artistic styles. These results demonstrate that attention-guided fusion of signal-processing features and deep-learned representations provides consistent accuracy–efficiency benefits across both lightweight and heavyweight architectures, offering a practical and interpretable solution to the emerging challenge of AI-generated artwork detection.
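The abstract names the two fused modalities (a low-frequency DCT energy feature and a CNN embedding) and a learned attention gate over them, but gives no implementation details. The block below is a minimal sketch of how those pieces could fit together, assuming a PyTorch implementation; the names `dct_low_frequency_energy` and `AttentionFusion`, the 8×8 low-frequency corner, and the 256-dimensional fusion size are illustrative assumptions, not the authors' code. The 1280-dimensional deep feature matches MobileNetV2's penultimate layer.

```python
# Illustrative sketch only: hypothetical names and sizes, not the paper's
# released implementation.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from scipy.fft import dctn


def dct_low_frequency_energy(gray: np.ndarray, k: int = 8) -> np.ndarray:
    """Handcrafted feature: energy in the top-left (low-frequency) k x k
    corner of the 2-D DCT, normalized by total spectral energy.
    The k = 8 corner size is an assumption for illustration."""
    coeffs = dctn(gray.astype(np.float64), norm="ortho")
    total = np.sum(coeffs ** 2) + 1e-12
    low = np.sum(coeffs[:k, :k] ** 2)
    return np.array([low / total], dtype=np.float32)


class AttentionFusion(nn.Module):
    """Learned attention over two feature modalities: handcrafted
    frequency-domain features and deep CNN representations."""

    def __init__(self, hand_dim: int, deep_dim: int, fused_dim: int = 256):
        super().__init__()
        self.hand_proj = nn.Linear(hand_dim, fused_dim)
        self.deep_proj = nn.Linear(deep_dim, fused_dim)
        # Scores one reliability weight per modality from its projection.
        self.score = nn.Linear(fused_dim, 1)
        self.head = nn.Linear(fused_dim, 2)  # human vs. AI-generated

    def forward(self, hand_feats, deep_feats):
        h = torch.tanh(self.hand_proj(hand_feats))    # (B, fused_dim)
        d = torch.tanh(self.deep_proj(deep_feats))    # (B, fused_dim)
        stacked = torch.stack([h, d], dim=1)          # (B, 2, fused_dim)
        attn = F.softmax(self.score(stacked), dim=1)  # (B, 2, 1) per-input weights
        fused = (attn * stacked).sum(dim=1)           # input-adaptive fusion
        return self.head(fused), attn.squeeze(-1)


# Usage: fuse the 1-D DCT energy feature with a 1280-D MobileNetV2-style
# embedding (random tensors stand in for a real image and backbone here).
model = AttentionFusion(hand_dim=1, deep_dim=1280)
hand = torch.from_numpy(dct_low_frequency_energy(np.random.rand(224, 224)))
logits, weights = model(hand.unsqueeze(0), torch.randn(1, 1280))
```

The returned `weights` expose how much each modality contributed to a given prediction, which is one plausible way to obtain the per-input interpretability the abstract claims for the attention module.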