Abstract
This study presents a novel approach to analyzing and reconstructing AI-generated images using BLIP2 and CLIP models, focusing on a dataset of 268,000 Midjourney-generated images and prompts. We introduce a multi-level feature classification system encompassing 11 categories and 1,100 features, and employ BLIP2 for descriptive text generation and CLIP for feature similarity analysis. Our methodology involves image pre-processing, custom feature extraction, and image reconstruction under various conditions. The effectiveness of our approach is demonstrated through similarity scores between original and reconstructed images, ranging from 68.15 to 71.25 across different feature extraction conditions. This research contributes to the fields of natural language processing and information retrieval by offering insights into AI-generated image analysis, proposing new methods for feature extraction and reconstruction, and suggesting avenues for improving image generation models and multi-modal understanding.
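The reported similarity scores (68.15 to 71.25) compare original and reconstructed images. As a minimal sketch of how such a CLIP-style score could be computed, the snippet below takes two image embedding vectors and returns their cosine similarity scaled to a 0–100 range; the embedding dimensionality, the perturbation used to simulate a reconstruction, and the 0–100 scaling are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def clip_similarity(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, scaled to 0-100."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(100.0 * (a @ b))

# Hypothetical example: embeddings for an original and a reconstructed image.
rng = np.random.default_rng(0)
original = rng.normal(size=512)                         # e.g. CLIP ViT-B/32 image embeddings are 512-d
reconstructed = original + 0.5 * rng.normal(size=512)   # perturbed copy standing in for a reconstruction

score = clip_similarity(original, reconstructed)
```

In practice the embeddings would come from a CLIP image encoder applied to both images; the function itself is model-agnostic and works on any pair of equal-length vectors.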
| Original language | English |
|---|---|
| Pages | 401-405 |
| Number of pages | 5 |
| DOIs | |
| State | Published - 13 Apr 2025 |
| Event | 8th International Conference on Natural Language Processing and Information Retrieval, NLPIR 2024 - Okayama, Japan. Duration: 13 Dec 2024 → 15 Dec 2024 |
Conference
| Conference | 8th International Conference on Natural Language Processing and Information Retrieval, NLPIR 2024 |
|---|---|
| Country/Territory | Japan |
| City | Okayama |
| Period | 13/12/24 → 15/12/24 |
Bibliographical note
Publisher Copyright: © 2024 Copyright held by the owner/author(s).
Keywords
- AI-Generated Images
- BLIP2
- CLIP
- Feature Extraction
- Image Reconstruction
- Information Retrieval
- Multi-Modal Analysis
- Natural Language Processing
- Similarity Analysis