From Pictures to Prompts: Analyzing and Reconstructing AI-Generated Images with BLIP2 and CLIP

Research output: Contribution to conference, Proceeding

Abstract

This study presents a novel approach to analyzing and reconstructing AI-generated images using BLIP2 and CLIP models, focusing on a dataset of 268,000 Midjourney-generated images and prompts. We introduce a multi-level feature classification system encompassing 11 categories and 1,100 features, and employ BLIP2 for descriptive text generation and CLIP for feature similarity analysis. Our methodology involves image pre-processing, custom feature extraction, and image reconstruction under various conditions. The effectiveness of our approach is demonstrated through similarity scores between original and reconstructed images, ranging from 68.15 to 71.25 across different feature extraction conditions. This research contributes to the fields of natural language processing and information retrieval by offering insights into AI-generated image analysis, proposing new methods for feature extraction and reconstruction, and suggesting avenues for improving image generation models and multi-modal understanding.
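The abstract reports similarity scores between original and reconstructed images but does not state how they are computed. One common convention, assumed here and not confirmed by this record, is the cosine similarity between two CLIP image embeddings multiplied by 100. The sketch below illustrates only that arithmetic: the function name `similarity_score` is hypothetical, and the short lists are toy stand-ins for real CLIP embeddings.

```python
import math
from typing import Sequence


def similarity_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors, multiplied by 100.

    Assumption: this mirrors a common way of reporting CLIP similarity;
    the paper's exact scoring formula is not given in this record.
    The result lies in [-100, 100], up to floating-point rounding.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 100.0 * dot / (norm_a * norm_b)


# Toy stand-ins for the embeddings of an original and a reconstructed image.
original = [1.0, 0.0, 1.0]
reconstructed = [1.0, 1.0, 0.0]
score = similarity_score(original, reconstructed)
```

With these toy vectors the score is about 50, since the two vectors share one of their two non-zero components. In practice the inputs would be embeddings produced by a CLIP image encoder for each image.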

Original language: English
Pages: 401-405
Number of pages: 5
State: Published - 13 04 2025
Event: 8th International Conference on Natural Language Processing and Information Retrieval, NLPIR 2024 - Okayama, Japan
Duration: 13 12 2024 to 15 12 2024

Conference

Conference: 8th International Conference on Natural Language Processing and Information Retrieval, NLPIR 2024
Country/Territory: Japan
City: Okayama
Period: 13/12/24 to 15/12/24

Bibliographical note

Publisher Copyright:
© 2024 Copyright held by the owner/author(s).

Keywords

  • AI-Generated Images
  • BLIP2
  • CLIP
  • Feature Extraction
  • Image Reconstruction
  • Information Retrieval
  • Multi-Modal Analysis
  • Natural Language Processing
  • Similarity Analysis
