๐ค Vision Language AI Demo
Interactive application showcasing multiple vision-language AI capabilities
Upload an image and AI will generate a description
๐ธ Click on examples to try
Upload an image and ask questions, AI will answer based on the image content
๐ก Common Question Examples:
- What is in the image?
- What color is...?
- How many ... are there?
- Is there a ... in the image?
Define custom categories and AI will classify the image
๐ก Tip: You can input any categories, the model will calculate similarity between the image and each category
Upload an image and have a conversation with AI about it
๐ก Conversation Prompts:
- Describe this image
- What's in the image?
- Where is this?
- What is the main color?
๐ About This Application
- Models: BLIP (Captioning & VQA) + CLIP (Classification)
- Framework: Gradio + Transformers
- Deployment: Can be deployed to Hugging Face Spaces
- Open Source: All models are open source
โก Performance Tip: Use Hugging Face Spaces Zero GPU for significantly faster processing