VisionaryAI is a web application that uses Gemini Pro, Gemini Pro Vision, DALL-E 3, and Stable Diffusion XL to provide three main features: chatbot interaction, image captioning, and text-to-image generation.
Features
- ChatBot: Engage in real-time conversations with the AI, powered by the Gemini Pro model.
- Image Captioning: Generate descriptive captions for your images using the Gemini Pro Vision model.
- Text to Image: Generate images using either DALL-E 3 or Stable Diffusion XL.
Installation
- Clone the repository:
git clone https://github.com/Abhrankan-Chakrabarti/GeminiFusion.git
cd GeminiFusion
- Create a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
- Create a `.env` file in the root directory.
- Add your Google API key:
api_key=YOUR_GOOGLE_API_KEY
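To check that the `.env` file is picked up, the key can be read back with a few lines of Python. This is a minimal standard-library sketch, not the application's actual loading code, which may use a package such as python-dotenv. The helper names `load_env_file` and `get_api_key` are hypothetical; only the `api_key` variable name and the `YOUR_GOOGLE_API_KEY` placeholder come from the step above.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path=".env"):
    """Return api_key from the process environment or the .env file."""
    key = os.environ.get("api_key") or load_env_file(path).get("api_key")
    # Reject a missing key and the untouched placeholder from the README.
    if not key or key == "YOUR_GOOGLE_API_KEY":
        raise RuntimeError("Set api_key in .env to a real Google API key")
    return key
```

Calling `get_api_key()` from the repository root raises a `RuntimeError` if the key is missing or still set to the placeholder, which catches the most common setup mistake before the application starts.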
Usage
- Run the application:
- Features:
- ChatBot: Navigate to the ChatBot section to start a conversation with the AI.
- Image Captioning: Upload an image and enter a prompt to generate a caption.
- Text to Image: Enter a text prompt to generate images using either DALL-E 3 or Stable Diffusion XL.
Technology Stack
- Python
- Streamlit
- Google Gemini Pro
- Google Gemini Pro Vision
- DALL-E 3
- Stable Diffusion XL
Contributing
We welcome contributions! Please see our contribution guidelines for more information.
License
This project is licensed under the MIT License. See the LICENSE file for details.