Have you ever wished you could turn your words into beautiful images? Well, now you can! With the power of deep learning and modern web technologies, building a text-to-image generation app is easier than ever before. In this article, we'll explore how to build a text-to-image generation app using Gradio and the Stable Diffusion API, so you can transform your textual ideas into stunning visuals with just a few clicks.
Text-to-image generation is a fascinating and rapidly developing field of artificial intelligence. It involves training a machine learning model to convert textual descriptions into corresponding images, giving rise to a wide range of potential applications, from generating realistic product images for e-commerce sites to creating vivid illustrations for children's books.
To build our text-to-image generation app, we'll use Gradio, an open-source framework that makes it simple to create custom user interfaces for machine learning models. For the image generation itself, we'll call the Stable Diffusion API, a hosted service that other applications and web services can access over HTTP, so our app does not need to run the model locally. This makes the app easy to integrate with other systems and to make widely available to users. Whether you're a seasoned machine learning expert or a curious beginner, this article provides a step-by-step guide to building your own text-to-image generation app.
Text to Image App
We will use the Gradio framework to build the front end. Gradio is an open-source Python framework for quickly prototyping ML models and seeing them in action. It provides easy-to-use components for different kinds of inputs and outputs. The first step in building our app is to install Gradio, along with requests and Pillow, which the app uses for the HTTP calls and for handling images:
pip install gradio requests pillow
The Stable Diffusion API provides endpoints for its image generation services. You can sign up to get an API key for the app we are about to build. Our app will have four inputs: the API key, the prompt, the number of inference steps, and a checkbox for the safety filter. For simplicity, we keep the remaining request parameters at fixed values in this demo. Below is the code for the Gradio app:
import requests
from io import BytesIO

import gradio as gr
from PIL import Image

# Text-to-image endpoint of the Stable Diffusion API
url = "https://stablediffusionapi.com/api/v3/text2img"
title = """<h2><center>Text to Image Generation with Stable Diffusion API</center></h2>"""
description = """#### Get the API key by signing up here [Stable Diffusion API](https://stablediffusionapi.com)."""

def get_image(key, prompt, inference_steps, safety_filter):
    # Request body; parameters not exposed in the UI keep fixed values
    payload = {
        "key": key,
        "prompt": prompt,
        "negative_prompt": "((out of frame)), ((extra fingers)), mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), (((tiling))), ((naked)), ((tile)), ((fleshpile)), ((ugly)), (((abstract))), blurry, ((bad anatomy)), ((bad proportions)), ((extra limbs)), cloned face, (((skinny))), glitchy, ((extra breasts)), ((double torso)), ((extra arms)), ((extra hands)), ((mangled fingers)), ((missing breasts)), (missing lips), ((ugly face)), ((fat)), ((extra legs)), anime",
        "width": "512",
        "height": "512",
        "samples": "1",
        "num_inference_steps": int(inference_steps),  # gr.Number returns a float
        "safety_checker": "yes" if safety_filter else "no",  # checkbox returns a bool
        "enhance_prompt": "yes",
        "guidance_scale": 7.5,
    }
    # Send the body as JSON; requests sets the Content-Type header
    response = requests.post(url, json=payload)
    response.raise_for_status()
    # The "output" field holds the URLs of the generated images
    image_url = str(response.json()["output"][0])
    r = requests.get(image_url)
    r.raise_for_status()
    # Decode the downloaded bytes into a PIL image for gr.Image
    return Image.open(BytesIO(r.content))

demo = gr.Interface(
    fn=get_image,
    inputs=[
        gr.Textbox(label="Enter API key"),
        gr.Textbox(label="Enter the Prompt"),
        gr.Number(label="Enter number of steps"),
        gr.Checkbox(label="Safety filter"),
    ],
    outputs=gr.Image(type="pil"),
    title=title,
    description=description,
)
demo.launch(debug=True)
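Because get_image builds the request body and sends it in one go, the mapping from UI inputs to request fields can only be checked with a live key. A minimal sketch of pulling that step out so it can be tested offline is below. `build_payload` and `build_request` are hypothetical helpers, not part of Gradio or the API; the field names come from the app's payload, the negative prompt is shortened for brevity, and the assumption that `safety_checker` takes "yes"/"no" strings should be checked against the API documentation. `build_request` uses the standard library's `urllib.request` only to show what the JSON POST looks like; it does not send anything.

```python
import json
import urllib.request

# Endpoint used by the app above
URL = "https://stablediffusionapi.com/api/v3/text2img"
# Shortened negative prompt for illustration; the app uses a longer one
NEGATIVE_PROMPT = "((out of frame)), ((extra fingers)), blurry, ((bad anatomy))"


def build_payload(key, prompt, inference_steps, safety_filter):
    """Map the four UI inputs onto the text2img request body (hypothetical helper)."""
    steps = int(inference_steps)  # gr.Number can deliver a float such as 30.0
    if steps <= 0:
        raise ValueError("number of steps must be positive")
    return {
        "key": key,
        "prompt": prompt,
        "negative_prompt": NEGATIVE_PROMPT,
        "width": "512",
        "height": "512",
        "samples": "1",
        "num_inference_steps": steps,
        # Assumption: the API expects "yes"/"no" strings for this flag
        "safety_checker": "yes" if safety_filter else "no",
        "enhance_prompt": "yes",
        "guidance_scale": 7.5,
    }


def build_request(payload, url=URL):
    """Build, but do not send, a JSON POST request (stdlib alternative to requests)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request(build_payload("YOUR_API_KEY", "a watercolor lighthouse at dusk", 30.0, True))
print(req.get_method(), json.loads(req.data)["num_inference_steps"])  # POST 30
```

To actually send the request, pass `req` to `urllib.request.urlopen`, which needs network access and a valid key.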
In the code above, we first import Gradio, requests, and Pillow, and set the endpoint URL for the API calls. Gradio needs a function to wrap: it passes the values of the input components to that function and displays whatever the function returns. Our get_image function takes all four inputs, sends them to the endpoint, downloads the generated image from the URL in the response, and opens it with Pillow.
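Indexing straight into the response's `output` list means a failed request (for example, an invalid key) surfaces as a bare KeyError or IndexError. A small guard can turn that into a readable error message. This is a sketch under an assumption: that a finished request returns `"status": "success"` alongside the `output` list of image URLs. The app above relies only on `output`, so verify the status values and the `message` field against the API documentation. `extract_image_url` is a hypothetical helper.

```python
def extract_image_url(data):
    """Return the first image URL from a decoded text2img response.

    Assumption: a finished request carries "status": "success" and an
    "output" list of image URLs; any other shape is treated as an error.
    """
    status = data.get("status")
    if status != "success":
        # e.g. an invalid key, or a job the service has not finished yet
        detail = data.get("message", "no message")
        raise RuntimeError(f"API returned status {status!r}: {detail}")
    outputs = data.get("output") or []
    if not outputs:
        raise RuntimeError("API response contained no image URLs")
    return str(outputs[0])


sample = {"status": "success", "output": ["https://example.com/image.png"]}
print(extract_image_url(sample))  # https://example.com/image.png
```

Inside get_image, you would call `extract_image_url(response.json())` in place of the direct indexing, and Gradio shows the raised message in the interface when running in debug mode.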
Finally, we define the Gradio interface, specifying the input components and the output format, and call .launch() to start the app. Once it is running, the app looks like the one below:
You can try the app here by entering your API key and see how the Stable Diffusion API performs, including how quickly it generates images. Upgrading your plan gets you faster performance; find more about the plans and pricing here.
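To put a number on the generation speed, you can wrap the call in a small timer. A minimal sketch follows. `timed` is a hypothetical helper, and because it measures wall-clock time on the client side, the figure includes network latency and the image download as well as the generation itself.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Usage with the app's function (needs a valid key and network access):
# image, seconds = timed(get_image, "YOUR_API_KEY", "a castle on a hill", 30, True)
# print(f"generated in {seconds:.1f}s")

result, seconds = timed(sum, [1, 2, 3])
print(result)  # 6
```

`time.perf_counter` is used instead of `time.time` because it is monotonic and has the highest available resolution, so system clock adjustments cannot distort the measurement.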
Conclusion
In conclusion, we've explored how to build a text-to-image generation app using Gradio and the Stable Diffusion API. Text-to-image generation is a rapidly evolving field with applications across many industries. With Gradio, we created a user-friendly interface that lets users call the API and generate images from textual input.