Mondor Festival

News with a Local Lens

OmniGen: An Open Source AI Model That Lets You Edit Images Conversationally

This is Decrypt’s co-founder, Josh Quittner, having a casual get-together with his friend, Vitalik Buterin.

No, not really. They have never met, let alone been in the same place at the same time. This image is fake, which is not surprising. What’s surprising is that it took us less than a minute to build, using two photos and a simple prompt: “The man in image 1 and the man in image 2 posing for the cameras at a barbecue party.” Pretty cool.

The model is OmniGen, and it’s much more than just an image generator. It focuses on image editing and context understanding, allowing users to tweak their generations by simply chatting with the model rather than loading standalone third-party tools. It is capable of “reasoning” and understanding commands thanks to its embedded LLM.

Researchers from the Beijing Academy of Artificial Intelligence have finally released the weights (the executable AI models that users can run on their computers) of this new type of AI model, which can be an all-in-one source for image creation. Unlike its predecessors, which functioned as single-purpose task executors (requiring artists to load separate image generators, ControlNets, IP-Adapters, inpainting models, etc.), OmniGen functions as a complete creative suite. It handles everything from basic image editing to complex visual reasoning tasks in a single, streamlined framework.

OmniGen relies on two main components: a variational autoencoder (the good old VAE that all AI artists know so well) that deconstructs images into their fundamental building blocks, and a transformer model that processes various inputs with remarkable flexibility. This simplified approach eliminates the need for the additional modules that often bog down other image generation systems.

Trained on a dataset of a billion images, dubbed X2I (anything-to-image), OmniGen handles tasks ranging from text-to-image generation and sophisticated photo editing to more nuanced ones like inpainting and depth map manipulation. Perhaps most striking is its ability to understand context. For example, when asked to identify a place to wash your hands, it instantly recognizes and highlights sinks in images, demonstrating a level of reasoning that approaches human understanding.

In other words, unlike any other image generator currently available, users can “talk” to OmniGen in the same way they would interact with ChatGPT to generate and edit images. There is no need to manage segmentation, masking or other complex techniques, since the model is able to understand everything simply through commands.

So imagine asking an open source model to create a winter coat with a herringbone pattern, add fur trim, and adjust the length, all in one go. If you don’t like it, you can just ask “make the coat white” and it will figure out the task. You won’t have to manually select the coat, load a new model, prompt “make the coat white” and pray that the coat looks similar to your original generation, or open Photoshop and deal with color manipulation.

This is quite a significant step forward.

One of the exciting achievements of this new model is that OmniGen integrates Microsoft’s Phi-3 LLM, and researchers trained the model to apply a chain-of-thought approach to image generation, breaking down complex creative tasks into smaller, more manageable steps, similar to the way human artists work. This methodical process allows for unprecedented control over the creative workflow, although researchers note that output quality currently matches rather than exceeds standard generation methods.

Looking ahead, researchers are already exploring ways to improve OmniGen’s capabilities. Future iterations could include improved handling of text-heavy images and more sophisticated reasoning capabilities, potentially leading to even more natural interaction between human creators and AI tools.

How to Run OmniGen

OmniGen is open source, so users can run it locally. However, there are also a few free demos hosted on Hugging Face, the world’s largest open source AI community and repository, so users can test the model on its servers if they don’t have the required hardware.

Those who don’t want to bother too much with setup can go to this free Hugging Face Space and play with the model. It opens a very intuitive user interface.

Basically, the model can handle up to three context images and a generous amount of text input. The interface also shows a very detailed set of instructions for generating or editing images. If you are new to this, don’t bother too much with all the settings. Simply insert the image (or images) you want the model to edit or use as inspiration, and prompt it the same way you would with ChatGPT, using natural language.
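As a rough illustration of how a natural-language prompt can reference its context images, here is a minimal Python sketch. It assumes the placeholder syntax `<img><|image_N|></img>` that the OmniGen project uses for image references in its local pipeline (worth confirming on the GitHub page), and the helper name `build_prompt` is hypothetical. The three-image limit is the one described above.

```python
import re
from typing import List

MAX_CONTEXT_IMAGES = 3  # limit described in the article


def build_prompt(text: str, image_paths: List[str]) -> str:
    """Swap 'image N' mentions for OmniGen-style placeholders (assumed syntax)."""
    if len(image_paths) > MAX_CONTEXT_IMAGES:
        raise ValueError(f"at most {MAX_CONTEXT_IMAGES} context images are supported")

    def swap(match: "re.Match[str]") -> str:
        index = int(match.group(1))
        # Every mention must point at an image that was actually supplied.
        if not 1 <= index <= len(image_paths):
            raise ValueError(f"prompt mentions image {index}, but it was not provided")
        return f"<img><|image_{index}|></img>"

    return re.sub(r"\bimage (\d+)\b", swap, text, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(build_prompt(
        "The man in image 1 and the man in image 2 posing for the cameras "
        "at a barbecue party.",
        ["quittner.jpg", "buterin.jpg"],  # hypothetical file names
    ))
```

In the hosted demo none of this is needed, since the interface lets you attach the images and write the prompt directly.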

However, those who want to generate images locally will need to download the weights and some libraries. Given its capabilities, expect it to need a lot of VRAM. Some reports show that the model works fine with 12 GB of VRAM, and it is currently only compatible with Nvidia cards.
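Before downloading anything, it can help to check whether a local card meets that reported figure. The sketch below is one way to do it in Python, assuming the Nvidia driver’s `nvidia-smi` tool is on the PATH. The 12 GB threshold comes from the user reports mentioned above, not from an official requirement, and the function names are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional

REPORTED_MIN_VRAM_GB = 12  # from user reports, not an official requirement


def parse_vram_mib(output: str) -> List[int]:
    """Parse one total-memory value (in MiB) per line of nvidia-smi output."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def gpus_meeting_requirement(output: str, min_gb: float = REPORTED_MIN_VRAM_GB) -> List[int]:
    """Return the indices of GPUs whose total memory is at least min_gb."""
    return [i for i, mib in enumerate(parse_vram_mib(output)) if mib / 1024 >= min_gb]


def query_nvidia_smi() -> Optional[str]:
    """Ask nvidia-smi for total memory per GPU; None if the tool is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    raw = query_nvidia_smi()
    if raw is None:
        print("No usable Nvidia GPU detected; the hosted demo may be the better option.")
    else:
        print("GPUs meeting the reported requirement:", gpus_meeting_requirement(raw))
```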

To install the model locally, simply follow the instructions provided on the GitHub page. Basically, create a new installation folder, clone the GitHub repository, install the dependencies, and you are good to go. To have a nice user interface instead of working only with text, install the Gradio interface by following the steps provided on the GitHub page. Alternatively, you can follow this tutorial if you prefer video instructions.
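Once the repository and its dependencies are installed, generating from a script might look roughly like the sketch below. The `OmniGenPipeline` class, the `Shitao/OmniGen-v1` model ID, and the parameter names and values are recalled from the project’s README and should be treated as assumptions to verify against the GitHub page. `build_request` and `run_locally` are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional


def build_request(prompt: str,
                  input_images: Optional[List[str]] = None,
                  size: int = 1024,
                  seed: int = 0) -> Dict[str, Any]:
    """Collect the keyword arguments for one generation or editing call."""
    request: Dict[str, Any] = {
        "prompt": prompt,
        "height": size,
        "width": size,
        "guidance_scale": 2.5,  # assumed value, adjust to taste
        "seed": seed,
    }
    if input_images:
        # Editing or composition: pass the context images along with the prompt.
        request["input_images"] = list(input_images)
        request["img_guidance_scale"] = 1.6  # assumed value
    return request


def run_locally(request: Dict[str, Any], model_id: str = "Shitao/OmniGen-v1"):
    """Load the pipeline and run one request (needs the OmniGen package and a GPU)."""
    # Imported lazily so the helper above works without the package installed.
    from OmniGen import OmniGenPipeline  # assumption: installed from the cloned repo
    pipe = OmniGenPipeline.from_pretrained(model_id)  # downloads weights on first use
    return pipe(**request)


if __name__ == "__main__":
    images = run_locally(build_request("A winter coat with a herringbone pattern"))
    images[0].save("coat.png")
```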

If you are a little more experienced, you can use ComfyUI to generate images. To install OmniGen, simply open the manager, find the OmniGen node and install it. Once you’re done, restart ComfyUI, and that’s it. The first time it runs, the node will download the weights by itself.

We were able to test the model, and it takes much longer to generate images than SD 3.5 or Flux. Its strength is not quality but accuracy, meaning some images may lack detail or realism but will show high levels of prompt adherence, particularly when dealing with natural language prompts during edits.

In its current state, OmniGen is not a good image generator for those looking for a model that can beat Flux or SD 3.5. However, this model does not intend to be that.

For those looking for an AI-powered image editor, it’s probably one of the most powerful and user-friendly options currently available. With simple commands, it achieves results similar to those that professional AI artists get from very complex workflows built on highly specialized tools.

Overall, the model is a great alternative for beginners testing the waters of open source AI. It could also be great for professional AI artists if they fold its powerful capabilities into their own workflows. It could significantly simplify those workflows, going from dozens of different nodes to a single generation with fewer things to run and load.

For example, using it as a primary source to merge different elements into a composition, and then denoising that result so that it can go through a second pass with a more powerful AI model, could prove a very effective and versatile way to achieve great generations.
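The two-pass idea can be sketched in Python as a small function that chains two stages. Everything here is hypothetical scaffolding: `compose` stands in for an OmniGen call that merges the source images, `refine` stands in for an image-to-image pass with a stronger model such as Flux or SD 3.5, and the default `denoise_strength` of 0.4 is only an illustrative starting point.

```python
from typing import Any, Callable, Sequence

# Hypothetical stage signatures: any callables with these shapes will do.
ComposeFn = Callable[[str, Sequence[str]], Any]  # (prompt, source images) -> draft
RefineFn = Callable[[Any, str, float], Any]      # (draft, prompt, strength) -> final


def two_pass(compose: ComposeFn,
             refine: RefineFn,
             prompt: str,
             sources: Sequence[str],
             denoise_strength: float = 0.4) -> Any:
    """Compose a draft from the sources, then partially re-denoise it."""
    if not 0.0 < denoise_strength <= 1.0:
        raise ValueError("denoise_strength must be in (0, 1]")
    # First pass: accuracy. The layout and subjects follow the prompt.
    draft = compose(prompt, sources)
    # Second pass: quality. A lower strength keeps more of the draft's composition.
    return refine(draft, prompt, denoise_strength)
```

Keeping the stages as plain callables means either one can be swapped for a ComfyUI node, a local pipeline, or a hosted API without touching the rest of the workflow.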
