Stable Diffusion by Stability AI can generate images from text prompts or input images, but it offers limited control over the process. A new neural network structure developed by researchers at Stanford University is changing that and could give it a leg up on competitors like Midjourney and Lensa.
ControlNet copies the weights of each block of Stable Diffusion into a trainable copy and a locked copy. The locked copy preserves the capabilities of the production-ready diffusion model, while the trainable copy learns the new condition, allowing users to fine-tune image synthesis with small datasets.
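In rough PyTorch terms, the idea looks something like the sketch below: a frozen pretrained block paired with a trainable copy whose output feeds back through a zero-initialized convolution. This is an illustrative simplification under my own assumptions, not the official implementation; the class, argument, and variable names are hypothetical.

```python
import copy
import torch.nn as nn

class ControlledBlock(nn.Module):
    """Pairs a locked Stable Diffusion block with a trainable copy.

    The locked block keeps the pretrained weights frozen, while the
    trainable copy receives the extra conditioning signal. Its output is
    added back through a zero-initialized 1x1 convolution, so at the
    start of training the combined block behaves exactly like the
    original model. (Illustrative sketch only; names are hypothetical.)
    """

    def __init__(self, pretrained_block: nn.Module, channels: int):
        super().__init__()
        self.trainable = copy.deepcopy(pretrained_block)  # learns the condition
        self.locked = pretrained_block
        for p in self.locked.parameters():
            p.requires_grad = False  # preserve the production weights
        self.zero_conv = nn.Conv2d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.zero_conv.weight)  # zero init: no effect at step 0
        nn.init.zeros_(self.zero_conv.bias)

    def forward(self, x, condition):
        # The locked path is untouched; the trainable path injects the condition.
        return self.locked(x) + self.zero_conv(self.trainable(x + condition))
```

Because the zero convolution contributes nothing at initialization, the combined model starts out identical to the original Stable Diffusion and only gradually learns to respect the new condition.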
How does ControlNet work?
ControlNet is a new AI technique that allows diffusion models used for image and video creation to be steered by sketches, outlines, depth maps, or human poses. It addresses the issue of spatial consistency, providing an efficient way to tell an AI model which parts of an input image to keep.
ControlNet works by enabling Stable Diffusion models to accept additional input conditions, which give the model more information about how an image or video should be structured. By providing a specific input condition, such as an edge map or a pose skeleton, users can steer the diffusion model toward images or videos with specific characteristics.
The developers of ControlNet have released a range of pre-trained models that showcase the technology's capabilities in image-to-image generation under different conditions, such as edge detection, depth information analysis, sketch processing, or human pose estimation.
For example, the Canny edge model runs an edge detection algorithm over an input image and then uses the resulting edge map, together with a text prompt, to guide diffusion-based image generation. Meanwhile, the Scribble model can turn quick doodles into photorealistic images.
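As a rough illustration, here is how that Canny workflow might look with the Hugging Face diffusers library; the checkpoint identifiers are real published models, but the thresholds, prompt, and file names below are assumptions for the sketch rather than part of ControlNet itself.

```python
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Load the Canny-conditioned ControlNet alongside Stable Diffusion v1.5.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Derive a Canny edge map from the input image; the thresholds are arbitrary here.
image = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(image, 100, 200)
edges = np.stack([edges] * 3, axis=-1)  # 1-channel edge map -> 3-channel image
control_image = Image.fromarray(edges)

# The edge map steers the layout while the text prompt decides the content.
result = pipe(
    "a photorealistic portrait", image=control_image, num_inference_steps=20
).images[0]
result.save("output.png")
```

Swapping in one of the other released checkpoints, such as depth or pose, changes only the conditioning image you prepare; the rest of the pipeline stays the same.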
The technology has already triggered the development of a new generation of toolkits for creators. And with spatial consistency solved, new advances in temporal consistency and AI cinema are expected.
Still, some traditional artists and photographers worry about the potential dangers of such technology: not being compensated when their work is used to train the models, loss of income from competing against AI, and AI deepfakes spreading misinformation or tricking unsuspecting victims.
Several pending lawsuits and U.S. Copyright Office decisions (along with a Supreme Court decision on Section 230) could fundamentally change how these AI tools work. For now, however, generative AI is the Wild West, and it's filled with some powerful tools.