A user’s guide to AI image generators


Amber Samdahl, using DALL-E 3

In the last year, you have likely heard about the rise of groundbreaking tools such as DALL-E, Midjourney and Stable Diffusion. Trained on vast datasets of existing images, these AI image generators can create stunning visuals, often from a simple text description, unlocking the power of creative visual communication for millions. What makes them particularly extraordinary is their ability to generate and merge styles and concepts to create a broad spectrum of imagery, from hyperreal photography to abstract artistic paintings. 

However, the rapid rise of these tools has quickly brought to light the complexity of copyright issues in this space. Because these tools were trained on existing images, a question arises: Does the AI’s output infringe upon the copyright of the original artists whose works were used as training data? And how do you determine originality and authorship?

Traditional domestic and international copyright laws are built around human creativity and expression. When an AI creates an image, it challenges our conventional understanding of authorship. Copyright law varies by country and often does not explicitly address AI-generated content, leading to uncertainties and calls for legal reforms.

Balancing the protection of existing artists’ rights with the innovative potential of AI is an ongoing challenge that will play out for years to come. We recommend reading the fine print on any gAI tool you’re using. Check what images its model has been trained on, read how the company has credited or compensated artists, and check how it addresses prompts that may pull from copyrighted material.

In addition to copyright concerns, AI image generators are known for surfacing the biases represented in their datasets, resulting in the perpetuation and amplification of societal prejudices. The current lack of diversity in datasets is a concerning and ongoing issue. If these datasets lack diversity or contain historical biases, the AI is more likely to reproduce these limitations in its output.

This can manifest in various forms, such as reinforcement of stereotypes and underrepresentation of certain ethnicities, genders or cultures. Generative art tools also have a significant bias toward creating images of humans that reflect the narrow ideas of beauty perpetuated in popular culture. As a result, it is critical to exercise human oversight on any AI image generation with an eye to these concerns.

How to create successful generative AI imagery

To get the most out of generative art tools, a well-constructed prompt is key. And you may need some patience at first as you work through various iterations to achieve the image in your mind.

A good prompt has specific descriptions. Be precise about what you want. Include details about the subject, setting, mood, color scheme and style. For instance, rather than saying “a dog,” specify “a golden retriever in a sunny park with autumn leaves.”

But you will need to find a balance in how much detail you include in your prompt. Too many details can confuse the AI, and practice is key to finding a balance that is descriptive while still leaving room for creative interpretation. Regardless of the level of detail, use clear, plain language. Avoid jargon, ambiguities and contradictions that could mislead the AI.
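To make the structure of a good prompt concrete, here is a minimal sketch of a helper that assembles a prompt from the elements discussed above (subject, setting, mood, color scheme, style). The function and field names are our own illustration, not part of any tool’s API:

```python
# A minimal sketch: assemble a text-to-image prompt from the elements
# discussed above. The field names are illustrative, not any tool's API.

def build_prompt(subject, setting="", mood="", colors="", style=""):
    """Join the non-empty descriptive elements into one comma-separated prompt."""
    parts = [subject, setting, mood, colors, style]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    subject="a golden retriever",
    setting="in a sunny park with autumn leaves",
    mood="warm and playful",
    style="hyperrealistic photograph",
)
print(prompt)
```

Keeping each element in its own slot makes it easy to iterate: swap out just the mood or the style between generations and compare results.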

Once your images are generated, watch out for oddities with people and text. Hands, limbs, fingers and teeth can be tricky for AI image generators, though newer models are improving (rendering legible text inside images remains the weaker spot). And we can’t stress enough that all of these tools require human oversight. Any imagery generated needs a critical artist’s eye (and hand) reviewing it before release.

Terms

  • AI model vs. generative AI tool: An AI model is a set of instructions that have been trained on data to make predictions and generate output based on input data. A generative AI tool (such as DALL-E or Midjourney) is an application or system that uses one or more AI models to create content. Think of the model as the recipe book and the generative AI tool as the chef. The chef uses the recipe book (model) to understand the dish, gathers the necessary ingredients (data) and applies the techniques (algorithms) to create the final dish (generated image).
  • Autoregressive models handle images by breaking them down into a series of pixels, much like how a sentence is broken down into a series of words. Autoregressive models predict the future output of pixels based on patterns determined by past data.
  • Diffusion models are like starting with a clear picture and then gradually blurring it with random noise. This is called forward diffusion. The real trick then is for the model to remove the noise to bring back the original image or even create a brand new one. This step is called reverse diffusion. Imagine you start with a clear picture, then gradually add pixel noise to it until you have a noisy mess. Now, you must remove the noise and clean it back up to reveal a picture underneath. When these diffusion models are combined with powerful language models, they can take text input, build upon their context awareness and generate realistic images. Diffusion models are the basis of tools like DALL-E 3, Imagen from Google, Stable Diffusion and Midjourney.
  • Generative Adversarial Networks (GANs) can be compared to a smart guessing game between two competing neural networks. In this game, one player (the artist) draws a picture, and the other player (the critic) has to figure out whether it’s a real photo or just a drawing. The more they play back and forth, the better the artist gets at drawing lifelike images, and the critic gets sharper at spotting fakes. This is the concept behind “deepfakes.”
  • Variational autoencoders (VAEs) are like artists who take an image, simplify it to its basics and then rebuild it. This process helps them learn about the picture’s details and characteristics to be able to generate new images.
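The forward-diffusion idea described above can be sketched in a few lines: repeatedly mix an “image” (here, just a short list of pixel values) with Gaussian noise until the original signal has mostly decayed. This is a toy illustration of the noising step only; a real diffusion model learns the reverse (denoising) step with a neural network, which we do not reproduce here, and the schedule constant `beta` is an arbitrary value chosen for illustration:

```python
import math
import random

def forward_diffusion(pixels, steps=50, beta=0.05):
    """Toy forward diffusion: blend the signal with Gaussian noise each step.

    After enough steps, the 'image' is dominated by noise -- this is the
    'noisy mess' a real model then learns to reverse.
    """
    x = list(pixels)
    for _ in range(steps):
        # Each step keeps sqrt(1 - beta) of the signal and adds
        # sqrt(beta)-scaled Gaussian noise.
        x = [math.sqrt(1 - beta) * v + math.sqrt(beta) * random.gauss(0, 1)
             for v in x]
    return x

clean = [0.9, 0.1, 0.5, 0.7]   # a tiny "image" of pixel intensities
noisy = forward_diffusion(clean)
print(noisy)                    # mostly noise; the original signal has decayed
```

After 50 steps at this rate, only about a quarter of the original signal survives in each pixel, which is why the reverse process has to be learned rather than computed directly.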

Tools

Below, we have analyzed some of the most popular AI image generators. To demonstrate their abilities, we fed each of them the same three prompts.  

  • Prompt 1: Photoreal person
    • “A weary, middle-aged woman with dark brown hair pinned back into a secure bun. Her face is creased with smile wrinkles. She is wearing a floor-length dress with long sleeves and a heavy woolen shawl wrapped around the shoulders with scalloped edges. The image should be high quality, hyperrealistic, 8K.”
  • Prompt 2: Photoreal landscape
    • “A rustic cabin with a sloping roof and a small lean-to kitchen attached to one side. The cabin is made of hand cut slats of wood. It has a wooden door and a window with a curtain. The cabin is surrounded by cornfields, has a tributary running behind it, and a dirt road nearby. It is sunset. The image should be high quality, hyperrealistic and in a 16×9 format.”
  • Prompt 3: Artistic interpretation
    • “Create a visually striking representation of the theme of ‘Community’ interpreted through the art of origami paper folding. The composition should be rich in color and detail, capturing the essence of origami with its precise folds and geometric shapes. Emphasize the balance and symmetry that origami brings, reflecting how different elements come together to form a cohesive and vibrant community.”
  • Adobe Firefly
    • TLDR: An image generator you can share with your less tech-savvy friends. Bonus: Many of us already have access to it through Adobe Creative Cloud licenses at our stations.  
    • Description: Adobe Firefly is a diffusion text-to-image model that is both a standalone tool and integrated into several Adobe Creative Cloud apps. It has a very intuitive graphical interface for generating and iterating on images. Adobe, which markets Firefly as a tool for artists, has a much stronger ethical code than many other gAI providers. Overall, the company is thinking a lot about the legal and ethical concerns of artists, writers and many others. Firefly’s big selling point is that Adobe trained its AI on its own deep library of images. That means it’s not currently being sued for copyright infringement, and Adobe actually goes so far as to indemnify creators who use its Firefly tool against certain legal actions. Adobe is also continuing to make sure its products are being used ethically by adding content credentials showing when content is AI-generated.
    • Availability: Web, app, integrated into Creative Cloud applications. 
    • Copyright: Firefly does have a set of guidelines. You cannot use any images generated in Firefly Beta commercially, and Adobe can attach content credentials in the metadata. You are not allowed to train or modify the gAI model itself. Overall, Adobe has made this a tool for artists and seems the most conscious of rights, both in the dataset and what the model produces. 
    • Cost: No cost for those with an Adobe account. However, you have only 25 credits per month. Premium gets you 100 monthly credits for $4.99/month.
    • Pros:
      • Offers support across a variety of creative applications in the Adobe system.
      • Integrated throughout Adobe products including Photoshop, Illustrator and Express. Adobe Firefly Video AI will be launching for Premiere soon.
      • Creates a collection of four images from one prompt, which can be helpful for ideation.
      • Can easily customize prompts through visual, intuitive UX menu, including options for aspect ratio, image style, visual effects, color tone, lighting, composition.
      • Can emulate camera lenses by customizing the aperture, shutter speed and field of view.
      • Additional options allow you to see similar images, apply a generative fill to specific areas within the image and bring the generated image into Adobe Express.
      • Has fairly strict filters to prevent sexual or offensive content. 
      • Maintains a support community through the Discord platform where you can go to live demonstrations and ask moderators questions.
    • Cons:
      • Need an Adobe account to use.
      • You have only 25 credits per month with an account and must pay an additional $4.99/month for the premium level.
      • The web version, especially the background removal tool, is buggy.
      • Sometimes the filters are too aggressive and restrict content unnecessarily.
      • Due to Firefly’s relative newness, it is a little behind in image quality.
      • App requires a full Adobe Creative Cloud license.

Prompt 1

Prompt 2

Prompt 3

  • DALL-E 3 
    • TLDR: An accurate, easy-to-use, conversational image generator integrated with ChatGPT.
    • Description: OpenAI’s generative art tool is integrated with ChatGPT Plus, which makes it feel less intimidating to use, almost as if you have a friend helping you along the way. ChatGPT takes your prompts and transforms them under the hood into more descriptive forms, which allows for more diversity of imagery and more accuracy of detail. With its conversational interface, it’s also very easy to make updates to your generated images.
    • Availability: Available to all ChatGPT Plus and Enterprise users. Also available through Bing Image Creator, which is integrated into Microsoft Copilot and Bing Chat.
    • Cost: Free through Bing Image Creator with a Microsoft account. Also included with ChatGPT Plus, which costs $20/month.
    • Copyright: 
      • DALL-E 3 has mitigations in place to decline prompts that ask for a public figure by name or that ask for an image in the style of a living artist. 
      • Creators can opt their images out from training of future image generation models.
      • OpenAI has usage policies and terms of use that state that creators own the artwork that is generated from their prompts.
    • Pros:
      • You don’t have to be an expert at crafting prompts to be successful. ChatGPT helps with prompt brainstorming and refinement, making it conversational and very easy to use.  
      • Interprets and follows prompts very accurately.
      • Has better diversity of people in generated images than other tools.
      • Can keep all of your generated images in a long, scrolling conversational format to go back to over time. 
    • Cons:
      • The text-based interface is restrictive compared to other graphic user interfaces.  
      • There are limited features for editing generated images. At this point, DALL-E 3 doesn’t support AI image editing — it just re-runs a new prompt if you ask for adjustments.
      • Does not do as well with photoreal imagery as other tools, especially with people. Images tend to have a computer-generated aesthetic.

Prompt 1

Prompt 2

Prompt 3

  • Imagine with Meta AI
    • TLDR: Your basic AI image generator trained on Facebook and Instagram public-facing content. 
    • Description: Very simple user interface to input text prompts and view an array of generated image results. At this time, there are no features to upscale or edit images after the initial generation. You are limited to 1280×1280 images in JPEG format.
    • Availability: Free web interface with a Meta account.
    • Copyright: Meta uses the same copyright policy across all its technologies. It does have a responsibility statement covering how generative AI is monitored and used. All images come stamped with a watermark stating that the image has been “Imagined with AI.”
    • Cost: Free with Meta account (Instagram or Facebook)
    • Pros:
      • Unlimited image generation
      • Recognizes and labels AI-generated images
      • The generated image resolution is pretty good for a free product
      • The images are clean with very little artifacting 
      • Would work well for generating a base image to pull into another application for further editing
    • Cons:
      • No built-in editing, up-resolution or refining of generated images in Meta’s Imagine interface
      • No negative prompting (the ability to not include something in the generated image)
      • Does not save a record of your generated images over time

Prompt 1

Prompt 2

Prompt 3

  • Leonardo
    • TLDR: An all-in-one AI image generator and animator, geared toward artists and content creators.
    • Description: Marketed toward creating production-level quality images, Leonardo allows for text-to-image, image-to-image, image animation in 4-second clips, sketch-to-image, real-time image generation and texture generation (in its Alpha stage as of this writing). Leonardo offers several different gAI models to select how to render your imagery (including Stable Diffusion models). 
    • Availability: Web interface, iOS app, Android app in development
    • Copyright: While there is no plain-language page about copyright, the Terms of Service state that users with a paid subscription retain full ownership, copyright and all other intellectual property rights — while the subscription is active. Users must set their images to private to prevent Leonardo from using them for its own promotional, development and training purposes.
    • Cost: Leonardo works off of a token-based system for image generation and up-resolution. Users receive 150 tokens for free per day (for reference, images can be generated for roughly 8–25 tokens per image). Paid accounts range from $12–48/month and come with extra benefits: an increased token allowance, faster image generation and access to premium features.
    • Pros (web review only):
      • Easy to use web interface, clear layout of buttons and interface windows 
      • All the different facets of the program work seamlessly with each other. You can quickly go from text to image generation to animation in just a few clicks. 
      • One-stop shop for all your AI image-generation needs
      • Has an NSFW filter for when the tool is used by minors
      • Will flag and remove any copyrighted material
      • Has negative prompting capabilities (the ability to express what not to include in an image)
      • Will separate foreground from background and give an image with Alpha transparency for use 
      • Image generations happen pretty quickly, and depending on which model you have selected, it can create up to eight generations per prompt. However, most models cap at four.
      • Can build and train personal datasets for the image generator to work with
      • Can create four seconds of AI-generated animation
    • Cons:
      • Diversity is difficult to generate without an extremely detailed prompt. Exhibits a bias towards producing images of humans that mirror mainstream ideals of beauty found across Western popular culture.
      • Has little understanding of jargon surrounding media creation (such as camera location, angle of shot, shot framing) and struggles with prompts explaining where the subjects should look in the scene. 
      • With a free account, all the images generated are considered public CC-0 licensed images.
      • Does not successfully generate text as understandable information in an image.

Prompt 1

Prompt 2

Prompt 3

  • Midjourney
    • TLDR: The tool that sparked interest in gAI and its creative potential in 2022. It is still the top choice for photoreal, high-quality images.
    • Description: Midjourney scrapes the internet to create its huge dataset, which also helps it create high-quality, hyperrealistic, metahuman-style images of people. And even though they have made it almost all the way across the “uncanny valley,” Midjourney’s images of people still have a certain look. Whether you know it or not, you’ve almost certainly seen Midjourney images of people (especially in downmarket banner ads on the web).
    • Availability: On Discord by inviting the Midjourney bot into your server. However, Midjourney is testing out Midjourney Alpha, a web-based version of the Discord tool, which looks like it will be coming to users soon.
    • Copyright: You technically own the image, but Midjourney reserves the right to use your images and prompts however it pleases. The copyright rules for Midjourney, like most image-related AI, are unclear and will likely be determined by law in the future. In the meantime, Midjourney states its copyright rules on its website.
    • Cost: $10/month for Basic Plan.
    • Pros:
      • Creates a collection of four images from one prompt, which can be helpful for ideation.
      • Easy to upscale images and iterate on images with a simple click.
      • Has detailed and clear documentation of how to use Midjourney on its website.
      • Not a steep learning curve.
      • Due to its somewhat lax copyright rules, Midjourney can recreate art styles incredibly well, which makes it great for experimenting.
      • Has trained on a large database, which allows it to create a large variety of high-quality images.
      • Has a Discord server where you can ask questions and see what other people are making.
      • You can blend images — upload 2–5 images and choose pieces from each to include in your new generation.
    • Cons:
      • Clunkier user experience. Until the web version is available, access is limited to Discord by direct messaging the Midjourney bot. The prompts are not as intuitive as interacting with DALL-E.
      • Due to its lax copyright rules, it could infringe on actual artists’ styles and work. 
      • Uses the internet to build its image database, so has a high likelihood of reproducing bias. This could also lead to copyright issues.
      • Due to a large amount of people using Midjourney, it can be somewhat slow. 
      • Less editable than other gAI text-to-image options. 

Prompt 1

Prompt 2

Prompt 3

  • Stable Diffusion
    • TLDR: A flexible AI image generator that creates realistic images from text with less training data. However, it also faces considerable controversy thanks to that training data and the inclusion of copyrighted images.
    • Description: Developed by Stability AI, Stable Diffusion is a text-to-image latent diffusion model. It is entirely open source, which allows you to fine-tune and personalize the model based on your own training data. Some artists really like Stability AI’s generative art tool, but we’ve found other tools to be better. Plus, Stable Diffusion is the target of the first major IP litigation around the source material used to train gAI models. The main reason you should keep track of this tool is that their experiments in generative video and music hint at a more comprehensive, integrated set of tools for multimedia creators in the future.
    • Availability: Desktop, web; however, web options are not as flexible as the desktop version.
    • Copyright:
      • The model itself is under the Creative ML OpenRAIL-M license, meaning you can change and redistribute modified software. 
      • The images Stable Diffusion co-creates are considered public domain.
      • Stable Diffusion is being sued for unauthorized use of artists’ works in data-training sets. Legally figuring out what can and can’t be used in training AI is somewhat uncharted territory and will affect most text-to-image generators. 
    • Cost: The base version of Stable Diffusion is free to download via GitHub. If you need more tools, there’s a basic plan for $9/month, a standard plan for $49/month, a premium plan for $149 per month and Enterprise options. 
    • Pros:
      • Creates high-quality realistic images. 
      • Has text to video or animation options.
      • Can be fine-tuned on your own data to create more nuanced images. For example, you can feed it your own artwork so that it exclusively creates in that style.
      • Produces high resolutions, up to 1024×1024.
      • Is entirely open source. Can access source code and model weights publicly on GitHub. Allows you to edit the model to truly fine-tune your image generation.
    • Cons:
      • Steep learning curve on how to download and use.
      • Uses the GPU on your computer, so you’ll need a powerful machine for Stable Diffusion to function.
      • Has uncurated datasets, which allow for explicit, stereotyped and offensive content. 
      • Allows for photorealistic misinformation. The model lacks filters to prevent harmful imagery. 
      • Has ethical and legal issues surrounding creating fake content without permission, image ownership and data rights.

Prompt 1

Prompt 2

Prompt 3

When comparing the images generated across these tools, note the similar biases present. For example, in the first prompt, most of the women generated present as white, even though we didn’t specify race or ethnicity.

Then there’s the curious case of age representation — what “middle age” looks like seems to differ vastly across platforms, hinting at a skewed perception embedded in the AI models.

In the second prompt, few images featured lean-to kitchens, spotlighting a significant gap in the diversity of the datasets. In the third prompt, the models created very similar outputs — from the color palettes to the conceptual layouts. Overall, we feel that generative art tools really thrive in abstract and imaginative spaces, beyond the constraints of photorealism.

Many have expressed concern about how these tools could negatively impact the careers of artists. Yet the ability to generate high-quality artwork still hinges on a fundamental understanding of art principles and visual communication. We’ve used the word “tools” a number of times here, and it’s worth remembering that is exactly what these platforms are: tools, necessitating human skill and creativity to wield them effectively. While anyone can generate images with these tools, they unleash their full potential with an artist at the helm.

Kayla LaPoure is an Emerging Media Graphic Designer at Nebraska Public Media Labs. She explores spatial media with an emphasis on accessibility. Kayla creates immersive experiences through 3-D modeling and design, 2-D graphics, UI, illustrations and co-creation with AI tools.

Brandon Ribordy is an Environment Designer for PBS Wisconsin. Brandon is responsible for creating and designing scenic designs, animations, 2-D art and 3-D models for interactive experiences for a wide variety of PBS Wisconsin content. 

Amber Samdahl is the Creative Director at PBS Wisconsin where she oversees the design and emerging media work for the organization. Amber is also a co-founder of the Public Media Innovators peer learning community at NETA, a group dedicated to advancing public media through new technologies, fostering innovation and enhancing audience experiences.
