An Overview of Various AI-Powered Text-to-Image Tools

Generative Artist Taylor Moore talked about various programs for image generation, shared his thoughts on whether Artists can be replaced by AI, and demonstrated lots of cool-looking images generated by him using DALL-E 2, Midjourney, and Disco Diffusion.

Introduction

Hello, my name is Taylor Moore. I am a Canadian Game Developer and Photographer currently residing in Lisbon, Portugal. I studied film at Ryerson University and have worked in the film industry as an editor, post supervisor, and compositor. I then migrated laterally to the game industry, working for Electronic Arts and then Squaresoft on Final Fantasy and Parasite Eve.

At first, I was using Unity as my go-to tool, but for the past 2 years, I've been working in Unreal Engine as a Landscape and Lighting Artist. I am presently doing lighting and cinematics for Feudal Lands, and working on my solo dev project ATMOS, a voxel-based world-level building system in Unreal Engine. 

Getting Into Generative Art

My wife started pointing AI-generated work out to me over the last few months, and only recently have I found the time to take a closer look at AI and its potential uses and applications for gamedev and art creation. My first text prompt, shown below, was "Eadweard Muybridge Studies of Motion".

I started this AI journey one month ago with Disco Diffusion on Google Colab. Colab is quite powerful but too slow for my taste. I then started seeing images online that were being created using Midjourney. I managed to reach out to Chris Stewart, who runs the MidJourney AI Facebook group. Chris made it very clear to me that these invites are like gold, and that I had better be committed. That commitment proved quite intense, and a highly addictive obsession ensued. In just two days and two sleepless nights, I blew through my monthly subscription plan.

These tools draw on datasets containing millions and millions of images to create each generation. For example, DALL-E 2 is said to be accessing upwards of 50 million images.

Tools for Generative Art

There are many tools presently out there in the open. Here is a good list to start with. The AI of these tools analyzes each chunk of the sentence as a digestible segment of data and then attempts to produce an image as closely related to that initial sentence as possible. Through various modifications and variations, you can create some amazing work. Art styles such as Giger, Dali, Picasso, and various impressionistic styles can be applied on all of these AI platforms.

Afterwards, you can push the image through Gigapixel AI to get a large image, bring it into Photoshop for minor cleanup, and then downscale it to a more reasonable size. Many artists generate separate art components and then create a composite mashup in Photoshop.

Here are some interesting pieces of software that you should definitely check out:

Disco Diffusion on Google Colab

This was my first kick at the cat, so to speak. DD has an extremely powerful toolset, though it does not have a very elegant interface. I found great help getting started from this YouTube page. It was quite amazing to use photographs I had shot at a specific location as image_prompts in conjunction with a text prompt. The challenge with DD is that it is quite slow: a 1000-step image in many cases takes over 1 hour to render.

Google is presently building Imagen, a more powerful, faster, and consumer-friendly text-to-image AI. I do know that Google is very hesitant to show much of Imagen.

Midjourney AI

Midjourney has an invite-based onboarding system and uses Discord as its I/O to send and receive calls to the AI servers. The dev DavidH and his team have done an amazing job with the Midjourney Discord integration, and there is a great and growing community. Using simple text and image prompts, you can create some incredible and other-worldly images. Depending on the subscription plan you are on, you can spend that money pretty quickly.

After you submit your initial text prompt, the results slowly reveal themselves as 4 proxy images over 30 seconds. At this point, you can generate variants or fresh generations to bring you closer to your desired ideation. You can modify the aspect ratio within your text prompt, with a maximum resolution of 2048x1280. Once you drill down and are satisfied with your chosen variant, you can then upscale it and pull it down to your local machine.
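To make the aspect ratio limit concrete, here is a minimal Python sketch of the arithmetic involved. It computes the largest image of a given aspect ratio that fits inside the 2048x1280 maximum quoted above. The function name `fit_aspect_ratio` is hypothetical, and this is only an illustration of the math, not a description of how Midjourney itself picks its output size.

```python
from fractions import Fraction

# Maximum resolution quoted in the article (width x height).
MAX_WIDTH, MAX_HEIGHT = 2048, 1280


def fit_aspect_ratio(ratio_w, ratio_h, max_w=MAX_WIDTH, max_h=MAX_HEIGHT):
    """Return the largest (width, height) with the given ratio that fits the bounds.

    Hypothetical helper for illustration only; sizes are truncated to whole pixels.
    """
    if ratio_w <= 0 or ratio_h <= 0:
        raise ValueError("aspect ratio terms must be positive")
    ratio = Fraction(ratio_w, ratio_h)
    # Start from the full width; if that makes the image too tall,
    # use the full height instead and derive the width from it.
    width = max_w
    height = int(width / ratio)
    if height > max_h:
        height = max_h
        width = int(height * ratio)
    return width, height


if __name__ == "__main__":
    for w, h in [(16, 9), (1, 1), (2, 3)]:
        print(f"{w}:{h} ->", fit_aspect_ratio(w, h))
```

Under these assumptions, a wide 16:9 request is limited by the width, while square and portrait requests are limited by the 1280-pixel height.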

The Midjourney team is very responsive about the tool's evolution and grand plans, and keeps the community well informed on issues regarding server scaling, onboarding, and the expanding UI and toolset. Because Midjourney is on Discord, you can access the tool on PC/Mac and on a mobile phone.

DALL-E 2

DALL-E 2 is the image-based system by OpenAI. Onboarding is presently waitlist/invite-only. By mistake or fortune, I was given unlimited generations on day one, which was overwhelming and at many points hilarious. I blissfully hammered away to the point of exhaustion. The next day I was capped at 50 prompts. OpenAI is using this initial research testing to even out the AI models and to see how people respond and react to the code of conduct.

The process is as follows: you submit your text prompt on the app (PC and Mobile) and it generates nine 1024x1024 images. You can then download your selected square hero images or regenerate them from a modified prompt.

There is also a variant option, though it is quite weak in its present state (this is all early days and things do change quickly, across all of the mentioned platforms). There is a cool tool that allows you to paint out certain parts of the displayed image, and then replace the area with a new text prompt.

In addition, DALL-E 2 seems to have a great ability to place hands and feet in accurate and realistic positions, as in an image of a lion driving a motorcycle.

Currently, EA, Square Enix, and Ubisoft have AI labs set up to develop methodologies using all forms of AI to improve their in-house toolsets and art generation.

I have been creating some initial tests with both DALL-E 2 and Midjourney on how they could be used for gamedev. Concept ideation and exploration are incredible with these AI platforms, which really help with generating images that can be overpainted and composited. In other use cases, artists are using them for Materials Creation.

Can AI Replace Artists?

Here is where my objectivity dissipates. I believe that image-based AI and its rapid evolution are going to have a profound and deep impact across many, many creative industries. Please note that my hands/mouth are somewhat tied by the terms of the various Content Holders Policy Terms.

To be frank, I am 63 years old and have used many hardware and software tools over my 40 years in film, post-production, and game development. I can say I have never seen anything as powerful as these tools. The speed of innovation and iteration with these AI tools should not be underestimated.

The ability to tune and scale these AI tools to be resolution independent, the added ability to recall and reuse your seed, and animation and 3D capabilities will all follow soon. The future of image creation will be in a state of dramatic change, but that is nothing new.

Innovation and iteration are the battlecries I hear from other artists who are using these tools in their current workflow. Looking back at the great media disruptions of history, from the wheels falling off the horse with the invention of the auto car to the current deep permeation of the internet and LIDAR camera phones, we are all slaves to technology and its beckoning evolutions.

Conclusion

In closing, with one of these tools, I have been able to generate images of people in any context that are indistinguishable from reality (as we know it). This was quite overwhelming, and in the wrong hands, these tools could be used for very nefarious undertakings. I have come to believe the gatekeepers are very, very anxious and want to hold the door closed, and on the other side are the artists and explorers who want to rip the doors from their hinges.

Final Note: 97% of these images have had NO retouching and come from various platforms. DALL-E 2 images have a watermark on the bottom right.

Taylor Moore, AI Generative Artist

Interview conducted by Arti Burton
