The Machine Learning Powering Generative Art NFTs

Jesus Rodriguez is CTO and co-founder of blockchain data platform IntoTheBlock, as well as chief scientist of AI firm Invector Labs and an active investor, speaker and author in crypto and artificial intelligence.

Artificial intelligence (AI) is becoming increasingly relevant in the non-fungible token (NFT) space. Generative art (that is, art created by an autonomous system) has quickly emerged as one of the main categories of the NFT market, driving innovative projects and astonishing collections. From the works of AI art legends such as Refik Anadol and Sofia Crespo to Tyler Hobbs’s new QQL project, NFTs have become one of the main vehicles for accessing AI-powered art.

Generative art has long been one of the quintessential machine-learning use cases, but only recently has the space achieved mainstream prominence. The leap has been powered mostly by computational gains and a new generation of techniques that let models learn without large labeled datasets, which are scarce and expensive to build. Even though the gap between the generative art community and AI research has narrowed in the last few years, many of the new generative techniques still haven’t been widely adopted by prominent artists, as it takes time to experiment with these new methods.

The rise of generative AI has come as a surprise even to many of the early AI pioneers who mostly saw this discipline as a relatively obscure area of machine learning. The impressive progress in generative AI can be traced back to three main factors:

  1. Multimodal AI: In the last five years, we have seen an explosion of AI methods that can operate across domains such as language, image, video and sound. This has enabled models like DALL-E or Stable Diffusion, which generate images from natural-language descriptions.

  2. Pretrained language models: The emergence of multimodal AI has been accompanied by remarkable progress in language models such as GPT-3. This has enabled the use of language as the input mechanism for producing artistic outputs such as images, sounds or videos. Language has played a paramount role in this new phase of generative AI because it has lowered the barrier for people to interact with generative models.

  3. Diffusion methods: Most of the photorealistic art produced by AI methods today is based on a technique called diffusion models. Before diffusion models came onto the scene, the generative AI space was dominated by methods such as generative adversarial networks (GANs) and variational autoencoders (VAEs), which are difficult to scale and suffer from limited diversity in their outputs. Diffusion models address those limitations with an unconventional approach: progressively destroying training images with noise until they are pure noise, then learning to reconstruct them (a minimal sketch of this noising step follows this list). The reasoning is that if a model can reconstruct an image from what is, effectively, pure noise, it should be able to generate one from almost any starting representation, including conditioning signals from other domains such as language. Not surprisingly, diffusion methods have become the foundation of text-to-image generation models like DALL-E and Stable Diffusion.
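To make the "destroy, then reconstruct" intuition concrete, here is a minimal sketch of the forward (noising) half of the process, assuming the standard DDPM formulation with a linear noise schedule; the array shapes and parameter values are illustrative, not taken from any particular model:

```python
import numpy as np

def forward_diffusion(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) in closed form (DDPM forward process)."""
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]          # cumulative signal retained up to step t
    noise = np.random.randn(*x0.shape)         # epsilon ~ N(0, I)
    # The image is scaled down while Gaussian noise is scaled up; by the
    # final step, x_t is statistically indistinguishable from pure noise.
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return xt, noise

T = 1000
betas = np.linspace(1e-4, 0.02, T)             # linear schedule from the DDPM paper
x0 = np.random.uniform(-1.0, 1.0, (64, 64, 3)) # stand-in for a normalized training image
xt, eps = forward_diffusion(x0, t=T - 1, betas=betas)
print(xt.std())  # close to 1.0: the "image" is now essentially Gaussian noise
```

The reverse half, the reconstruction the article describes, is a neural network trained to predict the added noise `eps` from `xt` at each step; generation then runs that predictor backward, starting from pure noise and ending at a new image.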

The influence of these methods on generative art has coincided with the emergence of another technology trend: NFTs, which have unlocked critically important capabilities for digital art such as digital ownership, programmable incentives and more democratized distribution models.

Text to image

Text-to-image (TTI) synthesis has been the most popular area of generative AI within the NFT community. The TTI space has produced AI models that are crossing over into pop culture. OpenAI’s DALL-E has arguably become the best-known example of TTI used to generate artistic images. GLIDE is another TTI model created by OpenAI that has been adopted in many generative art settings. Google has been dabbling in the generative art space, experimenting with different approaches such as Imagen, which is based on diffusion models, and Parti, which is based on a different technique called autoregressive models. Meta has also been cultivating the generative art community with models like Make-A-Scene. AI startups are m
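For readers who want to see what TTI usage looks like in practice, the sketch below shows the typical pattern, assuming the open-source Hugging Face diffusers library and a publicly released Stable Diffusion checkpoint; the model identifier, prompt and file name are illustrative:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a released Stable Diffusion checkpoint (weights download on first run).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available

# The prompt is the only creative input; the diffusion model does the rest.
image = pipe("an abstract generative artwork of flowing geometric fields").images[0]
image.save("generated_art.png")
```

Note how the natural-language prompt is the entire interface, which is exactly the lowered barrier to entry described earlier: no training, labeling or model expertise is required to produce an image.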