Eden, the garden of artificial delights

Arxiv Insights
9 min read · Mar 15, 2023


We are thrilled to finally launch Eden, a collaborative, social platform for artists, creative technologists and machine learners to deploy and interact with generative AI models. Eden is built around an open-source ecosystem of tools which aims to facilitate discovery and collaboration among creatives, as well as to lower the barrier to entry for artists looking to leverage AI. For all the latest news and updates, join our Discord and follow us on Twitter.

This blogpost will focus on the practical usage of our platform; a more in-depth write-up of this project’s vision and aspirations with regard to open provenance for generative AI can be found here.

Videoloop created with Eden Remix + Real2Real

This post will give an overview of the current features of the Eden App, which can be used directly from the browser: no installation or special hardware needed. For now, our app is only optimized for desktop; smartphone support is on the way!

Create an account

Creating an account is super easy and takes just one minute: you can either connect with an existing MetaMask wallet or simply sign in with your Gmail address or phone number.

We support a variety of ways to connect to Eden

Welcome to the garden

Once you’ve signed in to your account, you’re ready to start browsing / creating! You can take a look at our content feeds or go directly to the “create” tab to make your own!

Overview of our different creation endpoints

Overview

Before diving into each endpoint separately, let’s do a quick overview of what each one does:

  • Create is our “text-to-image” pipeline, allowing you to create images from prompts using SDXL (StableDiffusion XL)
  • Interpolate is an extension of that where you enter multiple prompts and get an interpolation video back that morphs through those prompts
  • Real2Real is like Interpolate, but instead of prompts, this endpoint works off images only. You upload a sequence of images and the model will generate a smooth video morph between your images!
  • Remix lets you upload a single image and create variations of it, no prompts needed.
  • Interrogate lets you upload an image and get back a matching prompt that describes that image.
  • Lora train lets you upload a few (1–5) images of a specific person or object and teaches a custom StableDiffusion model what that person looks like. This takes about 15 minutes and, after training, you can use that custom lora model in all the above endpoints to make artworks of your favorite person! A VERY powerful tool indeed!
  • TTS is a customizable text-to-speech model. Upload a few audio samples of a voice, write some lines of text and have the AI model produce customized audio. Always think twice before using these features, digital consent is an important emerging ethical topic we should all be mindful of.
  • Wav2Lip takes an audio file and animates an image or video of a face to make it speak the audio.
  • Complete calls OpenAI’s GPT models to autocomplete a given text prompt.

Now, let’s go into each of these for some extra juicy details. After all, getting good results with AI often requires a bit of understanding of what goes on under the hood.

Create

Create is our text-to-image endpoint, powered by StableDiffusion. Set your desired image resolution, pick one of the checkpoints (these are all variations of StableDiffusion trained on different datasets and will yield different results, even for the same prompt), enter your prompt and hit create. Simple as that!

If you’re the first person to trigger a creation job in a while, it is possible that our backend will spin up a new GPU box for you, which might take a few minutes. Once a GPU is up and running, image creations shouldn’t take more than 15 seconds.
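
Under the hood, a create job is essentially one text-to-image call against the selected checkpoint. Here is a minimal, illustrative sketch of that step using the open-source diffusers library; Eden’s backend is its own implementation, and the checkpoint name and settings below are just placeholders:

```python
# Minimal sketch of a text-to-image job, assuming the diffusers library.
# Eden's backend differs; this just shows the ingredients: a checkpoint,
# a prompt, a resolution and a few sampler settings.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder: any StableDiffusion checkpoint
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a lone sprout growing in a barren desert, 8k HD nature photo",
    width=768,
    height=512,
    num_inference_steps=30,   # more steps: slower, often cleaner
    guidance_scale=7.5,       # how strongly the prompt steers the image
).images[0]
image.save("creation.jpg")
```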

Optional settings

Every one of our endpoints has an “optional settings” dropdown with parameters that you can adjust if you know what you’re doing.

  • For example, “upscale” will upscale the resolution of your generated image by the given factor. If you want very HD images, upscaling is generally better than simply rendering at higher starting resolutions (width and height): the generators are usually trained for a specific resolution, and going too far beyond that can create repeating artifacts in the image. But feel free to experiment here!
  • “init image” lets you upload an image that the AI model will use as a color and shape template to start drawing from. This allows much more control over what the final image should look like.
  • The “init image strength” controls how heavily this init image influences the final creation; a good first value to try is 0.35 (see the sketch below).
Init image (left) and resulting creation (prompt: portrait of a beautiful, young humanoid robot wearing a veil, digital painting, artstation, concept art, sharp focus, cinematic lighting, cgsociety)
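
To make the init image mechanics concrete, here is a hedged sketch using the diffusers img2img pipeline. Note that the strength mapping is an assumption: diffusers’ `strength` argument measures how much noise is added (how far the result may drift from the init image), so it behaves roughly like the inverse of Eden’s “init image strength”.

```python
# Sketch of init-image guidance, assuming the diffusers img2img pipeline.
# Eden's pipeline is not literally this code; the mapping between Eden's
# "init image strength" and diffusers' `strength` below is an assumption.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("init.jpg").convert("RGB").resize((768, 512))
init_image_strength = 0.35  # how heavily the init image steers the result

image = pipe(
    prompt="portrait of a young humanoid robot wearing a veil, digital painting",
    image=init_image,
    strength=1.0 - init_image_strength,  # diffusers: amount of noise added
    guidance_scale=7.5,
).images[0]
image.save("creation.jpg")
```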

Interpolate

Interpolate lets you create smooth interpolation videos by entering a sequence of prompts. This allows you to create simple, linear video narratives and is fully compatible with custom concept training (LORA), explained below. Here’s a simple videoloop between the following prompts:

  • “a single lone sprout grows in a barren desert, the horizon is visible in the background, low angle 8k HD nature photo”
  • “a lone sappling growing in a field of mud, realistic water colour”
  • “a giant old Tree of life, beautiful, intricate, 8k highly professionally detailed, HDR”
Lerping between 3 consecutive input prompts
A very good, long lerp animation!
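
Conceptually, the prompts are first turned into text embeddings, and the video frames are rendered from points sampled along a path between consecutive embeddings. Below is a minimal, illustrative sketch of that interpolation step, using spherical interpolation (slerp), which keeps the embeddings at a natural magnitude; Eden’s actual frame scheduling is more involved:

```python
# Illustrative sketch of interpolating between two prompt embeddings.
# The embeddings here are random placeholders; in the real pipeline they
# would come from the text encoder, and each interpolated embedding would
# be fed to the image generator to render one frame of the morphing video.
import numpy as np

def slerp(v0: np.ndarray, v1: np.ndarray, t: float) -> np.ndarray:
    """Spherically interpolate between two vectors for t in [0, 1]."""
    v0n, v1n = v0 / np.linalg.norm(v0), v1 / np.linalg.norm(v1)
    omega = np.arccos(np.clip(np.dot(v0n, v1n), -1.0, 1.0))
    if np.isclose(omega, 0.0):
        return (1.0 - t) * v0 + t * v1  # vectors almost parallel: plain lerp
    return (np.sin((1.0 - t) * omega) * v0 + np.sin(t * omega) * v1) / np.sin(omega)

emb_a, emb_b = np.random.randn(768), np.random.randn(768)  # placeholder embeddings
frames = [slerp(emb_a, emb_b, t) for t in np.linspace(0.0, 1.0, 24)]
```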

Real2Real

Real2Real is an algorithm we’re pretty proud of. It essentially does the same as lerp, except that here the input is not a sequence of prompts but a sequence of arbitrary images (currently .jpg images only). The algorithm will then create a smooth video interpolation morphing between those real input images, no prompt engineering required.

Real2Real accepts ANY input image, so you can e.g. import images from MidJourney, use photographs, sketches, video frames, …

Below is an example of a Real2Real morphing between the following input images:

(real) input images
Resulting Real2Real video

Note that while Real2Real accepts literally any input image, the quality of the interpolation will depend on how well the generative model can represent the input images. StableDiffusion, for example, was not particularly trained on faces, so Real2Real tends to give underwhelming results on face interpolations.

Like our other endpoints, Real2Real has a few customization parameters that can dramatically affect the results from this algorithm:

  • FILM iterations: when set to 1, this will post-process the video frames using FILM (Frame Interpolation for Large Motion), dramatically improving the smoothness of the video (and doubling the number of frames).
  • Init image min strength: the minimum strength of the init_imgs during the interpolation. This parameter has a significant effect on the result: low values (e.g. 0.0–0.20) result in interpolations with a longer “visual path length”, i.e. more things change and move; the video carries more visual information at the cost of less smoothness / more jitter. Higher values (e.g. 0.20–0.40) will seem to change more slowly and carry less visual information, but will also be more stable and smoother.
    → Experiment and see what works best for you!
  • Init image max strength: the maximum strength of the init_imgs during the interpolation. Setting this to 1.0 will exactly reproduce the init_imgs at the keyframe positions in the interpolation, at the cost of a brief flicker (due to not being encoded+decoded by VQGAN). Setting this to lower values (e.g. 0.70–0.90) gives the model some freedom to ‘hallucinate’ around the init_img, often creating smoother transitions. Recommended values are 0.90–0.97; experiment! (See the sketch below for how these two strengths interact.)
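
As a rough mental model of how these two settings interact, you can think of them as a per-frame schedule: frames at the keyframes stick closely to the real input images (strength near the max), while frames halfway between two keyframes get the most freedom (strength near the min). The sketch below is an assumption about the shape of that schedule, not Eden’s exact implementation:

```python
# Hedged sketch of how "min" and "max" init image strength could map to a
# per-frame schedule between two keyframes. The cosine shape is an
# illustrative assumption; Eden's actual schedule may differ.
import numpy as np

def init_strength_schedule(n_frames: int, min_strength: float, max_strength: float) -> np.ndarray:
    """Ramp from max_strength at the keyframes down to min_strength midway."""
    t = np.linspace(0.0, 1.0, n_frames)                        # 0 = keyframe A, 1 = keyframe B
    midpoint_weight = 0.5 * (1.0 - np.cos(2.0 * np.pi * t))    # 0 at keyframes, 1 halfway
    return max_strength - (max_strength - min_strength) * midpoint_weight

print(init_strength_schedule(9, min_strength=0.2, max_strength=0.95).round(2))
```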

Remix

Remix does exactly what you think it does: it takes an input image and creates a variation of it. The most important parameter here is:

  • Init image strength: controls how much influence the init image has over the final result. Setting this to 0.0 will produce a remix that is entirely based on the ‘guessed prompt’ for the image and not influenced at all by the actual colors / shape of the input image.
initial logo draft for https://remix-alias.vercel.app/
Various remixes of the Alias logo using Eden Remix

Interrogate

This endpoint takes an input image and returns a prompt that describes the input image (this is what powers Remix). You can then use this prompt in other endpoints like create or lerp to generate similar content.
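
Under the hood, this is an image-captioning step. As an illustrative sketch (assuming a BLIP captioning model from the transformers library; Eden’s actual interrogator may use a different model), image-to-prompt looks roughly like this:

```python
# Sketch of image-to-prompt ("interrogation"), assuming a BLIP captioning
# model from Hugging Face transformers. Eden's interrogator may differ.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("input.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
caption = processor.decode(model.generate(**inputs)[0], skip_special_tokens=True)
print(caption)
```

Remix then feeds a prompt like this, together with the original image, back into the img2img path sketched in the Create section, with “init image strength” deciding how much of the original colors and shapes survive.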

LORA Train

LORA is a fascinating technique that lets you teach new visual concepts to the AI model by using just a few example images. Let’s start by showing WHAT it can do, before explaining HOW to do it:

4 training images, cropped so that only a single face is present in the image.
Creations using a LORA trained on the above images

Unlike many apps out there that simply give you a bunch of cool-looking images with your face on them, our tools allow full customization and promptability with the custom concept:

LORA models are fully promptable: “<xander> sitting on the iron throne” (top left) || “<xander> as the emperor in front of the colloseum” (top right) || “<xander> as the pope” (bottom left) || “<xander> taking a shower” (bottom right)

Additionally, once a LORA model is trained, you can use it in any of our other endpoints (create, lerp, real2real, remix, …).

How to train a good LORA model

Training a good LORA model is a bit of a subtle art and requires some experimentation / iteration to get it right. Unfortunately there’s no “fix everything” button here and there are tradeoffs everywhere.

  • Training data: (for training a person) ideally you want a low number (1–6) of HD images of the same face with various poses & backgrounds.
  • Templates: We currently have three templates: ‘person’, ‘object’ and ‘style’. ‘person’ is well tested and should give decent results in most situations, the other two are still very experimental, use at your own risk :)
  • We highly recommend keeping the defaults, but if you know what you’re doing and want to adjust the advanced settings, please refer to the LORA codebase here.

Using a trained LORA model

Once your LORA model is trained, there’s a whole lot of subtle tricks that can make your outputs look amazing if you spend a bit of time tweaking and experimenting.

First of all, to trigger the learned concept in the prompt, you have to embed the name of the lora model (the text in the dropdown menu) into the prompt:

When the name of the LORA model is e.g. “xander”, you need to trigger it in the prompt as <xander>.

Generating images with a LORA using large height x width settings will tend to repeat the learned concept in the image, which is usually not what you want (even though it can sometimes produce funny results). To avoid this, render at lower height x width and simply use upscale.

Rendering with LORA at high resolution without an init_image tends to repeat the learned concept. Prompt: “<xander> taking a selfie in front of the great pyramid of gizeh, Egypt”

Alternatively, a great way to get really good results with LORA is to use an init_image. This init_image does not have to be high quality, as only the rough edges and colors are actually used, but it allows you to control the image composition and also render at a higher native image resolution.

Using a very simple paste / paint init_image (with a low strength of e.g. 0.15) is a great way to steer the desired image content.

The final setting to play with is the “LORA scale”. This controls how strongly the trained LORA model is influencing the generation. A scale of 0 will result in the normal base model and will not show the trained concept at all. A scale of 1.0 will fully add the LORA model (which can sometimes be a little bit too much). Ideal values are usually around 0.7–0.9
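
For intuition, here is a hedged sketch of what such a scale knob does in the open-source diffusers library, where LORA weights are blended into the base model’s attention layers with an adjustable scale. Eden’s LORA integration is its own implementation, and the LORA path and prompt below are placeholders:

```python
# Sketch of applying a trained LORA with an adjustable scale, assuming the
# diffusers library. Eden's integration differs; paths are hypothetical.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/xander_lora")  # hypothetical trained LORA weights

image = pipe(
    prompt="<xander> sitting on the iron throne",  # trigger token in the prompt
    cross_attention_kwargs={"scale": 0.8},         # the "LORA scale" knob
).images[0]
image.save("lora_creation.jpg")
```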

TTS

Our text-to-speech endpoint takes a few short input .wav files of a person speaking + a prompt. Our backend will then customize a text-to-speech model using that audio and return a soundfile speaking the prompt with the person’s voice.
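
As an illustrative sketch of this kind of voice-cloning workflow (assuming the open-source Coqui TTS library and its YourTTS model; Eden’s backend may use a different model entirely), the flow is simply reference audio in, text in, cloned speech out:

```python
# Sketch of reference-based text-to-speech, assuming the Coqui TTS library
# and its YourTTS voice-cloning model. Eden's TTS backend may differ.
from TTS.api import TTS

tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts", progress_bar=False, gpu=False)
tts.tts_to_file(
    text="Welcome to the garden of artificial delights.",
    speaker_wav="voice_sample.wav",   # a short clip of the target voice
    language="en",
    file_path="tts_output.wav",
)
```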

Wav2Lip

Takes an input image + an audio file (e.g. from TTS) and creates a speaking avatar using lip-synching.

Complete

Runs autocomplete on an input text using GPT-3. (This just calls OpenAI’s API.)
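
For completeness, the equivalent call with the openai Python package (pre-1.0 Completion interface) looks roughly like this; the model name and parameters are illustrative, and Eden may use different ones:

```python
# Sketch of a text completion call, assuming the openai package's pre-1.0
# Completion interface. Model name and parameters are illustrative only.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Eden is a garden of artificial delights where",
    max_tokens=64,
    temperature=0.7,
)
print(response["choices"][0]["text"])
```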
