Memes have become an integral part of our online culture, serving as a means of expressing humor, satire, and social commentary. Recent advances in generative AI have revolutionized how we can create and share memes, especially with the emergence of fine-tuning techniques for text-to-image models.
In this blog post, I introduce the process of creating personalized memes using a fine-tuned Stable Diffusion (SD) model as a style transfer tool and present some examples of famous memes that I personalized with my own images.
Before starting, you can find the entire code, which I will be referring to throughout, on my GitHub page here.
Knowledge Base
Before diving into the details, let me briefly cover some of the terms I will be mentioning:
- Dreambooth for Stable Diffusion: Dreambooth, originally published by Google Research, is a fine-tuning technique that makes the Imagen text-to-image model capable of subject-driven generation. The process requires 3–5 images of a subject (a person, a dog, etc.) and embeds the subject instance in the model's output domain by binding it to a unique identifier while re-training the model. Here, an implementation of this technique for SD models is introduced. The training can be done locally; however, it requires at least 10GB of VRAM (further details are explained here). If you don't have that much VRAM, you can run the training in Hugging Face Spaces by renting adequate hardware, or use a website offering fine-tuning services, such as astria.ai.
- Deforum: Deforum is an open-source Python project built on SD. It adds a lot of functionality not found in the default notebook by Stability. One of its key features is a pseudo-text-to-video capability: it generates a sequence of frames from one or more prompts and renders them into a video. It also has built-in functions that smooth the output videos in latent space via inter-frame interpolation. Although I won't be generating videos in this blog, I use Deforum because it also supports text-to-image, image-to-image, and mask-guided image-to-image, which I will discuss below.
After covering this background, we can get started with the steps of creating memes.
Step 1: Model Fine-tuning
The first step is fine-tuning the SD model. I picked SD 1.5 and downloaded the model weights from here. I then ran the training locally, as my PC has an RTX 3080 with 10GB of VRAM, which is enough.
Be careful while picking the images for your fine-tuning: try to have different backgrounds and facial expressions. The images I used are as follows.

After the training, move the model checkpoint file to the models/ subfolder in the repo.
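As a side note on preparing the training photos: SD 1.5 works best with 512×512 inputs, so a centered square crop is a simple way to bring phone photos to that shape. The helper below is a hypothetical sketch of mine, not part of the repo; any image editor does the job just as well.

```python
# Hypothetical helper (not part of the repo) for preparing Dreambooth
# training photos: compute the largest centered square crop box so any
# photo can be resized to the 512x512 resolution SD 1.5 expects.

def center_square_crop(width: int, height: int) -> tuple:
    """Return a (left, top, right, bottom) box for the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

# With Pillow you would then do something like:
#   from PIL import Image
#   img = Image.open("photo.jpg")
#   img.crop(center_square_crop(*img.size)).resize((512, 512)).save("train_01.jpg")

# A 4032x3024 phone photo crops to a centered 3024x3024 square:
assert center_square_crop(4032, 3024) == (504, 0, 3528, 3024)
```
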
Step 2: Preparing the meme template
The second step is downloading the meme template image you want to personalize and then creating a mask of the region of interest. The mask is a greyscale image, with the same resolution as the chosen meme, that tells SD which parts of the image to diffuse. The darker a region of the mask, the more the model diffuses that region in the meme. A pair of examples that I created are as follows:

I have a Microsoft tablet and a stylus pen, and I simply used a drawing app to paint over the ROI in the original template images. This is the lazy approach; one could also use a semantic segmentation model.
Additionally, I masked the target person's entire body so that the arms and overall body shape are also diffused, for more consistency with the target person.
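The mask convention described above (darker means more diffusion) can be sketched in a few lines. This is only a conceptual illustration of the mapping, not Deforum's actual implementation, which is more involved.

```python
# Conceptual sketch of the mask convention: darker mask pixels are
# diffused more strongly, lighter pixels are preserved.
# (Illustration only -- not Deforum's actual internals.)

def effective_strength(mask_value: int, base_strength: float = 0.8) -> float:
    """Map a greyscale mask value (0-255) to an effective diffusion strength.

    0 (black)   -> full base strength: the region is regenerated.
    255 (white) -> zero strength: the region is kept as-is.
    """
    return base_strength * (1.0 - mask_value / 255.0)

assert effective_strength(0) == 0.8       # black region: fully diffused
assert effective_strength(255) == 0.0     # white region: untouched
assert 0.0 < effective_strength(128) < 0.8  # grey: partially diffused
```
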
After finishing the masking, we move both images into a template subfolder. In templates/, we create two subfolders named michael_scott and escobar and move the images into them. Make sure the template and mask images are renamed to source_img.jpg and mask_img.jpg.

By the time this blog is published, I will already have created several templates for your use, but of course, you can create many more with your imagination. Just don't forget to put them in a new template subfolder.
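To avoid a failed run because of a misnamed file, a small check like the following can verify a new template subfolder. The helper name is hypothetical, not part of the repo.

```python
# Hypothetical sanity check (not part of the repo): verify that a
# template subfolder contains the two files the pipeline expects.
from pathlib import Path

def missing_template_files(name, root="templates"):
    """Return the required file names missing from <root>/<name>/."""
    folder = Path(root) / name
    return [f for f in ("source_img.jpg", "mask_img.jpg")
            if not (folder / f).is_file()]

# e.g. missing_template_files("michael_scott") returns [] when both
# source_img.jpg and mask_img.jpg are in place.
```
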
We are all set and can proceed with setting up Deforum.
Step 3: Preparing Deforum settings
The most important step is creating the settings.txt file for the Deforum algorithm for each template. It is basically the list of Deforum parameters, a few of which are important for us: prompts, init_image, and mask_file.
prompts: Unlike normal SD prompting, we don't have to enter many details and cues about the scenery. As we feed an initial image, most of the details in the spatial domain, such as clothing and accessories, are preserved. It is often enough to enter "picture of sks person, realistic face" along with some style cues like "accent lighting, intricate details, high composition".
init_image and mask_file: these parameters are simply the path to source_img.jpg and mask_img.jpg, respectively.
A section of my settings for michael_scott template is the following:
{
"ENABLE_STORY_MODE":"False",
"batch_name":"michael_scott",
"n_batch": 1,
"prompts":[
"picture of sks person, realistic face,...
ultra detailed face, accent lighting, extremely detailed,...
intricate details, high composition"
],
"width":800,
"height":800,
"bit_depth_output":8,
"seed":-1 ,
"seed_behavior":"fixed",
"sampler":"euler_ancestral",
"steps":70,
"scale":10,
"ddim_eta":0.0,
"filename_format":"{timestring}_{index}_{prompt}.png",
"use_init":true,
"init_image":"templates/michael_scott/source_img.jpg",
"strength":0.8,
"use_mask":true,
"use_alpha_as_mask":false,
"invert_mask":false,
"mask_file":"templates/michael_scott/mask_img.jpg",
"mask_brightness_adjust":1.0,
"mask_contrast_adjust":1.0,
"overlay_mask":true,
"mask_overlay_blur":5,
"animation_mode":"None",
.
.
.
}
Before creating a settings file for your own template, please check the ones I created. You can even re-use a settings file by changing only the init_image and mask_file entries, unless you need a specific prompt for your template.
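Re-using a settings file boils down to swapping the three template-specific entries. A sketch, assuming the settings parse as JSON and using a hypothetical helper name:

```python
# Sketch of re-using an existing settings file for a new template by
# swapping only the template-specific entries. (Assumes the settings
# parse as JSON; the helper name is hypothetical.)
import json

def retarget_settings(settings: dict, template: str) -> dict:
    """Return a copy of the settings pointed at another template folder."""
    s = dict(settings)  # shallow copy; leave the original untouched
    s["batch_name"] = template
    s["init_image"] = f"templates/{template}/source_img.jpg"
    s["mask_file"] = f"templates/{template}/mask_img.jpg"
    return s

base = json.loads("""{
  "batch_name": "michael_scott",
  "init_image": "templates/michael_scott/source_img.jpg",
  "mask_file": "templates/michael_scott/mask_img.jpg",
  "strength": 0.8
}""")
new = retarget_settings(base, "escobar")
assert new["init_image"] == "templates/escobar/source_img.jpg"
assert base["batch_name"] == "michael_scott"  # original left untouched
```
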
Step 4: Running Deforum
At this stage, we have a source image, a mask image, and a settings file for our template, and we are ready to run the Deforum script. I created a Python script, run.py, which is a meta-script for controlling Deforum's functions. You need to give two command-line arguments, the name of the template and the name of your fine-tuned model, as follows:
python run.py \
--meme_template="michael_scott" \
--finetuned_model_path="your_finetuned_model.ckpt"
Depending on your hardware, the script can take up to several minutes; for me, it is about 30 seconds. The output image will be saved in the output/ folder; its name will be in the format {timestring}_{index}_{prompt}.png.
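For reference, a command-line interface like the one above can be sketched with argparse. This is only my minimal illustration of such a CLI; the actual run.py in the repo is the reference.

```python
# Minimal illustration of a CLI like the one used above (a sketch only;
# the actual run.py in the repo is the reference implementation).
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="Render a personalized meme with Deforum")
    p.add_argument("--meme_template", required=True,
                   help="name of a subfolder under templates/")
    p.add_argument("--finetuned_model_path", required=True,
                   help="fine-tuned checkpoint file under models/")
    return p

args = build_parser().parse_args([
    "--meme_template", "michael_scott",
    "--finetuned_model_path", "your_finetuned_model.ckpt",
])
assert args.meme_template == "michael_scott"
```
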

Step 5: Adding text to the image
This is where your imagination and sense of humor come into play. We add text to the images, which is a fundamental part of any meme. I haven't implemented a text-adding functionality in my meta-script yet (I will add it soon), so you can use any image editing tool you prefer.
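Until the script gains a text step, even a tiny helper can format a caption before you draw it onto the image with any tool (for example Pillow's ImageDraw.text). The helper below is a hypothetical sketch of mine, not part of the repo.

```python
# Hypothetical caption helper (not part of the repo): upper-case and
# wrap text, classic meme style, before drawing it onto the image.
import textwrap

def wrap_caption(caption: str, max_chars: int = 20) -> str:
    """Wrap a caption into upper-case lines of at most max_chars characters."""
    return "\n".join(textwrap.wrap(caption.upper(), width=max_chars))

assert wrap_caption("hello world", max_chars=5) == "HELLO\nWORLD"
```
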
In addition to the meme at the top of the page, some of the personalized memes I've generated are as follows:


Creating memes without any template
We are, of course, not bound to any meme templates. We can create entirely new memes using our fine-tuned models and some slightly more advanced prompting. For that, we again create a template subfolder, this time with only a settings.txt file (no need for a source or mask image).
For a storytelling meme, we often need multiple images. Deforum allows multiple prompts in a single settings file and renders them one after another. The following settings.txt includes two prompts in a list, separated by a comma; the script will render an image for each prompt and save them in the output/ folder.
{
"ENABLE_STORY_MODE":"False",
"batch_name":"engineering_student",
"n_batch": 1,
"prompts":[
"cool picture of sks person hanging out with female friends, cool outfit, handsome, eye glasses, far picture, whole body picture, realistic face, realistic body, ultra detailed face, realistic legs, accent lighting, extremely detailed, ultra detailed, intricate details, high composition",
"overweight sks person with long beard studying on desk, looking worried, far picture, whole body picture, realistic face, realistic body, ultra detailed face, realistic legs, accent lighting, extremely detailed, ultra detailed, intricate details, high composition"
],
"width":850,
"height":850,
"bit_depth_output":8,
"seed":-1,
"seed_behavior":"random",
"sampler":"klms",
"steps":60,
"scale":7,
"ddim_eta":0.0,
"filename_format":"{timestring}_{index}_{prompt}.png",
"use_init":false,
"init_image":"",
"strength":0.6,
"use_mask":false,
"use_alpha_as_mask":false,
"invert_mask":false,
"mask_file":"",
"mask_brightness_adjust":1.0,
"mask_contrast_adjust":1.0,
"overlay_mask":true,
"mask_overlay_blur":5,
"animation_mode":"None"
}
After running run.py with this settings file, and after some image editing, I created the following meme, referring to how I graduated from college without much of a social life and 12 kilograms heavier.

Thanks for reading my first blog. Have a great day ahead!