How I train LORAs for Stable Diffusion
Hey guys and gals, I get a lot of questions about how to train LORAs, so I’ve finally decided to write an article that walks you through the process. Please note that, from my perspective, the process is quite simple: I do many of the basic steps without much thought, so my explanations might not always be as precise as they could be.
Most of the time, when training LORAs, the dataset plays a critical role. However, there are instances where the dataset might not be as crucial, and in those cases, there’s less you need to do. This article assumes you need to work with the dataset.
Wondering how I get so many images for my datasets? You’ve probably noticed that I always list the number of images used in the LORAs I release, and some of those numbers are quite large. For this, I use Grabber (grabber.exe), which you can find here: https://github.com/Bionus/imgbrd-grabber
This tool has saved me countless hours: it scrapes images from Gelbooru together with their tags, which it saves as .txt files. Make sure your settings match the following:
- The download folder is where all images will be stored. You can name it anything you like.
- A second folder, “text output,” stores .txt files with the same names as the images. These files contain the tags used for training.
Don’t forget to navigate to the “sources” section and select your desired source. Some sources might require credentials, so be mindful of that.
When searching for images, make sure to exclude the following:
-animated -3d_(artwork) -webm -gif -video -real_life -6+girls -comic -sketch -english_text -japanese_text -text -speech_bubble -photo_(medium)
These exclusions come from my experience: the excluded content either hurts training or is irrelevant to it. For instance, I exclude “real_life” and “photo_(medium)” because I don’t make realistic LORAs. Remove any exclusions that don’t fit your goal.
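If a few excluded tags still slip through (for example, from a source that doesn’t apply the same filters), you can check the downloaded tag files afterward. The Python sketch below is an illustration, not part of Grabber. It assumes each .txt file holds the tags for one image, either comma-separated or space-separated, and the function names are hypothetical.

```python
from pathlib import Path

# Tags excluded in the Grabber search above (booru spelling, underscores).
EXCLUDED = {
    "animated", "3d_(artwork)", "webm", "gif", "video", "real_life",
    "6+girls", "comic", "sketch", "english_text", "japanese_text",
    "text", "speech_bubble", "photo_(medium)",
}


def parse_tags(caption: str) -> set[str]:
    """Split a caption into normalized tags.

    Assumption: tags are comma-separated if any comma is present,
    otherwise space-separated. Spaces inside a tag become underscores.
    """
    raw = caption.split(",") if "," in caption else caption.split()
    return {t.strip().lower().replace(" ", "_") for t in raw if t.strip()}


def find_excluded(folder: str) -> dict[str, set[str]]:
    """Map each .txt file name to the excluded tags it still contains."""
    hits = {}
    for txt in sorted(Path(folder).glob("*.txt")):
        bad = parse_tags(txt.read_text(encoding="utf-8")) & EXCLUDED
        if bad:
            hits[txt.name] = bad
    return hits
```

Running `find_excluded("path/to/text output")` returns only the files that need a second look, so you can delete those images along with their tag files.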
Once you’ve made your selections, start the download from the downloads tab.
You might encounter an error prompting you to redownload missed files after the download is complete. I usually select “no” in this case. The app plays a sound when all downloads finish, which is useful for multitasking.
Pruning Dataset Images
For large datasets with thousands of images, you may not need to prune images at all.
At this point, you likely have two folders: 1) images and 2) text files. What’s next?
First, I combine the two folders so that each image sits next to its text file. This makes it easier to delete an image and its corresponding text file at the same time.
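The merge step can be scripted. Below is a minimal sketch with a hypothetical helper name; it assumes each caption shares its image’s file name with a .txt extension, as described for the “text output” folder. It moves the .txt files into the image folder and lists any images left without a caption, which are worth checking before pruning.

```python
import shutil
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}


def merge_captions(image_dir: str, text_dir: str) -> list[str]:
    """Move every .txt from text_dir into image_dir.

    Returns the names of images that still have no same-named .txt caption.
    """
    images = Path(image_dir)
    for txt in Path(text_dir).glob("*.txt"):
        target = images / txt.name
        if not target.exists():  # never overwrite a caption already in place
            shutil.move(str(txt), str(target))
    return sorted(
        p.name
        for p in images.iterdir()
        if p.suffix.lower() in IMAGE_EXTS and not p.with_suffix(".txt").exists()
    )
```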
If you’re familiar with my LORAs, you’ve probably noticed the prompts I use for calling outfits, such as “harunadef,” “harunakim,” “harunagym,” and “harunaswim.” I’ll partly explain their purpose here and delve deeper in the “Pruning Dataset Tags” section.
These prompts represent the outfits of the character. I create a new folder for each major outfit a character wears, often naming it using the character’s name and the outfit abbreviation. This system allows me to quickly identify which images belong to specific folders. In the “Pruning Dataset Tags” section, I’ll apply these outfit names as prompts to the images’ .txt files en masse.
After creating a folder for each outfit that looks promising for training, I carefully sort the main image folder, moving each image together with its text file into the appropriate outfit folder.
The “rnd” folder holds the initial large dataset; keeping it around is convenient because it gives you flexibility.
During this sorting process, I remove poor-quality images. These include images with unappealing art, sketches (which lack color and can be omitted for larger datasets), images with many people, and confusing visuals (if I can’t understand an image, the AI probably won’t either).
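Sorting is manual work, but the rule behind it is simple: an image and its caption always move or get deleted together. A minimal Python sketch of that rule follows; the helper names are hypothetical, and it assumes the same same-name .txt convention.

```python
import shutil
from pathlib import Path


def caption_for(image: Path) -> Path:
    """Caption file that belongs to an image: same name, .txt extension."""
    return image.with_suffix(".txt")


def move_pair(image: str, dest_folder: str) -> None:
    """Move an image and its caption (if present) into dest_folder."""
    img = Path(image)
    dest = Path(dest_folder)
    dest.mkdir(parents=True, exist_ok=True)
    for f in (img, caption_for(img)):
        if f.exists():
            shutil.move(str(f), str(dest / f.name))


def delete_pair(image: str) -> None:
    """Delete a poor-quality image and its caption (if present)."""
    img = Path(image)
    for f in (img, caption_for(img)):
        f.unlink(missing_ok=True)
```

For example, `move_pair("rnd/12345.png", "def")` files one image under the default outfit, and `delete_pair("rnd/67890.jpg")` drops a sketch or crowded image along with its tags.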
Pruning Dataset Tags
Now, with at least two folders (“rnd” and “def”) containing pruned images and their .txt files, it’s time to prune tags. For this, I recommend the Stable Diffusion web UI extension: https://github.com/toshiaki1729/stable-diffusion-webui-dataset-tag-editor
This tag editor is incredibly efficient and user-friendly.
Untick this box when using the extension.
Paste the path of each image folder separately and repeat the following steps for each one.
Next, go to batch-edit captions and remove tags, sorting them by frequency in descending order so the most common tags appear first.
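The extension shows tag counts interactively. If you want the same frequency list outside the web UI (say, to compare two folders), a short Python sketch can produce it. It assumes comma-separated captions and counts each tag at most once per file; the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def split_tags(caption: str) -> list[str]:
    """Comma-separated captions; falls back to whitespace if there is no comma."""
    parts = caption.split(",") if "," in caption else caption.split()
    return [t.strip() for t in parts if t.strip()]


def tag_frequencies(folder: str) -> list[tuple[str, int]]:
    """Count how many caption files contain each tag, most common first."""
    counts = Counter()
    for txt in Path(folder).glob("*.txt"):
        counts.update(set(split_tags(txt.read_text(encoding="utf-8"))))
    return counts.most_common()
```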
What tags should you prune?
For “default” and “extra outfits,” prune tags related to outfits/clothes, hair color, and eye color.
For “rnd,” prune hair color and eye color tags.
It’s relatively straightforward. Just ensure you don’t prune tags related to poses, nudity, or breast size.
Save all your changes after pruning.
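For large folders, the same pruning can also be scripted. The sketch below is an alternative to the extension, not part of it: the color list is a hypothetical starting point, captions are assumed comma-separated, and `extra` is where you would pass outfit tags for “def” and the outfit folders. It only matches exact hair-color and eye-color tags, so pose, nudity, and breast-size tags are left alone.

```python
import re
from pathlib import Path
from typing import AbstractSet

# Hypothetical color list; tags may use spaces or underscores ("blue_eyes").
COLORS = r"(?:aqua|black|blonde|blue|brown|green|grey|orange|pink|purple|red|silver|white|yellow)"
PRUNE = [
    re.compile(rf"^{COLORS}[ _]hair$"),
    re.compile(rf"^{COLORS}[ _]eyes$"),
]


def prune_caption(caption: str, extra: AbstractSet[str] = frozenset()) -> str:
    """Drop tags matching PRUNE or listed in extra; keep the rest in order."""
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    kept = [t for t in tags if t not in extra and not any(p.match(t) for p in PRUNE)]
    return ", ".join(kept)


def prune_folder(folder: str, extra: AbstractSet[str] = frozenset()) -> None:
    """Rewrite every .txt caption in folder in place."""
    for txt in Path(folder).glob("*.txt"):
        text = txt.read_text(encoding="utf-8")
        txt.write_text(prune_caption(text, extra), encoding="utf-8")
```

For “rnd,” call `prune_folder("rnd")`; for an outfit folder, add that outfit’s clothing tags, e.g. `prune_folder("def", {"school uniform"})`.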
Adding Prompt Triggers
Now that outfit-related tags are pruned, it’s time to add the prompt triggers we discussed earlier when organizing images into folders.
How do you add prompt triggers? I use a custom Python script to prepend prompt triggers to all .txt files in a folder at once.
My version of the script adds the “flood” prompt trigger for my flood LORA. Adjust the script’s folder path and keyword to match your situation, then run it from the command prompt.
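The original script isn’t reproduced in this text, so here is a minimal Python sketch of the same idea; it is not the author’s script. It assumes comma-separated captions, takes the folder and trigger word on the command line, and skips files that already start with the trigger so it is safe to run twice. The file name `add_trigger.py` is hypothetical.

```python
import sys
from pathlib import Path


def add_trigger(folder: str, trigger: str) -> int:
    """Prepend `trigger` as the first tag of every .txt caption in folder.

    Files that already start with the trigger are left unchanged.
    Returns the number of files modified.
    """
    changed = 0
    for txt in Path(folder).glob("*.txt"):
        text = txt.read_text(encoding="utf-8").strip()
        tags = [t.strip() for t in text.split(",")] if text else []
        if tags and tags[0] == trigger:
            continue
        txt.write_text(", ".join([trigger] + [t for t in tags if t]), encoding="utf-8")
        changed += 1
    return changed


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python add_trigger.py <caption_folder> <trigger_word>")
    else:
        print(f"updated {add_trigger(sys.argv[1], sys.argv[2])} files")
```

For example, `python add_trigger.py path/to/def harunadef` tags every caption in the “def” folder with the outfit trigger.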
As I write this section, I realize how much work goes into each LORA before even beginning the training phase.
At this point, you’ve organized your dataset into multiple folders, pruned .txt files, and added prompt triggers. Now you’re ready to start training.
Before diving into training, let’s discuss image repeats and the numbers attached to folder names.
These numbers set how many times each image in that folder is repeated in every epoch of training.
How do you choose the right number? I aim for about 400 images per folder, counting repeats (images × repeats). You don’t need to hit that exact number; being close is fine.
For example, if you have:
- 150 images in “def” × 2 repeats = 300
- 108 images in “kim” × 4 repeats = 432
- 98 images in “rnd” × 4 repeats = 392
- 50 images in “gym” × 8 repeats = 400
- 31 images in “swim” × 10 repeats = 310
Each folder lands reasonably close to the 400 target.
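Picking repeats is simple arithmetic: for each folder, choose the whole number of repeats that brings images × repeats closest to the target. The sketch below does that and builds folder names in the `<repeats>_<name>` form used by kohya-ss’s sd-scripts (an assumption about how your trainer reads folder names); the function names are hypothetical.

```python
def suggest_repeats(counts: dict[str, int], target: int = 400) -> dict[str, tuple[int, int]]:
    """Return {folder: (repeats, images_per_epoch)} with images x repeats near target."""
    plan = {}
    for name, n in counts.items():
        if n <= 0:
            raise ValueError(f"folder {name!r} has no images")
        # Nearest whole multiple of n to the target, but never fewer than 1 repeat.
        repeats = max(1, round(target / n))
        plan[name] = (repeats, repeats * n)
    return plan


def folder_name(name: str, repeats: int) -> str:
    """kohya-style dataset folder name: '<repeats>_<name>'."""
    return f"{repeats}_{name}"


if __name__ == "__main__":
    counts = {"def": 150, "kim": 108, "rnd": 98, "gym": 50, "swim": 31}
    for name, (r, total) in suggest_repeats(counts).items():
        print(folder_name(name, r), total)
```

On the counts above, this rule suggests 3 repeats for “def” (450) and 13 for “swim” (403). Since 400 is only a rough target, picking numbers by hand, as in the list above, works just as well.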
I must give credit to Derrian and his LORA training scripts. I use version 6 of his “easy training scripts.” Although he has a newer version, I haven’t explored it yet. You can find it here: https://github.com/derrian-distro/LoRA_Easy_Training_Scripts
Now, let’s walk through the training process:
- Click “no.”
- Click “no” again (unless you’ve saved a configuration and are redoing training).
- Click “no” (you can save configuration files for common training settings).
- Provide the path to your NovelAI model (or whichever model you intend to train on). I typically use NAI.
- Select the folder containing image folders.
- Choose the “name_out” folder created earlier.
- Type “yes” and name your training (usually the character’s name).
- Optional: I typically click “no.”
- Usually, I click “no.”
- Answer “no” (unless you have a specific reason to answer “yes”).
- Select “AdamW8bit default.”
- For the dim size, there’s often confusion. I use 32, which covers most of what’s needed. You can resize LORAs later, so don’t worry too much about it. Lower dims capture less detail and higher dims capture more, but an excessive dim can make the LORA learn irrelevant details.
- Click “cancel.”
- Select “Lora.”
- Click “cancel” (unless you’re familiar with this setting).
- Click “cancel.”
- Choose “Cosine with restarts” (you can opt for “cosine” as well).
- Click “cancel” (this setting essentially does nothing; “cosine” should work as well).
- Click “cancel” (unless you understand this setting, such as using 768×768 training).
- Again, click “cancel” (similar to the previous setting).
- I have an 8GB NVIDIA RTX 3070 Ti, so I enter “2” here. Adjust based on your VRAM.
- Click “epochs.”
- Set the number of epochs (e.g., “10”).
- Choose “yes” to save what you train.
- Click “cancel.”
- Click “no.”
- Click “no” (unless you’re aware of the effect).
- If you set “shuffle” to “yes,” proceed accordingly; otherwise, select “no.”
- Choose “both.”
- Answer “no” (unless you’re confident about this setting).
- Select “fp16” for both options.
- Opt for “cache latents.”
- Click “no.”
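For reference, here is roughly how those answers might map onto command-line flags, assuming Derrian’s scripts drive kohya-ss’s sd-scripts `train_network.py` underneath (an assumption; the v6 prompts may not line up one-to-one). The model path, folders, and output name are placeholders, and settings answered with “no” or “cancel” above are omitted so the trainer’s defaults apply.

```python
import shlex


def build_train_args(
    model: str,
    data_dir: str,
    out_dir: str,
    name: str,
    dim: int = 32,
    batch_size: int = 2,
    epochs: int = 10,
    shuffle_caption: bool = False,
) -> list[str]:
    """Assemble a kohya-ss train_network.py command roughly matching the walkthrough."""
    args = [
        "accelerate", "launch", "train_network.py",
        f"--pretrained_model_name_or_path={model}",
        f"--train_data_dir={data_dir}",       # folder holding the <repeats>_<name> folders
        f"--output_dir={out_dir}",
        f"--output_name={name}",
        "--network_module=networks.lora",     # train a LoRA network
        f"--network_dim={dim}",               # the "dim" answer
        "--optimizer_type=AdamW8bit",
        "--lr_scheduler=cosine_with_restarts",
        f"--train_batch_size={batch_size}",   # lower this if you run out of VRAM
        f"--max_train_epochs={epochs}",
        "--save_every_n_epochs=1",            # keep every epoch for later comparison
        "--mixed_precision=fp16",
        "--save_precision=fp16",
        "--cache_latents",
        "--caption_extension=.txt",
        "--save_model_as=safetensors",
    ]
    if shuffle_caption:
        args.append("--shuffle_caption")
    return args


if __name__ == "__main__":
    # Hypothetical paths; replace with your own model, dataset, and output folders.
    print(shlex.join(build_train_args("models/nai.ckpt", "train", "output", "haruna")))
```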
Now the training process should begin. When it finishes, you’ll find the saved epochs in the output folder. Compare them in an image grid and pick the epoch whose generated images look best. On average, epochs 3 to 10 tend to yield good results.
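One way to build the comparison grid is the AUTOMATIC1111 web UI’s X/Y/Z plot script with a “Prompt S/R” axis: the first value is the text to find in your prompt, and each later value replaces it. The sketch below lists the saved epoch files and builds that value list. It assumes kohya-style file names such as `haruna-000003.safetensors` for epoch 3 (the final file usually has no number suffix and is skipped); adjust the pattern if your files are named differently. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming: "<name>-<zero-padded epoch>.safetensors".
EPOCH_RE = re.compile(r"^(?P<stem>.+)-(?P<epoch>\d+)$")


def epoch_files(out_dir: str, name: str) -> list[tuple[int, str]]:
    """Return (epoch, file stem) pairs for name's checkpoints, sorted by epoch."""
    found = []
    for f in Path(out_dir).glob(f"{name}-*.safetensors"):
        m = EPOCH_RE.match(f.stem)
        if m and m.group("stem") == name:
            found.append((int(m.group("epoch")), f.stem))
    return sorted(found)


def prompt_sr_values(stems: list[str], weight: float = 1.0) -> str:
    """Comma-separated values for a "Prompt S/R" axis, one grid column per epoch."""
    return ", ".join(f"<lora:{s}:{weight}>" for s in stems)
```

Put the first value (for example `<lora:haruna-000001:1.0>`) in your prompt, then paste the whole string into the Prompt S/R field.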
In conclusion, training LORAs involves a comprehensive process that begins with obtaining datasets and progresses through meticulous pruning and preparation. Despite the complexity, the outcome is rewarding: the creation of AI-generated characters that captivate and engage users.
Remember, while this guide gives an overview of my workflow, each step requires attention to detail and experimentation. As you refine your skills, you’ll likely develop your own techniques and strategies. Don’t hesitate to explore different tools and scripts that suit your workflow, and be open to adapting as the field of AI evolves.
Training LORAs requires patience and dedication, but the results are well worth the effort. By curating datasets, organizing images, and utilizing efficient scripts, you can empower your AI model to produce captivating and diverse characters. As you navigate the world of AI character generation, embrace the iterative nature of the process and keep honing your skills for even more impressive results.
And that concludes my guide to training LORAs. If you have any questions or need further guidance, feel free to reach out. Happy character crafting!