Hi, and welcome! I share little details on things I learn while working as an intern on a machine learning project involving instance segmentation. Every developer encounters errors. In this post, I'll walk through a frustrating error I faced while working on the project, and exactly how I resolved it. Hopefully, this saves you some time if you run into something similar.

Little milestone #1

Training a model on Kaggle is smooth until you try to run it locally. I recently trained a model, downloaded my checkpoint_best_total.pth, and set it up in VS Code using a GitHub repository.


After setting up the repository and pointing to my pretrained weights, I ran the inference script, and then came the RuntimeError.


Initially, I thought it had to do with the resolution, so I added a resolution setting and changed its values a couple of times before realising the problem was not coming from there.


After hours of Google searches and a few wrong turns, I finally uncovered the issue: a patch size mismatch between my training configuration and the local repository's defaults. In src/rrfdetr/config.py, the base of the model had its patch size set to 12, but my pretrained weights were trained with a patch size of 16.

You might be wondering what a patch size is. A patch size is the dimension of the smaller image sub-regions (for example, 16×16 pixels) processed by the model rather than the full image. It determines how much of the input is processed at once, balancing computational efficiency against context retention.
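To make this concrete, here is a minimal sketch of how a patch size divides an image into a grid of patches. The numbers are illustrative, not taken from the actual model config:

```python
def patch_grid(image_h, image_w, patch_size):
    """Return (rows, cols, total) for splitting an image into square patches."""
    if image_h % patch_size or image_w % patch_size:
        raise ValueError("image dimensions must be divisible by the patch size")
    rows, cols = image_h // patch_size, image_w // patch_size
    return rows, cols, rows * cols

# A 640x640 input with 16x16 patches gives a 40x40 grid of 1600 patches.
print(patch_grid(640, 640, 16))  # (40, 40, 1600)
```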

Smaller patches generally give higher accuracy at the cost of slower training and higher computational cost, while larger patches trade some accuracy for faster training and more efficient computation.

To get the most out of a model, the patch size is chosen to fit the input data's resolution and the specific task.

Without a patch size or, as in our case, with a patch size mismatch, the model will often crash during initialization or inference, or throw an AssertionError if the input image dimensions are not perfectly divisible by the patch size.
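Here is a small sketch of that divisibility failure, using illustrative values. Note that 640 happens to divide evenly by 16 but not by 12, which mirrors the 12-vs-16 mix-up above:

```python
def patchify_check(image_hw, patch_size):
    """Mimic the divisibility assertion many models run before patchifying."""
    h, w = image_hw
    assert h % patch_size == 0 and w % patch_size == 0, (
        f"image {h}x{w} is not divisible by patch size {patch_size}"
    )

patchify_check((640, 640), 16)    # passes silently
# patchify_check((640, 640), 12)  # AssertionError: 640x640 not divisible by 12
```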

Patch Size vs. Batch Size: Do not confuse patch size (dimensions of a piece of an image) with batch size (number of training examples processed at once).


I then changed the patch size in config.py to 16, and that was it.
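A trick that can confirm you picked the right value: the patch-embedding convolution in the checkpoint has a kernel equal to the training patch size, so its weight shape bakes the patch size in. The exact state_dict key varies by repository, so inspect your own checkpoint; this sketch just works on the weight's shape tuple:

```python
def patch_size_from_weight_shape(shape):
    """Infer patch size from a patch-embed conv weight (embed_dim, in_ch, kh, kw)."""
    _, _, kh, kw = shape
    assert kh == kw, "expected square patches"
    return kh

# e.g. load checkpoint_best_total.pth, find the patch-embed weight,
# and pass its .shape here. (256, 3, 16, 16) is an illustrative shape.
print(patch_size_from_weight_shape((256, 3, 16, 16)))  # 16
```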


After saving the file, I reran the script and the model loaded successfully. Make whatever matching changes your own setup needs, and there you go: your override issue is fixed and you can now run your project.

Sometimes the smallest configuration mismatch can cause the biggest headaches. In my case, a single value in config.py was all that stood between a runtime error and a working model.

If you're running into similar issues with pretrained weights, check your patch size, input resolution, and any architecture-specific parameters. And if this post saved you some time, consider sharing it with someone who might be stuck on the same error.

Now, here is the thing. In ML and software engineering, it is the tiny details we miss that cost us the most debugging time. That is why paying attention to details really matters, and it ends up saving us a lot of time. Do it once and do it well.

Side Note

What is RF-DETR?

RF-DETR is a real-time transformer architecture for object detection and instance segmentation developed by Roboflow, and word on the streets of AI and ML has it that RF-DETR is rapidly rising and competitive with the latest YOLO models.

See you till the next bug. Bye

Good luck as we create magic.