How to prepare a perfect image dataset


Even the most advanced cars require fuel to run. Better fuel means better performance; bad fuel will slow you down . . . if you’re lucky. If you’re not, it will damage the engine. Our recognition engine is a Formula 1 car, but you need to feed it well to keep it in shape. To race with the best, you need to take extra care of what you put into the tank. Read this article before you start to fuel up your account.

1. Balance

One of the most important rules one should keep in mind is balance. The chance that a reference image will be matched to a query image is directly proportional to the reference image’s quality. This means that if your uploaded images differ in terms of quality, they have an unequal chance of being recognized. Please consider the following example: You are building an app to recognize yogurts. You collect images from different sources that significantly differ in quality. For example, you uploaded this image:


and this one:

They both represent the same series of yogurts by Activia—only the flavors are different.

Now, when someone takes a query picture of Activia’s cherry-flavored yogurt, they will surely get the right answer. When the strawberry version is photographed, however, the outcome is much less predictable. In “single-result” mode, it can resemble flipping a coin between strawberry and cherry. In “many results” mode, both will be matched in most cases, but their order might be quite random.

The reason is that our algorithm cannot decide between the two. On one hand, there is the “cherry” reference image, which doesn’t match exactly in terms of surface coverage, but most of the image (in particular, the Danone Activia Selects logo) matches perfectly, so in that region the confidence is quite high. On the other hand, there is the “strawberry” reference image, which matches in terms of surface, but is so blurred that the algorithm can’t be fully sure about the match. As a result, both reference images look like equally confident matches.

The important observation to make here is that it would actually be better to upload both images in the same, lower quality than to mix a sharp image with a blurred one, because this lowers the chance of a false recognition (i.e., recognizing cherry instead of strawberry). In practice, we don’t recommend uploading low-quality images. It really pays off to prepare a consistent database of quality images. Nevertheless, distinguishing among such similar products is always challenging, and one has to be aware that mistakes can happen from time to time.

At this point, one might say that we could easily handle this issue by paying less attention to local confidence and relying heavily on surface coverage instead. That would be the right approach if we could assume that each query image contains the whole object in question, but often that’s not the case. Practice shows that users often frame only a portion of what they want to learn about, which is why we use a hybrid of both measures.

2. Image dimensions and aspect ratio

As you probably already know, we have a lower limit of 480 pixels for reference image dimensions. This means that both the width and the height must be 480px or more. Of course, we cannot stop anyone from upscaling smaller images to this size, but that is only a trick: it adds pixels, not detail, so the image will not be recognized as well as it should be. On the other hand, uploading large images is not great either. We will shrink them anyway, so why waste your time and bandwidth? You can always take one big image, split it into smaller ones, and add them individually. There are some amazing applications of this idea.
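The dimension rule can be checked before uploading. The sketch below is only an illustration: the function names are hypothetical, and the assumption that a large image is shrunk until its shorter side reaches 480px is ours, since the exact size the service shrinks to is not stated here.

```python
MIN_SIDE = 480  # lower limit for both dimensions, as stated above


def meets_minimum(width, height, min_side=MIN_SIDE):
    """Return True when both dimensions reach the lower limit."""
    return width >= min_side and height >= min_side


def shrunk_size(width, height, target_short_side=MIN_SIDE):
    """Size after shrinking so the shorter side equals target_short_side.

    Assumption: this models the shrinking step only roughly.
    Images that are already small enough are returned unchanged;
    nothing is ever upscaled.
    """
    short_side = min(width, height)
    if short_side <= target_short_side:
        return width, height
    scale = target_short_side / short_side
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    print(meets_minimum(640, 480))    # both sides reach 480
    print(meets_minimum(1000, 400))   # height is too small
    print(shrunk_size(4000, 3000))    # a large photo after shrinking
```

Under these assumptions a 4000x3000px photo ends up at 640x480px, so resizing it yourself before the upload saves bandwidth without losing anything.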

The perfect image is 640x480px, but of course we accept all kinds of aspect ratios. Just beware of the extreme ones. With a standard camera view, it’s difficult to take a photo of something that is extremely narrow, not to mention that such a picture will contain only a small portion of useful surface compared to the background. It is also difficult to instruct the user to take such a photo. It might be a better idea to divide a long image into a few shorter ones; this way it is easier to handle whatever part of the object different users choose to photograph.
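Dividing a long image can be planned with a few lines of arithmetic. The following is a minimal sketch, not our actual tooling: `split_boxes` is a hypothetical helper, and the 2:1 limit it uses is the aspect ratio range recommended in the summary of this article. It returns crop boxes as `(left, top, right, bottom)` tuples, the box layout most image libraries expect for cropping.

```python
import math


def split_boxes(width, height, max_ratio=2.0):
    """Split an image into equal tiles along its longer side so that
    each tile has an aspect ratio of at most max_ratio.

    Returns a list of (left, top, right, bottom) crop boxes.
    """
    long_side = max(width, height)
    short_side = min(width, height)
    # Smallest number of tiles that keeps every tile within max_ratio.
    tiles = max(1, math.ceil(long_side / (short_side * max_ratio)))
    boxes = []
    for i in range(tiles):
        start = i * long_side // tiles
        end = (i + 1) * long_side // tiles
        if width >= height:
            boxes.append((start, 0, end, height))   # cut a wide image
        else:
            boxes.append((0, start, width, end))    # cut a tall image
    return boxes


if __name__ == "__main__":
    for box in split_boxes(3000, 480):
        print(box)
```

For a 3000x480px banner this gives four 750x480px tiles. The tiles do not overlap, so an element that straddles a cut is split between two reference images; whether to add overlap is a judgment call that depends on the artwork.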

3. Content

Last but not least, take extra care of what is in the image. Remember that, for now, our engine recognizes only flat, textured surfaces. The reference image should contain only that. It is important to realize that the 480px lower limit is designed for images that contain only useful information. This image:

is a perfect example of what a reference image should not look like.

Although it has the required pixel size, it doesn’t meet our requirements. That’s because the only thing we can actually recognize from this image is the Starbucks logo, which is only about 120 x 120px. The cup as well as the background are irrelevant and should not be present. Unfortunately, this cannot be solved by sending a larger image, because, as already mentioned, large images get squeezed, so the logo will eventually end up being smaller than is required.
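The arithmetic behind this can be sketched as follows. The helper name is hypothetical, and the shrinking model (the shorter side is reduced to 480px) is an assumption made for illustration only.

```python
def region_side_after_shrink(image_w, image_h, region_side,
                             target_short_side=480):
    """Side length of a square region of interest once the whole image
    has been shrunk so that its shorter side is target_short_side.

    Assumption: images are never upscaled, only shrunk.
    """
    scale = min(1.0, target_short_side / min(image_w, image_h))
    return region_side * scale


if __name__ == "__main__":
    # Logo of about 120px in a 640x480px photo.
    print(region_side_after_shrink(640, 480, 120))
    # The same scene shot four times larger: the logo is 480px wide
    # in the file, but shrinks together with the rest of the image.
    print(region_side_after_shrink(2560, 1920, 480))
```

Both calls give a 120px logo under this model, which is why a larger photo of the same scene does not help, while a tighter crop does.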

One way to fix this is by cropping the image. If the remaining portion meets the 480px+ requirement, it will pass. But it would be better if the image contained the original, flat artwork, not its projection onto a surface. Such a projection always introduces some kind of distortion, especially if the surface is viewed from an angle. In this case, the reference image will match objects that are photographed in the same pose, but others might fail. Original artwork as reference will perform significantly better and allow for correct recognition in a wider range of a real object’s poses.
For the Starbucks image, the most desired reference image is this:

4. Summary

Before you upload, check whether your reference images contain the following features:
— high quality, consistent through the entire set
— smaller dimension (the shorter side) equal to 480px or slightly more
— aspect ratio between 1:1 and 2:1 (or 1:2)
— only the flat, recognizable part of the product (no background, frames, packages, or other such things)
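The two measurable items on this list, dimensions and aspect ratio, can be pre-checked automatically. This is a minimal sketch: `review_reference` is a hypothetical helper, and the factor of 2 used to decide what counts as "slightly more" than 480px is an arbitrary assumption. Quality consistency and content still need a human eye.

```python
def review_reference(width, height, min_side=480, max_ratio=2.0,
                     oversize_factor=2.0):
    """Return a list of problems found in an image's dimensions.

    oversize_factor is an arbitrary assumption about how much larger
    than min_side the shorter side may be before a warning is raised.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    problems = []
    short_side = min(width, height)
    long_side = max(width, height)
    if short_side < min_side:
        problems.append(f"shorter side is below {min_side}px")
    elif short_side > min_side * oversize_factor:
        problems.append("image is much larger than needed")
    if long_side / short_side > max_ratio:
        problems.append("aspect ratio is outside 1:2 to 2:1")
    return problems


if __name__ == "__main__":
    for size in [(640, 480), (400, 300), (3000, 480), (4000, 3000)]:
        print(size, review_reference(*size))
```

An empty list means the image passes both dimension checks.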

That’s all. By following these four simple rules, you will build a great reference dataset that provides proper fuel for our mighty engine. Once it launches, it will put you way ahead of your competition. Have a nice trip!

Tomasz Grzywalski, Developer
