Finding duplicate images

One of the more surprising uses of our Image Recognition Technology is “duplicate image” search. Used effectively, it lets any image-heavy operation limit the disk space its images take up by keeping only one copy of each. Here is how we do it.

Let’s say you run a database of products with images, such as a catalogue, a store or a classifieds site. You also let your customers upload images to accompany their descriptions. This is a crucial part of the process, but also a point where you don’t really control what gets uploaded. On an e-commerce site you’ll end up with plenty of similar images, especially when something is in season (and something always is): everybody downloads the same images from Google search results and re-uploads them to your website. There is no effective way to catalogue such huge numbers of images by hand (by file names? by tags? imagine you have thousands of them), so the only way to optimise storage is to check on upload: “is there such an image in our database already?”.

By using pattern matching techniques, our Technology can do exactly that. Because every image is a kind of “pattern”, we can build a database of patterns (which we call “reference images”). You can then query this database to check whether it contains images like the one you submit. Our engine transforms each reference image into a set of “features”, and a set of features is much smaller than a bitmap, so it is much easier to store in the database and search efficiently. What’s more, this feature extraction is tolerant of some modifications (the image may be upside-down or rotated) and even of damage: up to 30% of the image can be hidden and it will still match the pattern.
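
To make the idea of "features instead of bitmaps" concrete, here is a minimal sketch in Python. It is not the engine described above: it swaps in a simple, well-known stand-in called an average hash, which reduces a grayscale image to 64 bits and compares images by counting differing bits. The names `average_hash`, `hamming_distance`, `ReferenceIndex` and the `max_distance` threshold are hypothetical and chosen for illustration only. Unlike the engine described in this post, this toy feature tolerates brightness changes and resizing, but not rotation or hidden regions.

```python
def average_hash(pixels, size=8):
    """Reduce a grayscale image to a size*size-bit integer "feature".

    `pixels` is a list of rows of 0-255 ints; the image is assumed to be
    at least `size` pixels wide and tall (an assumption of this sketch).
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(size):
        y0, y1 = gy * height // size, (gy + 1) * height // size
        for gx in range(size):
            x0, x1 = gx * width // size, (gx + 1) * width // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))  # mean brightness of the cell
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: is it brighter than the image average?
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two features differ."""
    return bin(a ^ b).count("1")


class ReferenceIndex:
    """Hypothetical in-memory store of reference-image features."""

    def __init__(self):
        self._features = {}

    def add(self, name, pixels):
        # Only the small feature is kept, not the bitmap itself.
        self._features[name] = average_hash(pixels)

    def find_duplicates(self, pixels, max_distance=5):
        """Return (distance, name) pairs for stored images close to the query."""
        query = average_hash(pixels)
        hits = [(hamming_distance(query, f), n) for n, f in self._features.items()]
        return sorted(hit for hit in hits if hit[0] <= max_distance)


# Synthetic 16x16 test images: a horizontal gradient and a checkerboard.
gradient = [[x * 16 for x in range(16)] for _ in range(16)]
checker = [[255 if (x // 4 + y // 4) % 2 else 0 for x in range(16)] for y in range(16)]
# A slightly brightened copy of the gradient, standing in for a re-upload.
brightened = [[v + 10 for v in row] for row in gradient]

index = ReferenceIndex()
index.add("gradient", gradient)
index.add("checker", checker)

print(index.find_duplicates(brightened))  # [(0, 'gradient')]
```

The brightened copy produces the same 64-bit feature as the original gradient, so it is reported as a duplicate, while the checkerboard differs in 32 of 64 bits and is ignored. A linear scan like this is fine for a demonstration; a real database of thousands of reference images would need a proper index over the features.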

Sebastian Kwiecień, Business Development Director

