Finding duplicate images

One of the more surprising uses of our Image Recognition Technology is “duplicate image” search. Used effectively, it lets any image-heavy operation limit the disk space its images take up by keeping only one copy of each. Here is how we do it.

Let’s say you operate a database of products with their images, such as a catalogue, a store or a classifieds site. You also let your customers upload images to accompany their descriptions. This is a crucial part of the process, but also a point where you don’t really control what your customers upload. On an e-commerce site you’ll end up with plenty of similar images, especially when something is in season (and something always is): everybody downloads the same images from Google search results and re-uploads them to your website. There is no effective way to catalogue such huge numbers of images by hand (how would you do it: by file name? by tags? with thousands of them?), so the only way to optimise storage is to check “is this image already in our database?”.
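For byte-identical files only, that check can be sketched as below. This is an illustration rather than the engine described in this post: `ImageStore`, `upload` and `stored_copies` are hypothetical names, and a SHA-256 digest only catches exact copies. A resized, re-encoded or rotated copy produces a different digest, which is exactly the gap pattern matching is meant to fill.

```python
import hashlib


class ImageStore:
    """Hypothetical store that keeps one copy of each distinct image file."""

    def __init__(self):
        self._blobs = {}     # digest -> image bytes, stored once
        self._listings = {}  # listing id -> digest of its image

    def upload(self, listing_id, image_bytes):
        """Attach an image to a listing; return True if a new copy was stored."""
        digest = hashlib.sha256(image_bytes).hexdigest()
        is_new = digest not in self._blobs
        if is_new:
            self._blobs[digest] = image_bytes
        # The listing always points at the single stored copy.
        self._listings[listing_id] = digest
        return is_new

    def stored_copies(self):
        """Number of distinct images actually kept on disk."""
        return len(self._blobs)


store = ImageStore()
first = store.upload("listing-1", b"same-picture-bytes")
second = store.upload("listing-2", b"same-picture-bytes")  # re-upload of the same file
third = store.upload("listing-3", b"another-picture")
```

Three listings end up sharing two stored files, because the second upload is recognised as a copy of the first.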

Our Technology can do that using pattern matching techniques. Every image is a kind of “pattern”, so we can build a database of patterns (which we call “reference images”). You can then query this database to check whether it contains images like the one you submit. Our engine transforms each “reference image” into a set of “features”, which makes it much easier to store in the database and search efficiently (a set of features is much smaller than a bitmap). What’s more, this “feature extraction” tolerates some modifications (the image may be upside-down or rotated) and even damage: up to 30% of the image can be hidden and it will still match the pattern.
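To make the idea of a reference database concrete, here is a minimal sketch under loudly stated assumptions. It does not show our feature extraction: it swaps in a simple average hash (one bit per pixel, set where the pixel is brighter than the image mean) as a stand-in, works on tiny grayscale grids instead of real image files, and only handles rotations in 90-degree steps. `ReferenceDatabase`, `average_hash`, `rotate90` and the `max_mismatch` threshold of 0.2 are all hypothetical names and values chosen for illustration.

```python
def average_hash(pixels):
    """Stand-in 'features': one bit per pixel, 1 where brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def rotate90(pixels):
    """Rotate a grid of pixels 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


class ReferenceDatabase:
    """Hypothetical database that stores compact features, not bitmaps."""

    def __init__(self, max_mismatch=0.2):
        self.max_mismatch = max_mismatch  # illustrative threshold (fraction of bits)
        self._features = {}               # reference name -> bit tuple

    def add(self, name, pixels):
        self._features[name] = average_hash(pixels)

    def find(self, pixels):
        """Return the closest reference name, or None if nothing is close enough."""
        best_name, best_ratio = None, None
        candidate = pixels
        for _ in range(4):  # try the query at 0, 90, 180 and 270 degrees
            bits = average_hash(candidate)
            for name, ref in self._features.items():
                if len(ref) != len(bits):
                    continue
                ratio = hamming(ref, bits) / len(bits)
                if best_ratio is None or ratio < best_ratio:
                    best_name, best_ratio = name, ratio
            candidate = rotate90(candidate)
        if best_ratio is not None and best_ratio <= self.max_mismatch:
            return best_name
        return None


logo = [[200, 200, 10, 10], [200, 200, 10, 10], [10, 10, 10, 10], [10, 10, 10, 200]]
stripes = [[10, 200, 10, 200] for _ in range(4)]
damaged = [[10, 200, 10, 10]] + logo[1:]   # one bright pixel of the logo lost
blank = [[128] * 4 for _ in range(4)]      # unrelated flat grey image

db = ReferenceDatabase()
db.add("logo", logo)
db.add("stripes", stripes)

rotated_match = db.find(rotate90(logo))
damaged_match = db.find(damaged)
blank_match = db.find(blank)
```

In this toy setup the rotated and the slightly damaged copies both resolve to the `logo` reference, while the unrelated grey image matches nothing. A production engine would use far richer features, but the shape of the lookup (extract features, compare against stored references, accept within a tolerance) is the same.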

Contact us if you are interested in using image recognition to find duplicate images.


Sebastian Kwiecień, Business Development Director
