Finding duplicate images

One surprising use of our image recognition technology is finding “duplicate” images. Used effectively, this technique lets any image-heavy operation limit the storage its images take up by keeping only one copy of each image. Below we explain how we do it.

Let's say you operate a database of products that also holds images representing those products, such as a catalogue, an online store or a classifieds site. You also allow your customers to upload images to accompany their descriptions. This is a crucial feature, but it is also the point where you don't really control what your customers upload. On an e-commerce site you'll end up with plenty of similar images, especially when something is in season (and something always is): everybody downloads the same image from Google search results and re-uploads it to your website. Since there's no effective way to catalogue huge numbers of images by hand (guess how: by file names? by tags? when you have thousands of them?), the only way to optimise storage is to check each upload: “is there such an image in our database already?”
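The most naive form of that check is an exact-match lookup. Below is a minimal Python sketch, assuming a hypothetical in-memory `ExactDuplicateIndex` keyed by a SHA-256 digest of each uploaded file's bytes. It is not how our engine works: it only catches byte-identical re-uploads, so the same picture re-saved, resized or recompressed slips straight through, which is exactly why pattern matching is needed.

```python
import hashlib
from typing import Dict, Optional


class ExactDuplicateIndex:
    """Hypothetical index mapping a SHA-256 digest of file bytes to an image id."""

    def __init__(self) -> None:
        self._by_digest: Dict[str, str] = {}

    def add(self, image_id: str, data: bytes) -> Optional[str]:
        """Store the image unless identical bytes are already indexed.

        Returns the id of the existing copy, or None if the image was new.
        """
        digest = hashlib.sha256(data).hexdigest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing  # byte-identical duplicate: keep only the first copy
        self._by_digest[digest] = image_id
        return None
```

Uploading the same bytes twice returns the id of the first upload; changing even one byte of the file makes it look like a brand-new image.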

Because it is built on pattern matching techniques, our technology can do exactly that. Since every image is a sort of “pattern”, we can build a database of patterns (which we call “reference images”). You can then query that database to check whether it contains an image like the one you point to. Our engine transforms each “reference image” into a set of “features”, which are much easier to store in the database and search efficiently (a set of features is much smaller than a bitmap). What's more, this “feature extraction” is robust to some modifications (the image may be upside-down or rotated) and even to damage (up to 30% of the image can be hidden and the pattern will still match).
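To illustrate the idea of compact features and a queryable reference database, here is a minimal Python sketch. It swaps in a simple, well-known technique, an 8×8 “average hash”, in place of our engine's feature extraction, which it does not reproduce. `ReferenceDatabase`, `average_hash`, `rotate90` and the `max_distance` threshold are hypothetical names chosen for illustration, and the input is assumed to be an image already downscaled to an 8×8 grayscale grid. Each reference image shrinks to a single 64-bit number, and a query tries all four 90° rotations so an upside-down copy still matches. Unlike our engine, this sketch does not handle arbitrary rotation angles, and it tolerates only small hidden areas.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: the image has already been downscaled to an 8x8 grid of
# grayscale values (0-255); decoding and resizing are out of scope here.
Grid = List[List[int]]


def rotate90(grid: Grid) -> Grid:
    """Rotate a square grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def average_hash(grid: Grid) -> int:
    """Turn a grid into a 64-bit 'feature': one bit per pixel,
    set when that pixel is brighter than the grid's mean."""
    pixels = [p for row in grid for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class ReferenceDatabase:
    """Hypothetical store of 'reference images', kept only as their hashes."""

    def __init__(self, max_distance: int = 10) -> None:
        self.max_distance = max_distance  # illustrative threshold, not tuned
        self._refs: Dict[str, int] = {}

    def add(self, image_id: str, grid: Grid) -> None:
        self._refs[image_id] = average_hash(grid)

    def query(self, grid: Grid) -> Optional[Tuple[str, int]]:
        """Return (image_id, distance) of the closest reference within
        max_distance, or None. All four 90-degree rotations of the query
        are hashed, so rotated or upside-down copies still match."""
        candidates = []
        for _ in range(4):
            candidates.append(average_hash(grid))
            grid = rotate90(grid)
        best: Optional[Tuple[str, int]] = None
        for image_id, ref in self._refs.items():
            distance = min(hamming(h, ref) for h in candidates)
            if distance <= self.max_distance and (best is None or distance < best[1]):
                best = (image_id, distance)
        return best
```

For example, a grid stored with `add` and later queried upside down (`rotate90` applied twice) comes back with distance 0, while an unrelated checkerboard pattern returns `None`. The linear scan over every reference keeps the sketch short; at catalogue scale the lookup would need a proper index.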


Sebastian Kwiecień, Business Development Director
