Finding Nemo

We always keep our customers in mind. To make the tool we provide for them even more powerful, we continuously look for new ways to improve our service. This time, we have introduced a brand new feature: returning the locations of recognized objects.


The need to know the locations of recognized objects has been on our minds ever since we started to consider multiple object recognition in our technology. At first, working with single objects, we were satisfied with “somewhere in the image”, but that turned out to be insufficient for multi-object cases. We wanted to know where exactly the recognized objects were, whether they overlap, whether they… good question: what else could we ask for?


To answer these questions, we developed a position estimation algorithm that returns the location of a recognized object as four corners forming its bounding box (the objects are assumed to have a rectangular envelope), as shown in the figure below. We made sure that the corners are always returned in the same order, so it is easy to point out, for example, which edge of the bounding box is the upper one.
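
A minimal sketch of how a fixed corner order can be used. It assumes, purely for illustration, that the corners arrive as top-left, top-right, bottom-right, bottom-left relative to the object itself, in image coordinates where y grows downward. The actual order is defined by the API documentation, and the function names below are hypothetical.

```python
import math

# Assumption (not confirmed by the source): corners are ordered
# top-left, top-right, bottom-right, bottom-left relative to the object.

def upper_edge(corners):
    """Return the object's upper edge as a pair of (x, y) points."""
    return corners[0], corners[1]

def rotation_degrees(corners):
    """Angle of the upper edge against the image x-axis, in degrees."""
    (x0, y0), (x1, y1) = upper_edge(corners)
    return math.degrees(math.atan2(y1 - y0, x1 - x0))

upright = [(10, 10), (110, 10), (110, 60), (10, 60)]
flipped = [(110, 60), (10, 60), (10, 10), (110, 10)]  # same box, upside-down

print(rotation_degrees(upright))  # 0.0
print(rotation_degrees(flipped))  # 180.0
```

Because the order follows the object rather than the image, an upside-down object yields an upper edge that runs right to left, which is what makes the orientation recoverable.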


An attentive reader may have noticed that we have been using this tool for some time already to mark recognized objects in the images in our previous posts.


Recently, we decided to provide this functionality to our customers. In the new REST API version (‘v2’), the response is extended with the objects’ locations in a convenient JSON format (more details here). Naturally, we ensure full backward compatibility for users who prefer to stick to the previous version.


The corners are always returned as points in 2D space (x, y). Note that they may lie outside the query image (a coordinate can be negative, or greater than the corresponding image dimension). They express, as exactly as possible, the actual position of the object in the image. Thus, one can use them to:

–       determine the object’s position in the image (where are you, baby?)

–       determine what part of the object is in the image (half, quarter?)

–       determine whether the object overlaps with any other one (do you have a friend?)

–       determine the object’s orientation (rotated, upside-down?)

–       determine the skew of the object (close to a rectangle? maybe more of a trapezoid, or a rhomboid?)

–       extract every single object from the image (get scissors, and go ahead)

–       compare dimensions or areas of two (or more) objects (who’s the big boss here?)
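
Two of the items above, comparing areas and estimating what part of the object is in the image, can be sketched as follows. This is an illustration under assumptions, not the service's own code: the corners are taken as a list of (x, y) tuples forming a simple quadrilateral, the image spans from (0, 0) to (width, height), and all function names are hypothetical. The area uses the shoelace formula, and the visible part is found by clipping the quadrilateral against the four image borders (Sutherland-Hodgman clipping).

```python
def polygon_area(points):
    """Shoelace formula: absolute area of a simple polygon."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0

def clip_half_plane(points, axis, bound, keep_greater):
    """Keep the part of a polygon on one side of a vertical (axis=0)
    or horizontal (axis=1) line located at `bound`."""
    def inside(p):
        return p[axis] >= bound if keep_greater else p[axis] <= bound

    def crossing(p, q):
        # p and q lie on opposite sides, so the denominator is non-zero.
        t = (bound - p[axis]) / (q[axis] - p[axis])
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    result = []
    for i in range(len(points)):
        current, previous = points[i], points[i - 1]
        if inside(current):
            if not inside(previous):
                result.append(crossing(previous, current))
            result.append(current)
        elif inside(previous):
            result.append(crossing(previous, current))
    return result

def visible_fraction(corners, width, height):
    """Fraction of the bounding box area that lies inside the image."""
    borders = ((0, 0, True), (0, width, False), (1, 0, True), (1, height, False))
    clipped = list(corners)
    for axis, bound, keep_greater in borders:
        clipped = clip_half_plane(clipped, axis, bound, keep_greater)
        if not clipped:
            return 0.0
    full = polygon_area(corners)
    return polygon_area(clipped) / full if full else 0.0

# A box whose left half sticks out of a 200 x 100 image.
box = [(-50, 20), (50, 20), (50, 80), (-50, 80)]
print(polygon_area(box))                # 6000.0
print(visible_fraction(box, 200, 100))  # 0.5
```

Comparing two objects then comes down to comparing their `polygon_area` values. An overlap test between two boxes could reuse the same clipping idea with one box as the clip region, but that is left out here to keep the sketch short.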


Even more advanced functionality is now within reach. Why not turn the bounding box into a control button that the mobile app user can click to perform some fancy action? Why not put another image (or a movie, or a 3D model) within the frame, using it as an anchor for an augmented reality object?


We give our customers the powerful feature of determining recognized objects’ locations. Why? Because we are glad to broaden their horizons. Because we have full confidence in their creativity, imagination and inventiveness. And because we have never been disappointed before.


Pawel Blazejowski, Developer
