Mapping Mobile Screen Recordings

Recently I have been experimenting with feature matching and image transformation. I looked for some sample code online, based on OpenCV of course, and ended up putting together a little program that maps the scroll position of a webpage onto a reference image of the page.

On mobile devices it can be quite tricky to track a user’s scroll position, especially if we have access to neither the browser’s nor the website’s code.

In the video below, I recorded a short navigation session on the New York Times mobile homepage, took snapshots of the page, and stitched them together to create the reference image. The program first detects features in the reference image using SIFT. It then detects features in the video frame by frame (also with SIFT) and matches them against the reference features using BFMatcher. From the matched pairs it computes a scale ratio to estimate how large the video frame appears within the reference image, and finally uses the centroid of the matched features to estimate the frame’s position, which yields the scroll position. A sketch of this pipeline follows.
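Here is a minimal sketch of that pipeline, assuming OpenCV and NumPy are available. The file names and the scale-and-centroid heuristics are illustrative stand-ins, not the exact script behind the video:

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()

# Detect features once in the tall reference image of the full page.
ref_img = cv2.imread("reference_page.png", cv2.IMREAD_GRAYSCALE)
ref_kp, ref_des = sift.detectAndCompute(ref_img, None)

# Brute-force matcher; the L2 norm suits SIFT's float descriptors.
bf = cv2.BFMatcher(cv2.NORM_L2)

cap = cv2.VideoCapture("recording.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    kp, des = sift.detectAndCompute(gray, None)
    if des is None:
        continue

    # Keep only matches that pass Lowe's ratio test.
    matches = bf.knnMatch(des, ref_des, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    if len(good) < 10:
        continue

    frame_pts = np.float32([kp[m.queryIdx].pt for m in good])
    ref_pts = np.float32([ref_kp[m.trainIdx].pt for m in good])

    # Estimate the frame's scale in the reference image from the ratio of
    # the spreads of the matched points, then locate it via centroids.
    scale = np.std(ref_pts, axis=0).mean() / (np.std(frame_pts, axis=0).mean() + 1e-6)
    centroid_ref = ref_pts.mean(axis=0)
    centroid_frame = frame_pts.mean(axis=0)

    # Scroll position: vertical offset of the frame's top edge in the page.
    scroll_y = centroid_ref[1] - centroid_frame[1] * scale
    print(f"estimated scroll position: {scroll_y:.0f}px")

cap.release()
```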

There is a lot of room for improvement, but this is a proof of concept. I based my script on this tutorial. I also tried switching the detector from SIFT to ORB, based on the example found here, because it is faster, but the matching quality was not as good.
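For reference, swapping in ORB is mostly a two-line change; the one catch worth noting is that ORB produces binary descriptors, so BFMatcher should use the Hamming norm instead of L2 (the `nfeatures` value below is an illustrative choice):

```python
# ORB variant of the detector/matcher setup above.
orb = cv2.ORB_create(nfeatures=2000)
bf = cv2.BFMatcher(cv2.NORM_HAMMING)
ref_kp, ref_des = orb.detectAndCompute(ref_img, None)
```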

If you have any questions or comments, or if you need such a solution for your project, feel free to contact me.