3D Object Rotation Using ArUco Markers in Python

Being a psychologist, I admit I have very little understanding of the math behind pose estimation, but I am fascinated by what can be done with this technology in fields like Augmented Reality (AR) and self-orienting robots (or vacuum cleaners!).

In this article I will be talking about using ArUco markers to detect the pose of an object in camera coordinates and to dynamically visualize that pose in 3D. In the animation below, we can see an object on the left with markers on its surface and a three-dimensional representation of the same object on the right. As markers are detected, rotation parameters are estimated and used to rotate the 3D object on the right.

Animation 1: Example of 3D object rotation using marker tracking.

In the source code, available on GitHub, you can find the following main parts (or you can skip down to the Pose Estimation section).

Helper Functions

1. A little helper function, “which”, that we have seen in a previous article. It finds the indices, within “x”, of the elements listed in “values”; a minimal sketch is shown below.
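
For reference, a minimal sketch of what such a helper can look like (an illustrative reimplementation, not necessarily the exact code from the repository):

def which(x, values):
    # return the indices of the elements of x that also appear in values
    return [i for i, v in enumerate(x) if v in values]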

2. A function to create a custom ArUco dictionary and detection parameters. Details are explained in a previous article in this series; a rough sketch follows.
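
A rough sketch of such a function is shown below; the dictionary size and marker size are placeholder values, and the aruco function names assume opencv-contrib-python 4.x before the 4.7 API change:

import cv2

def create_aruco(n_markers=5, marker_bits=4):
    # placeholder values: a small custom dictionary of 5 markers, each 4x4 bits
    aruco_dict = cv2.aruco.custom_dictionary(n_markers, marker_bits)
    # default detection parameters; thresholds can be tuned here if detection is unstable
    aruco_params = cv2.aruco.DetectorParameters_create()
    return aruco_dict, aruco_params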

3. A function to read a three-dimensional (3D) model in “.obj” format. I based this function and the next one on a neatly written article that walks through reading and plotting 3D objects, and I modified the code a bit to move the origin of the axes to the center of the object:

import numpy as np

def read_obj(filename):
    triangles = []
    vertices = []
    with open(filename) as file:
        for line in file:
            components = line.strip().split()
            if not components:
                continue
            if components[0] == "f":  # face data, e.g. "f 1/1/1 2/2/2 3/3/3"
                # keep only the vertex index of each group (OBJ indices are 1-based)
                indices = list(map(lambda c: int(c.split('/')[0]) - 1, components[1:]))
                # split faces with more than three vertices into consecutive triples
                for i in range(0, len(indices) - 2):
                    triangles.append(indices[i:i + 3])
            elif components[0] == "v":  # vertex data, e.g. "v 1.0 2.0 3.0"
                vertex = list(map(float, components[1:]))
                vertices.append(vertex)
    # shift the coordinates so the axes origin sits at the center of the object
    centered_vertices = np.array(vertices)
    centered_vertices[:, 0] -= centered_vertices[:, 0].max() / 2
    centered_vertices[:, 1] -= centered_vertices[:, 1].max() / 2
    centered_vertices[:, 2] -= centered_vertices[:, 2].min() / 2
    return centered_vertices, np.array(triangles)

4. A function to plot the 3D object we loaded, given the vertex coordinates and the triangles returned by the previous function:

from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas

def plot3d(vertices, triangles):
    # vertices is expected as three coordinate arrays (x, y, z); triangles indexes into them
    fig = Figure()
    canvas = FigureCanvas(fig)
    ax = fig.add_subplot(projection='3d')  # fig.gca(projection='3d') is deprecated in recent Matplotlib
    ax.view_init(elev=270, azim=270)
    ax.set_xlim([-10, 10])
    ax.set_ylim([-10, 10])
    ax.set_zlim([-10, 10])
    ax.plot_trisurf(vertices[0], vertices[1], triangles, vertices[2], shade=True, color='green')
    ax.axis('off')
    return canvas

5. Then, to be able to use the plotted object in OpenCV for image manipulation and for saving the video, I added a function that takes a Matplotlib canvas and returns a NumPy array of the pixels in the plot. An article that goes through the process can be found here:

def canvas2rgb_array(canvas):
    # render the figure, then grab its RGB pixel buffer
    canvas.draw()
    buf = canvas.tostring_rgb()
    ncols, nrows = canvas.get_width_height()
    # reshape the flat buffer into a (height, width, 3) image array
    return np.frombuffer(buf, dtype=np.uint8).reshape(nrows, ncols, 3)

6. The last function in the source code merges two image arrays horizontally, resizing the first image to the height of the second while maintaining its proportions:

def merge_images(image1, image2):
    # if images have different heights
    if image1.shape[0] != image2.shape[0]:
        # calculate the size for the first image to have same height as second while keeping the proportion
        newsize = (int(round((image1.shape[1]*image2.shape[0])/image1.shape[0])), image2.shape[0])
        # resize first image
        image1 = cv2.resize(image1, newsize)
    # if second image has alpha channel, remove it!
    if image2.shape[2] > image1.shape[2]:
        image2 = cv2.cvtColor(image2, cv2.COLOR_RGBA2RGB)
    # return an array with both images merged horizontally
    return np.concatenate((image1, image2), axis=1)
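
Put together, the helpers above can be chained roughly as in the sketch below; the file names are placeholders and this is an illustration rather than the exact code from the repository:

vertices, triangles = read_obj("model.obj")                # load and center the 3D model (placeholder file name)
points = (vertices[:, 0], vertices[:, 1], vertices[:, 2])  # split into x, y, z coordinate arrays
canvas = plot3d(points, triangles)                         # draw the model on an off-screen canvas
plot_img = canvas2rgb_array(canvas)                        # convert the plot into a pixel array
frame = cv2.imread("some_frame.png")                       # any image to place next to it (placeholder)
combined = merge_images(frame, plot_img)                   # image on the left, 3D plot on the right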

Next, the code starts using these functions: it defines the ArUco dictionary, loads the 3D object, and opens the input video file. If the --outputVideo argument is passed to the program, it also opens a “VideoWriter” object to write the output frames. After that, we load the reference image and detect the markers present in it. One detail that differs from the usual workflow is adding an estimate of the distance of the markers in the reference image from the camera plane (Z axis), which is necessary for the pose estimation:

refCorners = [np.append(theseCorners, np.add(np.zeros((1,4,1)), 50), axis = 2) for theseCorners in refCorners]
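
For context, the reference preparation might look roughly like the sketch below; the file name and variable names are my assumptions, and detectMarkers follows the pre-4.7 OpenCV aruco API:

ref_image = cv2.imread("reference.png")  # placeholder file name
ref_gray = cv2.cvtColor(ref_image, cv2.COLOR_BGR2GRAY)
refCorners, refIds, _ = cv2.aruco.detectMarkers(ref_gray, aruco_dict, parameters=aruco_params)
# each detected marker yields a (1, 4, 2) array of corner coordinates; the line above
# then appends a constant Z of 50 to turn them into (1, 4, 3) object points for solvePnP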

Pose Estimation using solvePnP

I have based this part of the program on a very interesting article that demonstrates how easy it is to estimate head pose using OpenCV. A prerequisite of solvePnP is a set of camera parameters, normally obtained from a calibration procedure. For simplicity, we define a rough estimate of these parameters and assume there is no lens distortion:

focal_length = cap.get(3)                # 3 == cv2.CAP_PROP_FRAME_WIDTH: use the frame width as a rough focal length in pixels
center = (cap.get(3)/2, cap.get(4)/2)    # 4 == cv2.CAP_PROP_FRAME_HEIGHT: principal point at the frame center
camera_matrix = np.array(
    [[focal_length, 0, center[0]],
     [0, focal_length, center[1]],
     [0, 0, 1]], dtype="double"
)
dist_coeffs = np.zeros((4, 1))  # assuming no lens distortion

Finally, the “while” loop starts reading frames of the input video. We go through the conversion to grayscale, marker detection, and the concatenation of marker corners from the frame and the reference image to fit the input format of solvePnP, all of which we have covered in a previous article.
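
A rough sketch of those first steps inside the loop, reusing the variable names assumed in the earlier sketches (illustrative, not the exact repository code):

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # detect markers in the current frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    res_corners, res_ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict, parameters=aruco_params)
    if res_ids is None:
        continue
    # match the detected ids against the reference ids (e.g. with the "which" helper), then
    # stack the matched corners into these_res_corners (2D image points) and
    # these_ref_corners (3D object points); the loop continues with solvePnP as shown below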

The loop then uses OpenCV's solvePnP function to estimate the rotation and translation vectors and converts the rotation vector into a rotation matrix:

success, rotation_vector, translation_vector = cv2.solvePnP(these_ref_corners, these_res_corners, camera_matrix, dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)

rotation_matrix, _ = cv2.Rodrigues(rotation_vector)

After that, the rotation matrix is used to transform the vertices of the 3D object based on the estimated rotation from the video frame:

new_points = np.array([np.matmul(rotation_matrix, np.array([p[0], -p[1], p[2]])) for p in vertices])

points = (new_points[:,0], new_points[:,1], new_points[:,2])

The “while” loop then concludes by using the earlier helper functions to plot the transformed vertices, convert the plot into an array, merge it with the video frame, and show (and, if --outputVideo is provided, save) the merged frame, roughly as sketched below.
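
A minimal sketch of those closing steps, assuming the variable names used above and a "writer" variable for the optional VideoWriter (these names are mine, not necessarily the repository's):

# still inside the "while" loop
canvas = plot3d(points, triangles)      # plot the rotated vertices
plot_img = canvas2rgb_array(canvas)     # convert the plot into a pixel array
merged = merge_images(frame, plot_img)  # video frame on the left, 3D plot on the right
cv2.imshow("output", merged)            # show the merged frame
if writer is not None:                  # only when --outputVideo was passed
    writer.write(merged)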

The results are quite interesting and can be a great way to demonstrate what pose estimation makes possible. As usual, if you are trying to apply such functions in your own projects and need some help, feel free to drop me a line using this form or via the social media links below.