Michael Gruner

LightGlue: Local Feature Matching at Light Speed

Updated: Apr 16

Fresh from ICCV 2023, researchers from ETH Zurich and Microsoft Mixed Reality & AI Lab bring us a new state-of-the-art feature matching algorithm: LightGlue.

"We introduce LightGlue, a deep neural network that learns to match local features across images."

The source code and the article are freely available on GitHub and arXiv, respectively.

Key Takeaways

  • LightGlue is a state-of-the-art efficient feature matcher.

  • It's based on the transformer network architecture.

  • It builds and improves upon its predecessor: SuperGlue.

  • It's adaptive: it processes less on easy image pairs, and more on challenging ones.

  • LightGlue is faster than other methods while achieving comparable accuracy.

  • It's open source and it... just works!

What are Features, Anyway?


Features represent points in an image that are interesting in some way: they are easy to detect, distinctive, or stand out from their surroundings. They are typically used to find correspondences between two or more images. Take for example the following pair of images:

Two images side by side, both captured at the same scene but from very different perspectives. There are green points sprinkled throughout both images.
Feature points found in a pair of images

Features provide a "digital signature" of the point, typically known as a descriptor. A good feature extractor computes similar descriptors for the same point in different images, regardless of changes in illumination, perspective, scale or rotation.
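To make the idea of "similar descriptors" a bit more concrete, here's a minimal sketch in plain Python. It compares descriptors with cosine similarity and pairs up points that pick each other as their best candidate (mutual nearest neighbors). This is a simple classical baseline, not what LightGlue does internally, and the 3-dimensional descriptors are made up for illustration; real descriptors are much longer (SIFT uses 128 values, SuperPoint 256).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two descriptor vectors (1.0 means identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query, candidates):
    """Index of the candidate descriptor most similar to the query."""
    return max(range(len(candidates)),
               key=lambda j: cosine_similarity(query, candidates[j]))


def mutual_nearest_neighbors(desc0, desc1):
    """Return (i, j) pairs where i and j choose each other as best match."""
    best_for_0 = [best_match(d, desc1) for d in desc0]
    best_for_1 = [best_match(d, desc0) for d in desc1]
    return [(i, j) for i, j in enumerate(best_for_0) if best_for_1[j] == i]


# Toy, made-up descriptors: three points per image.
image0_descriptors = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0], [0.5, 0.5, 0.7]]
image1_descriptors = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.1], [0.0, 0.0, 1.0]]

matches = mutual_nearest_neighbors(image0_descriptors, image1_descriptors)
print(matches)  # [(0, 1), (1, 0), (2, 2)]
```

Each tuple reads as "point i in the first image corresponds to point j in the second one".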


Matching features is useful for a number of applications, including panoramic image stitching, robot localization and mapping, and 3D scene reconstruction.


Applications of feature extraction and matching. Left: SLAM algorithm, center: image stitching, right: 3D reconstruction and photogrammetry.


LightGlue does Feature Matching


LightGlue does not extract the features from the images. Instead, it finds the best match between them, if any. So, given the features computed for the two images above, the algorithm finds the following matches:

The same image as above, but there are green lines connecting the points that match together between the two images. Most of them seem correct.
Point correspondences obtained by feature matching

Take a moment to appreciate the precision, even with such a large perspective difference between the images.


And it does it very well, and very fast. Here's how it compares against other state-of-the-art methods:

A plot depicting pose accuracy vs. image pairs per second. LightGlue outperforms all methods in speed while achieving accuracy similar to the most precise ones: MatchFormer and LoFTR.
Performance of LightGlue against other state-of-the-art methods

Note how it exceeds the throughput of all other methods while achieving similar accuracies.


Testing LightGlue


Time for the fun part. I'm going to try out LightGlue's feature matching capabilities to perform homography estimation. That is: given a pair of images, what transformation do I need to apply to one of them so that it best overlaps the other? I'll be using:

  • SuperPoint for feature extraction

  • LightGlue for feature matching

  • RANSAC for homography estimation

  • Everything on the CPU (no GPU processing)

  • The following pair of images:

Challenging pair of images used for the test
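As a quick refresher, a homography is a 3x3 matrix that maps a point in one image to a point in the other, with a division by the third homogeneous coordinate. The sketch below applies one to a single point in plain Python; the matrix values are made up purely for illustration.

```python
def apply_homography(H, point):
    """Map an (x, y) point through a 3x3 homography given as nested lists."""
    x, y = point
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    # The perspective divide is what separates a homography from an affine map.
    return (xh / w, yh / w)


# Hypothetical matrix: a translation by (10, 5) plus a mild perspective term.
H_example = [
    [1.0, 0.0, 10.0],
    [0.0, 1.0, 5.0],
    [0.0, 0.002, 1.0],
]

mapped = apply_homography(H_example, (0.0, 100.0))
print(mapped)  # roughly (8.33, 87.5)
```

Estimating a homography means finding the matrix that does this well for as many matched point pairs as possible.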

Initialize a new Python virtual environment and install LightGlue and OpenCV.

LightGlue-related code is fairly trivial:

From the snippet above, it can be seen that lines 20-21 extract the features using SuperPoint, line 24 finds the matches between them and lines 30-31 finally filter the matching points. The remainder after that is simply for visualization.
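For reference, here's a minimal sketch of those three steps (extract, match, filter). It assumes the `lightglue` Python package and the usage shown in its README (`SuperPoint`, `LightGlue`, `load_image`, `rbd`); the function name, its parameters and the keypoint limit are my own choices for illustration. The line numbers mentioned in the text refer to the original snippet, not to this sketch.

```python
def match_with_lightglue(path0, path1, max_keypoints=2048):
    """Extract SuperPoint features from two image files and match them with LightGlue.

    Returns two (N, 2) tensors with the pixel coordinates of the matched
    points, one tensor per image.
    """
    # Imported lazily so this file can be loaded without the packages installed.
    import torch
    from lightglue import LightGlue, SuperPoint
    from lightglue.utils import load_image, rbd

    # No .cuda() calls: everything stays on the CPU, as in the setup above.
    extractor = SuperPoint(max_num_keypoints=max_keypoints).eval()
    matcher = LightGlue(features="superpoint").eval()

    image0 = load_image(path0)
    image1 = load_image(path1)

    with torch.no_grad():
        # Step 1: extract keypoints and descriptors from each image.
        feats0 = extractor.extract(image0)
        feats1 = extractor.extract(image1)
        # Step 2: find the matches between both feature sets.
        matches01 = matcher({"image0": feats0, "image1": feats1})

    # rbd() removes the batch dimension from every entry.
    feats0, feats1, matches01 = [rbd(x) for x in (feats0, feats1, matches01)]

    # Step 3: keep only the keypoints that were actually matched.
    matches = matches01["matches"]  # (N, 2) index pairs
    points0 = feats0["keypoints"][matches[..., 0]]
    points1 = feats1["keypoints"][matches[..., 1]]
    return points0, points1
```

Calling it would look like `points0, points1 = match_with_lightglue("left.jpg", "right.jpg")`, where the file names are placeholders.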

A pair of images with correct point correspondences between them. The images depict a pair of hands holding a mobile phone in front of a street by a park.
Result of the feature extraction and matching using SuperPoint and LightGlue

Now let's use these points to find a homography and warp the left image.

In the code continuation above, line 5 uses the points computed previously by LightGlue to estimate a homography. Line 12 warps the image. The remainder, again, is visualization.
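A minimal sketch of those two steps with OpenCV could look like the following. It relies on `cv2.findHomography` with the RANSAC flag and `cv2.warpPerspective`; the function name, the 5.0 pixel reprojection threshold and the output size argument are assumptions of mine, not values taken from the original code.

```python
def estimate_and_warp(points0, points1, image0, output_size, ransac_threshold=5.0):
    """Estimate the homography mapping points0 to points1 and warp image0 with it.

    points0 / points1: matched (x, y) coordinates as NumPy arrays or lists
    (for torch tensors, call .cpu().numpy() first).
    output_size: (width, height) of the warped image.
    """
    # A homography has 8 degrees of freedom, so at least 4 point pairs are needed.
    if len(points0) < 4 or len(points0) != len(points1):
        raise ValueError("need at least 4 matched point pairs of equal length")

    # Imported lazily so this file can be loaded without OpenCV installed.
    import cv2
    import numpy as np

    src = np.asarray(points0, dtype=np.float32).reshape(-1, 1, 2)
    dst = np.asarray(points1, dtype=np.float32).reshape(-1, 1, 2)

    # RANSAC discards matches that disagree with the dominant transformation.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransac_threshold)
    if H is None:
        raise RuntimeError("homography estimation failed")

    warped = cv2.warpPerspective(image0, H, output_size)
    return H, inlier_mask, warped
```

The returned inlier mask tells which matches RANSAC considered consistent with the estimated homography, which is handy for visualization.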

The same image as above, but the point correspondences have been removed. The image at the right has been warped such that it correctly overlaps and extends the one on the left.
Result of the homography estimation using LightGlue's feature matching

Looking great! As a comparison, here's the same estimation but using SIFT features and the FLANN matcher, probably the most popular classical methods.

The same image as above, except that the warped image is fully distorted. Nothing can be recognized.
Unsuccessful image warping when using SIFT and FLANN

To be fair, I must admit that I deliberately chose an image pair that I knew these methods struggle with. Also, I didn't perform any fine tuning of the parameters at all. Here's another pair where the classical methods seem to slightly outperform LightGlue:

Pair of results where the classical method outperforms LightGlue in quality. See the defect at the end of the yellow line.


Regardless, still great! Here's the code for the classical method, if you want to try it.
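A minimal sketch of the classical pipeline, assuming OpenCV's `cv2.SIFT_create` and `cv2.FlannBasedMatcher`, might look like this. The ratio test, its 0.7 threshold and the FLANN parameters (5 trees, 50 checks) are common defaults that I'm assuming here, not tuned values.

```python
from types import SimpleNamespace


def lowe_ratio_filter(knn_matches, ratio=0.7):
    """Keep a match only if it is clearly better than the second-best candidate."""
    good = []
    for pair in knn_matches:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return good


def match_with_sift_flann(gray0, gray1, ratio=0.7):
    """Match two grayscale images using SIFT features and a FLANN matcher."""
    # Imported lazily so this file can be loaded without OpenCV installed.
    import cv2

    sift = cv2.SIFT_create()
    keypoints0, descriptors0 = sift.detectAndCompute(gray0, None)
    keypoints1, descriptors1 = sift.detectAndCompute(gray1, None)

    FLANN_INDEX_KDTREE = 1  # KD-tree index, suited to float descriptors like SIFT
    flann = cv2.FlannBasedMatcher(
        {"algorithm": FLANN_INDEX_KDTREE, "trees": 5},
        {"checks": 50},
    )
    # Two nearest neighbors per descriptor, as required by the ratio test.
    knn_matches = flann.knnMatch(descriptors0, descriptors1, k=2)
    good = lowe_ratio_filter(knn_matches, ratio)

    points0 = [keypoints0[m.queryIdx].pt for m in good]
    points1 = [keypoints1[m.trainIdx].pt for m in good]
    return points0, points1


# Tiny check of the ratio test with stand-in objects (real ones are cv2.DMatch).
fake_matches = [
    (SimpleNamespace(distance=1.0), SimpleNamespace(distance=2.0)),  # distinctive
    (SimpleNamespace(distance=1.9), SimpleNamespace(distance=2.0)),  # ambiguous
]
kept = lowe_ratio_filter(fake_matches)
print(len(kept))  # 1
```

The resulting point lists can then be fed to the same RANSAC homography estimation used for the LightGlue matches.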



