With some pride, the FBI trumpeted the news last month that, thanks to the agency’s facial-recognition system, Neil Stammer, wanted for sexual assault and kidnapping, had been apprehended in Nepal after being on the run for 14 years. The truth was slightly more prosaic. A State Department official had used the FBI’s “Wanted” posters in a test for passport fraud. The system then matched Mr Stammer’s face with that of an American calling himself Kevin Hodges who regularly visited the US embassy in Kathmandu to renew his visa. Still, Mr Stammer’s arrest illuminates the growing importance of facial-recognition technology.
The two main techniques used to recognise faces electronically are principal-component analysis (PCA) and linear-discriminant analysis (LDA). Both compare a picture of someone’s phizog with a reference image taken in a controlled environment. Passport photos and mugshots, then, are about as ideal as it gets.
Basic PCA and LDA work well on coarse attributes such as skin colour, hair colour and the like. Advanced systems, such as that used with British biometric passports, may look at cheek bones, the bridge of the nose, jaw lines and eyes.
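For readers who want to see the idea in miniature, here is a rough sketch of PCA-based matching against a gallery of reference images. It is an illustration only, not the method of any system named in this article: the function names, the toy 8x8 "images" and the choice of four components are all assumptions made for the example.

```python
import numpy as np


def fit_pca(gallery, n_components):
    """Learn principal components from flattened reference images.

    gallery has shape (n_images, n_pixels). Returns the mean image and the
    top n_components principal directions (often called "eigenfaces").
    """
    mean = gallery.mean(axis=0)
    centered = gallery - mean
    # Rows of vt are the principal directions, strongest first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(images, mean, components):
    """Describe each image by its coordinates along the principal directions."""
    return (images - mean) @ components.T


def best_match(probe, gallery, mean, components):
    """Return the index of the gallery image closest to the probe in PCA space."""
    gallery_codes = project(gallery, mean, components)
    probe_code = project(probe[None, :], mean, components)
    distances = np.linalg.norm(gallery_codes - probe_code, axis=1)
    return int(np.argmin(distances)), distances


# Toy data (hypothetical): five random 8x8 "reference photos", flattened.
rng = np.random.default_rng(0)
gallery = rng.random((5, 64))
# A slightly noisy new picture of the person at gallery index 3.
probe = gallery[3] + 0.01 * rng.standard_normal(64)

mean, components = fit_pca(gallery, n_components=4)
index, distances = best_match(probe, gallery, mean, components)
```

The sketch compares a new picture with each reference image after squeezing both down to a handful of numbers. That is why controlled reference photos matter: the comparison is only as good as the gallery it is learned from.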
All of which is fine when someone is sitting or standing in front of a camera, but is less useful in the world beyond the studio. That requires a technique called Elastic Bunch Graph Matching (EBGM), which tries to create a three-dimensional (3D) model from two-dimensional images. This model can, thereafter, be used to match any subsequent image, or part thereof.
EBGM considers the head as a union of two ellipsoids: one whose main axis is vertical, and runs from forehead to chin; the other whose main axis is horizontal, and runs from tip of the nose to the back of the cranium. This basic scheme is overlaid with “fiducial” points which act as anchors for the modelling. These can be as few as half a dozen (the pupils of the eyes, the corners of the mouth, and so on), or as many, in one system, as 40,000.
EBGM allows the construction of a three-dimensional representation of a face from poorly lit images taken at odd angles, such as a closed-circuit television camera might provide. Once it recognises enough fiducial points it can work out what aspect of a face it is viewing. It then extrapolates the expected positions of other fiducial points. As more data come in from the camera, the model’s shape is updated. Given enough horsepower, says a British official, such a system can build a model from as few as 80 pixels located between a subject’s eyes—and only two images are needed for a 3D reconstruction.
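The step of extrapolating unseen fiducial points from a few recognised ones can be illustrated with a much simpler stand-in. The sketch below is not EBGM: it assumes a flat, two-dimensional template of five hypothetical fiducial points and fits an affine transformation (scale, rotation, shear and shift) by least squares, then uses that fit to guess where the unobserved points should lie. The template coordinates and function names are invented for the example.

```python
import numpy as np

# Hypothetical 2D template of fiducial points for a frontal reference pose.
TEMPLATE = {
    "left_pupil": (-30.0, 0.0),
    "right_pupil": (30.0, 0.0),
    "nose_tip": (0.0, 35.0),
    "mouth_left": (-20.0, 65.0),
    "mouth_right": (20.0, 65.0),
}


def fit_affine(src, dst):
    """Least-squares affine map taking src points (n, 2) to dst points (n, 2)."""
    src_h = np.hstack([src, np.ones((len(src), 1))])  # append a column of ones
    params, *_ = np.linalg.lstsq(src_h, dst, rcond=None)
    return params  # shape (3, 2)


def predict_missing(observed):
    """Guess image positions of template points that were not observed.

    observed maps point names to (x, y) image coordinates. At least three
    points that do not lie on one line are needed to pin down the affine map.
    """
    if len(observed) < 3:
        raise ValueError("need at least three observed fiducial points")
    names = list(observed)
    src = np.array([TEMPLATE[name] for name in names])
    dst = np.array([observed[name] for name in names])
    params = fit_affine(src, dst)

    missing = [name for name in TEMPLATE if name not in observed]
    if not missing:
        return {}
    pts = np.array([TEMPLATE[name] for name in missing])
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    predicted = pts_h @ params
    return {name: tuple(point) for name, point in zip(missing, predicted)}


# The camera has found both pupils and the nose tip; the face appears at half
# the template's scale, shifted by (100, 50) pixels.
observed = {
    "left_pupil": (85.0, 50.0),
    "right_pupil": (115.0, 50.0),
    "nose_tip": (100.0, 67.5),
}
guesses = predict_missing(observed)
```

A real system would work with a three-dimensional model and refine it as more frames arrive; the flat version merely shows how a few anchors constrain the rest.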
Governments are not the only ones interested. Earlier this year, Facebook’s DeepFace system was asked whether thousands of pairs of photos were of the same person. It answered correctly 97.25% of the time, a shade behind humans at 97.53%. DeepFace is only a research project, and it is aided by the fact that many Facebook photos are tagged with the names of the people in them, which lets the system learn those faces in different poses and lighting conditions. It is still an impressive feat.
As DeepFace shows, access to an accurate gallery of images is crucial. Passport photos, or those on national identity cards, can act as such galleries, as they can be rendered by EBGM into usable 3D models. Add in the increasing ubiquity of closed-circuit television, and the idea that anyone will be able to hide for long in Nepal, or anywhere else, looks quaint.