Real Time Face Recognition

Based on pre-trained FaceNet model

Powered by CoreSearch – R&D unit

Project Overview

We are building a new smart security system for our own corporate needs, with the intention of powering it with human-machine conversation. Drawing on the technology at hand, we use a highly accurate face authentication algorithm based on neural networks.

The aim of the project is automatic identification or verification of a person from a digital image or a video frame for security purposes. One way to do this is to compare selected facial features from an image against a facial database. The system we built accumulates embeddings to train a classifier that recognizes visitors' faces with an accuracy of 95% and up to 99%.

Using the camera's two-way audio channel, we also plan to add a voice assistant, similar to Google Home or Amazon Echo, that can talk to visitors while remaining focused on our custom domain.


Technologies & Tools

Pre-trained FaceNet, MTCNN, Python, Node.js, our own neural network, OrientDB, JavaScript, Angular 6, Rasa (rasa.com)


Challenge

Several challenges directly impact automated face detection and recognition outcomes:

  • The need for large amounts of input data: we require 300+ embeddings per person for reliable identification and for gathering clustering data.
  • Robustness: the solution must handle illumination variations, geometric distortion, noise, changes in image orientation, camera instability, and motion.
  • Image processing speed, which is critical in real-time environments.
  • The computational cost of face recognition and feature extraction algorithms, which must be optimized for everyday use.
  • A human-like experience: the voice assistant, layered with tone and intonation, must handle contextual conversations through interactive machine learning rather than hand-crafted rules.

How it works

Once someone looks at the webcam at the front door, a voice greets them and the system queries our own database to identify the visitor. For more complex cases, the system is planned to also identify a visitor's voice (audio-voice recognition); combined with the chatbot functionality and the camera's two-way audio channel, it can then hold a dialogue relevant to the enquiry.
For example, one can ask the system whether a certain colleague is already at work and get a useful answer.
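
Below is a minimal sketch of this loop, assuming the open-source facenet-pytorch package (MTCNN for detection, InceptionResnetV1 as the FaceNet embedder) and OpenCV for capture. The known_faces store, distance threshold, and greeting are illustrative placeholders, not our production pipeline.

```python
# Sketch of the front-door loop: grab a frame, detect and embed the face,
# and match it against stored embeddings before greeting the visitor.
import cv2
import torch
from PIL import Image
from facenet_pytorch import MTCNN, InceptionResnetV1

mtcnn = MTCNN(image_size=160)                             # face detector/aligner
resnet = InceptionResnetV1(pretrained='vggface2').eval()  # FaceNet embedder

known_faces = {}  # placeholder store: visitor name -> embedding tensor

def identify(frame, threshold=1.0):
    """Return the closest known identity, or None if no face or no match."""
    img = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    face = mtcnn(img)                         # aligned face crop, or None
    if face is None:
        return None
    with torch.no_grad():
        emb = resnet(face.unsqueeze(0))[0]
    best_name, best_dist = None, float('inf')
    for name, ref in known_faces.items():
        d = torch.dist(ref, emb).item()       # Euclidean distance
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist < threshold else None

cap = cv2.VideoCapture(0)                     # the front-door webcam
ok, frame = cap.read()
if ok:
    visitor = identify(frame)
    print(f"Hello, {visitor}!" if visitor else "Visitor not recognized.")
cap.release()
```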

Technology Behind

As the core technology we chose FaceNet, a deep convolutional network designed by Google. The network takes a face image and maps it to a point in 128-dimensional space, such that the Euclidean distance between points is small for similar faces and large for dissimilar ones. These points can then be used as feature inputs for classification, clustering, or regression tasks.
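
To make the distance property concrete, a toy comparison might look like this; the embeddings here are random stand-ins for real FaceNet outputs, and the threshold is an assumed starting point rather than a fixed constant.

```python
# Toy illustration of the property above: a small Euclidean distance between
# embeddings means "same person", a large distance means "different people".
import numpy as np

def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.linalg.norm(a - b))

emb_a = np.random.rand(128)   # stand-in for a FaceNet embedding
emb_b = np.random.rand(128)   # stand-in for another embedding
SAME_PERSON_THRESHOLD = 1.1   # assumed cut-off; tune on validation data

if euclidean(emb_a, emb_b) < SAME_PERSON_THRESHOLD:
    print("Likely the same person")
else:
    print("Different people")
```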

Training the neural network

Our training set consists of 300–1000 face-detected images per identified person. We also developed our own admin page for more precise training of our classifier. Since distinguishing around 350 people can be difficult for FaceNet alone, the idea is to train our own neural network alongside FaceNet to improve the precision of vector proximity and minimize deviation.
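
The architecture of this auxiliary network is not fixed here, so the sketch below makes an assumption: a small projection head over FaceNet's 128-d embeddings, trained with a triplet loss to pull one person's vectors closer together and push different people apart.

```python
# Assumed design: a projection head refines FaceNet embeddings so that
# same-person vectors move closer and different-person vectors move apart.
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    def __init__(self, in_dim=128, out_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128),
            nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, x):
        # L2-normalize so Euclidean distances stay on a comparable scale
        return nn.functional.normalize(self.net(x), dim=-1)

head = ProjectionHead()
loss_fn = nn.TripletMarginLoss(margin=0.2)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

# anchor/positive: embeddings of the same person; negative: someone else
# (random stand-ins here; real inputs come from the FaceNet pipeline)
anchor, positive, negative = (torch.randn(32, 128) for _ in range(3))

opt.zero_grad()
loss = loss_fn(head(anchor), head(positive), head(negative))
loss.backward()
opt.step()
```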

Classifier

The embeddings are fed into the system, which detects the person based on Euclidean distance; the match is then confirmed by the admin. On top of that we apply a classifier for better accuracy, which lets the system evolve more dynamically: when a new face appears, the classifier's label set grows and the system is retrained automatically.
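
As one illustration of this stage, the sketch below fits a linear SVM on the accumulated embeddings and retrains it when the admin confirms a new identity; the stand-in data and the add_identity helper are hypothetical, and the choice of SVM is an assumption.

```python
# Hypothetical classifier stage: fit an SVM on admin-confirmed embeddings
# and retrain from scratch whenever a new identity is added.
import numpy as np
from sklearn.svm import SVC

embeddings = np.random.rand(700, 128)        # stand-in accumulated embeddings
labels = np.random.randint(0, 2, size=700)   # stand-in person ids

clf = SVC(kernel='linear', probability=True).fit(embeddings, labels)

def add_identity(new_embs: np.ndarray, person_id: int):
    """Grow the dataset with a confirmed identity and retrain the classifier."""
    global embeddings, labels, clf
    embeddings = np.vstack([embeddings, new_embs])
    labels = np.concatenate([labels, np.full(len(new_embs), person_id)])
    clf = SVC(kernel='linear', probability=True).fit(embeddings, labels)
```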

Speech recognition

The next step is a speech recognition algorithm with subsequent intent classification, to enable structured, personalized communication with the assistant. We use Rasa (rasa.com), which allows us to control the learning process according to defined rules.
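
Rasa's training-data format has changed between versions; purely as an illustration, intent examples in the YAML format of Rasa 2.x could look like the snippet below (the intent and entity names are hypothetical).

```yaml
version: "2.0"
nlu:
- intent: greet
  examples: |
    - hello
    - good morning
- intent: ask_colleague_presence
  examples: |
    - Is [John](colleague) at work today?
    - Has [Maria](colleague) arrived yet?
```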

Results

We have built a system for accumulating embeddings to train the classifier, achieving accuracy of up to 99%.

This means we can build an accurate and powerful face authentication algorithm, coupled with an interactive speech-recognition assistant, with a wide range of possible applications: from security features such as automatic unlocking or automatic payments, to expansion into other important domains, including marketing and healthcare.

Key Features include:

  • Facial feature detection and comparison.
  • Landmark point identification and rotation vector calculation (see the sketch after this list).
  • Improved processing results through image stabilization.
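
As a sketch of the rotation-vector calculation, head pose can be estimated from a handful of 2D facial landmarks with OpenCV's solvePnP; the generic 3D model points and the focal-length approximation below are common assumptions, not calibrated values from our system.

```python
# Estimate a head rotation vector from six 2D facial landmarks (solvePnP).
import cv2
import numpy as np

# Generic 3D face model points (nose tip, chin, eye corners, mouth corners)
MODEL_POINTS = np.array([
    (0.0, 0.0, 0.0),           # nose tip
    (0.0, -330.0, -65.0),      # chin
    (-225.0, 170.0, -135.0),   # left eye outer corner
    (225.0, 170.0, -135.0),    # right eye outer corner
    (-150.0, -150.0, -125.0),  # left mouth corner
    (150.0, -150.0, -125.0),   # right mouth corner
])

def head_rotation(landmarks_2d: np.ndarray, frame_w: int, frame_h: int):
    """landmarks_2d: 6x2 array of image points in the order above."""
    focal = frame_w  # rough focal-length approximation in pixels
    camera_matrix = np.array([[focal, 0, frame_w / 2],
                              [0, focal, frame_h / 2],
                              [0, 0, 1]], dtype=float)
    dist_coeffs = np.zeros((4, 1))  # assume no lens distortion
    ok, rotation_vec, _ = cv2.solvePnP(
        MODEL_POINTS, landmarks_2d.astype(float), camera_matrix, dist_coeffs)
    return rotation_vec if ok else None
```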

More than just technology. More than just a team.