Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


java - Google Mobile Vision: Poor FaceDetector performance without CameraSource

Right now, our application is running Snapdragon SDK successfully. We are trying to implement FaceDetector from Vision 8.3.0 on our project, in order to increase the number of compatible devices. We can't use CameraSource, as we rely on a custom camera + surface to provide certain functionality. We want to reuse as much code as possible, and Snapdragon SDK is doing amazingly with our current implementation.

Workflow is as follows:

1) Retrieve camera preview

2) Transform the incoming byte array into a bitmap. (For some reason, we haven't managed to make ByteBuffers work: image size, rotation and the NV21 image format are provided and verified, but no faces are found.) The bitmap is a field initialized once on the processing thread, to avoid allocation slowdowns.

3) Feed detector via receiveFrame
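
As context for step 2: an NV21 preview frame is the full-resolution Y (luma) plane followed by interleaved VU chroma at quarter resolution, so the callback's byte array is `width * height * 3/2` bytes and the first `width * height` bytes are already an 8-bit grayscale image. A minimal, Android-free sketch of that layout (the class name `Nv21Utils` is hypothetical):

```java
// NV21 layout: full-resolution Y (luma) plane, then interleaved VU chroma.
class Nv21Utils {
    // Returns the grayscale (Y) plane of an NV21 preview frame.
    static byte[] extractLuma(byte[] nv21, int width, int height) {
        int expected = width * height * 3 / 2;
        if (nv21.length != expected) {
            throw new IllegalArgumentException(
                "Expected " + expected + " bytes, got " + nv21.length);
        }
        byte[] luma = new byte[width * height];
        System.arraycopy(nv21, 0, luma, 0, luma.length);
        return luma;
    }
}
```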

Results so far aren't good enough. Detection is way too slow (2-3 seconds) and inaccurate, even though we have disabled landmarks and classifications.

The question is: Is it possible to replicate CameraSource + Detector performance without using the former? Is it mandatory to use CameraSource to make it work with live input?

Thanks in advance!

EDIT

Following pm0733464's recommendations below, I'm trying to use a ByteBuffer instead of a Bitmap. These are the steps I follow:

// Initialize variables
// Mat is part of opencvSDK
Mat currentFrame = new Mat(cameraPreviewHeight + cameraPreviewHeight / 2, cameraPreviewWidth, CvType.CV_8UC1);
Mat yuvMat = new Mat(cameraPreviewHeight + cameraPreviewHeight / 2, cameraPreviewWidth, CvType.CV_8UC1);

// Load current frame
yuvMat.put(0, 0, data);

// Convert the NV21 frame to RGB, then to grayscale
Imgproc.cvtColor(yuvMat, currentFrame, Imgproc.COLOR_YUV420sp2RGB);
Imgproc.cvtColor(currentFrame, currentFrame, Imgproc.COLOR_RGB2GRAY);

From there, the byte array is built:

// Initialize grayscale byte array
byte[] grayscaleBytes = new byte[data.length];

// Extract grayscale data
currentFrame.get(0, 0, grayscaleBytes);

// Allocate ByteBuffer
ByteBuffer buffer = ByteBuffer.allocateDirect(grayscaleBytes.length);

// Wrap grayscale byte array
buffer.wrap(grayscaleBytes);

// Create frame
// rotation is calculated before
Frame currentGoogleFrame = new Frame.Builder().setImageData(buffer, currentFrame.cols(), currentFrame.rows(), ImageFormat.NV21).setRotation(rotation).build();
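
A note on the buffer step above: `ByteBuffer.wrap()` is a static factory method. Calling it on the `buffer` instance compiles, but it returns a new heap buffer that is immediately discarded and leaves `buffer` itself untouched (all zeros), which alone would explain a frame with no detectable faces. A minimal sketch of copying into the direct buffer instead (the class name `BufferFill` is hypothetical):

```java
import java.nio.ByteBuffer;

class BufferFill {
    // Copies grayscale bytes into a direct buffer and resets the position.
    static ByteBuffer fill(byte[] grayscaleBytes) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(grayscaleBytes.length);
        // Note: buffer.wrap(grayscaleBytes) would NOT fill `buffer` --
        // wrap() is a static factory that returns a new heap buffer.
        buffer.put(grayscaleBytes);
        buffer.rewind(); // read position back to 0 for the consumer
        return buffer;
    }
}
```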

Constructing frames this way results in no faces being found. Using bitmaps, however, works as expected:

if(bitmap == null) {
    // Bitmap allocation
    bitmap = Bitmap.createBitmap(currentFrame.cols(), currentFrame.rows(), Bitmap.Config.ARGB_8888);
}

// Copy grayscale contents
org.opencv.android.Utils.matToBitmap(currentFrame, bitmap);

// Scale down to improve performance
Matrix scaleMatrix = new Matrix();
scaleMatrix.postScale(scaleFactor, scaleFactor);

// Recycle before creating scaledBitmap
if(scaledBitmap != null) {
    scaledBitmap.recycle();
}

// Generate scaled bitmap
scaledBitmap = Bitmap.createBitmap(bitmap, 0, 0, bitmap.getWidth(), bitmap.getHeight(), scaleMatrix, true);

// Create frame
// The same rotation as before is still applied via setRotation
Frame currentGoogleFrame = new Frame.Builder().setBitmap(scaledBitmap).setRotation(rotation).build();


1 Answer


Having detection take 2-3 seconds isn't typical. Using CameraSource isn't necessary to get the best performance. What hardware are you using? Can you provide more specifics?

Some aspects of face detection are speed vs. accuracy trade-offs.

Speed:

  1. Try using lower resolution images, if possible. Face detection should work fine at 640x480, for example. The face detector code does downsample large images before running detection, although this takes additional time in comparison to receiving a lower resolution original.

  2. Using ByteBuffers rather than Bitmaps will be a bit faster. The first portion of the buffer should be just the grayscale (luma) image; no color info is needed.

  3. As you noted above, disabling landmarks and classification will make it faster.

  4. In a future release, there will be a "min face size" option. Setting the min size higher makes the face detection faster (at the accuracy trade-off of not detecting smaller faces).

  5. Setting the mode to "fast" will make it faster (at the accuracy trade-off of not detecting non-frontal faces).

  6. Using the "prominent face only" option will be faster, but it only detects a single large face (at least 35% the width of the image).
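
As a concrete companion to point 1, a tiny helper for picking a factor that brings the preview width down to roughly 640 px before building the frame (the class name `DetectorScale` is hypothetical):

```java
class DetectorScale {
    // Factor that shrinks a preview so its width is at most targetWidth.
    static float scaleFactorFor(int width, int targetWidth) {
        return width <= targetWidth ? 1f : (float) targetWidth / width;
    }
}
```

For example, a 1920-wide preview scaled toward a 640-wide target yields a factor of 1/3.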

Accuracy:

  1. Enabling landmarks will allow the pose angles to be computed more accurately.

  2. Setting the mode to "accurate" will detect faces at a wider range of angles (e.g., faces in profile). However, this takes more time.

  3. Lacking the "min face size" option mentioned above, only faces larger than 10% the width of the image are detected by default. Smaller faces will not be detected. Changing this setting in the future will help to detect smaller faces. However, note that detecting smaller faces takes longer.

  4. Using a higher resolution image will be more accurate than a lower resolution image. For example, some faces in a 320x240 image might be missed that would have been detected if the image were 640x480. The lower the "min face size" you set, the higher the resolution you need to detect faces of that size.

  5. Make sure that you have the rotation right. The face won't be detected if it is upside down, for example. You should call the face detector again with a rotated image if you want to detect upside down faces.
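
Regarding point 5: if you work with raw grayscale planes rather than Bitmaps, rotation can be applied to the byte array itself before handing it to the detector. A minimal Android-free sketch of a 90-degree clockwise rotation (the class name `LumaRotate` is hypothetical):

```java
class LumaRotate {
    // Rotates a w x h grayscale plane 90 degrees clockwise into an h x w plane.
    static byte[] rotate90(byte[] luma, int w, int h) {
        byte[] out = new byte[luma.length];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // Source pixel (x, y) lands at column (h-1-y), row x.
                out[x * h + (h - 1 - y)] = luma[y * w + x];
            }
        }
        return out;
    }
}
```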

Also, garbage collection time can be a factor if you're creating a lot of Bitmaps. An advantage of using ByteBuffer is that you can reuse the same buffer repeatedly, avoiding the per-image GC overhead you'd see with a new Bitmap per frame. CameraSource has this advantage, since it uses only a few buffers.
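
The buffer-reuse strategy described above can be sketched as a tiny pool of preallocated direct buffers; `FramePool` below is a hypothetical illustration of the idea, not the actual CameraSource implementation:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// A few preallocated direct buffers are handed out and returned,
// so steady-state frame processing allocates nothing.
class FramePool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int frameSize;

    FramePool(int frameSize, int count) {
        this.frameSize = frameSize;
        for (int i = 0; i < count; i++) {
            free.add(ByteBuffer.allocateDirect(frameSize));
        }
    }

    ByteBuffer acquire() {
        ByteBuffer b = free.poll();
        if (b == null) {
            b = ByteBuffer.allocateDirect(frameSize); // pool exhausted
        }
        b.clear();
        return b;
    }

    void release(ByteBuffer b) {
        free.add(b);
    }
}
```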

