Abstract
Face recognition is one of the most popular applications in the field of target recognition. At present, frontal faces can be easily detected, but multi-view face detection remains a difficult task because of factors such as lighting, pose variation, occlusion, and facial expression. Margin-based loss functions are used to increase the feature margin between different classes, improving the discriminability of face recognition models, but face recognition performance in complex scenes (e.g. high-tilt face recognition in surveillance environments) can still degrade significantly. Recently, mining-based strategies that emphasize hard samples have achieved good results in multi-view face recognition. However, most existing methods do not explicitly emphasize samples according to their importance, so hard samples are underutilized. In this article, we propose a curriculum learning loss function (HeadPose-Softmax) that classifies the difficulty of a sample based on its face pose and embeds the concept of curriculum learning in the loss function, yielding a novel training strategy for deep face recognition. The loss function explicitly weights each sample according to its difficulty, which allows the model to make fuller use of hard samples, focus on learning pose-invariant features, and improve its accuracy in multi-view face recognition tasks. In particular, during the training phase, HeadPose-Softmax dynamically adjusts the relative importance of the hard samples according to the pose angle of the face in each hard sample. At each stage, different samples are assigned different importance based on their difficulty. Extensive experimental results on popular benchmarks show that HeadPose-Softmax improves the model's accuracy in multi-view face recognition and outperforms state-of-the-art competitors.
Introduction
Face recognition is one of the most researched topics in computer vision and has a wide range of applications in security, finance, and transportation [1]. Recently, methods based on deep learning have made great strides in face recognition, surpassing human-level performance [2], [3], [4], [5], [6]. However, most existing face databases consist of clear frontal faces and face images with only small pose changes, and they rarely contain multi-view face images of the same person in different environments. In contrast, images and videos captured in real life, such as those from surveillance cameras, are often multi-view face images with different poses, expressions, and lighting conditions [7], [8], [9], [10], [11], which fall under the category of multi-view face recognition. Most current face recognition models perform poorly on such images and videos. This indicates that the pose-invariant face recognition (PIFR) required by real applications is far from solved and warrants further investigation.
In PIFR, head pose variation was found to be the most important factor affecting face recognition performance [12]. To overcome this difficulty, a large number of methods have been proposed, which can be divided into two categories. First, some researchers have tried to avoid the problem by synthesizing a frontal face from the input image. For example, Hassner et al. [13] perform face normalization by local texture warping with an average 3D face model to generate a frontal view; FFWM [14] synthesizes a realistic, illumination-preserving frontal image by learning the different lighting properties of the image; 2DAL [15] uses sparse heat maps of 2D facial features as additional information to generate frontal face images; and Richard et al. [16] integrated a 3D morphable model (3DMM) into a GAN to generate frontal face images. However, existing face frontalization work mainly targets profile (yaw) faces and is not effective for faces with large pitch angles. Because tilted faces lack the symmetry of profile faces, key-point information is blurred or occluded, so realistic frontal face images cannot be produced. Second, other researchers have focused on learning discriminative representations directly from non-frontal faces by integrating pose-aware modules [17], [18], [19] or by developing data augmentation methods that synthesize faces [20], [21], [22]. Because training datasets contain few multi-pose face samples, it is difficult for existing methods that learn from non-frontal faces to learn enough pose-invariant features. Moreover, the faces in various poses synthesized by data augmentation lack facial detail relative to in-the-wild images.
State-of-the-art (SOTA) face recognition mainly uses softmax-based classification losses. The original softmax loss, however, learns features that are not discriminative enough for real-world face recognition [23]. To address this problem, many improved loss functions for face recognition have been proposed. For example, some methods improve feature discrimination with explicit margins that increase intra-class compactness and inter-class discrepancy [3], [23], [24], some methods improve feature discrimination with implicit margins [25], and some methods introduce a local inter-class discrepancy (LID) for each face image to normalize the absolute similarity value in face recognition [26]. However, these margin-based loss functions do not account for the importance of each sample: frontal face samples and pose-variation samples are trained with the same importance. Since pose-variation face images are less common, training with equal importance can lead the model to pay too little attention to pose-variation features and learn too few of them, which significantly degrades recognition performance on pose-variation face images. To overcome the problem of uniform sample importance, some researchers have proposed curriculum learning or hard sample mining as a solution. For example, CurricularFace [27] differentiates the importance of samples by adaptively adjusting hard and easy samples, Guie et al. [28] divide the training data into subsets of increasing complexity based on facial expression intensity, and MV-Arc-Softmax [29] guides discriminative feature learning by emphasizing misclassified feature vectors.
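To make the notion of an explicit margin concrete, the following is a minimal NumPy sketch of an additive angular margin softmax loss in the style of ArcFace. It illustrates the general family of margin-based losses discussed above, not the loss proposed in this article; the function name `margin_softmax_loss` and the default values of `scale` and `margin` are assumptions chosen for illustration.

```python
import numpy as np


def margin_softmax_loss(embeddings, class_weights, labels, scale=64.0, margin=0.5):
    """Additive angular margin softmax loss, averaged over the batch.

    embeddings:    (batch, dim) feature vectors
    class_weights: (classes, dim) one centre per identity
    labels:        (batch,) integer identity labels
    """
    # L2-normalise features and class centres so their dot products are cosines.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)  # shape (batch, classes)

    rows = np.arange(len(labels))
    theta = np.arccos(cos[rows, labels])  # angle to the ground-truth centre

    # Penalise only the ground-truth logit by adding the margin to its angle.
    logits = scale * cos
    logits[rows, labels] = scale * np.cos(theta + margin)

    # Numerically stable log-softmax followed by cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[rows, labels].mean())
```

Note that every sample receives the same margin here regardless of how hard it is, which is the limitation the paragraph above attributes to margin-based losses.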
In this work, we propose a new approach, called HeadPose-Softmax, that incorporates the idea of curriculum learning to address multi-view face recognition across different poses. We use head pose as a measure of sample difficulty, particularly for large datasets with varied head poses collected in the wild. In HeadPose-Softmax, samples are randomly selected in each mini-batch and classified into easy and hard samples based on their head pose angles. Different hard samples have different importance: the importance of a hard sample depends on its head pose angle, and its loss value is positively correlated with the head pose angle, which makes the model focus more on hard samples. In summary, the contributions of this work are:
- We present a curriculum learning method for multi-view face recognition that measures the learning difficulty of a sample based on the pose angle of the face in the sample. To the best of our knowledge, this is the first method to use the face pose angle as the measure of sample difficulty in curriculum learning.
- We have developed a new adjustment coefficient function that changes the importance of hard samples during training. It adjusts the importance according to the pose angle of each hard sample, without manual tuning of the importance of individual hard samples.
- We conducted extensive experiments on popular face benchmarks, using multiple datasets that contain a large number of pose variations. The experimental results show that HeadPose-Softmax outperforms SOTA competitors in multi-view face recognition.
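The idea of making the loss grow with head pose angle can be sketched as follows. This excerpt does not give the actual coefficient function, so everything here is a hypothetical stand-in: the linear weighting, the 30-degree easy/hard threshold, the `alpha` strength parameter, and all function names are assumptions, and the published method may act inside the softmax logits rather than as a per-sample weight.

```python
import numpy as np


def pose_weights(yaw_deg, pitch_deg, hard_threshold_deg=30.0, alpha=1.0):
    """Hypothetical importance weights derived from head pose angles.

    Easy samples (largest absolute angle at or below the threshold) get
    weight 1; hard samples get 1 + alpha * (angle / 90), so the weight
    grows with the pose angle.
    """
    yaw = np.asarray(yaw_deg, dtype=float)
    pitch = np.asarray(pitch_deg, dtype=float)
    angle = np.maximum(np.abs(yaw), np.abs(pitch))
    difficulty = np.clip(angle / 90.0, 0.0, 1.0)
    is_hard = angle > hard_threshold_deg
    return np.where(is_hard, 1.0 + alpha * difficulty, 1.0)


def pose_weighted_cross_entropy(logits, labels, yaw_deg, pitch_deg, **weight_kwargs):
    """Softmax cross-entropy where each sample is scaled by its pose weight."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels)

    # Numerically stable per-sample cross-entropy.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_sample = -log_prob[np.arange(len(labels)), labels]

    # Weighted mean, so large-pose samples contribute more to the batch loss.
    weights = pose_weights(yaw_deg, pitch_deg, **weight_kwargs)
    return float((weights * per_sample).sum() / weights.sum())
```

With these assumed settings, a frontal face keeps weight 1, while a face at 45 or 90 degrees of yaw receives weight 1.5 or 2.0, so no per-sample importance has to be set by hand.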
Section snippets
Related works
This section briefly reviews work related to face pose, loss functions for face recognition, and curriculum learning.
Methods
This section discusses the details of the proposed approach. We first describe the head pose estimation method and then give details of the proposed curriculum learning algorithm for face recognition.
Datasets
We test our method extensively on several popular benchmarks, using LFW [51], AgeDB-30 [52], [53], CALFW [54], CPLFW [55], CFP-FP [56], IJB-B [57], IJB-C [58], and some data from TFD [50] as test sets. LFW and AgeDB-30 are currently the most commonly used datasets in face recognition benchmarks. CALFW and CPLFW are more recent datasets that focus on age variation and large poses, respectively. CFP-FP, which contains 500 subjects, is a dataset for validating a model's ability to recognize faces under large pose changes. IJB-B and IJB-C are
Conclusion
In this article, we have proposed a simple but very effective loss function for the face recognition task: HeadPose-Softmax, a softmax-based loss guided by head pose classification. In particular, HeadPose-Softmax focuses explicitly on optimizing the feature vectors of large-pose samples and dynamically adjusts the importance of samples according to the pose angle. It borrows the idea of curriculum learning by distinguishing between hard and easy samples,
Uncited references
[59]
Declaration of Competing Interests
The authors declare that they are not aware of any competing financial interests or personal relationships that may have influenced the work described in this document.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (U1903214, 62071339, 62072347).
References (70)
- Q. Wang et al.
Face.evoLVe: A cross-platform library for high-performance face analytics
Neurocomputing
(2022)
- S. Zhang et al.
FaceBoxes: A CPU real-time and accurate unconstrained face detector
Neurocomputing
(2019)
- S. Zhang et al.
FaceBoxes: A CPU real-time face detector with high accuracy
2017 IEEE International Joint Conference on Biometrics (IJCB)
(2017)
- B. Huang et al.
PLFace: Progressive learning for face recognition with mask bias
Pattern Recognition
(2023)
- R. Sithara et al.
A survey on facial recognition technology
2019 IEEE International Conference on Innovations in Communication, Computing and Instrumentation (ICCI)
(2019)
- C. Huang et al.
Deep imbalanced learning for face recognition and attribute prediction
IEEE Transactions on Pattern Analysis and Machine Intelligence
(2019)
- J. Deng et al.
ArcFace: Additive angular margin loss for deep face recognition
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
(2019)
- B. Huang et al.
Joint segmentation and identification feature learning for occlusion face recognition
IEEE Transactions on Neural Networks and Learning Systems
(2022)
- Z. Wang et al.
Masked face recognition dataset and application
IEEE Transactions on Biometrics, Behavior, and Identity Science
(2023)
- X. Chai et al.
Locally linear regression for pose-invariant face recognition
IEEE Transactions on Image Processing
(2007)
Sparse feature extraction for pose-tolerant face recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence
(2014)
A comprehensive survey on pose-invariant face recognition
ACM Transactions on Intelligent Systems and Technology (TIST)
(2016)
Blind quality metric of DIBR-synthesized images in the discrete wavelet transform domain
IEEE Transactions on Image Processing
(2020)
Reference-free DIBR-synthesized video quality metric in spatial and temporal domains
IEEE Transactions on Circuits and Systems for Video Technology
(2022)
The impact of image quality on the performance of face recognition
33rd WIC Symposium on Information Theory in the Benelux, Boekelo, The Netherlands
(2012)
Effective face frontalization in unconstrained images
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(2015)
Learning flow-based feature warping for face frontalization with illumination inconsistent supervision
European Conference on Computer Vision
(2020)
3D face reconstruction from a single image assisted by 2D face images in the wild
IEEE Transactions on Multimedia
(2021)
A 3D GAN for improved large-pose facial recognition
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
(2021)
Pose-aware face recognition in the wild
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(2016)
Robust face recognition via multimodal deep face representation
IEEE Transactions on Multimedia
(2015)
Towards pose invariant face recognition in the wild
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
(2018)
Beyond face rotation: Global and local perception GAN for photorealistic and identity preserving frontal view synthesis
Proceedings of the IEEE International Conference on Computer Vision
(2017)
Dual-agent GANs for photorealistic and identity preserving profile face synthesis
NIPS
(2017)
3D-aided dual-agent GANs for unconstrained face recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence
(2019)
SphereFace: Deep hypersphere embedding for face recognition
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(2017)
CosFace: Large margin cosine loss for deep face recognition
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(2018)
AdaCos: Adaptively scaling cosine logits for effectively learning deep face representations
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
(2019)
DAM: Discrepancy alignment metric for face recognition
2021 IEEE/CVF International Conference on Computer Vision (ICCV)
(2021)
CurricularFace: Adaptive curriculum learning loss for deep face recognition
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
(2020)
Curriculum learning for facial expression recognition
2017 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017)
(2017)
Mis-classified vector guided softmax loss for face recognition
Proceedings of the AAAI Conference on Artificial Intelligence
(2020)
Learning a high fidelity pose invariant model for high-resolution face frontalization
arXiv preprint arXiv:1806.08472
(2018)
Joint face alignment and 3D face reconstruction with application to face recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence
(2018)
Trunk-branch ensemble convolutional neural networks for video-based face recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence
(2017)
Jifan Yang received his B.S. degree from the School of Electrical and Control Engineering, Heilongjiang University of Science and Technology, China, in 2018 and his M.S. degree from the School of Computer Science and Engineering, Guangxi Normal University, China, in 2021. He is currently pursuing his Ph.D. at the National Multimedia Software Engineering Research Center, School of Computer Science, Wuhan University, Wuhan, China. His research interests include image processing, quality assessment, facial recognition, and biometrics.
Zhongyuan Wang received the Ph.D. degree in Communication and Information Systems from Wuhan University, Wuhan, China, in 2008. Dr. Wang is currently a professor at the School of Computer Science, Wuhan University, Wuhan, China. He has led four projects funded by the National Natural Science Foundation of China. He is the author or co-author of over 80 peer-reviewed journal and conference papers and holds over 30 invention patents. His research interests include biometrics and identification, computer vision and pattern recognition, and image processing.
Baojin Huang received his B.S. degree from the School of Computer Science, Wuhan University of Technology, Wuhan, China, in 2019. He is currently pursuing his Ph.D. at the National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China. His research interests include people search and facial recognition.
Jinsheng Xiao received his Ph.D. in Computational Mathematics from the School of Mathematics, Wuhan University, Wuhan, China, in 2001. From 2001 to 2004 he was a research associate at the Institution of Multimedia Network Communication at Wuhan University. From 2004 to 2008 he was a lecturer in communication technology at the School of Electronic Information at Wuhan University. Since 2008 he has been an associate professor of information and communication technology. From August 2014 to August 2015 he was a Visiting Scholar at the Department of Computer Science, University of California, Santa Barbara (UCSB), USA.
Chao Liang received the Ph.D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2012. He is currently an Associate Professor at the National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China. His research interests focus on multimedia content analysis and retrieval and on computer vision and pattern recognition, where he has published over 60 articles, including papers at leading conferences such as CVPR, ACM MM, AAAI, and IJCAI and in respected journals such as IEEE Transactions on Neural Networks and Learning Systems, IEEE Transactions on Multimedia, and IEEE Transactions on Circuits and Systems for Video Technology. Dr. Liang won the Best Paper Award at PCM 2014.
Zhen Han received the B.S. degree in Computer Science and Technology and the Ph.D. degree in Computer Application Engineering from Wuhan University, Wuhan, China, in 2002 and 2009, respectively. He is now an Associate Professor at the School of Computer Science, Wuhan University. His research interests include image/video compression and processing, computer vision, and artificial intelligence.
Hua Zou received the B.Sc. and M.Sc. degrees from Xi'an University of Architecture and Technology in 2001 and 2005, respectively, and the Ph.D. degree from the School of Electronic Engineering, Xidian University, in 2009. From 2015 to 2016 he attended the University of Pittsburgh. He is currently an Associate Professor at the School of Computer Science, Wuhan University, China. His main interests are image and video processing and pattern recognition.
© 2023 Elsevier Ltd. All rights reserved.