Neurocomputing
Volume 269, December 20, 2017, Pages 108-116
https://doi.org/10.1016/j.neucom.2016.08.143
Abstract
Video quality assessment (VQA) aims to evaluate perceptual quality in order to monitor and improve the performance of practical application systems. However, traditional methods regard a video as a sequence of two-dimensional images, which is inconsistent with the fact that a video signal is three-dimensional volume data. This treatment ignores temporal information and results in poor agreement with human perception. Therefore, this paper introduces a novel VQA model by exploring and exploiting the compact representation of energy in the three-dimensional discrete cosine transform (3D-DCT) domain. First, each group of frames (GOF) of the video is transformed by the 3D-DCT. Then, three types of statistical features are derived from the 3D-DCT coefficients to represent the energy compaction properties and to simulate the process of the human visual system (HVS). The parameters of a generalized Gaussian distribution (GGD) are estimated to mimic the marginal distribution of the 3D-DCT coefficients. Three energy ratios are calculated to show how the video energy is distributed over different frequency components. And the mean and variance of the absolute 3D-DCT coefficients are used to measure the frequency variation of the video. Finally, the differences between the reference video features and the distorted video features are calculated to predict the quality score of the distorted video. Experimental results show that the proposed VQA method agrees well with human perception and is competitive with state-of-the-art methods.
Introduction
Videos have grown explosively due to the development of digital devices and the Internet. However, videos generally suffer from various types of distortion during capture, compression, transmission, and playback, which degrades human perception. It is therefore of great importance to predict the perceptual quality of multimedia content [1], [2] in order to monitor and improve the performance of practical application systems. Consequently, there is growing interest in developing methods for objective image quality assessment (IQA) [23], [24], [25], [26] and video quality assessment (VQA) [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22].
Traditionally, the video quality metric (VQM) extracts seven spatio-temporal features and calculates the differences between the reference and distorted video features to assess video quality [3]. Masry et al. use the wavelet transform and separable filters to predict video quality [4]. And Gunawan and Ghanbari extract the local harmonic strength to capture features that are compared to assess the quality of compressed videos [5]. Recently, another type of VQA approach based on extensive statistical modeling of both intra- and inter-frames has been developed. Zeng and Wang propose a metric based on the temporal smoothness of motion in the complex wavelet transform domain [6] and a method based on computing both intra- and inter-frame features via statistical models of natural videos [7]. Ma et al. use the energy variation in the discrete cosine transform (DCT) domain to capture the information loss in each individual frame [8], and they use inter-frame natural scene statistics to describe the temporal properties when designing VQA metrics [9]. Seshadrinathan and Bovik use a Gaussian scale mixture (GSM) model to describe the characteristics of each frame in the wavelet domain, and then calculate the spatial and temporal entropy differences between the reference and the distorted video to predict the perceptual quality of the distorted video [10]. Visual properties [23], [24] and sparse features [25], [26] have also been used to assess the quality score. Recently, VQA prediction models have been developed based on machine learning methods, such as support vector regression [27], [28], [29], [30], neural networks, and probabilistic models.
However, most of the proposed methods consider a video as a sequence of two-dimensional (2D) images [8], [9], [12]. This contradicts the fact that a video signal is three-dimensional (3D) volume data, with the time axis forming the third dimension. Motivated by this observation, this paper uses the three-dimensional discrete cosine transform (3D-DCT) to treat the spatial and temporal information as a whole and to create a statistical and perceptual representation of videos. The motivation for using the 3D-DCT is that its sparse higher-frequency coefficients are sensitive to the distortions caused by video compression and noise contamination. In addition, the 3D-DCT can capture both the spatial and the temporal information in a video simultaneously.
Therefore, we present a novel VQA model by investigating and exploiting the statistical properties of the 3D-DCT domain. First, each group of frames (GOF) of the video is transformed by the 3D-DCT. Three types of statistical features are then derived from the 3D-DCT coefficients to represent the energy compaction properties and to simulate the process of the human visual system (HVS). The parameters of a generalized Gaussian distribution (GGD) are estimated to capture the marginal distribution of the 3D-DCT coefficients. Three energy ratios are calculated to show how the video energy is distributed over different frequency components. And the mean and variance of the absolute 3D-DCT coefficients are used to measure the frequency variation of the video. Finally, the differences between the features of the reference video and those of the distorted video are calculated to predict the quality score of the distorted video. Experimental results show that the proposed metric agrees well with human perception and is competitive with state-of-the-art methods.
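To make the first step concrete, the following is a minimal sketch of the per-GOF 3D-DCT, not the authors' implementation: the 8-frame GOF, the 8×8×8 block size, and the luminance-only input are all assumptions made here for illustration.

import numpy as np
from scipy.fft import dctn

def gof_3d_dct_blocks(gof, block=8):
    """Split one group of frames (T, H, W) into cubes of side `block`
    and return the 3D-DCT coefficients of every cube."""
    # Assumed block size; the paper's GOF/block dimensions are not
    # given in this snippet.
    t, h, w = gof.shape
    coeffs = []
    for i in range(0, t - block + 1, block):
        for j in range(0, h - block + 1, block):
            for k in range(0, w - block + 1, block):
                cube = gof[i:i+block, j:j+block, k:k+block]
                coeffs.append(dctn(cube, type=2, norm='ortho'))
    return np.stack(coeffs)  # shape: (n_blocks, block, block, block)

# Example: a synthetic 8-frame GOF of 64x64 luminance values
gof = np.random.rand(8, 64, 64)
cubes = gof_3d_dct_blocks(gof)  # 64 coefficient cubes of shape (8, 8, 8)

The statistical features described above would then be computed from these coefficient cubes and compared between the reference and the distorted video.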
The main contributions of our approach are as follows:
- First, we exploit the energy compaction properties of the 3D-DCT domain and design a novel feature difference model to predict the perceptual quality of videos. The proposed method shows promising performance.
- Second, we use the GGD to model the marginal distribution of the 3D-DCT coefficients in order to capture the long tails of the distribution, which are not accurately described by the Laplacian used in the existing literature.
- Finally, we rearrange the 3D-DCT coefficients according to spatio-temporal frequency and design three energy ratio descriptors to represent how the video energy is distributed across different frequency components (a sketch of the GGD fit and of these ratio descriptors follows this list).
The rest of the paper is structured as follows. Section 2 briefly introduces the 3D-DCT. Section 3 details the statistical features derived from the 3D-DCT coefficients and presents the framework of the proposed video quality assessment method. Experimental results and analyses are presented in Section 4. Finally, Section 5 concludes the paper.
Section snippets
Three-dimensional discrete cosine transform
Most VQA schemes perform temporal feature extraction followed by a 2D transform. However, a video sequence is generally visualized as a volume of data and can be understood as a 3D signal rather than a 2D signal. The 3D-DCT has been proposed as an alternative to motion-compensated transform coding for video content. Owing to its decorrelation and efficient energy compaction properties, the 3D-DCT has been widely used in many applications such as compression, denoising, and visual tracking [31].
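For reference, the separable type-II 3D-DCT of an N×N×N cube f(x, y, z), which is the form block-based schemes typically use, can be written as follows (the orthonormal normalization is an assumption, since this snippet does not show the paper's definition):

C(u,v,w) = a(u)\,a(v)\,a(w) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \sum_{z=0}^{N-1} f(x,y,z) \cos\!\left[\frac{(2x+1)u\pi}{2N}\right] \cos\!\left[\frac{(2y+1)v\pi}{2N}\right] \cos\!\left[\frac{(2z+1)w\pi}{2N}\right],

where a(0) = \sqrt{1/N} and a(u) = \sqrt{2/N} for u > 0. Because the transform is separable, it can be computed as three successive 1D DCTs along the rows, columns, and temporal axis.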
In this…
VQA metric based on energy compaction statistics
The study of the statistics of natural visual signals is a discipline within the field of perception. Natural scenes are defined as images or videos captured by high-quality devices operating in the visual spectrum [40]. It has been shown that static natural scenes exhibit highly reliable statistical regularities and that the study of natural image statistics is highly relevant to understanding visual perception [41]. In addition, when distortions are present, these statistics change…
Experimental results and analysis
To evaluate performance, we test the proposed method and other state-of-the-art VQA metrics on the LIVE video database [1]. It consists of 10 reference videos with a spatial resolution of 768×432 and 150 distorted videos covering four distortion categories: MPEG-2 compression, H.264 compression, wireless distortion, and IP distortion. Six videos contain 250 frames at 25 fps, one video contains 217 frames at 25 fps, and three videos contain 500 frames at 50 fps. Additionally…
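As a reminder of the evaluation protocol (an illustration, not the paper's evaluation code): agreement with subjective judgments on LIVE is conventionally reported via the Spearman rank-order (SROCC) and Pearson linear (PLCC) correlations between the predicted scores and the subjective DMOS values. The nonlinear logistic mapping often applied before computing PLCC is omitted here for brevity.

import numpy as np
from scipy.stats import spearmanr, pearsonr

def evaluate(predicted, dmos):
    """Correlation between objective predictions and subjective DMOS."""
    srocc = spearmanr(predicted, dmos).correlation  # monotonic agreement
    plcc = pearsonr(predicted, dmos)[0]             # linear agreement
    return srocc, plcc

# Example with placeholder scores for the 150 distorted LIVE videos
pred = np.random.rand(150)
dmos = np.random.rand(150)
print(evaluate(pred, dmos))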
Conclusions
In this paper, we propose a novel VQA model based on the spatio-temporal statistical properties of the 3D-DCT coefficients. Experimental results on the LIVE video database show that the proposed metric outperforms state-of-the-art VQA metrics and even full-reference VQA metrics. The data rate required to transmit the features of the proposed method is much lower than that of existing VQA methods. The proposed method therefore achieves a good balance between performance and data rate by capturing…
Acknowledgements
This research was supported in part by the National Natural Science Foundation of China (Nos. 61501349, 61372130, 61432014, and 61571343), the National Key Research and Development Program of China (No. 2016QY01W0204), the Fundamental Research Funds for the Central Universities (Nos. BDY081426, JB140214, JBX170218, and XJS14042), the Program for New Scientific and Technological Star of Shaanxi Province (No. 2014KJXX-47), and the project funded by the China Postdoctoral Science Foundation (No.…
- M.C. Lee et al.
Quantization of 3D DCT coefficients and scan order for video compression
J. Vis. Commun. Image Represent.
(1997)
- B. Gu et al.
Incremental learning for ν-support vector regression
Neural Netw.
(2015)
- Y. Yuan et al.
Image quality assessment: a sparse learning way
Neurocomputing
(2015)
- C. Zhang et al.
No-reference image quality assessment using sparse feature representation in two-dimensional spatial correlation
Neurocomputing
(2016)
- F. Gao et al.
Biologically inspired image quality assessment
Signal Process.
(2016)
- K. Seshadrinathan et al.
Study of subjective and objective quality assessment of video
IEEE Trans. Image Process.
(2010)
- Z. Wang et al.
Reduced- and no-reference image quality assessment
IEEE Signal Process. Mag.
(2011)
- M.H. Pinson et al.
A new standardized method for objectively measuring video quality
IEEE Trans. Broadcast.
(2004)
- M. Masry et al.
A scalable wavelet-based video distortion metric and applications
IEEE Trans. Circuits Syst. Video Technol.
(2006)
- I.P. Gunawan et al.
Reduced-reference video quality assessment using discriminative local harmonic strength with motion consideration
IEEE Trans. Circuits Syst. Video Technol.
(2008)
Reduced-reference video quality assessment of compressed video sequences
IEEE Trans. Circuits Syst. Video Technol.
(2012)
Motion tuned spatio-temporal quality assessment of natural videos
IEEE Trans. Image Process.
(2010)
Full-reference video quality assessment by decoupling detail losses and additive impairments
IEEE Trans. Circuits Syst. Video Technol.
(2012)
Video quality assessment by reduced reference spatio-temporal entropic differencing
IEEE Trans. Circuits Syst. Video Technol.
(2013)
Additive log-logistic model for networked video quality assessment
IEEE Trans. Image Process.
(2013)
No-reference quality assessment of H.264/AVC encoded video
IEEE Trans. Circuits Syst. Video Technol.
(2010)
A metric for evaluating video quality based on the human visual system
Cogn. Comput.
(2010)
A recurrent video quality enhancement framework with multi-granularity frame fusion and frame difference based attention
2021, Neurocomputing
In recent years, deep learning has attracted significant research attention for video restoration. Among the existing contributions, single-frame based approaches rely purely on one reference frame and neglect the remaining neighboring frames when enhancing a target frame. In contrast, multi-frame based contributions use temporal information in a sliding window, and the existing recurrent design uses only a single preceding enhanced frame. It is intuitive to use both multiple original adjacent frames and the preceding enhanced frames to improve video quality. In this article, we propose a recurrent video quality enhancement framework using multi-granularity frame fusion and frame-difference based attention (REMD). First, we develop a three-dimensional convolutional neural network based encoder-decoder fusion model that fuses multiple frames at multiple granularities. Second, since severe compression artifacts tend to show up around the edges and textures of compressed frames, we propose a frame-difference based spatial attention method to emphasize the edges and textures of moving regions. Finally, a recurrent sliding-window design is conceived to exploit the temporal information in the preceding enhanced frames and the subsequent neighboring frames. Experiments show that our method achieves superior performance compared to prior contributions, with significantly reduced spatial and computational complexity.
Bearing fault detection using wavelet transform and generalized Gaussian density modelling
2020, Measurement: Journal of the International Measurement Confederation
Because the marginal distribution of the wavelet transform output is a heavy-tailed bell-shaped function characterized by a large fraction of small or even zero wavelet coefficients, traditional feature extraction approaches used in bearing fault detection applications, such as the wavelet energy spectrum and energy spectrum entropy, cannot accurately express the statistical characteristics of the wavelet subbands. In this paper, we propose a novel bearing fault detection approach using the wavelet transform and generalized Gaussian density (GGD) modelling. A GGD-based feature descriptor is generated from the concatenation of the statistical parameters of each wavelet subband, estimated by the maximum likelihood method. Based on the descriptor information, a class label for fault detection is assigned by a subsequent classifier. Extensive experimental results show that the proposed approach can capture the wavelet subband information of bearing vibration signals more accurately and flexibly than energy-, energy entropy-, Gaussian-, and Laplacian-based feature description methods. Furthermore, experimental results with different wavelet filters, decomposition levels, and classifiers show that the new method significantly improves fault detection accuracy compared to conventional approaches, with better robustness.
A review of digital video tampering: from simple editing to full synthesis
2019, Digital Investigation
Video manipulation methods have made significant advances in recent years. This is partly due to the rapid development of advanced deep learning methods and partly due to the large amount of video material now in the public domain. In the past, convincing video manipulation was too labor-intensive to achieve at scale. However, recent developments in deep learning-based methods have made it possible not only to produce convincing fake videos but also to fully synthesize video content. Such advances offer new opportunities for enhancing visual content itself, but at the same time pose new challenges for state-of-the-art tampering detection methods. Video tampering detection has been an active area of research for some time, with regular reviews of the topic. However, little attention has been paid to the video manipulation techniques themselves. This paper provides an objective and thorough examination of current techniques related to digital video manipulation. We thoroughly examine their evolution and show how current evaluation techniques offer opportunities for the advancement of video tampering detection. A critical and comprehensive review of photorealistic video synthesis is provided, with an emphasis on deep learning-based methods. Existing manipulated video datasets are also qualitatively reviewed and critically discussed. Finally, conclusions are drawn from a comprehensive and thorough review of manipulation methods, with discussion of future research directions aimed at improving detection methods.
No-reference quality assessment of blurred frames
2018, Procedia Computer Science
Visual quality assessment (VQA) methods applicable to noise, blur, and MPEG distortion can be viewed as an extension of image quality assessment (IQA) methods. At the same time, VQA is a problem to be studied in its own right, given the available set of final frames. Given our interest in image blur assessment, we focus the discussion on no-reference methods for the main types of blur occurring in video sequences, such as motion blur, defocus blur, shake blur, and atmospheric blur. First, we detect the blurred areas in a frame using a simple imposed-blur method. Second, a blur quality measure is evaluated according to the identified type. The obtained estimates provide an initial stage for deblurring algorithms to decide whether to restore the blurred frames or remove them from subsequent processing. The Sports Videos in the Wild (SVW) dataset provides rich test material for the experiments. The resulting estimates agree well with the perceptual assessment of the human visual system (HVS).
A critical review of image and video quality assessment methods
2022, Journal of Image and Graphics
Quantitative analysis of the effect of the Damp Heat test on the optical imaging system under different scene usage conditions
2022, Proceedings of SPIE - The International Society for Optical Engineering
Research article
Image categorization with non-negative kernel sparse representation
Neurocomputing, Volume 269, 2017, pp. 21-28
The sparse representation of signals has become an important tool in computer vision. Sparse representations have achieved remarkable success in many computer vision applications such as image denoising, image super-resolution, and object detection. Sparse representation models often contain two phases: sparse coding and dictionary learning. In this article, we propose a nonlinear, non-negative sparse representation model: NNK-KSVD. In the sparse coding stage, a nonlinear update rule is proposed to obtain the sparse matrix. In the dictionary learning phase, the proposed model extends the kernel KSVD by embedding non-negative sparse coding. The proposed non-negative kernel sparse representation model was evaluated on several publicly available image datasets for the classification task. Experimental results show that by exploiting the nonlinear structure in images and harnessing the "additive" nature of non-negative sparse coding, promising classification performance can be achieved. In addition, the proposed sparse representation method was also evaluated on image retrieval tasks, yielding competitive results.
Research article
Neurocomputing, Volume 269, 2017, pp. 212-219
Recently, question and answer (Q&A) forums for software development (e.g., Stack Overflow) have become popular. Identifying the best answer to an asked question is important for a Q&A forum, because the best answer provides an excellent solution to the question and can guide developers in solving their problems in practice. However, the best answers are often not explicitly marked by the question owners. For other developers with the same question, it would be time-consuming to review all candidate answers to find the appropriate one. In this paper, we propose a novel approach to predicting the best answers to questions asked on Stack Overflow using heterogeneous data sources in the forum. We extract different groups of features from multiple data sources and combine them for the final prediction through multi-view learning. Experimental results show that the proposed method is effective in identifying the best answers to questions asked on Stack Overflow.
Research article
Tunable discounting and visual exploration for language models
Neurocomputing, Volume 269, 2017, pp. 73-81
A language model is fundamental to many applications in natural language processing. Most language models are trained on large datasets and are difficult to adapt to other domains where only a small dataset may be available. Optimizing the discounting parameters used for smoothing is one way to adapt language models to a new domain. In this work, we present novel language models based on tunable discounting mechanisms. The language models are trained on a large dataset, but their discounting parameters can then be tuned to a target dataset. We study tunable discounting and polynomial discounting functions based on the modified Kneser-Ney (mKN) models. In particular, we propose the tunable mKN (TmKN) model, the polynomial discounting mKN (PmKN) model, and the tunable and polynomial discounting mKN (TPmKN) model. We test our proposed models and compare them with the mKN model, the improved KN model, and the tunable mKN with interpolation model (mKN + interp). In our implementation, the language models achieve perplexity improvements in both in-domain and out-of-domain evaluation. Experimental results show that our new models significantly outperform the baseline model and are particularly adept at adapting to new domains. In addition, we use visualization to depict the relationship between parameter settings and language model performance to guide the parameter optimization process. Exploratory visual analysis is then used to examine the performance of the proposed language models, revealing the strengths and characteristics of the models.
Research article
Special issue for the International Conference on Intelligence Science and Big Data Engineering 2016
Neurocomputing, Volume 269, 2017, pp. 117-119
Research article
Exemplar-based 3D human pose estimation with sparse spectral embedding
Neurocomputing, Volume 269, 2017, pp. 82-89
In exemplar-based approaches, human pose estimation is achieved by retrieving relevant poses using images. Image description is therefore crucial, and it is common to extract multiple features to better describe the visual input data. To merge multiple features, traditional methods simply concatenate multi-view features into one long vector. This oversimplified process suffers from two shortcomings: (1) it usually results in long feature vectors that suffer from the "curse of dimensionality"; (2) it lacks physical meaning and may not fully exploit the complementary properties of multi-view features. To address these issues, in this article we present a dimensionality reduction method based on sparse spectral embedding, followed by a nearest-neighbor regression ensemble in the low-rank multi-view feature space, to derive 3D human poses from monocular videos. Experiments on the HumanEva dataset show the effectiveness of the proposed method.
Research article
Special issue on selected and extended papers from the International Conference on Intelligence Science and Big Data Engineering 2015 (IScIDE 2015)
Neurocomputing, Volume 269, 2017, pp. 1-2
Lihuo He is currently an Associate Professor at Xidian University. He received the B.Sc. degree in Electronic and Information Engineering and the Ph.D. degree in Pattern Recognition and Intelligent Systems from Xidian University, Xi'an, China, in 2008 and 2013, respectively. His research interests focus on image/video quality assessment, cognitive computing, and computational vision.
Wen Lu received the B.Sc., M.Sc., and Ph.D. degrees in signal and information processing from Xidian University, China, in 2002, 2006, and 2009, respectively. From 2010 to 2012, he was a postdoctoral fellow in the Department of Electrical Engineering at Stanford University, USA. He is currently an associate professor at Xidian University. His research interests include image and video quality metrics, the human visual system, and computational vision. He has published 2 books and around 30 technical articles in peer-reviewed journals and proceedings, including IEEE TIP, TSMC, Neurocomputing, Signal Processing, etc.
Changcheng Jia received the B.Sc. degree in Electrical and Information Engineering from Xidian University, China, in 2014. He is currently a Ph.D. student at Xidian University. His research interests are in image quality assessment.
Lei Hao received the B.Sc. and M.Sc. degrees in Electrical and Information Engineering from Xidian University, China, in 2012 and 2015, respectively. His research interests include image and video quality metrics and visual quality assessment.
© 2017 Elsevier B.V. All rights reserved.