Evaluation of video quality through compact energy representation in the 3D DCT domain


Neurocomputing

Volume 269, December 20, 2017, pages 108-116


https://doi.org/10.1016/j.neucom.2016.08.143

Abstract

Video quality assessment (VQA) aims to predict the perceptual quality of videos in order to improve the performance of practical application systems. However, traditional methods regard a video as a sequence of two-dimensional images, which is inconsistent with the fact that a video signal is three-dimensional volume data. This ignores the temporal information and results in poor agreement with human perception. Therefore, this paper introduces a novel VQA model by exploring and exploiting the compact representation of energy in the three-dimensional discrete cosine transform (3D DCT) domain. First, the video is transformed by the 3D DCT for each group of frames (GOF). Then, three types of statistical features are derived from the 3D DCT coefficients to represent energy compaction properties, simulating the processing of the human visual system (HVS). The parameters of a generalized Gaussian distribution (GGD) are estimated to model the marginal distribution of the 3D DCT coefficients. Three energy ratios are calculated to describe how the video energy is distributed over different frequency components. And the mean and variance of the absolute 3D DCT coefficients are used to measure the frequency variation of the video. Finally, the differences between the reference and distorted video features are calculated to predict the quality score of the distorted video. Experimental results show that the proposed VQA method agrees well with human perception and is competitive with state-of-the-art methods.

Introduction

Videos have grown explosively due to the development of digital devices and the internet. However, videos generally suffer from various types of distortion during capture, compression, transmission, and playback, which degrades the perceived quality. It is therefore of great importance to predict the perceptual quality of multimedia content [1], [2] in order to monitor and improve the performance of practical application systems. Accordingly, there is growing interest in developing methods for objective image quality assessment (IQA) [23], [24], [25], [26] and video quality assessment (VQA) [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22].

Traditionally, the video quality metric (VQM) extracts seven spatio-temporal features and computes the differences between the reference and distorted video features to assess video quality [3]. Masry et al. use the wavelet transform and separable filters to predict video quality [4], and Gunawan and Ghanbari extract local harmonic strength to capture features that are compared to assess the quality of compressed videos [5]. Recently, another type of VQA approach based on extensive statistical modeling of both intra- and inter-frame behavior has been developed. Zeng and Wang propose a metric based on the temporal smoothness of motion in the complex wavelet transform domain [6] and a method that computes both intra- and inter-frame features via statistical models of natural videos [7]. Ma et al. use the energy variation in the discrete cosine transform (DCT) domain to capture the information loss in each individual frame [8], and they use inter-frame natural scene statistics to describe temporal properties when designing VQA metrics [9]. Soundararajan and Bovik use a Gaussian scale mixture (GSM) model to describe the characteristics of each frame in the wavelet domain, and then calculate the spatial and temporal entropy differences between the reference and the distorted video to predict the perceptual quality of the distorted video [10]. Visual properties [23], [24] and sparse features [25], [26] are also used to assess the quality score. More recently, VQA prediction models have been developed based on machine learning methods such as support vector regression [27], [28], [29], [30], neural networks, and probabilistic models.

However, most existing methods consider a video as a sequence of two-dimensional (2D) images [8], [9], [12]. This contradicts the fact that a video signal is three-dimensional (3D) volume data, with the time axis forming the third dimension. Motivated by this observation, this paper uses the three-dimensional discrete cosine transform (3D DCT) to treat the spatial and temporal information as a whole and create a statistical and perceptual representation of videos. The motivation for using the 3D DCT is that its sparse higher-frequency coefficients are sensitive to the distortions caused by video compression and noise contamination. In addition, the 3D DCT can capture both the spatial and the temporal information in a video simultaneously.

Therefore, we present a novel VQA model by investigating and exploiting the statistical properties of the 3D DCT domain. First, the video is transformed by the 3D DCT for each group of frames (GOF). Three types of statistical features are then derived from the 3D DCT coefficients to represent energy compaction properties, simulating the processing of the human visual system (HVS). The parameters of a generalized Gaussian distribution (GGD) are estimated to capture the marginal distribution of the 3D DCT coefficients. Three energy ratios are calculated to describe how the video energy is distributed over different frequency components. And the mean and variance of the absolute 3D DCT coefficients are used to measure the frequency variation of the video. Finally, the differences between the features of the reference video and those of the distorted video are calculated to predict the quality score of the distorted video. Experimental results show that the proposed metric agrees well with human perception and is competitive with state-of-the-art methods.
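To make the last two steps concrete, here is a minimal sketch of the frequency-variation features (mean and variance of the absolute 3D DCT coefficients) and a feature-difference comparison. Excluding the DC term and pooling by mean absolute difference are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def frequency_variation_features(coeffs):
    """Mean and variance of absolute 3D DCT coefficients of a GOF block.

    The first (DC) coefficient is excluded -- an assumption made here so
    that the features reflect only the AC frequency content."""
    a = np.abs(np.asarray(coeffs, dtype=float)).ravel()[1:]
    return np.array([a.mean(), a.var()])

def feature_difference(f_ref, f_dis):
    """Pool reference/distorted feature differences into one number
    (simple mean absolute difference; the paper's pooling may differ)."""
    return float(np.mean(np.abs(f_ref - f_dis)))

ref = np.array([9.0, 4.0, -2.0, 1.0])   # toy "reference" coefficients
dis = np.array([9.0, 1.0, 0.0, 0.0])    # toy "distorted" coefficients
d = feature_difference(frequency_variation_features(ref),
                       frequency_variation_features(dis))
print(d > 0.0)  # distortion shows up as a nonzero feature difference
```

A larger feature difference indicates a stronger deviation from the reference statistics and hence a lower predicted quality.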

The main contributions of our approach are as follows:

First, we exploit the energy compaction properties of the 3D DCT domain and design a novel feature-difference model to predict the perceptual quality of videos. The proposed method shows promising performance.

Second, we use the GGD to model the marginal distribution of the 3D DCT coefficients and capture the long tails of the distribution, which are not accurately described by the Laplacian model used in the existing literature.
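As an illustration, a common moment-matching estimator for the GGD shape parameter can be sketched as follows; the grid range and step are arbitrary choices here, and the paper may use a different estimator:

```python
import numpy as np
from scipy.special import gamma

def fit_ggd_shape(x):
    """Moment-matching GGD shape estimate: match the sample ratio
    E[|x|]^2 / E[x^2] against rho(beta) = Gamma(2/b)^2 /
    (Gamma(1/b) * Gamma(3/b)) over a grid of candidate shapes."""
    x = np.asarray(x, dtype=float).ravel()
    r = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
    betas = np.arange(0.2, 6.0, 0.001)
    rho = gamma(2.0 / betas) ** 2 / (gamma(1.0 / betas) * gamma(3.0 / betas))
    return betas[np.argmin(np.abs(rho - r))]

rng = np.random.default_rng(0)
# The Gaussian is the beta = 2 special case of the GGD; the Laplacian
# (beta = 1) has the heavier tails that the GGD generalizes.
b_gauss = fit_ggd_shape(rng.standard_normal(200_000))
b_lap = fit_ggd_shape(rng.laplace(size=200_000))
print(round(b_gauss, 1), round(b_lap, 1))
```

Shape values below 2 correspond to distributions with heavier-than-Gaussian tails, which is exactly the regime the Laplacian (beta fixed at 1) cannot adapt to.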

Finally, we rearrange the 3D DCT coefficients according to spatio-temporal frequency and design three energy-ratio descriptors that represent how the video energy is distributed across different frequency components.
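One way to realize such descriptors is sketched below; the index-sum frequency ordering, the 8×8×8 block size, and the band boundaries are illustrative assumptions, not the paper's exact design:

```python
import numpy as np
from scipy.fft import dctn

def energy_ratios(cube):
    """Split the AC energy of a 3D DCT cube into low/mid/high
    spatio-temporal frequency bands, ordered by the index sum u+v+w."""
    c = dctn(np.asarray(cube, dtype=float), norm="ortho")
    u, v, w = np.indices(c.shape)
    f = u + v + w                      # crude spatio-temporal frequency
    e = c ** 2
    e[0, 0, 0] = 0.0                   # exclude the DC term
    total = e.sum()
    bands = (f <= 7, (f > 7) & (f <= 14), f > 14)  # assumed band edges
    return np.array([e[b].sum() / total for b in bands])

rng = np.random.default_rng(0)
# Smooth (correlated) data concentrates its energy at low frequencies.
smooth = np.cumsum(rng.standard_normal((8, 8, 8)), axis=1)
r = energy_ratios(smooth)
print(r.round(3), round(r.sum(), 6))   # the three ratios sum to 1
```

Distortions such as blocking or blur shift energy between these bands, which is what the descriptors are meant to pick up.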

The rest of the paper is structured as follows. Section 2 briefly introduces the 3D DCT. Section 3 details the statistical features derived from the 3D DCT coefficients and presents the framework of the proposed video quality assessment method. Experimental results and analyses are presented in Section 4. Finally, Section 5 concludes the paper.

Section snippets

Three-dimensional discrete cosine transform

Most VQA schemes perform temporal feature extraction followed by a 2D transform. However, a video sequence is naturally a volume of data and can be understood as a 3D signal rather than a sequence of 2D signals. The 3D DCT has been proposed as an alternative to motion-compensated transform coding for video content. Due to its decorrelation and efficient energy compaction properties, the 3D DCT has been widely used in many applications such as compression, denoising, and visual tracking [31].

In this…
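The per-GOF transform described in this section can be sketched with SciPy's separable `dctn` applied to 8×8×8 cubes; the block size and orthonormal normalization are assumptions here, and the paper's GOF partitioning may differ:

```python
import numpy as np
from scipy.fft import dctn

def blockwise_3d_dct(gof, bs=8):
    """Orthonormal 3D DCT-II applied to each bs x bs x bs cube of a
    group of frames (GOF) shaped (frames, height, width)."""
    t, h, w = gof.shape
    assert t % bs == 0 and h % bs == 0 and w % bs == 0
    out = np.empty_like(gof)
    for k in range(0, t, bs):
        for i in range(0, h, bs):
            for j in range(0, w, bs):
                out[k:k+bs, i:i+bs, j:j+bs] = dctn(
                    gof[k:k+bs, i:i+bs, j:j+bs], norm="ortho")
    return out

rng = np.random.default_rng(0)
gof = rng.standard_normal((8, 32, 32))       # toy 8-frame GOF
coeffs = blockwise_3d_dct(gof)
# An orthonormal transform preserves energy (Parseval's relation).
print(np.allclose((coeffs ** 2).sum(), (gof ** 2).sum()))  # True
```

Because the 3D DCT is separable, `dctn` simply applies the 1D DCT along the temporal and both spatial axes in turn.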

VQA metric based on energy compression statistics

The study of the statistics of natural visual signals is a discipline within perception research. Natural scenes are defined as images or videos captured by high-quality devices operating in the visual spectrum [40]. It has been shown that static natural scenes exhibit highly reliable statistical regularities and that the study of natural image statistics is highly relevant to understanding visual perception [41]. In addition, when distortions are present, these statistics change…

Test results and analysis

To evaluate performance, we test the proposed method and other state-of-the-art VQA metrics on the LIVE video database [1]. It consists of 10 reference videos with a spatial resolution of 768×432 and 150 distorted videos covering four distortion categories: MPEG-2 compression, H.264 compression, wireless distortion, and IP distortion. Six videos contain 250 frames at 25 fps, one video contains 217 frames at 25 fps, and three videos contain 500 frames at 50 fps. Additionally…
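Performance on such databases is typically summarized by correlations between objective predictions and subjective DMOS; a sketch with hypothetical numbers (the standard protocol also fits a nonlinear logistic regression before computing PLCC, which is omitted here):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical DMOS values and metric outputs for five distorted videos.
dmos = np.array([30.0, 45.0, 50.0, 62.0, 70.0])
pred = np.array([0.21, 0.35, 0.33, 0.55, 0.61])

srocc = spearmanr(dmos, pred)[0]   # monotonic (rank) agreement
plcc = pearsonr(dmos, pred)[0]     # linear agreement
print(round(srocc, 3), round(plcc, 3))
```

Values close to 1 indicate good agreement between the metric and human judgments.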

Conclusions

In this paper, we propose a novel VQA model based on the spatio-temporal statistical properties of the 3D DCT coefficients. Experimental results on the LIVE video database show that the proposed metric outperforms state-of-the-art VQA metrics and even full-reference VQA metrics. The data rate required to transmit the features of the proposed method is much lower than that of existing VQA methods. The proposed method therefore achieves a good balance between performance and data rate…

Acknowledgments

This research was supported in part by the National Natural Science Foundation of China (Nos. 61501349, 61372130, 61432014, and 61571343), the National Key Research and Development Program of China (No. 2016QY01W0204), the Fundamental Research Funds for the Central Universities (Nos. BDY081426, JB140214, JBX170218, and XJS14042), the Program for New Scientific and Technological Star of Shaanxi Province (No. 2014KJXX-47), and the project funded by the China Postdoctoral Science Foundation (No.…

Lihuo He is currently an Associate Professor at Xidian University. He received the B.Sc. degree in Electrical and Information Engineering and the Ph.D. degree in Pattern Recognition and Intelligent Systems from Xidian University, Xi'an, China, in 2008 and 2013, respectively. His research interests focus on image/video quality assessment, cognitive computing, and computational vision.

  • M.C. Lee et al.

    Quantization of 3D-DCT coefficients and scan order for video compression

    J. Vis. Commun. Image Represent.

    (1997)

  • B. Gu et al.

    Incremental learning for ν-support vector regression

    Neural Networks

    (2015)

  • Y. Yuan et al.

    Image quality assessment: a sparse learning way

    Neurocomputing

    (2015)

  • C. Zhang et al.

    No-reference image quality assessment using sparse feature representation in two-dimensional spatial correlation

    Neurocomputing

    (2016)

  • F. Gao et al.

    Biologically inspired image quality assessment

    Signal Process.

    (2016)

  • K. Seshadrinathan et al.

    Study of subjective and objective quality assessment of video

    IEEE Trans. Image Process.

    (2010)

  • Z. Wang et al.

    Reduced- and no-reference image quality assessment

    IEEE Signal Process. Mag.

    (2011)

  • M.H. Pinson et al.

    A new standardized method for objectively measuring video quality

    IEEE Trans. Broadcast.

    (2004)

  • M. Masry et al.

    A scalable wavelet-based video distortion metric and applications

    IEEE Trans. Circuits Syst. Video Technol.

    (2006)

  • I.P. Gunawan et al.

    Reduced-reference video quality assessment using discriminative local harmonic strength with motion consideration

    IEEE Trans. Circuits Syst. Video Technol.

    (2008)

  • K. Zeng, Z. Wang, Temporal motion smoothness measurement for reduced-reference video quality assessment, Proceedings of ...
  • K. Zeng, Z. Wang, Quality-aware video based on robust embedding of intra- and inter-frame reduced-reference features, ...
  • L. Ma et al.

    Reduced-reference video quality assessment of compressed video sequences

    IEEE Trans. Circuits Syst. Video Technol.

    (2012)

  • L. Ma, K.N. Ngan, L. Xu, Reduced reference video quality assessment based on HVS mutual spatial masking and temporal...
  • K. Seshadrinathan et al.

    Motion tuned spatio-temporal quality assessment of natural videos

    IEEE Trans. Image Process.

    (2010)

  • S. Li et al.

    Full-reference video quality assessment by decoupling detail losses and additive impairments

    IEEE Trans. Circuits Syst. Video Technol.

    (2012)

  • R. Soundararajan et al.

    Video quality assessment by reduced reference spatio-temporal entropic differencing

    IEEE Trans. Circuits Syst. Video Technol.

    (2013)

  • F. Zhang et al.

    Additive log-logistic model for networked video quality assessment

    IEEE Trans. Image Process.

    (2013)

  • T. Brandão et al.

    No-reference quality assessment of H.264/AVC encoded video

    IEEE Trans. Circuits Syst. Video Technol.

    (2010)

  • W. Lu et al.

    A metric for video quality assessment based on the human visual system

    Cogn. Comput.

    (2010)

    • A recurrent video quality enhancement framework with multi-granularity frame fusion and frame-difference-based attention

      2021, Neurocomputing

      In recent years, deep learning has attracted significant research attention for video restoration. Among existing contributions, single-frame approaches rely purely on one reference frame and neglect the remaining neighboring frames when enhancing a target frame. In contrast, multi-frame contributions use temporal information in a sliding window, and the existing recurrent design uses only a single previously enhanced frame. It is intuitive to use both multiple original adjacent frames and the previously enhanced frames to improve video quality. In this article, we propose a recurrent video quality enhancement framework using multi-granularity frame fusion and frame-difference-based attention (REMD). First, we develop a three-dimensional convolutional neural network based encoder-decoder fusion model that fuses multiple frames at multiple granularities. Second, since severe compression artifacts tend to appear around the edges and textures of compressed frames, we propose a frame-difference-based spatial attention method to emphasize the edges and textures of moving regions. Finally, a recurrent sliding-window design is conceived to exploit the temporal information in previously enhanced frames and subsequent neighboring frames. Experiments show that our method achieves superior performance compared to prior contributions with significantly reduced spatial and computational complexity.

    • Bearing fault detection with wavelet transform and generalized Gaussian density modelling

      2020, Measurement: Journal of the International Measurement Confederation

      Because the marginal distribution of the wavelet transform output is a heavy-tailed bell-shaped function characterized by a large fraction of small or even zero wavelet coefficients, traditional feature extraction approaches such as the wavelet energy spectrum and energy spectrum entropy, as used in bearing fault detection applications, cannot accurately express the statistical characteristics of wavelet subbands. In this paper, we propose a novel wavelet-based bearing fault detection approach using the wavelet transform and generalized Gaussian density (GGD) modelling. A GGD-based feature descriptor is generated by concatenating the statistical parameters of each wavelet subband, estimated by the maximum likelihood method. Based on the descriptor information, a class label for bearing fault detection is assigned by the subsequent classifier. Extensive experimental results show that the proposed approach can capture the wavelet subband information of bearing vibration signals more accurately and flexibly than energy, energy entropy, Gaussian, and Laplacian based feature description methods. Furthermore, experimental results with different wavelet filters, decomposition levels, and classifiers show that the new method significantly improves fault detection accuracy compared to conventional approaches, with better robustness.

    • An overview of manipulating digital video: from simple editing to full synthesis

      2019, Digital investigation

      Video manipulation methods have made significant advances in recent years. This is partly due to the rapid development of advanced deep learning methods and also to the large amount of video material that is now publicly available. In the past, persuasive video manipulation was too labor-intensive to achieve at scale. However, recent developments in deep learning based methods have made it possible not only to produce convincing fake videos but also to fully synthesize video content. Such advances offer new opportunities for enhancing visual content itself, but at the same time pose new challenges for state-of-the-art methods for detecting tampering. Video tampering detection has been an active area of research for some time, with regular reviews of the topic. However, little attention has been paid to the video manipulation techniques themselves. This paper provides an objective and thorough examination of current techniques for digital video manipulation. We examine their evolution and show how current evaluation techniques offer opportunities for the advancement of video tampering detection. A critical and comprehensive review of photorealistic video synthesis is provided, with an emphasis on deep learning based methods. Existing manipulated video datasets are also qualitatively reviewed and critically discussed. Finally, conclusions are drawn from a comprehensive and thorough review of manipulation methods, with discussions of future research directions aimed at improving detection methods.


    • Quality assessment of blurred frames without reference

      2018, Procedia Computer Science

      Visual quality assessment (VQA) methods applicable to assessing noise, blur, and MPEG distortion can be viewed as an extension of image quality assessment (IQA) methods. At the same time, VQA is a special problem to study given the available set of frameworks. Given our interest in image blur assessment, we focus the discussion on no-reference methods for the main types of blur occurring in video sequences, such as motion blur, defocus blur, shake blur, and atmospheric blur. First, we detect the blurred areas in a frame using a simple imposed-blur method. Second, the degree of blur is evaluated according to the identified type. The obtained estimators provide an initial stage for deblurring algorithms to decide whether to restore the blurred frames or remove them from subsequent processing. The Sports Videos in the Wild (SVW) dataset provides rich test material for the experiments. The resulting estimators agree well with the perceptual assessment of the human visual system (HVS).

    • A critical review of methods for assessing image and video quality

      2022, Journal of Image and Graphics

    • Quantitative analysis of the effect of the Damp Heat test on the optical imaging system under different scene usage conditions

      2022, Proceedings of SPIE - The International Society for Optical Engineering

    View all citing articles on Scopus
    • research article

      Image categorization with non-negative kernel sparse representation

      Neurocomputing, Volume 269, 2017, pp. 21-28

      Sparse representation of signals has become an important tool in computer vision. Sparse representations have achieved remarkable results in many computer vision applications such as image denoising, image super-resolution, and object detection. Sparse representation models often contain two phases: sparse coding and dictionary learning. In this article, we propose a nonlinear, non-negative sparse representation model: NNK-KSVD. In the sparse coding stage, a non-linear update rule is proposed to obtain the sparse matrix. In the dictionary learning phase, the proposed model extends kernel KSVD by embedding non-negative sparse coding. The proposed non-negative kernel sparse representation model was evaluated on several publicly available image datasets for the classification task. Experimental results show that by exploiting the nonlinear structure in images and the "additive" nature of non-negative sparse coding, promising classification performance can be achieved. In addition, the proposed sparse representation method was also evaluated on image retrieval tasks, yielding competitive results.

    • research article

      Neurocomputing, Volume 269, 2017, pp. 212-219

      Recently, question and answer (Q&A) forums for software development (e.g., Stack Overflow) have become popular. Identifying the best answer to an asked question is important for a Q&A forum, because the best answer provides an excellent solution to the question and can guide developers in solving their problems in practice. However, best answers are often not explicitly marked by the question owners. For other developers with the same question, it would be time-consuming to review all candidate answers to find the appropriate one. In this paper, we propose a novel approach to predict the best answers to the questions asked on Stack Overflow using heterogeneous data sources in the forum. We extract different groups of features from multiple data sources and combine them for the final prediction through multi-view learning. Experimental results show that the proposed method is effective at identifying the best answers to questions asked on Stack Overflow.

    • research article

      Tunable discounting and visual exploration for language models

      Neurocomputing, Volume 269, 2017, pp. 73-81


      A language model is fundamental to many applications in natural language processing. Most language models are trained on a large dataset and are difficult to adapt to other domains where only a small dataset may be available. Optimizing the discounting parameters used for smoothing is one way to adapt language models to a new domain. In this work, we present novel language models based on tunable discounting mechanisms. The language models are trained on a large dataset, but their discounting parameters can then be tuned on a target dataset. We study tunable discounting and polynomial discounting functions based on the modified Kneser-Ney (mKN) models. In particular, we propose the tunable mKN (TmKN) model, the polynomial-discounting mKN (PmKN) model, and the tunable and polynomial-discounting mKN (TPmKN) model. We test our proposed models and compare them to the mKN model, the improved KN model, and the tunable mKN with interpolation model (mKN+interp). Our language models achieve perplexity improvements in both in-domain and out-of-domain evaluation. Experimental results show that our new models significantly outperform the baseline model and are particularly adept at adapting to new domains. In addition, we use visualization to depict the relationship between parameter settings and language model performance to guide the parameter optimization process. Exploratory visual analysis is then used to examine the performance of the proposed language models, revealing the strengths and characteristics of the models.

    • research article

      Special issue for the International Conference on Intelligence Science and Big Data Engineering 2016

      Neurocomputing, Volume 269, 2017, pp. 117-119

    • research article

      Exemplar-based 3D human pose estimation with sparse spectral embedding

      Neurocomputing, Volume 269, 2017, pp. 82-89

      In exemplar-based approaches, human pose estimation is achieved by retrieving relevant poses with images. Therefore, image description is crucial, and it is common to extract multiple features to better describe the visual input data. To merge multiple features, traditional methods simply concatenate multi-view features into one long vector. This oversimplified process suffers from two shortcomings: (1) it usually results in long feature vectors that suffer from the "curse of dimensionality"; and (2) it makes no physical sense and may not fully exploit the complementary properties of multi-view features. To address these issues, in this article we present a dimensionality reduction method based on sparse spectral embedding, followed by a nearest-neighbor regression ensemble in the low-rank multi-view feature space, to derive 3D human poses from monocular videos. Experiments on the HumanEva dataset show the effectiveness of the proposed method.

    • research article

      Special issue on selected and extended papers of the International Conference on Intelligence Science and Big Data Engineering 2015 (IScIDE 2015)

      Neurocomputing, Volume 269, 2017, pp. 1-2



    Wen Lu received the B.Sc., M.Sc., and Ph.D. degrees in signal and information processing from Xidian University, China, in 2002, 2006, and 2009, respectively. From 2010 to 2012, he was a postdoctoral fellow in the Department of Electrical Engineering at Stanford University, USA. He is currently an associate professor at Xidian University. His research interests include image and video quality metrics, the human visual system, and computational vision. He has published 2 books and around 30 technical articles in peer-reviewed journals and proceedings, including IEEE TIP, TSMC, Neurocomputing, Signal Processing, etc.


    Changcheng Jia received the B.Sc. degree in Electrical and Information Engineering from Xidian University, China, in 2014. He is currently a Ph.D. student at Xidian University. His research interest is image quality assessment.


    And Yours received the B.Sc. and M.Sc. degrees in Electrical and Information Engineering from Xidian University, China, in 2012 and 2015, respectively. His research interests include image and video quality metrics and visual quality assessment.


    © 2017 Elsevier B.V. All rights reserved.
