Sparsity, High-Dimensional Data, Dimension Reduction and the Time Series Setting

17:25, 11 Apr 2024

Event: Sparsity, High-Dimensional Data, Dimension Reduction and the Time Series Setting
Date: Wednesday 17 April 2024
Time: 12:00 - 13:00
Venue: VC101, IT Services Building, University of Malta, Msida Campus or Online (via Zoom)

Speaker: Dr David Suda - Department of Statistics & Operations Research, Faculty of Science, University of Malta

For the benefit of audiences who may be unfamiliar with the concepts, terms such as sparsity, high-dimensional data and dimension reduction will be introduced.

Sparsity refers to the statistical practice of reducing the number of non-zero parameters in a model. It is a form of variable selection that is commonly achieved by adding a penalty to the objective function of an estimation problem.
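As a rough illustration of how a penalty produces sparsity, the sketch below fits a lasso regression (least squares plus an L1 penalty) with a simple proximal-gradient loop in NumPy. The simulated data, the penalty weight `lam` and the function names are assumptions made purely for this example; they are not taken from the talk.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink each entry of z towards zero by t; entries smaller than t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1 by proximal gradient descent (ISTA)."""
    n, p = X.shape
    # Step size = 1 / Lipschitz constant of the gradient of the least-squares term.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Simulated data (illustrative only): 20 variables, of which only 3 matter.
rng = np.random.default_rng(0)
n, p = 50, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -3.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # unpenalised fit
beta_lasso = lasso_ista(X, y, lam=0.2)            # penalised fit

print("non-zero OLS coefficients:  ", np.count_nonzero(beta_ols))
print("non-zero lasso coefficients:", np.count_nonzero(beta_lasso))
```

The soft-thresholding step is what sets small coefficients exactly to zero, which is why the penalised fit doubles as a variable selection device.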

High dimensionality refers to datasets where the number of variables is close to or larger than the sample size. Fields in which this can occur include econometrics, imaging (e.g. fMRI data) and genomics.

Dimension reduction is a more commonly known concept: it is the practice of representing multivariate data in fewer dimensions, and in classical statistics it is most commonly achieved via principal components analysis (PCA). We start with a brief overview of how sparsity is treated in classical statistical models, particularly with a view to handling variable selection in the high-dimensional context.

The main focus of this talk will be dynamic principal components analysis (DPCA), an extension of PCA to the time series setting. PCA is a popular dimension reduction technique that needs no introduction for many practitioners; its lesser-known relative, DPCA, handles the time dependence and/or short-term correlation that PCA does not cater for. Brillinger's frequency-domain approach is the earliest such approach and is aimed at the single-realisation setting. In the last decade, time-domain approaches have also emerged, addressing both the single-realisation and multiple-realisation settings. Some sparsity extensions for the high-dimensional data setting have also been introduced. Peer-reviewed literature addressing high dimensionality in the frequency-domain setting, however, is still lacking.

The frequency-domain approach to principal components essentially replicates the classical approach, but operates on the cross-spectra (the spectral density matrix at each frequency) instead of the covariance matrix. Alternatively, one can decompose the Fourier transforms of the data matrix instead of the data matrix itself.
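To make the idea of "PCA on cross-spectra" concrete, here is a minimal NumPy sketch. It estimates the spectral density matrix at each Fourier frequency with a smoothed periodogram and then eigen-decomposes it frequency by frequency, just as classical PCA eigen-decomposes the covariance matrix. The estimator, the `bandwidth` value, the function names and the simulated one-factor series are all assumptions chosen for illustration, not the speaker's implementation.

```python
import numpy as np

def spectral_density(X, bandwidth=5):
    """Smoothed-periodogram estimate of the spectral density matrix.

    X has shape (T, p). Returns a (T, p, p) complex array holding one
    Hermitian matrix per Fourier frequency 2*pi*k/T, k = 0, ..., T-1.
    """
    T, p = X.shape
    Xc = X - X.mean(axis=0)
    D = np.fft.fft(Xc, axis=0)                                   # (T, p)
    periodogram = np.einsum("tj,tk->tjk", D, D.conj()) / (2 * np.pi * T)
    # Circular moving average over neighbouring frequencies.
    F = np.zeros_like(periodogram)
    for h in range(-bandwidth, bandwidth + 1):
        F += np.roll(periodogram, h, axis=0)
    return F / (2 * bandwidth + 1)

def frequency_domain_pca(X, bandwidth=5):
    """Eigen-decompose the estimated spectral density at every frequency."""
    F = spectral_density(X, bandwidth)
    eigvals, eigvecs = np.linalg.eigh(F)          # batched; ascending order
    return eigvals[:, ::-1], eigvecs[:, :, ::-1]  # reorder to descending

# Simulated example: four series driven by one AR(1) factor, one of them with a lag.
rng = np.random.default_rng(1)
T, p = 512, 4
f = np.zeros(T + 1)
for t in range(1, T + 1):
    f[t] = 0.7 * f[t - 1] + rng.standard_normal()
X = np.column_stack([f[1:], 0.8 * f[:-1], -0.6 * f[1:], 0.5 * f[1:]])
X += 0.5 * rng.standard_normal((T, p))

eigvals, eigvecs = frequency_domain_pca(X)
share = eigvals[:, 0].sum() / eigvals.sum()
print("share of spectral mass captured by the first dynamic component:", round(share, 3))
```

Because one series loads on a lagged copy of the factor, the cross-spectra are genuinely complex-valued, which is the kind of lead-lag structure that a static covariance matrix does not capture.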

From the frequency-domain setting, the loadings in the time domain can then be recovered through the inverse Fourier transform, and the principal components through the dynamic Karhunen-Loève expansion. Our current research aims to address the gap in the academic literature concerning high dimensionality in frequency-domain principal components.
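For readers who prefer formulas, the steps just described can be written out as follows. This is a sketch under one common set of conventions (a mean-zero stationary series, unit-norm eigenvectors, and the sign of the exponent shown); signs and normalisations vary between references, and the notation is an assumption made here rather than the speaker's own.

```latex
% Assumed notation: X_t is a mean-zero stationary p-variate series,
% C_h its lag-h autocovariance matrix, F(\theta) its spectral density matrix.
\begin{align*}
F(\theta) &= \frac{1}{2\pi}\sum_{h\in\mathbb{Z}} C_h\, e^{-\mathrm{i}h\theta},
  \qquad C_h = \operatorname{E}\!\left[X_{t+h}X_t^{\top}\right], \\
F(\theta)\,\varphi_m(\theta) &= \lambda_m(\theta)\,\varphi_m(\theta),
  \qquad \lambda_1(\theta)\ge\dots\ge\lambda_p(\theta)\ge 0, \\
\varphi_{ml} &= \frac{1}{2\pi}\int_{-\pi}^{\pi}\varphi_m(\theta)\,e^{-\mathrm{i}l\theta}\,d\theta
  && \text{(time-domain loadings via the inverse Fourier transform)}, \\
Y_{mt} &= \sum_{l\in\mathbb{Z}} \varphi_{ml}^{*}\,X_{t-l}
  && \text{($m$-th dynamic principal component)}, \\
X_t &\approx \sum_{m=1}^{q}\sum_{l\in\mathbb{Z}} \varphi_{ml}\,Y_{m,t+l}
  && \text{(dynamic Karhunen-Lo\`eve expansion, truncated at $q\le p$)}.
\end{align*}
```

Under these conventions the spectral density of the m-th component is the m-th eigenvalue of F at each frequency, mirroring the way a static principal component has variance equal to the corresponding eigenvalue of the covariance matrix.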

Taking our cue from the extensive literature on sparse PCA, we find that many techniques for addressing sparsity in the static sparse PCA setting can be extended to the frequency-domain approach. A way forward is devised regarding its practical implementation, with the long-term aim of also applying these techniques in real applications.

Registration is available online.

Please note that in-person participation is encouraged.