Statistics and econometrics

Confidence sets for model selection

Description: 

At first glance, the goals of model selection might seem clear: out of a set of possible models, we want to select the "best" model or a subset of "best" models. This notion of "best", however, is not well defined, since it obviously depends on the initial goals of the selection. In order to study the uncertainty in model selection, we introduce a new definition of a model, where models are no longer defined through zero and non-zero components but through irrelevant and relevant components. Then, inspired by confidence intervals for estimated parameters, we propose a method to build confidence sets for model selection in a parametric setting, i.e. sets of models within which the true model is included with a certain confidence. This allows us to perform inference on model selection. We discuss the computational challenges of such a method, how to find p-values (for the model), and consistency in model selection, and we show the implications of this new method through a data set and a simulation study.
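
As a loose illustration of the idea (not the construction developed in the thesis), the sketch below enumerates all submodels of a small linear regression, tests each candidate against the full model with an F-test, and retains every model that is not rejected at level alpha; the retained collection plays the role of a confidence set of models. The data-generating process and tuning choices are assumptions made for the example.

```python
# Hypothetical sketch (not the thesis's construction): keep every candidate
# submodel that an F-test cannot reject against the full model at level alpha.
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha = 200, 0.05
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] + 1.5 * X[:, 2] + rng.normal(size=n)   # X[:, 1] is irrelevant

def ols_rss(Xs, y):
    """Residual sum of squares and number of parameters of an OLS fit with intercept."""
    Z = np.column_stack([np.ones(len(y)), Xs]) if Xs.size else np.ones((len(y), 1))
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.sum((y - Z @ beta) ** 2), Z.shape[1]

rss_full, p_full = ols_rss(X, y)
confidence_set = []
for k in range(X.shape[1] + 1):
    for subset in itertools.combinations(range(X.shape[1]), k):
        rss_sub, p_sub = ols_rss(X[:, list(subset)], y)
        if p_sub == p_full:
            confidence_set.append(subset)   # the full model is trivially retained
            continue
        f = ((rss_sub - rss_full) / (p_full - p_sub)) / (rss_full / (n - p_full))
        if stats.f.sf(f, p_full - p_sub, n - p_full) >= alpha:
            confidence_set.append(subset)   # keep every model that is not rejected

print("models retained in the confidence set:", confidence_set)
```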

Robust penalized M-estimators for generalized linear and additive models

Description: 

Generalized linear models (GLM) and generalized additive models (GAM) are popular statistical methods for modelling continuous and discrete data both parametrically and nonparametrically. In this general framework, we consider the problem of variable selection by studying a wide class of penalized M-estimators that are particularly well suited for high-dimensional scenarios where the number of covariates $p$ is very large relative to the sample size $n$. We focus on resistance issues in the presence of deviations from the stochastic assumptions of the postulated models and highlight the weaknesses of widely used estimators. We advocate the need for robust estimators and propose several penalized quasi-likelihood estimators that achieve both good statistical properties at the assumed model and stability in a neighborhood of it. Specifically, we provide careful asymptotic analyses of our robust estimators for GLM and GAM when the number of parameters increases with the sample size. We start by revisiting the asymptotics of M-estimators for GLM with a diverging number of parameters. We establish asymptotic normality of these estimators and reexamine distributional results for likelihood-ratio-type and Wald-type tests based on them. We then consider penalized M-estimators for high-dimensional settings where $p \gg n$. In the GLM setting we show that our estimators are consistent, asymptotically normally distributed and variable-selection consistent under regularity conditions. Furthermore, they have a bounded bias in a neighborhood of the model. In the GAM setting we establish an $\ell_2$-norm consistency result for the nonparametric components which achieves the optimal rates of convergence. In addition, the proposed penalized estimator is able to select the correct model consistently. We propose new algorithms for the implementation of our penalized M-estimators and illustrate the finite sample performance of our methods, at the model and under contamination, in simulation studies. An important contribution of this thesis is to formally study the local robustness properties of general nondifferentiable penalized M-estimators. In particular, we propose a framework that allows us to define rigorously the influence function as the limiting influence function of a sequence of approximating functionals. We show that this influence function can be used to characterize the robustness properties of a wide range of sparse estimators and that it can be viewed as a derivative in the sense of distribution theory. At the end of this thesis, we discuss some extensions of our work and give an overview of the future challenges of robust statistics in high dimensions.
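
The following toy sketch illustrates one building block only: a sparse penalized M-estimator, here a Huber loss with an L1 penalty fitted by proximal gradient (ISTA) in a $p \gg n$ linear model. It is a minimal stand-in under illustrative assumptions, not the robust penalized quasi-likelihood estimators for GLM and GAM studied in the thesis.

```python
# Minimal sketch of a sparse penalized M-estimator: Huber loss + L1 penalty fitted
# by proximal gradient (ISTA) in a p > n linear model.  This is a toy stand-in, not
# the robust penalized quasi-likelihood estimators for GLM/GAM studied in the thesis.
import numpy as np

def psi_huber(r, c=1.345):
    """Derivative of the Huber loss: identity in the centre, clipped in the tails."""
    return np.clip(r, -c, c)

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def robust_lasso(X, y, lam, n_iter=500):
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2        # 1 / Lipschitz constant of the smooth part
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ psi_huber(y - X @ beta) / n
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.default_rng(1)
n, p = 100, 200                                   # p >> n, sparse truth
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.standard_t(df=2, size=n)  # heavy-tailed errors
print("selected coefficients:", np.flatnonzero(robust_lasso(X, y, lam=0.3)))
```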

Time-frequency Granger causality with application to nonstationary brain signals

Description: 

This PhD thesis concerns the modelling of time-varying causal relationships between two signals, with a focus on signals measuring neural activities. The ability to compute a dynamic and frequency-specific causality statistic in this context is essential, and Granger causality provides a natural statistical tool. In Chapter 1 we propose a review of the existing methods that allow one to measure time-varying frequency-specific Granger causality and discuss their advantages and drawbacks. Based on this review, we propose in Chapter 2 an estimator of a linear Gaussian vector autoregressive model with coefficients evolving over time. The estimation procedure is achieved through variational Bayesian approximation, and the model provides a quite natural dynamical Granger-causality statistic. We propose an extension to the à trous Haar decomposition that allows us to derive the desired dynamical and frequency-specific Granger-causality statistic. In Chapter 3 we propose an application of the model to real experimental data.
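
For orientation, here is a minimal, purely time-domain Granger-causality check: an autoregression of y on its own past is compared, via an F-test, with one that also includes the past of x. The thesis's time-varying coefficients, variational Bayesian estimation and frequency-specific statistic are not reproduced in this sketch; the simulated signals are only placeholders.

```python
# Hypothetical time-domain sketch of Granger causality (x -> y): compare an
# autoregression of y on its own past with one that also includes the past of x.
import numpy as np
from scipy import stats

def lagged(z, lags):
    """Columns are z lagged by 1..lags, rows aligned with time index lags..T-1."""
    return np.column_stack([z[lags - k:len(z) - k] for k in range(1, lags + 1)])

def granger_f_test(x, y, lags=2):
    target = y[lags:]
    n = len(target)
    Z_r = np.column_stack([np.ones(n), lagged(y, lags)])                   # restricted
    Z_f = np.column_stack([np.ones(n), lagged(y, lags), lagged(x, lags)])  # full

    def rss(Z):
        beta, *_ = np.linalg.lstsq(Z, target, rcond=None)
        return np.sum((target - Z @ beta) ** 2)

    rss_r, rss_f = rss(Z_r), rss(Z_f)
    f = ((rss_r - rss_f) / lags) / (rss_f / (n - Z_f.shape[1]))
    return f, stats.f.sf(f, lags, n - Z_f.shape[1])

rng = np.random.default_rng(2)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.4 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()   # y is driven by past x
print("F statistic and p-value:", granger_f_test(x, y, lags=2))
```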

Simulation based bias correction methods for complex problems

Description: 

Nowadays, the increase in data size and model complexity has led to increasingly difficult estimation problems. The numerical aspects of the estimation procedure can indeed be very challenging. To solve these estimation problems, approximate methods such as pseudo-likelihood functions or approximate estimating equations can be used, as these methods are typically easier to implement numerically, although they can lead to inconsistent and/or biased estimators. In this thesis, we propose a unified framework to compare four existing bias reduction estimators, two of which are based on indirect inference and two on the bootstrap. We derive the asymptotic and finite sample properties of these bias correction methods. We demonstrate the equivalence between one version of indirect inference and the iterative bootstrap, both of which correct sample biases up to the order $n^{-3}$. Therefore, our results provide different tools to correct the asymptotic as well as finite sample biases of estimators and give insight as to which method should be applied according to the problem at hand. We then apply these bias reduction techniques to robust estimation of income distributions. We use a very simple starting estimator, which is known to be robust but not consistent, and correct its bias with indirect inference. This is a very general way to construct robust estimators for complex models. A second illustration is provided by the estimation of Generalized Linear Latent Variable Models. We are able to compute unbiased estimates for these very complex models, which have a large number of parameters, without employing numerical integration techniques. As a by-product, bias reduction techniques allow the computation of a goodness-of-fit test statistic for latent variable models.
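
A toy sketch of the iterative bootstrap idea discussed above: starting from a simple but biased estimator, the estimate is repeatedly shifted by the gap between the original estimate and the average of the same estimator computed on data simulated at the current value. The model used here (a normal sample with the n-divisor variance MLE) is an illustrative assumption, not one of the applications treated in the thesis.

```python
# Toy sketch of the iterative bootstrap: shift the estimate by the gap between the
# original (biased) estimate and the average of the estimator on simulated data.
import numpy as np

rng = np.random.default_rng(3)

def pi_hat(sample):
    """Deliberately biased 'simple' estimator: variance MLE (divides by n)."""
    return np.var(sample)

def iterative_bootstrap(sample, simulate, n_sim=200, n_iter=20):
    theta_obs = pi_hat(sample)
    theta = theta_obs
    for _ in range(n_iter):
        sims = np.array([pi_hat(simulate(theta, len(sample))) for _ in range(n_sim)])
        theta = theta + (theta_obs - sims.mean())   # move against the estimated bias
    return theta

simulate = lambda var, size: rng.normal(scale=np.sqrt(var), size=size)
data = rng.normal(scale=2.0, size=30)               # true variance = 4
print("biased estimate     :", round(pi_hat(data), 3))
print("iterative bootstrap :", round(iterative_bootstrap(data, simulate), 3))
print("unbiased benchmark  :", round(np.var(data, ddof=1), 3))
```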

Robust methods for personal income distribution models

Description: 

In the present thesis, robust statistical techniques are applied and developed for the economic problem of the analysis of personal income distributions and inequality measures. We follow the approach based on influence functions in order to develop robust estimators for the parametric models describing personal income distributions when the data are censored and when they are grouped. We also build a robust procedure for a test of choice between two models and analyse the robustness properties of goodness-of-fit tests. The link between economic and robustness properties is studied through the analysis of inequality measures. We begin our discussion by presenting the economic framework from which the statistical developments are made, namely the study of the personal income distribution and inequality measures. We then discuss the robust concepts that serve as a basis for the following steps and compute optimal bounded-influence estimators for different personal income distribution models when the data are continuous and complete. In a third step, we study the case of censored data and propose a generalization of the EM algorithm with robust estimators. For grouped data, Hampel's theorem is extended in order to build optimal bounded-influence estimators. We then focus on tests for model choice and develop a robust generalized Cox-type statistic. We also analyse the robustness properties of a wide class of goodness-of-fit statistics by computing their level influence functions. Finally, we study the robustness properties of inequality measures and relate our findings to some economic properties these measures should fulfil. Our motivation for the development of these new robust procedures comes from our interest in the field of income distribution and inequality measurement. However, it should be stressed that the new estimators and test procedures we propose do not only apply in this particular field; they can be used in or extended to any parametric problem in which density estimation, incomplete information, grouped or discrete data, model choice, goodness-of-fit, or concentration indices are among the key concerns.
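
As a hedged illustration of why robustness matters for income data (this is not the optimal bounded-influence estimator developed in the thesis), the snippet below contrasts the MLE of a Pareto tail index with a simple median-based estimator when a handful of gross errors contaminate the sample; the data and contamination scheme are invented for the example.

```python
# Hedged illustration of the robustness issue for income data (not the optimal
# bounded-influence estimators developed in the thesis): MLE of a Pareto tail index
# versus a simple median-based estimator when 1% of the incomes are gross errors.
import numpy as np

rng = np.random.default_rng(4)
x_m, alpha_true, n = 1.0, 2.5, 1000
incomes = x_m * (1 - rng.uniform(size=n)) ** (-1 / alpha_true)   # Pareto(x_m, alpha) draws
contaminated = incomes.copy()
contaminated[:10] *= 100.0                                        # a few huge recording errors

def alpha_mle(x):
    return len(x) / np.sum(np.log(x / x_m))

def alpha_median(x):
    # log(X / x_m) is Exponential(alpha), whose median is log(2) / alpha
    return np.log(2) / np.median(np.log(x / x_m))

for name, est in [("MLE         ", alpha_mle), ("median-based", alpha_median)]:
    print(name, "clean:", round(est(incomes), 3), " contaminated:", round(est(contaminated), 3))
```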

Options pricing with realized volatility

Description: 

We develop a discrete-time stochastic volatility option pricing model, which exploits the information contained in high-frequency data. The Realized Volatility (RV) is used as a proxy of the unobservable log-returns volatility. We model its dynamics by a simple but effective (pseudo) long memory process, the Heterogeneous Auto-Regressive Gamma with Leverage (HARGL) process. Both the discrete-time specification and the use of the RV allow us to easily estimate the model using observed historical data. Assuming a standard, exponentially affine stochastic discount factor, we obtain a fully analytic change of measure. An extensive empirical analysis of S&P 500 index options illustrates that our approach significantly outperforms competing time-varying (i.e. GARCH-type) and stochastic volatility pricing models. The pricing improvement can be ascribed to: (i) the direct use of the RV, which provides a precise and fast-adapting measure of the unobserved underlying volatility; and (ii) the specification of our model, which, on the one hand, is able to accurately reproduce the volatility persistence and, on the other hand, provides the necessary smoothing of the noise present in the RV dynamics.
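
A minimal sketch of the heterogeneous autoregressive (HAR) structure that underlies the HARGL process: realized volatility is regressed on daily, weekly (5-day) and monthly (22-day) averages of its own past. The gamma conditional distribution, the leverage component and the option-pricing change of measure are not reproduced here, and the simulated series only stands in for actual RV data.

```python
# Minimal sketch of the heterogeneous autoregressive (HAR) structure behind the
# HARGL process: regress realized volatility on daily, weekly and monthly averages
# of its own past.  Gamma innovations, leverage and the pricing step are omitted.
import numpy as np

def har_design(rv):
    rows, targets = [], []
    for t in range(22, len(rv)):
        rows.append([1.0, rv[t - 1], rv[t - 5:t].mean(), rv[t - 22:t].mean()])
        targets.append(rv[t])
    return np.array(rows), np.array(targets)

rng = np.random.default_rng(5)
T = 2000
rv = np.empty(T)
rv[:22] = 1.0
for t in range(22, T):                          # persistent, positive toy series
    rv[t] = max(0.05 + 0.4 * rv[t - 1] + 0.3 * rv[t - 5:t].mean()
                + 0.2 * rv[t - 22:t].mean() + 0.1 * rng.normal(), 1e-3)

X, y = har_design(rv)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept, daily, weekly, monthly:", np.round(coef, 3))
```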

Higher-order robustness

Description: 

Higher-order robustness for M-estimators is introduced and defined. The conditions needed to ensure higher stability of the asymptotic bias are provided by refining the von Mises bias expansion. Admissible M-estimators featuring second-order robustness are thus introduced. Then, a saddle-point argument is applied in order to approximate the finite sample distribution of second-order robust M-estimators. The link between the stability of this approximation and second-order robustness is explored. Monte Carlo simulation provides evidence that second-order robust M-estimators perform better than the MLE and Huber-type estimators, even in moderate to small sample sizes and/or for large amounts of contamination.
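
The second-order robust estimators themselves are not coded here; the small Monte Carlo below is only in the spirit of the comparison mentioned above, contrasting the sample mean (the MLE under normality) with a first-order robust Huber location M-estimator under 10% contamination. The contamination scheme and constants are illustrative assumptions.

```python
# Small Monte Carlo in the spirit of the comparison above: sample mean (the MLE
# under normality) versus a Huber location M-estimator under 10% contamination.
import numpy as np

def huber_location(x, c=1.345, n_iter=50):
    """IRLS for the Huber location M-estimator, scale fixed at the MAD."""
    mu = np.median(x)
    scale = np.median(np.abs(x - mu)) / 0.6745
    for _ in range(n_iter):
        r = (x - mu) / scale
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))   # Huber weights
        mu = np.sum(w * x) / np.sum(w)
    return mu

rng = np.random.default_rng(6)
n, eps, reps = 50, 0.10, 2000
means, hubers = [], []
for _ in range(reps):
    clean = rng.normal(0.0, 1.0, size=n)
    gross = rng.normal(5.0, 1.0, size=n)
    x = np.where(rng.uniform(size=n) < eps, gross, clean)       # contaminated sample
    means.append(x.mean())
    hubers.append(huber_location(x))

print("average bias of the mean :", round(np.mean(means), 3))
print("average bias of Huber    :", round(np.mean(hubers), 3))
```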

Goodness-of-fit for Generalized Linear Latent Variables Models

Description: 

Generalized Linear Latent Variables Models (GLLVM) enable the modeling of relationships between manifest and latent variables, where the manifest variables are distributed according to a distribution from the exponential family (e.g. binomial or normal) or to the multinomial distribution (for ordinal manifest variables). These models are widely used in social sciences. To test the appropriateness of a particular model, one needs to define a goodness-of-fit test statistic (GFI). In the normal case, one can use a likelihood ratio test or the modified version proposed by Satorra and Bentler (2001) (the S&B GFI), which compares the sample covariance matrix to the covariance matrix estimated under the model. In the binary case, Pearson-type test statistics can be used if the number of observations is sufficiently large. In the other cases, including the case of mixed types of manifest variables, there exist GFIs based on a comparison between a pseudo sample covariance and the model covariance of the manifest variables. These GFIs rely on latent variable models that suppose that the manifest variables are themselves induced by underlying normal variables (underlying variable approach); the pseudo sample covariance matrices are then made of polychoric, tetrachoric or polyserial correlations. In this article, we propose an alternative GFI that is more generally applicable. It is based on a distance comparison between the latent scores and the original data. This GFI takes into account the nature of each manifest variable and can in principle be applied in various situations, in particular to models with ordinal manifest variables as well as with both discrete and continuous ones.

Wavelet-Variance-Based Estimation for Composite Stochastic Processes

Description: 

This article presents a new estimation method for the parameters of a time series model. We consider here composite Gaussian processes that are the sum of independent Gaussian processes which, in turn, explain an important aspect of the time series, as is the case in engineering and natural sciences. The proposed estimation method offers an alternative to classical likelihood-based estimation that is straightforward to implement and often the only feasible estimation method with complex models. The estimator is obtained by optimizing a criterion based on a standardized distance between the sample wavelet variance (WV) estimates and the model-based WV. Indeed, the WV provides a decomposition of the process variance across different scales, so that it contains information about different features of the stochastic model. We derive the asymptotic properties of the proposed estimator for inference and perform a simulation study to compare our estimator to the MLE and the LSE with different models. We also set sufficient conditions on composite models for our estimator to be consistent, conditions that are easy to verify. We use the new estimator to estimate the parameters of the stochastic error of a sum of three first-order Gauss–Markov processes by means of a sample of over 800,000 observations issued from gyroscopes that compose inertial navigation systems.
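
A hedged sketch of the matching idea for a much simpler two-component composite model (white noise plus random walk): compute the sample Haar wavelet variances, approximate the model-implied wavelet variances by simulation, and pick the parameters minimizing the distance between the two. The closed-form model WV and the standardized (asymptotically weighted) distance used in the article are replaced here by simulation and a plain Euclidean distance, and the data are synthetic rather than gyroscope measurements.

```python
# Hedged sketch of wavelet-variance matching for a simple composite model (white
# noise + random walk); the article's closed-form model WV and weighted criterion
# are replaced by a simulation approximation and a plain Euclidean distance.
import numpy as np

def haar_wv(x, scales):
    """Overlapping Haar wavelet variance of x at the given dyadic scales."""
    wv = []
    for tau in scales:
        h = np.concatenate([np.full(tau // 2, 1.0), np.full(tau // 2, -1.0)]) / tau
        wv.append(np.mean(np.convolve(x, h, mode="valid") ** 2))
    return np.array(wv)

def simulate_composite(sigma, gamma, n, rng):
    """White noise (sd sigma) plus a random walk with innovation sd gamma."""
    return sigma * rng.normal(size=n) + gamma * np.cumsum(rng.normal(size=n))

rng = np.random.default_rng(7)
scales = (2, 4, 8, 16, 32, 64, 128, 256)
n = 5000
data = simulate_composite(1.0, 0.05, n, rng)       # stands in for an observed series
nu_hat = haar_wv(data, scales)

def model_wv(sigma, gamma, n_rep=5):
    """Monte Carlo approximation of the model-implied wavelet variances."""
    return np.mean([haar_wv(simulate_composite(sigma, gamma, n, rng), scales)
                    for _ in range(n_rep)], axis=0)

best = min(((np.sum((nu_hat - model_wv(s, g)) ** 2), s, g)
            for s in np.linspace(0.5, 1.5, 11)
            for g in np.linspace(0.01, 0.10, 10)),
           key=lambda t: t[0])
print("estimated (sigma, gamma):", (round(best[1], 2), round(best[2], 2)), "  true: (1.0, 0.05)")
```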

Degrees-of-freedom tests for smoothing splines

Description: 

When using smoothing splines to estimate a function, the user faces the problem of choosing the smoothing parameter. Several techniques are available for selecting this parameter according to certain optimality criteria. Here, we take a different point of view and we propose a technique for choosing between two alternatives, for example allowing for two different levels of degrees of freedom. The problem is addressed in the framework of a mixed-effects model, whose assumptions ensure that the resulting estimator is unbiased. A likelihood-ratio-type test statistic is proposed, and its exact distribution is derived. Tests of linearity and overall effect follow directly. We then extend this idea to additive models where it provides a more attractive alternative than multi-parameter optimisation, and where it gives exact distributional results that can be used in an analysis-of-deviance-type approach. Examples on real data and a simulation study of level and power complete the paper.
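
The exact distributional results are the contribution of the paper and are not reproduced here; the sketch below conveys only the flavour of the test, using a discrete (Whittaker-type) penalized smoother on equally spaced points as a stand-in for a smoothing spline, a ratio of residual sums of squares at two degrees-of-freedom levels as the statistic, and a parametric bootstrap under the smoother fit to calibrate it.

```python
# Sketch only: a discrete penalized smoother stands in for a smoothing spline, the
# statistic is a ratio of residual sums of squares at two degrees-of-freedom levels,
# and its null distribution is obtained by parametric bootstrap, not exact theory.
import numpy as np

def smoother_matrix(n, lam):
    """Hat matrix of the penalized least-squares smoother (I + lam * D'D)^{-1}."""
    D = np.diff(np.eye(n), n=2, axis=0)             # second-difference penalty
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

def lambda_for_df(n, target_df):
    """Bisection on log-lambda so that the trace of the hat matrix equals target_df."""
    lo, hi = -10.0, 15.0
    for _ in range(60):
        mid = (lo + hi) / 2
        df = np.trace(smoother_matrix(n, np.exp(mid)))
        lo, hi = (lo, mid) if df < target_df else (mid, hi)
    return np.exp((lo + hi) / 2)

rng = np.random.default_rng(8)
n = 100
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

df0, df1 = 4, 10                                    # the two candidate smoothness levels
S0 = smoother_matrix(n, lambda_for_df(n, df0))
S1 = smoother_matrix(n, lambda_for_df(n, df1))
fit0 = S0 @ y
rss0, rss1 = np.sum((y - fit0) ** 2), np.sum((y - S1 @ y) ** 2)
stat_obs = (rss0 - rss1) / rss1

sigma = np.sqrt(rss0 / (n - df0))                   # crude scale estimate under the null
stats_null = []
for _ in range(500):
    y_star = fit0 + rng.normal(scale=sigma, size=n) # simulate under the smoother fit
    r0, r1 = np.sum((y_star - S0 @ y_star) ** 2), np.sum((y_star - S1 @ y_star) ** 2)
    stats_null.append((r0 - r1) / r1)
print("bootstrap p-value for needing df =", df1, ":", np.mean(np.array(stats_null) >= stat_obs))
```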
