
Sufficient Dimension Reduction and Prediction in Regression

By Kofi P. Adragni and R. Dennis Cook

University of Minnesota, 313 Ford Hall, 224 Church Street S.E., Minneapolis, MN 55455, USA

Dimension reduction for regression is a prominent issue today because technological advances now allow scientists to routinely formulate regressions in which the number of predictors is considerably larger than in the past. While several methods have been proposed to deal with such regressions, principal components still seem to be the most widely used across the applied sciences. We give a broad overview of ideas underlying a particular class of methods for dimension reduction that includes principal components, along with an introduction to the corresponding methodology. New methods are proposed for prediction in regressions with many predictors.

Keywords: Lasso, Partial least squares, Principal components, Principal component regression, Principal fitted components

1. Introduction

Consider the frequently encountered goal of determining a rule m(x) for predicting a future observation of a univariate response variable Y at the given value x of a p × 1 vector X of continuous predictors. Assuming that Y is quantitative, continuous or discrete, the mean squared error $E\{(Y - m(x))^2 \mid X = x\}$ is minimized by choosing m(x) to be the mean E(Y|X = x) of the conditional distribution of Y|(X = x). Consequently, the prediction goal is often specialized immediately to the task of estimating the conditional mean function E(Y|X) from the regression of Y on X. When the response is categorical with sample space $S_Y = \{C_1, \ldots, C_h\}$ consisting of h categories, the mean function is no longer a relevant quantity for prediction. Instead, given an observation x on X, the predicted category $C^*$ is usually taken to be the one with the largest conditional probability, $C^* = \arg\max_{C_k \in S_Y} \Pr(C_k \mid X = x)$.

When pursuing estimation of E(Y|X) or $\Pr(C_k \mid X)$, it is nearly always worthwhile to consider predictions based on a function R(X) of dimension less than p, provided that it captures all of the information that X contains about Y, so that E(Y|X) = E{Y|R(X)}. We can think of R(X) as a function that concentrates the relevant information in X. The action of replacing X with a lower-dimensional function R(X) is called dimension reduction; it is called sufficient dimension reduction when R(X) retains all the relevant information about Y. A potential advantage of sufficient dimension reduction is that predictions based on an estimated R may be substantially less variable than those based on X, without introducing worrisome bias. This advantage is not confined to predictions, but may accrue in other phases of a regression analysis as well.
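As a small sketch of the two prediction rules just described, the code below uses a made-up discrete conditional distribution for Y|(X = x) and made-up class probabilities (all numbers and the helper name `mse` are illustrative assumptions). It checks on a grid of candidate predictions that the conditional mean minimizes the conditional mean squared error, and it picks the category with the largest conditional probability.

```python
from fractions import Fraction

# Hypothetical conditional distribution of Y given X = x: support points and probabilities.
ys = [0, 1, 4]
ps = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

def mse(m):
    """Conditional mean squared error E{(Y - m)^2 | X = x} for a candidate prediction m."""
    return sum(p * (y - m) ** 2 for y, p in zip(ys, ps))

cond_mean = sum(p * y for y, p in zip(ys, ps))      # E(Y | X = x)

# Search a grid of candidate predictions in [0, 4] with step 1/4.
candidates = [Fraction(k, 4) for k in range(17)]
best = min(candidates, key=mse)

# Categorical response: choose the class with the largest Pr(C_k | X = x).
class_probs = {"C1": 0.2, "C2": 0.5, "C3": 0.3}      # hypothetical values
predicted = max(class_probs, key=class_probs.get)

print(cond_mean, best, mse(best), predicted)        # 5/4 5/4 43/16 C2
```

Exact rational arithmetic is used only so that the minimizer can be compared with the conditional mean without rounding error.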

One goal of this article is to give a broad overview of ideas underlying sufficient dimension reduction for regression, along with an introduction to the corresponding methodology. Sections 1a, 1b, 2 and 3 are devoted largely to this review. Sufficient dimension reduction methods are designed to estimate a population parameter called the central subspace, which is defined in §1b. Another goal of this article is to describe a new method of predicting quantitative responses following sufficient dimension reduction; categorical responses will be discussed only for contrast. The focus of this article shifts to prediction in §4, where we discuss four inverse regression models, describe the prediction methodology that stems from them, and give simulation results to illustrate their behaviour. Practical implementation issues are discussed in §5, along with additional simulation results.

(a) Dimension reduction

There are many methods available for estimating E(Y|X) based on a random sample $(Y_i, X_i)$, i = 1, ..., n, from the joint distribution of Y and X. If p is sufficiently small and n is sufficiently large, it may be possible to estimate E(Y|X) adequately by using nonparametric smoothing (see, for example, Wand & Jones 1995). Otherwise, nearly all techniques for estimating E(Y|X) employ some type of dimension reduction for X, either estimated or imposed as an intrinsic part of the model or method.
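When p is small, a kernel smoother is one way to carry out the nonparametric estimation of E(Y|X) mentioned above. The sketch below is a Nadaraya-Watson estimator with a Gaussian kernel for p = 1; the kernel, the bandwidth and the noiseless data are illustrative choices, not taken from the text.

```python
import math

def nadaraya_watson(x0, xs, ys, bandwidth):
    """Gaussian-kernel weighted average of ys: an estimate of E(Y | X = x0)."""
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical noiseless data on an evenly spaced grid, with E(Y | X = x) = sin(x).
xs = [i / 10 for i in range(-30, 31)]
ys = [math.sin(x) for x in xs]

estimate = nadaraya_watson(1.0, xs, ys, bandwidth=0.1)
print(round(estimate, 3), round(math.sin(1.0), 3))
```

The estimate is close to, but slightly below, sin(1): even without noise, local averaging introduces a small bias that grows with the bandwidth. With p predictors the same idea needs local neighbourhoods in p dimensions, which is where the data requirements grow quickly.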

Broadly viewed, dimension reduction has always been a central statistical concept. In the second half of the nineteenth century ‘reduction of observations’ was widely recognized as a core goal of statistical methodology, and principal components was emerging as a general method for the reduction of multivariate observations (Adcock 1878). Principal components was established as a first reductive method for regression by the mid 1900s.
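Because principal components recur throughout this article, here is a minimal reminder of what the reduction computes: the first principal component is the leading eigenvector of the sample covariance matrix. The sketch finds it for a small hypothetical bivariate sample by power iteration; the data and iteration count are assumptions, and in practice a library eigen-solver would normally be used instead.

```python
import math

def covariance_matrix(rows):
    """Sample covariance (divisor n) of a list of p-dimensional observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / n
             for j in range(p)] for i in range(p)]

def leading_eigenvector(S, iters=100):
    """Power iteration for the unit eigenvector of S with the largest eigenvalue."""
    v = [1.0] + [0.0] * (len(S) - 1)
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(len(S))) for i in range(len(S))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical bivariate data scattered tightly around the line x2 = x1.
rows = [(t + 0.1 * (-1) ** t, t - 0.1 * (-1) ** t) for t in range(-5, 6)]
S = covariance_matrix(rows)
pc1 = leading_eigenvector(S)
print(pc1)   # close to (0.707, 0.707): the direction of largest variation
```

The reduction replaces each observation by a single linear combination of its coordinates, with coefficients given by `pc1`. Note that nothing in this computation involves a response, a point that matters when principal components are used for regression.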

Dimension reduction for regression is a prominent issue today because technological advances now allow scientists to routinely formulate regressions in which p is considerably larger than in the past. This has complicated the development and fitting of regression models. Experience has shown that the standard iterative paradigm for model development guided by diagnostics (Cook & Weisberg 1982, p. 7) can be imponderable when applied with too many predictors. An added complication arises when p is larger than the number of observations n, leading to the so-called ‘n < p’ problem. Standard methods of fitting and corresponding inference procedures may no longer be applicable in such regressions. These and related issues have caused a shift in the applied sciences toward a different regression genre, with the goal of reducing the dimensionality of the predictor vector as a first step in the analysis. Although large-p regressions are perhaps mainly responsible for renewed interest, dimension reduction methodology can be useful regardless of the size of p. For instance, it is often helpful to have an informative low-dimensional graphical summary of the regression to facilitate model building and gain insights.

For this goal p may be regarded as large when it exceeds 2 or 3 since these bounds represent the limits of our ability to view a data set in full using computer graphics.

Subsequent references to ‘large p’ in this article do not necessarily imply that n < p.
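One concrete reason standard fitting breaks down when n < p: the centred data matrix has at most n − 1 linearly independent rows, so the usual sample covariance matrix of the predictors is singular and cannot be inverted as OLS requires. The sketch below checks this numerically on hypothetical data; `matrix_rank` is a simple illustrative routine, not a production rank estimator.

```python
def matrix_rank(M, tol=1e-10):
    """Rank via Gauss-Jordan elimination with partial pivoting."""
    A = [list(map(float, row)) for row in M]
    rows, cols = len(A), len(A[0])
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(A[r][c]))
        if abs(A[pivot][c]) < tol:
            continue                       # no usable pivot in this column
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rows):
            if r != rank:
                f = A[r][c] / A[rank][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[rank])]
        rank += 1
    return rank

# Hypothetical data with n = 4 observations on p = 6 predictors (n < p).
X = [[1, 2, 0, 3, 1, 4],
     [2, 0, 1, 1, 5, 2],
     [0, 1, 3, 2, 2, 0],
     [4, 3, 1, 0, 1, 1]]
n, p = len(X), len(X[0])
means = [sum(row[j] for row in X) / n for j in range(p)]
Xc = [[row[j] - means[j] for j in range(p)] for row in X]
Sigma_hat = [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) / n for j in range(p)]
             for i in range(p)]
print(matrix_rank(Sigma_hat), "of", p)   # rank is at most n - 1 = 3
```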

Reduction by principal components is ubiquitous in the applied sciences, particularly in bioinformatics applications, where principal components have been called ‘eigen-genes’ (Alter et al. 2000) in microarray data analyses and ‘meta-kmers’ in analyses involving DNA motifs. The 2006 Ad Hoc Committee Report on the ‘Hockey Stick’ Global Climate Reconstruction, authored by E. Wegman, D. Scott and Y. Said and commissioned by the U.S. House Energy Committee, reiterates and makes clear that past influential analyses of data on global warming are flawed because of an inappropriate use of principal component methodology.

Article submitted to Royal Society

While principal components seem to be the dominant method of dimension reduction across the applied sciences, there are many other established and recent statistical methods that might be used to address large p regressions, including factor analysis, inverse regression estimation (Cook & Ni 2005), partial least squares, projection pursuit, seeded reductions (Cook et al. 2007), kernel methods (Fukumizu et al. 2009) and sparse methods like the lasso (Tibshirani 1996) that are based on penalization.

(b) Sufficient Dimension Reduction

Dimension reduction is a rather amorphous concept in statistics, changing its character and goals depending on context. Formulated specifically for regression, the following definition (Cook 2007) of a sufficient reduction will help in our pursuit of methods for reducing the dimension of X while en route to estimating E(Y|X):

Definition 1.1. A reduction $R: \mathbb{R}^p \to \mathbb{R}^q$, q ≤ p, is sufficient if it satisfies one of the following three statements:

(i) inverse reduction, $X \mid (Y, R(X)) \sim X \mid R(X)$;
(ii) forward reduction, $Y \mid X \sim Y \mid R(X)$;
(iii) joint reduction, $X \perp\!\!\!\perp Y \mid R(X)$,

where $\perp\!\!\!\perp$ indicates independence, ∼ means identically distributed and A|B refers to the random vector A given the vector B.

Each of the three conditions in this definition conveys the idea that the reduction R(X) carries all the information that X has about Y, and consequently all the information available to estimate E(Y|X). They are equivalent when (Y, X) has a joint distribution. In that case we are free to determine a reduction inversely or jointly and then pass it to the conditional mean without additional structure: E(Y|X) = E{Y|R(X)}. In some cases there may be a direct connection between R(X) and E(Y|X). For instance, if (Y, X) follows a nonsingular multivariate normal distribution, then R(X) = E(Y|X) is a sufficient reduction, E(Y|X) = E{Y|E(Y|X)}. This reduction is also minimal sufficient: if T(X) is any sufficient reduction, then R is a function of T. Further, because of the nature of the multivariate normal distribution, it can be expressed as a linear combination of the elements of X: $R = \beta^T X$ is minimal sufficient for some vector β.
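A small numerical check of the normal case, under hypothetical parameter values: for jointly normal (Y, X), the standard formula $E(Y \mid X = x) = \mu_Y + \beta^T(x - \mu_X)$ with $\beta = \mathrm{var}(X)^{-1}\mathrm{cov}(X, Y)$ holds, so two predictor values with the same $\beta^T x$ receive the same conditional mean. Exact rational arithmetic lets the equality be checked without rounding; the names `solve_2x2` and `cond_mean` are illustrative.

```python
from fractions import Fraction as F

def solve_2x2(A, b):
    """Solve the 2 x 2 linear system A z = b exactly by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

# Hypothetical parameters of a nonsingular normal distribution for (Y, X), p = 2.
mu_X = [F(0), F(0)]
mu_Y = F(1)
Sigma_XX = [[F(2), F(1, 2)], [F(1, 2), F(1)]]   # var(X)
sigma_XY = [F(1), F(1, 2)]                      # cov(X, Y); take var(Y) = 1
beta = solve_2x2(Sigma_XX, sigma_XY)            # beta = var(X)^{-1} cov(X, Y)

def cond_mean(x):
    """E(Y | X = x) = mu_Y + beta^T (x - mu_X) for a multivariate normal."""
    return mu_Y + sum(b * (xi - m) for b, xi, m in zip(beta, x, mu_X))

x_a, x_b = [F(1), F(0)], [F(0), F(3, 2)]        # two points with the same beta^T x
print(beta, cond_mean(x_a), cond_mean(x_b))
```

With var(Y) = 1 the joint covariance is positive definite (the Schur complement 1 − βᵀcov(X, Y) = 3/7 is positive), so the example really is a nonsingular normal model.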

Inverse reduction by itself does not require the response Y to be random, and it is perhaps the only reasonable reductive route when Y is fixed by design. For instance, in discriminant analysis X|Y is a random vector of features observed in one of a number of subpopulations indicated by the categorical response Y, and no discriminatory information will be lost if classifiers are restricted to R.
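To illustrate the discriminant-analysis remark, take a textbook two-class model (an assumption for illustration, not a model from this article) in which X|(Y = k) is normal with mean $\mu_k$ and a common covariance Σ. The likelihood ratio between the classes then depends on x only through $\beta^T x$ with $\beta = \Sigma^{-1}(\mu_1 - \mu_0)$, so a classifier restricted to that single linear combination loses nothing. The sketch computes the ratio from the full densities and checks that two points with equal $\beta^T x$ get the same value.

```python
from fractions import Fraction as F

def inverse_2x2(S):
    """Exact inverse of a nonsingular 2 x 2 matrix."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

def quad_form(v, M):
    """v^T M v for a 2-vector v and a 2 x 2 matrix M."""
    return sum(v[i] * M[i][j] * v[j] for i in range(2) for j in range(2))

# Hypothetical two-class model: X | (Y = k) ~ N(mu_k, Sigma), common Sigma.
Sigma = [[F(2), F(1, 2)], [F(1, 2), F(1)]]
mu0, mu1 = [F(0), F(0)], [F(1), F(1, 2)]
Sinv = inverse_2x2(Sigma)
beta = [sum(Sinv[i][j] * (mu1[j] - mu0[j]) for j in range(2)) for i in range(2)]

def log_density_gap(x):
    """Twice log f1(x) - log f0(x), computed from the full normal densities.

    The normalising constants cancel because Sigma is shared; log prior odds
    would only add a constant that does not depend on x."""
    d0 = [x[i] - mu0[i] for i in range(2)]
    d1 = [x[i] - mu1[i] for i in range(2)]
    return quad_form(d0, Sinv) - quad_form(d1, Sinv)

x_a, x_b = [F(1), F(0)], [F(0), F(3, 2)]   # same value of beta^T x
print(beta, log_density_gap(x_a), log_density_gap(x_b))
```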

If we consider a generic statistical problem and reinterpret X as the total data D and Y as the parameter θ, then the condition for inverse reduction becomes D|(θ, R) ∼ D|R so that R is a sufficient statistic. In this way, the definition of a sufficient reduction encompasses Fisher’s (1922) classical definition of sufficiency.

One difference is that sufficient statistics are observable, while a sufficient reduction may contain unknown parameters and thus needs to be estimated. For example, if (X, Y) follows a nonsingular multivariate normal distribution, then $R(X) = \beta^T X$ and it is necessary to estimate β.
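The classical case can be checked directly. In the sketch below (a standard textbook example chosen for illustration), the data D are a hypothetical sample of independent Bernoulli(θ) observations and R is their sum. The conditional probability of D given (θ, R) works out to 1/C(n, k) whatever the value of θ, which is the inverse-reduction condition D|(θ, R) ∼ D|R. Unlike the reductions in regression, this R involves no unknown parameter.

```python
from fractions import Fraction as F
from math import comb

def prob_data(d, theta):
    """Pr(D = d | theta) for i.i.d. Bernoulli(theta) observations d."""
    k = sum(d)
    return theta ** k * (1 - theta) ** (len(d) - k)

def prob_data_given_R(d, theta):
    """Pr(D = d | theta, R = sum(d)): joint probability divided by Pr(R = k | theta)."""
    n, k = len(d), sum(d)
    prob_R = comb(n, k) * theta ** k * (1 - theta) ** (n - k)
    return prob_data(d, theta) / prob_R

d = [1, 0, 1, 1, 0]                             # hypothetical observed data D
for theta in (F(1, 5), F(1, 2), F(3, 4)):
    print(theta, prob_data_given_R(d, theta))   # the same value for every theta
```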

In some regressions R(X) may be a nonlinear function of X, and in extreme cases no reduction may be possible, so all sufficient reductions are one-to-one functions of X and thus equivalent to R(X) = X. Most often we encounter multi-dimensional reductions consisting of several linear combinations $R(X) = \eta^T X$, where η is an unknown p × q matrix, q ≤ p, that must be estimated from the data. Linear reductions may be imposed to facilitate progress, as in the moment-based approach reviewed in §3a. They can also arise as a natural consequence of modelling restrictions, as we will see in §3b. If $\eta^T X$ is a sufficient linear reduction, then so is $(\eta A)^T X$ for any q × q full-rank matrix A. Consequently, only the subspace span(η) spanned by the columns of η can be identified; span(η) is called a dimension reduction subspace.
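The identifiability point can be checked numerically: η and ηA have the same column space when A is full rank, so the orthogonal projection $\eta(\eta^T\eta)^{-1}\eta^T$ onto span(η) is unchanged. The sketch below uses a hypothetical 3 × 2 basis and exact rational arithmetic; the small matrix helpers are written out only to keep the example dependency-free.

```python
from fractions import Fraction as F

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(S):
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    return [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]

def projection(eta):
    """Orthogonal projection onto span(eta): eta (eta^T eta)^{-1} eta^T, for q = 2."""
    eta = [[F(v) for v in row] for row in eta]
    return matmul(matmul(eta, inverse_2x2(matmul(transpose(eta), eta))), transpose(eta))

eta = [[1, 0], [0, 1], [1, 1]]     # hypothetical p x q basis, p = 3, q = 2
A = [[2, 1], [1, 1]]               # any full-rank q x q matrix
eta_A = matmul(eta, A)
print(projection(eta) == projection(eta_A))   # True: same subspace
```

Any estimation method can therefore only be judged by the subspace it recovers, not by the particular basis matrix it reports.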

If span(η) is a dimension reduction subspace, then so is $\mathrm{span}(\eta, \eta_1)$ for any p × $q_1$ matrix $\eta_1$. If $\mathrm{span}(\eta_1)$ and $\mathrm{span}(\eta_2)$ are both dimension reduction subspaces, then under mild conditions so is their intersection $\mathrm{span}(\eta_1) \cap \mathrm{span}(\eta_2)$ (Cook 1996, 1998). Consequently, the inferential target in sufficient dimension reduction is often taken to be the central subspace $\mathcal{S}_{Y|X}$, defined as the intersection of all dimension reduction subspaces (Cook 1994, 1996, 1998). A minimal sufficient linear reduction is then of the form $R(X) = \eta^T X$, where the columns of η now form a basis for $\mathcal{S}_{Y|X}$. We assume throughout this article that the central subspace exists, and use $d = \dim(\mathcal{S}_{Y|X})$ to denote its dimension.

The ideas of a sufficient reduction and the central subspace can be used to further our understanding of existing methodology and to guide the development of new methodology. In Sections 2 and 3 we consider how sufficient reductions arise in three contexts: forward linear regression, inverse moment-based reduction and inverse model-based reduction.

2. Reduction in Forward Linear Regression

The standard linear regression model $Y = \beta_0 + \beta^T X + \epsilon$, with $\epsilon \perp\!\!\!\perp X$ and $E(\epsilon) = 0$, implies that $\mathcal{S}_{Y|X} = \mathrm{span}(\beta)$ and thus that $R(X) = \beta^T X$ is minimal sufficient.

The assumption of a linear regression then automatically focuses our interest on β, which can be estimated straightforwardly using ordinary least squares (OLS) when n is sufficiently large, and it may appear that there is little to be gained from dimension reduction. However, dimension reduction has been used in linear regression to improve on the OLS estimator of β and to deal with n < p regressions.

One approach consists of regressing Y on X in two steps. The first is the reduction step: reduce X linearly to $G^T X$ using some methodology that produces $G \in \mathbb{R}^{p \times q}$, q ≤ p. The second step consists of using ordinary least squares to estimate the mean function $E(Y \mid G^T X)$ for the reduced predictors. To describe the resulting estimator $\hat\beta_G$ of β and establish notation for later sections, let $\mathbb{Y}$ be the n × 1 vector of centred responses, let $\bar X = \sum_{i=1}^n X_i/n$ denote the sample mean vector, let $\mathbb{X}$ be the n × p matrix with rows $(X_i - \bar X)^T$, i = 1, ..., n, let $\hat\Sigma = \mathbb{X}^T\mathbb{X}/n$ denote the usual estimator of $\Sigma = \mathrm{var}(X)$, let $\hat C = \mathbb{X}^T\mathbb{Y}/n$, which is the usual estimator of $C = \mathrm{cov}(X, Y)$, and let $\hat\beta_{ols} = \hat\Sigma^{-1}\hat C$ be the vector of coefficients from the OLS fit
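The excerpt breaks off before the paper's own formula for $\hat\beta_G$, so the following sketch does not reproduce it. It assumes only the two steps described above: OLS of Y on the reduced predictors $G^T X$, whose coefficient vector $\hat\gamma = (G^T\hat\Sigma G)^{-1}G^T\hat C$ is mapped back to the scale of X as $G\hat\gamma$. With $G = I_p$ this recovers $\hat\beta_{ols}$. The data are hypothetical and noiseless, and the helper names are illustrative.

```python
from fractions import Fraction as F

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve A z = b for square nonsingular A by Gauss-Jordan elimination (exact)."""
    n = len(A)
    M = [[F(v) for v in A[i]] + [F(b[i])] for i in range(n)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

# Hypothetical data: n = 5, p = 3, with Y exactly linear in X (no noise).
X_raw = [[1, 0, 2], [0, 1, 1], [2, 1, 0], [1, 3, 1], [4, 0, 1]]
Y_raw = [1 + 2 * x1 - x2 + F(1, 2) * x3 for x1, x2, x3 in X_raw]
n, p = len(X_raw), len(X_raw[0])

x_bar = [F(sum(row[j] for row in X_raw), n) for j in range(p)]
Xc = [[row[j] - x_bar[j] for j in range(p)] for row in X_raw]   # rows (X_i - X_bar)^T
Yc = [y - sum(Y_raw) / n for y in Y_raw]                        # centred responses
Sigma_hat = [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) / n for j in range(p)]
             for i in range(p)]
C_hat = [sum(Xc[k][i] * Yc[k] for k in range(n)) / n for i in range(p)]
beta_ols = solve(Sigma_hat, C_hat)                              # Sigma_hat^{-1} C_hat

def beta_G(G):
    """Two-step estimate: OLS of Y on G^T X, mapped back through G (G is p x q)."""
    Gt = transpose(G)
    reduced_cov = matmul(matmul(Gt, Sigma_hat), G)                    # G^T Sigma_hat G
    reduced_C = [sum(g * c for g, c in zip(row, C_hat)) for row in Gt]  # G^T C_hat
    gamma = solve(reduced_cov, reduced_C)
    return [sum(G[i][j] * gamma[j] for j in range(len(gamma))) for i in range(p)]

print(beta_ols)                              # coefficients 2, -1, 1/2
print(beta_G([[1, 0], [0, 1], [0, 0]]))      # reduction to (X1, X2) only
```

Reducing along the true direction (taking G to be $\hat\beta_{ols}$ itself, q = 1) loses nothing in this noiseless example, while dropping $X_3$ forces its coefficient to zero and changes the others.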

–  –  –

where $\beta_j$ is the j-th element of β, j = 1, ..., p, and the tuning parameter λ is often chosen by cross-validation. Several elements of $\hat\beta_{lasso}$ are typically zero, which corresponds to setting the columns of G to be the columns of the identity matrix $I_p$ corresponding to the nonzero elements of $\hat\beta_{lasso}$. However, with this G we do not necessarily have $\hat\beta_{lasso} = \hat\beta_G$, although the two estimators are often similar. Consequently, methodology based on penalization does not fit exactly the general form given in equation (2.1).
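The penalized criterion itself falls in the part of the text not reproduced here, so the sketch below assumes a common form, $(1/2n)\|\mathbb{Y} - \mathbb{X}\beta\|^2 + \lambda\sum_j|\beta_j|$ on centred data, minimized by cyclic coordinate descent with soft-thresholding. It then refits OLS on the selected predictors, which is the two-step estimate with G built from columns of $I_p$, to show that the two estimates need not agree: the lasso shrinks the retained coefficient and the refit does not. The orthogonal design is a hypothetical choice that makes the answers easy to verify by hand.

```python
def soft_threshold(z, t):
    """Minimizer of (b - z)^2 / 2 + t |b|: shrink z toward 0 by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(Xc, Yc, lam, sweeps=100):
    """Cyclic coordinate descent for (1/2n)||Yc - Xc b||^2 + lam * sum_j |b_j| (assumed scaling)."""
    n, p = len(Xc), len(Xc[0])
    beta = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(Yc)                                   # Yc - Xc beta, with beta = 0
    for _ in range(sweeps):
        for j in range(p):
            # Inner product of x_j with the partial residual that excludes x_j's own term.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid = [resid[i] - delta * Xc[i][j] for i in range(n)]
                beta[j] = new
    return beta

# Hypothetical centred design with orthogonal columns (each with sum of squares n = 4).
Xc = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
Yc = [3 * a + 0.5 * b for a, b, _ in Xc]               # true coefficients (3, 0.5, 0)

beta_lasso = lasso_cd(Xc, Yc, lam=1.0)
support = [j for j, b in enumerate(beta_lasso) if b != 0.0]
# OLS refit on the selected columns; this coordinatewise formula is valid only
# because the columns here are orthogonal.
beta_refit = [sum(Xc[i][j] * Yc[i] for i in range(4)) / sum(Xc[i][j] ** 2 for i in range(4))
              if j in support else 0.0 for j in range(3)]
print(beta_lasso, beta_refit)
```

For a non-orthogonal design the refit would instead solve the normal equations for the selected columns.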

Pursuing dimension reduction based on linear regression may not produce useful results if the model is not accurate, particularly if the distribution of Y |X depends on more than one linear combination of the predictors. There are many diagnostic and remedial methods available to improve linear regression models when p is not too large. Otherwise, application of these methods can be quite burdensome.
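The following sketch illustrates that warning with a hypothetical regression in which Y depends on two directions, $Y = X_1 + X_2^2$, over a symmetric grid of predictor values. OLS returns $\hat\beta = (1, 0)$, so $\mathrm{span}(\hat\beta)$ captures only the $X_1$ direction, while the residuals remain systematically related to $X_2$. The design is chosen so that the answer is exact; with random draws from a distribution symmetric in $X_2$, the same effect would appear up to sampling error.

```python
from fractions import Fraction as F
from itertools import product

def mean(v):
    return F(sum(v), len(v))

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

# Hypothetical 5 x 5 grid design; Y depends on X1 linearly and on X2 through X2^2.
grid = list(product(range(-2, 3), repeat=2))
x1 = [a for a, _ in grid]
x2 = [b for _, b in grid]
y = [a + b ** 2 for a, b in grid]

# OLS slopes from the 2 x 2 normal equations, solved by Cramer's rule.
s11, s12, s22 = cov(x1, x1), cov(x1, x2), cov(x2, x2)
c1, c2 = cov(x1, y), cov(x2, y)
det = s11 * s22 - s12 * s12
beta = [(s22 * c1 - s12 * c2) / det, (s11 * c2 - s12 * c1) / det]

# Residuals still carry the X2 structure that the single direction span(beta) misses.
resid = [yi - mean(y) - beta[0] * (a - mean(x1)) - beta[1] * (b - mean(x2))
         for yi, (a, b) in zip(y, grid)]
print(beta)
```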
