Choice of Order in Regression Strategy
Julian J. Faraway
Regression analysis is viewed as a search through model space using
data-analytic functions. The desired models should satisfy several
requirements: unimportant variables should be excluded, outliers
identified, and so on. The methods of regression data analysis that
address these concerns, such as variable selection, transformation,
and outlier detection, are characterized as functions acting on
regression models and returning regression models. A model that is
unchanged by the application of any of these methods is considered
acceptable. A method is described for generating all acceptable
models supported by all possible orderings of the regression data
analysis methods, with a view to determining whether two
statisticians may reasonably hold differing views on the same data.
The consideration of all possible orders of analysis generates a
directed graph in which the vertices are regression models and the
arcs are data-analytic methods. The structure of the graph is of
statistical interest. The ideas are demonstrated using a LISP-based
analysis package. The methods described are not intended for the
entirely automatic analysis of data, but rather to assist the
statistician in examining regression data at a strategic level.
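
The graph construction summarized above can be illustrated with a minimal
sketch. It is written in Python for illustration only and is not the
LISP-based package from the paper. All names here (Model,
select_variables, remove_outliers, build_graph) are hypothetical, and the
two methods are toy rules standing in for real variable selection and
outlier detection. The sketch assumes that only finitely many models are
reachable from the starting model.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Set, Tuple


class Model(NamedTuple):
    """Toy regression model: predictors kept and cases excluded (hypothetical)."""
    variables: FrozenSet[str]
    dropped_cases: FrozenSet[int]


Method = Callable[[Model], Model]


def select_variables(m: Model) -> Model:
    # Toy rule (assumption): with case 7 present x3 looks unimportant,
    # with case 7 excluded x2 does. At most one of the two is dropped.
    weak = "x2" if 7 in m.dropped_cases else "x3"
    if {"x2", "x3"} <= m.variables:
        return m._replace(variables=m.variables - {weak})
    return m


def remove_outliers(m: Model) -> Model:
    # Toy rule (assumption): case 7 is flagged only while x3 is in the model.
    if "x3" in m.variables and 7 not in m.dropped_cases:
        return m._replace(dropped_cases=m.dropped_cases | {7})
    return m


METHODS: Dict[str, Method] = {
    "select_variables": select_variables,
    "remove_outliers": remove_outliers,
}


def build_graph(
    start: Model, methods: Dict[str, Method]
) -> Tuple[Set[Model], List[Tuple[Model, str, Model]], Set[Model]]:
    """Apply every method to every reachable model, in every order.

    Returns the vertices (models), the arcs (model, method name, new model),
    and the acceptable models, i.e. those left unchanged by every method.
    """
    vertices: Set[Model] = {start}
    arcs: List[Tuple[Model, str, Model]] = []
    queue = deque([start])
    while queue:
        model = queue.popleft()
        for name, method in methods.items():
            new_model = method(model)
            if new_model != model:  # a method that changes the model adds an arc
                arcs.append((model, name, new_model))
                if new_model not in vertices:
                    vertices.add(new_model)
                    queue.append(new_model)
    acceptable = {
        m for m in vertices if all(f(m) == m for f in methods.values())
    }
    return vertices, arcs, acceptable


start = Model(frozenset({"x1", "x2", "x3"}), frozenset())
vertices, arcs, acceptable = build_graph(start, METHODS)
```

Under these toy rules the two orderings end at different acceptable
models: selecting variables first keeps x1 and x2 with all cases, while
removing outliers first excludes case 7 and keeps x1 and x3. This is the
kind of order dependence that the graph is meant to expose.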
This paper appeared in "Selecting Models from Data: Artificial
Intelligence and Statistics IV" (1994), Cheeseman P. & Oldford W. (Eds.),
Springer-Verlag. If you are unable to obtain this text, a
preliminary version is available.
Last modified on 06/23/97