Development and Selection of Statistical Models

Session Chair: Andrew Rukhin, NIST


A Prediction-Based Model Selection Approach

Huaiqing Wu
Iowa State University

Abstract: Many applications involve a continuous response observed together with potential explanatory variables, or covariates. Linear models are often employed to model such data. To perform model selection in these situations, stepwise or all-subsets selection based on the Cp criterion, the Akaike Information Criterion (AIC), or the Bayesian Information Criterion (BIC) is often used. Another frequently used strategy is cross-validation. A theme common to all of these methods is that they assess model performance only at the observed covariate values. However, in some applications we wish to predict the expected response over a distribution of explanatory-variable values that may differ from those in the observed data. We propose a new model selection strategy that focuses on prediction over a user-specified distribution of covariate values. The idea is that, if a model is to be used for making predictions, the covariate locations at which predictions will be made should influence the selection procedure. The proposed method is illustrated with a textbook example and a real application aimed at assessing battery performance. We also present simulation results that identify situations in which the new method improves prediction. This is joint work with Adam Pintar at Iowa State University and Christine Anderson-Cook at Los Alamos National Laboratory.
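The abstract does not state the selection criterion itself. As a loosely related illustration only, the sketch below uses importance-weighted leave-one-out cross-validation, a standard covariate-shift technique and not the authors' method, to show how a user-specified covariate distribution can enter the choice of a polynomial degree. The data-generating function, the target distribution N(0.85, 0.1²), and the helper names (`design`, `loo_residuals`, `weighted_cv_score`) are hypothetical.

```python
import math
import numpy as np

def design(x, degree):
    # Polynomial design matrix with columns 1, x, ..., x^degree
    return np.vander(x, degree + 1, increasing=True)

def loo_residuals(x, y, degree):
    # Leave-one-out residuals of least squares via the shortcut e_i / (1 - h_ii)
    X = design(x, degree)
    Q, _ = np.linalg.qr(X)            # reduced QR: hat matrix H = Q Q^T
    h = np.sum(Q ** 2, axis=1)        # diagonal of H
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (y - X @ beta) / (1.0 - h)

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def weighted_cv_score(x, y, degree, weights):
    # Weighted mean of squared LOO errors; weights shift emphasis to target region
    e = loo_residuals(x, y, degree)
    return float(np.sum(weights * e ** 2) / np.sum(weights))

rng = np.random.default_rng(0)
n = 60
x = rng.uniform(0.0, 1.0, n)                              # observed covariates
y = np.sin(3 * x) + 0.5 * x ** 3 + rng.normal(0, 0.1, n)  # hypothetical truth

# Hypothetical prediction region N(0.85, 0.1^2); weight = p_target(x) / p_obs(x),
# and p_obs(x) = 1 on [0, 1] for the uniform design above
w_target = normal_pdf(x, 0.85, 0.1)
w_obs = np.ones(n)

degrees = range(0, 5)
plain = {d: weighted_cv_score(x, y, d, w_obs) for d in degrees}
shifted = {d: weighted_cv_score(x, y, d, w_target) for d in degrees}
print("ordinary LOO-CV picks degree", min(plain, key=plain.get))
print("target-weighted LOO-CV picks degree", min(shifted, key=shifted.get))
```

Whether the two criteria pick different degrees depends on the data; the point is only that the weights make the score reflect errors near the covariate values where predictions are wanted.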


Generalized Selective Assembly: A Methodology for Quality Improvement

Matthias H. Y. Tan and C.F. Jeff Wu
Georgia Tech

Abstract: Selective assembly has traditionally been used to achieve tight specifications on the clearance between two mating parts. However, its applicability is not limited to this type of assembly. The main purpose of this paper is to develop a generalized version of selective assembly, called GSA. GSA can be used as a powerful tool to improve the quality of virtually any assembly with any number of components, provided that the assembly response function is known and that each unit of product can be viewed as an assembly of one unit of each of a fixed number of different component types. We consider the selective assembly of products in job shops and batch production systems, which, because of their low production rate requirements, can readily accommodate the additional steps needed to implement GSA. Two variants of GSA are considered: direct selective assembly and fixed bin selective assembly. The former matches components using measurements of their characteristics directly, whereas the latter matches components that have first been sorted into bins. For each variant, the problem of matching the components of each type into assemblies that minimize quality cost is formulated as a linear integer program. The component matching problem for direct selective assembly turns out to be an axial multi-index assignment problem, while that for fixed bin selective assembly is an axial multi-index transportation problem. We use simulations to evaluate the performance of these methods and to find the optimal number of bins. Realistic examples show that the proposed methods can significantly improve the quality of assemblies.
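For the classical case of two component types, the matching problem in direct selective assembly reduces to an ordinary one-to-one assignment. The sketch below solves a tiny hypothetical hole/shaft instance by exhaustive search over matchings, assuming a quadratic loss on clearance; the target clearance, the loss, and all names are illustrative assumptions, and this is not the integer-programming formulation the authors use for general multi-index problems.

```python
from itertools import permutations
import random

TARGET_CLEARANCE = 0.05  # hypothetical nominal clearance

def quality_loss(hole, shaft):
    # Hypothetical quadratic loss on the assembly response (clearance)
    return (hole - shaft - TARGET_CLEARANCE) ** 2

def total_loss(holes, shafts, pairing):
    # pairing[i] = index of the shaft assembled with hole i
    return sum(quality_loss(holes[i], shafts[j]) for i, j in enumerate(pairing))

def direct_selective_assembly(holes, shafts):
    # Exhaustive search over all one-to-one matchings; only feasible for small n
    n = len(holes)
    best = min(permutations(range(n)), key=lambda p: total_loss(holes, shafts, p))
    return list(best), total_loss(holes, shafts, best)

rng = random.Random(1)
n = 6
holes = [10.05 + rng.gauss(0, 0.01) for _ in range(n)]   # measured hole diameters
shafts = [10.00 + rng.gauss(0, 0.01) for _ in range(n)]  # measured shaft diameters

random_pairing = list(range(n))  # random assembly: pair parts in arrival order
print("random assembly loss:", total_loss(holes, shafts, random_pairing))
pairing, loss = direct_selective_assembly(holes, shafts)
print("selective assembly loss:", loss, "pairing:", pairing)
```

For this particular quadratic loss, expanding the square shows that only the cross term depends on the matching, so the optimum pairs the k-th smallest hole with the k-th smallest shaft (rearrangement inequality). Exhaustive search is shown only because it makes the assignment structure explicit.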


Internet VoIP Traffic Analysis and Modeling

B. Xi, H. Chen, W. S. Cleveland, and T. Telkamp
Purdue University

Abstract: Network engineering for the quality of service (QoS) of Internet voice communication (VoIP) can benefit substantially from simulation studies of VoIP packet traffic queueing on a network of routers. Such studies require accurate statistical models for the packet arrivals to the network from a gateway. This talk presents the development and validation of models for the superposed arrival process, based on statistical analyses of VoIP traffic from the Global Crossing (GBLX) international network. The statistical models and methods used include point processes and their superposition; time series autocorrelations and power spectra; long-range dependence; random effects and hierarchical modeling; bootstrapping; robust estimation; modeling of independence and identical distribution; and many visualization methods for model building. The result is two models, validated by the analyses, that can generate accurate synthetic multiplexed VoIP packet traffic. One is a semi-empirical model. The other is a mathematical model whose components are parametric statistical models. The models describe IP-inbound traffic, that is, traffic entering an IP network. This is possible because the properties of the GBLX data, collected on an IP link, are very close to the properties the traffic had when it entered the GBLX network.
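To make "superposition of point processes" concrete, the sketch below generates packet arrival times for several independent calls and merges them into one multiplexed stream. Each call is modeled as a simple on-off source that emits periodic packets during exponentially distributed talkspurts. The packet interval, the on/off means, and the function names are hypothetical assumptions, and this generic model is not either of the two models described in the abstract.

```python
import heapq
import random

PACKET_INTERVAL = 0.02  # hypothetical: one packet every 20 ms during a talkspurt
MEAN_ON = 1.0           # hypothetical mean talkspurt length (s)
MEAN_OFF = 1.5          # hypothetical mean silence length (s)

def onoff_source(duration, rng):
    """Sorted packet arrival times for one call: periodic packets in exponential ON periods."""
    t = rng.uniform(0, MEAN_OFF)  # random start so sources are not synchronized
    arrivals = []
    while t < duration:
        on_end = t + rng.expovariate(1.0 / MEAN_ON)
        while t < min(on_end, duration):
            arrivals.append(t)
            t += PACKET_INTERVAL
        t = on_end + rng.expovariate(1.0 / MEAN_OFF)  # silence before next talkspurt
    return arrivals

def superpose(streams):
    # Superposition: merge individually sorted arrival streams into one sorted stream
    return list(heapq.merge(*streams))

rng = random.Random(7)
calls = [onoff_source(60.0, rng) for _ in range(50)]
merged = superpose(calls)
gaps = [b - a for a, b in zip(merged, merged[1:])]
print("packets:", len(merged), "mean interarrival (s):", sum(gaps) / len(gaps))
```

Because `heapq.merge` expects each input to be sorted, every source generates its times in increasing order. The merged stream could then drive a router-queue simulation, and statistics such as the autocorrelation of interarrival times could be compared with those of measured traffic.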