Keynote Speaker:

Sallie Keller-McNulty, Rice University

 

Title: Reliability Reloaded

 

In this age of exponential growth in science and technology, the capability to evaluate the performance, reliability, and safety of complex systems presents new challenges. Today's methodology must respond to the ever-increasing demands for such evaluations to provide key information for decision and policy makers at all levels of government and industry, on problems ranging from national security to space exploration. Scientific progress in integrated reliability assessment requires the development of processes, methods, and tools that combine diverse information types (e.g., experiments, computer simulations, expert knowledge) from diverse sources (e.g., scientists, engineers, business developers, technology integrators, decision-makers) to assess quantitative performance metrics that can aid decision-making under uncertainty. These are highly interdisciplinary problems. The principal role of the statistician is to bring statistical sciences thinking and application to these problems. By the nature of our training, statisticians frequently assume the role of scientific integrator and hence are well poised to lead the development of integrated reliability assessments. However, this puts the statistician closer to policy pressures and politics. This talk will focus on the growing challenges facing statistical sciences in the domain of integrated reliability assessment and how we, as statisticians, must separate the scientific method from the politics of the scientific process to develop assessment methodology that will facilitate the decision-making processes.

 

Invited Speaker:

Roshan Vengazhiyil, Georgia Institute of Technology

Robust Parameter Design and Variation Reduction

 

C.F. Jeff Wu, Georgia Institute of Technology

Improving calibration systems through designed experiments

 

Abstract: Taguchi (1987) advocates the use of designed experiments to improve measurement and calibration systems. In this paper we study some statistical aspects of the problem. An appropriate performance measure is derived that provides us with a deeper insight into Taguchi's signal-to-noise ratio. Two different modeling approaches, namely, performance measure modeling and response modeling, are considered. The proposed approaches are illustrated and compared using an experiment on drive shaft imbalance. (Joint work with Arden Miller and Tirthankar Dasgupta)

 


 

Daniel D. Frey, Massachusetts Institute of Technology

Adaptive OFAT Applied to Robust Parameter Design

 

Abstract: Previous investigations have explored the performance of adaptive OFAT ("one factor at a time") experimentation, establishing that the method exploits main effects with high probability and also tends to exploit two-factor interactions when they are large. The current study applies these results to robust parameter design. A simple method is proposed in which resolution III factorial designs are used for an outer array of noise factors and adaptive OFAT is used for exploring control factors. This approach exploits control-by-noise interactions with high probability and also tends to exploit control-by-control-by-noise interactions when they are large. Model-based assessments and case studies suggest that this approach provides substantially more improvement than alternatives with similar run size, including crossed resolution III arrays and combined arrays with minimum J-aberration. The approach also provides advantages in flexibility, use of prior knowledge, and costs due to control factor changes.
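
The abstract gives only a verbal description of the procedure, so the sketch below illustrates the general idea under toy assumptions (it is not Frey's actual algorithm or response model): adaptive OFAT toggles one control factor at a time and keeps a change only when it improves a robustness summary computed over a small outer array of noise settings. The response function and factor levels are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def response(control, noise):
    """Hypothetical toy response (larger is better) with a control-by-noise
    interaction, so robustness over the noise array matters."""
    c = np.asarray(control, dtype=float)
    n = np.asarray(noise, dtype=float)
    return 2.0 * c[0] - 1.0 * c[1] + 1.5 * c[0] * n[0] - 0.8 * c[2] * n[1] + rng.normal(0, 0.1)

# Outer array of noise factors: a full 2^2 design stands in for a resolution III array.
noise_array = list(itertools.product([-1, 1], repeat=2))

def robustness(control):
    """Mean response over the outer noise array (a simple larger-is-better summary)."""
    return np.mean([response(control, z) for z in noise_array])

# Adaptive OFAT over three two-level control factors: start at a baseline,
# toggle one factor at a time, and keep the toggle only if the summary improves.
control = [-1, -1, -1]
best = robustness(control)
for k in range(len(control)):
    trial = control.copy()
    trial[k] = -trial[k]
    value = robustness(trial)
    if value > best:          # retain the change that helped
        control, best = trial, value

print("selected control settings:", control, "robustness:", round(best, 3))
```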

 

Judy Jin,  University of Michigan

Title: Variance Component Decomposition and Diagnosis for Batch Manufacturing Processes using ANOVA

 

Abstract: In batch manufacturing processes, the total process variation is generally decomposed into batch-by-batch variation and within-batch variation. Since different variation components may be caused by different sources, separation, testing, and estimation of each variance component are essential to process improvement. Most previous SPC research emphasized reducing variation due to assignable causes by implementing control charts for process monitoring. In contrast, this talk aims to analyze and reduce the inherent natural process variation by applying the ANOVA method. The key issue in using the ANOVA method is how to develop appropriate statistical models for all variation components of interest. The paper provides a generic framework for decomposing three typical variation components in batch manufacturing processes. For the purpose of diagnosing variation root causes, the corresponding linear contrasts are defined to represent the possible site variation patterns, and the statistical nested effect models are developed accordingly. It is shown that the use of a full factor decomposition model can expedite the determination of the number of nested effect models and the model structure. Finally, an example is given of variation reduction in the conductive gridline screen printing process for solar battery fabrication.
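
As a concrete illustration of the batch-by-batch versus within-batch decomposition the abstract mentions, here is a minimal one-way random-effects sketch using ANOVA mean squares; the nested-effect models and linear contrasts for site variation patterns discussed in the talk go well beyond this. The simulated variance components are assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a batch process: b batches, n parts per batch, with assumed
# batch-to-batch variance 0.5 and within-batch variance 1.0 (toy values).
b, n = 20, 8
batch_effect = rng.normal(0, np.sqrt(0.5), size=b)
data = batch_effect[:, None] + rng.normal(0, 1.0, size=(b, n))

# One-way random-effects ANOVA mean squares.
batch_means = data.mean(axis=1)
grand_mean = data.mean()
ms_between = n * np.sum((batch_means - grand_mean) ** 2) / (b - 1)
ms_within = np.sum((data - batch_means[:, None]) ** 2) / (b * (n - 1))

# Method-of-moments estimates of the two variance components.
var_within = ms_within
var_between = max((ms_between - ms_within) / n, 0.0)

print(f"within-batch variance   ~ {var_within:.3f}")
print(f"batch-to-batch variance ~ {var_between:.3f}")
```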

 

 


 

Invited Session

Kenneth Gilbert, University of Tennessee

Simulation of Supply Chains

 

This session is an experiential simulation of a multi-stage supply chain and a tutorial on ARIMA models of supply chains. It does not assume any prior knowledge of supply chain models and requires only a rudimentary understanding of time series models.

In the simulation the participants will play the roles of managers in a supply chain. Each will manage an inventory by placing orders with an upstream supplier while filling orders for a downstream customer. The simulation will illustrate the dynamics of multistage supply chains. Then we will demonstrate how autoregressive integrated moving average (ARIMA) models can be used to model these dynamics. Specifically, if the customer demand can be characterized as an ARIMA time series, then for a rather general class of ordering policies, the ARIMA time series of the orders and the inventories at each of the upstream stages can be derived. These models can be used to predict the performance of the supply chain and to derive optimal ordering policies.
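
The dynamics described above can be illustrated numerically. The sketch below (a toy model, not the session's derivation) simulates an AR(1) customer demand stream propagating through a few stages that use moving-average forecasts and an order-up-to rule; the growth in order variance upstream is the "bullwhip" behavior whose ARIMA form the tutorial derives analytically. All parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
T, stages, lead_time = 500, 3, 2

# Customer demand: AR(1) around a mean of 100 (a toy ARIMA(1,0,0) series).
demand = np.empty(T)
demand[0] = 100.0
for t in range(1, T):
    demand[t] = 100.0 + 0.6 * (demand[t - 1] - 100.0) + rng.normal(0, 5)

orders = demand.copy()
print(f"stage 0 (customer demand) variance: {orders.var():.1f}")

# Each upstream stage forecasts incoming orders with a moving average and
# places orders with a simple order-up-to rule; variance grows upstream.
for s in range(1, stages + 1):
    incoming = orders
    new_orders = np.empty(T)
    for t in range(T):
        window = incoming[max(0, t - 4):t + 1]            # moving-average forecast
        target = (lead_time + 1) * window.mean()          # order-up-to level
        prev_target = (lead_time + 1) * incoming[max(0, t - 5):t].mean() if t > 0 else target
        new_orders[t] = max(incoming[t] + (target - prev_target), 0.0)
    orders = new_orders
    print(f"stage {s} order variance: {orders.var():.1f}")
```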

 

Invited Session

William Parr, The University of Tennessee

Six Sigma: What is Missing?

 

Roger Hoerl, GE Global Research

What is Missing in Six Sigma, and What Should We Do About It?

 

Abstract: Six Sigma has been a tremendously successful improvement initiative for close to 20 years now. Despite its unparalleled success, however, it has its limitations, like any other initiative. In this talk we will attempt to separate the hype from the facts, and better understand what Six Sigma is and is not. Based on this analysis, we plan to identify improvement opportunities, and suggest ways in which these opportunities might be captured. In summary, it will be argued that Six Sigma was never designed to be an overall quality management system, and even within the arena of project-by-project improvement, it has a "sweet spot", outside of which it generally is not the best option.

 


 

Doug Zahn, Statistical Consultant and Coach

Six Sigma: What’s Missing?  Applying it to Ourselves!

 

Six Sigma is a complex system, including teaching, research, design of experiments, data analysis, consulting, and administration. At the heart of this system is encounter—a purposeful meeting of a Six Sigma professional with another person: colleague, student, client, supervisor, supplier, customer, or member of staff. An encounter is a process consisting of five steps: preparing, beginning, working, ending, and reviewing. There is variation in this process as not all encounters are effective. To understand this variation and systematically reduce the number of ineffective encounters, gather primary data on the event by videotaping it. Analyze the data by using three lenses (interpersonal, intrapersonal, and technical) to learn how to identify and recover from the breakdowns that naturally occur in encounters. I will give you an opportunity to learn how to apply this process to one of your current tough problems by using a videotape of an actual consultation.

 

Invited Session

Bruce Ankenman, Northwestern University

Design of Experiments for Discrete Event Simulation

 

Hua Shen and Hong Wan, Purdue University

Controlled Sequential Factorial Design for Simulation Factor Screening

 

Abstract: We propose a controlled sequential factorial design for discrete-event simulation factor screening. It combines a sequential hypothesis-testing procedure with the traditional factorial design to control the Type I error and power for each factor under heterogeneous variance conditions. The method requires minimal assumptions and demonstrates robust performance under different system conditions.

 

Bruce Ankenman, Northwestern University

Russell Cheng and Sue Lewis, University of Southampton, United Kingdom

An Adaptive Method for Factor Screening for Simulation Experiments

 

Abstract: The sequential method is based on an orthogonal array, but assumes that factor effects have a known direction. The rows of the orthogonal array are run in a strategic order to allow for group screening of these factors. An interior point quadratic programming technique is used to obtain constrained estimates of the factor effects and quickly eliminate any groups of factors with null effects, allowing all factor effects to be estimated before completing the orthogonal array. The method will be applied to screening for both location and dispersion effects and will be compared with competing methods such as sequential bifurcation.

 

 

 

 


 

Russell Barton, Penn State University

DOE for Fitting Forward and Inverse Simulation Metamodels

 

Abstract: Simulation models predict system performance as a function of one or more design variables. These models generally operate in a sense that is opposite to the design objective: given desired performance, identify appropriate values for the design variables. In many cases there are multiple simulation outputs of interest, and the possibility exists for determining an explicit inverse map that would provide design variable values to produce (approximately) the desired output performance. Experiment design strategies for this problem will be presented.

 

Contributed Session

Quality and Process Control Enhancement I

 

Luis F. Dominguez-Palomeque and David D. McLean, University of Ottawa

Robust Design of Chemical Processes under Uncertainty through Stochastic Optimization

 

Abstract: Robust Parameter Design is a methodology advocated by Taguchi for designing products and processes that are insensitive to input variation. His approach to quality engineering is based on running statistically designed experiments on prototypes. For the design of chemical processes this approach has two limitations: plants and processes are usually not available at the design stage, thereby limiting any experimentation, and the Taguchi approach focuses on variability reduction to reduce quality costs while neglecting capital and operating costs. To overcome these limitations, new approaches based on stochastic optimization using process simulators have been developed that account for quality costs as well as capital and operating costs. These stochastic optimization algorithms require that the expected value of the objective function be calculated, which is usually approximated using quadrature formulas. In our approach, a Hammersley sampling technique has been employed, since it has been shown to have a favorable computational demand and it provides a measure of uncertainty for the expected value of the objective function. The optimal robust design of a simple reactor system will be used to demonstrate the benefits of integrating capital and operating costs with quality costs, as well as to provide a comparison of Hammersley sampling with quadrature for calculating the expected value of the overall cost function.
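
The reactor design problem itself is not reproduced here; the sketch below only illustrates the sampling idea under assumed inputs: a Hammersley point set (built from radical inverses) is mapped through the normal quantile function to estimate the expected value of a hypothetical cost function, alongside a plain Monte Carlo estimate for comparison.

```python
import numpy as np
from scipy.stats import norm

def radical_inverse(i, base):
    """Van der Corput radical inverse of integer i in the given base."""
    f, result = 1.0, 0.0
    while i > 0:
        f /= base
        result += f * (i % base)
        i //= base
    return result

def hammersley(n, dim):
    """Hammersley point set in [0,1)^dim (first coordinate is i/n)."""
    bases = [2, 3, 5, 7, 11, 13][: dim - 1]
    pts = np.empty((n, dim))
    for i in range(n):
        pts[i, 0] = i / n
        for j, b in enumerate(bases, start=1):
            pts[i, j] = radical_inverse(i, b)
    return pts

def objective(x):
    """Hypothetical total-cost function of two uncertain inputs (not the paper's reactor model)."""
    return 10.0 + 3.0 * x[..., 0] ** 2 + 2.0 * np.exp(0.5 * x[..., 1])

# Map uniform points to normally distributed uncertain inputs (assumed N(0,1)).
n = 512
u = hammersley(n, 2)
x_ham = norm.ppf(np.clip(u, 1e-9, 1 - 1e-9))
x_mc = np.random.default_rng(0).normal(size=(n, 2))

print("Hammersley estimate of E[cost]: ", objective(x_ham).mean().round(3))
print("Monte Carlo estimate of E[cost]:", objective(x_mc).mean().round(3))
```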

 

Donald J. Wheeler, Consulting Statistician

Gauge R&R Studies and Four Classes of Process Monitors I & II

 

Abstract: The use of the Gage R&R Study, promoted by the Automotive Industry Action Group (AIAG), has been widespread. Unfortunately, there are serious problems with both the ratios computed and the guidelines used to interpret those ratios. Based on standard statistical theory, fixes for the incorrectly computed ratios are given and a new interpretative scale is presented and explained in practical, easy-to-understand terms.
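
Wheeler's corrected ratios and interpretative scale are not reproduced here. As background, the sketch below shows the standard two-way random-effects (parts by operators) variance-component computation from which both the AIAG ratios and any alternative summaries start; the simulated variance components and the variance-share summary at the end are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)
p, o, r = 10, 3, 2          # parts, operators, repeat measurements

# Simulated gauge study (toy variance components, not from a real study).
part = rng.normal(0, 2.0, size=p)[:, None, None]
oper = rng.normal(0, 0.5, size=o)[None, :, None]
inter = rng.normal(0, 0.3, size=(p, o))[:, :, None]
y = 50 + part + oper + inter + rng.normal(0, 0.4, size=(p, o, r))

# Two-way crossed ANOVA mean squares.
gm = y.mean()
mp_, mo, mpo = y.mean(axis=(1, 2)), y.mean(axis=(0, 2)), y.mean(axis=2)
ms_p = o * r * np.sum((mp_ - gm) ** 2) / (p - 1)
ms_o = p * r * np.sum((mo - gm) ** 2) / (o - 1)
ms_po = r * np.sum((mpo - mp_[:, None] - mo[None, :] + gm) ** 2) / ((p - 1) * (o - 1))
ms_e = np.sum((y - mpo[:, :, None]) ** 2) / (p * o * (r - 1))

# Method-of-moments variance components.
repeatability = ms_e
interaction = max((ms_po - ms_e) / r, 0)
reproducibility = max((ms_o - ms_po) / (p * r), 0) + interaction
part_to_part = max((ms_p - ms_po) / (o * r), 0)
grr = repeatability + reproducibility

print(f"repeatability={repeatability:.3f}  reproducibility={reproducibility:.3f}")
print(f"part-to-part={part_to_part:.3f}  R&R share of variance={grr/(grr+part_to_part):.2%}")
```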

 

Invited Sessions

Stuart Hunter, Princeton University

Reliability Analysis

 

Luis A. Escobar, Louisiana State University

Reliability: the other dimension of quality

 

Abstract: During the past twenty years, manufacturing industries have gone through a revolution in the use of statistical methods for product quality. Tools for process monitoring and, particularly, experimental design are much more commonly used today to maintain and improve product quality. A natural extension of the revolution in product quality is to turn the focus to product reliability, which is defined as quality over time. This has given rise to programs like Design for Six Sigma. This talk discusses the relationship between engineering quality and reliability and outlines the role of statistics and statisticians in the field of reliability. A brief introduction to the statistical tools used in engineering reliability is provided and some predictions for the future of statistics in engineering reliability are made.

 

Lingyan Ruan and Jye-Chyi Lu, Georgia Institute of Technology

Planning of Accelerated Degradation Tests Considering Robust Design of Manufacturing Quality Parameters

 

Abstract: Typical experimental designs for accelerated life or degradation tests assume that products come from the same manufacturing condition. In searching for the best combination of controllable manufacturing variables, product reliability is as important as the quality characteristic, especially for electronic or semiconductor devices. The literature on experimental designs considering both quality and reliability metrics is scarce. This presentation proposes a framework for designing accelerated degradation tests for selecting controllable manufacturing variables that lead to the longest product percentile lifetime, the smallest variance of lifetime estimates, and the least sensitivity to environmental noise factors that create variation in product reliability.

 


 

Dong Ho Park, Hallym University

A Bayesian approach to a software reliability growth model

 

Abstract: Unlike a hardware system, a software system does not undergo degradation as it operates. Instead, the software system stops operating due to faults latent in the system; thus, software reliability can be improved by removing the faults during the testing phase. Software reliability is defined as the probability of no failure occurrence during a certain length of mission period. The fault detection and debugging processes are essential to improving software reliability. In this talk, we discuss a new software reliability growth model, which extends the one by Kimura, M., Toyota, T. and Yamada, S. (1999), Economic Analysis of Software Release Problems with Warranty Cost and Reliability Requirement, Reliability Engineering & System Safety, vol. 66, pp. 49-55, and apply the Bayesian method to determine the optimal software release time while minimizing the expected total software cost. Under this growth model, we assume that the intensity function is a mixture of reliability growth and constant reliability after the software is released to the user at the end of the testing phase. To apply the Bayesian approach, we treat three parameters, the initial number of faults in the software, the fault detection rate, and the weighting factor, as random variables and assign appropriate prior distributions. Based on such an approach, we propose a Bayesian method to determine the best possible software release time and provide a comparison with the non-Bayesian method.

 

Timothy M. Young, The University of Tennessee

Using data mining tools of decision trees in quality and reliability applications: brief example on modern engineered wood

 

Abstract: We provide guidance and warnings for using decision trees (DT), an important data mining tool, in quality and reliability applications. A recently developed DT method called GUIDE (Generalized, Unbiased, Interaction Detection and Estimation) is discussed. GUIDE, modified with ANCOVA (analysis of covariance) modeling, is compared to multiple linear regression approaches for assessing and improving reliability. A small case study in the international manufacture of modern engineered wood products is presented to illustrate the usefulness of GUIDE and DT.

 


 

Invited Session

Robert Mee, The University of Tennessee

Bayesian Advances in Experimental Design and Analysis

 

Bradley Jones, SAS Institute

How Bayesian Thinking Can Help in Designing Experiments

 

Abstract: The suitability of any given experimental design depends on the complexity of the model that proves adequate to describe the system being studied. In factor screening studies, the researcher is not sure which, or even how many, of the possible factors are driving the responses of interest. Given this level of uncertainty, giving much consideration to model complexity may seem premature. Yet choosing a design without any thought for how one is going to fit the data one obtains is ill-advised. What to do?

The Bayesian paradigm provides a structure for designing experiments when there is a multiplicity of possible models to consider. This talk reviews some of the research applying Bayesian thinking to choosing designs. Topics include the initial choice of a screening design, augmentation of designs, and design for models that are nonlinear in the parameters.

 

Roshan Joseph Vengazhiyil, Georgia Institute of Technology

Design and Analysis of Experiments Using Functionally Induced Priors

 

Abstract: Specifying a prior distribution for the large number of parameters in the statistical model is a critical step in a Bayesian approach to the design and analysis of experiments. We show that the prior distribution can be induced from a functional prior on the underlying transfer function. The functionally induced prior requires the specification of only a few hyperparameters and therefore can be easily implemented in practice. The prior incorporates well-known principles such as effect hierarchy and effect heredity, which helps to resolve the aliasing problems in fractional designs almost automatically. The usefulness of the approach is demonstrated through the analysis of some experiments. We also propose a new class of design criteria and establish their connections with the minimum aberration criterion.

 


 

Robert Kessels, Catholic University of Leuven

Bradley Jones, SAS Institute

Hans Nyquist, University of Stockholm

Peter Goos, University of Antwerp

Martina Vandebroek, Catholic University of Leuven

Bayesian optimal design of choice experiments

 

Abstract: Choice experiments are widely used in marketing to measure how the attributes of a product or service jointly affect consumer preferences. In a choice experiment, a product or service is represented by a combination of attribute levels called a profile. Respondents then choose one from a group of profiles called a choice set. The study design is a specified number of choice sets submitted to each respondent. Their preferences provide the basis for estimating the importance of each attribute. The knack of designing an efficient choice experiment involves selecting the choice sets that result in high-quality estimates. Recently, Kessels, Goos and Vandebroek (2006) developed a way to produce Bayesian G- and V-optimal designs for the multinomial logit model. These designs allow for precise response predictions, which is the goal of choice experiments. The authors showed that the G- and V-optimality criteria outperform the D- and A-optimality criteria in terms of prediction capabilities. However, their G- and V-optimal design algorithm is computationally intensive, which is a barrier to its use in practice. In this talk, we compare the relative efficiencies of the designs created using various optimality criteria and introduce ways to speed up the calculation of the Bayesian G- and V-optimal designs.

 

Invited Session:

Joseph Voelkel, Rochester Institute of Technology

Measurement Studies

 

Joseph Voelkel, Rochester Institute of Technology

The Comparison of Two Measurement Devices

 

Abstract: Measurement devices sometimes have no reference standards with which they may be compared, and in these cases they are often compared to each other. This frequently occurs when a new type of device is built and is to be compared to the current best device. Frequently used methods of comparison include regression, correlation, or the so-called Bland-Altman plotting method. We review some of these, including any shortcomings they may have. We also compare our problem to the Gage R&R studies that are commonly performed in industry. Under standard assumptions, we illustrate that the problem is non-identifiable when each device can only make one measurement on each unit. In the case where multiple measurements can be made by each device, we show how the devices may be compared by a sequence of likelihood-ratio tests. An example based on two devices that are used to measure intra-ocular pressure of the human eye is used to illustrate the technique.

These methods and many of the results we present, while not new, do not appear to be commonly used.

 


 

 

Sarah Michalak, Michael Hamada, and Nicolas Hengartner, Los Alamos National Laboratory

A Bayesian Analysis of Interval-Censored Failure Time Data with Measurement Error

 

Abstract: Measurement error may lead to interval-censored failure data where the interval endpoints are not known exactly. We consider data with this characteristic that were collected during an experiment assessing the susceptibility of a memory device to soft errors resulting from cosmic-ray induced neutrons. (A soft error is a transient error, i.e., bit flip, that causes no permanent damage to the memory device.) We use a Weibull model and take a Bayesian approach to the analysis.
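
The soft-error experiment and its measurement-error structure are not reproduced here; the sketch below shows only the basic ingredients under assumed priors and toy inspection intervals: an interval-censored Weibull likelihood and a random-walk Metropolis sampler on the log parameters.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy interval-censored failure data: each failure is only known to lie in (lo, hi].
true_shape, true_scale = 1.8, 100.0
t = true_scale * rng.weibull(true_shape, size=40)
lo = np.floor(t / 10) * 10          # assumed 10-unit inspection intervals
hi = lo + 10

def weibull_cdf(x, shape, scale):
    return 1.0 - np.exp(-(x / scale) ** shape)

def log_post(shape, scale):
    """Log posterior: interval-censored likelihood plus weakly informative
    normal priors on the log parameters (an assumption for illustration)."""
    if shape <= 0 or scale <= 0:
        return -np.inf
    prob = weibull_cdf(hi, shape, scale) - weibull_cdf(lo, shape, scale)
    loglik = np.sum(np.log(np.clip(prob, 1e-300, None)))
    logprior = -0.5 * np.log(shape) ** 2 - 0.5 * ((np.log(scale) - 4.5) / 2) ** 2
    return loglik + logprior

# Random-walk Metropolis on (log shape, log scale).
theta = np.log([1.0, 80.0])
draws = []
for _ in range(20000):
    prop = theta + rng.normal(0, 0.05, size=2)
    if np.log(rng.uniform()) < log_post(*np.exp(prop)) - log_post(*np.exp(theta)):
        theta = prop
    draws.append(np.exp(theta))
draws = np.array(draws[5000:])       # discard burn-in
print("posterior means (shape, scale):", draws.mean(axis=0).round(2))
```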

 

Invited Session:

George Michailidis, University of Michigan

Statistics and Information Technology

 

Earl Lawrence, Los Alamos National Laboratory

Mixture Modeling with Spatial Components for Active Delay Tomography

 

Abstract: The field of active network tomography is concerned with the estimation of link-level performance measures, e.g., delay distributions for packets on a link, based on measured end-to-end performance of injected traffic, e.g., total path delay for a probe packet. One area of continuing research is the choice of appropriate distributional forms for estimating delay, as many standard parametric distributions are inappropriate. In particular, most parametric distributions are inadequate for modeling the tail behavior of delay distributions. This talk will explore the use of mixture modeling to overcome this limitation. Mixture models provide a flexible tool for capturing overall shape and tail behavior. Further, we will consider the modeling of spatial correlations in order to account for traffic similarity on neighboring links, a problem much ignored in the current literature. Applications to real and simulated data will be considered.

 

Bowei Xi, Purdue University

The Characteristics of Voice over IP Traffic

 

Voice over Internet Protocol (VoIP) is a new and fast-developing technology. Voice data, traditionally carried by the "public switched telephone network", are transmitted along with other applications on the IP network. An empirical study of VoIP data collected from the Global Crossing Network is presented. Several key factors that play a critical role in traffic engineering are examined: the multiplexed packet process, call arrivals and duration distributions, and silence suppression. They exhibit distinctly different characteristics from traditional telephone call traffic.

 


 

Natallia Katenka, University of Michigan

Local-Vote Decision Fusion for Target Detection in Wireless Sensor Networks

 

In this talk, we examine the problem of target detection by a wireless sensor network. Sensors acquire measurements emitted from the target that are corrupted by noise and initially make individual decisions about the presence or absence of the target. We propose the Local-Vote Decision Fusion algorithm, in which sensors first correct their decisions using the decisions of neighboring sensors, and then make a collective decision as a network. We show that, for a fixed system false alarm rate, this local correction achieves a significantly higher target detection rate. We examine both distance- and nearest-neighbor-based versions of the algorithm for grid and random sensor deployments. Further, an explicit formula that approximates the decision threshold for a given false alarm rate is derived using limit theorems for random fields. This is joint work with Liza Levina and George Michailidis.
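
A toy version of the fusion idea is easy to write down (this is an illustration under an assumed signal model, not the authors' algorithm or threshold formula): each sensor replaces its decision with the majority vote of its neighbors within a radius, and the network then makes a collective decision from the corrected votes.

```python
import numpy as np

rng = np.random.default_rng(5)

# Random deployment of sensors in the unit square; a target sits at (0.5, 0.5).
n = 400
pos = rng.uniform(0, 1, size=(n, 2))
target, present = np.array([0.5, 0.5]), True

# Each sensor measures a distance-attenuated signal plus noise and makes an
# individual threshold decision (toy signal model).
dist = np.linalg.norm(pos - target, axis=1)
signal = (2.0 / (0.2 + dist ** 2)) * present
decisions = (signal + rng.normal(0, 1.0, size=n)) > 2.0

# Local-vote correction: each sensor adopts the majority decision of its
# neighbors within radius r (itself included).
r = 0.1
corrected = np.empty(n, dtype=bool)
for i in range(n):
    nbr = np.linalg.norm(pos - pos[i], axis=1) <= r
    corrected[i] = decisions[nbr].mean() > 0.5

# Network-level decision: declare a target if enough corrected votes are positive
# (the 5% threshold is an assumption, not the paper's calibrated value).
threshold = 0.05 * n
print("raw positive decisions:      ", decisions.sum())
print("corrected positive decisions:", corrected.sum())
print("network declares target:", corrected.sum() > threshold)
```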

 

Contributed Session:

Quality and Process Control Enhancement II

 

Abhyuday Mandal, University of Georgia

Tirthankar Dasgupta, Georgia Institute of Technology

Estimation of Process Parameters to Determine the Optimum Diagnosis Interval for Control of Defective Items

 

The on-line quality monitoring procedure for attributes proposed by Taguchi has been critically studied and extended by a few researchers. Determination of the optimum diagnosis interval requires estimation of some parameters related to the process failure mechanism. Improper estimates of these parameters may lead to an incorrect choice of the diagnosis interval and consequently to huge economic penalties. In this paper, we highlight both the theoretical and practical problems associated with the estimation of these parameters, and propose a structured approach to solve them. For the so-called Case II model, two estimation methods, one based on a Bayesian procedure and the other on the EM algorithm, are developed and compared using extensive simulations. These two methods are demonstrated using a case study from a hot rolling mill. A Bayesian method is proposed for estimation of parameters in Case III. A systematic way to utilize available engineering knowledge in eliciting the prior for the parameters is also discussed.

 


 

Andrea Long, Integritaet

Statistical Event-History Problems for Differential Diagnosis in Clinical Medicine: Application to Apudoma-Caused Hypertension

 

Techniques including discriminant analysis and predictive assessments, such as maximum likelihood and Bayesian methods, have been applied to cross-sectional data on clinic patients to identify a differential diagnosis or to assess the reliability of a differential diagnosis when evaluating a new patient. As medical record automation and technologies for time-series measurements on individuals improve, statistical process control techniques and event-history modeling techniques superficially appear promising as aids to differential diagnosis in clinical settings. This paper identifies statistical problems associated with event-history biomedical data made available by advancing technologies. With the clinical objective of differential diagnosis in mind, the advantages and limitations associated with applying SPC and event-history modeling techniques to these data are enumerated. Methodological problems for SPC and event-history modeling are illustrated in the context of the differential diagnosis of apudoma among patients presenting with hypertension.

 

LeRoy A. Franklin and William E. Sarrell, Eli Lilly & Company

A Case Study of Batch Manufacturing of a Pharmaceutical Active Ingredient and the Associated Stability Assessment Program

 

The case study presented concerns an Active Pharmaceutical Ingredient (API) that was going off patent. Because of factors related to its going off patent, the last 4 lots manufactured had one of the measured parameters beyond the limits of what are called Critical Process Parameters. The details of why that happened and how these 4 lots were then placed into the "stability program" and monitored are the substance of the talk. The modeling, the statistical estimations, the human factors involvement, the presence of a federal agency monitoring the activity, and the management decisions rendered give an engaging account of the complex nature of many chemical manufacturing processes and the role the statistician can and often does play.

 


 

Scott Dickenson, Indiana State University

The Importance of Attribute Recognition in the Quality Analysis and Control Process at Multinational Automotive Parts Suppliers in North America

 

The changing structure of automotive market share in North America has given a decided nod to non-indigenous producers of automobiles such as the Toyota Motor Corporation, which now operates from newer factories and utilizes alternative approaches to manufacturing. As a consequence of this growth, there has been significant linear growth in the number of multinational parts suppliers supplying such factories in North America. These firms, the majority tracing their origins to Japan, initially relied on the significant and historic relationships developed in their home country to perpetuate their growth. Yet quality systems and structures that were effective in their home countries have not been universally successful, due to various attributes that were not understood when their North American strategy was initially developed. It is the impact and contribution of such attributes in the quality investigative process at such factories that is to be explored.

An actual quality failure that occurred during the launch of a new automotive component at a Japanese parts supplier in the Midwest is used to illustrate the importance of attribute consideration in the quality design and control structure. In this instance, general quality principles of control and statistical capability were insufficient for determining the true root cause of this failure, and it was only after all known quantitative methods of analysis were exhausted and non-quantifiable causes were considered that the true root cause, related to cultural interpretations, was revealed. This element was later factored into future failure analysis and design-related activity such as failure modes and effects analysis.

 

Plenary Session:       

Edward G. Schilling, Rochester Institute of Technology

Lessons From a Career in Quality

 

There is much to be learned from experience. This presentation follows a career in quality and reveals some of the practical lessons that come out of the experience. It will treat various aspects of real life application of statistical quality control.

 


 

Plenary Session:       

William Q. Meeker, Iowa State University

Using Simulation and Graphics as an Aid in Planning Complicated Experiments

Abstract: The combination of Monte Carlo simulation and graphics provides powerful tools for helping to plan complicated experiments. Although the ideas apply more generally, this talk will describe a collection of methods and procedures that have been developed for planning engineering reliability experiments. Such experiments include life tests, accelerated life tests, repeated measures degradation tests, and accelerated destructive degradation tests. The design of such reliability experiments typically requires answering questions about sample size, length of the test and, for accelerated tests, allocation of test units to different levels of the accelerating variable(s). Models for the data from such experiments must accommodate complications such as random effects, nonlinear estimation, and censoring. As such, standard experimental design tools need to be extended. I will describe methods that employ graphical displays for combinations of large-sample approximations for precision metrics and for the display of simulation results. Simulation will be shown to be a particularly versatile and valuable tool for providing insights into such complicated experimental design problems.
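
As a small illustration of the simulation idea (not the speaker's software or test plans), the sketch below evaluates one hypothetical life-test plan: it repeatedly simulates a Type I censored Weibull life test, fits the model by maximum likelihood, and summarizes the precision of an estimated lower percentile across repetitions. Sample size, censoring time, and true parameters are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(11)
true_shape, true_scale = 2.0, 150.0
n, censor_time, n_sim = 50, 120.0, 300     # candidate plan: 50 units, test stopped at t = 120

def neg_loglik(theta, t, failed):
    """Negative Weibull log-likelihood with right censoring; parameters on the log scale."""
    shape, scale = np.exp(theta)
    z = (t / scale) ** shape
    logf = np.log(shape / scale) + (shape - 1) * np.log(t / scale) - z
    logS = -z
    return -np.sum(np.where(failed, logf, logS))

q10 = []
for _ in range(n_sim):
    life = true_scale * rng.weibull(true_shape, size=n)
    t = np.minimum(life, censor_time)              # Type I (time) censoring
    failed = life <= censor_time
    fit = minimize(neg_loglik, x0=np.log([1.0, 100.0]), args=(t, failed), method="Nelder-Mead")
    shape_hat, scale_hat = np.exp(fit.x)
    q10.append(scale_hat * (-np.log(0.9)) ** (1 / shape_hat))   # estimated 10th percentile

q10 = np.array(q10)
true_q10 = true_scale * (-np.log(0.9)) ** (1 / true_shape)
print(f"true t_0.10 = {true_q10:.1f}, mean estimate = {q10.mean():.1f}, sd = {q10.std():.1f}")
```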

 

Invited Session:         

Halima Bensmail, The University of Tennessee

Bayesian Statistical Models in Pattern Recognition

 

James Wicker, The University of Tennessee

Regularized Mahalanobis Distance-Based Clustering with a Genetic Algorithm

 

Abstract: We propose a fast Genetic-Algorithm-based clustering method that can separate complex structures efficiently. Although it builds on the Genetic K-Means algorithm and the Hyperellipsoidal Clustering (HEC) algorithm, we demonstrate that it can separate more complex structures than the Genetic K-Means algorithm and converges faster than other HEC algorithms. Performance of the algorithm is tested on simulated and real data.

 

Invited Session:         

T.N. Goh, National University of Singapore

Statistics in Asia: Panel Discussion

 

Veronica Czitrom, Statistical Training and Consulting

T.N. Goh, National University of Singapore

Dennis Lin, National University of Singapore

Bovas Abraham, University of Waterloo

Ai-Chu Wu, Statistical Consulting

 

 

 

 

Invited Session:  

Randy Sitter

Technometrics Session

 

Mu Zhu, University of Waterloo

LAGO: A Computationally Efficient Approach for Statistical Detection

 

Abstract: We study a general class of statistical detection problems where the underlying objective is to detect items belonging to a rare class from a very large database. We propose a computationally efficient method to achieve this goal. Our method consists of two steps. In the first step, we estimate the density function of the rare class alone with an adaptive bandwidth kernel density estimator. The adaptive choice of the bandwidth is inspired by the ancient Chinese board game known today as Go. In the second step, we adjust this density locally depending on the density of the background class nearby. We show that the amount of adjustment needed in the second step is approximately equal to the adaptive bandwidth from the first step, which gives us additional computational savings. We name the resulting method LAGO, for "locally adjusted Go-kernel density estimator." We then apply LAGO to a real drug discovery data set and compare its performance with a number of existing and popular methods.

 

David Mease, San Jose State University

Derek Bingham, Simon Fraser University

Latin Hyper-Rectangle Sampling for Computer Experiments

 

Abstract: Latin hypercube sampling is a popular method for evaluating the expectation of functions in computer experiments. However, when the expectation of interest is taken with respect to a non-uniform distribution, the usual transformation to the probability space can cause relatively smooth functions to become extremely variable in areas of low probability. Consequently, the equal probability cells inherent in hypercube methods often tend to sample an insufficient proportion of the total points in these areas. In this talk we introduce Latin hyper-rectangle sampling to address this problem. Latin hyper-rectangle sampling is a generalization of Latin hypercube sampling which allows for non-equal cell probabilities. A number of examples are given illustrating the improvement of the proposed methodology over Latin hypercube sampling with respect to the variance of the resulting estimators. Extensions to orthogonal-array-based Latin hypercube sampling, stratified Latin hypercube sampling, and scrambled nets are also described.
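
The sketch below illustrates the setting the talk starts from, not Latin hyper-rectangle sampling itself: ordinary Latin hypercube sampling combined with the normal probability transform to estimate an expectation, compared with plain Monte Carlo. The integrand and input distribution are assumptions.

```python
import numpy as np
from scipy.stats import norm

def latin_hypercube(n, dim, rng):
    """Ordinary Latin hypercube sample in (0,1)^dim: one point per equal-probability stratum."""
    cols = [(rng.permutation(n) + rng.uniform(size=n)) / n for _ in range(dim)]
    return np.column_stack(cols)

def f(x):
    """Toy integrand; after the normal probability transform it varies most in the tails."""
    return np.exp(0.8 * x[:, 0]) + x[:, 1] ** 2

rng = np.random.default_rng(2)
n, reps = 100, 200
est_lhs, est_mc = [], []
for _ in range(reps):
    u = latin_hypercube(n, 2, rng)
    est_lhs.append(f(norm.ppf(u)).mean())          # expectation w.r.t. N(0,1) inputs
    est_mc.append(f(rng.normal(size=(n, 2))).mean())

print(f"LHS estimator variance:      {np.var(est_lhs):.5f}")
print(f"plain MC estimator variance: {np.var(est_mc):.5f}")
```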

 

 

 

 

 

 

Contributed Sessions:

Some Applications in Industrial Statistics

 

Joanne Wendelberger, Los Alamos National Laboratory

Investigation of Functional Data Analysis Techniques for Chemical Spectra

 

In many applications, the entity of interest is a function, rather than a single numerical value. Applications involving chemical spectra are introduced to illustrate statistical issues that arise in the analysis of functional data generated by chemical analysis instruments. Functional data analysis tools may be used to visualize and analyze the chemical data. Statistical issues such as quantification of variability, data visualization, dimension reduction, and data comparison are addressed.

 

Jorge Luis Romeu, Alion’s Systems Reliability Center and Syracuse University

Ways of Learning Engineering Statistics after Leaving College

Many engineering curricula, graduate and undergraduate, include at most one or two statistics courses. As a result, engineers do not learn all the statistics they need during their college years and have to acquire it after leaving school, as practicing engineers. They follow different paths in this quest, including readings (both hard copy and on the Internet), short courses, certification procedures, mentoring, and hands-on trial and error, among others. In this talk we will discuss the components and results of a survey (http://web.syr.edu/~jlromeu/Survey/COTS.html) on the ways that practicing engineers acquire statistical knowledge after leaving college (we will first give the survey to the audience, so as not to bias their responses). The author has submitted the survey to scores of practicing engineers as part of his research on this topic. This presentation is also part of the research Romeu is undertaking for an invited, refereed paper and presentation to be delivered at the International Conference on Teaching Statistics, ICOTS-7, in Brazil, in July 2006.


 

Tom Burr, David Beddingfield, and Stephen Tobin, Los Alamos National Laboratory

The Increasing Role of Computer Models in Nuclear Material Assay

Measurement methods for special nuclear material (SNM) are increasingly relying on computer models at various stages. After a brief overview, we describe two examples where the Monte Carlo N-Particle (MCNPX) code plays an important role in the assay. First, in stand-off measurement of SNM stored in drums in vault-type rooms, MCNPX estimates the neutron count rate at various detector locations as a function of the neutron source strength. This leads to MCNPX-based "predictors" in a weighted least squares regression, and commensurate "error in predictors" issues. Second, MCNP-based bias corrections are considered for a shuffler-assay, which is another type of neutron-based assay in which a physically small, intense active source is shuttered/shuffled back and forth from the item to be assayed, and the resulting delayed neutrons are counted while the active source is away from the assay item. The delayed neutron count rate per unit SNM source strength depends significantly on the neutron transport (energy moderation, absorption and scattering), which can be modeled using MCNP. Therefore, MCNP plays a key calibration role, with new challenges in uncertainty quantification.


 

Jin Dong and Wenjun Yin, IBM China Research Laboratory

Jia Chen, IBM China Research Laboratory and Chinese Academy of Sciences

Retail Trade Area Analysis and Market Potential Forecasting Based on the Statistics of Geographic and Demographic Data Evolvement

 

"Location is everything" is one of the classical principles for retail industry. In another word, how to know exactly the high potential trade areas in a region or a city and forecast their trends is significant to solve retail store or branch site selection problem, especially the trends of those trade areas need to be forecasted by category or sub-category of the merchandise, which is sold to the final customers through retail store or branch. Apparently the accurate and scientific solutions to the above problem will need a quantitative statistics analysis of the ocean geographic and demographic data in a city, of course with the information of competition branch or store included in the model as well.

In this paper, we introduce a large-scale data statistical analysis engine, iFAO (IBM Facility Analysis and Optimization Engine), developed at the IBM China Research Laboratory, for analyzing trade areas and their potential trends through the statistics of geographic and demographic data evolving over years in a city. First, the iFAO modules (iFAO-view, iFAO-modeler, iFAO-cluster and iFAO-miner) and their operating mechanisms are described. Then different statistical algorithms (e.g., multivariate regression analysis, time series analysis, and genetic algorithms) are incorporated into an experimental analysis based on a five-year set of geographic and demographic data, where regression analysis is used to establish a linkage between business potential and geographic and demographic information, and time series analysis is then applied for forecasting. Some experimental results are analyzed and demonstrate the business intuitions. Finally, we describe the implementation of the iFAO statistical analysis engine in one of the biggest commercial banks in China to improve its retail branch site location, where some practical data cleaning techniques were adopted to enhance iFAO's adaptability in practice.

 


 

Contributed Sessions:

Product Reliability and Development

 

Anupap Somboonsavatdee, University of Michigan

Graphical Estimators of Location and Scale from Probability Plots with Censored Data

 

Probability plots are popular graphical tools for assessing distributional assumptions in reliability and survival analysis. It is common, especially among reliability engineers, to fit a line to the probability plot to estimate the parameters of log location-scale distributions. The current version of Minitab uses this as the default estimation method. In this talk, we investigate the properties of this quick-and-easy method of estimation with censored data and compare it to maximum likelihood. Specifically, we consider estimators from two types of least squares lines fitted to the probability plots and obtain their asymptotic distributions. Small-sample behavior is studied through simulation for several common choices of failure and censoring distributions.
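
A toy version of the quick-and-easy estimator is sketched below under assumed conventions (Kaplan-Meier plotting positions and a least-squares line on the Weibull probability scale); the talk studies such estimators, and their comparison with maximum likelihood, far more carefully.

```python
import numpy as np

rng = np.random.default_rng(9)
true_shape, true_scale = 1.5, 200.0

# Right-censored Weibull sample (Type I censoring at t = 250, toy values).
life = true_scale * rng.weibull(true_shape, size=60)
censor_time = 250.0
t = np.minimum(life, censor_time)
failed = life <= censor_time

# Kaplan-Meier estimates at the failure times serve as plotting positions
# (one of several conventions used in practice).
order = np.argsort(t)
t_sorted, failed_sorted = t[order], failed[order]
n = len(t_sorted)
surv, times, F = 1.0, [], []
for i in range(n):
    if failed_sorted[i]:
        surv *= (n - i - 1) / (n - i)
        times.append(t_sorted[i])
        F.append(1.0 - surv)
times, F = np.array(times), np.array(F)
keep = F < 1.0                      # drop a final point at F = 1, if any
times, F = times[keep], F[keep]

# Least-squares line on the Weibull probability scale:
#   log(-log(1 - F)) = shape * log(t) - shape * log(scale)
x, y = np.log(times), np.log(-np.log(1.0 - F))
slope, intercept = np.polyfit(x, y, 1)
shape_hat, scale_hat = slope, np.exp(-intercept / slope)
print(f"probability-plot estimates: shape={shape_hat:.2f}, scale={scale_hat:.1f}")
print(f"true values:                shape={true_shape:.2f}, scale={true_scale:.1f}")
```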

 

Ellen Barnes, Ford Motor Company

Distilling a Minimal Set of Significant Characteristics from Functional Responses in Engineering Design: A Case Study

One common issue in industrial statistics is that the response of interest is a function rather than a univariate response. The key is to find a way of reducing the function to a minimal set of statistics, where each statistic has engineering significance. This paper illustrates an example of reducing the output from a new test characterizing air bag performance into a meaningful, minimal set of statistics. The process involved a piecewise nonlinear fit of the data. The first portion of the fit utilized a modification of the classic Weibull function, while the second portion involved a quarter of an ellipse. This paper covers the alternatives considered and the criteria used to select these two functions to model the output, and the methodology used to develop the new component specifications based on the test and the modeled results. The use of the new test and the associated specifications is reducing the "rework" loops in air bag development, thus streamlining product development.


 

James Williams, General Electric Global Research Center

Jeffrey Birch, Virginia Tech

William Woodall, Virginia Tech

Nancy Ferry, DuPont Crop Protection

Statistical Monitoring of Heteroscedastic Dose-Response Profiles from High-throughput Screening

In pharmaceutical drug discovery and agricultural crop product discovery, in vivo bioassay experiments are used to identify promising compounds for further research. The reproducibility and accuracy of the bioassay are crucial for correctly distinguishing between active and inactive compounds. In the case of agricultural product discovery, a replicated dose-response of commercial crop protection products is assayed and used to monitor test quality. The activity of these compounds on the test organisms (the weeds, insects, or fungi) is characterized by a dose-response curve measured from the bioassay. These curves are used to monitor the quality of the bioassays. If undesirable conditions in the bioassay arise, such as equipment failure or problems with the test organisms, then a bioassay monitoring procedure is needed to quickly detect such issues. In this paper we propose a nonlinear profile monitoring method to monitor the variability of multiple assays, the adequacy of the dose-response model chosen, and the estimated dose-response curves for aberrant cases in the presence of heteroscedasticity. We illustrate these methods with in vivo bioassay data collected over one year from DuPont Crop Protection.

Jave Pascual, Washington State University

Accelerated Life Test Planning with Independent Weibull Competing Risks with Known Shape Parameter

We present methodology for accelerated life test (ALT) planning when there are two or more failure modes or competing risks which are dependent on one accelerating factor. We assume that the failure modes have respective latent failure times, and the minimum of these times corresponds to the product lifetime. The latent failure times are assumed to be independently distributed Weibull with known common shape parameter. We present expressions for the Fisher information matrix and test plan criteria. We apply the methodology to ALT of Class-H insulation for motorettes where temperature is the accelerating factor. We also present two-level and 4:2:1 allocation test plans based on determinants and on estimating quantiles or hazard functions.


 

Invited Session:  

Robert Mee, The University of Tennessee

Role of Optimal Design for Quality Improvement in the 21st Century: Panel Discussion

 

Robert Mee, The University of Tennessee

Christopher Nachtsheim, University of Minnesota

G. Geoffrey Vining, Virginia Tech

Jeff Wu, Georgia Institute of Technology

Bradley Jones, SAS Institute Inc.

Dennis Lin, Pennsylvania State University

 

 

Invited Session:  

Wei Chen, Northwestern University

Computer Experiments and Model Validation in Engineering Applications

 

Max Morris, Iowa State University

Leslie Moore and Michael McKay, Los Alamos National Laboratory

Input Uncertainty and Potential-to-Validate: Sampling Plans for Monte Carlo Assessment

 

Abstract: In complex settings, validation of mechanistic computer models is often difficult because appropriate input values are not precisely known. Input uncertainty limits the degree to which models can be realistically validated. The most optimistic pre-assessment of model validity, or “potential-to-validate,” is closely associated with ideas and indices used in probabilistic sensitivity and uncertainty analysis. This talk will review a relevant nonparametric sampling-based approach to sensitivity/uncertainty analysis of computer models, and discuss recent work on input sampling plans that support the assessment of potential-to-validate.
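
The authors' specific indices and sampling plans are not reproduced here; as a generic illustration of sampling-based sensitivity analysis, the sketch below estimates a main-effect (correlation-ratio) index for each input of a toy computer model by binning a Monte Carlo sample.

```python
import numpy as np

rng = np.random.default_rng(4)

def model(x):
    """Stand-in for a computer model with three uncertain inputs (toy function)."""
    return np.sin(x[:, 0]) + 0.3 * x[:, 1] ** 2 + 0.05 * x[:, 2]

# Monte Carlo sample of the uncertain inputs (assumed uniform on [-pi, pi]).
n = 20000
x = rng.uniform(-np.pi, np.pi, size=(n, 3))
y = model(x)

def main_effect_index(xi, y, bins=30):
    """Correlation-ratio estimate of Var(E[Y|Xi]) / Var(Y) via binning."""
    edges = np.quantile(xi, np.linspace(0, 1, bins + 1))
    idx = np.clip(np.digitize(xi, edges[1:-1]), 0, bins - 1)
    cond_means = np.array([y[idx == b].mean() for b in range(bins)])
    weights = np.array([(idx == b).mean() for b in range(bins)])
    return np.sum(weights * (cond_means - y.mean()) ** 2) / y.var()

for j in range(3):
    print(f"input x{j}: main-effect index ~ {main_effect_index(x[:, j], y):.2f}")
```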

 


 

Roger Ghanem, University of Southern California

John Red-Horse, Sandia National Laboratories

Alireza Doostan, Johns Hopkins University

Error budget for the validation of physics-based predictive models

 

Abstract: The significant recent growth in computing resources has ushered in the new field of prediction science, where the objective has evolved from solving a system of governing equations to approximating reality. This new research program encompasses the emerging fields of model validation and uncertainty quantification.

Essentially, one wishes to rely on available knowledge to predict the future evolution of a design quantity in a useful manner. Available knowledge is typically in the form of physical laws and experimental evidence. Stochastic representations of both measurements and physical models present a possible avenue for combining the two pieces of knowledge in an integral manner. Benefits from such a representation include the ability to allocate resources, in a rational manner, between experimental and numerical efforts, as well as the ability to better mitigate risk in the design process. A more tangible benefit will be a reduction in the cost of production as reliance on full-scale tests is significantly shifted to confident reliance on predictive models. Another tangible benefit will be the ability to meet design criteria with preset confidence. Recent research on the Polynomial Chaos approach to stochastic computational mechanics has enabled us to develop the concept of an error budget that permits the rational allotment of prediction error to experimental, computational, statistical, and modeling sources. This error budget is the natural evolution of error estimation (originally developed for computational science) to the realm of prediction science. This talk will review the Polynomial Chaos Expansion (PCE) approach to the analysis of stochastic systems. Particular attention will be given to the assumptions underlying this approach as well as to the computational challenges facing its implementation. Methods and algorithms will be detailed for addressing both of these issues, thus opening the way to the application of PCE to very large scale engineering systems. The path from stochastic predictions to model validation will also be outlined.
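
As a minimal illustration of the Polynomial Chaos Expansion machinery mentioned above (a one-dimensional toy, not the error-budget framework), the sketch below projects a nonlinear response of a standard normal input onto probabilists' Hermite polynomials using Gauss quadrature and reads off the implied mean and variance.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from math import factorial, sqrt, pi

def g(x):
    """Toy response of a standard normal input (stands in for a physics model output)."""
    return np.exp(0.5 * x) + 0.2 * x ** 2

# Gauss quadrature for the weight exp(-x^2/2); the raw weights sum to sqrt(2*pi).
nodes, weights = hermegauss(40)
weights = weights / sqrt(2 * pi)          # so E[h(X)] ~ sum(weights * h(nodes)) for X ~ N(0,1)

# Project g onto the first P+1 probabilists' Hermite polynomials:
#   c_n = E[g(X) He_n(X)] / n!   since E[He_n(X)^2] = n!
P = 6
coef = []
for nn in range(P + 1):
    basis = np.zeros(P + 1)
    basis[nn] = 1.0
    coef.append(np.sum(weights * g(nodes) * hermeval(nodes, basis)) / factorial(nn))
coef = np.array(coef)

# PCE mean and variance follow directly from the coefficients.
pce_mean = coef[0]
pce_var = sum(factorial(nn) * coef[nn] ** 2 for nn in range(1, P + 1))

# Check against brute-force Monte Carlo.
x = np.random.default_rng(0).standard_normal(200000)
print(f"PCE  mean={pce_mean:.4f}  var={pce_var:.4f}")
print(f"MC   mean={g(x).mean():.4f}  var={g(x).var():.4f}")
```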

 


 

Wei Chen, Northwestern University

Using Computer Experiments and Understanding the Effect of Model Uncertainty in Engineering Design under Uncertainty

 

Abstract: The effectiveness of using Computer Aided Engineering (CAE) tools to support design decisions is often hindered by the enormous computational costs of complex analysis models, especially when uncertainty is considered. Approximations of analysis models, also known as “metamodels,” built upon computer experiments are widely used for design concept exploration and optimization. However, most existing approaches to metamodeling have been developed for deterministic optimization and are not applicable to design under uncertainty. In this talk, recent developments in using computer experiments and metamodels for engineering design under uncertainty are discussed. An efficient algorithm for constructing optimal designs of computer experiments and techniques for probabilistic sensitivity analysis (PSA) and uncertainty analysis (UA) via the use of metamodels are presented. We also present a methodology developed within a Bayesian framework for quantifying the impact of interpolation uncertainty due to the use of metamodels in robust design. The Bayesian prediction interval approach provides a simple, intuitively appealing tool for distinguishing the best design alternative and conducting more efficient computer experiments in robust design.

 

Invited Session:  

Jianjun Shi, University of Michigan

New Advancements in Variation Modeling, Analysis and Control for Complex Systems

 

Li Zeng and Shiyu Zhou, University of Wisconsin

Building Direct Influence Graph for Manufacturing Processes with Complex Topologies

 

Abstract: This paper presents an iterative model building methodology to identify the underlying interactions among operation units in complex manufacturing processes through the integration of advanced statistical techniques in graphical models and engineering insights into manufacturing processes. This technique lays a foundation for effective quality control of processes with complex topologies.

 

Haifeng Xia, Yu Ding, and Jyhwen Wang, Texas A&M University

Bayesian Spatial Model for Form Error Assessment using Multiple Coordinate Sensor Data

 

Abstract: We present a Bayesian spatial model for assessing form errors using coordinate sensor data. The ability to simultaneously characterize systematic and random form errors is essential to developing reliable conformance checking methods. The resulting Bayesian solution can provide better estimates of form errors and their uncertainty than conventional methods.

 

Yong Chen, University of Iowa

Sensor System Reliability Analysis for Manufacturing Variation Control

 

Abstract: Variation source identification using sensor systems is essential for achieving manufacturing quality and productivity improvement. Sensor failure may result in misdetections and false alarms, leading to inferior manufacturing quality and unexpected downtime. The objective of the research is to develop a systematic methodology for sensor system reliability analysis and optimization.

 

Invited Session:  

Russell Zaretzki, The University of Tennessee

Computational Techniques for Statistical Inference

 

William Briggs, Weill Cornell Medical College

Improvements to the ROC Curve: Skill Plots for Forecast Evaluation

 

Abstract: We start by reviewing the ROC curve, a standard method in the literature for evaluating a diagnosis or forecast. Next, the skill score and skill test of Briggs and Ruppert (2005) are introduced and the advantages of this new technique are discussed. With this background, we apply the skill score to simple discrimination problems with a single variable. In this context, we prove that the skill-maximizing decision rule for problems such as classifying patients with a disease coincides with Bayes Rule for optimal classification. This same separation rule is also indicated as optimal by ROC curve analysis. Finally, we address the question of inference for this optimal point, called x_max, and construct two types of confidence intervals. The first interval is a likelihood ratio interval based on inverting the skill test mentioned above. A second interval based on bootstrapping a logistic regression model is also introduced, and a small coverage study is performed to evaluate the precision of the estimated optimal cutoff.

 

Matthew Tom, Emmanuel College

A Test for Two Poisson Processes in the Presence of Background Events

 

Testing whether the means of two Poisson random variables have a fixed ratio lambda is a well-known and solved problem. The model has sufficient statistics and conditional inference is possible. If instead each Poisson mean is a mixture of a signal parameter and a known background noise parameter, then we lose sufficiency and testing the hypothesis that the signal parameters are equal becomes more difficult. In this talk, we will look at different exact tests we can use to compare signal parameters despite the background noise. As an example, we will look at an application from cosmic ray particle physics.

 


 

Russell Zaretzki, The University of Tennessee

A Parametric Bootstrap Likelihood Ratio Statistic for Time-Censored Data with Applications in Reliability

 

Abstract: Building on the work of Jeng, Lahiri and Meeker (2005), we consider bootstrap-based likelihood ratio inference for time-censored data. A simulation study based on data from a Weibull distribution under Type I censoring computes finite-sample coverage probabilities for bootstrap-based inference with a modified signed root statistic. The results are contrasted with the better-performing methods discussed by Jeng, Lahiri and Meeker, such as the ordinary bootstrap signed root statistic and the bootstrap-t. Heuristic explanations are given for why the modified bootstrap may outperform the ordinary bootstrap.

 

 

Contributed Session:    

Computer Experiment and Control Chart Related Applications

 

Zhiguang Qian, Georgia Institute of Technology

Yasue Amemiya, IBM Research

A Structural Equation Method for Modeling Data Center Thermal Distribution

Temperature management is key in designing and running a reliable data center, with large amounts of computer equipment operating constantly and generating heat. How different configurations affect the data center thermal distribution is largely unknown. This is because the physical thermal process is complex, depending on many factors, and detailed temperature measurements are not monitored in actual data centers. It is possible to build physics-based mathematical models, implemented in computer code, to study the air movement and temperature distribution mechanisms. A run of such a computer experiment under a given set of conditions takes several days, requiring the stabilization of the algorithm with a large number of reference points. Hence, the use of an efficient and informative experimental design is necessary. A statistical method based on latent variables is introduced for analyzing the multivariate temperature readings produced by the computer experiment, and for building a surrogate model to be used for prediction. A two-stage estimation procedure is developed for the proposed latent variable model by making use of sufficient statistics and ordinary least squares estimation. Also discussed is a method for obtaining practical choices of factor levels for a given set of physical and usage requirements.


 

Bianca Maria Colosimo, Politecnico di Milano

Massimo Pacella, Università degli Studi di Lecce

Quirico Semeraro, Politecnico di Milano

Quality Control of Geometric Features: Monitoring Roundness Profiles Obtained by Turning

 

Manufacturing processes leave on the machined surface a specific "fingerprint" of the process used, which can be usefully exploited to improve the quality control strategy (i.e., to reduce the time required to detect out-of-control conditions). Approaches proposed in the literature up to now are mainly devoted to monitoring simple signatures, where data measured on the profile are not autocorrelated. Unfortunately, data collected on a machined profile are most often autocorrelated, because they are obtained under similar machining conditions and because they are related to local properties of the machined material. This paper presents a novel method for monitoring two-dimensional profiles when the autocorrelation structure is modeled as part of the manufacturing signature. The proposed method combines a regression model with autocorrelated errors with a multivariate Hotelling T2 control chart, and it is applied to real process data in which the roundness of items obtained by turning has to be monitored. A simulation study indicates that the proposed approach outperforms a competing method (based on monitoring the out-of-roundness value for each profile) in terms of the average number of samples required to detect out-of-control conditions.
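A rough sketch of the flavor of such monitoring (not the authors' exact model, and with simulated profiles): each roundness profile is summarized by harmonic-regression coefficients plus the lag-1 autocorrelation of its residuals, and the resulting feature vectors are monitored with a Hotelling T2 statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
angles = np.linspace(0, 2 * np.pi, 64, endpoint=False)   # measurement angles

def profile_features(r, n_harmonics=2):
    """OLS fit of r(theta) to harmonics; append lag-1 autocorrelation of residuals."""
    X = [np.ones_like(angles)]
    for k in range(1, n_harmonics + 1):
        X += [np.cos(k * angles), np.sin(k * angles)]
    X = np.column_stack(X)
    beta, *_ = np.linalg.lstsq(X, r, rcond=None)
    e = r - X @ beta
    return np.append(beta, np.corrcoef(e[:-1], e[1:])[0, 1])

# Phase I: in-control profiles define the chart's center and covariance
phase1 = np.array([profile_features(10 + 0.05 * rng.standard_normal(64))
                   for _ in range(50)])
mu = phase1.mean(axis=0)
S_inv = np.linalg.inv(np.cov(phase1, rowvar=False))

def t2(features):
    d = features - mu
    return float(d @ S_inv @ d)

# Phase II: score a new profile that carries a 2-lobe out-of-roundness pattern
new_profile = 10 + 0.05 * rng.standard_normal(64) + 0.1 * np.cos(2 * angles)
print(f"T2 statistic for the new profile: {t2(profile_features(new_profile)):.2f}")
```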

J. Brooke Marshall, Virginia Tech

A Wavelet-Based Method for the Prospective Monitoring of Disease Incidence Counts in Space and Time

Abstract: Statistical process control (SPC) methodology is commonly used in the area of public-health surveillance. A specific application in the field of epidemiology is the use of cumulative sum (CUSUM) control charts to monitor disease occurrences in space and time. Using CUSUM charts in prospective monitoring allows for quicker detection of disease clusters so that preventative measures can be taken. Here we present a prospective method for monitoring a profile of disease occurrences in a geographical region. In this method, a surface of incidence counts is modeled over time in the region of interest. This surface is modeled using Poisson regression where the regressors are wavelet functions from the Haar wavelet basis. The surface is estimated each time new incidence data are obtained, using both past and current observations and weighting current observations more heavily. The flexibility of this method allows for the detection of several types of changes in the incidence surface through the use of CUSUM control charts.
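A simplified sketch of the modeling step (one-dimensional regions instead of a spatial map, and assumed counts and weights): incidence counts are regressed on Haar wavelet functions with a Poisson GLM, with older periods down-weighted exponentially.

```python
import numpy as np
import statsmodels.api as sm

def haar_basis(n):
    """Orthonormal Haar basis on n = 2^J cells: a constant column plus n-1 wavelets."""
    cols = [np.ones(n) / np.sqrt(n)]
    j = 1
    while j < n:
        block = n // j
        half = block // 2
        for k in range(j):
            w = np.zeros(n)
            w[k * block: k * block + half] = 1.0
            w[k * block + half: (k + 1) * block] = -1.0
            cols.append(w / np.sqrt(block))
        j *= 2
    return np.column_stack(cols)

rng = np.random.default_rng(3)
n_regions = 8
X = haar_basis(n_regions)[:, :4]                 # keep only the coarse terms

# Incidence counts for the last few time periods (rows = periods; assumed data)
counts = rng.poisson(5.0, size=(4, n_regions)).astype(float)

# Exponential weights: the current period (last row) counts most heavily
lam = 0.7
w_time = lam ** np.arange(counts.shape[0] - 1, -1, -1)

y = counts.ravel()
Xrep = np.tile(X, (counts.shape[0], 1))
weights = np.repeat(w_time, n_regions)

fit = sm.GLM(y, Xrep, family=sm.families.Poisson(), var_weights=weights).fit()
surface = np.exp(X @ fit.params)                 # estimated incidence surface
print(np.round(surface, 2))
```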


 

Plenary Speaker

Thomas Mason, Oak Ridge National Laboratory

The Spallation Neutron Source: Scientific Opportunities and Challenges in Data Analysis and Visualization

 

The Spallation Neutron Source will use an accelerator to produce the most intense beams of pulsed neutrons in the world when it is complete in June 2006. It will serve a diverse community of users with interests in condensed matter physics, chemistry, engineering materials, biology, and beyond. The combination of improved source intensity and a new generation of high performance scattering instruments will produce structural and dynamic information of greater quality, and in much greater volumes, than has previously been available. The scientific opportunities and the challenges posed by these new capabilities will be described together with the hardware and software underpinning the science.

 

Plenary Speaker

Way Kuo, The University of Tennessee

Issues Related to Reliability of Nanoelectronics

 

Nanoelectronics is a driving force for strong economic growth in the U.S., and some analysts predict that its impact will bring about the next industrial revolution. In the National Academies' 2005 Keck Futures Initiative publication, yield/reliability is cited as the key element in the success of nano fabrication and manufacturing. However, very little actual research and development has been conducted on yield/reliability assessment and improvement in nanoelectronics. One reason for this dearth of research is that reliability modeling in nanoelectronics is an interdisciplinary subject that heavily involves new physics phenomena and statistics. In nanoelectronics, feature sizes are so small that they may affect global behavior and cause system failure; consequently, reliability modeling must take into consideration both the time-dependent and the spatio-temporal evolution of nano defects. All these issues will be addressed in this talk.

 

Invited Session

Gwen Stimely, Minitab, Inc.

Empowering Non-Statisticians with Statistical Thinking and Statistical Tools

 

Angie Patterson, GE’s Global Research

What does it mean to be an “Empowering Statistician”?

 

Abstract: The expectation to empower non-statisticians with statistical thinking, methods and tools is enough to put statisticians outside of their comfort zone. After all... how do you "deliver" empowerment? And when/where did we (statisticians) get trained to do this? Through a case study at General Electric, we'll discuss a model for empowerment, the benefits, and the ongoing role of the statistician.

 

 

Bill Parr, University of Tennessee

Can Statisticians Be Effective in Educating Others in Statistical Thinking?

 

Abstract: Long ago, W. Edwards Deming cautioned us that the need was for large numbers of statistically literate managers, not for a radical increase in the numbers of people with advanced degrees in statistics. We have seen, in the last two decades, a major (if partial) move in this direction with Six Sigma. We examine trends in the actual activity of professional statisticians in industry and government and how these have been affected by the Six Sigma movement, and we look at recommendations on how to deal with resistance, how to effectively educate others in the use of statistical thinking, and what must change to raise the level of statistical thinking in industry and government.

 

Ronald Snee, Tunnell Consulting

Accepting Shewhart’s Challenge – Developing Statistically Minded Leaders

 

Abstract: Almost seven decades ago Walter Shewhart challenged us with the admonishment that “The long-range contribution of statistics depends not so much on getting a lot of highly trained statisticians into industry as it does in creating a statistically minded generation of physicists, chemists, engineers and others who will in any way have a hand in developing and directing the productive processes of tomorrow." The importance, benefits, and implementation of such action are the subject of this presentation. Attention is focused on how to develop statistically minded leaders. “Leader” is interpreted broadly to mean anyone who is working to improve how the organization runs its business. The needs of the customer – the leaders of our organizations – are addressed. Benefits and fears on the parts of non-statisticians and statisticians alike are discussed. It is argued that one effective method for enabling leaders to make greater use of statistical thinking is to promote the widespread use of the DMAIC approach – Define, Measure, Analyze, Improve, and Control – to process improvement and problem solving.

 

Invited Session

Jeroen de Mast, University of Amsterdam

Statistics in European Business and Industry

 

Fabrizio Ruggeri, CNR-IMATI

On the Reliability of Repairable Systems: Methods and Applications

 

Abstract: Repairable systems subject to minimal repair are those systems whose reliability is the same just before a failure and just after the corresponding repair. Failures of such systems are often described by means of non-homogeneous Poisson processes (NHPPs). We present some results and illustrate them with several case studies.
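For readers unfamiliar with the setup, the snippet below illustrates textbook material rather than the speaker's results: under minimal repair, failures follow an NHPP, and for the common power-law intensity lambda(t) = (beta/eta)(t/eta)^(beta-1) the maximum likelihood estimates from a time-truncated failure record have closed forms.

```python
import numpy as np

# Failure times of one repairable system, observed up to time T (illustrative data)
t = np.array([12.0, 45.0, 110.0, 190.0, 260.0, 340.0, 410.0])
T = 500.0

n = len(t)
beta_hat = n / np.sum(np.log(T / t))        # shape: > 1 suggests deterioration
eta_hat = T / n ** (1.0 / beta_hat)         # scale
m_T = (T / eta_hat) ** beta_hat             # fitted mean number of failures by T (= n)

print(f"beta_hat = {beta_hat:.3f}, eta_hat = {eta_hat:.2f}, fitted count by T = {m_T:.1f}")
```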

 


 

Jeroen de Mast, University of Amsterdam

Hypothesis generation in improvement projects

 

Abstract: In quality improvement projects — such as Six Sigma projects — an exploratory phase can be discerned, during which possible causes, influence factors or variation sources are identified. In a later, confirmatory phase the effects of these possible causes are experimentally verified. Whereas the confirmatory phase is well understood, in both the statistical sciences and philosophy of science, the exploratory phase is poorly understood. This paper aims to provide a framework for the type of reasoning in the exploratory phase by reviewing relevant theories in philosophy of science, artificial intelligence and medical diagnosis. Data-driven, explanation-driven (or abductive) and coherence-driven discovery will be discussed. Furthermore, the presentation provides a classification and description of approaches that could be followed for the identification of possible causes. Finally, the theory and practice of exploratory data analysis will be briefly reviewed.

 

Peter Goos and J.M. Lucus, Antwerp University

Optimal two-level split-plot designs

 

Abstract: Split-plot designs are very often used in industrial experimentation, both advertently and inadvertently. The present paper focuses on the optimal design of two-level split-plot designs for the estimation of main effects. In doing so, the attention is not restricted to regular split-plot designs, but non-regular split-plot designs with odd and heterogeneous whole plot sizes are studied too. Simple strategies for outperforming completely randomized designs in terms of A-, D-, G- and V-efficiency are presented, and conditions for optimally arranging the runs of two-level designs in split-plot designs with given numbers and (even and odd) sizes of whole plots are given.
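The comparisons described above rest on evaluating candidate designs under the split-plot error structure. The sketch below (an arbitrary 2^3 design, a single hard-to-change factor, and an assumed variance ratio, none of which come from the paper) computes the information matrix X' V^{-1} X with V = d Z Z' + I and the resulting D-criterion.

```python
import numpy as np
from itertools import product

# A 2^3 full factorial in three two-level factors; factor A is hard to change
runs = np.array(list(product([-1, 1], repeat=3)), dtype=float)
X = np.column_stack([np.ones(8), runs])          # main-effects model matrix

# Whole plots formed by the level of the hard-to-change factor A (column 0 of runs)
wp = (runs[:, 0] > 0).astype(int)
Z = np.zeros((8, 2))
Z[np.arange(8), wp] = 1.0

d = 1.0                                          # assumed ratio sigma_wp^2 / sigma^2
V = d * Z @ Z.T + np.eye(8)
M = X.T @ np.linalg.inv(V) @ X                   # information matrix under GLS

print(f"D-criterion det(M) = {np.linalg.det(M):.2f}")
```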

 


 

Invited Session

Karan Singh, University of North Texas

Statistical Methods for Analysis of Microarray Data

 

Al Bartolucci and David B. Allison, University of Alabama at Birmingham

Sejong Bae and Karan P. Singh, University of North Texas

Illustrating the Usefulness of a Mixture Model for Analysis of Microarray Gene Expression Data

 

Abstract: There is no doubt that the analysis of microarray data remains a challenge when one wishes to investigate the possibility of differentially expressed genes in a sample of thousands of such genes. Naturally, the issue of multiplicity arises as one examines the significance of large numbers of genes. Recently, one of the coauthors, DBA, and colleagues developed a mixture model approach to this very problem, with successful application to a mouse data model. In this setting, one circumvents the false-positive issue using a mixture distribution of the p-values. Simultaneously, one addresses several issues, such as 1) whether we have any statistically significant evidence in any of the genes, 2) what is the best estimate of the number of genes in which there is a true difference in gene expression, 3) whether there is a threshold above which genes should be investigated further, and 4) what is the possible proportion of false negatives among those genes declared “not interesting”?

This paper investigates this procedure further and illustrates its usefulness and relevance in the current work on microarray data analysis.
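As a toy illustration of the general idea (one simple uniform-plus-beta mixture fitted to simulated p-values by maximum likelihood, not necessarily the specific mixture model of the paper), the proportion of null genes and the implied number of truly expressed genes can be estimated as follows.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
m = 5000
# Simulated p-values: 90% null (uniform), 10% "expressed" (skewed toward zero)
p = np.concatenate([rng.uniform(size=int(0.9 * m)),
                    rng.beta(0.2, 1.0, size=m - int(0.9 * m))])

def negloglik(theta):
    """Mixture density: pi0 * Uniform(0,1) + (1 - pi0) * Beta(a, 1)."""
    pi0, a = theta
    if not (0 < pi0 < 1 and 0 < a < 1):
        return np.inf
    dens = pi0 + (1 - pi0) * a * p ** (a - 1)
    return -np.log(dens).sum()

res = minimize(negloglik, x0=(0.5, 0.5), method="Nelder-Mead")
pi0_hat, a_hat = res.x
print(f"estimated null proportion: {pi0_hat:.3f}")
print(f"estimated number of truly differentially expressed genes: {(1 - pi0_hat) * m:.0f}")
```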

 

Yoonkyung Lee, Ohio State University

A Sparse Solution Approach to Gene Selection for Cancer Diagnosis Using Microarray Data

Abstract: Cancer diagnosis or prognosis based on gene expression profiles has been studied as a potentially more accurate means of predicting disease status than standard methods based on histological observations. The presence of a much larger number of genes than the sample size poses a challenge in building reliable and interpretable classification schemes. This talk will present a sparse solution approach for simultaneous gene selection and classification via component penalization of Support Vector Machines. The proposed method selects relevant genes in a principled way by taking into account their joint effects, remedying the limitation of common approaches that filter genes marginally. A real data analysis will be given to illustrate the method, and related issues will be discussed.
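The general flavor of sparse penalization can be illustrated with an L1-penalized linear SVM, which drives most gene coefficients to zero; this is a generic stand-in, not the component penalization method of the talk, and the data below are simulated.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
n, n_genes = 60, 2000
X = rng.standard_normal((n, n_genes))
y = rng.integers(0, 2, size=n)
X[y == 1, :5] += 1.0                       # only the first 5 genes carry signal

# L1 penalty on the coefficients yields a sparse gene subset
clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])
print(f"{selected.size} genes selected:", selected[:10])
```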

 


 

David B. Dahl, Texas A&M University

Using Clustering to Enhance Hypothesis Testing

 

Abstract: Both multiple hypothesis testing and clustering have been the subjects of extensive research for genomic and other high dimensional data, yet they have traditionally been treated separately. We propose a hybrid statistical methodology that uses clustering information to increase testing sensitivity. A test for an object that uses data from all objects clustered with it will be more sensitive than one that uses data from this object in isolation. While the true clustering is unknown, there is increased power if the clustering can be estimated relatively well. We first consider a simplified setting which compares the power of the standard Z-test to the power of a test using an estimated cluster. Theoretical results show that if the cluster is estimated sufficiently well, the new procedure is more powerful. In the setting of gene expression data, we develop a model-based analysis using a carefully formulated conjugate Dirichlet process mixture model. The model is able to borrow strength from objects likely to be clustered. Simulations reveal this new method performs substantially better than its peers. The proposed model is illustrated on a large microarray dataset.
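The simplified setting lends itself to a short numerical illustration (with assumed data and a cluster treated as known rather than estimated via the Dirichlet process model): pooling replicates across the objects clustered with object 0 sharpens the Z-type test relative to using object 0 alone.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(11)
sigma, n_reps = 1.0, 4
cluster_means = rng.normal(0.8, 0.05, size=8)          # 8 objects with similar means
data = rng.normal(cluster_means[:, None], sigma, size=(8, n_reps))

# Standard Z-test for object 0 alone: H0 mu = 0
z_single = data[0].mean() / (sigma / np.sqrt(n_reps))

# Cluster-based test: pool all replicates of the (here assumed known) cluster
z_cluster = data.mean() / (sigma / np.sqrt(data.size))

print(f"p-value, object alone:   {2 * norm.sf(abs(z_single)):.4f}")
print(f"p-value, cluster pooled: {2 * norm.sf(abs(z_cluster)):.6f}")
```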

 

Arnold Saxton, Brynn Voy, and Michael Langston, University of Tennessee and Oak Ridge National Laboratory

Statistical Tools are Needed for Microarray Expression and Coexpression Information

 

Abstract: Microarray technology provides a measure of the activity of virtually all genes in an organism's genome, for example the estimated 25,000 genes in the mouse genome. Typical experiments produce millions of observations, and a complex statistical process has evolved to extract meaning from the data. The technology is noisy, with CVs of 100%, and correction for background noise, removal of outliers, and loess correction of scanned intensity readings are essential. Following this "normalization", standard statistical models can be used to identify treatment differences (differential expression), and correction for multiple testing (25,000 tests!) is clearly needed. We will briefly describe the statistical procedures we have developed for differential expression. We will then discuss our current interests in extracting coexpression information to study multivariate biological pathways activated in response to a treatment or condition. Note that one array measures simultaneous activities of ~22,000 genes, and several arrays then allow a 22,000 by 22,000 correlation matrix to be estimated. We have used graph algorithms to identify "cliques", groups of genes that are strongly and completely inter-correlated. These cliques then must be compared among experimental treatments. The many statistical problems that arise will be illustrated.
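A toy-scale sketch of the coexpression step (a few dozen simulated genes rather than ~22,000, and an arbitrary correlation threshold): estimate the gene-gene correlation matrix, threshold it into a graph, and enumerate cliques.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(4)
n_arrays, n_genes = 12, 50
expr = rng.standard_normal((n_arrays, n_genes))
expr[:, :5] += 3.0 * rng.standard_normal((n_arrays, 1))   # 5 co-regulated genes

corr = np.corrcoef(expr, rowvar=False)                     # gene-by-gene correlations
adj = (np.abs(corr) > 0.75) & ~np.eye(n_genes, dtype=bool)

G = nx.from_numpy_array(adj.astype(int))
cliques = [c for c in nx.find_cliques(G) if len(c) >= 4]   # maximal cliques, size >= 4
print("cliques of size >= 4:", cliques)
```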

 


 

Don Kulasiri, Lincoln University

A Review of Evolving Clustering Methods for Microarray Data Analysis

 

Abstract: Microarray data analysis involves various statistical and computational methods, including principal component analysis, k-means clustering, neural networks, and self-organising maps. We review these methods within the context of microarray data analysis. An application of evolving clustering methods based on neural networks is discussed.
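A small illustration of one of the methods mentioned (k-means clustering of gene expression profiles after a PCA reduction), on simulated data with assumed dimensions and cluster count:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(9)
expr = rng.standard_normal((300, 12))                    # 300 genes x 12 arrays
scores = PCA(n_components=3).fit_transform(expr)         # reduce to 3 components
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
print(np.bincount(labels))                               # cluster sizes
```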

 

Contributed Session:

Design of Experiment Techniques

 

Frederick K.H. Phoa, University of California

Two Applications of Using Quaternary Code to Nonregular Designs

The use of quaternary codes to derive nonregular designs was first studied by Xu and Wong (2005), who proposed a procedure to generate designs with generalized resolution 3.5 and a maximized number of columns. Here we present a simple rule that maximizes the number of columns of a design with generalized resolution 4.0 through the use of quaternary codes. Furthermore, an algorithm based on quaternary codes is derived to generate 2^(n-2) designs that have generalized minimum aberration among all possible designs with the same number of runs and factors, both regular and nonregular, and that have higher generalized resolution than regular designs. Examples of designs with 16, 32, 64, 128, 256, 512 and 1024 runs are presented. Further extensions of these two applications will be discussed.
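The basic ingredient in such constructions can be sketched briefly: a linear code over Z4 is turned into pairs of two-level columns via the Gray-type map 0 -> (+,+), 1 -> (+,-), 2 -> (-,-), 3 -> (-,+). The generator matrix below is an arbitrary illustration, not the maximizing rule or the minimum-aberration algorithm proposed in the talk.

```python
import numpy as np
from itertools import product

GRAY = {0: (1, 1), 1: (1, -1), 2: (-1, -1), 3: (-1, 1)}

# Generator matrix over Z4 (an arbitrary illustration)
G = np.array([[1, 0, 1, 2],
              [0, 1, 1, 3]])

# All Z4 linear combinations of the generator rows give the 4^2 = 16 codewords
codewords = [(np.array(u) @ G) % 4 for u in product(range(4), repeat=G.shape[0])]

# Gray-map each quaternary symbol to a pair of +/-1 columns: a two-level design
design = np.array([[v for s in cw for v in GRAY[int(s)]] for cw in codewords])
print(design.shape)   # (16, 8): 16 runs, 8 two-level columns
```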

R. Selvi, Guindy Anna University

Quality Enhancement in Car Wiper Motors Applying Shainin DOE Approach

In the quest for continuous quality improvement of both products and processes, experimental design based on Shainin's approach, a relatively recent one, plays a major role in the manufacturing community. This project presents a simple, step-by-step methodology for Shainin's component search and variable search methods, in terms of uncovering the key product variables or factors that influence a response or quality characteristic of interest. The objective of the project is to identify the Red X, Pink X, and Pale Pink X variables and to study the interactions among the key variables using the variable search method. To illustrate the potential of this methodology as a problem-solving tool, a case study is carried out on a car wiper motor at one of the leading auto-electrical parts manufacturers in Chennai. The study demonstrates that Shainin's quality control methodology is practical and easily executed in a variety of settings, making it among the most approachable of existing quality techniques.

 

 

 

Contributed Session:

Outliers Issues and Other Topics in Multivariate Methods

 

David Drain and Elizabeth A. Cudney, University of Missouri

A Statistician’s View of the Mahalanobis-Taguchi System

The Mahalanobis-Taguchi system (MTS) has been promoted as an engineering tool to detect outliers and also as a means to reduce the number of variables that must be measured to make a decision. Recent applications have been documented in a variety of industries, ranging from semiconductor processing to the automotive industry. This presentation begins with an overview of the method and an example of its application in the automotive industry. We then describe how an industrial statistician might approach the same problem and compare the results obtained from the two approaches. MTS has become widely known in the engineering community, and statisticians cannot afford to ignore this new wave of interest in quantitative engineering methodology. However, MTS has met with some controversy in the statistical community, where criticisms of the technique and its applications have been raised. We discuss some of these concerns and suggest more appropriate alternatives where we believe necessary.
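At the core of MTS is a scaled Mahalanobis distance computed from a reference group of normal units; the sketch below illustrates only that distance (with made-up data and dimensions), not the full MTS procedure with orthogonal arrays and signal-to-noise ratios for variable screening.

```python
import numpy as np

rng = np.random.default_rng(6)
normal_group = rng.standard_normal((100, 5))            # reference ("normal") units
mu = normal_group.mean(axis=0)
S_inv = np.linalg.inv(np.cov(normal_group, rowvar=False))

def md2(x):
    """Squared Mahalanobis distance scaled by the number of variables (MTS convention)."""
    d = x - mu
    return float(d @ S_inv @ d) / x.size

new_unit = np.array([2.5, -1.8, 0.3, 2.2, -2.0])
print(f"scaled MD^2: {md2(new_unit):.2f} (reference units average near 1)")
```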

Hua Fang, Gordon P. Brooks, Mario L. Rizzo, and Robert S. Barcikowski, Ohio University

An Empirical Power Analysis of Multilevel Linear Model under Three Covariance Structures in Longitudinal Data Analysis

This paper examines the empirical power of the multilevel linear model (MLM) under three covariance structures in longitudinal data analysis. The three covariance structures are random slope with homogeneous level-1 variance, unstructured, and first-order autoregressive. A stacked SAS macro (Fang, 2006) is written to generate standard hierarchical multivariate data and to compute power under each covariance structure. Power is examined by varying correlation, reliability, effect size, and the ratio of group sample size to number of time points. Bootstrap estimates of the fixed treatment effect are calculated under each covariance structure. Power patterns and bootstrap estimates under each covariance structure are compared through tables and figures. The conclusion discusses the importance of covariance specification in applying the MLM to longitudinal data analysis.


 

Kostas Triantis, Virginia Tech University

Finding Outliers in Multivariate Regression: Simply and Not So Simply

More and more, there is interest in fitting multivariate regression models in manufacturing and service-sector applications. Some of this interest has been in efficiency analysis, where two or three outputs are a function of several inputs. However, dealing with individual outliers or subsets of outliers in a multivariate regression setting has not been automated in statistical software. The intent of this research is to take a couple of examples from the literature and try several simple and not-so-simple approaches to identify subsets of outliers. Simple approaches might include univariate robust regression methods, multivariate chi-square probability plots on residuals (or robust versions of them), and principal component analysis or robust principal component analysis on X and Y appended together as Z. Not-so-simple approaches might include robust multivariate regression, bootstrapping, or a fuzzy clustering strategy that identifies subsets and then tests those subsets with multivariate regression outlier diagnostics and combinatorial analysis. Practical suggestions will be made in light of the findings, but an oxymoron exists here between simple and not so simple.
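One of the simple approaches listed above can be sketched briefly (simulated data, an arbitrary cutoff): fit the multivariate regression by ordinary least squares, compute Mahalanobis distances of the residual vectors, and flag observations whose distances exceed a chi-square quantile.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(8)
n, p, q = 60, 3, 2                                    # observations, inputs, outputs
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])
B = rng.standard_normal((p + 1, q))
Y = X @ B + 0.5 * rng.standard_normal((n, q))
Y[0] += 4.0                                           # plant one outlier

B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
E = Y - X @ B_hat                                     # residual matrix (n x q)
S_inv = np.linalg.inv(np.cov(E, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", E, S_inv, E)            # squared Mahalanobis distances

flag = np.flatnonzero(d2 > chi2.ppf(0.99, df=q))      # compare with chi-square quantile
print("candidate outliers (rows):", flag)
```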

Xinwei Deng, Georgia Institute of Technology

Principal Directions for Anomaly Detection

We propose a novel method to detect outliers among multivariate observations. The proposed method chooses a new coordinate system for the data set such that the greatest departure of potential outliers from the others under any projection of the data set comes to lie on the first axis (then called the first principal direction for anomaly, PDA), the second greatest on the second axis, and so on. The method can be naturally extended to account for heterogeneity among observations. We also study the theoretical properties of the new method. Simulations and a real example in financial services demonstrate the competitive performance of our method when compared with other popular techniques.