### Advancing Clarity and Scale in Statistical Computing

Organizer: George Ostrouchov, Oak Ridge National Laboratory

Session Chair: John Lu, NIST

**Statistical Algorithms for Polyenergetic Breast Tomosynthesis Image Reconstruction**

Julianne Chung

University of Maryland

Digital tomosynthesis imaging is becoming increasingly significant in a variety of medical imaging applications. Tomosynthesis imaging involves the acquisition of a series of projection images over a limited angular range, and reconstruction results in a pseudo-3D representation of the imaged object. In breast cancer imaging, tomosynthesis is a viable alternative to standard mammography; however, current algorithms for image reconstruction do not take into account the polyenergetic nature of the x-ray source beam entering the object. This results in inaccuracies in the reconstruction, making quantitative analysis challenging and introducing beam hardening artifacts. We develop a mathematical framework based on a polyenergetic model and derive statistically based iterative methods for polyenergetic tomosynthesis reconstruction for breast imaging. Large-scale problems pose significant computational challenges, and implementation concerns are discussed.
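The beam hardening effect the abstract refers to can be illustrated with a minimal sketch (not the authors' method): when a beam with a discrete spectrum passes through increasing thicknesses of material, the low-energy photons are absorbed preferentially, so the effective attenuation coefficient inferred from a monoenergetic model decreases with thickness. All energies, weights, and coefficients below are hypothetical values chosen for demonstration.

```python
import numpy as np

# Hypothetical two-bin x-ray spectrum (values made up for illustration).
energies = np.array([20.0, 40.0])   # keV, for reference only
weights = np.array([0.6, 0.4])      # spectral weights, sum to 1
mu = np.array([0.8, 0.3])           # attenuation per cm at each energy

def transmitted_intensity(thickness_cm):
    """Polyenergetic Beer-Lambert law: sum attenuation over spectral bins."""
    return float(np.sum(weights * np.exp(-mu * thickness_cm)))

def effective_mu(thickness_cm):
    """Attenuation coefficient a monoenergetic model would infer."""
    return -np.log(transmitted_intensity(thickness_cm)) / thickness_cm

# Beam hardening: the effective coefficient drops as thickness grows,
# which a monoenergetic reconstruction model cannot capture.
print(effective_mu(0.1), effective_mu(5.0))
```

A polyenergetic reconstruction builds this spectrum-weighted forward model into the fitting, rather than assuming a single effective coefficient.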

**Detecting Tiny Signals in Massive Data from High-Energy Physics**

Karen Kafadar

Indiana University

Experiments in high-energy physics provide terabytes of data, from which critical information about the state of matter, governed by the theory outlined in the "Standard Model", must be extracted. Opportunities abound for increased efficiencies in approaches to the data, from the design of experiments, to the collection of data, and finally to analysis and inference. Because the data are massive and come from various sources (different experiments from different collaborations, experiment-based simulations, etc.), new ways of analyzing the data to answer questions of interest must be devised. This talk describes the framework for these experiments and illustrates methods for analyzing massive data sets from such experiments (with some mention of data sets from genomics and the Internet). This is joint work with Robert L. Jacobsen from the University of California, Berkeley.
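A basic building block behind "tiny signal in massive data" searches is the counting experiment: compare an observed event count against the background expectation and ask how often background alone would fluctuate that high. The sketch below is a generic illustration with hypothetical numbers, not an analysis from the talk.

```python
import math
from statistics import NormalDist

def poisson_tail(n_obs, b):
    """P(N >= n_obs) for N ~ Poisson(b): probability that background
    alone produces the observed count or more."""
    cdf = sum(math.exp(-b) * b**k / math.factorial(k) for k in range(n_obs))
    return 1.0 - cdf

def z_score(p):
    """Convert a one-sided p-value into the equivalent Gaussian sigma."""
    return NormalDist().inv_cdf(1.0 - p)

b_expected = 100.0   # hypothetical background expectation
n_observed = 135     # hypothetical observed count
p = poisson_tail(n_observed, b_expected)
print(f"p-value = {p:.2e}, significance = {z_score(p):.2f} sigma")
```

Real searches add many complications (systematic uncertainties on the background, trial factors from scanning many channels), but the significance calculation above is the core quantity being reported.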

**Data-Parallel Statistical Computing: A Model-Based Clustering Example**

George Ostrouchov

Oak Ridge National Laboratory

Recent changes in computer hardware design are forcing a data-parallel approach in new algorithm development. To receive full benefit from future hardware improvements, statistical algorithms and software need to be designed with data-parallel scalability. Scalable algorithms can take advantage of today's large, highly parallel computing resources to tackle terabyte data sets. An example of a data-parallel algorithm that was developed for model-based clustering will be discussed. The initial implementation is for k-means clustering and uses multiple random starts for determining an appropriate k. Considerations of parallel I/O, the data reader, the placement of data and of intermediate results, and the possible need for interactivity will also be discussed.
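The data-parallel pattern for k-means can be sketched as follows (an assumed structure, not the implementation from the talk): each worker holds a chunk of the data, computes per-cluster partial sums and counts locally, and a reduction combines the partial results before the centroid update. Here the workers are simulated by looping over chunks in one process; a real implementation would distribute the chunk loop via MPI or similar.

```python
import numpy as np

def kmeans_dataparallel(chunks, k, n_iter=20, seed=0):
    """One k-means run over data split into chunks (one chunk per 'worker')."""
    rng = np.random.default_rng(seed)
    centers = chunks[0][rng.choice(len(chunks[0]), size=k, replace=False)].copy()
    for _ in range(n_iter):
        sums, counts = np.zeros_like(centers), np.zeros(k)
        for chunk in chunks:  # embarrassingly parallel: local partial results
            labels = np.linalg.norm(
                chunk[:, None, :] - centers[None], axis=2).argmin(axis=1)
            for j in range(k):
                mask = labels == j
                sums[j] += chunk[mask].sum(axis=0)
                counts[j] += mask.sum()
        # Reduction: combine partial sums/counts, then update centroids.
        nonzero = counts > 0
        centers[nonzero] = sums[nonzero] / counts[nonzero, None]
    return centers

def best_of_starts(chunks, k, n_starts=5):
    """Multiple random starts; keep the run with lowest within-cluster SS."""
    best, best_cost = None, np.inf
    for s in range(n_starts):
        centers = kmeans_dataparallel(chunks, k, seed=s)
        cost = sum(
            (np.linalg.norm(c[:, None, :] - centers[None], axis=2)
             .min(axis=1) ** 2).sum()
            for c in chunks)
        if cost < best_cost:
            best, best_cost = centers, cost
    return best, best_cost
```

Only the small per-cluster sums and counts cross worker boundaries each iteration, so communication cost is independent of the data size, which is what makes the pattern scale to terabyte data sets.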