Advancing Clarity and Scale in Statistical Computing
Organizer: George Ostrouchov, Oak Ridge National Laboratory
Session Chair: John Lu, NIST
Statistical Algorithms for Polyenergetic Breast Tomosynthesis Image Reconstruction
Julianne Chung
University of Maryland
Digital tomosynthesis imaging is becoming increasingly significant in a variety of medical imaging applications. Tomosynthesis imaging involves the acquisition of a series of projection images over a limited angular range, and reconstruction results in a pseudo-3D representation of the imaged object. In breast cancer imaging, tomosynthesis is a viable alternative to standard mammography; however, current algorithms for image reconstruction do not take into account the polyenergetic nature of the x-ray source beam entering the object. This results in inaccuracies in the reconstruction, making quantitative analysis challenging and introducing beam hardening artifacts. We develop a mathematical framework based on a polyenergetic model, along with statistically based iterative methods for polyenergetic tomosynthesis reconstruction in breast imaging. Large-scale problems pose significant computational challenges, and implementation concerns are discussed.
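To illustrate the polyenergetic point, the following sketch contrasts the standard Beer-Lambert attenuation law at a single energy with a source spectrum averaged over many energies. The names (`A`, `mu`, `spectrum`) and the random problem sizes are illustrative assumptions, not the speaker's model or data.

```python
import numpy as np

# Illustrative sketch (not the talk's implementation): the detector
# measurement under a polyenergetic source is a spectrum-weighted sum of
# monoenergetic Beer-Lambert attenuations. A monoenergetic reconstruction
# model ignores this averaging, which is what produces beam hardening.

rng = np.random.default_rng(0)

n_pixels, n_rays, n_energies = 50, 40, 8
A = rng.random((n_rays, n_pixels))             # hypothetical projection matrix
mu = rng.random((n_pixels, n_energies)) * 0.1  # attenuation per pixel, energy
spectrum = rng.random(n_energies)
spectrum /= spectrum.sum()                     # normalized source spectrum

def polyenergetic_projection(A, mu, spectrum):
    """Expected detector intensity for a polyenergetic x-ray source."""
    line_integrals = A @ mu                    # shape (n_rays, n_energies)
    return np.exp(-line_integrals) @ spectrum  # weight exponentials by spectrum

y = polyenergetic_projection(A, mu, spectrum)
```

A monoenergetic model would instead apply the exponential to a single averaged attenuation; by Jensen's inequality the two disagree, and iterative methods built on the polyenergetic forward model avoid that systematic bias.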
Detecting Tiny Signals in Massive Data from High-Energy Physics
Karen Kafadar
Indiana University
Experiments in high-energy physics provide terabytes of data, from which critical information about the state of matter, governed by the theory outlined in the "Standard Model", must be extracted. Opportunities abound for increased efficiencies in approaches to the data, from the design of experiments, to the collection of data, and finally to analysis and inference. Because the data are massive and come from various sources (different experiments from different collaborations, experiment-based simulations, etc.), new ways of analyzing them to answer questions of interest must be devised. This talk describes the framework for these experiments and illustrates methods for analyzing massive data sets from such experiments (with some mention of data sets from genomics and the Internet). This is joint work with Robert L. Jacobsen from the University of California, Berkeley.
Data-Parallel Statistical Computing: A Model-Based Clustering Example
George Ostrouchov
Oak Ridge National Laboratory
Recent changes in computer hardware design are forcing a data-parallel approach in new algorithm development. To receive full benefit from future hardware improvements, statistical algorithms and software need to be designed with data-parallel scalability. Scalable algorithms can take advantage of today's large highly parallel computing resources to tackle terabyte data sets. An example of a data-parallel algorithm developed for model-based clustering will be discussed. The initial implementation is for k-means clustering and uses multiple random starts to determine an appropriate k. Considerations of parallel I/O, the data reader, the location of data and intermediate results, and the possible need for interactivity will also be discussed.