Joint Research Conference

June 24-26, 2014

Regularized Regression and Variable Selection in Genomic and Financial Health Prediction

Abstract:

Classical linear regression requires that the number of rows N exceed the number of columns M in the matrix of explanatory variables. We discuss situations from genomics and financial analysis where M > N or M >> N. Two approaches for selecting the most informative regressors among the candidate variables are compared, in both linear regression models and quadratic multivariate Taylor expansion models. We find that the differential stage-wise regression algorithm and the Lasso algorithm are both good tools for effective variable selection in regression and classification problems. The stability of the two algorithms is also examined. The selected variables are then used in classification models for predicting biological and financial health in two real problems with actual data. Cross-validation is used to compare the performance of the classification models and classification algorithms, including artificial neural networks, support vector machines, and the Fisher linear classifier. Using two very distant application areas as examples, we demonstrate the general applicability of both regularization techniques and of the investigated classification models.
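As an illustration of variable selection when M > N, the following is a minimal sketch of the Lasso solved by cyclic coordinate descent on synthetic data. It is not the implementation used in this work: the solver choice, the function names `soft_threshold` and `lasso_cd`, the penalty value `lam`, and the simulated data (N = 20 rows, M = 50 columns, two informative columns) are all assumptions made for the example. The objective minimized is (1/(2N))·||y − Xβ||² + lam·||β||₁.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1.

    X is a list of n rows, each a list of m floats; y is a list of n floats.
    Returns the coefficient list b of length m.
    """
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    resid = list(y)  # residual y - X beta; equals y while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(m)]
    for _ in range(n_sweeps):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Synthetic example (hypothetical data): more columns than rows.
random.seed(0)
N, M = 20, 50
X = [[random.gauss(0.0, 1.0) for _ in range(M)] for _ in range(N)]
# Only columns 3 and 17 influence the response.
y = [3.0 * row[3] - 2.0 * row[17] + random.gauss(0.0, 0.1) for row in X]

beta = lasso_cd(X, y, lam=0.2)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("columns with non-zero coefficients:", selected)
```

Increasing `lam` drives more coefficients to exactly zero, which is what makes the Lasso usable as a selection step before classification. In practice the penalty would be chosen by cross-validation rather than fixed by hand as it is here.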