Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'04)

Abstract

In this paper, we propose combining logistic regression with a genetic algorithm for association studies of binary disease traits. We use a logistic regression model to describe the relationship between multiple SNPs, environmental factors, and the target binary trait. The logistic regression model can capture the continuous effects of environmental factors directly, without the categorization that would cause a loss of information. To construct an accurate prediction rule for the binary trait, we adopt the Akaike information criterion (AIC) to find the most effective set of SNPs and environmental factors: the set that gives the smallest AIC is chosen as the optimal set. Since the number of combinations of SNPs and environmental factors is usually huge, we propose using a genetic algorithm to search for the set that is optimal in the sense of AIC. We show the effectiveness of the proposed method through an analysis of case/control populations of diabetes patients. We succeeded in finding an efficient set for predicting types of diabetes, as well as some SNPs that interact strongly with age even though they are not significant as single loci.
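
The selection scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the column names, the genotype coding, the gradient-ascent fit (which only approximates the maximum likelihood estimate), and every genetic algorithm setting (population size, tournament selection, uniform crossover, mutation rate) are assumptions made for illustration only. Each candidate subset is encoded as a bit mask, scored by AIC = 2k - 2 log L, and the mask with the smallest AIC is kept.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(X, y, iters=100, lr=1.0):
    """Approximate the maximized log-likelihood by batch gradient ascent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            err = yi - sigmoid(sum(a * b for a, b in zip(w, xi)))
            for j in range(p):
                grad[j] += err * xi[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    ll = 0.0
    for xi, yi in zip(X, y):
        pr = sigmoid(sum(a * b for a, b in zip(w, xi)))
        pr = min(max(pr, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += math.log(pr) if yi else math.log(1.0 - pr)
    return ll

def aic(mask, columns, y):
    """AIC = 2k - 2 log L for the model using the columns flagged in mask."""
    chosen = [c for c, bit in zip(columns, mask) if bit]
    X = [[1.0] + [c[i] for c in chosen] for i in range(len(y))]  # intercept first
    return 2 * (len(chosen) + 1) - 2 * log_likelihood(X, y)

def ga_select(columns, y, pop_size=8, generations=6, p_mut=0.1, seed=0):
    """Search bit masks over the candidate columns for the smallest AIC."""
    rng = random.Random(seed)
    m = len(columns)
    cache = {}

    def score(mask):
        if mask not in cache:  # each subset is fitted at most once
            cache[mask] = aic(mask, columns, y)
        return cache[mask]

    # Start from the intercept-only model plus random subsets.
    pop = [tuple([0] * m)]
    pop += [tuple(rng.randint(0, 1) for _ in range(m)) for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = [min(pop, key=score)]  # elitism: the best mask survives
        while len(children) < pop_size:
            pa = min(rng.sample(pop, 2), key=score)  # tournament of size 2
            pb = min(rng.sample(pop, 2), key=score)
            child = tuple(a if rng.random() < 0.5 else b for a, b in zip(pa, pb))
            child = tuple(bit ^ 1 if rng.random() < p_mut else bit for bit in child)
            children.append(child)
        pop = children
    best = min(pop, key=score)
    return best, score(best)

# Synthetic data (hypothetical): three SNPs, a continuous age, one interaction.
data_rng = random.Random(1)
n = 120
age = [data_rng.gauss(0, 1) for _ in range(n)]  # kept continuous, not binned
snps = [[data_rng.choice([-1, 0, 1]) for _ in range(n)] for _ in range(3)]
inter = [s * a for s, a in zip(snps[0], age)]  # snp1 x age interaction term
columns = snps + [age, inter]
names = ["snp1", "snp2", "snp3", "age", "snp1*age"]
y = [1 if data_rng.random() < sigmoid(1.5 * inter[k] + 1.0 * snps[1][k]) else 0
     for k in range(n)]

best_mask, best_aic = ga_select(columns, y)
print([nm for nm, bit in zip(names, best_mask) if bit], round(best_aic, 2))
```

With only five candidate columns an exhaustive search over all 32 subsets would be just as easy; the genetic algorithm becomes worthwhile when the number of SNPs and environmental factors makes enumeration infeasible, which is the situation the abstract describes.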