Completed on 10 Nov 2014 by Seunghak Lee.
This paper presents a method for detecting linear and non-linear genetic effects on phenotypic traits using the lasso. The proposed method is a two-step procedure. In the first step, it detects only main effects using the lasso; in the second step, pairwise interaction terms between the genetic variants with main effects found in the first step are added to the first-step model. As a result, one can detect genetic variants with main effects as well as pairwise interaction effects on the trait of interest.
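The two-step procedure summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`lasso_cd`, `two_step_lasso`), the coordinate-descent solver, the penalty values, the choice to keep all original columns in the second step, and the simulated data are all assumptions made for the example.

```python
import random

def soft_threshold(a, lam):
    """Soft-thresholding operator: sign(a) * max(|a| - lam, 0)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/(2n))||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.
    No intercept is fitted (the toy data below have mean close to zero)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # covariance of column j with the partial residual (column j excluded)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def two_step_lasso(X, y, lam_main, lam_inter):
    """Step 1: lasso on main effects. Step 2: lasso on main effects plus
    pairwise products of the variants selected in step 1."""
    p = len(X[0])
    beta_main = lasso_cd(X, y, lam_main)
    selected = [j for j, b in enumerate(beta_main) if b != 0.0]
    pairs = [(a, b) for k, a in enumerate(selected) for b in selected[k + 1:]]
    # Assumption: the step-2 design keeps every original column.
    X2 = [row + [row[a] * row[b] for a, b in pairs] for row in X]
    beta_full = lasso_cd(X2, y, lam_inter)
    interactions = [pairs[k] for k in range(len(pairs)) if beta_full[p + k] != 0.0]
    return selected, interactions

# Toy data (hypothetical): variants 0 and 1 have main effects and interact.
random.seed(0)
n, p = 200, 10
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2 * r[0] + 2 * r[1] + 1.5 * r[0] * r[1] + random.gauss(0, 0.3 ** 0.5) for r in X]

selected, interactions = two_step_lasso(X, y, lam_main=0.5, lam_inter=0.1)
print("main effects:", selected)
print("interactions:", interactions)
```

Because only pairs of first-step variants are tested, the number of candidate interactions grows with the square of the number of selected variants rather than with the square of the total number of variants. The cost is that an interaction between variants without detectable main effects cannot be found.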
Zhao, P., and Yu, B. "On model selection consistency of Lasso." The Journal of Machine Learning Research 7 (2006): 2541-2563.
Zou, H. "The adaptive lasso and its oracle properties." Journal of the American Statistical Association 101.476 (2006): 1418-1429.
Meinshausen, N., and Yu, B. "Lasso-type recovery of sparse representations for high-dimensional data." The Annals of Statistics (2009): 246-270.
Park, M., and Hastie, T. Regularization path algorithms for detecting gene interactions. Department of Statistics, Stanford University, 2006.
Wu, J., et al. "Screen and clean: a tool for identifying interactions in genome-wide association studies." Genetic Epidemiology 34.3 (2010): 275-285.
Lee, S., and Xing, E. P. "Leveraging input and output structures for joint mapping of epistatic and marginal eQTLs." Bioinformatics 28.12 (2012): i137-i146.
To motivate the models in Eq. 2.1 and 2.2, please add either references or a biological motivation. Eq. 2.3 also needs a reference.
It would be useful to specify the lasso model used in the paper and how the resulting optimization problem is solved (i.e., the algorithm used in the experiments).
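For reference, the kind of specification requested here could look like the standard formulation below. The notation (n samples, p variants, design matrix X, trait vector y, penalty λ) and the choice of cyclic coordinate descent as the solver are assumptions; the paper may use a different parameterization or algorithm.

```latex
% Standard lasso objective (assumed parameterization)
\hat{\beta}(\lambda) = \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1

% Cyclic coordinate descent update for coordinate j
\beta_j \leftarrow \frac{S(\rho_j, \lambda)}{\frac{1}{n}\sum_{i=1}^{n} x_{ij}^2},
\qquad
\rho_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}\Bigl(y_i - \sum_{k \neq j} x_{ik}\beta_k\Bigr),
\qquad
S(a, \lambda) = \operatorname{sign}(a)\,\max(\lvert a \rvert - \lambda,\, 0)
```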
The most popular ways to select the lambda parameter are cross-validation, BIC, and AIC. A comparison between these techniques and the proposed lambda selection method on page 4 (below Eq. 2.3) would be useful.
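For concreteness, commonly used forms of these criteria for a lasso fit with Gaussian errors are given below. The notation is an assumption: RSS(λ) is the residual sum of squares at penalty λ, the degrees of freedom are estimated by the number of non-zero coefficients, and F_1, ..., F_K are the K cross-validation folds. In each case the selected λ is the one that minimizes the criterion.

```latex
\mathrm{AIC}(\lambda) = n \log\!\left(\frac{\mathrm{RSS}(\lambda)}{n}\right) + 2\,\widehat{\mathrm{df}}(\lambda),
\qquad
\mathrm{BIC}(\lambda) = n \log\!\left(\frac{\mathrm{RSS}(\lambda)}{n}\right) + \log(n)\,\widehat{\mathrm{df}}(\lambda),
\qquad
\widehat{\mathrm{df}}(\lambda) = \bigl|\{\, j : \hat{\beta}_j(\lambda) \neq 0 \,\}\bigr|

\mathrm{CV}(\lambda) = \frac{1}{K} \sum_{k=1}^{K} \frac{1}{\lvert F_k \rvert}
  \sum_{i \in F_k} \bigl( y_i - x_i^{\top} \hat{\beta}^{(-k)}(\lambda) \bigr)^2
```

Here \hat{\beta}^{(-k)}(\lambda) denotes the lasso estimate fitted with fold F_k held out.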
In practical genome-wide association studies, it is important to control false positives. It would be useful to demonstrate that the proposed method can control false positives at a user-specified level (e.g., an FDR of 0.05).
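One common way to control the FDR at a given level is the Benjamini-Hochberg step-up procedure, sketched below. It assumes that a p-value is available for each candidate effect; the lasso does not produce p-values by itself, so how they would be obtained for the proposed method is left open here. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    # Indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: find the largest rank with p_(rank) <= alpha * rank / m.
        if p_values[i] <= alpha * rank / m:
            k_max = rank
    # Reject every hypothesis up to and including that rank.
    return sorted(order[:k_max])

# Hypothetical p-values for ten candidate effects.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_vals, alpha=0.05)
print(rejected)
```

With these ten p-values, only the first two fall below their rank-dependent thresholds (0.005 and 0.01), so only those two effects are reported at an FDR of 0.05.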
In the experiments, I would suggest comparing the proposed method with existing methods for detecting non-linear interaction effects.
Furthermore, an analysis of a real dataset would improve the paper.
Minor comments:
1. Below Eq. 2.2, it is argued that var(epsilon)=0.3 is realistic for highly heritable complex traits. A reference would be useful here.
2. On page 3, "(see figures)" needs to include figure numbers.
3. On page 4, an explanation of the sample size settings is missing.
Level of interest: An article of limited interest
Quality of written English: Needs some language corrections before being published
Statistical review: Yes, and I have assessed the statistics in my report.
Declaration of competing interests: I declare that I have no competing interests.
Authors' response to reviews: (http://www.gigasciencejournal.com/imedia/1055861928166366_comment.pdf)