Avoiding pitfalls in L1-regularised inference of gene networks.

Tjärnberg A, Nordling TE, Studham M, Nelander S, Sonnhammer EL

Mol Biosyst 11 (1) 287-296 [2015-01-00; online 2014-11-07]

Statistical regularisation methods such as LASSO and related L1-regularised regression methods are commonly used to construct models of gene regulatory networks. Although they can theoretically infer the correct network structure, they have been shown in practice to make errors, i.e. to leave out existing links and include non-existing links. We show that L1 regularisation methods typically produce a poor network model when the analysed data are ill-conditioned, i.e. when the gene expression data matrix has a high condition number, even if the data contain enough information for correct network inference. However, when these methods fail on informative data, i.e. data with a signal-to-noise ratio high enough that existing links can be proven to exist, the correct network structure can still be obtained by least-squares regression followed by setting small parameters to zero, or by robust network inference, a recent method that takes the intersection of all non-rejectable models. Since available experimental data sets are generally ill-conditioned, we recommend checking the condition number of the data matrix to avoid this pitfall of L1-regularised inference, and also considering alternative methods.
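The two practical steps mentioned in the abstract, checking the condition number and falling back to least squares with small parameters set to zero, can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes a linear steady-state model A Y = -P, where Y holds measured expression responses and P the applied perturbations (both genes by experiments). The abstract does not state this model, and the function names and the threshold value are hypothetical.

```python
import numpy as np


def condition_number(Y):
    """Ratio of the largest to the smallest singular value of the data matrix."""
    return np.linalg.cond(Y)


def least_squares_network(Y, P, threshold):
    """Least-squares estimate of A under the assumed model A @ Y = -P.

    Entries with absolute value below `threshold` (a hypothetical,
    user-chosen cutoff) are set to zero to recover a sparse structure.
    """
    # A @ Y = -P is equivalent to Y.T @ A.T = -P.T, which lstsq can solve.
    A_T, *_ = np.linalg.lstsq(Y.T, -P.T, rcond=None)
    A = A_T.T.copy()
    A[np.abs(A) < threshold] = 0.0
    return A


# Toy three-gene network with noise-free data, for illustration only.
A_true = np.array([[-1.0, 0.0, 0.5],
                   [0.8, -1.0, 0.0],
                   [0.0, 0.0, -1.0]])
P = np.eye(3)                       # one perturbation per gene
Y = -np.linalg.solve(A_true, P)     # steady-state response under the assumed model

print("condition number of Y:", condition_number(Y))
A_est = least_squares_network(Y, P, threshold=0.1)
print("recovered links:\n", (A_est != 0).astype(int))
```

With real data the threshold would need to be chosen relative to the noise level, and a high condition number would be a warning that an L1-regularised estimate may be unreliable.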

Affiliated researcher

PubMed 25377664

DOI 10.1039/c4mb00419a

Crossref 10.1039/c4mb00419a