Tjärnberg A, Nordling TE, Studham M, Sonnhammer EL
J. Comput. Biol. 20 (5) 398-408 [2013-05-00; online 2013-05-07]
Gene regulatory network inference (that is, determination of the regulatory interactions between a set of genes) provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call ζ (zeta), to determine the degree of sparsity of the network estimates, that is, the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, particularly for biological data with few samples. We show that an empty network is more accurate than estimates obtained for a poor choice of ζ. To avoid such poor choices, we propose a method for optimization of ζ, which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave-one-out cross-optimization and selection of the ζ value that minimizes the prediction error. We also illustrate the adverse effects of noise, few samples, and uninformative experiments on network inference as well as our method for optimization of ζ. We demonstrate that our ζ optimization method, applied to two widely used inference algorithms (Glmnet and NIR), gives accurate and informative estimates of the network structure, provided that the data are informative enough.
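The selection step described in the abstract (leave one sample out, fit on the rest, pick the ζ with the lowest prediction error) can be illustrated with a minimal sketch. This is not the authors' implementation and uses neither Glmnet nor NIR: the inference method is a stand-in (least squares followed by removal of links weaker than ζ), the data are synthetic, and all names (`fit_sparse`, `loo_error`, `select_zeta`) are hypothetical.

```python
import numpy as np


def fit_sparse(X, y, zeta):
    """Stand-in sparsity-dependent method (an assumption, not Glmnet or NIR):
    ordinary least squares, drop links with |weight| < zeta, refit on the rest."""
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    support = np.abs(w) >= zeta
    w_sparse = np.zeros_like(w)
    if support.any():
        w_sparse[support] = np.linalg.lstsq(X[:, support], y, rcond=None)[0]
    return w_sparse  # all zeros corresponds to an empty network


def loo_error(X, y, zeta):
    """Mean squared prediction error over all leave-one-out splits."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i          # train on every sample except i
        w = fit_sparse(X[keep], y[keep], zeta)
        total += (y[i] - X[i] @ w) ** 2   # error on the held-out sample
    return total / n


def select_zeta(X, y, zetas):
    """Return the zeta from the grid with the lowest leave-one-out error."""
    errors = np.array([loo_error(X, y, z) for z in zetas])
    return zetas[int(np.argmin(errors))], errors


# Synthetic example: 20 samples, 8 candidate regulators, 2 true links.
rng = np.random.default_rng(0)
n_samples, n_genes = 20, 8
X = rng.normal(size=(n_samples, n_genes))
w_true = np.zeros(n_genes)
w_true[[0, 3]] = [1.5, -2.0]
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

zetas = np.logspace(-3, 1, 15)            # candidate sparsity coefficients
best_zeta, errors = select_zeta(X, y, zetas)
print("selected zeta:", best_zeta)
print("links kept:", np.flatnonzero(fit_sparse(X, y, best_zeta)))
```

The largest ζ in the grid removes every link, so the last entry of `errors` is the prediction error of the empty network; comparing it against the minimum shows how much a well-chosen ζ gains over a poor one on this toy data.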