Table 1. The sensitivity, PPV, and F1 values in four scenarios of the simulation studies.

As the number of genes p increases, the number of parameters to be estimated increases rapidly. Because the partial correlation coefficient matrix is sparse, the number of true zero parameters grows much faster than the number of true non-zero parameters, so the chance of estimating a true zero parameter as non-zero increases more than the chance of estimating a true non-zero parameter as zero. As presented in Table 1, although the sensitivity of LEP did not change significantly as p increased from 10 to 20, its PPV dropped noticeably, from ~90% in scenarios 1 and 2 to ~80% in scenarios 3 and 4. LASSO and SCAD showed a similar pattern.
Note that, besides the penalty term, the performance of the different methods also depends on the true value of the covariance matrix Σ, which was generated at the beginning of each scenario.

Across all the scenarios, although LASSO reached the highest sensitivity, its PPV was far lower than that of SCAD and LEP. This means that LASSO could identify more gene regulatory relationships, but many of them might be false positives. Among the three methods, LEP achieved the highest PPV while keeping its sensitivity at a level similar to that of SCAD. Its F1 score was also the highest in scenarios 1, 3, and 4. More importantly, using the algorithm proposed by [14], LEP was the fastest: its computation time was roughly 1/18, 1/10, 1/7, and 1/9 of that of LASSO and 1/5, 1/4, 1/3, and 1/3 of that of SCAD in the four scenarios, respectively.
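To make the three criteria in Table 1 concrete, the following is a minimal sketch of how sensitivity, PPV, and F1 could be computed from one estimated partial correlation matrix and the true one. It assumes the standard definitions (sensitivity = TP/(TP+FN), PPV = TP/(TP+FP), F1 = harmonic mean of the two), which may differ in detail from those used for Table 1; the function name edge_metrics and the threshold tol are hypothetical and chosen only for illustration.

```python
import numpy as np

def edge_metrics(est, truth, tol=1e-8):
    """Sensitivity, PPV and F1 for edge recovery (illustrative sketch).

    est, truth: symmetric p x p partial correlation matrices.
    tol: assumed threshold below which an entry is treated as zero.
    """
    est = np.asarray(est, dtype=float)
    truth = np.asarray(truth, dtype=float)

    # Each gene pair is counted once: strict upper triangle only.
    iu = np.triu_indices_from(truth, k=1)
    est_edge = np.abs(est[iu]) > tol
    true_edge = np.abs(truth[iu]) > tol

    tp = int(np.sum(est_edge & true_edge))    # true edges that were found
    fp = int(np.sum(est_edge & ~true_edge))   # true zeros estimated as non-zero
    fn = int(np.sum(~est_edge & true_edge))   # true edges that were missed

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    denom = sensitivity + ppv
    f1 = 2 * sensitivity * ppv / denom if denom else 0.0
    return sensitivity, ppv, f1
```

In a simulation study these values would typically be averaged over the repetitions within each scenario. Under these definitions, a method that selects many edges, as LASSO does here, raises TP and FP together, which explains how high sensitivity can coexist with low PPV.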
For intuitive illustration, we also plotted the relative frequency matrix for each method in each scenario, where the (i, j)-element indicates the relative frequency of non-zero estimates among 100 repetitions. The darker the color, the higher the frequency of non-zero estimates. The true partial correlation coefficient matrix is shown in the first panel of each row in Figure 1. From Figure 1, we can see that the LASSO plot is markedly darker than the others, and especially darker than the truth, which means that LASSO estimated many true zero parameters to be non-zero, resulting in many false positives. Compared with LASSO, the SCAD plot is much closer to the truth, and LEP improves further upon SCAD.

Figure 1. The relative frequency matrices in four scenarios of the simulation studies. The first, second, third, and fourth rows correspond to scenarios 1, 2, 3, and 4, respectively.

3.2. A Real Data Example

In this section, the publicly available gene expression dataset GSE6536 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE6536) was investigated.