About the Structural Inference Methods
The structural inference methods benchmarked with StructInf are collected from multiple disciplines, such as biology and computer science. We follow the original implementations of these methods, with slight modifications to integrate data loading and metric calculation. The following paragraphs describe in detail how each structural inference method is implemented in this work.
Structural Inference Methods in this Work
| Methods | Paper | Official Implementation | Our Implementation |
|---|---|---|---|
| ppcor | ppcor: An R Package for a Fast Calculation to Semi-partial Correlation Coefficients | R package ppcor | /src/models/ppcor |
| TIGRESS | TIGRESS: Trustful Inference of Gene REgulation using Stability Selection | https://github.com/jpvert/tigress | /src/models/TIGRESS |
| ARACNe | ARACNE: An Algorithm for the Reconstruction of Gene Regulatory Networks in a Mammalian Cellular Context | | /src/models/ARACNE |
| CLR | Large-Scale Mapping and Validation of Escherichia coli Transcriptional Regulation from a Compendium of Expression Profiles | | /src/models/CLR |
| PIDC | Gene Regulatory Network Inference from Single-Cell Data Using Multivariate Information Measures | https://github.com/Tchanders/NetworkInference.jl | /src/models/PIDC |
| Scribe | Inferring Causal Gene Regulatory Networks from Coupled Single-Cell Expression Dynamics Using Scribe | https://github.com/aristoteleo/Scribe-py | /src/models/scribe |
| dynGENIE3 | dynGENIE3: dynamical GENIE3 for the inference of gene networks from time series expression data | https://github.com/vahuynh/dynGENIE3 | /src/models/dynGENIE3 |
| XGBGRN | Inference of gene regulatory networks based on nonlinear ordinary differential equations | https://github.com/lab319/GRNs_nonlinear_ODEs | /src/models/GRNs_nonlinear_ODEs |
| NRI | Neural Relational Inference for Interacting Systems | https://github.com/ethanfetaya/NRI | /src/models/NRI |
| ACD | Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data | https://github.com/loeweX/AmortizedCausalDiscovery | /src/models/ACD |
| MPM | Neural Relational Inference with Efficient Message Passing Mechanisms | https://github.com/hilbert9221/NRI-MPM | /src/models/MPM |
| iSIDG | Iterative Structural Inference of Directed Graphs | Provided by the authors | /src/models/iSIDG |
⋆ Methods based on Classical Statistics
Unless otherwise specified, the following arguments are used to select the trajectories for evaluation:
parser = add_option(parser, c("--data-path"), type="character", default="/work/projects/bsimds/backup/src/simulations/",
                    help="The folder where data are stored.")
parser = add_option(parser, c("--save-folder"), type="character", default="",
                    help="The folder where resulting adjacency matrixes are stored.")
parser = add_option(parser, c("--b-portion"), type="numeric", default=1.0,
                    help="Portion of data to be used in benchmarking.")
parser = add_option(parser, c("--b-time-steps"), type="integer", default=49L,
                    help="Portion of time series in data to be used in benchmarking")
parser = add_option(parser, c("--b-network-type"), type="character", default="",
                    help="What is the network type of the graph.")
parser = add_option(parser, c("--b-directed"), action="store_true", default=FALSE,
                    help="Default choose trajectories from undirected graphs.")
parser = add_option(parser, c("--b-simulation-type"), type="character", default="",
                    help="Either springs or netsims.")
parser = add_option(parser, c("--b-suffix"), type="character", default="",
                    help='The rest to locate the exact trajectories. E.g. "50r1_n1" for 50 nodes, rep 1 and noise level 1. Or "50r1" for 50 nodes, rep 1 and noise free.')
ppcor
We use the official implementation of ppcor from the R package with a customized wrapper. The wrapper parses command-line arguments to select the targeted trajectories, transforms each trajectory into a suitable format, feeds it to the ppcor algorithm, and stores the output in the designated directory. Our implementation can be found at /src/models/ppcor in the provided Anonymous GitHub repository. The method is implemented in R and relies on the NumPy Python package to store the generated trajectories, reticulate (https://github.com/rstudio/reticulate) to load Python variables into the R environment, stringr (https://stringr.tidyverse.org) for string operations, and optparse (https://github.com/trevorld/r-optparse) to provide a Python-style argument parser.
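To illustrate the "transform, infer, store" steps for a correlation-based method, the sketch below computes partial correlations with NumPy from the inverse covariance matrix. It is not the code of the ppcor package or of our wrapper: the function name `partial_correlation`, the (time steps × nodes) array layout, and the synthetic chain x → y → z are assumptions made only for this example.

```python
import numpy as np


def partial_correlation(trajectory: np.ndarray) -> np.ndarray:
    """Partial correlation between all node pairs.

    trajectory: array of shape (time_steps, n_nodes); this layout is an
    assumption of the sketch, not the wrapper's documented format.
    """
    cov = np.cov(trajectory, rowvar=False)   # (n_nodes, n_nodes) covariance
    precision = np.linalg.pinv(cov)          # pseudo-inverse for stability
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)       # standard precision-matrix identity
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Synthetic chain x -> y -> z: x and z are only indirectly related.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + 0.1 * rng.normal(size=500)
z = y + 0.1 * rng.normal(size=500)

# Absolute partial correlations serve as edge scores (self-loops zeroed).
scores = np.abs(partial_correlation(np.column_stack([x, y, z])))
np.fill_diagonal(scores, 0.0)
```

In this toy case the indirect pair (x, z) receives a much lower score than the direct pairs, which is the behavior a partial-correlation method relies on; a score matrix of this kind could then be written out, for instance with `np.save`.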
TIGRESS
We use the official implementation of TIGRESS by the authors at https://github.com/jpvert/tigress with a customized wrapper. The wrapper parses command-line arguments to select the targeted trajectories, transforms each trajectory into a suitable format, feeds it to the TIGRESS algorithm, and stores the output in the designated directory. Our implementation can be found at /src/models/TIGRESS in the provided Anonymous GitHub repository. The method is implemented in R and relies on the NumPy Python package to store the generated trajectories, reticulate (https://github.com/rstudio/reticulate) to load Python variables into the R environment, stringr (https://stringr.tidyverse.org) for string operations, and optparse (https://github.com/trevorld/r-optparse) to provide a Python-style argument parser.
⋆ Methods based on Information Theory
Unless otherwise specified, the following arguments are used to select the trajectories for evaluation:
parser = add_option(parser, c("--data-path"), type="character", default="/work/projects/bsimds/backup/src/simulations/",
                    help="The folder where data are stored.")
parser = add_option(parser, c("--save-folder"), type="character", default="",
                    help="The folder where resulting adjacency matrixes are stored.")
parser = add_option(parser, c("--b-portion"), type="numeric", default=1.0,
                    help="Portion of data to be used in benchmarking.")
parser = add_option(parser, c("--b-time-steps"), type="integer", default=49L,
                    help="Portion of time series in data to be used in benchmarking")
parser = add_option(parser, c("--b-network-type"), type="character", default="",
                    help="What is the network type of the graph.")
parser = add_option(parser, c("--b-directed"), action="store_true", default=FALSE,
                    help="Default choose trajectories from undirected graphs.")
parser = add_option(parser, c("--b-simulation-type"), type="character", default="",
                    help="Either springs or netsims.")
parser = add_option(parser, c("--b-suffix"), type="character", default="",
                    help='The rest to locate the exact trajectories. E.g. "50r1_n1" for 50 nodes, rep 1 and noise level 1. Or "50r1" for 50 nodes, rep 1 and noise free.')
ARACNe
We use the implementation of ARACNe provided by the Bioconductor package minet with a customized wrapper. The wrapper parses command-line arguments to select the targeted trajectories, transforms each trajectory into a suitable format, feeds it to the ARACNe algorithm, and stores the output in the designated directory. Our implementation can be found at /src/models/ARACNE in the provided Anonymous GitHub repository. The method is implemented with minet in R and relies on the NumPy Python package to store the generated trajectories, reticulate (https://github.com/rstudio/reticulate) to load Python variables into the R environment, stringr (https://stringr.tidyverse.org) for string operations, and optparse (https://github.com/trevorld/r-optparse) to provide a Python-style argument parser.
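For readers unfamiliar with ARACNe, its characteristic step is pruning a mutual-information matrix with the data processing inequality (DPI): in every triplet of nodes, the weakest edge is treated as indirect and removed. The sketch below is a plain NumPy illustration of that idea and not the minet code. The function name `dpi_prune`, the additive tolerance `eps`, and the toy matrix are assumptions for this example.

```python
import numpy as np


def dpi_prune(mim: np.ndarray, eps: float = 0.0) -> np.ndarray:
    """Remove the weakest edge of every triplet from a symmetric MI matrix.

    Decisions are taken on the original matrix, so the result does not
    depend on the order in which triplets are visited.
    """
    n = mim.shape[0]
    pruned = mim.copy()
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(n):
                if k == i or k == j:
                    continue
                # Edge (i, j) is weaker than both other edges of the triplet.
                if mim[i, j] < min(mim[i, k], mim[j, k]) - eps:
                    pruned[i, j] = pruned[j, i] = 0.0
                    break
    return pruned


# Toy mutual-information matrix: the 0-2 edge is the weakest in the triplet.
mim = np.array([[0.0, 0.9, 0.4],
                [0.9, 0.0, 0.8],
                [0.4, 0.8, 0.0]])
pruned = dpi_prune(mim)
```

Here only the 0-2 entry is set to zero, while the 0-1 and 1-2 edges are kept. The triple loop is cubic in the number of nodes, which is acceptable for a sketch but not tuned for large graphs.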
CLR
We use the implementation of CLR provided by the Bioconductor package minet with a customized wrapper. The wrapper parses command-line arguments to select the targeted trajectories, transforms each trajectory into a suitable format, feeds it to the CLR algorithm, and stores the output in the designated directory. Our implementation can be found at /src/models/CLR in the provided Anonymous GitHub repository. The method is implemented with minet in R and relies on the NumPy Python package to store the generated trajectories, reticulate (https://github.com/rstudio/reticulate) to load Python variables into the R environment, stringr (https://stringr.tidyverse.org) for string operations, and optparse (https://github.com/trevorld/r-optparse) to provide a Python-style argument parser.
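As a rough illustration of what CLR does with a mutual-information matrix, the sketch below rescales each entry into a z-score against the background distribution of its row and combines the two z-scores of a pair. This is a NumPy sketch of the general idea, not the minet implementation. The function name `clr_scores`, the use of the population standard deviation, and the toy matrix are assumptions.

```python
import numpy as np


def clr_scores(mim: np.ndarray) -> np.ndarray:
    """Context-likelihood style rescaling of a symmetric MI matrix."""
    mean = mim.mean(axis=1, keepdims=True)
    std = mim.std(axis=1, keepdims=True)
    std = np.where(std == 0, 1.0, std)        # avoid division by zero
    # z[i, j]: how far MI(i, j) lies above the background of node i.
    z = np.maximum(0.0, (mim - mean) / std)
    # Combine the view from node i and the view from node j.
    return np.sqrt(z ** 2 + z.T ** 2)


mim = np.array([[0.0, 0.9, 0.4],
                [0.9, 0.0, 0.8],
                [0.4, 0.8, 0.0]])
scores = clr_scores(mim)
```

The resulting matrix is symmetric and non-negative, so it can be used directly as a matrix of undirected edge scores.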
PIDC
The following arguments are used to select the trajectories for evaluation:
s = ArgParseSettings()
@add_arg_table s begin
    "--data-path"
        help = "The folder where data are stored."
        arg_type = String
        default = "/work/projects/bsimds/backup/src/simulations/"
    "--save-folder"
        help = "The folder where resulting adjacency matrixes are stored."
        arg_type = String
        required = true
    "--b-portion"
        help = "Portion of data to be used in benchmarking."
        arg_type = Float64
        default = 1.0
    "--b-time-steps"
        help = "Portion of time series in data to be used in benchmarking."
        arg_type = Int
        default = 49
    "--b-shuffle"
        help = "Shuffle the data for benchmarking?"
        action = :store_true
        default = false
    "--b-network-type"
        help = "What is the network type of the graph."
        arg_type = String
        default = ""
    "--b-directed"
        help = "Default choose trajectories from undirected graphs."
        action = :store_true
    "--b-simulation-type"
        help = "Either springs or netsims."
        arg_type = String
        default = ""
    "--b-suffix"
        help = "The rest to locate the exact trajectories. E.g. \"50r1_n1\" for 50 nodes, rep 1 and noise level 1. Or \"50r1\" for 50 nodes, rep 1 and noise free."
        arg_type = String
        default = ""
end
We use the official implementation of PIDC by the authors at https://github.com/Tchanders/NetworkInference.jl with a customized wrapper. The wrapper parses command-line arguments to select the targeted trajectories, transforms each trajectory into a suitable format, feeds it to the PIDC algorithm, and stores the output in the designated directory. Our implementation can be found at /src/models/PIDC in the provided Anonymous GitHub repository. The method is implemented in Julia and relies on the NumPy Python package to store the generated trajectories, ArgParse.jl (https://github.com/carlobaldassi/ArgParse.jl) to parse command-line arguments, CSV.jl (https://github.com/JuliaData/CSV.jl) to save and load .csv files, DataFrames.jl (https://github.com/JuliaData/DataFrames.jl) to manipulate data arrays, and NPZ.jl (https://github.com/fhs/NPZ.jl) to load .npy files into the Julia environment.
Scribe
The following arguments are used to select the trajectories for evaluation:
parser.add_argument('--data-path', type=str,
                    default="/work/projects/bsimds/backup/src/simulations/",
                    help="The folder where data are stored.")
parser.add_argument('--save-folder', type=str, required=True,
                    help="The folder where resulting adjacency matrixes are stored.")
parser.add_argument('--b-portion', type=float, default=1.0,
                    help='Portion of data to be used in benchmarking.')
parser.add_argument('--b-time-steps', type=int, default=49,
                    help='Portion of time series in data to be used in benchmarking.')
parser.add_argument('--b-shuffle', action='store_true', default=False,
                    help='Shuffle the data for benchmarking?')
parser.add_argument('--b-network-type', type=str, default='',
                    help='What is the network type of the graph.')
parser.add_argument('--b-directed', action='store_true', default=False,
                    help='Default choose trajectories from undirected graphs.')
parser.add_argument('--b-simulation-type', type=str, default='',
                    help='Either springs or netsims.')
parser.add_argument('--b-suffix', type=str, default='',
                    help='The rest to locate the exact trajectories. E.g. "50r1_n1" for 50 nodes, rep 1 and noise level 1. Or "50r1" for 50 nodes, rep 1 and noise free.')
parser.add_argument('--pct-cpu', type=float, default=1.0,
                    help='Percentage of number of CPUs to be used.')
We optimize the official implementation of Scribe by the authors at https://github.com/aristoteleo/Scribe-py and add a customized wrapper. The wrapper parses command-line arguments to select the targeted trajectories, transforms each trajectory into a suitable format, feeds it to the Scribe algorithm, and stores the output in the designated directory. Our implementation customizes the causal_network.py and information_estimators.py scripts so that the hyperparameters can be set directly from command-line arguments. We have also improved the parallel support and computational efficiency and kept only the functionality needed for benchmarking, while preserving the general mechanism of the method. Our implementation can be found at /src/models/scribe in the provided Anonymous GitHub repository. The method is implemented in Python and relies on the NumPy package to store the generated trajectories and on tqdm (https://github.com/tqdm/tqdm) to create progress bars.
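One way a fractional option such as `--pct-cpu` could be turned into a worker count is sketched below. This is a hedged illustration rather than the code in /src/models/scribe: the helper name `n_workers` and its clamping behavior (at least one worker, at most all available CPUs) are assumptions.

```python
import os
from typing import Optional


def n_workers(pct_cpu: float, total: Optional[int] = None) -> int:
    """Map a fraction of CPUs (0.0 to 1.0) to a number of worker processes.

    `total` defaults to the CPU count reported by the operating system and
    can be overridden, which keeps the helper easy to test.
    """
    if total is None:
        total = os.cpu_count() or 1
    # Always keep at least one worker and never exceed the available CPUs.
    return max(1, min(total, int(total * pct_cpu)))


workers_half = n_workers(0.5, total=8)   # half of an 8-core machine
workers_min = n_workers(0.0, total=8)    # falls back to a single worker
```

The returned value could then be passed to a standard-library pool, for example `multiprocessing.Pool(processes=n_workers(args.pct_cpu))`, where `args` stands for the parsed arguments.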
⋆ Methods based on Tree Algorithms
The following arguments are used to select the trajectories for evaluation:
parser.add_argument('--data-path', type=str,
                    default="/work/projects/bsimds/backup/src/simulations/",
                    help="The folder where data are stored.")
parser.add_argument('--save-folder', type=str, required=True,
                    help="The folder where resulting adjacency matrixes are stored.")
parser.add_argument('--b-portion', type=float, default=1.0,
                    help='Portion of data to be used in benchmarking.')
parser.add_argument('--b-time-steps', type=int, default=49,
                    help='Portion of time series in data to be used in benchmarking.')
parser.add_argument('--b-shuffle', action='store_true', default=False,
                    help='Shuffle the data for benchmarking?')
parser.add_argument('--b-network-type', type=str, default='',
                    help='What is the network type of the graph.')
parser.add_argument('--b-directed', action='store_true', default=False,
                    help='Default choose trajectories from undirected graphs.')
parser.add_argument('--b-simulation-type', type=str, default='',
                    help='Either springs or netsims.')
parser.add_argument('--b-suffix', type=str, default='',
                    help='The rest to locate the exact trajectories. E.g. "50r1_n1" for 50 nodes, rep 1 and noise level 1. Or "50r1" for 50 nodes, rep 1 and noise free.')
parser.add_argument('--pct-cpu', type=float, default=1.0,
                    help='Percentage of number of CPUs to be used.')
dynGENIE3
We optimize the official Python implementation of dynGENIE3 by the authors at https://github.com/vahuynh/dynGENIE3 and add a customized wrapper. The wrapper parses command-line arguments to select the targeted trajectories, transforms each trajectory into a suitable format, feeds it to the dynGENIE3 algorithm, and stores the output in the designated directory. While preserving the general mechanism of dynGENIE3, we have modified the dynGENIE3.py script so that the hyperparameters can be tuned directly from command-line arguments, computation is more efficient on large datasets, self-influence can be calculated, and only the functionality needed for benchmarking is retained. Our implementation can be found at /src/models/dynGENIE3 in the provided Anonymous GitHub repository. The method is implemented in Python and relies on the NumPy package to store the generated trajectories.
XGBGRN
We use the official implementation of XGBGRN by the authors at https://github.com/lab319/GRNs_nonlinear_ODEs with a customized wrapper. The wrapper parses command-line arguments to select the targeted trajectories, transforms each trajectory into a suitable format, feeds it to the XGBGRN algorithm, and stores the output in the designated directory. Our implementation can be found at /src/models/GRNs_nonlinear_ODEs in the provided Anonymous GitHub repository. The method is implemented in Python and relies on the NumPy package to store the generated trajectories.
⋆ Methods based on VAEs
In general, we added the following arguments to the argparse parser of these methods:
parser.add_argument('--save-probs', action='store_true', default=False,
                    help='Save the probs during test.')
parser.add_argument('--b-portion', type=float, default=1.0,
                    help='Portion of data to be used in benchmarking.')
parser.add_argument('--b-time-steps', type=int, default=49,
                    help='Portion of time series in data to be used in benchmarking.')
parser.add_argument('--b-shuffle', action='store_true', default=False,
                    help='Shuffle the data for benchmarking.')
parser.add_argument('--data-path', type=str, default='',
                    help='Where to load the data. May input the paths to edges_train of the data.')
parser.add_argument('--b-network-type', type=str, default='',
                    help='What is the network type of the graph.')
parser.add_argument('--b-directed', action='store_true', default=False,
                    help='Default choose trajectories from undirected graphs.')
parser.add_argument('--b-simulation-type', type=str, default='',
                    help='Either springs or netsims.')
parser.add_argument('--b-suffix', type=str, default='',
                    help='The rest to locate the exact trajectories. E.g. "50r1_n1" for 50 nodes, rep 1 and noise level 1.'
                         ' Or "50r1" for 50 nodes, rep 1 and noise free.')
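To make the role of `--b-portion`, `--b-time-steps`, and `--b-shuffle` concrete, the following sketch shows one plausible way such options could subsample a trajectory array. It is an illustration under stated assumptions, not the data loaders in the repository: the helper name `select_benchmark_subset`, the `seed` parameter, and the array layout (trajectories on axis 0, time steps on axis 1) are all hypothetical.

```python
import numpy as np


def select_benchmark_subset(data: np.ndarray, portion: float = 1.0,
                            time_steps: int = 49, shuffle: bool = False,
                            seed: int = 0) -> np.ndarray:
    """Keep a portion of the trajectories and the first `time_steps` steps.

    Assumed layout: data[trajectory, time, ...]. The real loaders may use
    a different layout or selection rule.
    """
    n_total = data.shape[0]
    n_keep = max(1, int(round(n_total * portion)))
    idx = np.arange(n_total)
    if shuffle:
        # Seeded permutation so that a shuffled subset is reproducible.
        idx = np.random.default_rng(seed).permutation(idx)
    return data[idx[:n_keep], :time_steps]


# 10 trajectories, 60 time steps, 5 nodes, 4 features per node.
data = np.zeros((10, 60, 5, 4))
subset = select_benchmark_subset(data, portion=0.5, time_steps=49)
```

With these inputs the subset has shape (5, 49, 5, 4): half of the trajectories, truncated to the first 49 time steps.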
NRI
We use the official implementation by the authors from https://github.com/ethanfetaya/NRI with customized data loaders for our chosen datasets. The customized data loaders are named “load_customized_springs_data” and “load_customized_netsims_data”, and both are implemented in the “utils.py” file. The metric calculation pipeline is integrated into the “test” function. Apart from these changes, the code is consistent with the official implementation. Our implementation can be found at /src/models/NRI in the provided Anonymous GitHub repository.
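The text above does not spell out which metrics the pipeline computes. Purely as an illustration, and assuming a ranking metric such as AUROC over the off-diagonal entries of the adjacency matrix, a metric step of this kind could look like the sketch below. The function name `edge_auroc` and the toy matrices are hypothetical and are not taken from the modified “test” function.

```python
import numpy as np


def edge_auroc(probs, truth) -> float:
    """AUROC of predicted edge probabilities against a binary adjacency matrix.

    Self-loops (the diagonal) are ignored. The score is the fraction of
    (true edge, non-edge) pairs that are ranked correctly; ties count half.
    """
    probs = np.asarray(probs, dtype=float)
    truth = np.asarray(truth).astype(bool)
    off_diag = ~np.eye(truth.shape[0], dtype=bool)
    pos = probs[off_diag & truth]
    neg = probs[off_diag & ~truth]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one edge and one non-edge.")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


truth = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 1, 0]])
probs = np.array([[0.0, 0.9, 0.2],
                  [0.8, 0.0, 0.7],
                  [0.1, 0.6, 0.0]])
score = edge_auroc(probs, truth)
```

In this toy example every true edge is scored above every non-edge, so the ranking is perfect. The pairwise comparison is quadratic in the number of entries and is meant for clarity rather than speed.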
ACD
We use the official implementation by the authors at https://github.com/loeweX/AmortizedCausalDiscovery with a customized data loader for our datasets. The customized data loader is named “load_data_customized” and is implemented in “data_loader.py”. The metric calculation pipeline is integrated into the function “forward_pass_and_eval” of the “forward_pass_and_eval.py” file. Apart from these changes, the code is consistent with the official implementation. Our implementation can be found at /src/models/ACD in the provided Anonymous GitHub repository.
MPM
We use the official implementation by the authors at https://github.com/hilbert9221/NRI-MPM with a customized data loader for our chosen datasets. The customized data loader function is named “load_customized_data” and is accompanied by the data preprocessing functions “load_nri” and “load_netsims”. The first function is implemented in “run.py”, while the other two are implemented in “load.py”. The metric calculation pipelines are integrated into the “test” function of the “XNRIIns” class in the “XNRI.py” file. Apart from these changes, the code is consistent with the official implementation. Our implementation can be found at /src/models/MPM in the provided Anonymous GitHub repository.
iSIDG
We use the official implementation sent to us by the authors. We modified it by adding a customized data loader function, “load_data_benchmark”, which is implemented in “utils.py”. Apart from this change, the code is consistent with the official implementation. Our implementation can be found at /src/models/iSIDG in the provided Anonymous GitHub repository.