About the Dataset

All datasets are provided in NPY, the standard binary file format of NumPy. The format stores all of the shape and data type information necessary to reconstruct the array correctly, even on another machine with a different architecture.

Downloading NPY datasets

All datasets are hosted at https://structinfer.github.io/download/, which provides links to the raw and split datasets in NPY format. After downloading the datasets, please move the corresponding files into /src/simulations/[name of the underlying graph]/directed [or undirected]/springs [or netsims]/. For gene coexpression networks and landscape networks, the trajectories should be saved under /src/simulations/[name of the underlying graph]/undirected/springs [or netsims]/. For all other graph types, please save them under /src/simulations/[name of the underlying graph]/directed/springs [or netsims]/.
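The placement rule above can be sketched as a small helper. This is only an illustration: the function name simulation_dir and the relative root "src/simulations" are hypothetical, and "gene_coexpression_networks" is an assumed folder name that should be checked against the download page.

```python
from pathlib import PurePosixPath

# Graph types stored under "undirected" according to the rule above.
# "landscape_networks" is a folder name used in this document;
# "gene_coexpression_networks" is an ASSUMED folder name.
UNDIRECTED_GRAPHS = {"landscape_networks", "gene_coexpression_networks"}


def simulation_dir(graph_name, simulation, root="src/simulations"):
    """Return the folder where files for one graph type and simulator belong."""
    if simulation not in ("springs", "netsims"):
        raise ValueError(f"unknown simulation: {simulation!r}")
    direction = "undirected" if graph_name in UNDIRECTED_GRAPHS else "directed"
    return PurePosixPath(root) / graph_name / direction / simulation


print(simulation_dir("brain_networks", "netsims"))
print(simulation_dir("landscape_networks", "netsims"))
```

Adjust root to wherever the repository is checked out.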

Loading datasets in Python

After downloading an NPY dataset, you can load it into Python with NumPy (numpy.load). You can also load a dataset from a directory of files in any supported structural data format by creating a customized data-loading pipeline.
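As a minimal sketch of loading, the snippet below writes a small synthetic array to a temporary NPY file and reads it back. The file name example.npy and the array contents are placeholders, not part of the dataset; for real data, pass the path of a downloaded file to np.load instead.

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for a downloaded file: a small synthetic array saved as NPY.
    path = Path(tmp_dir) / "example.npy"
    np.save(path, np.zeros((4, 15, 2, 49), dtype=np.float32))

    # np.load restores the array with the shape and dtype stored in the file.
    data = np.load(path)

print(data.shape, data.dtype)
```

For large files, np.load also accepts mmap_mode="r", which maps the file into memory instead of reading it all at once.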

Naming policies of datasets

By default, each dataset folder contains all trajectories and the underlying interaction graphs, already divided into a training-validation-test split. For example, for a dataset with the properties “directed”, “CRNA”, “15 nodes”, “springs simulation”, “noise-free”, and “the first repetition”, the data can be found at: /src/simulations/chemical_reaction_networks_in_atmosphere/directed/springs/. The files are:

  • Trajectories for training: loc_train_springs15r1.npy, vel_train_springs15r1.npy,

  • Groundtruth graphs for training: edges_train_springs15r1.npy,

  • Trajectories for validation: loc_valid_springs15r1.npy, vel_valid_springs15r1.npy,

  • Groundtruth graphs for validation: edges_valid_springs15r1.npy,

  • Trajectories for test: loc_test_springs15r1.npy, vel_test_springs15r1.npy,

  • Groundtruth graphs for test: edges_test_springs15r1.npy.

For a dataset with the properties “directed”, “BN”, “30 nodes”, “netsims simulation”, “noise-free”, and “the second repetition”, the data can be found at: /src/simulations/brain_networks/directed/netsims/. The files are:

  • Trajectories for training: bold_train_netsims30r2.npy,

  • Groundtruth graphs for training: edges_train_netsims30r2.npy,

  • Trajectories for validation: bold_valid_netsims30r2.npy,

  • Groundtruth graphs for validation: edges_valid_netsims30r2.npy,

  • Trajectories for test: bold_test_netsims30r2.npy,

  • Groundtruth graphs for test: edges_test_netsims30r2.npy.

For a dataset with the properties “undirected”, “LN”, “50 nodes”, “netsims simulation”, “noise level 2”, and “the third repetition”, the data can be found at: /src/simulations/landscape_networks/undirected/netsims/. The files are:

  • Trajectories for training: bold_train_netsims50r3_n2.npy,

  • Groundtruth graphs for training: edges_train_netsims50r3_n2.npy,

  • Trajectories for validation: bold_valid_netsims50r3_n2.npy,

  • Groundtruth graphs for validation: edges_valid_netsims50r3_n2.npy,

  • Trajectories for test: bold_test_netsims50r3_n2.npy,

  • Groundtruth graphs for test: edges_test_netsims50r3_n2.npy.
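The naming pattern in the three examples above can be sketched as a helper that builds the file names for every split. The pattern is inferred from those examples only, and the function name dataset_filenames and its parameters are hypothetical; other datasets may deviate from it.

```python
def dataset_filenames(simulation, n_nodes, repetition, noise_level=None):
    """Build the file names for each split, following the examples above."""
    # Springs data comes as location and velocity files; netsims as one file.
    prefixes = {"springs": ["loc", "vel"], "netsims": ["bold"]}
    if simulation not in prefixes:
        raise ValueError(f"unknown simulation: {simulation!r}")

    # Noise-free datasets carry no noise suffix in the examples.
    suffix = f"{simulation}{n_nodes}r{repetition}"
    if noise_level is not None:
        suffix += f"_n{noise_level}"

    names = {}
    for split in ("train", "valid", "test"):
        names[split] = {
            "trajectories": [
                f"{prefix}_{split}_{suffix}.npy" for prefix in prefixes[simulation]
            ],
            "edges": f"edges_{split}_{suffix}.npy",
        }
    return names


print(dataset_filenames("springs", 15, 1)["train"])
print(dataset_filenames("netsims", 50, 3, noise_level=2)["test"])
```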

More comments

All trajectories have the shape [trajectories, nodes, features, timesteps]. For example, for springs simulations on a directed graph with 15 nodes and the first repetition, both “loc_train_springs15r1.npy” and “vel_train_springs15r1.npy” have the shape [8000, 15, 2, 49]. To obtain the full features, concatenate the two files along the feature dimension, which yields trajectories with the shape [8000, 15, 4, 49]. The ground-truth graph has the shape [nodes, nodes]. It is an adjacency matrix: if the element at row i, column j is one, there is a directed edge from node i to node j.
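The concatenation and the adjacency-matrix convention can be sketched as follows. The arrays are small synthetic stand-ins (8 trajectories instead of 8000), and placing locations before velocities is an assumption; the text does not prescribe an order.

```python
import numpy as np

# Synthetic stand-ins for the loaded location and velocity files.
rng = np.random.default_rng(0)
loc = rng.normal(size=(8, 15, 2, 49))
vel = rng.normal(size=(8, 15, 2, 49))

# Axis 2 is the feature dimension in [trajectories, nodes, features, timesteps].
features = np.concatenate([loc, vel], axis=2)
print(features.shape)  # (8, 15, 4, 49)

# Synthetic adjacency matrix: a one at [i, j] is a directed edge from i to j.
edges = np.zeros((15, 15), dtype=int)
edges[0, 3] = 1
edges[3, 7] = 1

# Convert the adjacency matrix into a list of (source, target) pairs.
edge_list = [(int(i), int(j)) for i, j in np.argwhere(edges == 1)]
print(edge_list)  # [(0, 3), (3, 7)]
```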

Trajectories generated by netsims simulations, in contrast, have a single feature. For example, with a directed graph consisting of 30 nodes and the second repetition, “bold_train_netsims30r2.npy” has the shape [8000, 30, 1, 49].

The validation and test sets each contain 2000 trajectories.