2. AstroLogics Tutorial, Boolean Network Analysis and Clustering : Herault Hematopoiesis

This tutorial demonstrates the AstroLogics framework for analyzing and comparing Boolean network model ensembles. AstroLogics is designed for benchmarking Boolean models through three major evaluation criteria: network evaluation, logical function evaluation, and dynamic evaluation.

2.1. Overview of AstroLogics Framework

AstroLogics addresses a critical gap in Boolean network modeling: while multiple methods exist for Boolean model synthesis (like Bonesis, BN-sketch), there hasn’t been a standardized way to evaluate and compare generated model ensembles. The framework focuses on:

  • Dynamic properties: Examining state transition graphs and model behaviors through simulation

  • Logical function evaluation: Analyzing and comparing logical rules that govern node behaviors

  • Model clustering: Identifying groups of models with similar dynamics and logical features

2.2. Required Libraries

[ ]:
import pandas as pd
import os
import astrologics as ast
import seaborn as sns
import matplotlib.pyplot as plt

2.2.1. Dataset: Herault hematopoeisis model

We’ll use the Herault Hematopoiesis model from Herault et al., 2018, which provides an excellent framework for studying blood cell differentiation. This model contains 33 nodes representing key transcription factors and signaling molecules involved in hematopoietic lineage specification. The model demonstrates hierarchical decision-making processes during blood cell development from hematopoietic stem cells to differentiated blood cell types.

2.3. Step 1: Load Model Ensemble

[ ]:
model_path = '../models/herault_hematopoiesis/'
model = ast.ensemble(model_path, project_name = 'herault_hematopoiesis')
model.create_simulation()
Simulation object created

The ensemble object is the core component of AstroLogics that handles:

  • Loading multiple Boolean network models from a directory

  • Managing simulation parameters and configurations

  • Coordinating analysis across the model ensemble

2.4. Step 2: Simulate the model ensemble

In this part of the script we first simulate all the BN within the model ensemble. We utilize the MaBoSS engine as the main simulator.

This creates an initial state where all nodes have equal probability (0.5) of being active. This represents a neutral starting condition that allows the system to evolve according to its inherent dynamics.

We then start the simulation using MaBoSS.

MaBoSS (Markovian Boolean Stochastic Simulator) is crucial for the AstroLogics approach because:

  • It converts Boolean network dynamics into continuous-time Markov processes

  • Provides probabilistic approximation of complex state transition graphs

  • Enables analysis of both transient and steady-state behaviors

  • Scales computationally better than exhaustive state space exploration

[ ]:
# Configure simulation parameters
model.simulation.update_parameters(max_time = 20,thread_count = 15, sample_count = 2000)
model.simulation.run_simulation()
Start simulation
Simulation completed

2.5. Step 3 : Calculate the trajectory and visualize the distance matrix using MDS

In this part of the script, we compare the two method of calculating the distance between models.

In this function calculate_distancematrix, users can select the two options of data used to calculate the distance.

  • endpoint : The endpoint utilize the node activation probability at the endpoint of MaBoSS simulation. User could also defines the timepoint to define a specific timepoint they want to use to define the distance

  • trajectory : This options utilize the whole MaBoSS simulation trajectory and the dtw method to calculate the distances between models.

In this part, we show different results that could be obtained from two different method.

  • Using endpoint we showed no clear separation of the model, which coresspond to the single attractors that all the model could reach.

  • On the other hand, when using trajectory we can observe two major cluster of models based on the different signaling path that two groups of BNs took.

[5]:
model.create_trajectory()
Trajectory object created
[6]:
model.trajectory.calculate_distancematrix(mode = 'endpoint')
# Perform MDS (Multidimensional Scaling) for visualization
model.trajectory.calculate_MDS()
model.trajectory.plot_MDS(s = 100, fig_size = (8,8))
Calculating distance matrix for endpoint simulation...
Distance matrix calculated successfully.
_images/02_Analysis_Herault_hematopoiesis_15_1.png
[7]:
model.trajectory.calculate_distancematrix(mode = 'trajectory')
# Perform MDS (Multidimensional Scaling) for visualization
model.trajectory.calculate_MDS()
model.trajectory.plot_MDS(s = 100, fig_size = (8,8))
Calculating distance matrix for whole trajectory...
Distance matrix calculated successfully.
_images/02_Analysis_Herault_hematopoiesis_16_3.png

2.6. Step 4: Model Clustering

Clustering reveals distinct groups within the model ensemble. In this example, we found 2 major clusters corresponding to different attractor groups, representing distinct cellular fate decisions.

[8]:
model.trajectory.calculate_kmean_cluster(n_cluster = 2,
                              random_state = 0)
Calculated k-means clustering with 2 clusters.
[9]:
model.trajectory.plot_MDS(s = 100, fig_size = (8,8),plot_cluster = True)
_images/02_Analysis_Herault_hematopoiesis_19_0.png

2.7. Step 5: Logic Function Analysis

This step implements the logical function evaluation component of AstroLogics:

  • Converts Boolean equations to Disjunctive Normal Form (DNF)

  • Creates feature matrices comparing logical rules across models

  • Identifies constant, varied, and marker clauses

[10]:
model.create_logic()
model.logic.model_logic
model.logic.create_flattend_logic_clause()
Loading models logics
Concatenate results into matrix
Logic object created
Flatten models logic clauses
Concatenate results into matrix
Flattend logic clause created

2.8. Step 6 : Calculate statistic of Logic features (clauses)

In this steps, we have already featurized the logical equations into model logics or clauses.

We can then integrate the clusters obtained from the trajectory analysis into the .logic and perform chi-square statistical test to categorize logic features (clauses) into 3 major groups

  • Constatnt : core regulatory features that appears across BNs in the model ensemble

  • Varied : Features that may differ between individual BNs but show no statistical significant

  • Marker : Key discriminatory features that statistically distinguish between different model clusters.

We can define the p-value of the chi-square test using the function pval_threshold.

[11]:
model.logic.map_model_clusters(model.trajectory.cluster_dict)
model.logic.calculate_logic_statistic(pval_threshold = 0.0001)
Model clusters mapped to logic clauses

The results of the analysis can be visualized in the form of Manhattan Plot shown below.

[12]:

model.logic.plot_manhattan()
_images/02_Analysis_Herault_hematopoiesis_26_0.png

Or the results can be summarized into the barplot shown here

[13]:
model.logic.plot_logicstat_summary()
_images/02_Analysis_Herault_hematopoiesis_28_0.png

2.9. Step 7: Advanced Trajectory Analysis

These visualizations help identify:

  • Most variable nodes: Components showing greatest differences between models

  • Critical regulators: Nodes whose activity patterns distinguish model clusters

  • Temporal patterns: How specific nodes behave over simulation time

In this first plot, we check what are the features that shows the highest variance in their dynamics accross simulation. We calculate the variance of node activation probabily of all BNs in the model ensemble across all timepoints and plotted using the heatmap.

[14]:
model.trajectory.plot_trajectory_variance()
_images/02_Analysis_Herault_hematopoiesis_30_0.png

We identify the key node of interest (CDK46CycD) which shows the highest variance along the timepoints. Finally, we could visualize the dynamics of this node between two identified clusters using the lineplot.

From this, we found that the dynamics of this node differ during the early timepoint of the simulation, while finally converge to 0 at the later timepoint.

[15]:
model.trajectory.plot_node_trajectory(node = ['CDK46CycD'])
_images/02_Analysis_Herault_hematopoiesis_32_0.png

2.10. Step 8 : Logic Feature Heatmap

This analysis typically reveals that Myt1L is the key distinguishing node between clusters:

  • Cluster 0: CDK46CycD is regulated by Bclaf1 & Myc

  • Cluster 1: CDK46CycD is regulated by Bclaf1 | Myc

[16]:
model.logic.plot_node_logic_heatmap(node = ['CDK46CycD'],
                                     fig_size = (8, 8))
<Figure size 800x800 with 0 Axes>
_images/02_Analysis_Herault_hematopoiesis_34_1.png