Background

In this section we introduce basic concepts, definitions and notations.

Boolean functions

Boolean regulatory functions (BRFs) are Boolean functions used in the context of biological networks and modeling. A mathematical presentation of such a function associates the activity output of a target biological entity with the Boolean input values of \(n\) variables (the regulators), such that \(f_{BRF}:\{0,1\}^n \rightarrow \{0,1\}\). Thus, the target’s output state is binary, i.e. either \(0\) (FALSE, denoting an inactive/inhibited state) or \(1\) (TRUE, indicating an active state).

The most intuitive representation of a Boolean function is its truth table, which is a list of all possible Boolean input configurations of the \(n\) regulators along with their associated function output. Since for every regulator, there are two possible values (\(0\) and \(1\)), the total number of input configurations (i.e. rows) in a truth table is \(2^n\). For example, a Boolean function \(f(x_1,x_2,x_3)\) with \(3\) regulators has a total of \(2^3=8\) rows in its corresponding truth table, starting from the input configuration \((0,0,0)\) and ending with \((1,1,1)\) (see Table 1).

The total number of BRFs with \(n\) regulators is \(2^{2^n}\) and the easiest way to think about it is to see that for each input configuration (i.e. row of the truth table) there can be only two possible function outcomes (\(0\) or \(1\)). In our example with \(3\) regulators and a total of \(8\) rows in the truth table, that would be a total of \(2^8=256\) functions, three of which are shown in Table 1 below.

\(x_1\)	\(x_2\)	\(x_3\)	\(f_1=(x_1 \land \lnot x_3) \lor (x_2 \land \lnot x_3)\)	\(f_2=x_1 \lor (\lnot x_2 \land \lnot x_3)\)	\(f_3 = 1\)
0	0	0	0	1	1
0	0	1	0	0	1
0	1	0	1	0	1
0	1	1	0	0	1
1	0	0	1	1	1
1	0	1	0	1	1
1	1	0	1	1	1
1	1	1	0	1	1

Table 1: Truth table of three Boolean functions with three input variables. Functions \(f_1\) and \(f_2\) are expressed in disjunctive normal form (DNF) with the minimum possible number of terms. \(f_3\) is a tautology.

Truth Density

Definition 1 Truth density (TD) of a Boolean function is the fraction of all input configurations in its corresponding truth table that yield a TRUE outcome (i.e. the function outputs 1).

Note that the truth density of a Boolean function rests in the \([0,1]\) interval. Using the example with the three functions from Table 1, we have \(TD_{f_1}=3/8=0.375\), \(TD_{f_2}=5/8=0.625\) and \(TD_{f_3}=8/8=1\), where the last function is the tautology, with the maximum possible truth density.

We will say that a Boolean regulatory function is biased, when it’s truth density is close to \(0\) or 1. Since the size of a truth table grows exponentially with the number of inputs a Boolean function has (\(n\) input regulators correspond to \(2^n\) rows), the existence of bias conveys the information that most of the input regulatory configurations result in either an activated or an inhibited target (bias towards \(0\) or \(1\) respectively). On the other hand, we shall say that a Boolean function is balanced, if it takes on the values \(0\) and \(1\) equally often, or equivalently, it’s truth density is close to \(\frac12\) (Benjamini, Schramm, and Wilson 2005).

The DNF form

The most practically used form of a Boolean function is its analytical form, where variables are connected with logical operators such as AND (\(\land\)), OR (\(\lor\)), NOT (\(\lnot\)), XOR (\(\oplus\)), etc. and the output of the function is calculated using basic Boolean algebra. In Table 1 for example, we provide the analytical forms for the functions \(f_1\) and \(f_2\). Note that there can be multiple analytical forms that essentially compute the same function, e.g. another form of the \(f_1\) function is \(f_1^{'}=(\lnot x_1 \land x_2 \land \lnot x_3) \lor (x_1 \land \lnot x_2 \land \lnot x_3) \lor (x_1 \land x_2 \land \lnot x_3)\).

This brings us to the notion of a general form which could be used to define useful metrics common to all Boolean functions (e.g. complexity), as well as the need to provide minimal forms. This need does not come only from a representation point of view (a more compact function form enhances readability, e.g. compare \(f_1\) with \(f_1^{'}\)), but is also a motivation for the design of optimal digital circuits, since the number of gates used (i.e. logical operators) is an indicator of computation time and affects the efficiency of a circuit (Wegener 1987).

Every Boolean function can be represented in a disjunctive normal form or DNF, requiring only AND (\(\land\)), OR (\(\lor\)) and NOT (\(\lnot\)) operators as building blocks. In such a representation, literals, which are variables (e.g. positive literal \(x\)) or their negations (e.g. negative literal \(\text{NOT }x\)), are connected by AND’s producing terms, which are then in turn connected by OR’s (Crama and Hammer 2011). For example, every function in Table 1 is expressed in DNF form, while the Boolean expressions \(\lnot (x_1 \lor x_2)\) and \(\lnot (x_1\land x_2)\lor x_3\) are not. Note that a Boolean function can have multiple DNF forms.

Complexity

The length of a DNF is defined as the number of its terms, i.e. the number of conjunctions separated by \(\lor\) operators. As such, \(l_{f_1}=l_{f_2}=2\), \(l_{f_3}=1\) and \(l_{f_1^{'}}=3\) from Table 1. Intuitively speaking, the more terms a DNF has, the more complex it is. This observation leads us to the following definition:

Definition 2 The complexity \(C\) of a Boolean function is the length of its shortest DNF, normalized by the total number of rows of its truth table (Gherardi and Rotondo 2016).

For example, using the functions from Table 1, for which the minimum DNF expression is given, we have that \(C_{f_1}=C_{f_2}=2/8\), \(C_{f_3}=1/8\), where the tautology \(f_3\) is assigned the smallest possible complexity.

Note that minimizing the DNF form of a Boolean function to get an equivalent expression with the shortest number of terms, is considered a computational difficult problem and there can be multiple such minimum DNF representations for a single Boolean function.

Link operator Boolean Regulatory Functions

We consider the class of BRFs that partitions the input variables (regulators) to two sets: the set of positive regulators (also called activators) and the set of negative regulators (also called inhibitors). Let \(f\) be such a Boolean function \(f(x,y):\{0,1\}^n \rightarrow \{0,1\}\), with \(m \ge 1\) activators \(x=\{x_i\}_{i=1}^{m}\) and \(k \ge 1\) inhibitors \(y=\{y_j\}_{j=1}^{k}\), that is a total of \(n = m + k\) regulators. The link operator BRFs have an analytical non-DNF form, which places the two distinct types of regulators in two separate expressions, while connecting them with a special logical operator that we call a link operator.

An example of such a function that has been used extensively in the logical modeling literature is the standardized BRF formula with the “AND-NOT” link operator (Mendoza and Xenarios 2006):

\[f_{AND-NOT}(x,y) = \left(\bigvee_{i=1}^{m} x_i\right) \land \lnot \left(\bigvee_{j=1}^{k} y_j\right)\]

A variation of the above function is the “OR-NOT” link operator function:

\[f_{OR-NOT}(x,y) = \left(\bigvee_{i=1}^{m} x_i\right) \lor \lnot \left(\bigvee_{j=1}^{k} y_j\right)\]

Note that the presence of the link operator is what forces the condition \(m,k \ge 1\) (at least one regulator in each category). For the rest of this work, we will not consider BRFs with only one type of regulator, since these can be represented by simple logical functions without loss of interpretability or biological consistency. Following the notation introduced in (Mendoza and Xenarios 2006), we have the case of only activators:

\[f(x)=\bigvee_{i=1}^{m} x_i\] where the presence of at least one activator makes the function output TRUE (active). Similarly, in the case of only inhibitors we have:

\[f(y)=\lnot \bigvee_{j=1}^{k} y_j=\bigwedge_{j=1}^{k} \lnot y_j\] where the presence of at least one inhibitor is enough to make the target inhibited (FALSE).

Borrowing notation from circuit theory, we could use other link operators like the NAND, NOR, XNOR (\(\odot\)) gates, combined with the NOT operator or not. Note that the operator used to connect the same type of regulators (e.g. the activators) is usually the OR, but other operators could be used as well.

For example, another link operator function is the “Pairs” function:

\[f_{Pairs}(x,y) = \bigvee_{\forall (i,j)}^{m,k}(x_i\land \lnot y_j) = \left(\bigvee_{i=1}^{m} x_i\right) \land \left(\bigvee_{j=1}^{k} \lnot y_j\right)\]

The intuition behind the name is derived from the fact that the function will return TRUE if there is at least one pair of regulators consisting of a present activator and an absent inhibitor. The function is also expressed in conjunction normal form (CNF), which is the reverse of the DNF: two separate clauses are connected with AND’s (\(\land\)) and inside the clauses the literals are connected with OR’s (\(\lor\)).

Cury, Monteiro, and Chaouiya (2019) defined the consistent Boolean regulatory functions and their respective complete DNF forms (CDNF).

The link operator functions listed above are a subset of these functions, since they respect the regulatory structure in their respective CDNF forms (for the Pairs function it’s evident since it’s expressed first in the CDNF form - for the AND-NOT and OR-NOT functions, see their respective (complete) DNF forms in the truth density proofs 1 and 2.

Threshold functions

Boolean threshold functions are a special kind of Boolean functions for which the output depends on the condition that the sum of (possibly weighted) activities of the input regulators surpass a given threshold value (Crama and Hammer 2011).

In this work, we will use two simple threshold functions, which both output \(1\) when the number of present activators is larger than the number of present inhibitors. As such, the activators and inhibitors are combined in an additive manner, with their respective assigned weights set to \(\pm1\) and the threshold parameter to \(0\), formulating thus a majority rule which defines the value of the function (Bornholdt 2008; Chaouiya, Ourrad, and Lima 2013). These functions differ with regards to their output when there is balance between the activities of the positive and negative regulators: the first outputs \(1\) (the activators “win”) while the second outputs \(0\) (the inhibitors “win”).

\[f_{Act-win}(x,y)=\begin{cases} 1, & \text{for } \sum_{i=1}^{m} x_i \ge \sum_{j=1}^{k} y_j\\ 0, & \text{otherwise} \end{cases}\] \[f_{Inh-win}(x,y)=\begin{cases} 1, & \text{for } \sum_{i=1}^{m} x_i \gt \sum_{j=1}^{k} y_j\\ 0, & \text{otherwise} \end{cases}\]

Note that: \(f_{Inh-win}(x,y) = \lnot f_{Act-win}(y,x)\).

Note that every Boolean threshold function has an equivalent combinatorial expression and I searched for an analytic formula for the two last threshold functions. More more info, see this math.stackexchange question.