Everything should be made as simple as possible, but not simpler. (Albert Einstein)

Thursday, December 8, 2016

Probabilistic Graphical Models: Bayesian Networks Example With R, Python, SAMIAM

Here I try to practice examples of Bayesian networks as explained in the book written by Stanford Professor Daphne Koller and Hebrew University Professor Nir Friedman, “Probabilistic Graphical Models: Principles and Techniques”, chapter 3.2.

In the book, the final probability values are given as they are, without derivation of the formulas. Here I try to expand the formulas, as well as to verify the results with the help of R, Python, and SamIam.

Course: PGM Specialization (MATLAB / Octave). 
Tools: R, Python, SamIam.  

The DAG of the student BN example is:

[Figure: the student network DAG, with edges D → G, I → G, I → S, G → L]
The domains of the random variables D, I, S, and L are binary-valued {0, 1}, while the domain of G is {1, 2, 3}, representing the {A, B, C} grades.


The CPDs are given as tables in the book; the same numbers appear in the R and Python code below.

From the chain rule, I have the joint probability distribution,

P(I,D,G,S,L) = P(I) P(D) P(G|I,D) P(S|I) P(L|G)
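For example, one entry of the joint distribution, for an intelligent student (I=1) in an easy class (D=0) who gets a B (G=2), a good SAT score (S=1), and a weak letter (L=0), using the CPT values listed in the code below:

P(I=1, D=0, G=2, S=1, L=0) = P(I=1) P(D=0) P(G=2|I=1,D=0) P(S=1|I=1) P(L=0|G=2)
                           = 0.3 × 0.6 × 0.08 × 0.8 × 0.4
                           = 0.004608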

Now I want to do some causal reasoning, step by step for clarity.
P(L=1) represents how likely a student is to get a strong recommendation letter.
Marginalizing the joint probability distribution over D, I, S, and G:

P(L=1) = Σ_D Σ_I Σ_G Σ_S P(D) P(I) P(G|I,D) P(S|I) P(L=1|G)
       = Σ_G P(L=1|G) Σ_D Σ_I P(D) P(I) P(G|I,D)        (since Σ_S P(S|I) = 1)
       = Σ_G P(L=1|G) P(G)

with

P(G=1) = 0.6×0.7×0.3 + 0.4×0.7×0.05 + 0.6×0.3×0.9 + 0.4×0.3×0.5 = 0.362
P(G=2) = 0.6×0.7×0.4 + 0.4×0.7×0.25 + 0.6×0.3×0.08 + 0.4×0.3×0.3 = 0.2884
P(G=3) = 0.6×0.7×0.3 + 0.4×0.7×0.7 + 0.6×0.3×0.02 + 0.4×0.3×0.2 = 0.3496

so that

P(L=1) = 0.9×0.362 + 0.6×0.2884 + 0.01×0.3496 = 0.502336
So, without knowing anything about the student’s records, the probability of obtaining a strong recommendation letter is 50.2336%. That kind of manual calculation is tedious :(


R version:   


With the power of that magical thing called a “computer” :D, I can enjoy a faster calculation in R,

library(gRbase)
library(gRain)       # cptable, compileCPT, grain, setEvidence, querygrain
library(Rgraphviz)
g1 = dag(~ D + I + G:D:I + S:I + L:G)
plot(g1)



# define BN:
val = c("false", "true")
valg = c("A", "B", "C")
D = cptable(~D, values=c(60,40), levels=val)
I = cptable(~I, values=c(70,30), levels=val)
S = cptable(~S|I, values=c(95,5,20,80), levels=val)
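# for G, the values are grouped by grade (A, B, C) per parent combination,
# with the first-listed parent D varying fastest:
# (D=f,I=f), (D=t,I=f), (D=f,I=t), (D=t,I=t)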
G = cptable(~G|D:I, values=c(30,40,30,5,25,70,90,8,2,50,30,20), levels=valg)
L = cptable(~L|G, values=c(10,90,40,60,99,1), levels=val)
plist = compileCPT(list(D,I,S,G,L))

plist
CPTspec with probabilities:
 P( D )
 P( I )
 P( S | I )
 P( G | D I )
 P( L | G )

plist$D
D
false  true
  0.6   0.4
attr(,"class")
[1] "parray" "array"

plist$I
I
false  true
  0.7   0.3
attr(,"class")
[1] "parray" "array"

plist$S
       I
S       false true
  false  0.95  0.2
  true   0.05  0.8
attr(,"class")
[1] "parray" "array"

plist$G 
I = false
   D
G   false true
  A   0.3 0.05
  B   0.4 0.25
  C   0.3 0.70
I = true
   D
G   false true
  A  0.90  0.5
  B  0.08  0.3
  C  0.02  0.2
attr(,"class")
[1] "parray" "array"

plist$L
       G
L         A   B    C
  false 0.1 0.4 0.99
  true  0.9 0.6 0.01
attr(,"class")
[1] "parray" "array"



Doing inference in R: 


 

net = grain(plist)
querygrain(net, nodes=c("L"))
$L
L
   false     true
0.497664 0.502336

Note that I get the same value as the previous manual calculation (marginalization):
P(L=1) = 0.502336.


P(L=1 | I=0) inference:

 
Then, if I know the student is not so intelligent (I=0), the probability of interest is P(L=1 | I=0).
From Bayes’ rule, 

P(L=1 | I=0) = P(L=1, I=0) / P(I=0)

From the marginalization above, with I fixed to 0, the remaining sum gives P(L=1, I=0):

P(L=1, I=0) = P(I=0) Σ_D Σ_G P(D) P(G|I=0,D) P(L=1|G)

so the P(I=0) factor cancels, and

P(L=1 | I=0) = Σ_D Σ_G P(D) P(G|I=0,D) P(L=1|G)
             = 0.9×(0.6×0.3 + 0.4×0.05) + 0.6×(0.6×0.4 + 0.4×0.25) + 0.01×(0.6×0.3 + 0.4×0.7)
             = 0.9×0.20 + 0.6×0.34 + 0.01×0.46
             = 0.3886
In R, 

 
net2 = setEvidence(net, evidence=list(I="false"))
querygrain(net2, nodes=c("L"))
$L
L
 false   true
0.6114 0.3886

So, the probability of obtaining a strong letter goes down to 0.3886 when the student is not so intelligent.
Furthermore, if the class is easy (D=0), the probability is P(L=1 | I=0, D=0):

net3 = setEvidence(net, evidence=list(I="false", D="false"))
querygrain(net3, nodes=c("L"))
$L
L
false  true
0.487 0.513
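
A quick check by hand: with both I and D observed, only the (D=0, I=0) row of G’s CPT matters, so

P(L=1 | I=0, D=0) = Σ_G P(G|I=0,D=0) P(L=1|G) = 0.3×0.9 + 0.4×0.6 + 0.3×0.01 = 0.513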

As mentioned previously, what I have done so far is causal reasoning, or prediction: the query flows downstream. Given various factors, e.g. the class’s difficulty and the student’s intelligence, I can predict the downstream effects of those factors.

Another kind of reasoning is evidential reasoning, or explanation.
 

As an example, I have these facts:

  • A company wants to hire a student.
  • A priori, the company assumes the student is intelligent with probability 30% (as in the CPT above, P(I=1) = 0.3).
  • The company learns that the student’s grade is “C” (G=3).
  • Knowing the “C” grade, the probability that the student is intelligent drops significantly to 7.9%, that is P(I=1 | G=3) = 0.079 (derived below).
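
By Bayes’ rule, reusing P(G=3) = 0.3496 from the marginalization above:

P(I=1 | G=3) = P(G=3|I=1) P(I=1) / P(G=3)

where P(G=3|I=1) = Σ_D P(D) P(G=3|I=1,D) = 0.6×0.02 + 0.4×0.2 = 0.092, so

P(I=1 | G=3) = 0.092 × 0.3 / 0.3496 = 0.0276 / 0.3496 ≈ 0.079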

In R,  


 

net4 = setEvidence(net, evidence=list(G="C"))
querygrain(net4, nodes=c("I"))

$I
I
     false       true
0.92105263 0.07894737

 

If, however, the student’s SAT score is good (S=1), then the probability that the student is intelligent goes up dramatically to 57.8%, that is P(I=1 | G=3, S=1) = 0.578:

net5 = setEvidence(net, evidence=list(G="C", S="true"))
querygrain(net5, nodes=c("I"))
 


$I
I
    false      true
0.4216867 0.5783133
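
By hand, the SAT evidence just multiplies each hypothesis about I by P(S=1|I):

P(I=1 | G=3, S=1) ∝ P(I=1) P(G=3|I=1) P(S=1|I=1) = 0.3 × 0.092 × 0.8 = 0.02208
P(I=0 | G=3, S=1) ∝ P(I=0) P(G=3|I=0) P(S=1|I=0) = 0.7 × 0.46 × 0.05 = 0.0161

where P(G=3|I=0) = 0.6×0.3 + 0.4×0.7 = 0.46; normalizing gives 0.02208 / (0.02208 + 0.0161) ≈ 0.578.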

 

One more type of reasoning is intercausal reasoning, which is a very common pattern in human reasoning. In this case, there is interaction among different causes of the same effect.

An example of intercausal reasoning is when (1) the student’s grade is “C” and (2) the class is difficult. The probability that the student is intelligent goes up to P(I=1 | G=3, D=1) = 11%,
 

net6 = setEvidence(net, evidence=list(G="C", D="true"))
querygrain(net6, nodes=c("I"))
$I
I
    false      true
0.8909091 0.1090909
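
The same number by hand: with D fixed to 1, only the D=1 columns of G’s CPT matter, so

P(I=1 | G=3, D=1) = P(I=1) P(G=3|I=1,D=1) / Σ_I P(I) P(G=3|I,D=1)
                  = 0.3 × 0.2 / (0.3 × 0.2 + 0.7 × 0.7)
                  = 0.06 / 0.55 ≈ 0.109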

 

Intuitively, because the class is difficult, it is more likely that the student gets a “C” grade; the difficulty partially “explains away” the bad grade, resulting in a slightly better probability that the student is intelligent, from 7.9% to 11%.


Python

 

There are several PGM Python packages available; two of them are libpgm and pgmpy.
This is an example of using pgmpy, with the same BN as above:

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import BeliefPropagation, VariableElimination

model = BayesianModel()
model.add_nodes_from(['D', 'I', 'G', 'S', 'L'])
model.add_edges_from([('D', 'G'), ('I', 'G'), ('I', 'S'), ('G', 'L')])

# conditional probability distribution
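# note: the columns of a TabularCPD run over the combinations of the
# evidence variables, with the last evidence variable varying fastest;
# e.g. for G the columns are (I=0,D=0), (I=0,D=1), (I=1,D=0), (I=1,D=1)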
cpd_D = TabularCPD('D', 2, [[0.6], [0.4]])
cpd_I = TabularCPD('I', 2, [[0.7], [0.3]])
cpd_S = TabularCPD('S', 2, [[0.95, 0.2], [0.05, 0.8]],
                   evidence=['I'], evidence_card=[2])
cpd_G = TabularCPD('G', 3, [[0.3, 0.05, 0.9, 0.5],
                            [0.4, 0.25, 0.08, 0.3],
                            [0.3, 0.7, 0.02, 0.2]],
                   evidence=['I', 'D'],
                   evidence_card=[2, 2])
cpd_L = TabularCPD('L', 2, [[0.1, 0.4, 0.99], [0.9, 0.6, 0.01]],
                   evidence=['G'], evidence_card=[3])

print('D', cpd_D.values)
print('I', cpd_I.values)
print('S', cpd_S.values)
print('G', cpd_G.values)
print('L', cpd_L.values)

model.add_cpds(cpd_D, cpd_I, cpd_S, cpd_G, cpd_L)
print(model.check_model())   # True if the CPDs are consistent with the graph

# inference by belief propagation
bp = BeliefPropagation(model)
q1 = bp.query(variables=['L'], evidence={})
print(q1['L'])

q2 = bp.query(variables=['L'], evidence={'I': 0})
print(q2['L'])

# inference by variable elimination (gives the same answers)
ve = VariableElimination(model)
q1 = ve.query(variables=['L'], evidence={})
print(q1['L'])

q2 = ve.query(variables=['L'], evidence={'I': 0})
print(q2['L'])


+-----+----------+
| L   |   phi(L) |
|-----+----------|
| L_0 |   0.4977 |
| L_1 |   0.5023 |
+-----+----------+
+-----+----------+
| L   |   phi(L) |
|-----+----------|
| L_0 |   0.6114 |
| L_1 |   0.3886 |
+-----+----------+


The results of the causal queries are the same as the previous calculations in R: 50.23% and 38.86%.

An example using libpgm, for the same BN:

from libpgm.graphskeleton import GraphSkeleton
from libpgm.nodedata import NodeData
from libpgm.discretebayesiannetwork import DiscreteBayesianNetwork
from libpgm.tablecpdfactorization import TableCPDFactorization

skel = GraphSkeleton()
skel.V = ['D', 'I', 'G', 'S', 'L']
skel.E = [('D', 'G'), ('I', 'G'), ('I', 'S'), ('G', 'L')]
skel.toporder()   # libpgm expects the vertices in topological order
nd = NodeData()
nd.Vdata = {"D": {
                "ord": 0,
                "numoutcomes": 2,
                "vals": ["0", "1"],
                "parents": None,
                "children": ['G'],
                "cprob": [0.6, 0.4]
            },
            "I": {
                "ord": 1,
                "numoutcomes": 2,
                "vals": ["0", "1"],
                "parents": None,
                "children": ['G'],
                "cprob": [0.7, 0.3]
            },
            "S": {
                "ord": 2,
                "numoutcomes": 2,
                "vals": ["0", "1"],
                "parents": ['I'],
                "children": None,
                "cprob": {
                    "['0']": [0.95, .05],
                    "['1']": [.2, .8]
                }
            },
            "G": {
                "ord": 3,
                "numoutcomes": 3,
                "vals": ["1", "2", "3"],
                "parents": ['I', 'D'],
                "children": ['L'],
                "cprob": {
                    "['0', '0']": [.3, .4, .3],
                    "['0', '1']": [.05, .25, .7],
                    "['1', '0']": [.9, .08, .02],
                    "['1', '1']": [.5, .3, .2]
                }
            },
            "L": {
                "ord": 4,
                "numoutcomes": 2,
                "vals": ["0", "1"],
                "parents": ['G'],
                "children": None,
                "cprob": {
                    "['1']": [0.1, 0.9],
                    "['2']": [0.4, 0.6],
                    "['3']": [0.99, 0.01]
                }
            }
           }
          
# load bayesian network
bn = DiscreteBayesianNetwork(skel, nd)
cpd_table = TableCPDFactorization(bn)

# query: specificquery returns the probability of the queried assignment
q1 = cpd_table.specificquery(query={"L": ["1"]}, evidence={})
print(q1)

# the factorization is consumed by a query, so refresh it before the next one
cpd_table.refresh()
q2 = cpd_table.specificquery(query={"L": ["1"]}, evidence={"I": "0"})
print(q2)

0.502336
0.3886
 


Results, as expected, are the same ;) 

With these Python packages, I can also read the BN from a file: JSON (libpgm), or formats such as BIF and XMLBIF (pgmpy). Read the docs ;)
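
For instance, a minimal sketch with pgmpy’s BIF reader (the file name student.bif is hypothetical; it would contain this same network in BIF format):

from pgmpy.readwrite import BIFReader

# parse the BIF file and build the corresponding model
reader = BIFReader('student.bif')
model = reader.get_model()
print(model.nodes())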


SAMIAM 

 

SamIam is a comprehensive GUI tool for modeling and reasoning with Bayesian networks, developed by the Automated Reasoning Group of Professor Adnan Darwiche at UCLA.

With SamIam, we can construct a Bayesian network in a WYSIWYG fashion (what you see is what you get).

From the same BN as above, I have the following screenshots of the GUI:

[Screenshots: the student network and its query monitors in SamIam]
The results of the queries are just the same.
Without knowing anything about the student:

We see probability = 50.23%, the same as the previous calculations in R and Python.
Setting the evidence that the student is not so intelligent (I=0):

Again the same probability as the previous calculation, 38.86%.
If the class is also assumed to be easy (D=0), querying P(L=1 | I=0, D=0):

The same result as the previous calculations, 51.30%.

Evidential reasoning also results in the same probabilities.
P(I=1 | G=3) = 0.079:


P(I=1 | G=3, S=1) = 0.578:


In the example of evidential reasoning P(I=1 | G=3), note that S (SAT) is d-connected to D given G: on the path D → G ← I → S, G is a collider that is in the conditioning set (G is observed), so the path is an active trail. Likewise, changing the evidence on G changes the posterior probability of S, so G and S are dependent. Nevertheless, in the calculation of P(I=1 | G=3) = 0.079, the summation of P(S | I) over S adds up to 1, so S can be dropped from the equation.
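
This is easy to check with the pgmpy BeliefPropagation object bp from above. From the CPTs, P(S=1) = 0.7×0.05 + 0.3×0.8 = 0.275 without evidence, and conditioning on G=3 (the “C” grade, state index 2) lowers it to about 0.109, because the “C” lowers the belief in I:

q = bp.query(variables=['S'], evidence={})
print(q['S'])                    # P(S_1) = 0.275
q = bp.query(variables=['S'], evidence={'G': 2})
print(q['S'])                    # P(S_1) ≈ 0.109, given G=C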

For more in-depth intuitive explanations, refer to the textbook and Prof. Daphne Koller’s video lectures. There are more kinds of reasoning as well: interventional reasoning and counterfactual reasoning (chap. 21).
;)