In the first part of this post I gave an example of a trivial NB classifier which, according to Harry Zhang's article, is an optimal solution despite the presence of attribute dependencies. Now let's play with it. First, we will consider two classifiers, a NB classifier and the corresponding full Bayesian network, and see that they are equivalent. Second, we will consider another case of local dependency in which the NB classifier is not an optimal solution.
Let's run our NB classifier on all possible combinations of W1 and W2:
W1, W2 | C=T score | C=F score | Classifier's decision |
---|---|---|---|
W1=T, W2=T | 0.018 | 0.048 | F |
W1=T, W2=F | 0.042 | 0.072 | F |
W1=F, W2=T | 0.162 | 0.112 | T |
W1=F, W2=F | 0.378 | 0.168 | T |
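The table above can be reproduced with a short script. The parameter values below (P(C=T)=0.6, P(W1=T|C=T)=0.1, and so on) are reconstructed from the tables in the first part, so treat them as this example's data rather than anything general:

```python
# Naive Bayes score: P(C) * P(W1|C) * P(W2|C), treating W1 and W2
# as conditionally independent given C.
# Parameter values reconstructed from the first part of the post.
p_c = {True: 0.6, False: 0.4}
p_w1 = {True: {True: 0.1, False: 0.9},   # p_w1[c][w1] = P(W1=w1 | C=c)
        False: {True: 0.3, False: 0.7}}
p_w2 = {True: {True: 0.3, False: 0.7},   # p_w2[c][w2] = P(W2=w2 | C=c)
        False: {True: 0.4, False: 0.6}}

def nb_score(c, w1, w2):
    return p_c[c] * p_w1[c][w1] * p_w2[c][w2]

def nb_decide(w1, w2):
    # Pick the category with the larger score.
    return nb_score(True, w1, w2) > nb_score(False, w1, w2)

for w1 in (True, False):
    for w2 in (True, False):
        print(w1, w2, round(nb_score(True, w1, w2), 3),
              round(nb_score(False, w1, w2), 3), nb_decide(w1, w2))
```

Running it prints the four rows of the table above.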
See the example calculation for the first row in the previous part. Now let's calculate the probabilities for the full Bayesian network. For convenience, I will duplicate the joint distribution from the first part:
C | W1 | W2 | P(C,W1,W2) |
---|---|---|---|
T | T | T | 0.024 |
T | T | F | 0.036 |
T | F | T | 0.162 |
T | F | F | 0.378 |
F | T | T | 0.06 |
F | T | F | 0.06 |
F | F | T | 0.112 |
F | F | F | 0.168 |
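As a sketch, the joint factorizes along the network structure as P(C, W1, W2) = P(C) · P(W1 | C) · P(W2 | C, W1); unlike NB, here W2 is allowed to depend on W1. The CPT values below are reconstructed from the joint table above (e.g. P(W2=T | C=T, W1=T) = 0.024 / 0.06 = 0.4), so they are my reading of the first part's data:

```python
# Full Bayesian network joint: P(C, W1, W2) = P(C) * P(W1|C) * P(W2|C, W1).
p_c = {True: 0.6, False: 0.4}
p_w1 = {True: {True: 0.1, False: 0.9},
        False: {True: 0.3, False: 0.7}}
# p_w2[(c, w1)][w2] = P(W2=w2 | C=c, W1=w1), reconstructed from the table above
p_w2 = {(True, True):  {True: 0.4, False: 0.6},
        (True, False): {True: 0.3, False: 0.7},
        (False, True): {True: 0.5, False: 0.5},
        (False, False): {True: 0.4, False: 0.6}}

def joint(c, w1, w2):
    return p_c[c] * p_w1[c][w1] * p_w2[(c, w1)][w2]
```

For example, `joint(True, True, True)` gives the first row, 0.024.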
Then we can calculate category probabilities as follows:
W1, W2 | P(C=T \| W1, W2) | P(C=F \| W1, W2) | Classifier's decision |
---|---|---|---|
W1=T, W2=T | 0.29 | 0.71 | F |
W1=T, W2=F | 0.38 | 0.62 | F |
W1=F, W2=T | 0.59 | 0.41 | T |
W1=F, W2=F | 0.69 | 0.31 | T |
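These posteriors follow from the joint table by simple normalization; a minimal sketch, with the joint values copied from the table above:

```python
# P(C=T | w1, w2) = P(C=T, w1, w2) / (P(C=T, w1, w2) + P(C=F, w1, w2))
joint = {(True, True, True): 0.024, (True, True, False): 0.036,
         (True, False, True): 0.162, (True, False, False): 0.378,
         (False, True, True): 0.06, (False, True, False): 0.06,
         (False, False, True): 0.112, (False, False, False): 0.168}

def posterior_true(w1, w2):
    pt = joint[(True, w1, w2)]
    pf = joint[(False, w1, w2)]
    return pt / (pt + pf)
```

For example, `posterior_true(True, True)` is 0.024 / (0.024 + 0.06) ≈ 0.29, matching the first row.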
These classifiers appear to be equivalent. But what if the dependence of W2 on W1 had different conditional probabilities? Assume the following conditional probability table instead:
C | W1 | W2=T | W2=F |
---|---|---|---|
C=T | W1=T | 0.2 | 0.8 |
C=T | W1=F | 0.2 | 0.8 |
C=F | W1=T | 0.5 | 0.5 |
C=F | W1=F | 0.4 | 0.6 |
How will this affect the NB classifier? The updated ddr values and the joint distribution table are below:
C | W1 | W2 | P(C,W1,W2) |
---|---|---|---|
T | T | T | 0.012 |
T | T | F | 0.048 |
T | F | T | 0.108 |
T | F | F | 0.432 |
F | T | T | 0.06 |
F | T | F | 0.06 |
F | F | T | 0.112 |
F | F | F | 0.168 |
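A sketch of the recomputation, using the modified CPT from above; note that only the C=T rows of the joint change, since the C=F half of the CPT stayed the same (the other parameters are the ones reconstructed from the first part):

```python
# Recompute the joint with the modified CPT for W2:
# P(C, W1, W2) = P(C) * P(W1|C) * P(W2|C, W1)
p_c = {True: 0.6, False: 0.4}
p_w1 = {True: {True: 0.1, False: 0.9},
        False: {True: 0.3, False: 0.7}}
# Modified CPT: the C=T rows changed, the C=F rows are as before
p_w2_new = {(True, True):  {True: 0.2, False: 0.8},
            (True, False): {True: 0.2, False: 0.8},
            (False, True): {True: 0.5, False: 0.5},
            (False, False): {True: 0.4, False: 0.6}}

def joint_new(c, w1, w2):
    return p_c[c] * p_w1[c][w1] * p_w2_new[(c, w1)][w2]
```

For example, `joint_new(True, True, True)` = 0.6 · 0.1 · 0.2 = 0.012, the first row of the table.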
Now the ddr values differ significantly from 1, so the local dependence is not distributed evenly and the NB classifier might produce wrong results. Let's recalculate the table for the Bayesian network:
W1, W2 | P(C=T \| W1, W2) | P(C=F \| W1, W2) | Classifier's decision |
---|---|---|---|
W1=T, W2=T | 0.17 | 0.83 | F |
W1=T, W2=F | 0.44 | 0.56 | F |
W1=F, W2=T | 0.49 | 0.51 | F |
W1=F, W2=F | 0.72 | 0.28 | T |
We can see that the decision for the W1=F, W2=T case has changed, reflecting the change in the local dependency. Consequently, in this case the NB classifier is not an optimal solution.
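To make the disagreement concrete, here is a small sketch comparing the NB classifier (with its parameters from the first part left unchanged) against the exact posterior under the modified joint distribution; the NB parameter values are the ones reconstructed earlier in this post:

```python
# NB parameters reconstructed from the first part: P(C), P(W1|C), P(W2|C)
nb_p_c = {True: 0.6, False: 0.4}
nb_p_w1 = {True: {True: 0.1, False: 0.9}, False: {True: 0.3, False: 0.7}}
nb_p_w2 = {True: {True: 0.3, False: 0.7}, False: {True: 0.4, False: 0.6}}

# Joint distribution under the modified CPT, copied from the table above
joint_new = {(True, True, True): 0.012, (True, True, False): 0.048,
             (True, False, True): 0.108, (True, False, False): 0.432,
             (False, True, True): 0.06, (False, True, False): 0.06,
             (False, False, True): 0.112, (False, False, False): 0.168}

def nb_decide(w1, w2):
    score = lambda c: nb_p_c[c] * nb_p_w1[c][w1] * nb_p_w2[c][w2]
    return score(True) > score(False)

def bn_decide(w1, w2):
    # Exact decision: compare P(C=T, w1, w2) against P(C=F, w1, w2)
    return joint_new[(True, w1, w2)] > joint_new[(False, w1, w2)]

print(nb_decide(False, True), bn_decide(False, True))  # True False: they disagree
```

The NB classifier still answers T for W1=F, W2=T, while the exact posterior says F, which is exactly the disagreement in the table above.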