In the first part of this post I gave an example of a trivial NB classifier which, according to Harry Zhang's article, is an optimal solution despite the presence of attribute dependencies. Now let's play with it. First, we will consider two classifiers, a NB classifier and the corresponding full Bayesian network, and see that they are equivalent. Second, we will consider another case of local dependency in which the NB classifier is not an optimal solution.
Let's run our NB classifier on all possible combinations of W1 and W2:
W1, W2 | C=T score | C=F score | Classifier's decision |
---|---|---|---|
W1=T, W2=T | 0.018 | 0.048 | F |
W1=T, W2=F | 0.042 | 0.072 | F |
W1=F, W2=T | 0.162 | 0.112 | T |
W1=F, W2=F | 0.378 | 0.168 | T |
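The table above can be reproduced with a short script. The parameter values below (P(C=T)=0.6, P(W1=T|C=T)=0.1, and so on) are reconstructed from the tables in the first part, so treat them as this example's data rather than anything general:

```python
# Naive Bayes score: P(C) * P(W1|C) * P(W2|C), treating W1 and W2
# as conditionally independent given C.
# Parameter values reconstructed from the first part of the post.
p_c = {True: 0.6, False: 0.4}
p_w1 = {True: {True: 0.1, False: 0.9},   # p_w1[c][w1] = P(W1=w1 | C=c)
        False: {True: 0.3, False: 0.7}}
p_w2 = {True: {True: 0.3, False: 0.7},   # p_w2[c][w2] = P(W2=w2 | C=c)
        False: {True: 0.4, False: 0.6}}

def nb_score(c, w1, w2):
    return p_c[c] * p_w1[c][w1] * p_w2[c][w2]

def nb_decide(w1, w2):
    # Pick the category with the larger score.
    return nb_score(True, w1, w2) > nb_score(False, w1, w2)

for w1 in (True, False):
    for w2 in (True, False):
        print(w1, w2, round(nb_score(True, w1, w2), 3),
              round(nb_score(False, w1, w2), 3), nb_decide(w1, w2))
```

Running it prints the four rows of the table above.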
See the example calculation for the first row in the previous part. Now let's calculate the probabilities for the full Bayesian network. For convenience, I will duplicate the joint distribution from the first part:
C | W1 | W2 | P(C,W1,W2) |
---|---|---|---|
T | T | T | 0.024 |
T | T | F | 0.036 |
T | F | T | 0.162 |
T | F | F | 0.378 |
F | T | T | 0.06 |
F | T | F | 0.06 |
F | F | T | 0.112 |
F | F | F | 0.168 |
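As a sketch, the joint factorizes along the network structure as P(C, W1, W2) = P(C) · P(W1 | C) · P(W2 | C, W1); unlike NB, here W2 is allowed to depend on W1. The CPT values below are reconstructed from the joint table above (e.g. P(W2=T | C=T, W1=T) = 0.024 / 0.06 = 0.4), so they are my reading of the first part's data:

```python
# Full Bayesian network joint: P(C, W1, W2) = P(C) * P(W1|C) * P(W2|C, W1).
p_c = {True: 0.6, False: 0.4}
p_w1 = {True: {True: 0.1, False: 0.9},
        False: {True: 0.3, False: 0.7}}
# p_w2[(c, w1)][w2] = P(W2=w2 | C=c, W1=w1), reconstructed from the table above
p_w2 = {(True, True):  {True: 0.4, False: 0.6},
        (True, False): {True: 0.3, False: 0.7},
        (False, True): {True: 0.5, False: 0.5},
        (False, False): {True: 0.4, False: 0.6}}

def joint(c, w1, w2):
    return p_c[c] * p_w1[c][w1] * p_w2[(c, w1)][w2]
```

For example, `joint(True, True, True)` gives the first row, 0.024.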
Then we can calculate category probabilities as follows:
W1, W2 | P(C=T \| W1, W2) | P(C=F \| W1, W2) | Classifier's decision |
---|---|---|---|
W1=T, W2=T | 0.29 | 0.71 | F |
W1=T, W2=F | 0.38 | 0.62 | F |
W1=F, W2=T | 0.59 | 0.41 | T |
W1=F, W2=F | 0.69 | 0.31 | T |
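These posteriors follow from the joint table by simple normalization; a minimal sketch, with the joint values copied from the table above:

```python
# P(C=T | w1, w2) = P(C=T, w1, w2) / (P(C=T, w1, w2) + P(C=F, w1, w2))
joint = {(True, True, True): 0.024, (True, True, False): 0.036,
         (True, False, True): 0.162, (True, False, False): 0.378,
         (False, True, True): 0.06, (False, True, False): 0.06,
         (False, False, True): 0.112, (False, False, False): 0.168}

def posterior_true(w1, w2):
    pt = joint[(True, w1, w2)]
    pf = joint[(False, w1, w2)]
    return pt / (pt + pf)
```

For example, `posterior_true(True, True)` is 0.024 / (0.024 + 0.06) ≈ 0.29, matching the first row.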
These classifiers appear to be equivalent. But what if the dependence of W2 on W1 had different conditional probabilities? Assume the following conditional probability table instead:
C | W1 | W2=T | W2=F |
---|---|---|---|
C=T | W1=T | 0.2 | 0.8 |
C=T | W1=F | 0.2 | 0.8 |
C=F | W1=T | 0.5 | 0.5 |
C=F | W1=F | 0.4 | 0.6 |
How will this affect the NB classifier? The updated ddr values and the joint distribution table are below:
C | W1 | W2 | P(C,W1,W2) |
---|---|---|---|
T | T | T | 0.012 |
T | T | F | 0.048 |
T | F | T | 0.108 |
T | F | F | 0.432 |
F | T | T | 0.06 |
F | T | F | 0.06 |
F | F | T | 0.112 |
F | F | F | 0.168 |
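A sketch of the recomputation, using the modified CPT from above; note that only the C=T rows of the joint change, since the C=F half of the CPT stayed the same (the other parameters are the ones reconstructed from the first part):

```python
# Recompute the joint with the modified CPT for W2:
# P(C, W1, W2) = P(C) * P(W1|C) * P(W2|C, W1)
p_c = {True: 0.6, False: 0.4}
p_w1 = {True: {True: 0.1, False: 0.9},
        False: {True: 0.3, False: 0.7}}
# Modified CPT: the C=T rows changed, the C=F rows are as before
p_w2_new = {(True, True):  {True: 0.2, False: 0.8},
            (True, False): {True: 0.2, False: 0.8},
            (False, True): {True: 0.5, False: 0.5},
            (False, False): {True: 0.4, False: 0.6}}

def joint_new(c, w1, w2):
    return p_c[c] * p_w1[c][w1] * p_w2_new[(c, w1)][w2]
```

For example, `joint_new(True, True, True)` = 0.6 · 0.1 · 0.2 = 0.012, the first row of the table.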
Now the ddr values differ significantly from 1, so the local dependence is not distributed evenly and the NB classifier might produce wrong results. Let's recalculate the table for the Bayesian network:
W1, W2 | P(C=T \| W1, W2) | P(C=F \| W1, W2) | Classifier's decision |
---|---|---|---|
W1=T, W2=T | 0.17 | 0.83 | F |
W1=T, W2=F | 0.44 | 0.56 | F |
W1=F, W2=T | 0.49 | 0.51 | F |
W1=F, W2=F | 0.72 | 0.28 | T |
We can see that the decision for the W1=F, W2=T case has changed, reflecting the change in the local dependency. Consequently, in this case the NB classifier is not an optimal solution.
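To make the disagreement concrete, here is a small sketch comparing the NB classifier (with its parameters from the first part left unchanged) against the exact posterior under the modified joint distribution; the NB parameter values are the ones reconstructed earlier in this post:

```python
# NB parameters reconstructed from the first part: P(C), P(W1|C), P(W2|C)
nb_p_c = {True: 0.6, False: 0.4}
nb_p_w1 = {True: {True: 0.1, False: 0.9}, False: {True: 0.3, False: 0.7}}
nb_p_w2 = {True: {True: 0.3, False: 0.7}, False: {True: 0.4, False: 0.6}}

# Joint distribution under the modified CPT, copied from the table above
joint_new = {(True, True, True): 0.012, (True, True, False): 0.048,
             (True, False, True): 0.108, (True, False, False): 0.432,
             (False, True, True): 0.06, (False, True, False): 0.06,
             (False, False, True): 0.112, (False, False, False): 0.168}

def nb_decide(w1, w2):
    score = lambda c: nb_p_c[c] * nb_p_w1[c][w1] * nb_p_w2[c][w2]
    return score(True) > score(False)

def bn_decide(w1, w2):
    # Exact decision: compare P(C=T, w1, w2) against P(C=F, w1, w2)
    return joint_new[(True, w1, w2)] > joint_new[(False, w1, w2)]

print(nb_decide(False, True), bn_decide(False, True))  # True False: they disagree
```

The NB classifier still answers T for W1=F, W2=T, while the exact posterior says F, which is exactly the disagreement in the table above.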