Check Conditional Independence…

The calculation of weights of evidence assumes conditional independence among the evidential themes input to the model. One technique to assess the conditional independence between pairs of binary evidential themes is to calculate a Chi-squared statistic to assess the variation between the expected and observed occurrences on and off the patterns in the two themes.

Pairwise test for conditional independence

The pairwise test between two evidential themes involves a contingency table calculation, applicable only to locations at which training points occur. The rows of the contingency table are the classes of one theme, and the columns of the contingency table are the classes of the second theme. Each cell (i,j) of the table records the number of training points occurring for a specific overlap of the i-th class of theme 1 and the j-th class of theme 2. Note that these numbers are independent of the unit cell area, because they are counts of points.

The calculation of chi-squared involves estimating the expected number of training points in each cell, under the assumption that theme 1 is independent of theme 2. The expected value in cell (i,j) is the product of the row i and column j marginal point totals, divided by the grand total number of points. Chi-squared measures the differences between the observed and expected frequencies: the squared difference in each cell is divided by the expected frequency, and the results are summed over all the cells of the table.
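The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the Arc-WofE implementation: the function name chi_squared, the yates flag, and the counts in the example table are all hypothetical, and the way Arc-WofE applies Yates' correction internally is not specified here.

```python
def chi_squared(observed, yates=False):
    """Return (chi2, expected) for a contingency table of training-point counts.

    observed: list of rows; observed[i][j] is the number of training points
    in class i of theme 1 and class j of theme 2 (hypothetical input layout).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    chi2 = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            if e == 0:
                continue  # a class with no points contributes nothing
            diff = abs(o - e)
            if yates:
                # Yates' continuity correction: shrink each difference by 0.5
                diff = max(diff - 0.5, 0.0)
            chi2 += diff * diff / e
    return chi2, expected


# Hypothetical counts for two binary themes (50 training points in total)
table = [[20, 10],
         [5, 15]]
chi2_plain, expected = chi_squared(table)
chi2_yates, _ = chi_squared(table, yates=True)
```

Because the inputs are point counts, the result does not depend on the unit cell area, as noted earlier.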

The null hypothesis of conditional independence is tested by determining whether the measured chi-squared value exceeds a theoretical chi-squared value, given the number of degrees of freedom (ndf) and the level of significance. The ndf is (number of rows - 1) times (number of columns - 1), so for binary themes, ndf = 1. The level of significance for most tests is taken as 95%, equal to (1 - probability), or p = 0.05.
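As a rough sketch of this comparison, the snippet below looks up the theoretical chi-squared value at the 95% level for a few small ndf values and checks whether a measured value exceeds it. The critical values are the standard tabulated ones; the function names and the measured values are hypothetical and only serve the illustration.

```python
# Standard tabulated chi-squared critical values at the 95% level (p = 0.05)
CRITICAL_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(n_rows, n_cols):
    """ndf = (number of rows - 1) * (number of columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def exceeds_critical(chi2, ndf):
    """True if the measured chi-squared exceeds the theoretical value."""
    return chi2 > CRITICAL_95[ndf]


ndf_binary = degrees_of_freedom(2, 2)           # two binary themes
large_value = exceeds_critical(8.33, ndf_binary)  # hypothetical measured value
small_value = exceeds_critical(1.20, ndf_binary)  # hypothetical measured value
```

A measured value above the theoretical value is the case in which the null hypothesis of conditional independence is called into question.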

In WofE, three triangular matrices are produced as tables: one containing chi-squared values, one containing the number of degrees of freedom, and one containing the corresponding probability values. The probability table is produced first; the chi-squared and ndf tables are optional.

A probability value <= 0.05 means that there is no reason to reject the CI assumption at the 95% level of significance. A probability value of 0.001 means that there is no reason to reject the CI assumption at the 99.9% level of significance. Small probability values indicate conditional independence, and the smaller the value, the stronger the indication.

Note that Yates' correction for small expected frequencies is applied automatically.

Warning: The chi-squared distribution becomes a poor test in tables where the expected frequency is less than 5 in any cell (even with Yates' correction). In practice, this means that if evidential themes with many classes are tested, particularly in datasets with a small total number of training points, the resulting probability values may be in error and should be interpreted accordingly.
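One way to screen a pair of themes for this problem before trusting the probability is to list the cells whose expected frequency falls below 5. The sketch below assumes the same row/column count layout as the contingency table described earlier; the function name, the threshold parameter, and the example counts are hypothetical.

```python
def small_expected_cells(observed, threshold=5.0):
    """Return (row, column, expected) for cells with expected count < threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / grand_total
            if e < threshold:
                flagged.append((i, j, e))
    return flagged


# Hypothetical table with only 17 training points
table = [[8, 2],
         [3, 4]]
flagged = small_expected_cells(table)
flagged_cells = [(i, j) for i, j, _ in flagged]
```

If the list is not empty, the probability for that pair of themes should be treated with caution.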

See Bonham-Carter (1994, p.313-315) for a full discussion and example.

Arc-WofE's conditional independence test produces a table of probabilities.

In the example of prb-16.dbf, above, each pair of evidential themes was tested. The table is laid out like a spreadsheet: the first through second-to-last evidential themes are written as column headings, and the second through last evidential themes are written as rows in the first column. A chi-squared statistic was calculated by comparing the observed number of training points falling in each class with the expected number of training points. The null hypothesis of CI is not rejected (i.e. CI is accepted) at the (1-probability) level of significance.
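The layout can be pictured with a small sketch. It assumes that a cell is filled only when the column theme comes before the row theme in the list, so that each pair appears once; which triangle Arc-WofE actually fills is not stated here. The theme names and the placeholder statistic are hypothetical.

```python
def triangular_layout(themes, stat):
    """Build a triangular table: columns are themes[:-1], rows are themes[1:].

    stat(col_theme, row_theme) supplies the value for each tested pair.
    Cells outside the triangle are left as None (blank).
    """
    columns = themes[:-1]
    table = {}
    for r_idx, row in enumerate(themes[1:], start=1):
        table[row] = {
            # Assumption: fill only where the column theme precedes the row theme
            col: (stat(col, row) if c_idx < r_idx else None)
            for c_idx, col in enumerate(columns)
        }
    return columns, table


# Hypothetical theme names; the statistic here is just a label for the pair
themes = ["geology", "faults", "geochem"]
columns, table = triangular_layout(themes, lambda a, b: a + "|" + b)
```

With three themes this yields two columns and two rows, and three filled cells, one per pair.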

The number of degrees of freedom depends on the number of classes, excluding missing data, that occur in both evidential themes being tested.

degrees of freedom = (# classes in theme1 - 1) * (# classes in theme2 - 1)
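The formula can be sketched as follows for a list of training points, each carrying its class in both themes. The missing-data code of -99, the rule of dropping a point when either class is missing, and the example values are assumptions made for the illustration only.

```python
MISSING = -99  # assumed missing-data code, for illustration only


def degrees_of_freedom(pairs, missing=MISSING):
    """ndf from (class in theme 1, class in theme 2) pairs at training points."""
    # Assumption: a point is ignored if either theme has missing data there
    valid = [(a, b) for a, b in pairs if a != missing and b != missing]
    classes1 = {a for a, _ in valid}
    classes2 = {b for _, b in valid}
    return (len(classes1) - 1) * (len(classes2) - 1)


# Hypothetical points: theme 1 has classes {1, 2}, theme 2 has classes {1, 2, 3}
pairs = [(1, 1), (1, 2), (2, 3), (2, 1), (-99, 2), (1, -99)]
ndf = degrees_of_freedom(pairs)
```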

Optionally, two tables with the same structure as the probability table can be produced: one contains the chi-squared statistics and the other the degrees of freedom.

Note: The algorithm used for calculating the probabilities iterates a maximum number of times. If a probability is not determined before this maximum is reached, a Null value is returned and the cell in the probability table is blank. Blank cells correspond to high values for the Chi-squared statistic and suggest a problem with conditional independence.
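To illustrate how an iteration cap can leave a cell blank for large chi-squared values, the sketch below evaluates the cumulative chi-squared probability with a standard power series for the regularized incomplete gamma function and gives up after a fixed number of terms. The algorithm Arc-WofE uses is not documented here, and this is not claimed to be it; the function name, max_iter, and tol are hypothetical.

```python
import math


def chi2_cumulative(chi2, ndf, max_iter=100, tol=1e-10):
    """Cumulative chi-squared probability by power series.

    Returns None if the series has not converged after max_iter terms,
    mimicking a blank (Null) cell.
    """
    a = ndf / 2.0
    x = chi2 / 2.0
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_iter + 1):
        term *= x / (a + n)
        total += term
        if abs(term) < abs(total) * tol:
            # Scale the converged sum by x^a * exp(-x) / Gamma(a)
            return total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return None  # not determined before the iteration limit


moderate = chi2_cumulative(3.841459, 1)  # converges quickly
extreme = chi2_cumulative(500.0, 1)      # series terms are still growing at the cap
```

In this sketch the series converges within a few dozen terms for moderate chi-squared values, while a very large value exhausts the iteration limit and returns None.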

How to create a table of probabilities to assess conditional independence

Select 'Check Conditional Independence...' from the Weights of Evidence menu.

From the dialog box titled 'Inputs to Conditional Independence Test - Themes', select the evidential themes you would like to test and add them to the list of selected themes. This dialog is the same as the one used to select themes in the Calculate Response Theme function. For more information, refer to the description of the dialog.

Click the 'Specify Fields...' button. For each of the evidential themes you selected in the previous step, specify the field that contains the classes for which you would like to test conditional independence. Then click 'OK'. (For more information about this dialog, refer to the Calculate Response Theme function.)

Click the 'Conditional Independence' button.

You will be prompted for the names of the output dBase files/Table documents in the following sequence:

Table of Probabilities – This table is always produced. The default name is Prb-<#>.dbf.

Table of Chi-squared Statistics and table of Degrees of Freedom

These two tables go together since one is meaningless without the other. They are optional. To skip over them, click 'Cancel' when prompted for a file name for the chi-squared table. If you skip the chi-squared table, the degrees of freedom table will also be skipped. (If you specify a name for the chi-squared table, but then 'Cancel' the degrees of freedom table, the chi-squared table will not be produced.) The default name for the chi-squared table is X2-<#>.dbf and the default name for the degrees of freedom table is Df-<#>.dbf.

Note: The algorithms that derive the probability from the Chi-squared statistic and number of degrees of freedom return a null value to the probability table after iterating a set number of times. A null value (blank cell) in the probability table corresponds to a large Chi-squared value. In the case of a null value, the probability can only be determined by referring to a Chi-squared table.
