Calculate Response Theme

This option is used to:

For weights of evidence:

For logistic regression:


This option performs the same weights calculations as those performed for individual evidential themes when you choose Calculate Theme Weights…, but writes only the W+, contrast and variance values to summary tables, whose format is described below. Values in these summary tables are used to generate the output fields that are appended to the unique conditions attribute table. The output response themes are then produced by displaying the unique conditions grid, symbolized according to attribute fields such as posterior probability, posterior probability normalized by total uncertainty, and so on.

How the response theme is calculated

Each of the components is described in more detail following this section.

1. The user selects the evidential themes and attribute fields that contain the classes to analyze.

2. The evidential themes are combined by generating a unique conditions grid and attribute table.

3a. If weights of evidence is being calculated, weights, variances and contrast are calculated for each evidential theme and written to two tables, a weights table and a variances table.

Using the previously calculated weights and variances, statistics, including the sum of weights, the posterior logit, posterior probability, uncertainty due to weights, uncertainty due to missing data and the total uncertainty are calculated for each unique condition and appended to the unique conditions grid attribute table.

The evidential theme names (field names in the unique conditions grid attribute table) and classes occurring in each unique condition (record) are used to reference the weights and variances found in the weights and variances tables.

3b. If logistic regression is being performed, the unique conditions grid is modified for processing by:

4. The unique conditions grid is symbolized on one of the attributes to create a response theme.

Inputs... - Themes

This dialog box is also used by the 'Check Conditional Independence...' and 'Generate Neural Network Input Files...' functions.

When you choose the Calculate Response Theme… option, the Inputs... - Themes dialog box is displayed. The title of the dialog indicates the process for which inputs are being selected; for example, the preceding screen capture shows 'Logistic Regression, Weights of Evidence'. The names of all of the evidential themes in the current View are displayed in the left-hand column, except for the theme specified as the study area theme. Weights will be calculated for all of the evidential themes in the right-hand column after the 'Calculate Responses' button is clicked.

Note: For the Calculate Response Theme... option, keep evidential theme names to 13 characters or fewer, or to 10 characters or fewer if the pair-wise conditional independence test is run. Grid names can contain up to thirteen characters; longer names will be truncated. The pair-wise conditional independence test produces a set of dBase tables, and a dBase field name can contain at most 10 characters. Although the aliases will appear correctly, the underlying field names will be truncated to conform with this standard.

Select a theme by clicking on its name. Add to the selection set by holding down the Shift-key while clicking on other theme names.

Click... | will move | from | to
(button 1) | any selected themes, or the first one in the list if none are selected | Available | Selected
(button 2) | all themes | Available | Selected
(button 3) | any selected themes, or the first one in the list if none are selected | Selected | Available
(button 4) | all themes | Selected | Available

Inputs... – Classes

When at least one evidential theme appears in the Selected… column, the Specify Fields... button is enabled. Clicking it displays the Inputs... – Classes dialog.

This dialog box is also used by the 'Check Conditional Independence...' and 'Generate Neural Network Input Files...' functions.

Parameter | Description | How to specify
Evidential Themes | The names of the evidential themes selected in the previous dialog. | N/A
Class Fields | The list of fields in the evidential theme attribute table that contain only integer values, that is, any fields that could define the classes you want to analyse. The default is the valid field farthest to the right in the attribute table. | Select the field from the drop-down box.
Missing Data | The integer defining areas of missing data. Defaults to the value specified in the Analysis Parameters dialog if no other value has been set for the evidential theme. | If the initial value is incorrect, enter a new value. If the initial value was stored previously, you will be prompted to verify the change.
Data Type | Either Free or Ordered. The initial default is 'Ordered'. This parameter has no effect on binary evidential themes. | Select the correct value from the drop-down box. If the initial value was stored previously, you will be prompted to verify the change.

Generation of a Unique Conditions Grid and Table

Weights of evidence and logistic regression, as well as other functions in Arc-SDM described in their respective sections, use a unique conditions table. Within ArcView and Spatial Analyst, this table is also the attribute table of a grid, which can be described as a unique conditions grid. Its cell values range from 1 to n, each integer identifying a unique condition, that is, a unique combination of cell values found in the input evidential themes. The grid and its attribute table are created by a single Avenue request, Combine.
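As an illustration only, the following Python sketch mimics what a combine operation does with small integer grids held as nested lists: each distinct combination of input values becomes one unique condition with an ID from 1 to n and a cell count. The function name `combine_grids` is hypothetical, numbering conditions in order of first appearance is an assumption (the Combine request may number them differently), and 'No Data' handling is left out.

```python
def combine_grids(grids):
    """Assign a unique-condition ID to every combination of input cell values.

    grids: list of equally sized 2-D lists of ints, one per evidential theme.
    Returns (uc_grid, table), where table maps ID -> (combination, count).
    """
    rows, cols = len(grids[0]), len(grids[0][0])
    ids = {}      # combination tuple -> unique-condition ID
    counts = {}   # unique-condition ID -> number of cells
    uc_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            combo = tuple(g[r][c] for g in grids)
            if combo not in ids:
                ids[combo] = len(ids) + 1   # IDs run from 1 to n
            uc_id = ids[combo]
            uc_grid[r][c] = uc_id
            counts[uc_id] = counts.get(uc_id, 0) + 1
    table = {uc_id: (combo, counts[uc_id]) for combo, uc_id in ids.items()}
    return uc_grid, table


geology = [[1, 1], [2, 3]]
faults = [[2, 2], [1, 2]]
uc_grid, table = combine_grids([geology, faults])
```

Here the two top cells share the combination (1, 2), so they become a single unique condition with a count of 2.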

The table created by the Combine request automatically has Value and Count fields, plus one field for each input evidential theme. The name of each of these fields corresponds to the name of the evidential theme.

Because of the way in which the Combine request works, a temporary grid is created from each evidential theme input to the process. The characteristics of this temporary theme are compared with those of the source evidential theme in the following table:

Property | Source Evidential Theme | Temporary Evidential Theme
Theme type (a polygon feature theme is converted to a grid theme) | polygon feature theme or integer grid theme | integer grid theme
Grid cell values | values contained in field 'Value' | values contained in the class field of the source evidential theme, specified by the user
Value of grid cells containing 'No Data' within the study area | 'No Data' | the missing data integer
Value of grid cells lying outside the study area | any value | 'No Data'
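The rules in the table above can be sketched in Python as follows. `None` stands in for 'No Data'; the function name `make_temporary_grid`, the boolean study-area mask and the default missing data integer of -99 are illustrative assumptions, not Arc-SDM names or defaults.

```python
def make_temporary_grid(class_grid, study_area, missing_data=-99):
    """Prepare one evidential theme for combining (hypothetical helper).

    class_grid: 2-D list of class values, with None meaning 'No Data'.
    study_area: 2-D list of booleans, True where the cell is in the study area.
    """
    out = []
    for class_row, mask_row in zip(class_grid, study_area):
        new_row = []
        for value, inside in zip(class_row, mask_row):
            if not inside:
                new_row.append(None)          # outside study area -> 'No Data'
            elif value is None:
                new_row.append(missing_data)  # 'No Data' inside -> missing data integer
            else:
                new_row.append(value)         # class value kept unchanged
        out.append(new_row)
    return out


temp = make_temporary_grid([[1, None], [2, 5]], [[True, True], [True, False]])
```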


Arc-SDM also calculates the following for each unique condition and appends them as attributes to the unique conditions table:

Field Alias | Field Name | Description
Training Points | Trngpoints | number of training points occurring in that condition
Area (sq. m) | Area_sqm | area, measured in square metres
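A minimal sketch of how these two fields could be derived, assuming square cells of known size and training points already converted to grid row/column positions. Both assumptions and the helper name `condition_stats` are illustrative.

```python
def condition_stats(uc_grid, point_cells, cell_size_m):
    """Return {uc_id: (training_points, area_sqm)} for each unique condition."""
    cell_area = cell_size_m * cell_size_m
    stats = {}
    # Area: add one cell's area for every cell carrying the condition.
    for row in uc_grid:
        for uc_id in row:
            points, area = stats.get(uc_id, (0, 0.0))
            stats[uc_id] = (points, area + cell_area)
    # Training points: count the points falling in each condition.
    for r, c in point_cells:
        uc_id = uc_grid[r][c]
        points, area = stats[uc_id]
        stats[uc_id] = (points + 1, area)
    return stats


stats = condition_stats([[1, 1], [2, 3]], [(0, 0), (1, 1)], cell_size_m=100)
```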

The unique conditions table is used 'as is' by the weights of evidence scripts. The logistic regression method requires some modifications, which are described in the Logistic Regression section that follows.

Weights Table

The default name for the weights table is woe#.dbf. It has the following structure:

Field Alias | Field Name | Description
Evidential Theme | Evidence_t | name of the evidential theme
Class Field | Class_fiel | the name (not the alias) of the field that contains the classes for which the weights were calculated
W<#> | W<#> | the template name for the fields containing the calculated weights, one field for each class that occurs in any of the input evidential themes. If a class of a particular number does not occur in an evidential theme, its cell in that field is blank. Any integer values can be used to identify classes, although the table is easier to read if a convention such as 2 = presence and 1 = absence is used. The table format can accommodate a very large number of classes; however, it is recommended that multi-class evidential themes be limited to a small number of classes (typically not more than 5) to make interpretation easier.
Contrast* | Contrast_ | the difference between the largest and the smallest weight. Note that the 'true' contrast is defined only for binary themes.
Confidence | Confidence | the studentized contrast*, which is the contrast divided by its standard deviation

The last row:

Several parameters are written to the last row of the weights table, as a convenient place to reference them. The name of the training point theme is written to the Evidential Theme field; the total number of training points is written to the first weight field; the total study area in units is written to the second weight field; and the prior probability (the total number of training points divided by the total study area) is written to the third weight field, or contrast* field. Note that these totals are not the values used in weights calculations for an evidential theme that contains areas with missing data.
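The prior probability in the last row is simple arithmetic. Using illustrative numbers (20 training points in a study area of 1,000 units), the prior probability is 0.02; its log odds is the prior logit, to which the sum of weights is later added.

```python
import math

training_points = 20          # illustrative value
study_area_units = 1000.0     # illustrative value

prior_probability = training_points / study_area_units            # 0.02
prior_logit = math.log(prior_probability / (1.0 - prior_probability))
```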

Variances Table

The default name for the variances table is woevar#.dbf. It has the same structure as the weights table with the following exceptions:

  1. The names of fields containing variances have a V<#> template.
  2. Contrast* and Confidence are not reported.
  3. The training point theme and study area are not reported in the final row.

The Weights of Evidence Table

Arc-SDM calculates the following statistics for each unique condition and writes them to an ArcView table. The table is based on a dBase file with a default name of Arc-SDM<#>.dbf. The table name defaults to 'Weights of Evidence <#>'. The table is automatically joined to the unique conditions table.

Field Alias | Field Name | Description
ID | ID | unique condition ID
Posterior Probability | Post_prob | the posterior logit converted to a probability
Normalized Probability | Pstprbnrm | the posterior probability rescaled so that the overall measure of conditional independence is satisfied*
Posterior Logit | Post_logit | the sum of weights added to the prior logit
Sum of Weights | Sum_weights | the sum of the weights for each evidential theme class occurring in the unique condition
Uncertainty | Uncertainty | the uncertainty due to the calculation of weights (standard deviation)
Missing Data | Msng_data | the uncertainty due to missing data (standard deviation)
Total Uncertainty | Tot_uncrty | the combined uncertainty due to weights and due to missing data (standard deviation)
* The probability is rescaled by multiplying it by Training Points / Sum of (area × probability), where the summation is over all unique conditions. This normalization is not applied to the response theme in logit units.
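The relationships in this table can be sketched in Python: the posterior logit is the prior logit plus the sum of weights, it is converted to a probability with the logistic function, and the normalization multiplies every probability by Training Points / Sum of (area × probability). The weights, areas (in the same units as the prior probability) and function names below are invented for illustration.

```python
import math

def posterior_probability(prior_probability, weights):
    """Post_prob for one unique condition from the weights of its classes."""
    prior_logit = math.log(prior_probability / (1.0 - prior_probability))
    post_logit = prior_logit + sum(weights)        # Post_logit
    return 1.0 / (1.0 + math.exp(-post_logit))     # logit -> probability

def normalize(probabilities, areas, training_points):
    """Rescale probabilities in the manner described for Pstprbnrm."""
    predicted = sum(a * p for a, p in zip(areas, probabilities))
    factor = training_points / predicted
    return [p * factor for p in probabilities]


weight_sets = ([1.2, 0.4], [-0.3, 0.4], [-0.3, -0.5])
probs = [posterior_probability(0.02, w) for w in weight_sets]
norm = normalize(probs, [50.0, 300.0, 650.0], 20)
```

After normalization, the sum of area × normalized probability equals the number of training points.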

How the weights table is used to calculate posterior probabilities

One unique condition, or record, in the unique conditions table is processed at a time. For each evidential theme included in the response theme (determined by reading the field names), the class occurring in that unique condition is read. The evidential theme name is then located in the weights table, and the weight calculated for that class is read and added to the sum of weights. The correct weight is identified by the field name in the weights table; for example, the weight for class 4 in Theme 3 is found in the cell at the intersection of the record where 'Theme 3' is written in the Evidential Theme field and the field named W4.
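The lookup can be sketched with the weights table held as a Python dictionary keyed by evidential theme name, with one `W<#>` entry per class. The theme names, class values and weights are invented for illustration, and a blank cell is represented simply by an absent key.

```python
weights_table = {
    "Geolm": {"W1": 0.85, "W2": -0.40, "W3": -0.12},
    "Faults": {"W1": -0.25, "W2": 0.90},
}

def sum_of_weights(unique_condition, table):
    """Sum the weight of the class each theme has in one unique condition.

    unique_condition: {theme name (field name): class value} for one record.
    """
    total = 0.0
    for theme, class_value in unique_condition.items():
        field = "W" + str(class_value)   # e.g. class 4 -> field W4
        total += table[theme][field]
    return total


record = {"Geolm": 1, "Faults": 2}
```

For this record the sum of weights is 0.85 + 0.90 = 1.75.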

Missing Data

If any of the input data sets have areas where data are missing, this should be specified when setting the weights of evidence analysis parameters. Any integer, including zero and negative numbers, may be used to identify areas of missing data. However, the same number must be used for all data sets when creating a response theme or testing conditional independence (i.e., when multiple data sets are input). Refer to Integer that defines Missing Data.

If areas with missing data are defined using 'No Data' in a grid evidential theme, these areas will be filled in "on-the-fly" with the specified integer.

Weights of evidence handles missing data in the following way:

During the calculations of weights for an evidential theme, the total area is calculated as the total study area less any area where data are missing. The total number of training points is calculated as the total number of points in the study area less any points located in areas where data are missing.
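The weight formulas are not spelled out in this section. The sketch below uses the standard binary weights of evidence definitions, W+ = ln[P(B|D) / P(B|not D)] and W- = ln[P(not B|D) / P(not B|not D)], with contrast C = W+ - W- and the usual asymptotic variances (one over each count), working in unit-area counts. It is an assumption that Arc-SDM uses exactly these expressions; the point illustrated is that missing-data area and points are removed from the totals before anything else is computed. The function name is hypothetical.

```python
import math

def binary_weights(area_b, points_b, total_area, total_points,
                   missing_area=0.0, missing_points=0):
    """W+, W-, contrast and studentized contrast for one binary pattern B.

    Areas are in unit cells so that area and point counts are comparable.
    """
    t = total_area - missing_area        # study area with known data
    d = total_points - missing_points    # training points with known data
    b_d = points_b                       # unit cells on B with a point
    b_nd = area_b - points_b             # unit cells on B without a point
    nb_d = d - points_b                  # unit cells off B with a point
    nb_nd = t - area_b - d + points_b    # unit cells off B without a point

    w_plus = math.log((b_d / d) / (b_nd / (t - d)))
    w_minus = math.log((nb_d / d) / (nb_nd / (t - d)))
    var_plus = 1.0 / b_d + 1.0 / b_nd
    var_minus = 1.0 / nb_d + 1.0 / nb_nd
    contrast = w_plus - w_minus
    studentized = contrast / math.sqrt(var_plus + var_minus)
    return w_plus, w_minus, contrast, studentized


# 200 of 1,000 unit cells are on pattern B and 12 of 20 points fall on it;
# 100 cells and 2 points lie in areas of missing data.
w_plus, w_minus, contrast, conf = binary_weights(200, 12, 1000, 20, 100, 2)
```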

If at least one input evidential theme contains missing data, a field named W<missing data integer> will be included in the weights table. If an evidential theme contains areas of missing data, the cell in the missing data class column will contain zero. If a theme has no missing data, the cell will be blank.

Uncertainty due to missing data

The extension requires that missing data be identified by a value (rather than 'No Data', for example) so that these areas can be captured in the unique conditions grid and attribute table. With the areas of missing data identified in each unique condition, a measure of uncertainty in the posterior probability can be calculated. Depending on the number of classes and evidential themes, and therefore number of unique conditions, and the number of themes in which data are missing, calculating uncertainty due to missing data may be time consuming. An estimation of the length of time it will take to calculate the uncertainty is made, and reported to the user if it is longer than one minute.

When the estimate is reported, the user can choose to skip the calculation of uncertainty due to missing data. In that case, the missing data and total uncertainty fields are omitted from the unique conditions table. (Without the missing data component, total uncertainty would equal the uncertainty due to weights.) The time estimate is based on processing times for a Pentium 133 notebook computer with 48 MB of RAM. A more powerful computer, or a desktop computer with the same specifications, will usually perform these calculations much faster. In some situations, such as when processing data located across a network, the calculations may be considerably slower. In most cases, however, the estimate overstates the time required.

Expert Weights Option

This option allows the user to control the weights that are generated for one or more of the evidential themes input to the model. Instead of using a set of training points to determine weights, the user specifies the model weights, either directly, by allocating the proportion of training points that fall in each class, or by specifying likelihood ratios for each class. This technique can be useful if the study area has not been previously explored and the set of training points is small or unavailable. The "points" in this case are purely notional. It is often convenient to use 100 points and then estimate the percentage of points occurring in each class of a theme as a way of subjectively defining importance. As each evidential theme is processed, the user is prompted by the following dialog:

You can set expert weights for up to 10 classes. If there are more than 10 classes in the class field you specified, you will be asked if you want to cancel the weights calculations or if you want to omit the evidential theme from your model.

On initial display, the dialog has the following settings:

  1. The evidential theme name appears at the far left of the dialog box title.
  2. The initial number of hypothetical "training points" is set to 100, and displayed in a text line in the upper right corner. You can type in any number of points, and can control the value of the prior probability in this way. Note that these points are not given any actual location.
  3. Data about the evidential theme is displayed in the following columns:

Class – The classes found in the specified class field in the theme's attribute table.

% Points – The percentage of points allocated to each class. Initially this is set to be equal to the percentage of the total study area occupied by each class, resulting in weight values of zero.

Area – The percentage of the total study area occupied by each class.

Likelihood Ratio – The likelihood ratio calculated based on the specified percentage of points and percentage area for the class, as well as the total number of points, and total area. Initially this value is set to 1. The W+ value is the natural log of the likelihood ratio.

Weights – The W+ calculated for the current class, based on the specified percentage of points and percentage area for the class, as well as the total number of points, and total area. Initially this value is set to 0.
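The relationship between % Points, % Area, the likelihood ratio and W+ can be sketched as follows. It assumes the standard definition LR = P(B|D) / P(B|not D) with area measured in unit cells; the function name and inputs are illustrative. Setting % Points equal to % Area gives LR = 1 and W+ = 0, which matches the dialog's initial state.

```python
import math

def expert_weight(pct_points, pct_area, total_points, total_area):
    """Likelihood ratio and W+ for one class from notional percentages."""
    points_in_class = total_points * pct_points / 100.0
    area_of_class = total_area * pct_area / 100.0
    p_b_given_d = points_in_class / total_points
    p_b_given_not_d = (area_of_class - points_in_class) / (total_area - total_points)
    likelihood_ratio = p_b_given_d / p_b_given_not_d
    return likelihood_ratio, math.log(likelihood_ratio)   # W+ = ln(LR)


lr_initial, w_initial = expert_weight(25.0, 25.0, 100, 10000)
lr_expert, w_expert = expert_weight(60.0, 25.0, 100, 10000)
```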

Inputting values

You can edit the % Points, Likelihood Ratio or Weights column by clicking the associated radio button in the upper left corner of the dialog and then editing the values in the text lines. When you change any one of these three values, the other two are recalculated automatically.
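Going the other way, when a weight is typed in, the notional number of points in the class can be recovered by solving LR = P(B|D) / P(B|not D) for the point count, assuming that definition and area in unit cells. The sketch below is illustrative; the function name is hypothetical.

```python
import math

def pct_points_from_weight(weight, pct_area, total_points, total_area):
    """% Points implied by a W+ value (W+ = ln LR)."""
    lr = math.exp(weight)
    area_of_class = total_area * pct_area / 100.0
    # Solve (p / D) / ((A - p) / (T - D)) = LR for p, the points in the class.
    points_in_class = (lr * total_points * area_of_class
                       / (total_area - total_points + lr * total_points))
    return 100.0 * points_in_class / total_points


pct_zero = pct_points_from_weight(0.0, 25.0, 100, 10000)   # equals % Area
```

A weight of zero returns the class's percentage of the study area, consistent with the dialog's initial settings.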

Note: Weights are always calculated based on the % Points displayed in the dialog.

When you close the dialog, the percentages of points must total 100%. You can automatically adjust your percentages so that they total 100 by clicking the 'Normalize' button.
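The section does not say exactly how 'Normalize' adjusts the values. A proportional rescaling, which preserves the relative sizes of the entries, is one plausible approach and is assumed in this sketch.

```python
def normalize_percentages(pct_points):
    """Scale percentages proportionally so that they sum to 100 (assumed behaviour)."""
    total = sum(pct_points)
    return [100.0 * p / total for p in pct_points]


adjusted = normalize_percentages([30.0, 50.0, 40.0])   # entries sum to 120
```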

Reading Weights from an Existing Table

You can read weights that have been previously calculated and written to a weights table. To do this:

  1. Select a weights table from the combo-box located above the display area on the right side of the dialog.
  2. Click the 'Read Weights' button.
  3. Arc-SDM will look for the current theme name and field name in the specified weights table and, if found, will update the text lines in the Weights column with the values from the table.

You can then modify the weights, and when you are done, click 'OK' to continue with calculations.

Except for the user interaction with this dialog, all of the calculations and output are the same as for the 'regular' weights option.

NOTE: It is not possible to check for conditional independence when using expert weights, because the point locations are only notional.


Logistic Regression

Evidential Themes and the Unique Conditions Table

Logistic regression handles multi-class evidential themes of ordered data but not multi-class free data. This problem is dealt with after the unique conditions grid has been created. Arc-SDM determines if there are any multi-class free evidential themes and expands them to a series of binary themes in preparation for running logistic regression. In this way, the same evidential themes can be input to both weights of evidence and logistic regression at the same time.

No actual data sets are created; instead, the expanded values are written to a unique conditions table. For example, if one of the evidential themes is a geology map with three classes, identified by 1, 2 and 3, three binary "themes" would be generated, with the values mapped as follows:

Theme | Initial Class | New Class | Initial Classes | New Class
1 | 1 | 1 | 2 and 3 | 0
2 | 2 | 1 | 1 and 3 | 0
3 | 3 | 1 | 1 and 2 | 0
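The mapping in the table amounts to one 0/1 field per class of the free-data theme. A minimal sketch, in which the `Geolm(1)`-style key names and the option to leave a missing-data class as `None` are illustrative assumptions:

```python
def expand_free_theme(field_name, values, missing=None):
    """Replace one multi-class free field with one binary field per class."""
    classes = sorted(set(values) - {missing})
    return {
        f"{field_name}({c})": [None if v == missing else (1 if v == c else 0)
                               for v in values]
        for c in classes
    }


expanded = expand_free_theme("Geolm", [1, 2, 3, 2])
```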

Missing Data

Logistic regression does not process missing data directly. Instead an area weighted mean of the known class values within the study area is calculated for each evidential theme that contains missing data and substituted for the missing data class. For binary themes that have been generated by the expansion of multi-class free data, the area weighted mean is between 0 and 1.
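Substituting the area-weighted mean can be sketched per unique-condition record, using each record's area as its weight. The function name and the use of `None` to mark missing values are illustrative assumptions.

```python
def fill_missing_with_area_mean(values, areas):
    """Replace None entries by the area-weighted mean of the known values."""
    known = [(v, a) for v, a in zip(values, areas) if v is not None]
    known_area = sum(a for _, a in known)
    mean = sum(v * a for v, a in known) / known_area
    return [mean if v is None else v for v in values]


filled = fill_missing_with_area_mean([1, 0, None, 1], [30.0, 60.0, 5.0, 10.0])
```

For a binary theme the substituted value is the proportion of known area in class 1, here 40 / 100 = 0.4.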

Temporary Files

During logistic regression processing, several temporary files are written to a directory created by Arc-SDM, ~sdmtemp. (NOTE: Please do not use this directory for any other files.) These files are not deleted by Arc-SDM but are overwritten the next time that logistic regression is run.

File Name | Description
case.dat | unique conditions table, processed for input to logistic regression
cumfre.tba | cumulative frequencies of probabilities calculated by logistic regression
logco.dat* | summary of the coefficients for each evidential theme and their standard deviations
logpol.dat | data showing the convergence of the logistic regression coefficients through each iteration of the calculations
logpol.tba* | the posterior probability, as well as a Student-t value, standard deviation, chi-square coefficient and deviance coefficient, for each probability

* Values from logco.dat and logpol.tba are read into ArcView tables.

Logistic Regression Table

The posterior probability for each unique condition, along with its Student-t value and standard deviation, is written to an ArcView table. The table is based on a dBase file with the default name logpol<#>.dbf, and the default name of the table is 'Logistic Regression <#>'. The table is automatically joined to the unique conditions table.

Field Alias | Field Name | Description
ID | ID | unique condition ID
(LR) Posterior Probability | Lrpostprob | the posterior probability
(LR) TValue | Lrtvalue | Student-t value
(LR) Std. Dev. | Lr_std_dev | standard deviation

Table of Coefficients

Arc-SDM automatically creates a table of the final coefficients generated by logistic regression. In the example table, the evidential theme 'Geolm' is a multi-class free data type evidential theme, so it was expanded to three binary themes, each corresponding to the class value reported in brackets in the 'Evidential Theme' field. The coefficient for a theme indicates its relative importance in determining the posterior probabilities. In this case, class 1 of the Geolm theme is the most important.

Field Alias | Field Name | Description
Theme ID | Theme_id | unique identifier for the evidential themes
Evidential Theme | Theme | theme name, field name (class value, if expanded)
Coefficient | Coefficien | the coefficient
Standard Deviation | Std_dev | the standard deviation of the coefficient
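In general logistic regression, the posterior probability of a unique condition follows from the coefficients through the logistic function. This sketch assumes an intercept term, which this section does not mention, and uses invented coefficients for three binary predictors; it illustrates the general form of the model rather than Arc-SDM's actual output.

```python
import math

def logistic_posterior(intercept, coefficients, predictors):
    """P = 1 / (1 + exp(-(b0 + sum(b_i * x_i)))) for one unique condition."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-linear))


coefficients = [1.6, 0.3, -0.8]   # invented values for three binary predictors
p = logistic_posterior(-4.0, coefficients, [1, 0, 1])
```

With these values the linear term is -4.0 + 1.6 - 0.8 = -3.2.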



Options to run CI test and Associate probabilities with point theme functions

Once the response theme has been calculated, an option is given to run the pair-wise conditional independence test. Running the test at this point is slightly faster than running it from the menu option because the unique conditions table has already been created for the response theme and is used as the basis for the test.

Overall Test of Conditional Independence

Once the response theme is complete, Arc-SDM reports a ratio that can be used as an overall assessment of conditional independence among your data sets. This ratio is calculated as follows:

The product of area and posterior probability, summed over all unique conditions, is the number of points predicted by the model. The ratio is calculated by dividing the actual number of training points input to the model by this predicted number of points. This ratio will always be between 0 and 1. A value of exactly 1 (which never occurs in practice) indicates conditional independence among the evidential themes used in the model. Values much smaller than 1 indicate a conditional independence problem.
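The calculation amounts to one division. A sketch with invented areas (in the same units as the prior probability) and posterior probabilities for three unique conditions; the function name is hypothetical.

```python
def overall_ci_ratio(areas, posterior_probabilities, training_points):
    """Actual training points divided by the number predicted by the model."""
    predicted = sum(a * p for a, p in zip(areas, posterior_probabilities))
    return training_points / predicted


ratio = overall_ci_ratio([50.0, 300.0, 650.0], [0.12, 0.03, 0.01], 20)
```

Here the model predicts 6 + 9 + 6.5 = 21.5 points for 20 actual training points, a ratio of about 0.93.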

If you choose to run the pair-wise test of conditional independence when prompted, the overall test result will be written to the last row of the probability table.


Symbolization of the Response Theme

The Response (Grid) Theme (actually the unique conditions grid) is automatically added to the current View and symbolized on the Posterior Probability attribute, using 7 classes defined by ArcView’s natural breaks method. (Fewer classes are used if there are fewer than 7 records or values in the response theme attribute table.)

The following is the RGB colour palette used:

Class | RGB Code
1 | 0,106,255
2 | 0,233,255
3 | 85,255,0
4 | 191,255,0
5 | 255,212,0
6 | 255,106,0
7 | 255,0,0

For options on symbolizing the response theme, see the section describing symbolization tools.

Making a Confidence Map: Normalizing the Posterior Probability by the Total Uncertainty

The following steps normalize the posterior probability by Total Uncertainty. You can instead normalize the probability values by Uncertainty (due to weights) if, for example, you did not elect to calculate uncertainty due to missing data.

  1. Make the Response Theme you want to normalize active.
  2. Double-click the theme’s legend to open the legend editor dialog.
  3. From the ‘Normalize by:’ combo box, select ‘Total Uncertainty’.
  4. Click ‘Apply’.
  5. Click the ‘X’ button to close the dialog.

You can change the theme’s name to reflect the legend by selecting Properties from the Theme menu.

Dividing the posterior probability (not the normalized posterior probability) by the total uncertainty provides a map of the informal "studentized" posterior probability. If enough training points are used, regions with values greater than about 2 have a high degree of "certainty" (with regard to the variances of weights and the variance due to missing data). This map is useful in a relative sense for highlighting regions of low or high confidence.
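A sketch of the studentized posterior probability for a few unique conditions. It assumes, without confirmation from this section, that the total uncertainty combines the two standard deviations by adding their variances; the threshold of 2 is the rough guide given above, and all numbers are invented.

```python
import math

def studentized_posterior(prob, sd_weights, sd_missing=0.0):
    """Posterior probability divided by its total uncertainty."""
    total_sd = math.sqrt(sd_weights ** 2 + sd_missing ** 2)  # assumed combination
    return prob / total_sd


# unique condition ID -> (posterior probability, sd due to weights, sd due to missing data)
conditions = {1: (0.12, 0.03, 0.04), 2: (0.03, 0.02, 0.0), 3: (0.01, 0.008, 0.0)}
confidence = {uc: studentized_posterior(*vals) for uc, vals in conditions.items()}
high_confidence = sorted(uc for uc, value in confidence.items() if value > 2)
```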
