Calculate Response Theme

This option is used to:

For weights of evidence:

For logistic regression:

Guidance Note

The response theme will be given a name, such as SDMUC1, but Properties/Source shows that this raster is stored in the ~sdmtemp folder in your ArcMap working folder. It remains a temporary raster until it is made permanent or the project is saved. Therefore, when you decide that a response theme should be kept, and before the project is saved, it is good practice to make the raster permanent, move it to the working folder, and give it a name such as SDMUC1. It is best to keep this SDMUC name because all the associated tables have this prefix in their names. Then, if the project MXD file is lost or corrupted, the response-theme rasters will still have recognizable names. All of this can be done with the Make Permanent tool, which is available when you right-click the raster name in the table of contents.

The evidence rasters selected for the response theme are recorded in the Description box of the General tab in the raster Properties window.


This option performs the same weights calculations as those performed for individual evidential themes when you choose Calculate Theme Weights…, but it outputs only the W+, Contrast and Variance to summary tables. Values in these summary tables are used to generate the output fields and append them to the unique conditions attribute table. The output response themes are then produced by visualizing the unique conditions grid, symbolized according to attribute fields such as posterior probability, posterior probability normalized by total uncertainty, and so on. Unlike Calculate Theme Weights, however, Calculate Response deals with generalization classes that contain no training sites by adding a fractional training site count so that the calculation can be completed. This produces an artificially low, but appropriate, weight that allows the weights calculation to finish.
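The effect of the fractional training site count can be illustrated with a minimal Python sketch. It assumes the usual weights-of-evidence definition of W+ with areas expressed in unit cells; the function name and the fractional value of 0.01 are hypothetical choices for illustration and are not taken from ArcSDM.

```python
import math

def w_plus(points_in_class, area_in_class, total_points, total_area,
           fractional_count=0.01):
    """Positive weight for one class (areas in unit cells, an assumption).

    fractional_count is a hypothetical stand-in for the fractional
    training site count added when a class contains no training sites.
    """
    n = points_in_class
    if n == 0:
        # Without this substitution, log(0) below would be undefined.
        n = fractional_count
    p_class_given_site = n / total_points
    p_class_given_no_site = (area_in_class - n) / (total_area - total_points)
    return math.log(p_class_given_site / p_class_given_no_site)

# A class with no training sites gets a strongly negative, finite weight.
empty_class_weight = w_plus(0, 100, 10, 1000)
# A class holding half of the sites in a tenth of the area gets a positive weight.
rich_class_weight = w_plus(5, 100, 10, 1000)
```

The smaller the fractional count, the more negative the resulting weight, so the value chosen controls how strongly an empty class is penalized.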

How the response theme is calculated

Each of the components is described in more detail following this section.

1. The user selects the evidential themes and Gen table that contain the classes to analyze.

2. The evidential themes are combined by generating a unique conditions grid and attribute table.

3a. If weights of evidence is being calculated, weights, variances and contrast are calculated for each evidential theme and written to two tables, a weights table (the WOE table) and a variances table (the WOEVAR table).

Using the previously calculated weights and variances, statistics, including the sum of weights, the posterior logit, posterior probability, uncertainty due to weights, uncertainty due to missing data and the total uncertainty are calculated for each unique condition and appended to the unique conditions grid attribute table. These terms are defined in the papers in the references, particularly WofE3.pdf.

The evidential theme names (field names in the unique conditions grid attribute table) and classes occurring in each unique condition (record) are used to reference the weights and variances found in the weights and variances tables.

3b. If logistic regression is being performed, the unique conditions grid is modified for processing by:

4. The unique conditions grid is symbolized on one of the attributes to create a response theme.

Inputs... - Themes

This dialog box is also used by the 'Generate Neural Network Input Files...' function.

When you choose the Calculate Response Theme… option, the Inputs... - Themes dialog box is displayed. The title of the dialog indicates the process for which inputs are being selected; for example, 'Logistic Regression, Weights of Evidence' is displayed in the preceding screen capture. The names of all the evidential themes found in the current View are displayed in the left-hand column, with the exception of the theme specified as the study area theme. Weights will be calculated for all the evidential themes appearing in the right-hand column after the ‘Calculate Responses’ button is clicked.

Note: For the Calculate Response Theme... option, evidential theme names should be kept to 13 characters or fewer, because grid names can contain at most 13 characters. If a name totals more than 13 characters, it will be truncated. The maximum length of a dBase field name is 10 characters, so while the aliases will appear correctly, the underlying field names will be truncated to conform to this standard. The tables associated with the unique conditions raster will have the same prefix as its default name to help identify the association.
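Before running the option, it can help to check which theme names would become indistinguishable once shortened to the 10-character dBase limit. The following Python sketch assumes plain left-to-right truncation; the helper name and the example theme names are hypothetical, and the software may resolve clashes differently.

```python
from collections import defaultdict

DBF_FIELD_LIMIT = 10  # maximum dBase field-name length stated in the note

def truncated_field_names(theme_names, limit=DBF_FIELD_LIMIT):
    """Group theme names by their truncated field name (simple truncation assumed)."""
    groups = defaultdict(list)
    for name in theme_names:
        groups[name[:limit]].append(name)
    return dict(groups)

# Hypothetical theme names used only for illustration.
names = ["geology_reclass", "geology_recoded", "faults_buf"]
groups = truncated_field_names(names)

# Keep only truncated names shared by more than one theme.
collisions = {field: themes for field, themes in groups.items() if len(themes) > 1}
```

Any entry in `collisions` points to themes that would be better renamed before the response theme is calculated.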

Select a theme by clicking on its name. Add to the selection set by holding down the Shift-key while clicking on other theme names.

Clicking a button will move:

Will move | From | To
any selected themes, or the first one if none are selected | Available | Selected
all themes | Available | Selected
any selected themes, or the first one in the list if none are selected | Selected | Available
all themes | Selected | Available

Inputs... – Classes

When at least one evidential theme appears in the Selected… column, the Specify Table... button is enabled. Clicking it displays the Inputs... – Classes dialog.

This dialog box is also used by the 'Generate Neural Network Input Files...' function.

 
Parameter | Description | How to specify
Evidential Themes | The names of the evidential themes selected in the previous dialog. | N/A
Generalization Tables | The list of GEN tables for the evidential theme. In the GEN table, the value2 attribute contains the generalization to be applied. | Select the table from the drop-down box.
Missing Data | The integer defining areas of missing data. Defaults to the value specified in the Analysis Parameters dialog, if no other value has been set for the evidential theme. | If the initial value is incorrect or blank, enter a new value. If the initial value was stored previously, you will be prompted to verify the change.
Data Type | Either Free or Ordered. Initial default is 'Ordered'. | Select the correct value from the drop-down box. This parameter has no effect on binary evidential themes. If the initial value was stored previously, you will be prompted to verify the change.

Generation of a Unique Conditions Grid and Table

Weights of evidence and logistic regression, as well as other functions in ArcSDM described in their respective sections, use a unique conditions table. Within ArcMap and Spatial Analyst, this table is also the attribute table of a grid, which can be described as a unique conditions raster. Its cell values range from 1 to n, each integer identifying a unique condition, that is, a unique combination of cell values found in the input evidential themes. The raster and its attribute table are created by a single ArcMap request, Combine.
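The idea behind a unique conditions raster can be sketched in plain Python. This is only an illustration of the concept, not the Combine implementation: rasters are represented as small nested lists, 'No Data' handling is omitted, and the order in which IDs are assigned is an assumption.

```python
def combine(rasters):
    """Build a unique conditions grid and table from equally sized 2-D lists."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    ids = {}     # combination of class values -> unique condition ID
    counts = {}  # unique condition ID -> number of cells
    grid = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            combo = tuple(raster[r][c] for raster in rasters)
            # Assign the next ID (1..n) the first time a combination is seen.
            uc_id = ids.setdefault(combo, len(ids) + 1)
            counts[uc_id] = counts.get(uc_id, 0) + 1
            out_row.append(uc_id)
        grid.append(out_row)
    table = [{"Value": uc_id, "Count": counts[uc_id], "classes": combo}
             for combo, uc_id in ids.items()]
    return grid, table

# Two hypothetical generalized evidential themes on a 2 x 3 grid.
geology = [[1, 1, 2],
           [1, 2, 2]]
faults = [[1, 1, 1],
          [1, 1, 2]]

uc_grid, uc_table = combine([geology, faults])
# uc_grid is [[1, 1, 2], [1, 2, 3]]: three unique conditions.
```

Each row of `uc_table` corresponds to one record of the unique conditions attribute table, with a Value, a Count, and the class of each input theme.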

The table created by the Combine request automatically has a Value and a Count field, plus one field for each input evidential theme. The name of each of these fields corresponds to the name of the Generalization Theme, which is assigned by the software during the computation. An example is shown below in the Attributes of SDMUC14 table. The relationship of these Generalization Theme names to the Evidential Theme names can be found in the WOE (weights) table; an example is shown below in part of the Attributes of sdmuc14_woe table.

Because of the way in which the Combine request works, a temporary raster is created from each evidential theme that is input to the process. The characteristics of this temporary theme are compared to those of the source evidential theme in the following table:

Characteristic | Source Evidential Theme | Temporary Evidential Theme
Grid cell values are the... | values contained in field 'Value' | values contained in the class field of the source evidential theme, specified by the user
Grid cells containing 'No Data' that lie within the study area have the value: | 'No Data' | Missing Data Integer
Grid cell values lying outside the study area have the value: | any value | 'No Data'

 

ArcSDM3 also calculates the following for each unique condition and joins them as attributes to the unique conditions table:

Field Alias | Field Name | Description
Training Points | Trngpoints | number of training points occurring in that condition
Area (sq. m) | Area_sqm | area, measured in square metres

This information is stored in a DBF table with a name such as SDMUC17_Tbl.dbf. This table is joined to the unique conditions attribute table.

The unique conditions table is used 'as is' by the weights of evidence scripts. The logistic regression method requires some modifications, which are described in the Logistic Regression section, following.

Weights (WOE) Table

The default name for the weights table is SDMUC#_woe.dbf. It has the following structure:

Field Alias | Field Name | Description
Evidential Theme | Evidence_t | name of the evidential theme
Generalization Theme | | the temporary raster that was created for the generalization of the evidence and used as input to Combine to make the unique conditions raster. This name appears in the attribute table of the unique conditions raster.
Generalization Table | Class_fiel | the name (not the alias) of the Generalization table for which the weights were calculated
W<#> | W<#> | the template name for each of the fields containing the calculated weights, one for each class that occurs in any of the input evidential themes. If a class with a particular number does not occur in an evidential theme, its cell in that field will be blank.
Contrast* | Contrast_ | the difference between the highest weight and the smallest weight. Note that the 'true' contrast is defined only for binary themes.
Confidence | Confidence | the studentized contrast*, which is the contrast divided by its standard deviation

Although the table can be easier to read if a convention such as 2 = presence and 1 = absence is followed, any integer values can be used to identify classes. The number of classes that this table format can accommodate is very large; however, it is recommended that multi-class evidential themes be limited to a small number of classes (typically not more than 5) to facilitate interpretation.
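The Contrast* and Confidence fields can be illustrated with a short Python sketch. It assumes that the standard deviation of the contrast is the square root of the sum of the variances of the two weights involved, as is usual for binary themes; the function name and the example values are hypothetical.

```python
import math

def contrast_and_confidence(weights, variances):
    """Return (contrast*, confidence) for one evidential theme.

    weights and variances map class number -> W<#> and V<#> values.
    The standard deviation of the contrast is assumed to be
    sqrt(variance of highest weight + variance of lowest weight).
    """
    highest = max(weights, key=weights.get)
    lowest = min(weights, key=weights.get)
    contrast = weights[highest] - weights[lowest]
    std_dev = math.sqrt(variances[highest] + variances[lowest])
    return contrast, contrast / std_dev

# Hypothetical binary theme: class 2 = presence, class 1 = absence.
contrast, confidence = contrast_and_confidence(
    weights={1: -0.5, 2: 1.5},
    variances={1: 0.04, 2: 0.21},
)
```

With these example values the contrast is 2.0 and the confidence is approximately 4, that is, the contrast is about four standard deviations away from zero.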

The last row:

Several parameters are written to the last row of the weights table, as a convenient place to reference them. The name of the training point theme is written to the Evidential Theme field; the total number of training points is written to the Generalization Theme field; the total study area in units is written to the first weight field; and the prior probability (the total number of training points divided by the total study area) is written to the third weight field, or contrast* field. Note that these totals are not the values used in weights calculations for an evidential theme that contains areas with missing data.

Variances (WOEVAR) Table

The default name for the variances table is SDMUC#_woevar.dbf. It has the same structure as the weights table with the following exceptions:

  1. The names of fields containing variances have a V<#> template.
  2. Contrast* and Confidence are not reported.
  3. The training point theme and study area are not reported in the final row.

The Weights of Evidence (WOFE) Table

ArcSDM calculates the following statistics for each unique condition and writes them to an ArcMap table. The table is based on a dBase file with a default name of SDMUC#_wofe.dbf. The table name defaults to 'Weights of Evidence <#>'. The table is automatically joined to the unique conditions table.

Field Alias | Field Name | Description
ID | ID | unique condition ID
Posterior Probability | Post_prob | the posterior logit converted to a probability
Normalized Probability | Pstprbnrm | the posterior probability rescaled so that the overall measure of conditional independence is satisfied *
Posterior Logit | Post_logit | the sum of weights added to the prior logit
Sum of Weights | Sum_weights | the sum of the weights for each evidential theme class occurring in the unique condition
Uncertainty | Uncertainty | the uncertainty due to the calculation of weights (standard deviation)
Missing Data | Msng_data | uncertainty due to missing data (standard deviation)
Total Uncertainty | Tot_uncrty | the combined uncertainty due to weights and due to missing data (standard deviation)

* The probability is rescaled by multiplying it by Training Points / Sum of (area * probability), where the summation is over all unique conditions. This normalization is not applied to the response theme in logit units.

How the weights table is used to calculate posterior probabilities

One unique condition, or record, in the unique conditions table is processed at a time. For each evidential theme included in the response theme (determined by reading the field names), the class occurring in that unique condition is read. The evidential theme name is then located in the weights table, and the weight calculated for that class is read and added to the sum of weights. The correct weight is identified by the field name in the weights table; for example, the weight for class 4 in Theme 3 is found in the cell located at the intersection of the record where 'Theme 3' is written in the Evidential Theme field and the field named W4.
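The lookup described above can be sketched in Python as follows. The weights table is represented as a nested dictionary, which is only a stand-in for the DBF table, and the conversion from logit to probability uses the standard logistic relationship; the theme names and weight values are hypothetical.

```python
import math

def posterior_for_condition(condition, weights_table, prior_probability):
    """Return (sum of weights, posterior logit, posterior probability).

    condition maps evidential theme name -> class in this unique condition.
    weights_table maps evidential theme name -> {"W<class>": weight}.
    """
    sum_weights = 0.0
    for theme, class_value in condition.items():
        # The record is found by theme name, the field by the W<#> template.
        sum_weights += weights_table[theme]["W%d" % class_value]
    prior_logit = math.log(prior_probability / (1.0 - prior_probability))
    post_logit = prior_logit + sum_weights
    post_prob = math.exp(post_logit) / (1.0 + math.exp(post_logit))
    return sum_weights, post_logit, post_prob

# Hypothetical weights table with two themes.
weights_table = {
    "Theme 1": {"W1": -0.3, "W2": 0.9},
    "Theme 3": {"W4": 0.8, "W5": -1.1},
}
condition = {"Theme 1": 1, "Theme 3": 4}
sum_w, logit, prob = posterior_for_condition(condition, weights_table, 0.01)
```

Here the sum of weights is positive, so the posterior probability for this unique condition is higher than the prior probability of 0.01.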

Missing Data

If any of the input data sets have areas where data are missing, this should be identified when the weights of evidence analysis parameters are set. Any integer, including zero and negative numbers, may be used to identify areas of missing data. The same number, however, must be used for all data sets when creating a response theme or testing conditional independence (i.e., when multiple data sets are being input). Refer to Integer that defines Missing Data.

If areas with missing data are defined using 'No Data' in a grid evidential theme, these areas will be filled in "on-the-fly" with the specified integer.

Weights of evidence handles missing data in the following way:

During the calculations of weights for an evidential theme, the total area is calculated as the total study area less any area where data are missing. The total number of training points is calculated as the total number of points in the study area less any points located in areas where data are missing.

If at least one input evidential theme contains missing data, a field named W<missing data integer> will be included in the weights table. If an evidential theme contains areas of missing data, the cell in the missing data class column will contain zero. If a theme has no missing data, the cell will be blank.

Uncertainty due to missing data

ArcSDM requires that missing data be identified by a value (rather than 'No Data', for example) so that these areas can be captured in the unique conditions grid and attribute table. With the areas of missing data identified in each unique condition, a measure of uncertainty in the posterior probability can be calculated. Depending on the number of classes and evidential themes (and therefore the number of unique conditions) and on the number of themes in which data are missing, calculating uncertainty due to missing data may be time consuming. The length of time needed to calculate the uncertainty is estimated and reported to the user if it is longer than one minute. The time estimate is based on processing times for a Pentium 133 notebook computer with 48 MB of RAM, so with rapidly increasing computer speeds it is only a crude, often inaccurate estimate, although it remains a useful guide. A more powerful computer, or a desktop computer with the same parameters, will usually perform these calculations much faster. In some situations, such as when processing data located across a network, the calculations may be considerably slower.

Expert Weights Option

This option allows the user to manipulate the weights that are generated for one or more of the evidential themes input to the model. Instead of using a set of training points to determine weights, the user specifies the model weights, either directly or by allocating the proportion of training points that fall in each class, or by specifying likelihood ratios for each class. This technique can be useful if the study area has not been previously explored, and a set of training points is small or not available. The "points" in this case are purely notional. It is often convenient to use 100 points, then estimate the % points occurring on the class of a theme as a way of subjectively defining importance. As each evidential theme is processed, the user is prompted by the following dialog:

There is no limit to the number of classes that can be used.

On initial display, the dialog has the following settings:

  1. The evidential theme name appears at the far left of the dialog box title.
  2. The initial number of hypothetical "training points" is set to the number specified by the user, such as 100, and displayed in a text line in the upper right corner. You cannot change the total number of points from this menu. Note that these points are not given any actual location.
  3. Data about the evidential theme is displayed in the following columns:

Class – The classes found in the specified class field in the theme's attribute table.

% Points – The percentage of points allocated to each class. Initially this is set equal to the percentage of the total study area occupied by each class, resulting in weight values of zero. Because the percentages of points must always sum to 100%, the first class is automatically adjusted to make the total equal 100%. If the percentages entered for the higher classes are too large, the value for the first class can become negative. When the first class has a negative percentage, the OK button is disabled. The classes symbolized in red can be entered by the user. The first class is symbolized in gray to indicate that it is not available for user input.

Area – The percentage of the total study area occupied by each class.

Likelihood Ratio – The likelihood ratio calculated based on the specified percentage of points and percentage area for the class, as well as the total number of points, and total area. Initially this value is set to 1. The W+ value is the natural log of the likelihood ratio.

Weights – The W+ calculated for the current class, based on the specified percentage of points and percentage area for the class, as well as the total number of points, and total area. Initially this value is set to 0.
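The relationship between the % Points, Area, Likelihood Ratio and Weights columns can be sketched in Python. The sketch assumes the usual weights-of-evidence likelihood ratio with areas in unit cells, which is consistent with the initial values of 1 and 0 described above but is not confirmed to be the exact formula used by the dialog; the function name and example values are hypothetical.

```python
import math

def expert_weights(pct_points_other, pct_area, total_points, total_area):
    """Return a list of (% points, likelihood ratio, W+) per class.

    pct_points_other holds the user-entered % points for classes 2..k;
    the first class is adjusted so that the percentages sum to 100.
    pct_area holds the % area for all k classes.
    """
    first = 100.0 - sum(pct_points_other)
    if first < 0:
        # Mirrors the dialog disabling the OK button.
        raise ValueError("percentages of points exceed 100%")
    results = []
    for pct_pts, pct_ar in zip([first] + list(pct_points_other), pct_area):
        n = total_points * pct_pts / 100.0   # notional points in the class
        a = total_area * pct_ar / 100.0      # class area in unit cells
        ratio = (n / total_points) / ((a - n) / (total_area - total_points))
        weight = math.log(ratio) if ratio > 0 else float("-inf")
        results.append((pct_pts, ratio, weight))
    return results

# Initial state: % points equal % area, so every ratio is 1 and every weight 0.
initial = expert_weights([30.0, 10.0], [60.0, 30.0, 10.0], 100, 10000)
# Expert judgement: 40% of the notional points fall in the small third class.
edited = expert_weights([30.0, 40.0], [60.0, 30.0, 10.0], 100, 10000)
```

In the edited case the third class, which occupies 10% of the area but receives 40% of the notional points, gets a positive weight, while the first class is adjusted down to 30% of the points and gets a negative weight.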

Inputting values

You can edit the % Points, Likelihood Ratios or Weights columns by clicking the associated radio button, found in the upper left corner of the dialog, and editing the values in the text lines. As you change any of these three values, the calculated values for the other two can be updated with the Update button. This will activate the OK button. If the Weights or Likelihood Ratios entered do not correspond to percentages of points that add up to 100%, this column will be grayed out.

Note: Weights are always calculated based on the weights displayed in the dialog.

As discussed above in % Points, if you are entering percentage of points, the sum must total 100% to activate the OK button.

Reading Weights from an Existing Table

You can read weights that have been previously calculated and written to a weights table. To do this:

  1. Select a weights table from the combo-box located above the display area on the right side of the dialog.
  2. Click the 'Read Weights' button.
  3. ArcSDM3 will look for the current theme name and field name in the specified weights table and, if found, will update the text lines in the Weights column with the values from the table.

You can then modify the weights, and when you are done, click 'Update' and then 'OK' to continue with calculations.

Except for the user interaction with this dialog, all of the calculations and output are the same as for the 'regular' weights option.

NOTE: It is not possible to check for conditional independence with expert weights because the locations of the points are notional.


Logistic Regression

Evidential Themes and the Unique Conditions Table

Logistic regression handles multi-class evidential themes of ordered data but not multi-class free data. This problem is dealt with after the unique conditions grid has been created. ArcSDM determines if there are any multi-class free evidential themes and expands them to a series of binary themes in preparation for running logistic regression. In this way, the same evidential themes can be input to both weights of evidence and logistic regression at the same time.

Actual data sets are not created. A unique conditions table is written. For example, if one of the evidential themes was a geology map with three classes, identified by 1, 2 and 3, three binary "themes" would be generated with the values mapped in the following way:

Theme | Initial Class | New Class | Initial Classes | New Class
1 | 1 | 1 | 2 and 3 | 0
2 | 2 | 1 | 1 and 3 | 0
3 | 3 | 1 | 1 and 2 | 0
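The mapping in the table can be expressed as a small Python sketch. The function name is hypothetical, the input is simply the column of class values for one multi-class free theme in the unique conditions table, and missing data values are not treated specially here.

```python
def expand_free_theme(class_values):
    """Expand one multi-class free theme into binary columns, one per class.

    Returns a dict mapping each class to a list with 1 where the unique
    condition has that class and 0 elsewhere.
    """
    classes = sorted(set(class_values))
    return {c: [1 if value == c else 0 for value in class_values]
            for c in classes}

# Geology class for four hypothetical unique conditions.
geology = [1, 2, 3, 2]
binary_themes = expand_free_theme(geology)
# binary_themes[2] is [0, 1, 0, 1]: class 2 maps to 1, classes 1 and 3 to 0.
```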

Missing Data

Logistic regression does not process missing data directly. Instead an area weighted mean of the known class values within the study area is calculated for each evidential theme that contains missing data and substituted for the missing data class. For binary themes that have been generated by the expansion of multi-class free data, the area weighted mean is between 0 and 1.
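A minimal Python sketch of this substitution is shown below. It assumes that each unique condition record carries the class value of the theme and its area; the function name, record layout, and the missing data integer -99 are hypothetical.

```python
def fill_missing_with_area_mean(records, missing_value):
    """Replace the missing data class with the area-weighted mean of known classes.

    records is a list of dicts with keys "class" and "area", one per
    unique condition, for a single evidential theme.
    """
    known = [(r["class"], r["area"]) for r in records
             if r["class"] != missing_value]
    known_area = sum(area for _, area in known)
    mean = sum(cls * area for cls, area in known) / known_area
    return [mean if r["class"] == missing_value else r["class"]
            for r in records]

# Binary theme: present over 30 units, absent over 70, missing over 20.
records = [
    {"class": 1, "area": 30.0},
    {"class": 0, "area": 70.0},
    {"class": -99, "area": 20.0},
]
filled = fill_missing_with_area_mean(records, missing_value=-99)
```

For this binary example the substituted value is 0.3, which lies between 0 and 1 as described above.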

Temporary Files

During logistic regression processing, several temporary files are written to a directory created by ArcSDM, ~sdmtemp. (NOTE: Please do not use this directory for any other files.) These files are not deleted by ArcSDM but are overwritten the next time that logistic regression is run.

File Name | Description
case.dat | unique conditions table, processed for input to logistic regression
cumfre.tba | cumulative frequencies of probabilities calculated by logistic regression
logco.dat* | summary of the coefficients for each evidential theme and their standard deviations
logpol.dat | data showing the convergence of the logistic regression coefficients through each iteration of the calculations
logpol.tba* | the posterior probability as well as a Student-t, standard deviation, chi-square coefficient, and deviance coefficient for each probability

* Values from logco.dat and logpol.tba are read to ArcMap tables.

Logistic Regression (LOGPOL) Table

The posterior probability for each unique condition, along with its Student-t value and standard deviation, is written to an ArcMap table. The table is based on a dBase file with a default name of SDMUC#_logpol.dbf, and the default name of the table is 'Logistic Regression <#>'. The table is automatically joined to the unique conditions table.

Field Alias | Field Name | Description
ID | ID | unique condition ID
(LR) Posterior Probability | Lrpostprob | the posterior probability
(LR) TValue | Lrtvalue | Student-t value
(LR) Std. Dev. | Lr_std_dev | standard deviation

Table of Coefficients

ArcSDM automatically creates a table of the final coefficients generated by logistic regression. In the example of the following table, the evidential theme 'Geolm' is a multi-class free data type evidential theme, so it was expanded to three binary themes, each corresponding to the class value reported in brackets in the 'Evidential Theme' field. The coefficient for a theme indicates its relative importance in determining the posterior probabilities. In this case, class 1 of the Geolm theme is most important.

Field Alias | Field Name | Description
Theme ID | Theme_id | unique identifier for the evidential themes
Evidential Theme | Theme | theme name, field name (class value (if expanded))
Coefficient | Coefficien | the coefficient
Standard Deviation | Std_dev | the standard deviation of the coefficient

 


Overall Test of Conditional Independence

Once the response theme is complete, ArcSDM reports two measures of conditional independence: a conditional independence (CI) ratio, which can be used as an overall assessment of conditional independence among your data sets, and the Agterberg-Cheng test of conditional independence.

The CI ratio is calculated as follows:

The product of area and posterior probability, summed over all unique conditions, is the number of points predicted by the model. A ratio is calculated by dividing the actual number of training points input to the model by this predicted number of points. This ratio will generally be between 0 and 1. A value of 1 (which never occurs in practice) indicates conditional independence among the evidential themes used in the model. Values much different from 1 indicate a conditional independence problem.
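The CI ratio calculation can be sketched in a few lines of Python. The record layout and the example values are hypothetical, and areas are assumed to be expressed in the same unit cells that were used to calculate the probabilities.

```python
def ci_ratio(conditions, training_points):
    """Return (predicted number of points, CI ratio).

    conditions is a list of dicts with keys "area" and "post_prob",
    one per unique condition.
    """
    predicted = sum(c["area"] * c["post_prob"] for c in conditions)
    return predicted, training_points / predicted

# Two hypothetical unique conditions and 10 training points.
conditions = [
    {"area": 100.0, "post_prob": 0.1},
    {"area": 50.0, "post_prob": 0.2},
]
predicted, ratio = ci_ratio(conditions, training_points=10)
```

In this example the model predicts 20 points against 10 actual training points, giving a ratio of 0.5, which would suggest a conditional independence problem.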

The Agterberg-Cheng test reports the probability that the response theme is not conditionally independent. A large probability indicates that the response theme has significant conditional dependency. By calculating response themes with different combinations of evidence, it is possible to identify which combination of evidence causes the conditional dependency. Refer to the paper for an explanation of the mathematics.


Symbolization of the Response Theme

The Response (Raster) Theme (actually the unique conditions raster) is automatically added to the current Data Frame but must be symbolized based on the Posterior Probability attribute by the user. Natural breaks is often a good way to symbolize the Response theme.


Making a Confidence Map: Normalizing the Posterior Probability by the Total Uncertainty

You can also normalize the probability values by Total Uncertainty. 

  1. Make the Response Theme you want to normalize active.
  2. Double-click the theme’s legend to open the Properties dialog and click the Symbology tab.
  3. From the Value combo box, select 'Posterior Probability'.
  4. From the ‘Normalization:’ combo box, select ‘Total Uncertainty’.
  5. Click ‘Apply’.
  6. Click the ‘X’ button to close the dialog.

You can change the theme’s name to reflect the legend by selecting the General tab in the Properties dialog from the Theme menu.

Dividing the posterior probability (not the normalized posterior probability) by the total uncertainty provides a map of the informal "studentized" posterior probability. If enough training points are being used, then regions with values greater than about 2 have a high degree of "certainty" (with regard to the variances of the weights and the variance due to missing data). This map is useful in a relative sense for highlighting regions with low or high confidence that the reported posterior probability is not zero.
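The same division can be reproduced outside the Symbology dialog, for example to list the unique conditions that exceed the threshold. In this Python sketch the record layout, the example values, and the exact threshold of 2.0 are assumptions for illustration.

```python
def studentized_posterior(post_prob, total_uncertainty):
    """Posterior probability divided by its total uncertainty (standard deviation)."""
    return post_prob / total_uncertainty

def confident_conditions(conditions, threshold=2.0):
    """Return the IDs of unique conditions whose studentized value exceeds threshold."""
    return [c["id"] for c in conditions
            if c["tot_uncrty"] > 0
            and studentized_posterior(c["post_prob"], c["tot_uncrty"]) > threshold]

# Hypothetical records from the joined unique conditions table.
conditions = [
    {"id": 1, "post_prob": 0.020, "tot_uncrty": 0.005},
    {"id": 2, "post_prob": 0.004, "tot_uncrty": 0.003},
    {"id": 3, "post_prob": 0.050, "tot_uncrty": 0.030},
]
high_confidence = confident_conditions(conditions)
```

Only the first condition has a studentized value above 2, so it is the only one in which the posterior probability is reported with relatively high confidence.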
