Lecture notes presented by Carl Looney as part of a training workshop on Arc-SDM and DataXplore, held at the Geological Survey of Canada, Ottawa, October 26-29, 1999.

Neural Network Method

Arc-SDM uses a separate module called DataXplore to perform neural network analysis. (DataXplore can also be run separately from ArcView using data from other sources. The file formats required by DataXplore are documented in readme.txt as well as this user guide.) This section describes terms and concepts in neural network analysis and DataXplore as they relate both to Arc-SDM and mineral exploration.

Dictionary of Terms
DataXplore                Arc-SDM
Feature or attribute      Evidential theme class
Feature vector            Unique condition
Training vector           Unique condition at location of training point

DataXplore provides two types of neural network algorithms: a radial basis functional link net (RBFLN) algorithm and a fuzzy clustering algorithm. The RBFLN neural network requires a set of training vectors on which the algorithm trains, or learns. The training process results in a series of parameters that define a systematic relationship between the input (training) vectors and the target vector. The classification part of the algorithm then uses this relationship to classify data for which the target is unknown.

In the case of mineral exploration, the target vector consists of the presence or absence of a mineral occurrence. The training vectors contain the evidence found at locations where the presence or absence of mineral occurrences is known. They consist of the values from a set of evidential themes found at the locations of two types of training points: known mineral occurrences, and locations where no mineralization occurs. They are determined by generating a unique conditions grid/table from the selected evidential themes, then reading the values from the unique conditions attribute table at the locations of the training points and writing the vectors to a text file for input to DataXplore.

The entire unique conditions table is also written to a separate text file. This is the data for which the target vector is unknown. In the context of neural network analysis, each unique condition is described as a feature vector. The RBFLN algorithm takes what it has learned from the training point data and uses it to find patterns and classify the entire unique conditions table. For each unique condition it provides as output a measure between 0 and 1 of the similarity to the training vectors for known mineral occurrences.

The unique conditions grid and table generated as input to DataXplore are essentially the same as those generated for weights of evidence and logistic regression. Generally, however, the evidential themes used to create the unique conditions grid will be less generalized, in order to give the neural network algorithms as much information as possible to classify the data. As the number of classes in the evidential themes increases, the number of unique conditions can increase exponentially. While theoretically there is no limit on the number of unique conditions, there may be a practical limit when processing. The training data are a subset of the unique conditions.

In Arc-SDM, the interface for creating the neural network input files is the same as that for the other Arc-SDM functions that generate a unique conditions grid (i.e., Calculate Response Theme...; Check Conditional Independence...). The unique conditions data and training data are automatically written to files in the correct format for DataXplore.

The user, however, performs each step of the neural network analysis separately:

  1. generate the input files
  2. run the neural network module, DataXplore
  3. read the results from DataXplore into ArcView

The following section describes the neural network methods used by DataXplore and their algorithms. This material was presented by Carl Looney as part of an Arc-SDM and DataXplore training workshop held in October, 1999.

  1. Fuzzy Clustering
  2. A Fuzzy Clustering Algorithm
  3. Radial Basis Functional Link Nets (Neural Networks)
  4. Our New Radial Basis Functional Link Algorithm
  5. The Format of the Input Data Files

DataXplore contains two major functions:

  1. fuzzy clustering
  2. the RBFLN neural network

These operate on a table of data to partition the data into useful sets.

1. Fuzzy Clustering

A feature vector is a vector

x = (x1,...,xN)

where each component xn is the value of a feature, or attribute. There are N features for any object in a population of objects. Each feature represents a measurement of a certain attribute of the objects. (In Arc-SDM terms, a feature is an evidential theme class and a feature vector is a unique condition.)

A feature determines a column (or layer) in a data set. For example, suppose there is a set of Q feature vectors

{x(1), ..., x(Q)}

to be tabulated as follows:

x1(1), x2(1),..., xN(1) (feature vector 1)
x1(2), x2(2),..., xN(2) (feature vector 2)
...........
...........
x1(Q), x2(Q),..., xN(Q) (feature vector Q)

Each vector represents an object (or entity) in a population of objects. The goal is to partition the population into classes (or subpopulations) of objects by partitioning the set of feature vectors that represent the objects.

The objects are rather similar within classes and rather different between classes. A class represents objects with certain ranges of feature values (associated with certain class properties). The process of partitioning a set of feature vectors into classes is called clustering (also classification).
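Tabulated this way, the feature vectors form a simple Q-by-N table. A minimal Python sketch (the values below are invented for illustration):

```python
# Hypothetical table of Q = 4 feature vectors, each with N = 3 features.
# Each row represents one object; each column is one feature (attribute).
X = [
    [0.1, 0.9, 0.2],  # feature vector 1
    [0.2, 0.8, 0.1],  # feature vector 2
    [0.9, 0.1, 0.7],  # feature vector 3
    [0.8, 0.2, 0.9],  # feature vector 4
]
Q = len(X)      # number of objects
N = len(X[0])   # number of features
print(Q, N)     # 4 3
```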

Consider the following set of vectors in the plane (figure not reproduced here). The feature values include errors and measurement noise, but in an average sense they fall into 2 subpopulations (classes).

2. A Fuzzy Clustering Algorithm

The weighted fuzzy expected value (WFEV) of a set of values s1, ..., sP is obtained by

i) initializing with the sample average µ = (1/P)(s1 + ... + sP)

ii) computing fuzzy weights

  wp = exp[-(sp - µ)²/(2σ²)]   (p = 1, ..., P)
  Wp = wp / Σ(r=1,P) wr   (standardize weights)

iii) computing the WFEV

  µ = Σ(p=1,P) Wp sp   (weighted average)

iv) if (stop_criterion) then stop, else go to Step ii)
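The iteration i)-iv) can be sketched in Python; the Gaussian form of the fuzzy weights, the spread sigma, and the stopping tolerance eps are assumptions made for this sketch:

```python
import math

def wfev(s, sigma=1.0, eps=1e-6, max_iter=100):
    """Weighted fuzzy expected value of the values in s (a sketch)."""
    mu = sum(s) / len(s)                      # i) start from the sample mean
    for _ in range(max_iter):
        # ii) fuzzy weights: nearby values get large weights, outliers small
        w = [math.exp(-(sp - mu) ** 2 / (2 * sigma ** 2)) for sp in s]
        W = [wp / sum(w) for wp in w]         # standardize weights
        mu_new = sum(Wp * sp for Wp, sp in zip(W, s))  # iii) weighted average
        if abs(mu_new - mu) < eps:            # iv) stop criterion
            return mu_new
        mu = mu_new
    return mu

# An outlier pulls the plain mean far more than the WFEV:
vals = [1.0, 1.1, 0.9, 1.2, 8.0]
print(round(sum(vals) / len(vals), 2))        # 2.44
print(wfev(vals) < sum(vals) / len(vals))     # True
```

The outlier 8.0 receives a near-zero weight, so the WFEV settles near the dense group around 1.0.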

The WFEV weighs densely located points more heavily and outliers less. Our fuzzy clustering algorithm uses the WFEV. The goal is to cluster the set of feature vectors {x(1), ..., x(Q)}.

i) input a number K of classes that is larger than the expected number (extra classes will be merged)

ii) assign the first K of the Q vectors as cluster centers z(1), ..., z(K) (after MacQueen's K-means algorithm)

iii) for q = 1 to Q do

assign x(q) to closest center z(k) by c[q]=k (after MacQueen's algorithm)

iv) find WFEV of each cluster to obtain new centers {z(k)}

v) if (any center changes by more than ε) go to Step iii) above; else, continue

vi) compute the weighted fuzzy variance of each cluster and the WFEV dWFEV of the distances between centers

vii) for k = 1 to K-1 do

for kk = k+1 to K do

if distance(z(k), z(kk)) < dWFEV then merge(k, kk)
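Steps i)-v) can be sketched as follows. This is a simplification, not DataXplore's implementation: a plain mean stands in for the WFEV centre update, and the merging steps vi)-vii) are omitted:

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def fuzzy_cluster(X, K, eps=1e-4, max_iter=100):
    """K-means-style loop of steps i)-v) (mean in place of the WFEV)."""
    centers = [list(x) for x in X[:K]]        # ii) first K vectors as centers
    labels = [0] * len(X)
    for _ in range(max_iter):
        # iii) assign each vector to its closest center
        labels = [min(range(K), key=lambda k: dist2(x, centers[k])) for x in X]
        # iv) recompute each center (WFEV in the real algorithm)
        new_centers = []
        for k in range(K):
            members = [x for x, c in zip(X, labels) if c == k]
            if members:
                new_centers.append([sum(col) / len(col) for col in zip(*members)])
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        # v) stop when no center moves by more than eps
        if all(dist2(a, b) < eps ** 2 for a, b in zip(centers, new_centers)):
            break
        centers = new_centers
    return centers, labels

# Two obvious groups of 2-D feature vectors:
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centers, labels = fuzzy_cluster(X, K=2)
print(labels)  # the first three points share one label, the last three the other
```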

3. Radial Basis Functional Link Nets (Neural Networks)

The radial basis functional link network is one type of neural network. Two others are the multiple layered perceptron (MLP) and the radial basis function neural network (RBFN). A radial basis functional link net transforms each N-dimensional input feature vector into an output target vector

x = (x1, ..., xN)  --[NN]-->  t = (t1, ..., tJ)

The target vector t is a codeword that represents a class. This is called supervised learning because the network must be told the class for each input feature vector x.

Neural networks have a relatively large number of parameters that can be thought of as knobs (or dials). The parameters are also known as weights. During training, a set of feature vectors is presented to the network and the knobs are adjusted until each feature vector is mapped to its known target vector. These feature vectors are called training vectors when used to train the network.

Any input feature vector is mapped into an actual output vector z by the network

feature vector --> network --> actual output vector (compared with the target vector)

x = (x1, ..., xN)  --[NN]-->  z = (z1, ..., zJ)  ≈  t = (t1, ..., tJ)

The error to be minimized over all Q input feature vectors is

E = Σ(q=1,Q) Σ(j=1,J) (tj(q) - zj(q))²

A radial basis function (RBF) is a Gaussian function. It has a center vector v and processes any input vector x via

y = f(x; v) = exp[-||x - v||²/(2σ²)]   (0 < y ≤ 1)

Each middle-layer node in a RBFN or RBFLN contains an RBF whose output fans out to each node in the output layer.

For an input vector x, the outputs from the mth node in the middle layer and the jth node in the output layer are

ym = f(x; v(m)) = exp[-||x - v(m)||²/(2σ²)]   (m = 1, ..., M)

zj = Σ(m=1,M) umj ym + Σ(n=1,N) vnj xn + bj   (j = 1, ..., J)

where v(m) is the center of the mth RBF (distinct from the line weights vnj). We adjust the weights umj, vnj and bj (the knobs) by steepest descent (with gains ß1, ß2) via

vnj = vnj - ß1 (∂E/∂vnj)
umj = umj - ß2 (∂E/∂umj)

to minimize the total sum-squared error E.
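A single forward pass through such a net can be sketched in Python; the network sizes, centers and weight values below are invented for illustration:

```python
import math

def rbfln_forward(x, centers, sigma, u, v, b):
    """Forward pass of an RBFLN (a sketch): M Gaussian RBFs plus the
    direct input-to-output lines; u[m][j], v[n][j], b[j] are weights."""
    # Middle-layer RBF outputs ym, one per center
    y = [math.exp(-sum((xn - cn) ** 2 for xn, cn in zip(x, c))
                  / (2 * sigma ** 2)) for c in centers]
    J = len(b)
    # Output-layer sums: hidden lines + functional-link input lines + bias
    z = [sum(u[m][j] * y[m] for m in range(len(y)))
         + sum(v[n][j] * x[n] for n in range(len(x)))
         + b[j]
         for j in range(J)]
    return z

# Tiny hypothetical net: N = 2 inputs, M = 2 RBFs, J = 1 output.
centers = [[0.0, 0.0], [1.0, 1.0]]
u = [[1.0], [1.0]]        # hidden-to-output weights
v = [[0.0], [0.0]]        # input-to-output (functional link) weights
b = [0.0]
z = rbfln_forward([0.0, 0.0], centers, sigma=0.5, u=u, v=v, b=b)
print(round(z[0], 3))     # 1.018
```

At x = (0, 0) the first RBF fires fully (y1 = 1) and the second only weakly (y2 = e^-4), so the output is just over 1.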

4. Our New Radial Basis Functional Link Algorithm

σ = (1/4)(1/M)^(1/N);              // set σ for the RBF width
draw_weights();                    // draw random weights, -0.5 to 0.5
Eold = evalnet();                  // evalnet() updates the NN, gets E
do {
    for j = 1 to J do              // adjust weights on input node lines
        for n = 1 to N do vnj = vnj - ß1(∂E/∂vnj);
    Enew = evalnet();
    if (Enew < Eold) then ß1 = ß1 * 1.24;
    else ß1 = ß1 * 0.96;
    Eold = Enew;
    for j = 1 to J do              // adjust weights on hidden node lines
        for m = 1 to M do umj = umj - ß2(∂E/∂umj);
    Enew = evalnet();
    if (Enew < Eold) then ß2 = ß2 * 1.24;
    else ß2 = ß2 * 0.96;
    Eold = Enew;
    Iterations = Iterations + 1;
    if (Iterations > I) then exit;
} while (Enew > 0.02);
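A runnable Python sketch of this loop follows, under several stated assumptions: exact gradients for the linear output layer, zero initial weights instead of random ones (so the run is repeatable), biases held fixed, and the RBF centers taken from the first M training vectors:

```python
import math

def train_rbfln(X, T, M, max_iter=200):
    """Sketch of the adaptive-gain training loop above (not DataXplore)."""
    N, J, Q = len(X[0]), len(T[0]), len(X)
    sigma = 0.25 * (1.0 / M) ** (1.0 / N)     # sigma = (1/4)(1/M)^(1/N)
    centers = [list(x) for x in X[:M]]        # RBF centers taken from the data
    u = [[0.0] * J for _ in range(M)]         # hidden-to-output weights
    v = [[0.0] * J for _ in range(N)]         # input-to-output weights
    b1 = b2 = 0.1                             # steepest-descent gains

    def evalnet():                            # updates outputs, returns error E
        Y, Z = [], []
        for x in X:
            y = [math.exp(-sum((a - c) ** 2 for a, c in zip(x, cen))
                          / (2 * sigma ** 2)) for cen in centers]
            z = [sum(u[m][j] * y[m] for m in range(M))
                 + sum(v[n][j] * x[n] for n in range(N)) for j in range(J)]
            Y.append(y)
            Z.append(z)
        E = sum((T[q][j] - Z[q][j]) ** 2 for q in range(Q) for j in range(J))
        return Y, Z, E

    Y, Z, E_old = evalnet()
    for _ in range(max_iter):
        for j in range(J):                    # adjust weights on input node lines
            for n in range(N):
                grad = -2 * sum((T[q][j] - Z[q][j]) * X[q][n] for q in range(Q))
                v[n][j] -= b1 * grad
        Y, Z, E_new = evalnet()
        b1 = b1 * 1.24 if E_new < E_old else b1 * 0.96
        E_old = E_new
        for j in range(J):                    # adjust weights on hidden node lines
            for m in range(M):
                grad = -2 * sum((T[q][j] - Z[q][j]) * Y[q][m] for q in range(Q))
                u[m][j] -= b2 * grad
        Y, Z, E_new = evalnet()
        b2 = b2 * 1.24 if E_new < E_old else b2 * 0.96
        E_old = E_new
        if E_new < 0.02:
            break
    return v, u, E_new

# Learn the identity map on one input: N = 1, M = 1, J = 1, Q = 2.
v, u, E = train_rbfln([[0.0], [1.0]], [[0.0], [1.0]], M=1)
```

The gain schedule (grow by 1.24 on success, shrink by 0.96 on failure) and the 0.02 stopping threshold are taken directly from the listing.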

The function draw_weights() draws the initial weights, and the function evalnet() updates the actual outputs {z(q)} and the error E. The biases bj are adjusted similarly (not shown here).

5. The Format of the Input Data Files

The data to be i) clustered and merged into classes, or ii) used to train the RBFLN or processed by the trained RBFLN, share the same format for convenience.

N M J Q
x1(1), x2(1), ..., xN(1), t1(1)     (input/output pair 1)
x1(2), x2(2), ..., xN(2), t1(2)     (input/output pair 2)
. . . . . . . . .
. . . . . . . . .
x1(Q), x2(Q), ..., xN(Q), t1(Q)     (input/output pair Q)

where

N = number of features in each vector
M = number of nodes (RBFs) in middle layer
J = number of components in output codeword (for class no.)
Q = number of feature vector/output pairs

x1(1), x2(1), ..., xN(1) = first input feature vector
t1(1) = first target output value (vector, more generally)

Note: Fuzzy clustering uses neither the outputs {t1(q)} nor M.

Because the input data are geological/geographical, the actual data files contain 3 extra values per row that are used in neither clustering nor training of the RBFLN. The input data files are:

N M J Q
g1(1), g2(1), g3(1), x1(1), x2(1), ..., xN(1), t1(1)
g1(2), g2(2), g3(2), x1(2), x2(2), ..., xN(2), t1(2)
................
................
g1(Q), g2(Q), g3(Q), x1(Q), x2(Q), ..., xN(Q), t1(Q)

The first value g1(q) in each row is the ID number of the specimen from which the features were obtained.
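A reader for files in this layout might look as follows; this is a sketch, and the field values below (including the coordinate-like g2, g3 entries) as well as the acceptance of both commas and whitespace are assumptions:

```python
def read_dataxplore_input(path):
    """Parse a file with the layout above: an N M J Q header, then Q rows
    of g1..g3, N feature values, and J target values (a sketch)."""
    with open(path) as f:
        tokens = f.read().replace(",", " ").split()
    N, M, J, Q = (int(t) for t in tokens[:4])
    vals = [float(t) for t in tokens[4:]]
    width = 3 + N + J                          # g1..g3, N features, J targets
    rows = []
    for q in range(Q):
        row = vals[q * width:(q + 1) * width]
        rows.append({"g": row[:3], "x": row[3:3 + N], "t": row[3 + N:]})
    return N, M, J, Q, rows

# A hypothetical two-row file: N = 2 features, M = 4 RBFs, J = 1, Q = 2.
with open("demo_input.dat", "w") as f:
    f.write("2 4 1 2\n"
            "1, 355000.0, 4820000.0, 0.5, 0.3, 1\n"
            "2, 355100.0, 4820100.0, 0.1, 0.9, 0\n")
N, M, J, Q, rows = read_dataxplore_input("demo_input.dat")
print(rows[0]["x"], rows[0]["t"])  # [0.5, 0.3] [1.0]
```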

The output results files produced by fuzzy clustering or by the RBFLN operation have the format:

g1(1), c(1), f1(1), ..., fK(1)
g1(2), c(2), f1(2), ..., fK(2)
....
....
g1(Q), c(Q), f1(Q), ..., fK(Q)

where

g1(q) is the same as in the input file above, and c(q) is the class number (fuzzy clustering) or the output code (RBFLN) that represents the class. The values f1(q), ..., fK(q) are the fuzzy membership values of input vector q in classes k = 1, ..., K, respectively.