
An Adaptive Conflict-Resolving Neural-Fuzzy Classifier

SHING CHIANG TAN1 MACHAVARAM VENKATA CHALAPATHY RAO2

Faculty of Information Science & Technology1 Faculty of Engineering & Technology2

Multimedia University Melaka Campus, Jalan Ayer Keroh Lama, Bukit Beruang, 75450 Melaka

MALAYSIA

CHEE PENG LIM

School of Electrical & Electronic Engineering University of Science Malaysia

Engineering Campus, 14300 Nibong Tebal, Penang MALAYSIA

Abstract: This paper presents a novel soft computing classifier design based on neural networks and a constructive conflict-resolving algorithm. Specifically, the classifier is a hybrid model combining the supervised Fuzzy ARTMAP (FAM) network and the Dynamic Decay Algorithm (DDA). The proposed model, called FAM-DDA, inherits the advantages of its predecessors, i.e., stable and incremental learning, augmented by dynamic width adjustment of the weights of conflicting classes. The classification performance of FAM-DDA is evaluated using a number of benchmark datasets. The results are analyzed and compared with those from a number of machine learning approaches, and the implications for the effectiveness of FAM-DDA are discussed.

Key-Words: Adaptive Resonance Theory, Fuzzy ARTMAP, Dynamic Decay Algorithm, Pattern Classification

1 Introduction

Adaptive Resonance Theory (ART), which originated from the pioneering work of Grossberg [1], was designed primarily to address the stability-plasticity dilemma. To date, numerous neural network models that incorporate ART's features for unsupervised/supervised learning of binary/analog patterns have been proposed, e.g., ART-1 [2-3], ARTMAP [4], Fuzzy ART [5], Fuzzy ARTMAP [6], ART-EMAP [7], and Distributed ART [8]. In general, all ART-based models possess an incrementally growing architecture and are capable of learning new data continuously without forgetting previously learned information. This incremental learning property, in addition to the unique fast learning mode, makes ART networks a promising tool for real-world tasks.

Learning in ART networks normally involves self-organizing similar input patterns into categorical prototypes. On presentation of an input pattern to an ART network, the similarity between the input pattern and the existing prototypes is measured. If the input pattern is not sufficiently similar to any existing prototype, it is accommodated by a new one. Real-world datasets, which consist of instances of different class labels, often exhibit a certain degree of overlap in the attribute space. For a supervised ART network such as Fuzzy ARTMAP (FAM), presenting similar data samples of different class labels, for which there are no distinguishing regularities in the attribute space, produces a set of distinct yet conflicting prototypes. If such conflicting prototypes remain unresolved, they can degrade both the generalization capability of the network and the interpretability of any rules extracted from it thereafter.

Therefore, the main focus of this paper is to examine the fusion of a conflict-resolving algorithm into the learning process of supervised ART networks. Specifically, the FAM network and the Dynamic Decay Algorithm (DDA) [9] are combined to form an adaptive conflict-resolving classifier. Owing to space constraints, the dynamics of FAM and the DDA are not covered in detail; interested readers can refer to [1-8] and [9], respectively. The remaining sections of this paper are organized as follows. The FAM, DDA, and FAM-DDA models are described in Section 2. The effectiveness of FAM-DDA is evaluated using two benchmark problems in Section 3. A summary of the work is presented in Section 4.


Fig. 1 The architecture of Fuzzy ARTMAP

2 The FAM-DDA Network

2.1 Fuzzy ARTMAP

As shown in Fig. 1, FAM consists of a pair of Fuzzy ART [5] modules, ARTa and ARTb, interconnected by a map field. Each ART module comprises three layers of nodes: $F_0^a$ ($F_0^b$) is the normalization layer, in which an $M$-dimensional input vector $\mathbf{a}$ is complement-coded [6] into a $2M$-dimensional vector $\mathbf{A} = (\mathbf{a}, \mathbf{a}^c) \equiv (a_1, \ldots, a_M, 1 - a_1, \ldots, 1 - a_M)$; $F_1^a$ ($F_1^b$) is the input layer, which receives the complement-coded input vectors; and $F_2^a$ ($F_2^b$) is the recognition layer, a dynamic layer that encodes prototypes of input patterns and allows the creation of new nodes when necessary.
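To make the transformation concrete, here is a minimal sketch of complement coding (our own illustration; the function name is not from the paper):

```python
import numpy as np

def complement_code(a: np.ndarray) -> np.ndarray:
    """Map an M-dimensional input a (scaled to [0, 1]) to the
    2M-dimensional complement-coded vector A = (a, 1 - a)."""
    return np.concatenate([a, 1.0 - a])

# Example: a 3-dimensional input becomes a 6-dimensional vector whose
# components always sum to M, preserving amplitude information.
A = complement_code(np.array([0.2, 0.7, 1.0]))
# A = [0.2, 0.7, 1.0, 0.8, 0.3, 0.0]
```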

At ARTa, input $\mathbf{A}$ is propagated through $F_1^a$ to $F_2^a$. Each neuron $j$ in $F_2^a$ is activated according to a choice function [6]

$$T_j = \frac{\left|\mathbf{A} \wedge \mathbf{w}_j^a\right|}{\alpha + \left|\mathbf{w}_j^a\right|} \quad (1)$$

where $\alpha \to 0$ is the choice parameter and $\mathbf{w}_j^a$ is the weight of node $j$. Using a winner-take-all competition scheme, the neuron $J$ with the largest activation is selected as the winning node. The key feature of FAM is the vigilance test, which measures the similarity between the winning prototype pattern, $\mathbf{w}_J^a$, and $\mathbf{A}$ against a threshold (the vigilance parameter, $\rho_a$) [6], i.e.,

$$\frac{\left|\mathbf{A} \wedge \mathbf{w}_J^a\right|}{\left|\mathbf{A}\right|} \ge \rho_a \quad (2)$$

When the match criterion is satisfied, learning takes place in accordance with [6]

$$\mathbf{w}_J^{a(\mathrm{new})} = \beta\left(\mathbf{A} \wedge \mathbf{w}_J^{a(\mathrm{old})}\right) + (1 - \beta)\,\mathbf{w}_J^{a(\mathrm{old})} \quad (3)$$

where $\beta \in [0,1]$ is the learning rate. When the match criterion is not satisfied, a new node is created, and the input is coded as its prototype pattern. As a result, the number of nodes grows with time, subject to the novelty criterion, in an attempt to learn a good network configuration autonomously and on-line. As different tasks demand different network structures, this learning approach avoids the need to specify a pre-defined static network size or to re-train the network off-line.
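The following sketch (ours, with hypothetical names; match tracking is omitted) expresses equations (1)-(3) directly, with one prototype per row of `w`:

```python
import numpy as np

def fam_step(A, w, rho_a=0.0, alpha=0.001, beta=1.0):
    """One ARTa presentation: choice (1), vigilance (2), learning (3).
    A: complement-coded input, shape (2M,); w: prototypes, shape (N, 2M)."""
    T = np.minimum(A, w).sum(axis=1) / (alpha + w.sum(axis=1))     # eq. (1)
    for J in np.argsort(T)[::-1]:               # winner-take-all, best first
        if np.minimum(A, w[J]).sum() / A.sum() >= rho_a:           # eq. (2)
            w[J] = beta * np.minimum(A, w[J]) + (1 - beta) * w[J]  # eq. (3)
            return J                            # index of the resonating node
    return None  # no node passes vigilance: caller commits a new node (w = A)
```

In the fast learning mode used later in the experiments, beta = 1, so the winning prototype snaps directly to A ^ w_J.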

During supervised learning, ARTa and ARTb receive a stream of input and target patterns, respectively. FAM does not directly associate input patterns at ARTa with target patterns at ARTb. Rather, input patterns are first classified into prototypical category clusters before being linked with their target outputs via a map field. At each input pattern presentation, the map field establishes a link from the winning category prototype in $F_2^a$ to the target output in $F_2^b$. This association is used, during testing, to recall a prediction when an input pattern is presented to ARTa.
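Conceptually, the map field acts as a lookup from the winning ARTa category to an ARTb category. A minimal sketch (ours, deliberately simplified to a dictionary and omitting the rho_ab-controlled match tracking of full FAM) follows:

```python
map_field = {}                 # ARTa category J -> ARTb category K

def associate(J, K):
    """Record or verify the link from J to K; False signals the mismatch
    that would trigger match tracking (raising rho_a) in full FAM."""
    if J not in map_field:
        map_field[J] = K
        return True
    return map_field[J] == K

def predict(J):
    return map_field.get(J)    # recall the linked target category
```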


2.2 The Dynamic Decay Algorithm

The DDA, on the other hand, is a constructive algorithm that comprises three steps (a structural sketch is given after this list):

(a) Covered. If a new pattern is correctly classified by an already existing prototype, it triggers a regional expansion of the winning prototype in the attribute space.

(b) Commit. If no node of the correct class covers a new pattern, a new hidden node is introduced, and the new pattern is recruited as its reference vector.

(c) Shrink. If a new pattern is incorrectly classified by an already existing prototype of a conflicting class, the width of that prototype is reduced (i.e., shrunk) so as to resolve the conflict.
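The sketch below (ours, not the original DDA code) shows the three-way dispatch; prototypes are modeled as hyperboxes with per-dimension widths, inputs are assumed normalized to [0, 1], and the `shrink` helper is sketched later in Section 2.3:

```python
import numpy as np

def dda_train_step(x, y, prototypes, eps_min=0.1):
    """One DDA-style presentation: cover, commit, or shrink.
    Each prototype is a dict with 'center', 'width' (per dimension), 'label'."""
    covering = [p for p in prototypes
                if np.all(np.abs(x - p['center']) <= p['width'])]
    same = [p for p in covering if p['label'] == y]
    if same:                                # (a) Covered: expand the winner
        same[0]['width'] = np.maximum(same[0]['width'],
                                      np.abs(x - same[0]['center']))
    else:                                   # (b) Commit: recruit x as the
        prototypes.append({'center': x.copy(),        # reference vector
                           'width': np.ones_like(x),  # covers [0, 1]^M
                           'label': y})
    for p in covering:                      # (c) Shrink every conflicting
        if p['label'] != y:                 # prototype that covers x
            shrink(p, x, eps_min)           # see the sketch in Section 2.3
    return prototypes
```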

There are several similarities between FAM and Steps (a) and (b) of the DDA, and these motivate the fusion of the DDA into the FAM framework. First, as in Step (a), learning in both FAM and the DDA involves region expansion: in FAM, the learning process can be viewed as the expansion of the winning node (hyperbox) towards the pattern. Second, as in Step (b), both FAM and the DDA perform incremental learning, whereby new nodes are introduced when necessary. A new node is committed to include the new input pattern if none of the existing nodes can classify it correctly. Furthermore, when FAM is in the fast learning mode, the weights of the newly committed node are identical to the new input pattern.

2.3 The Hybrid FAM-DDA Network

In the proposed FAM-DDA model, a number of modifications are necessary in order to implement the shrinking procedure effectively. A center estimation procedure [10] is used, whereby a new center weight vector, $\mathbf{w}_j^{ca}$, referred to as the reference vector, is introduced for each prototype in $F_2^a$. Each center weight vector is initialized as

$$\mathbf{w}_j^{ca} \equiv \left(w_{j1}^{ca}, w_{j2}^{ca}, \ldots, w_{jM_a}^{ca}\right) = (0.0, \ldots, 0.0) \quad (4)$$

When learning takes place, the reference vector of the $J$-th winning $F_2^a$ node is updated according to

$$\mathbf{w}_J^{ca(\mathrm{new})} = \mathbf{w}_J^{ca(\mathrm{old})} + \frac{1}{w_J^{ab}}\left(\mathbf{A} - \mathbf{w}_J^{ca(\mathrm{old})}\right) \quad (5)$$

and

$$w_J^{ab(\mathrm{new})} = w_J^{ab(\mathrm{old})} + 1 \quad (6)$$

where $w_J^{ab}$ counts the patterns coded by node $J$.
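Equations (5) and (6) implement an incremental mean: the reference vector moves a step of $1/w_J^{ab}$ towards each new pattern coded by node $J$. A minimal sketch (ours) follows:

```python
import numpy as np

def update_reference(w_ca, count, A):
    """Incremental mean of all patterns coded by node J (eqs. (5)-(6)).
    count is w_J^ab, initialized to 1 so that the first update copies A
    over the zero vector of eq. (4)."""
    w_ca = w_ca + (A - w_ca) / count     # eq. (5): move 1/count towards A
    count += 1                           # eq. (6): increment pattern counter
    return w_ca, count
```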

The procedure of the proposed FAM-DDA is as follows. The FAM learning algorithm is first employed to find the winning node, $\mathbf{w}_J^a$. The reference vector of prototype $J$ is then updated according to equations (5) and (6). After the FAM learning phase, a shrinking procedure ensues. Note that the shrinking procedure is applied successively to the current weights against the weights of the respective dimensions of the conflicting prototypes: if the current point, which represents the current weights in the $M$-dimensional (or, with complement coding, $2M$-dimensional) attribute space, falls within the region formed by a prototype of a conflicting class, the width of that conflicting prototypical region is shrunk.
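One training presentation of the hybrid model can thus be outlined as follows (our reading of the procedure; all function names here are hypothetical):

```python
def fam_dda_present(A, target, net):
    """One FAM-DDA training presentation (illustrative outline only)."""
    J = fam_learn(A, target, net)             # FAM: eqs. (1)-(3) + map field
    update_reference_vector(net, J, A)        # center estimation: eqs. (5)-(6)
    for k in conflicting_prototypes(net, J):  # nodes of other classes
        if covers(net, k, A):                 # current point inside region k?
            shrink_prototype(net, k, A)       # width reduction: eqs. (7)-(9)
```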

Fig. 2 The FAM-DDA shrinking procedure (pseudocode: compute the three shrinking cases, $\Lambda_{best,k}$, $\Lambda_{max,l}$, and $\Lambda_{min,m}$, for the conflicting prototype, and apply the first applicable one).

As shown in Fig. 2, three cases of width shrinking are proposed to resolve the conflict. First, if a dimension of the conflicting prototype can be shrunk without falling below a preset minimum width, $\varepsilon_{\min}$, the dimension with the smallest loss in volume, $\Lambda_{best,k}$, is chosen, as follows:

$$\Lambda_{best,k} = \min\left\{\Lambda_k = \left|w_k^{ca} - w_k^{a}\right| : \forall\, 1 \le j \le n,\ j \ne k,\ \Lambda_k - \left|w_k^{ca} - w_k^{a}\right| \le \Lambda_j - \left|w_j^{ca} - w_j^{a}\right| \ \wedge\ \Lambda_k \ge \varepsilon_{\min}\right\} \quad (7)$$

Second, if the above is not the case, the remaining dimension with the largest width, $\Lambda_{max,l}$, is shrunk, i.e.,

$$\Lambda_{max,l} = \max_{l}\left\{\left|w_l^{ca} - w_l^{a}\right|\right\} \quad (8)$$

Third, if the procedure results in a new width smaller than $\varepsilon_{\min}$, one of the dimensions, $\Lambda_{min,m}$, is shrunk below $\varepsilon_{\min}$:

$$\Lambda_{min,m} = \min\left\{\Lambda_m = \left|w_m^{ca} - w_m^{a}\right| : \forall\, 1 \le j \le n,\ j \ne m,\ \Lambda_m - \left|w_m^{ca} - w_m^{a}\right| \le \Lambda_j - \left|w_j^{ca} - w_j^{a}\right|\right\} \quad (9)$$
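A sketch of the three-case selection (our interpretation of Fig. 2 and equations (7)-(9), using the per-dimension width loss as a proxy for the loss in volume):

```python
import numpy as np

def shrink(p, x, eps_min=0.1):
    """Shrink one dimension of conflicting prototype p so that point x
    falls outside its region; 'center' and 'width' are per-dimension arrays."""
    needed = np.abs(x - p['center'])     # width that would just exclude x
    loss = p['width'] - needed           # per-dimension loss (volume proxy)
    inside = loss > 0                    # dimensions along which x is inside
    if not np.any(inside):
        return                           # x already outside: nothing to do
    case1 = inside & (needed >= eps_min) # shrinkable without breaching eps_min
    if np.any(case1):
        k = int(np.where(case1, loss, np.inf).argmin())          # eq. (7)
    else:                                # cases 2-3: fall back to the widest
        k = int(np.where(inside, p['width'], -np.inf).argmax())  # eqs. (8)-(9)
    p['width'][k] = max(needed[k] - 1e-9, 0.0)  # may end below eps_min (case 3)
```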

The conflict-free associations learned by FAM-DDA are used, during testing, to recall a prediction when an unseen pattern is presented to ARTa.

3 Experiments

In this section, the classification performance of the proposed FAM-DDA network is assessed using two benchmark databases from the UCI machine learning repository [11]: Ionosphere and Hayes-Roth. The characteristics of the databases are summarized in Table 1.

Table 1 Characteristics of the Ionosphere and Hayes-Roth databases (#S - number of samples; #A - number of input attributes; #C - number of classes; #Tr - number of training samples; #Te - number of test samples).

#   Database     #S    #A   #C   #Tr   #Te
1   Ionosphere   351   34   2    281   70
2   Hayes-Roth   160   4    3    132   28

The experimental procedure is as follows. A training set and a test set were formed by randomly selecting data samples at a ratio of 80:20 (as shown in Table 1); the same procedure was employed in [12]. The training set was presented to the FAM-DDA network for learning. The training session was fixed at 10 epochs, with fast learning ($\beta = 1$), minimum width $\varepsilon_{\min} = 0.1$, and ARTa baseline vigilance $\rho_a = 0.0$. The performance of the learned network was evaluated by measuring the percentage of correct predictions made in response to the data samples in the test set. Five experimental runs were conducted, and the classification results were averaged.

Table 2 summarizes the average test accuracy of the FAM-DDA network for the Ionosphere and Hayes-Roth databases. It is observed that, without fine-tuning $\rho_a$, the FAM-DDA network achieves accuracy rates of 90.32% and 88.02%, respectively. To ascertain the stability of this performance, the upper and lower limits of the test accuracy rates were estimated at the 95% confidence level using the bootstrap method [13]. Table 3 compares the performance of FAM-DDA with that of other approaches (i.e., C4.5, C4.5 rules, ITI, LMDT, CN2, LVQ, OC1, Nevprop, K5, Q*, RBF, and SNNS). The results are quoted from [12], in which each approach used five different sets of training and test samples at a ratio of 80:20, and the average classification results of these standard networks and algorithms were obtained using default parameter settings. Based on the results in Table 3, the FAM-DDA model ranks fifth for Ionosphere and first for Hayes-Roth. Statistically, the accuracy rate achieved on Hayes-Roth is far better than those of the other methods. Thus, the overall performance of the FAM-DDA model is very satisfactory and encouraging.

Table 2 Classification results of FAM-DDA.

                      Test Accuracy (%)
Database      Lower Limit   Mean    Upper Limit
Ionosphere    87.71         90.32   93.43
Hayes-Roth    85.71         88.02   91.40
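For reference, the percentile-bootstrap limits in Table 2 can be reproduced in outline as follows (a generic sketch, not the authors' code):

```python
import numpy as np

def bootstrap_ci(correct, n_boot=1000, level=0.95, seed=0):
    """Percentile-bootstrap confidence limits for test accuracy [13].
    correct: 0/1 array with one entry per test sample."""
    rng = np.random.default_rng(seed)
    accs = [rng.choice(correct, size=len(correct), replace=True).mean()
            for _ in range(n_boot)]
    lo, hi = np.percentile(accs, [100 * (1 - level) / 2,
                                  100 * (1 + level) / 2])
    return 100 * lo, 100 * hi            # limits in percent, as in Table 2
```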

Table 3 Comparison of classification results on the Ionosphere and Hayes-Roth datasets between FAM-DDA and other methods from [12].

                        Accuracy (%)
              Ionosphere         Hayes-Roth
Method        Mean     S.D.      Mean     S.D.
C4.5          91.56    2.82      70.03    7.69
C4.5 rules    91.82    2.58      75.95    7.24
ITI           93.65    2.71      78.63    5.96
LMDT          86.89    3.51      78.94    8.69
CN2           90.98    3.29      75.21    7.20
LVQ           88.58    3.36      52.26    9.27
OC1           88.29    2.21      63.73    12.94
Nevprop       83.80    3.81      78.91    11.84
K5            85.91    4.14      55.21    10.85
Q*            89.70    4.70      74.84    8.69
RBF           87.60    6.45      70.03    8.62
SNNS          -        -         79.65    6.10
FAM-DDA       90.32    -         88.02    -

4 Summary

In this paper, a novel adaptive conflict-resolving classifier based on FAM and the DDA has been described. The proposed FAM-DDA network inherits the advantages of its predecessors, i.e., incremental and stable learning, together with an ability to deal with overlapping prototypes of different classes. The FAM-DDA network was applied to two benchmark databases from the UCI machine learning repository. The classification results show that the performance of the proposed conflict-resolving FAM-DDA network is encouraging and promising, and statistical analysis using bootstrapping was applied to ascertain the stability of this performance. Further work will focus on a more comprehensive comparison of the effectiveness of FAM-DDA against other machine learning methods. The interpretability of the knowledge embedded within the FAM-DDA network, via a suitable rule extraction procedure, will also be investigated.

Acknowledgements: The authors gratefully acknowledge the research grant provided by Multimedia University (PR/2004/0388) that has in part resulted in this article.

References:

[1] Grossberg, S., Adaptive Pattern Recognition and Universal Recoding II: Feedback, Expectation, Olfaction, and Illusions, Biological Cybernetics, Vol. 23, 1976, pp. 187-202.

[2] Carpenter, G., and Grossberg, S., A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine, Computer Vision, Graphics and Image Processing, Vol. 37, 1987, pp. 54-115.

[3] Carpenter, G., and Grossberg, S., The ART of Adaptive Pattern Recognition by a Self-Organizing Neural Network, Computer, Vol. 21, No. 3, 1988, pp. 77-88.

[4] Carpenter, G., Grossberg, S., and Reynolds, J., ARTMAP: Supervised Real-Time Learning and Classification of Nonstationary Data by a Self-Organizing Neural Network, Neural Networks, Vol. 4, No. 5, 1991, pp. 565-588.

[5] Carpenter, G., Grossberg, S., and Rosen, D., Fuzzy ART: Fast Stable Learning and Categorization of Analog Patterns by an Adaptive Resonance System, Neural Networks, Vol. 4, No. 6, 1991, pp. 759-771.

[6] Carpenter, G., Grossberg, S., Markuzon, N., Reynolds, J., and Rosen, D., Fuzzy ARTMAP: A Neural Network Architecture for Incremental Learning of Analog Multidimensional Maps, IEEE Transactions on Neural Networks, Vol. 3, No. 5, 1992, pp. 698-713.

[7] Carpenter, G., and Ross, W., ART-EMAP: A Neural Network Architecture for Object Recognition by Evidence Accumulation, IEEE Transactions on Neural Networks, Vol. 6, No. 4, 1995, pp. 805-818.

[8] Carpenter, G., Distributed Learning, Recognition, and Prediction by ART and ARTMAP Neural Networks, Neural Networks, Vol. 10, No. 8, 1997, pp. 1473-1494.

[9] Berthold, M.R., and Diamond, J., Boosting the Performance of RBF Networks with Dynamic Decay Adjustment, in Tesauro, G., Touretzky, D.S., and Leen, T.K. (eds.), Advances in Neural Information Processing Systems, Vol. 7, Cambridge, MA, MIT Press, 1995.

[10] Lim, C.P., Harrison, R.F., An Incremental Adaptive Network for On-Line Supervised Learning and Probability Estimation, Neural Networks, Vol. 10, 1997, pp. 925-939.

[11] Blake, C., Merz, C., UCI Repository of Machine Learning Databases, URL http://www.ics.uci.edu/~mlearn/MLRepository.html, 1998.

[12] Eklund, P.W. and Hoang A., A Performance Survey of Public Domain Supervised Machine Learning Algorithms, technical report, The University of Queensland, Australia, 2002.

[13] Efron, B., Bootstrap Methods: Another Look at the Jackknife, The Annals of Statistics, Vol. 7, 1979, pp. 1-26.