Lecture Notes in Computer Science
Fig. 1. Fuzzy ART (input (a, a^c); fields F_0, F_1, F_2; weight vectors D_j, U_j; vigilance parameter ρ)

Fig. 2. Fuzzy ARTMAP (FAM): the learning ART (ART_a), the supervising ART (ART_b), and the map field F^{ab} in MF, with map field weights W^{ab} and match tracking (MT)
until the input a changes. The above processes are iterated until ART finds an F_2 node with S_J ≥ ρ. If all F_2 nodes are reset in a learning period, a new node is added to F_2. Therefore, ρ can be regarded as the fineness of the classification.

Next, we explain the behavior of FAM. As shown in Fig. 2, FAM consists of a learning ART (ART_a), a supervising ART (ART_b) and a map field (MF). In the learning period, ART_a receives a sample a ∈ [0, 1]^{n_a}, which may contain noise, and ART_b receives the corresponding recognition code b ∈ [0, 1]^{n_b}. The vigilance parameter of ART_a (i.e., ρ_a) is set to the baseline value ρ_{a0} whenever ART_a receives a new sample. If the category of ART_a is designated by F^a_2 node J, ART_a provides F^{ab} in MF with W^{ab}_J ∈ [0, 1]^{m_b}, which is the same as the map field weight vector.
302 T. Kamio et al.
If the category of ART_b is designated by F^b_2 node K, ART_b provides F^{ab} with y^b ∈ [0, 1]^{m_b}, which satisfies y^b_K = 1 and y^b_k = 0 for k ≠ K. After receiving W^{ab}_J and y^b, F^{ab} checks the mapping from ART_a to ART_b by

x^{ab} ≡ y^b ∧ W^{ab}_J,  |x^{ab}| ≥ ρ_{ab} |y^b|,   (3)
where x^{ab} is the F^{ab} activity and ρ_{ab} ∈ [0, 1] is the vigilance parameter of MF. If Eq. (3) is true, MF judges that the mapping is correct. In the case of an erroneous mapping, MF executes the match tracking (MT). MT resets F^a_2 node J by increasing ρ_a as follows:

ρ_a = S^a_J + ε,   (4)

where S^a_J is the matching degree of F^a_2 node J and ε is an arbitrary small positive value. Therefore, ART_a selects another node or generates a new node after MT. The above processes are iterated until Eq. (3) is satisfied. This means that MT can correct an erroneous mapping. When the mapping is deemed correct, AL-SLMAP [2] updates the weight vectors as follows:

D^{a(new)}_J = (1 + c_J)^{-1} A + (1 − (1 + c_J)^{-1}) D^{a(old)}_J,
U^{a(new)}_J = β_a (A ∧ D^{a(new)}_J) + (1 − β_a) U^{a(old)}_J,   (5)

D^{b(new)}_K = β_b (B ∧ D^{b(old)}_K) + (1 − β_b) D^{b(old)}_K,
U^{b(new)}_K = β_b (B ∧ U^{b(old)}_K) + (1 − β_b) U^{b(old)}_K,   (6)

W^{ab(new)}_J = β_{ab} (y^b ∧ W^{ab(old)}_J) + (1 − β_{ab}) W^{ab(old)}_J,   (7)

where c_J is the count of selections of F^a_2 node J, and β_a, β_b and β_{ab} ∈ (0, 1] are learning rates. Note that β_b must be set to 1 whenever F^b_2 node K learns for the first time. All the components of the weight vectors are set to 1 before the first update. Eqs. (5), (6), and (7) are based on AL [2], FCSR [1] and SLMAP [3], respectively. The reason why FCSR updates D^b_K and U^b_K is that FCSR can optimally learn recognition codes, which are noiseless data.

After the supervised learning is finished, ART_a can be used as a recognition system. This is because W^{ab}_j gives a mapping from ART_a to ART_b. However, if W^{ab}_j has more than one component which satisfies W^{ab}_{jk} ≥ ρ_{ab}, F^a_2 node j must be deleted before using ART_a as a recognition system.

2.2 Characteristics

First, we discuss the characteristics of the weight vectors created by AL-SLMAP (i.e., Eqs. (5)-(7)). D^a_j averages the samples classified into F^a_2 node j. U^a_j approximates the distribution of the samples; it is expressed by a hyper-rectangle in the input space. D^b_k and U^b_k immediately correspond to the k-th kind of recognition code b_k. Recognition codes b_k denote noiseless data. Although the initialized W^{ab}_j relates F^a_2 node j to all the F^b_2 nodes, after learning W^{ab}_j relates F^a_2 node j to only one F^b_2 node.
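As a concrete illustration, the AL-SLMAP updates of Eqs. (5)-(7) can be sketched in Python. This is a minimal sketch, not the authors' implementation: the fuzzy AND ∧ is taken as the component-wise minimum, vectors are plain lists, and the function names (`al_update`, `fcsr_update`, `slmap_update`) are ours.

```python
# Minimal sketch of the AL-SLMAP weight updates, Eqs. (5)-(7).
# Fuzzy AND (∧) is the component-wise minimum.

def fuzzy_and(x, y):
    return [min(xi, yi) for xi, yi in zip(x, y)]

def al_update(A, D_old, U_old, c_J, beta_a):
    """Eq. (5): average learning of D^a_J, slow learning of U^a_J."""
    w = 1.0 / (1.0 + c_J)                 # (1 + c_J)^{-1}
    D_new = [w * a + (1.0 - w) * d for a, d in zip(A, D_old)]
    AandD = fuzzy_and(A, D_new)
    U_new = [beta_a * m + (1.0 - beta_a) * u for m, u in zip(AandD, U_old)]
    return D_new, U_new

def fcsr_update(B, D_old, U_old, beta_b):
    """Eq. (6): FCSR update of D^b_K and U^b_K toward the noiseless code B."""
    D_new = [beta_b * m + (1.0 - beta_b) * d
             for m, d in zip(fuzzy_and(B, D_old), D_old)]
    U_new = [beta_b * m + (1.0 - beta_b) * u
             for m, u in zip(fuzzy_and(B, U_old), U_old)]
    return D_new, U_new

def slmap_update(y_b, W_old, beta_ab):
    """Eq. (7): slow map-field learning of W^{ab}_J."""
    return [beta_ab * m + (1.0 - beta_ab) * w
            for m, w in zip(fuzzy_and(y_b, W_old), W_old)]
```

For example, with c_J = 0 the AL rule simply copies the sample A into D^a_J, and with β_b = 1 (the first learning of F^b_2 node K, weights initialized to all ones) the FCSR rule sets D^b_K = B.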
Fuzzy ARTMAP with Explicit and Implicit Weights 303
Next, let us consider the result of Ref. [2]. Ref. [2] shows that AL-SLMAP can inhibit the category proliferation for the character recognition problem, which consists of noisy samples and noiseless recognition codes. This result must be obtained by satisfying the following conditions:

(a) Since Eq. (7) makes W^{ab}_j learn slowly, the occurrence of MT is reduced. As a result, the generation of F^a_2 nodes is suppressed.
(b) When W^{ab}_j has just related F^a_2 node j to only F^b_2 node k, most of the samples learned by F^a_2 node j correspond to b_k.
(c) Since Eq. (5) averages samples, the influence of noise is eliminated from D^a_j and U^a_j.
(d) Since D^a_j and U^a_j become a good category, unnecessary MT can hardly occur.

However, we have found two disappointing facts about AL-SLMAP. One is the fact that AL-SLMAP cannot inhibit the category proliferation for the character recognition problem in a highly noisy environment [4]. The other is the fact that AL-SLMAP is less suitable for the region classification problem than FCSR. From these facts, we have noticed three problems. The first problem is as follows: when F^a_2 node j learns a certain set of samples corresponding to the identical recognition code, the presentation order of the samples influences U^a_j. The second problem is that the elimination of noise from D^a_j and U^a_j is incomplete, because condition (b) is not always satisfied. The third problem is a potential drawback of MT [4]. To solve these problems, we change the choice strength for ART_a and modify the weight update and the match tracking using implicit weights.

3 FAM with Explicit and Implicit Weights

3.1 Choice Strength for ART_a

As the distance between a sample and a category becomes smaller and the size of the category grows larger, the choice strength given by Eq. (1) becomes larger. However, we think that Eq. (1) is unsuitable as the choice strength for ART_a. To verify our opinion, we made F^a_2 node j learn a certain set of samples corresponding to the identical recognition code. In each experiment, the samples were given in a different order. The simulation results show the following: D^a_j was always the same vector, and although U^a_j varied with the presentation order, the variation of |U^a_j| was small. From these results and the characteristics of Eq. (1) mentioned above, we have judged that U^a_j should not be used as a vector, and that the distance between a sample and a category should be estimated by not A ∧ U^a_j but A ∧ D^a_j. In this case, the bottom-up weight should be defined by not the vector U^a_j but the scalar |U^a_j|, because only |U^a_j| is needed. Therefore, we change the definition of the choice strength for ART_a as follows:

T^a_j = |A ∧ D^a_j| / (α + |U^a_j|),  (j = 1, · · · , m_a).   (8)
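The modified choice strength above can be sketched in Python as follows. This is a minimal illustration under our own naming (`choice_strength`, `choose_category`); |·| is the L1 norm and ∧ the component-wise minimum, as in the paper.

```python
# Sketch of the modified choice strength for ART_a:
#   T^a_j = |A ∧ D^a_j| / (alpha + |U^a_j|)

def choice_strength(A, D_j, U_j_size, alpha):
    """Distance is measured by A ∧ D^a_j; the bottom-up weight is the
    scalar |U^a_j| (U_j_size), not the vector U^a_j."""
    AandD = [min(a, d) for a, d in zip(A, D_j)]
    return sum(AandD) / (alpha + U_j_size)

def choose_category(A, Ds, U_sizes, alpha):
    """Return the index j of the F^a_2 node maximizing T^a_j."""
    T = [choice_strength(A, D, s, alpha) for D, s in zip(Ds, U_sizes)]
    return max(range(len(T)), key=T.__getitem__)
```

For instance, a node whose D^a_j overlaps the sample more (larger |A ∧ D^a_j|) wins over a node with the same |U^a_j| but less overlap.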
Fig. 3. F^a_2 node j of our proposed FAM

3.2 Explicit and Implicit Weights

When W^{ab}_j has just related F^a_2 node j only to F^b_2 node k, D^a_j and U^a_j should be created by the samples corresponding only to b_k. To achieve this, we propose FAM with explicit and implicit weights. As shown in Fig. 3, F^a_2 node j has a weight set (d^a_{jk}, u^a_{jk}) for each F^b_2 node k. D^a_j and U^a_j are calculated by

D^a_j = Σ_{k=1}^{m_b} g_{jk} n_{jk} d^a_{jk} / Σ_{k=1}^{m_b} (g_{jk} n_{jk}),   (9)

U^a_j = Σ_{k=1}^{m_b} g_{jk} n_{jk} u^a_{jk} / Σ_{k=1}^{m_b} (g_{jk} n_{jk}),   (10)

where n_{jk} is the count of updates of (d^a_{jk}, u^a_{jk}). All the components of d^a_{jk} are set to 1 and |u^a_{jk}| is set to 2n_a before the first update. Also, g_{jk} is given by

g_{jk} = { 1, if W^{ab}_{jk} ≥ ρ_{ab};  0, otherwise }.   (11)

Eqs. (9)-(11) show that (D^a_j, U^a_j) is defined only by the weight sets (d^a_{jk}, u^a_{jk}) with g_{jk} = 1. Therefore, we call a weight set (d^a_{jk}, u^a_{jk}) with g_{jk} = 1 an explicit weight and a weight set (d^a_{jk}, u^a_{jk}) with g_{jk} = 0 an implicit weight. If MF judges that the mapping from F^a_2 node j to F^b_2 node k is correct, (d^a_{jk}, u^a_{jk}) is updated by

d^{a(new)}_{jk} = n^{-1}_{jk} A + (1 − n^{-1}_{jk}) d^{a(old)}_{jk},   (12)

u^{a(new)}_{jk} = n^{-1}_{jk} (d^{a(new)}_{jk} ∧ A) + (1 − n^{-1}_{jk}) u^{a(old)}_{jk}.   (13)

Eqs. (12) and (13) mean that (d^a_{jk}, u^a_{jk}) is updated by the samples corresponding only to b_k. That is to say, when W^{ab}_j has just related F^a_2 node j only to F^b_2 node k, D^a_j and U^a_j are created by the samples corresponding only to b_k. When ART_a is used as a recognition system, F^a_2 node j must be deleted if W^{ab}_j has more than one component which satisfies W^{ab}_{jk} ≥ ρ_{ab}. Furthermore, all the implicit weights should be deleted from the viewpoint of resource costs.

3.3
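The explicit/implicit weight machinery of Eqs. (9)-(13) can be sketched as below. This is our own minimal sketch, not the authors' code; the data layout (per-node lists of d, u, n and a 0/1 gate list) and the assumption that n_{jk} counts the current update inclusively (so the first update copies A into d^a_{jk}) are ours.

```python
# Sketch of the explicit/implicit weights, Eqs. (9)-(13).
# Each F^a_2 node j keeps one weight set (d_jk, u_jk, n_jk) per F^b_2 node k.

def gate(W_j, rho_ab):
    """Eq. (11): g_jk = 1 iff W^{ab}_{jk} >= rho_ab (explicit weight)."""
    return [1 if w >= rho_ab else 0 for w in W_j]

def aggregate(ds, us, ns, g):
    """Eqs. (9)-(10): D^a_j, U^a_j as n-weighted averages of the
    explicit weight sets only."""
    total = sum(gk * nk for gk, nk in zip(g, ns))
    dim = len(ds[0])
    D = [sum(g[k] * ns[k] * ds[k][i] for k in range(len(ds))) / total
         for i in range(dim)]
    U = [sum(g[k] * ns[k] * us[k][i] for k in range(len(us))) / total
         for i in range(dim)]
    return D, U

def update_set(A, d_old, u_old, n_old):
    """Eqs. (12)-(13): update (d_jk, u_jk) with sample A.
    Assumes n is incremented first, so the first update gives d_new = A."""
    n = n_old + 1
    w = 1.0 / n
    d_new = [w * a + (1.0 - w) * d for a, d in zip(A, d_old)]
    dA = [min(dn, a) for dn, a in zip(d_new, A)]      # d_new ∧ A
    u_new = [w * m + (1.0 - w) * u for m, u in zip(dA, u_old)]
    return d_new, u_new, n
```

Implicit weight sets (g_{jk} = 0) are still stored and updated, but they do not contribute to (D^a_j, U^a_j) until their map field component crosses ρ_{ab}.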
Match Tracking

After the original MT resets the activated F^a_2 node by increasing ρ_a, the erroneous mapping from ART_a to ART_b is corrected. As a result, even if there are F^a_2 nodes besides the reset ones, a new node may be forcibly generated. However, there is the possibility that the erroneous mapping is corrected by the restricted MT, which resets activated F^a_2 nodes without increasing ρ_a. These facts illustrate that the original MT may needlessly generate F^a_2 nodes. To solve this problem, we propose the modified MT using implicit weights. From now on, we call the original MT "MT_up" and the restricted MT "MT_fix".

The modified MT is the combination of MT_fix and the forcible node generation. The former is used to inhibit the increment of F^a_2 nodes. The latter is used to correct erroneous mappings which cannot be solved by MT_fix. MT_fix is the same as MT_up except that it uses ρ_a = ρ_{a0} instead of Eq. (4). The forcible node generation is executed as follows. It is assumed that F^a_2 node j and F^b_2 node k are activated when the first erroneous mapping happens for the t-th input set (a, b). At this point, the implicit weight (d^a_{jk}, u^a_{jk}) is updated by Eqs. (12) and (13), and this update increases z_{jk} by 1. That is to say, z_{jk} counts the updates of the weight set (d^a_{jk}, u^a_{jk}) after it becomes an implicit weight. Next, the occurrence rate of erroneous mappings L(t) is calculated by

L(t) = { r / P_R, if t ≥ P_R;  1, otherwise },   (14)

where r is the number of input sets which give rise to erroneous mappings in the period [t − P_R + 1, t], and L(t) is evaluated every P_L input sets. If L(t) − L(t − P_L) > 0, the modified MT judges that there are erroneous mappings which cannot be solved by MT_fix. In this case, the following condition is checked for the F^a_2 nodes with only one explicit weight:

Z_j / τ ≥ χ,   (15)

where Z_j is the maximal z_{jk} of the implicit weights in F^a_2 node j, τ is the summation of z_{jk} of all the implicit weights in the F^a_2 layer, and χ ∈ (0, 1] is the standard for the forcible node generation. If F^a_2 node J has the implicit weight (d^a_{JK}, u^a_{JK}) satisfying Eq. (15), then a new F^a_2 node J′ is generated as follows:

(d^a_{J′k}, u^a_{J′k}, n_{J′k}, W^{ab}_{J′k}, z_{J′k}) = { (d^a_{JK}, u^a_{JK}, 1, 1, 0), if k = K;  (1, 2n_a, 0, 1, 0), otherwise }.   (16)

However, if the same F^a_2 node J has satisfied Eq. (15) at the last check of Eq. (15), the node J is modified instead of generating a new F^a_2 node:

(d^a_{Jk}, u^a_{Jk}, n_{Jk}, W^{ab}_{Jk}, z_{Jk}) = { (d^a_{Jk}, u^a_{Jk}, 1, 1, 0), if g_{Jk} = 1;  (1, 2n_a, 0, 1, 0), otherwise }.   (17)

This is because such an F^a_2 node J may provide the categories around it with bad influences. After completing these processes, the modified MT executes MT_fix.
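The monitoring part of the modified MT, Eqs. (14)-(15), can be sketched as follows. This is a minimal sketch under assumptions of ours: `errors` is a 0/1 history of erroneous mappings per input set, `z[j][k]` holds the implicit-update counts, and `explicit[j][k]` is the gate g_{jk}; the function names are not from the paper.

```python
# Sketch of the modified match-tracking bookkeeping, Eqs. (14)-(15).

def error_rate(errors, t, P_R):
    """Eq. (14): L(t) = r / P_R over the last P_R input sets, else 1."""
    if t < P_R:
        return 1.0
    r = sum(errors[t - P_R:t])       # erroneous mappings in the window
    return r / P_R

def forcible_generation_candidates(z, explicit, chi):
    """Eq. (15): for nodes j with exactly one explicit weight, test
    Z_j / tau >= chi, where Z_j is the maximal implicit z_jk of node j
    and tau sums z_jk over all implicit weights in the F^a_2 layer."""
    tau = sum(z[j][k]
              for j in range(len(z)) for k in range(len(z[j]))
              if not explicit[j][k])
    out = []
    for j in range(len(z)):
        if sum(explicit[j]) != 1:    # only nodes with one explicit weight
            continue
        Z_j = max((z[j][k] for k in range(len(z[j])) if not explicit[j][k]),
                  default=0)
        if tau > 0 and Z_j / tau >= chi:
            out.append(j)
    return out
```

A node returned by `forcible_generation_candidates` would then trigger Eq. (16) (or Eq. (17) on a repeated hit) before MT_fix runs.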
If L(t) − L(t − P_L) ≤ 0, the modified MT executes only MT_fix.

4 Simulation Results

Simulations have been carried out to demonstrate the effectiveness of our proposed method (PM). For the alphabet character recognition problem, PM is compared with FCSR [1] and AL-SLMAP [2]. The main difference between FCSR and AL-SLMAP is the weight update method for D^a_J, U^a_J, and W^{ab}_J. In the case of FCSR, D^a_J and U^a_J are updated by Eq. (6); however, (A, D^a_J, U^a_J, β_a) must be given to Eq. (6) instead of (B, D^b_K, U^b_K, β_b). Also, W^{ab}_J is updated by Eq. (7) with β_{ab} = 1.

Fig. 4. Alphabet characters

The original patterns of the alphabet characters are shown in Fig. 4. Each pattern is illustrated by a (7 × 7)-pixel image. The pixel values are set to 0 for white pixels and 1 for black ones. In the learning period, ART_a receives noisy patterns (i.e., sample data) a ∈ {0, 1}^{7×7} and ART_b receives the corresponding recognition codes b ∈ {0, 1}^{26}. A noisy pattern a is constructed by inverting some pixels in a randomly selected original pattern. The number of inverted pixels depends on the Hamming distance (HD). In a recognition code b, one element is set to 1 and the others are set to 0. For instance, the code b corresponding to the character "A" is [1, 0, · · · , 0]. The quantity of learning data N_L is 20000. In the test period, ART_a receives noisy patterns (i.e., test data) a. The quantity of test data N_T is 50000. We estimate each learning method by the learning time T_L (sec.), the number of generated F^a_2 nodes m_a, and the recognition rate for test data R_T.

The parameters of each learning method are as follows. In the case of FCSR, α_a = 0.1, β_a = 0.2, ρ_{a0} = 0.5, α_b = 1, β_b = 1, ρ_b = 1, β_{ab} = 1, and ρ_{ab} = 1. In the case of AL-SLMAP, α_a = 0.1, β_a = 0.2, ρ_{a0} = 0.5, α_b = 1, β_b = 1, ρ_b = 1, β_{ab} = 0.02, and ρ_{ab} = 0.75. They are the same as in Ref. [2]. In the
Fig. 5. Simulation results for the alphabet character recognition problem: (a) the learning time; (b) the number of F^a_2 nodes; (c) the recognition rate for test data
case of PM, α_a = 0.1, ρ_{a0} = 0.5, α_b = 1, β_b = 1, ρ_b = 1, β_{ab} = 0.02, ρ_{ab} = 0.75, P_R = 1000, P_L = 800, and χ = 0.08.

Fig. 5 illustrates T_L, m_a, and R_T. Figs. 5(a) and 5(b) show that T_L of PM is the largest of the three methods and that each method has the same m_a when HD = 0. This is a result predicted before executing the simulations, because PM needs the highest calculation cost per F^a_2 node. When HD becomes large, PM keeps m_a constant, but m_a of the other learning methods increases rapidly. As a result, PM can finish the learning much faster than the others when HD is large. Moreover, Figs. 5(b) and 5(c) show that PM has better m_a and R_T. As HD becomes larger, this tendency becomes stronger. Therefore, we have concluded that PM can inhibit the category proliferation and keep high recognition performance in a highly noisy environment.

5 Conclusions

AL-SLMAP is one of the useful learning methods for fuzzy ARTMAP (FAM) from the viewpoint of simplicity and performance. However, AL-SLMAP has problems in category selection, weight update, and match tracking. To solve these problems, we have proposed FAM with explicit and implicit weights. Simulation results have shown that our proposed method (PM) is better than FCSR and AL-SLMAP in terms of category proliferation and recognition performance when the sample data contains a large amount of noise. In the future, we will try to reduce the learning calculation cost of our FAM. Moreover, we have to compare PM with other learning methods and apply PM to more practical problems.

References

1. Carpenter, G.A., Grossberg, S., Markuzon, N., Reynolds, J.H., Rosen, D.B.: Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Trans. Neural Networks 3(5), 698–713 (1992)
2. Lee, J.S., Yoon, C.G., Lee, C.W.: Improvement of recognition performance for the fuzzy ARTMAP using average learning and slow learning. IEICE Trans. Fundamentals E81-A(3), 514–516 (1998)
3. Carpenter, G.A., Grossberg, S., Reynolds, J.H.: A fuzzy ARTMAP nonparametric probability estimator for nonstationary pattern recognition problems. IEEE Trans. Neural Networks 6(6), 1330–1336 (1995)
4. Kamio, T., Nomura, K., Mori, K., Fujisaka, H., Haeiwa, K.: Improvement of fuzzy ARTMAP by controlling match tracking. In: Proc. International Symposium on Nonlinear Theory and its Applications, pp. 791–794 (2006)