Lecture Notes in Computer Science
Fig. 2. The block diagram of the error diffusion algorithm
Fig. 3. The Floyd-Steinberg kernel
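The error-diffusion scheme of Fig. 2, with the Floyd-Steinberg kernel of Fig. 3, can be sketched as follows (an illustrative implementation, not the authors' code; boundary handling is simplified and the function name is ours):

```python
import numpy as np

def floyd_steinberg_halftone(gray, levels=2):
    """Error diffusion: quantize each pixel in raster order and push the
    residual error onto the not-yet-visited neighbors with the
    Floyd-Steinberg weights 7/16, 3/16, 5/16, 1/16."""
    img = gray.astype(float).copy()
    h, w = img.shape
    step = 255.0 / (levels - 1)
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = float(np.clip(round(old / step) * step, 0, 255))
            img[y, x] = new
            err = old - new
            # kernel: 7/16 right, 3/16 down-left, 5/16 down, 1/16 down-right
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h and x > 0:
                img[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:
                img[y + 1, x] += err * 5 / 16
            if y + 1 < h and x + 1 < w:
                img[y + 1, x + 1] += err * 1 / 16
    return img

# A flat mid-gray patch halftones to a dot pattern of the two output levels
# whose local density approximates the input gray-level.
patch = np.full((32, 32), 128, dtype=np.uint8)
dots = floyd_steinberg_halftone(patch)
print(sorted(set(dots.flatten().tolist())))  # only the two output levels appear
```

This is the sense in which the halftone image is "a dot pattern which is visually similar to the original image": the average intensity is preserved while each pixel is binary.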
Next, in order to estimate the performance of our method for a standard image, we use the mean square error

σ² = (1 / NQ²) Σ_{x,y=1}^{L} ( ξ_{x,y} − ẑ_{x,y} )²,   (10)

where ξ_{x,y} and ẑ_{x,y} are the pixel values of the original gray-level and reconstructed images. On the other hand, if we estimate the performance for gray-level images generated by the true prior P({ξ}), we evaluate the mean square error averaged over the true prior,

σ² = Σ_{{ξ}} P({ξ}) (1 / NQ²) Σ_{x,y=1}^{L} ( ξ_{x,y} − ẑ_{x,y} )².   (11)

This value becomes zero if every pixel value of each reconstructed image is exactly the same as that of the corresponding original image. As we have said above, because the edge structures embedded in the halftone image are considered to influence the performance of inverse-halftoning, we estimate the edge length appearing both in the halftone and reconstructed images as
668 Y. Saika
L^{GH}_{edge} = Σ_{{ξ}} P({ξ}) [ Σ_{n.n.} ( 1 − δ_{τ_{x,y}, τ_{x',y'}} ) ( 1 − δ_{z_{x,y}, z_{x',y'}} ) ],   (12)

which is averaged over all the snapshots of the Q-Ising model; here Σ_{n.n.} denotes the sum over nearest-neighbor pairs (x, y), (x', y'). We also estimate the gradient of the gray-level on the edges appearing both in the halftone and reconstructed images as

|δz|^{GH} = Σ_{{ξ}} P({ξ}) [ Σ_{n.n.} ( 1 − δ_{τ_{x,y}, τ_{x',y'}} ) | z_{x,y} − z_{x',y'} | ],   (13)

which is averaged over the snapshots of the Q-Ising model. Next, in order to clarify how the edge structures of the original image are observed in the reconstructed image, we estimate the edge length on the edges in the original, halftone and reconstructed images:

L^{GHO}_{edge} = Σ_{{ξ}} P({ξ}) [ Σ_{n.n.} ( 1 − δ_{ξ_{x,y}, ξ_{x',y'}} ) ( 1 − δ_{τ_{x,y}, τ_{x',y'}} ) ( 1 − δ_{z_{x,y}, z_{x',y'}} ) ],   (14)

which is averaged over the original images. Also, we estimate the gradient of the gray-level on the edges in the original, halftone and reconstructed images:

|δz|^{GHO} = Σ_{{ξ}} P({ξ}) [ Σ_{n.n.} ( 1 − δ_{ξ_{x,y}, ξ_{x',y'}} ) ( 1 − δ_{τ_{x,y}, τ_{x',y'}} ) | z_{x,y} − z_{x',y'} | ],   (15)

which is averaged over the original images.

3 Performance

In order to estimate the performance of the MPM estimate, we carry out the Monte Carlo simulation both for the set of gray-level images generated by the Boltzmann factor of the Q-Ising model and for the gray-level standard image.

First, we estimate the performance of the MPM estimate for the set of the snapshots of the Q=4 Ising model, as shown in Fig. 1(a). The halftone image shown in Fig. 1(d) is then obtained by the error diffusion method using the Floyd-Steinberg kernel. When we estimate the performance of the MPM estimate, we use the mean square error averaged over the 10 samples of the snapshots of the Q=4 Ising model with h_s = 1 and T_s = 1. We then investigate static properties of the MPM estimate based on the mean square error and the edge structures observed in the original, halftone and reconstructed images.

First we confirm the static properties of the MPM estimate when the posterior probability has the same form as the likelihood, at J=0, or as the model prior, in the limit J→∞. At J=0, as the likelihood is assumed to enhance the halftone image, the MPM estimate reconstructs a gray-level image which is almost the same as the halftone image. Therefore the reconstructed image at J=0 has an edge length longer than that of the original image, as the halftone image is expressed by a dot pattern which is visually similar to the original image. For instance, the edge lengths averaged over the set of Q=4 Ising model snapshots and over the halftone images obtained by error diffusion using the Floyd-Steinberg kernel are 6021.3 and 12049.8, respectively. On the other hand, in the limit J→∞, as the model prior is assumed to enhance smooth structures, the MPM estimate reconstructs a flat pattern.

Inverse-Halftoning for Error Diffusion 669

Fig. 4 (caption fragment): ... image which is obtained from the set of the snapshots of the Q=4 Ising model by the error diffusion method using the Floyd-Steinberg kernel, when h_s=1, T_s=1, h=1, T=0.1

Fig. 5 (caption fragment): ... using the Q-Ising model for the set of the snapshots of the Q=4 Ising model, when h_s=1, T_s=1, h=1, T=1
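The mean square error of Eq. (10) and the coincident-edge count of Eq. (12) can be computed per image pair as in the sketch below (an illustrative reading with our own variable names; the 4-nearest-neighbor convention and integer pixel values are assumptions):

```python
import numpy as np

def mean_square_error(orig, recon, Q):
    """Eq. (10)-style measure: per-pixel squared difference between the
    original and reconstructed images, normalized by N*Q^2, where N is
    the number of pixels and Q the number of gray levels."""
    orig = np.asarray(orig, dtype=float)
    recon = np.asarray(recon, dtype=float)
    return ((orig - recon) ** 2).sum() / (orig.size * Q ** 2)

def edge_length(a, b):
    """Eq. (12)-style edge length: count nearest-neighbor pixel pairs
    that are edges (unequal values) in BOTH images simultaneously."""
    a = np.asarray(a)
    b = np.asarray(b)
    horiz = (a[:, 1:] != a[:, :-1]) & (b[:, 1:] != b[:, :-1])
    vert = (a[1:, :] != a[:-1, :]) & (b[1:, :] != b[:-1, :])
    return int(horiz.sum() + vert.sum())

orig = np.array([[0, 0, 3], [0, 0, 3], [3, 3, 3]])
recon = np.array([[0, 0, 3], [0, 1, 3], [3, 3, 3]])
print(mean_square_error(orig, recon, Q=4))  # 1/(9*16) ~ 0.00694
print(edge_length(orig, recon))
```

A longer edge length for the halftone pair than for the original, as quoted above (12049.8 vs. 6021.3), reflects the extra dot-pattern boundaries introduced by error diffusion.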
Fig. 6. The mean square error as a function of J obtained by the MPM estimate for error diffusion for the Q=256 standard image "girl" when h=1, T=1
Fig. 7 (caption fragment): ... obtained by the MPM estimate for the Q=256 standard image "girl" when h=1, T=1

Then we investigate the performance of the MPM estimate when the posterior probability is composed of both the model prior and the likelihood. In Fig. 4 we show the mean square error as a function of J for the snapshots of the Q=4 Ising model. The figure indicates that the mean square error takes its minimum at J=0.875. This means that the MPM estimate reconstructs the gray-level image by suppressing the gradient of the gray-level on the edges embedded in the halftone image through the Boltzmann factor of the Q-Ising model. The figure also indicates that the mean square error decreases rapidly as J increases from 0.025 to 0.3. The origin of this rapid drop can be explained by the edge structures of the reconstructed image, namely the edge length and the gradient of the gray-level, L^{GH}_{edge}, L^{GHO}_{edge}, |δz|^{GH} and |δz|^{GHO}, defined in Eqs. (12)-(15).

Now we evaluate these edge structures of the reconstructed images. Figure 5 shows how L^{GH}, L^{GHO}, |δz|^{GH} and |δz|^{GHO} depend on the parameter J for the MPM estimate on the set of Q=4 Ising model snapshots. Figure 5 indicates that L^{GH} is steady for 0<J<0.6, then gradually decreases as J increases beyond 0.6, and becomes zero for J>1.175. This means that the edge structures of the halftone image survive for J<0.6 and are removed by the model prior for J>0.6. On the other hand, |δz|^{GH} decreases as J increases from 0.025 to 0.3, and then becomes almost the same as the edge length L^{GH} at J=0.3. Then, as shown in Fig. 5, although L^{GH} is steady for 0<J<0.6, it becomes shorter as J increases beyond 0.6 and takes L^{GH}=4096.3 at J=J_opt=0.875. As we increase J further, L^{GH} finally becomes zero at J=1.075. These results indicate that |δz|^{GH} is rapidly suppressed for J<0.6 by the model prior, which enhances smooth structures, and that the edges embedded in the halftone image are removed by the model prior for J>0.6. Furthermore, the optimal performance is achieved by suppressing the gradient of the gray-level and by removing a part of the edges embedded in the halftone image, if we set the parameters appropriately.

We also numerically estimate the dynamical properties of the MPM estimate using the Monte Carlo simulation for the Q=4 Ising model on the square lattice. The simulation clarifies that the mean square error smoothly converges to the same value within the statistical uncertainty, but that the convergent value depends on the initial condition if we set J>J_opt.

Next, we estimate the performance of the MPM estimate for the 256-level standard image "girl" with 100×100 pixels, shown in Fig. 1(d). The halftone image in Fig. 1(e) is obtained from the original image by the error diffusion method using the Floyd-Steinberg kernel. As shown in Figs. 6 and 7, we estimate the performance in terms of the mean square error, the edge length and the gradient of the gray-level. Figure 6 indicates that the MPM estimate shows optimal performance at J=1.45. Then, as shown in Fig. 7, the Monte Carlo simulation clarifies that the edge length in the reconstructed image is longer than that of the halftone image, and further that the gray-level difference on the edges is larger there as well. These results indicate that the fine structures introduced into the halftone image remain in the reconstructed image, although the gradient of the gray-level is sharply suppressed as J increases in 0<J<1.0. As shown above, these results exhibit behavior similar to that of the Q=4 model.
4 Summary and Discussion

In the previous sections, we formulated the problem of inverse-halftoning for error diffusion using the Floyd-Steinberg kernel on the basis of the MPM estimate. Then, using the Monte Carlo simulation for the snapshots of the Q-Ising model and for the standard image, we estimated the performance of our method in terms of the mean square error and the edge structures of the reconstructed image. We clarified that the optimal performance of the MPM estimate using the above model prior and likelihood is available for the problem of inverse-halftoning, if we tune the parameters. This also means that the assumed model prior is capable of suppressing the fine structures introduced into the halftone image and of reconstructing the original gray-level image.
Acknowledgment

The author was financially supported by a Grant-in-Aid for Scientific Research on Priority Areas "Deepening and Expansion of Statistical Mechanical Informatics (DEX-SMI)" of the Ministry of Education, Culture, Sports, Science and Technology (MEXT), No. 18079001.
Chaotic Motif Sampler for Motif Discovery Using Statistical Values of Spike Time-Series

Takafumi Matsuura and Tohru Ikeguchi

Graduate School of Science and Engineering, Saitama University, 225 Shimo-Ohkubo, Sakura-ku, Saitama-city 338-8570, Japan
takafumi@nls.ics.saitama-u.ac.jp

Abstract. One of the most important issues in bioinformatics is to discover a common and conserved pattern, which is called a motif, from biological sequences. We have already proposed a motif extraction method called the Chaotic Motif Sampler (CMS), which uses chaotic dynamics. Exploring a searching space while avoiding undesirable local minima, the CMS discovers motifs very effectively. During the searching process, the chaotic neurons generate very complicated spike time-series. In the present paper, we analyze the complexity of the spike time-series observed from each chaotic neuron using statistical measures, namely the coefficient of variation and the local variation of interspike intervals, which are frequently used in the field of neuroscience. As a result, if a motif is embedded in a sequence, the corresponding spike time-series shows characteristic behavior. Using these characteristics, multiple motifs can be identified.
1 Introduction

The Human Genome Project was completed in April 2003. The project generated a large amount of genomic sequence data. The human genome consists of approximately three billion base pairs and approximately 25,000 genes. Thus, one of the primary issues in the field of bioinformatics is how to identify biologically important parts. It is generally considered that the biologically important parts repeatedly appear in biological sequences, for example, DNA, RNA, and protein sequences. Therefore, one of the most popular analyses involves extracting a common and conserved pattern, which is often called a motif.

The problem of extracting motifs (the motif extraction problem, MEP) from biological sequences can be mathematically described as follows: We have a data set S = {s_1, s_2, ..., s_N}, where s_i is the ith biological sequence (Fig. 1). The ith sequence consists of m_i (i = 1, 2, ..., N) elements. In the case of DNA or RNA, the elements correspond to the four bases, while in the case of protein sequences they correspond to the 20 amino acids. The most basic case usually includes exactly one motif of length L in each sequence. However, real-life problems have several variations; the number of motifs embedded in each sequence can be more than one or random (including zero).

To extract motifs from a biological sequence, one of the simplest and most natural methods is a brute-force method, or an enumeration method, which enumerates

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 673-682, 2008. © Springer-Verlag Berlin Heidelberg 2008
674 T. Matsuura and T. Ikeguchi

s_1: ATGG CTGAGTA AG ...... ACTGAGTCAGTT
s_2: CTATCCAGATT ...... ATAGCGCGC CTGAGTA CTA
...
s_{N-1}: AGATCAGGA CTGAGTA GCCTATTTG ...... AGTT
s_N: ACTACAACAATACA CTGAGTA TGC ...... AGATCTC

Fig. 1. An example data set of DNA sequences. A, C, G, and T denote Adenine, Cytosine, Guanine, and Thymine, respectively. The boldface alignments indicate a motif. In this example, the motif "CTGAGTA" is embedded in each sequence.

all possible patterns. However, when using the brute-force method for DNA or RNA sequences, there are 4^L possible patterns that have to be considered. For a protein sequence, this number becomes 20^L, because a protein sequence has 20 amino acids. If L becomes large, the number of motif patterns diverges exponentially, so it is almost impossible to explore all the possible motif patterns in a reasonable time. In fact, it has been proved that the MEP belongs to the class of NP-hard problems [2]. These facts indicate that it is indispensable to develop effective approximation algorithms for extracting motifs, and many such algorithms have been proposed [9].

To search for near-optimal solutions of combinatorial optimization problems, many heuristic algorithms have been proposed [3, 4]. It has also been shown that an algorithm which introduces chaotic neurodynamics for exploring solution spaces is very effective [6, 7, 8, 10, 11]. The central idea of involving chaotic neurodynamics is that the chaotic dynamics drastically reduces the volume of the searching space. Unlike a random search, the volume of the state space explored by a chaotic dynamics has Lebesgue measure zero, because the state space generally consists of a chaotic attractor. To realize an effective chaotic search, an algorithm for controlling a local search by the chaotic dynamics has been proposed. In this algorithm, the execution of a local search, such as 2-opt moves for solving traveling salesman problems (TSPs) [5, 7] and position movements for MEPs [10, 11], is driven by the chaotic neurodynamics.
It has been shown that the chaotic neurodynamics finds good near-optimal solutions for TSPs [5, 7], MEPs [10, 11] and quadratic assignment problems (QAPs) [8]. In addition, it has also been clarified that its high searching ability depends on a statistical property of the refractoriness in a chaotic neuron [1] in the case of the MEPs [12, 13]. To solve MEPs, we have already proposed a new motif extraction method, the "Chaotic Motif Sampler (CMS)," which uses a chaotic neurodynamics [11]. To realize the chaotic neurodynamics, we used a chaotic neural network composed of chaotic neurons [1]. In the CMS, a chaotic neuron is assigned to every motif candidate position, and the motif is decided by whether the corresponding chaotic neuron is firing or resting. As a result, it has been shown that the CMS can find motifs at a very high rate if a single motif is embedded in each sequence.
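For contrast with these heuristic searches, the brute-force enumeration discussed above can be sketched as follows (a toy illustration; the occurrence-count scoring is our simplification of a motif objective):

```python
from itertools import product

def brute_force_motif(seqs, L, alphabet="ACGT"):
    """Enumerate all |alphabet|^L candidate patterns and score each by the
    number of sequences containing it exactly. Exponential in L, which is
    why enumeration is impractical for realistic motif lengths."""
    best, best_score = None, -1
    for pattern in product(alphabet, repeat=L):
        p = "".join(pattern)
        score = sum(1 for s in seqs if p in s)
        if score > best_score:
            best, best_score = p, score
    return best, best_score

# Tiny instance in the spirit of Fig. 1: the motif CTGAGTA is planted in
# every sequence; 4^7 = 16384 patterns are tried.
seqs = ["ATGGCTGAGTAAG", "CTATCCTGAGTACTA", "AGCTGAGTAGCC"]
print(brute_force_motif(seqs, L=7))  # ('CTGAGTA', 3)
```

Even here, moving from L=7 to L=18 (the motif length used later in the experiments) would require 4^18 ≈ 6.9 × 10^10 patterns, which motivates the approximation algorithms.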
Chaotic Motif Sampler for Motif Discovery 675
Exploring a searching space while avoiding undesirable local minima, the CMS detects the motifs by firing neurons, and each neuron generates a complicated spike time-series. The spike time-series of the chaotic neuron corresponding to a correct motif position may then exhibit a characteristic response. In this paper, in order to reveal such a characteristic property, we analyze the complexity of the spike time-series from each chaotic neuron using statistical measures, namely the coefficient of variation (C_V) and the local variation of interspike intervals (L_V), which are frequently used in the field of neuroscience. As a result, if motifs are embedded in a sequence, the corresponding spike time-series show characteristic behavior: a chaotic neuron to which a correct motif is assigned has a higher C_V value than the other neurons, while its L_V value is lower than that of the other neurons. As a result, even if multiple motifs are embedded in a sequence, the CMS can extract them.

2 Chaotic Motif Sampler

The CMS uses chaotic neurons [1] generating chaotic neurodynamics to explore better solutions embedded in a state space [11]. In other words, the motif candidates are decided by the firing of the chaotic neurons. To realize the CMS, we first used a chaotic neural network [1] composed of Σ_{i=1}^{N} (m_i − L + 1) neurons (Fig. 2). In this neural network, the firing of a chaotic neuron encodes a movement of the head position of a motif candidate to the corresponding position (Fig. 2).

Fig. 2. How to code motif positions to neuron firings; in this example, the motif length is five. (N: the number of sequences; L: the motif length; m_i: the number of elements in each sequence. A chaotic neuron is assigned to each motif candidate position j = 1, 2, ..., m_i − L + 1 of each sequence i = 1, 2, ..., N.)
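The coding of Fig. 2 assigns one chaotic neuron to each of the Σ_{i=1}^{N} (m_i − L + 1) candidate head positions, and the gain that drives those neurons is the relative entropy score of Eq. (2) below. A sketch under our own naming, assuming a uniform background probability p(ω):

```python
import math
from collections import Counter

def relative_entropy_score(subseqs, p=None):
    """Eq. (2)-style score: (1/L) * sum over positions k and symbols w of
    f_k(w) * log2(f_k(w) / p(w)), where f_k(w) counts symbol w at position
    k across the aligned candidate subsequences."""
    L = len(subseqs[0])
    alphabet = set("".join(subseqs))
    if p is None:  # uniform background (assumption for this sketch)
        p = {w: 1.0 / len(alphabet) for w in alphabet}
    score = 0.0
    for k in range(L):
        counts = Counter(s[k] for s in subseqs)
        for w, f in counts.items():
            score += f * math.log2(f / p[w])
    return score / L

# One neuron per candidate head position, as in Fig. 2.
seqs = ["ATGGCTGAGTAAG", "CTATCCTGAGTACTA"]
L = 7
n_neurons = sum(len(s) - L + 1 for s in seqs)
print(n_neurons)  # (13-7+1) + (15-7+1) = 16

# A perfectly aligned motif scores higher than a mismatched alignment.
print(relative_entropy_score(["CTGAGTA", "CTGAGTA"]) >
      relative_entropy_score(["CTGAGTA", "ATCCGGA"]))
```

The firing decision then compares this score against the score of the current alignment, as formalized in Eqs. (1)-(2).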
The firing of the (i, j)th neuron is then defined by x_{i,j}(t) = f(y_{i,j}(t)) > 1/2, where f(y) = 1/(1 + exp(−y/ε)), and y_{i,j}(t) is the internal state of the (i, j)th neuron at time t. The internal state of the chaotic neuron [1] is decomposed into two parts, ξ_{i,j}(t) and ζ_{i,j}(t), each of which represents different effects for determining the firing of a neuron in the algorithm: a gain effect, a refractory effect, and a mutual inhibition effect.
The first part, ξ_{i,j}(t), which expresses the gain effect, is defined by

ξ_{i,j}(t + 1) = β ( E_{i,j}(t) − Ê ),   (1)

E_{i,j}(t) = (1/L) Σ_{k=1}^{L} Σ_{ω∈Ω} f_k(ω) log₂ ( f_k(ω) / p(ω) ),   (2)

where β is a scaling parameter of the effect; E_{i,j}(t) represents the objective function in the CMS, a relative entropy score when the candidate motif position is moved to the jth position in the sequence s_i; Ê, the entropy score of the current state; f_k(ω), the number of appearances of an element ω ∈ Ω (one of the four bases in the case of DNA or RNA sequences, or one of the 20 amino acids in the case of a protein sequence) at the kth position of the subsequences; p(ω), the background probability of appearance of the element ω; and Ω, the set of bases (Ω_BASE = {A, C, G, T}) or the set of the 20 amino acids. The quantity on the right-hand side of Eq. (1) becomes positive if the new motif candidate position is better than the current state.

The second part, ζ_{i,j}(t), qualitatively realizes the refractoriness of the neuron. The refractoriness is one of the important properties of real biological neurons: once a neuron fires, it can hardly fire again for a certain period of time. The second part is then expressed as follows:

ζ_{i,j}(t + 1) = −α Σ_{d=0}^{t} k_r^d x_{i,j}(t − d) + θ   (3)
             = −α x_{i,j}(t) + k_r ζ_{i,j}(t) + θ (1 − k_r),   (4)

where α is a scaling parameter that decides the strength of the refractory effect after neuron firing (0 < α); k_r, a decay parameter that takes values between 0 and 1; and θ, a threshold value. Thus, in Eq. (3), ζ_{i,j}(t + 1) expresses the refractory effect with a decay factor k_r: the more frequently the neuron has fired in its past, the more negative the first term on the right-hand side of Eq. (3) becomes, which in turn depresses the value of ζ_{i,j}(t + 1) and leads the neuron to a relatively resting state. Although the refractoriness realized in the chaotic neuron [1] has an effect similar to the tabu effect [3, 4], we have already shown that the refractoriness can control the search for a solution in a state space better than the tabu effect [12, 13].

By using these two internal states, we construct an algorithm for extracting motifs, as described in the following. Consider a set of N sequences of lengths m_i (i = 1, 2, ..., N); further, let the length of a subsequence (motif) be L (Fig. 2). We proceed as follows:

1. The position of an initial subsequence t_{i,j} (i = 1, 2, ..., N; j = 1, 2, ..., m_i − L + 1) is randomly set in the ith sequence.
2. The ith sequence s_i is selected cyclically.
3. For the sequence s_i selected in the second step, to change the position of the motif candidate to a new position, y_{i,j}(t + 1) is calculated from the first neuron (j = 1) to the last neuron (j = m_i − L + 1) in the sequence s_i. Then,
the neuron whose internal state is maximum (y_{i,max}) is selected. If the (i, max)th neuron fires (x_{i,max}(t + 1) > 1/2), the motif position is moved to the (i, max)th position, and the value of Ê is updated.
4. Repeat steps 2 and 3 for a sufficient number of iterations.

3 Statistical Measures C_V and L_V [14]
The coefficient of variation (C_V) of interspike intervals is a measure of the randomness of the interval variation. The C_V is defined as

C_V = sqrt( (1/(n−1)) Σ_{i=1}^{n} (T_i − T̄)² ) / T̄,   (5)

where T_i is the ith interspike interval (ISI), n is the number of ISIs, and T̄ = (1/n) Σ_{i=1}^{n} T_i is the mean ISI. For spike intervals that are independently exponentially distributed, C_V is 1 in the limit of a large number of intervals. For a regular spike time-series in which T_i is constant, C_V = 0.

Next, the local variation (L_V) of interspike intervals is a measure of the spiking characteristics of an individual neuron. The L_V is defined as

L_V = (1/(n−1)) Σ_{i=1}^{n−1} 3 (T_i − T_{i+1})² / (T_i + T_{i+1})².   (6)

For spike intervals that are independently exponentially distributed, L_V = 1. For a regular spike time-series in which T_i is constant, L_V = 0.
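The two measures in Eqs. (5) and (6) can be computed directly from a list of interspike intervals; a small sketch (the limiting values quoted in the text serve as sanity checks):

```python
import math
import random

def cv(isi):
    """Coefficient of variation, Eq. (5): ISI standard deviation
    (with the 1/(n-1) normalization) divided by the mean ISI."""
    n = len(isi)
    mean = sum(isi) / n
    var = sum((t - mean) ** 2 for t in isi) / (n - 1)
    return math.sqrt(var) / mean

def lv(isi):
    """Local variation, Eq. (6): compares each ISI only with its
    successor, so slow rate modulations affect it less than C_V."""
    n = len(isi)
    return sum(3 * (isi[i] - isi[i + 1]) ** 2 / (isi[i] + isi[i + 1]) ** 2
               for i in range(n - 1)) / (n - 1)

regular = [5.0] * 200
print(cv(regular), lv(regular))  # 0.0 0.0 for a perfectly regular train

random.seed(0)
poisson_like = [random.expovariate(1.0) for _ in range(20000)]
print(round(cv(poisson_like), 2), round(lv(poisson_like), 2))  # both near 1
```

A spike train that alternates between dense bursts and long pauses pushes C_V above 1 while leaving L_V low if successive intervals within each regime are similar, which is exactly the signature exploited below for motif neurons.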
4 Results
4.1 Investigation of the Statistical Measures C_V and L_V

In order to investigate the values of C_V and L_V [14] for the chaotic neurons in the CMS solving the MEP, we used real protein sequences [9, 15]. At first, we used a real protein data set consisting of N = 30 sequences. The ith sequence is composed of m_i (i = 1, 2, ..., N) amino acids, and one motif (L = 18) is embedded in each sequence. Because the correct locations of the motif in these real protein sequences [15] have already been clarified by another method [9, 15], we can compare the analysis results between the neurons at the correct motif positions and the others. For more details of the sequence data and how the correct motif positions were identified, see Ref. [9] and the references therein. Table 1 shows the correct locations of the motif in each sequence. To extract motifs from this data set, the parameters of the CMS are set to k_r = 0.8, α = 0.25, θ = 1.0, β = 30.0, and ε = 0.01.

Table 1. Correct location of the motif in each real protein sequence [15]
Sequence number:    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Correct location: 225 198  22 326 449  22   5  73  99  25 169  15  12 196 196
Sequence number:   16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
Correct location: 252 444  11  23   3   5  26  67 495 205 160   3   3  27  25
Fig. 3. (a) Raster plot, and the values of (b) the firing rate, (c) C_V, and (d) L_V for all neurons in the 3rd sequence of Table 1

It is considered that the frequently firing neurons correspond to the location of a correct motif, because the gain effect of the chaotic neurons is decided by the degree of similarity to the other motif candidates. Namely, the neurons corresponding to motifs fire frequently. In this sense, it might be enough to observe the firing rates of the neurons. However, the firing rate alone does not always give a correct result. Figure 3 shows such an example: the results for the 3rd sequence (s_3) of the data in Table 1. For the 3rd sequence, although the location of the correct motif is 22, the most frequently firing neurons are not the 22nd, but the middle (160∼170) and end (270∼280) neurons of the sequence (Fig. 3(b)).

If we evaluate a raster plot of all the neurons in the 3rd sequence, the frequently firing neurons seem to fire randomly, while the firing pattern of the 22nd neuron shows characteristic behavior (Fig. 3(a)). To detect such characteristic firing behavior, we calculated the statistical values of the output spike time-series. As a result, it is clearly seen that the value of C_V of the 22nd neuron is higher than that of the other neurons and the value of L_V is lower than that of the other neurons (Figs. 3(c) and (d)). On the other hand, for the frequently firing neurons, the values of C_V are low and the values of L_V are high. For the other sequences, the neurons corresponding to a correct motif show the same tendency (Fig. 4 and Table 1). Furthermore, if the neuron at a correct motif position fires frequently, C_V becomes not low but high, and L_V becomes not high but low. These results indicate that the spike time-series generated by the chaotic neuron at a correct motif position has different characteristics from those of the other neurons.

Fig. 4. The values of (a) the firing rate, (b) C_V, and (c) L_V of all neurons for the real protein data set. The correct locations of the motifs are shown in Table 1.

4.2 Multiple Motif Case

The original CMS [11, 12] cannot always find multiple motifs in a sequence, because only the most similar subsequence is extracted as a motif from each sequence. However, multiple motifs might be extracted from each sequence by using the characteristics of the spike time-series.

Table 2. Correct locations of the motif in each artificial protein sequence
Sequence number:     1   2   3   4   5   6   7   8   9  10  11  12
Correct locations: 130  15 ...
Sequence number:    13  14  15  16  17  18  19  20  21  22  23  24  25
Correct locations:  11  23  50   3   5  26  67 495 205 178   3   3  25
To verify this hypothesis, we prepared an artificial data set. This artificial data set has 25 sequences (N = 25), and the ith sequence is composed of m_i (i = 1, 2, ..., N) amino acids. In each sequence, one or two motifs are embedded. Table 2 shows the correct locations of the motif (L = 18) in each sequence. To extract motifs from this data set, the parameters are set to k_r = 0.8, α = 0.25, θ = 0.9, β = 25.0, and ε = 0.01.

Figure 5 shows the results for the 15th sequence of Table 2, in which two motifs are embedded. For the 15th sequence, although one neuron of the motif (the 23rd) takes a high firing rate, the other (the 156th) is low (Figs. 5(a) and (b)). However, both values of C_V are higher than those of the other neurons, and both values of L_V are lower than those of the other neurons (Figs. 5(c) and (d)). For the other sequences, the neurons of correct motifs show the same tendency (Fig. 6 and Table 2).

Fig. 5. (a) Raster plot, and the values of (b) the firing rate, (c) C_V, and (d) L_V for all neurons in the 15th sequence of Table 2

Fig. 6. The values of (a) the firing rate, (b) C_V, and (c) L_V of all neurons for the artificial protein data set. The correct locations of the motifs are shown in Table 2.

5 Conclusions

In this paper, to improve the performance of solving the motif extraction problem by chaotic neurodynamics, we introduced the spike time-series produced by the chaotic neurons in the CMS and analyzed their complexity using statistical measures, namely the coefficient of variation (C_V) and the local variation (L_V) of interspike intervals, which are frequently used in the field of neuroscience. As a result, C_V of the neurons corresponding to correct motif positions becomes higher than that of the other neurons and L_V becomes lower, independent of the firing rates of the neurons. By using these characteristics for finding motifs, we can solve the MEP, or extract the motifs, even in the case that multiple motifs are embedded in each sequence. Although the statistical values of C_V and L_V of the spikes produced by neurons corresponding to the correct locations depend on each sequence, their values are statistically significant within the sequence. Then, if we evaluate the motif positions more quantitatively by using the statistical values of C_V and L_V, we could develop a new algorithm of the CMS that eliminates the sequences with no motifs.

The research of T.I. is partially supported by a Grant-in-Aid for Scientific Research (B) (No. 16300072) and a research grant from the Mazda Foundation.

References
[1] Aihara, K., et al.: Chaotic Neural Networks. Physics Letters A 144, 333–340 (1990)
[2] Akutsu, T., et al.: On Approximation Algorithms for Local Multiple Alignment. In: Proceedings of the 4th Annual International Conference on Molecular Biology, pp. 1–7 (2000)
[3] Glover, F.: Tabu Search I. ORSA Journal on Computing 1(3), 190–206 (1989)
[4] Glover, F.: Tabu Search II. ORSA Journal on Computing 2(1), 4–32 (1990)
[5] Hasegawa, M., et al.: Combination of Chaotic Neurodynamics with the 2-opt Algorithm to Solve Traveling Salesman Problems. Physical Review Letters 79(12), 2344–2347 (1997)
[6] Hasegawa, M., et al.: Exponential and Chaotic Neurodynamical Tabu Searches for Quadratic Assignment Problems. Control and Cybernetics 29(3), 773–788 (2000)
[7] Hasegawa, M., et al.: Solving Large Scale Traveling Salesman Problems by Chaotic Neurodynamics. Neural Networks 15(2), 271–283 (2002)
682 T. Matsuura and T. Ikeguchi

[8] Hasegawa, M., et al.: A Novel Chaotic Search for Quadratic Assignment Problems. European Journal of Operational Research 139(3), 543–556 (2002)
[9] Lawrence, C.E., et al.: Detecting Subtle Sequence Signals: A Gibbs Sampling Strategy for Multiple Alignment. Science 262, 208–214 (1993)
[10] Matsuura, T., et al.: A Tabu Search for Extracting Motifs from DNA Sequences. In: Proceedings of Nonlinear Circuits and Signal Processing 2004, pp. 347–350 (2004)
[11] Matsuura, T., et al.: A Tabu Search and Chaotic Search for Extracting Motifs from DNA Sequences. In: Proceedings of the Metaheuristics International Conference 2005, pp. 677–682 (2005)
[12] Matsuura, T., Ikeguchi, T.: Refractory Effects of Chaotic Neurodynamics for Finding Motifs from DNA Sequences. In: Corchado, E.S., Yin, H., Botti, V., Fyfe, C. (eds.) IDEAL 2006. LNCS, vol. 4224, pp. 1103–1110. Springer, Heidelberg (2006)
[13] Matsuura, T., et al.: Analysis on Memory Effect of Chaotic Dynamics for Combinatorial Optimization Problem. In: Proceedings of the Metaheuristics International Conference (2007)
[14] Shinomoto, S., et al.: Differences in Spiking Patterns Among Cortical Neurons. Neural Computation 15, 2823–2842 (2003)
[15] Information available at http://www.genome.jp/
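For reference, the two inter-spike-interval statistics analyzed in the conclusions above, the coefficient of variation C_V and the local variation L_V of Shinomoto et al. [14], can be sketched as follows. This is only an illustration under our own assumptions (the spike train is given directly as a list of inter-spike intervals, and the population standard deviation is used for C_V); it is not the authors' code.

```python
import statistics

def cv(isi):
    """Coefficient of variation of inter-spike intervals:
    C_V = (standard deviation of ISIs) / (mean ISI).
    Population standard deviation is assumed here."""
    return statistics.pstdev(isi) / statistics.mean(isi)

def lv(isi):
    """Local variation of inter-spike intervals (Shinomoto et al.):
    L_V = 3/(n-1) * sum_i (T_i - T_{i+1})^2 / (T_i + T_{i+1})^2,
    where T_i are consecutive ISIs. L_V is 0 for a perfectly
    regular train and close to 1 for a Poisson train."""
    n = len(isi)
    return (3.0 / (n - 1)) * sum(
        (a - b) ** 2 / (a + b) ** 2 for a, b in zip(isi, isi[1:])
    )
```

A perfectly regular train gives C_V = L_V = 0, which matches the intuition that neurons at correct motif positions, with higher C_V and lower L_V, fire irregularly but with locally similar intervals.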
A Thermodynamical Search Algorithm for Feature Subset Selection

Félix F. González and Lluís A. Belanche

Languages and Information Systems Department
Polytechnic University of Catalonia, Barcelona, Spain
{fgonzalez,belanche}@lsi.upc.edu

Abstract. This work tackles the problem of selecting a subset of features in an inductive learning setting by introducing a novel Thermodynamic Feature Selection algorithm (TFS). Given a suitable objective function, the algorithm makes use of a specially designed form of simulated annealing to find a subset of attributes that maximizes the objective function. The new algorithm is evaluated against one of the most widespread and reliable algorithms, the Sequential Forward Floating Search (SFFS). Our experimental results in classification tasks show that TFS achieves significant improvements over SFFS in the objective function, with a notable reduction in subset size.

1 Introduction

The main purpose of Feature Selection is to find a reduced set of attributes from a data set described by a feature set. This is often carried out by a subset search process in the power set of possible solutions, guided by the optimization of a user-defined objective function. The generic goal is to improve the generalization capacity of the inductive learner by reducing the noise generated by irrelevant or redundant features.

The idea of using powerful general-purpose search strategies to find a subset of attributes in combination with an inductive learner is not new. There has been considerable work using genetic algorithms (see for example [1], or [2] for a recent successful application). In contrast, simulated annealing (SA) has received comparatively much less attention, probably because of its prohibitive cost, which can be especially acute when it is combined with an inducer to evaluate the quality of every explored subset.
In particular, [3] developed a rule-based system based on the application of a generic SA algorithm with standard bitwise random mutation. We are of the opinion that such algorithms must be tailored to feature subset selection if they are to be competitive with state-of-the-art stepwise search algorithms. In this work we contribute to tackling this problem by introducing a novel Thermodynamic Feature Selection algorithm (TFS). The algorithm makes use of a specially designed form of simulated annealing (SA) to find a subset of attributes that maximizes the objective function. TFS has a number of distinctive characteristics over other search algorithms for feature subset selection. First, the probabilistic capability of SA to accept momentarily worse solutions is enhanced by the concept of an ε-improvement, explained below. Second, the algorithm is endowed with a feature search window in the backward steps that limits the neighbourhood size while retaining efficacy. Third, the algorithm accepts any objective function for evaluating the quality of generated subsets; in this paper three different inducers have been tried to assess this quality. TFS is evaluated against one of the most reliable stepwise search algorithms, the Sequential Forward Floating Search method (SFFS). Our experimental results in classification tasks show that TFS notably improves over SFFS in the objective function in all cases, with a very notable reduction in subset size compared to the full set size. We also find that the comparative computational cost is affordable for small numbers of features and much better as the number of features increases.

M. Ishikawa et al. (Eds.): ICONIP 2007, Part I, LNCS 4984, pp. 683–692, 2008. (c) Springer-Verlag Berlin Heidelberg 2008

2 Feature Selection

Let X = {x_1, . . . , x_n}, n > 0, denote the full feature set. Without loss of generality, we assume that the objective function J : P(X) → R+ ∪ {0} is to be maximized, where P denotes the power set. We denote by X_k ⊆ X a subset of selected features, with |X_k| = k. Hence, by definition, X_0 = ∅ and X_n = X. The feature selection problem can be expressed as follows: given a set of candidate features, select a subset (such an optimal subset always exists but is not necessarily unique) defined by one of two approaches:

1. Set 0 < m < n. Find X_m ⊂ X such that J(X_m) is maximum.
2. Set a value J_0, the minimum acceptable J. Find the X_m ⊆ X with smallest m such that J(X_m) ≥ J_0. Alternatively, given α > 0, find the X_m ⊆ X with smallest m such that J(X_m) ≥ J(X) or |J(X_m) − J(X)| < αJ(X).

In the literature, several suboptimal algorithms have been proposed for feature selection. A wide family is formed by those algorithms which, departing from an initial solution, iteratively add or delete features by locally optimizing the objective function. Among them we find sequential forward generation (SFG) and sequential backward generation (SBG), the plus-l-minus-r method (also called plus l, take away r) [12], the Sequential Forward Floating Search (SFFS) and its backward counterpart SFBS [4]. The high trustworthiness and performance of SFFS has been ascertained experimentally (see e.g. [5], [6]).

3 Simulated Annealing

Simulated Annealing (SA) is a stochastic technique inspired by statistical mechanics for finding near globally optimum solutions to large (combinatorial) optimization problems. SA is a weak method in that it needs almost no information about the structure of the search space. The algorithm works by assuming that some part of the current solution belongs to a potentially better one, and thus this part should be retained by exploring neighbors of the current solution. Assuming the objective function is to be minimized, SA can jump from hill to hill, escaping or simply avoiding sub-optimal solutions. When a system S (considered as a set of possible states) is in thermal equilibrium at a given temperature T, the probability P_T(s) that the system is in state s follows a Boltzmann distribution:

    P_T(s) = exp(−E(s)/(kT)) / Z,   with   Z = Σ_{s'∈S} exp(−E(s')/(kT)),

where k is the Boltzmann constant and Z acts as a normalization factor. Metropolis and his co-workers [7] developed a stochastic relaxation method that works by simulating the behavior of a system at a given temperature T. With s the current state and s' a neighboring state, the probability of making a transition from s to s' is the ratio of the probability of being in s' to that of being in s:

    P_T(s → s') = P_T(s') / P_T(s) = exp(−ΔE/(kT)),

where we have defined ΔE = E(s') − E(s). Therefore, the acceptance or rejection of s' as the new state depends on the difference of the energies of both states at temperature T. If P_T(s') ≥ P_T(s), then the move is always accepted. If P_T(s') < P_T(s), then it is accepted with probability P_T(s → s') < 1 (this situation corresponds to a transition to a higher-energy state). Note that this probability depends upon and decreases with the current temperature. In the end, there will be a temperature low enough (the freezing point) at which these transitions will be very unlikely and the system can be considered frozen.

In order to maximize the probability of finding states of minimal energy at every temperature, thermal equilibrium must be reached. The SA algorithm proposed in [8] consists of using the Metropolis rule at each temperature for a finite amount of time. In this algorithm the temperature is first set at an initially high value, spending enough time at it so as to approximate thermal equilibrium. Then a small decrement of the temperature is performed and the process is iterated until the system is considered frozen. If the cooling schedule is well designed, the final reached state may be considered a near-optimal solution.

4 TFS: A Thermodynamic Feature Selection Algorithm

In this section we introduce TFS (Thermodynamic Feature Selection). Considering SA as a combinatorial optimization process [10], TFS finds a subset of attributes that maximizes the value of a given objective function. A special-purpose feature selection mechanism is embedded into the SA technique that takes advantage of the probabilistic acceptance of worse scenarios over a finite time. This characteristic is enhanced in TFS by the notion of an ε-improvement: a feature ε-improves a current solution if it yields a higher value of the objective function, or a value worse by less than ε in relative terms. This mechanism is intended to account for noise in the evaluation of the objective function due to finite sample sizes. The algorithm is also endowed with a feature search window (of size l) in the backward step, as follows. In forward steps, the best feature is always added (by looking at all possible additions). In backward steps, this search is limited to l tries at random (without repetition). The value of l is incremented by one at every thermal equilibrium point. This mechanism is an additional source of non-determinism and a bias towards adding a feature only when it is the best option available. In contrast, to remove a feature, it suffices that its removal ε-improves the current solution. A direct consequence is of course a considerable speed-up of the algorithm. Note that the design of TFS is such that it grows more and more deterministic, informed and costly as it converges towards the final configuration.

Fig. 1. TFS algorithm for feature selection. For the sake of clarity, only those variables that are modified are passed as parameters to Forward and Backward.

The pseudo-code of TFS is depicted in Figs. 1 to 3. The algorithm consists of two major loops. The outer loop waits for the inner loop to finish and then updates the temperature according to the chosen cooling schedule. When the outer loop reaches T_min
, the algorithm halts. The algorithm keeps track of the best solution found (which is not necessarily the current one). The inner loop is the core of the algorithm and is composed of two interleaved procedures, Forward and Backward, that iterate until a thermal equilibrium point is found, represented by reaching the same solution before and after. These procedures work independently of each other but share information about the results of their respective searches in the form of the current solution. Within them, feature selection takes place and the mechanism to escape from local minima starts
PROCEDURE Forward (var Z, J_Z)
  Repeat
    x := argmax { J(Z ∪ {x_i}) }, x_i ∈ X_n \ Z
    If ε-improves (Z, x, true) then
      accept := true
    else
      ΔJ := J(Z ∪ {x}) − J(Z)
      accept := rand(0, 1) < e^(ΔJ/t)
    endif
    If accept then Z := Z ∪ {x} endif
    If J(Z) > J_Z then J_Z := J(Z) endif
  Until not accept
END Forward

Fig. 2. TFS Forward procedure (note Z, J_Z are modified)

PROCEDURE Backward (var Z, J_Z)
  A := ∅; A_B := ∅
  Repeat
    For i := 1 to min(l, |Z|) do
      Select x ∈ Z \ A_B randomly
      If ε-improves (Z, x, false) then A := A ∪ {x} endif
      A_B := A_B ∪ {x}
    EndFor
    x_0 := argmax { J(Z \ {x}) }, x ∈ A_B
    If x_0 ∈ A then
      accept := true
    else
      ΔJ := J(Z \ {x_0}) − J(Z)
      accept := rand(0, 1) < e^(ΔJ/t)
    endif
    If accept then Z := Z \ {x_0} endif
    If J(Z) > J_Z then J_Z := J(Z) endif
  Until not accept
END Backward

FUNCTION ε-improves (Z, x, b) RETURNS boolean
  If b then Z' := Z ∪ {x} else Z' := Z \ {x} endif
  Δx := J(Z') − J(Z)
  If Δx > 0 then return true
  else return −Δx / J(Z) < ε endif
END ε-improves

Fig. 3. Left: TFS Backward procedure (note Z, J_Z are modified and x_0 can be efficiently computed while in the For loop). Right: The function for ε-improvement.

working, as follows: these procedures iteratively add or remove features one at a time in such a way that an ε-improvement is accepted unconditionally, whereas a non-ε-improvement is accepted probabilistically.

5 An Experimental Study

In this section we report on empirical work. There are nine problems, taken mostly from the UCI repository and chosen to be a mixture of different real-life feature selection processes in classification tasks. In particular, full feature size
ranges from just a few (17) to dozens (65), sample size from a few dozen (86) to the thousands (3,175), and feature type is either continuous, categorical or binary. The problems have also been selected with a criterion in mind not very commonly found in similar experimental work, namely, that these problems are amenable to feature selection. By this it is meant that performance benefits clearly from a good selection process (and less clearly, or even worsens, with a bad one). Their main characteristics are summarized in Table 1.

Table 1. Problem characteristics. Instances is the number of instances, Features is that of features, Origin is real/artificial and Type is categorical/continuous/binary.

  Name                 Instances  Features  Origin      Type
  Breast Cancer (BC)   569        30        real        continuous
  Ionosphere (IO)      351        34        real        continuous
  Sonar (SO)           208        60        real        continuous
  Mammogram (MA)       86         65        real        continuous
  Kdnf (KD)            500        40        artificial  binary
  Knum (KN)            500        30        artificial  binary
  Hepatitis (HP)       129        17        real        categorical
  Splice (SP)          3,175      60        real        categorical
  Spectrum (SC)        187        22        real        categorical

5.1
Experimental Setup

Each data set was processed with both TFS and SFFS in wrapper mode [11], using the accuracy of several classifiers as the objective function: 1-Nearest Neighbor (1NN) for the categorical data sets; 1-Nearest Neighbor, Linear and Quadratic discriminant analysis (LDA and QDA) for the continuous data sets [12]. We chose these for being fast, parameter-free and not prone to overfit (specifically, we discarded neural networks). Other classifiers (e.g. Naive Bayes or Logistic Regression) could be considered, this being a user-dependent matter. Decision trees are not recommended, since they perform their own feature selection in addition to that done by the algorithm and can hinder an accurate assessment of the results. The important point here is that the compared algorithms do not specifically depend on this choice. Despite taking more time, leave-one-out cross-validation is used to obtain reliable generalization estimates, since it is known to have low bias [9].

The TFS parameters are as follows: ε = 0.01, T_0 = 0.1 and T_min = 0.0001. These settings were chosen after some preliminary trials and are kept constant for all the problems. The cooling function was chosen to be geometric, α(t) = ct, taking c = 0.9, following recommendations in the literature [10]. Note that SFFS needs the specification of the desired size of the final solution [4], which also acts as a stop criterion. This parameter is very difficult to estimate in practice. To overcome this problem, we let SFFS run over all possible sizes 1 . . . n, where n is the size of each data set. This is a way of getting the best performance out of this algorithm. In all cases, a limit of 100 hours of computing time was imposed on the executions.
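To make the wrapper objective concrete, the leave-one-out accuracy of a 1NN classifier restricted to a candidate feature subset can be sketched as below. This is only an illustration under our own assumptions (NumPy arrays, squared Euclidean distance, function names of our choosing); the paper does not specify its implementation.

```python
import numpy as np

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.
    For each sample, the nearest other sample (squared Euclidean
    distance) votes for its class."""
    X = np.asarray(X, dtype=float)
    n = len(y)
    correct = 0
    for i in range(n):
        d = np.sum((X - X[i]) ** 2, axis=1)  # distances to all samples
        d[i] = np.inf                        # exclude the held-out point
        correct += int(y[int(np.argmin(d))] == y[i])
    return correct / n

def J(subset, X, y):
    """Wrapper objective: LOO 1NN accuracy on the selected columns.
    The empty subset is assigned the minimum score."""
    if not subset:
        return 0.0
    X = np.asarray(X, dtype=float)
    return loo_1nn_accuracy(X[:, sorted(subset)], y)
```

Any of the three inducers used in the paper could be substituted for the 1NN classifier here; only the signature J(subset) matters to the search.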
Table 2. Performance results. J_opt is the value of the objective function for the final solution and k its size. For categorical problems only 1NN is used. Note NR are the corresponding results with no reduction of features. The results for SFFS on the SP data set (marked with a star) were the best after 100 hours of computing time, when the algorithm was cut (this is the only occasion this happened).

             TFS                              NR                  SFFS
       1NN        LDA        QDA        1NN   LDA   QDA    1NN        LDA        QDA
       J_opt  k   J_opt  k   J_opt  k   J_opt J_opt J_opt  J_opt  k   J_opt  k   J_opt  k
  BC   0.96  14   0.98  11   0.99   9   0.92  0.95  0.95   0.94  11   0.98  13   0.97   7
  IO   0.95  12   0.90  13   0.95   8   0.87  0.85  0.87   0.93  13   0.89   6   0.94  14
  SO   0.91  25   0.84   9   0.88  10   0.80  0.70  0.60   0.88   7   0.79  23   0.85  11
  MA   0.91   6   0.99  10   0.94   6   0.70  0.71  0.62   0.84  16   0.93   6   0.93   5
  SP   0.91   8   n/a        n/a        0.78  n/a   n/a    0.90*  6*  n/a        n/a
  SC   0.74   7   n/a        n/a        0.64  n/a   n/a    0.73   9   n/a        n/a
  KD   0.70  11   n/a        n/a        0.60  n/a   n/a    0.68  11   n/a        n/a
  KN   0.86  14   n/a        n/a        0.69  n/a   n/a    0.85  12   n/a        n/a
  HP   0.91   4   n/a        n/a        0.82  n/a   n/a    0.91   4   n/a        n/a

A number of questions are raised prior to the realization of the experiments:

1. Does the feature selection process help to find solutions of similar or better accuracy using lower numbers of features? Is there any systematic difference in performance among the various classifiers?
2. Does TFS find better solutions in terms of the objective function J? Does it find better solutions in terms of the size k?

5.2 Discussion of the Results

Performance results for TFS and SFFS are displayed in Table 2, including, as a reference, the results obtained with no feature selection of any kind. In light of the results, we can answer the previously raised questions:

1. The feature selection process indeed helps to find solutions of similar or better accuracy using (much) lower numbers of features. This is true for both algorithms and all three classifiers used. Regarding a systematic difference in performance among the classifiers, the results are non-conclusive, as can reasonably be expected, this being problem-dependent in general [11].

2. In terms of the best value of the objective function, TFS outperforms SFFS in all the tasks, no matter the classifier, except in HP for 1NN, where there is a tie. The difference is sometimes substantial (more than 10%). Recall that SFFS was executed for all possible final size values and the figure reported is the best overall. In this sense, SFFS never came across the subsets obtained by TFS in the search process (otherwise they would have been recorded). It is hence conjectured that TFS has better access to hidden good subsets than SFFS does. In terms of the final subset size, the results are quite interesting. Both TFS and SFFS find solutions of smaller size using QDA, then using LDA and finally using 1NN, which can be checked by calculating the column totals.
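The two acceptance mechanisms that drive both the Forward and Backward procedures, the ε-improvement test and the Metropolis rule, can be sketched as below. This is a minimal illustration under the assumption that J returns a strictly positive score for the subsets tested; it is not the authors' code.

```python
import math
import random

def eps_improves(J, Z, x, adding, eps=0.01):
    """ε-improvement test: adding/removing feature x either improves J,
    or worsens it by less than a fraction eps of the current value.
    Assumes J(Z) > 0 (true for accuracy-based objectives here)."""
    Zp = Z | {x} if adding else Z - {x}
    dJ = J(Zp) - J(Z)
    if dJ > 0:
        return True
    return -dJ / J(Z) < eps

def metropolis_accept(dJ, t):
    """Metropolis rule for a maximized objective at temperature t:
    a non-improving move (dJ <= 0) is accepted with probability
    exp(dJ / t); an improving move is always accepted."""
    return random.random() < math.exp(dJ / t)
```

At high t, even clearly worse moves are often accepted; as t approaches T_min, exp(dJ/t) vanishes for dJ < 0, which matches the observation that TFS grows more deterministic as it converges.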
TFS gives priority to the optimization of the objective function, without any specific concern for reducing the final size. However, it finds very competitive solutions both in terms of accuracy and size. This fact should not be overlooked, since it is by no means clear that solutions of bigger size should have a better value of the objective function, or vice versa. In order to avoid the danger of losing relevant information, it is often better to accept solutions of somewhat bigger size if these entail a significantly higher value of the objective function.

Finally, it could be argued that the conclusion that TFS consistently yields better final objective function values may be optimistic: possibly the results for other sizes are consistently equal or even worse. To check whether this is the case, we show the entire solution path for one of the runs, representative of the experimental behaviour found (Fig. 4). It is seen that for (almost) all sizes, TFS offers better accuracy, especially at the lowest values.

Fig. 4. Comparative full performance for all k with QDA on the BC data set. The best solutions (those shown in Table 2) are marked by a bigger circle or square.

5.3
Computational Cost

The computational cost of the performed experiments is displayed in Table 3. It is seen that SFFS takes less time for smaller-sized problems (e.g. HP, BC, IO and KN), whereas TFS is more efficient for medium- to bigger-sized ones (e.g. SO, MA, KD and SP), where time is more and more of an issue. In this vein, the two-phase interleaved mechanism for forward and backward exploration carries out a good neighborhood exploration, thereby contributing to a fast relaxation of the algorithm. Further, the setup of the algorithm is easier than in a conventional SA, since the time spent at each temperature is automatically and dynamically set. In our experiments this mechanism did not lead to stagnation of the process.

The last analyzed issue concerns the distribution of features as selected by TFS. In order to determine this, a perfectly known data set is needed. An artificial problem f has been explicitly designed, as follows: letting x_1, . . . , x_n be the relevant features, f(x_1, · · · , x_n) = 1 if the majority of the x_i is equal to 1, and 0 otherwise. Next, completely irrelevant features and redundant features (taken as copies of
Table 3. Computational costs: J_eval is the number of times the function J is called and Time is the total computing time (in hours).
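A generator for the artificial majority problem described above can be sketched as below. The construction details beyond the text (number of samples, the tie-breaking convention for an even number of relevant bits, and redundant features taken as cyclic copies of the relevant ones) are our own assumptions.

```python
import random

def majority_dataset(n_relevant, n_irrelevant, n_redundant, n_samples, seed=0):
    """Generate the artificial majority problem of Sect. 5.3:
    f(x_1, ..., x_n) = 1 iff the majority of the relevant bits are 1.
    Irrelevant features are independent random bits; redundant features
    are copies of relevant ones. Returns a list of (features, label)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        relevant = [rng.randint(0, 1) for _ in range(n_relevant)]
        # strict majority; with an odd n_relevant there are no ties
        label = 1 if sum(relevant) > n_relevant // 2 else 0
        irrelevant = [rng.randint(0, 1) for _ in range(n_irrelevant)]
        redundant = [relevant[i % n_relevant] for i in range(n_redundant)]
        data.append((relevant + irrelevant + redundant, label))
    return data
```

On such a data set the ideal selector returns exactly the relevant block, which is what makes the distribution of features selected by TFS directly checkable.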