st. dev. Nˊ    46    25     9    11     9
avg. M        664   337   151   146   167
st. dev. M    182    94    45    32    46
avg. Mˊ       474   263   134    97   138
st. dev. Mˊ   145    81    41    29    43
Table 5: Word statistics for the input sets (avg. – average value, st. dev. – standard
deviation)
Properly and improperly spelled names and surnames were identified in two steps.
First, we compared the words starting with an uppercase letter to the dictionary
contents. Second, we compared the words starting with a lowercase letter to the
dictionary contents; this second step was intended to detect potentially misspelled
names and/or surnames.
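The two-step comparison can be illustrated with a minimal sketch. It rests on assumptions that the text does not confirm: the dictionary is modelled as a plain set of capitalized names, a lowercase word counts as a potentially misspelled name when its capitalized form is in that set, and the function name `find_names` and the sample data are hypothetical.

```python
import re

def find_names(text, name_dictionary):
    """Compare words in `text` against a set of known names (assumed exact matching).

    Returns a pair (identified, suggestions):
      identified  - uppercase-initial words found in the dictionary (step 1)
      suggestions - (word, proposed_form) pairs for lowercase-initial words whose
                    capitalized form is in the dictionary (step 2)
    """
    # Split the text into alphabetic tokens (Unicode letters only).
    tokens = re.findall(r"[^\W\d_]+", text)
    identified = []
    suggestions = []
    for token in tokens:
        if token[0].isupper():
            # Step 1: words starting with an uppercase letter.
            if token in name_dictionary:
                identified.append(token)
        elif token[0].islower():
            # Step 2: words starting with a lowercase letter.
            candidate = token.capitalize()
            if candidate in name_dictionary:
                suggestions.append((token, candidate))
    return identified, suggestions

# Hypothetical dictionary and sentence, for illustration only.
names = {"John", "Rose", "Mark"}
sample = "John met rose and mark. The rose was red."
identified, suggestions = find_names(sample, names)
```

In this example the second occurrence of "rose" is an ordinary noun, yet it is still proposed for correction, because a comparison based only on dictionary membership cannot tell a name from a common word with the same spelling.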
The summary of the experimental results is shown in Fig. 5. The figure contains the
information on the number of properly identified names and surnames (true positives,
TP), the number of words improperly identified as names or surnames (false positives,
FP) and the number of corrected words. On the x-axis of each subplot, the label S_ij
corresponds to the j-th member of the i-th set of texts, where i = 1, 2, ..., 5 and
j = 0, 1, ..., 5. The label S_i0, for i = 1, 2, ..., 5, corresponds to the original
story in each set; hence, there are no corrected words for these elements.
The results shown in Fig. 5 indicate that, using the dictionary, we were able to
correct all misspelled words in all cases. However, the comparison also suggested
corrections for some properly spelled words because of their polysemy (see the black
bars in Fig. 5). We therefore conclude that the results of the comparison with the
dictionary contents should always be verified by the user; otherwise, along with
correcting the misspelled words, the procedure may introduce new errors into the texts.
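The verification step recommended above can be sketched as follows. This is only an illustration under stated assumptions: the suggested corrections are given as a mapping from word position to proposed form, and the user's decision is represented by a callback. The names `apply_verified_corrections` and `confirm` are hypothetical and do not come from the described experiments.

```python
def apply_verified_corrections(tokens, suggestions, confirm):
    """Apply only those suggested corrections that the user accepts.

    tokens      - list of words of the text
    suggestions - dict mapping a token index to the proposed replacement
    confirm     - callback (index, old_word, new_word) -> bool, standing in
                  for the user's decision (an assumption of this sketch)
    Returns (corrected_tokens, accepted_count, rejected_count).
    """
    corrected = list(tokens)  # work on a copy, leave the input untouched
    accepted = 0
    rejected = 0
    for index, replacement in suggestions.items():
        if confirm(index, tokens[index], replacement):
            corrected[index] = replacement
            accepted += 1
        else:
            rejected += 1  # suggestion discarded, e.g. a polysemous word
    return corrected, accepted, rejected

# Hypothetical example: "john" is a misspelled name, "rose" is a flower here.
words = ["john", "picked", "a", "rose"]
proposed = {0: "John", 3: "Rose"}
corrected, accepted, rejected = apply_verified_corrections(
    words, proposed, lambda index, old, new: index == 0
)
```

Here the stand-in for the user accepts the correction of "john" and rejects the one for "rose", so the polysemous word is left unchanged and no new error is introduced.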