Handling Missing Values in Data Mining
4.2 Discovering the Disguise
Assuming that disguised missing data is present in a dataset, the most important question is how to detect it. If the data is disguised well enough, even domain knowledge or the best known methods may fail to detect it. The general approach is to identify abnormal values or patterns in the dataset, with the help of domain knowledge or other methods, and to try to distinguish real data from disguised data. The basic step is to identify suspicious values that look factual but are actually fake or false. With background knowledge, and domain knowledge where it is available, a preliminary analysis of the data can establish the range of valid values for each attribute. Once these ranges are known, the data can be examined for values that fall outside them, which are candidates for disguised missing values. Partial domain knowledge can also help expose disguised missing data. For example, even without knowing the lower or upper bounds of the data, we can conclude that a variable such as age can never be negative. Detecting outliers can sometimes, but not always, help uncover disguised missing data. If the values chosen to encode missing data lie far enough outside the range of the nominal data to appear as outliers, standard outlier detection techniques can be applied to find them [3].
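The two checks described above, a range check based on partial domain knowledge and a standard outlier test, can be sketched as follows. This is only an illustration under stated assumptions and not a method taken from the paper or from [3]. The helper names `out_of_range` and `mad_outliers`, the sample `ages` list, and the choice of a modified z-score based on the median absolute deviation (with the conventional 0.6745 constant and 3.5 cutoff) are all assumptions made for the example.

```python
from statistics import median


def out_of_range(values, lower=None, upper=None):
    """Return indices of values outside [lower, upper].

    A bound of None is not checked, which models partial domain
    knowledge (for example, knowing only that age cannot be negative).
    """
    flagged = []
    for i, v in enumerate(values):
        below = lower is not None and v < lower
        above = upper is not None and v > upper
        if below or above:
            flagged.append(i)
    return flagged


def mad_outliers(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a few extreme codes do not distort the scale.
    The 0.6745 constant and 3.5 cutoff are conventional choices,
    assumed here for illustration.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against; flag nothing
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical age column: -1 and 999 stand in for disguised missing values.
ages = [23, 35, 41, -1, 29, 52, 999, 38, 47, 31]

suspicious_by_range = out_of_range(ages, lower=0)   # [3]
suspicious_by_outlier = mad_outliers(ages)          # [6]
```

In this sample the outlier test flags 999 but not -1, because -1 lies too close to the genuine ages to stand out statistically. The range check catches -1 using only the knowledge that age cannot be negative. This matches the point that outlier detection helps sometimes but not always.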
Data Cleaning and Preparation Term Paper Submitted by: Bhavik Doshi