machine learning - How should zero standard deviation in one of the features be handled in a multivariate Gaussian distribution?
I am using a multivariate Gaussian distribution to detect abnormalities. This is how my training set looks:
19-04-16 05:30:31 1 0 0 377816 305172 5567044 0 0 0 14 62 75 0 0 100 0 0
<date>   <time>   <--------------------- features --------------------->
Let's say one of the above features does not change and always remains zero.
I calculate the mean, mu, as:
mu = mean(x)'
I calculate the variance, sigma2, as:
sigma2 = ((1/m) * (sum((x - mu') .^ 2)))'
The probability of each individual feature in each data set is calculated using the standard Gaussian formula:
p(x; mu, sigma2) = (1 / sqrt(2 * pi * sigma2)) * exp(-(x - mu)^2 / (2 * sigma2))
For a particular feature, if all the values come out as zero, then the mean (mu) is also zero, and consequently sigma2 is zero as well. So when I calculate the probability through the Gaussian distribution, I get a "divide by zero" problem.
However, in the test sets this feature's value can fluctuate, and I would term that an abnormality. How should this be handled? I don't want to ignore such a feature.
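To make the failure concrete, here is a minimal Octave sketch of the situation described above. The toy matrix and the `xtest` row are made-up values, not the data from the question, and `p` is just an illustrative name for the per-feature densities.

```octave
% Toy training set (made-up numbers): 4 examples, 3 features.
% The second feature is constant zero, like the feature in the question.
x = [14 0 62;
     15 0 60;
     13 0 65;
     14 0 61];
m = size(x, 1);

mu = mean(x)';
sigma2 = ((1/m) * (sum((x - mu') .^ 2)))';   % sigma2(2) is exactly 0

% Per-feature Gaussian density for a test example whose second feature moved.
xtest = [14 3 62];
p = (1 ./ sqrt(2 * pi * sigma2')) .* exp(-((xtest - mu') .^ 2) ./ (2 * sigma2'));

disp(sigma2');
disp(p);   % p(2) is NaN, because (1/0) * exp(-9/0) evaluates to Inf * 0
```

Octave does not raise an error on the division here. It quietly produces NaN for that feature, which then poisons any product taken over the per-feature densities.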
So, this problem occurs every time you have a variable that is constant. Approximating it with a normal distribution makes absolutely no sense. All the information about such a variable is contained in a single value, and that is the intuition for why the division-by-zero phenomenon occurs.
In the case where you know that there are fluctuations in the variable which were not observed in the training set, you can set the variance of such a variable to be no less than some small value. You may apply the function max(variance(x), eps)
instead of the classic variance definition. Then you can be sure that no division by zero occurs.
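A minimal Octave sketch of that variance floor might look like the following. The toy data, the name `min_var`, its value of 1e-6 and the helper `gauss` are all assumptions made for illustration. Note also that Octave's built-in `eps` is machine epsilon, so a floor chosen to match the scale of your data may be more useful than `eps` itself.

```octave
% Toy training set (made-up numbers); the second feature is constant zero.
x = [14 0 62;
     15 0 60;
     13 0 65;
     14 0 61];
m = size(x, 1);

mu = mean(x)';
sigma2 = ((1/m) * (sum((x - mu') .^ 2)))';

min_var = 1e-6;                  % hypothetical floor; tune it to your data scale
sigma2 = max(sigma2, min_var);   % element-wise, so no feature keeps zero variance

% Per-feature Gaussian density for a 1 x n row vector v.
gauss = @(v) (1 ./ sqrt(2 * pi * sigma2')) .* exp(-((v - mu') .^ 2) ./ (2 * sigma2'));

p_normal = gauss([14 0 62]);   % constant feature at its training value
p_odd    = gauss([14 3 62]);   % constant feature has moved

disp(p_normal(2));   % large but finite density
disp(p_odd(2));      % underflows to 0, so this example stands out as abnormal
```

The size of the floor controls how sensitive the feature is. A deviation much larger than sqrt(min_var) drives the density towards zero, while a larger floor tolerates small fluctuations.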