Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Can I deal with outliers in my data by applying tanh(x)?

I am working with financial data and cannot assume a Gaussian distribution, so I normalize my data by subtracting the median and dividing by the interquartile range. This puts 95% of the data into the range [-2, 2]. The rest are a bunch of crazy outliers, with values such as -8, 28, 47, etc.
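A minimal sketch of the normalization described above, using only the Python standard library. The function name `robust_scale`, the sample values, and the choice of the "inclusive" quantile method are assumptions made for illustration; the original post does not say how the quartiles are computed.

```python
import statistics

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    med = statistics.median(values)
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; cannot scale")
    return [(v - med) / iqr for v in values]

# Hypothetical sample: nine ordinary values and one extreme outlier.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 500.0]
scaled = robust_scale(sample)
```

Because the median and IQR are barely affected by the single extreme value, the ordinary points land near zero while the outlier stays far away from them after scaling.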

But I still don't want to throw the outliers away. So I apply tanh(x) to my entire normalized time series. The majority of the data, which lie in the range [-2, 2], are now mapped to roughly [-0.96, 0.96]; the crazy outliers are saturated close to -1 and 1; and the really crazy ones are all mapped to precisely -1 and 1 (within floating-point precision). Order is kept throughout the process, because tanh(x) is a monotonic function. And the machine learning algorithm doesn't have to waste time and energy on numbers that have much larger absolute values than the others. The extreme outliers are now all in two groups, -1 and 1.
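The squashing step can be sketched as follows. The list `normalized` is an illustrative stand-in that reuses the outlier values mentioned in the question; it is not the real data. The comments describe behavior under 64-bit floats, which is what Python's `math.tanh` uses.

```python
import math

# Illustrative normalized values: typical ones plus the outliers from the question.
normalized = [-8.0, -2.0, -0.5, 0.0, 0.5, 2.0, 28.0, 47.0]
squashed = [math.tanh(x) for x in normalized]

# tanh is monotonic, so the ordering is preserved; ties can only
# appear where several inputs saturate to the same value.
is_non_decreasing = all(a <= b for a, b in zip(squashed, squashed[1:]))

# For large enough inputs the result rounds to exactly 1.0,
# so 28 and 47 can no longer be told apart after the transform.
saturated_tie = squashed[-1] == squashed[-2] == 1.0

# -8 is close to -1 but not exactly -1; it remains distinguishable.
near_but_not_saturated = -1.0 < squashed[0] < -0.999
```

One consequence worth noting: the ordering among the saturated values is lost, since they all become equal, while everything below the saturation point keeps its rank.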

By the way, the tanh compression doesn't destroy too many unique values. That is, close values are not collapsed to the same value by tanh. I get almost exactly the same number of unique values in my time series after the tanh as before.
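One way to check the unique-value claim is to count distinct values before and after the transform. The series below is synthetic (a seeded Gaussian sample plus four hand-picked extremes), so it is only a hypothetical stand-in for the real normalized data.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical normalized series: mostly small values plus a few extremes.
series = [random.gauss(0.0, 1.0) for _ in range(1000)]
series += [25.0, 28.0, 47.0, -30.0]

squashed = [math.tanh(x) for x in series]

unique_before = len(set(series))
unique_after = len(set(squashed))
lost = unique_before - unique_after  # values merged by saturation
```

In this sketch the only values that merge are the saturated extremes (25, 28, and 47 all become 1.0), which matches the observation that the count of unique values barely changes.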

The data will be fed into a neural network, a random forest, and gradient-boosted decision trees. (Even though decision trees don't care too much about outliers, I still want to force all indicators into the same range [-1, 1].)

What are the bad consequences of my approach, compared to just throwing the outliers away? What am I missing?

question from:https://stackoverflow.com/questions/66052101/can-i-deal-with-outliers-in-my-data-by-applying-tanhx
