Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
595 views
in Technique[技术] by (71.8m points)

python - Is there function that can remove the outliers?

I find a function to detect outliers from columns but I do not know how to remove the outliers

is there a function for excluding or removing outliers from the columns

Here is the function to detect the outlier but I need help in a function to remove the outliers

import numpy as np
import pandas as pd
outliers=[]
def detect_outlier(data_1):

    threshold=3
    mean_1 = np.mean(data_1)
    std_1 =np.std(data_1)


    for y in data_1:
        z_score= (y - mean_1)/std_1 
        if np.abs(z_score) > threshold:
            outliers.append(y)
    return outliers

Here the printing outliers

#printing the outlier 
outlier_datapoints = detect_outlier(df['Pre_TOTAL_PURCHASE_ADJ'])
print(outlier_datapoints)
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

An easy solution would be to use scipy.stats.zscore

from scipy.stats import zscore
# calculates z-score values
df["zscore"] = zscore(df["Pre_TOTAL_PURCHASE_ADJ"]) 

# creates `is_outlier` column with either True or False values, 
# so that you could filter your dataframe accordingly
df["is_outlier"] = df["zscore"].apply(lambda x: x <= -1.96 or x >= 1.96)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...