pandas - Anonymizing data / replacing names

Question

asked Oct 24, 2021 in Technique by 深蓝

Normally I anonymize my data using hashlib and the .apply(hash) function.
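For context, the hashing approach mentioned above might look roughly like the following. The question does not show the exact function used, so the helper `hash_name`, the choice of SHA-256, and the truncation to 10 hex characters are all assumptions made for illustration.

```python
import hashlib

import pandas as pd


def hash_name(name: str) -> str:
    """Hypothetical helper: return a short, deterministic digest for a name."""
    # SHA-256 of the UTF-8 bytes, truncated to 10 hex characters for readability.
    return hashlib.sha256(name.encode("utf-8")).hexdigest()[:10]


names = pd.Series(["eric", "frank", "john", "frank", "barbara"])

# The same input always yields the same digest, so both 'frank' rows match.
hashed = names.apply(hash_name)
print(hashed)
```

The digests are stable across runs but not human-friendly, which is presumably why labels like person1, person2 are wanted instead.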

Now I'm trying a new approach. Imagine I have the following DataFrame called 'df':

import pandas as pd
df = pd.DataFrame({'contributor': ['eric', 'frank', 'john', 'frank', 'barbara'],
                   'amount payed': [10, 28, 49, 77, 31]})

  contributor  amount payed
0        eric            10
1       frank            28
2        john            49
3       frank            77
4     barbara            31

I want to anonymize it by turning all the names into person1, person2, etc., like this:

output = pd.DataFrame({'contributor':['person1', 'person2', 'person3', 'person2', 'person4'],
                       'amount payed':[10,28,49,77,31]})

  contributor  amount payed
0     person1            10
1     person2            28
2     person3            49
3     person2            77
4     person4            31

So my first thought was to summarize the name column so that each name is attached to a unique index, and then use that index as the number after 'person'.
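That idea can be sketched directly with a lookup dictionary. This is a minimal illustration, not necessarily the fastest route; the names `unique_names` and `mapping` are introduced here and are not from the question.

```python
import pandas as pd

df = pd.DataFrame({'contributor': ['eric', 'frank', 'john', 'frank', 'barbara'],
                   'amount payed': [10, 28, 49, 77, 31]})

# unique() returns the distinct names in order of first appearance.
unique_names = df['contributor'].unique()

# Number the names from 1 and build the replacement labels.
mapping = {name: f'person{i}' for i, name in enumerate(unique_names, start=1)}

# Replace every name with its label; repeated names get the same label.
df['contributor'] = df['contributor'].map(mapping)
print(df)
```

Keeping `mapping` around has one practical benefit: it is the only record of which label belongs to which real name, so it can be stored separately or discarded depending on whether the anonymization should be reversible.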

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:31:47+0000

I think a faster solution is to use factorize to get an integer code for each unique value, add 1, convert the codes to a Series of strings, and prepend the string 'Person':

df['contributor'] = 'Person' + pd.Series(pd.factorize(df['contributor'])[0] + 1).astype(str)
print(df)
  contributor  amount payed
0     Person1            10
1     Person2            28
2     Person3            49
3     Person2            77
4     Person4            31
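One caveat with the one-liner above: `pd.Series(...)` gets a fresh 0..n-1 index, so the assignment lines up with `df` only when `df` also has the default index. The variant below is a sketch that passes `index=df.index` explicitly and uses a lowercase 'person' prefix to match the output requested in the question. The non-default index values (10, 20, ...) are made up for the demonstration.

```python
import pandas as pd

# Same data as in the question, but with a non-default index (assumed for illustration).
df = pd.DataFrame({'contributor': ['eric', 'frank', 'john', 'frank', 'barbara'],
                   'amount payed': [10, 28, 49, 77, 31]},
                  index=[10, 20, 30, 40, 50])

# factorize returns integer codes (0-based, in order of first appearance)
# and the array of unique values those codes refer to.
codes, uniques = pd.factorize(df['contributor'])

# Reuse df's own index so the new labels align row by row on assignment.
df['contributor'] = 'person' + pd.Series(codes + 1, index=df.index).astype(str)
print(df)
```

Here `uniques` holds the original names in code order, so `uniques[0]` corresponds to person1, and so on.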
