Normally I anonymize my data with hashlib, applying a hash function to the column via .apply(hash).
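For context, the hashing approach could look roughly like the sketch below. The exact hash function isn't shown here, so SHA-256 and the helper name `hash_name` are assumptions for illustration only.

```python
import hashlib

import pandas as pd


def hash_name(name: str) -> str:
    """Hypothetical helper: return the SHA-256 hex digest of a name."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


# Small illustrative Series; identical names produce identical digests
names = pd.Series(["eric", "frank", "eric"])
hashed = names.apply(hash_name)
print(hashed)
```

The digests are consistent per name but long and unreadable, which is part of why a `person1`, `person2` style label can be more convenient.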
Now I'm trying a new approach. Imagine I have the following DataFrame, df:
import pandas as pd
df = pd.DataFrame({'contributor':['eric', 'frank', 'john', 'frank', 'barbara'],
                   'amount payed':[10,28,49,77,31]})
  contributor  amount payed
0        eric            10
1       frank            28
2        john            49
3       frank            77
4     barbara            31
I want to anonymize it by turning all the names into person1, person2, etc., like this:
output = pd.DataFrame({'contributor':['person1', 'person2', 'person3', 'person2', 'person4'],
                       'amount payed':[10,28,49,77,31]})
  contributor  amount payed
0     person1            10
1     person2            28
2     person3            49
3     person2            77
4     person4            31
So my first thought was to summarize the name column so that each name is attached to a unique index, which I can then use as the number after 'person'.
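A minimal sketch of that idea, assuming `pd.factorize` is an acceptable way to build the unique index. It gives each distinct name an integer code in order of first appearance, starting at 0. The variable name `anonymized` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'contributor': ['eric', 'frank', 'john', 'frank', 'barbara'],
                   'amount payed': [10, 28, 49, 77, 31]})

# factorize returns one integer code per row (0, 1, 2, ...) in order of
# first appearance, plus the distinct values those codes refer to
codes, uniques = pd.factorize(df['contributor'])

# Work on a copy so the original names stay available
anonymized = df.copy()

# Shift the codes by one so the labels start at person1 rather than person0
anonymized['contributor'] = ['person' + str(code + 1) for code in codes]
print(anonymized)
```

Because repeated names share a code, both 'frank' rows end up with the same label. Keeping `uniques` around would allow mapping the labels back to names, so it should be discarded if the data really needs to stay anonymous.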