Use Series.str.extractall
to get all the numbers, convert them to integers, and finally sum per duplicated index value:
# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df['text_sum'] = df['text'].str.extractall(r'(\d+)')[0].astype(int).groupby(level=0).sum()
print (df)
   position base                                         text  text_sum
1    458372    A                   19:t|12:cg|7:CG|1:tcag|1:T        40
2    458373    C  21:GCA|3:GCG|3:ATA|2:GCGAA|1:GTA|1:CGAG|1:g        32
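To see why the sum is grouped on the first index level, here is a minimal self-contained sketch. The sample frame is rebuilt from the output above, and the names `matches` and `totals` are illustrative, not from the original answer. `extractall` returns one row per match with a MultiIndex of (original index, match number), so grouping on level 0 collapses the matches back to one value per original row.

```python
import pandas as pd

# Sample data rebuilt from the printed output (index 1 and 2).
df = pd.DataFrame(
    {
        "position": [458372, 458373],
        "base": ["A", "C"],
        "text": [
            "19:t|12:cg|7:CG|1:tcag|1:T",
            "21:GCA|3:GCG|3:ATA|2:GCGAA|1:GTA|1:CGAG|1:g",
        ],
    },
    index=[1, 2],
)

# One row per regex match; the index is (original row label, match number).
matches = df["text"].str.extractall(r"(\d+)")[0].astype(int)

# Collapse the matches back to one total per original row label.
totals = matches.groupby(level=0).sum()

# Assignment aligns on the index, so each row gets its own total.
df["text_sum"] = totals
print(df)
```

Note that the pattern picks up every run of digits in the string, so this approach assumes the parts after each `:` never contain digits.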
Or, if the values can be split by | and the number then taken from before each :, use:
df['text_sum'] = df['text'].apply(lambda x: sum(int(y.split(':')[0]) for y in x.split('|')))
print (df)
   position base                                         text  text_sum
1    458372    A                   19:t|12:cg|7:CG|1:tcag|1:T        40
2    458373    C  21:GCA|3:GCG|3:ATA|2:GCGAA|1:GTA|1:CGAG|1:g        32
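If the column may contain missing or empty cells, the lambda above would raise an error on them. One possible variation, sketched here, moves the logic into a named helper. The name `sum_counts` and the choice to treat missing or empty cells as 0 are assumptions for illustration, not part of the original answer.

```python
import pandas as pd


def sum_counts(cell):
    """Sum the integer before ':' in each '|'-separated item."""
    # Assumption: missing (None/NaN) or empty cells count as zero.
    if not isinstance(cell, str) or not cell:
        return 0
    return sum(int(item.split(":")[0]) for item in cell.split("|"))


s = pd.Series(["19:t|12:cg|7:CG|1:tcag|1:T", None, "3:GCG"])
result = s.apply(sum_counts)
print(result)
```

Unlike the regex approach, this version only reads the number before each `:`, so digits appearing after the colon would not be counted.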