This is my first question, so first of all I want to say that I appreciate the information from the Stack Overflow community; I have found many solutions to my problems here so far. Please excuse my limited skills: I just started learning Python and am by no means a professional programmer.

The problem is as follows: I imported a DataFrame from an Excel file with several hundred rows and columns of multiline strings, which I now need to clean up.
Input file:

| From | To |
| ----------- | -- |
| /7<br>/8<br>123.456.7<br>A-300A/A<br>TEXTA1<br>/8<br>/7<br>A123.456.7<br>AB-6005B/7A<br>/7 | 123.456.7<br>B-300B/8<br>TEXTB2<br>/8<br>/7<br>B123.456.7<br>B-6005B/7<br>/7 |

This is what the input looks like.
Normally each entry follows one of two patterns: `(.*)\.\d{3}\.\d` (e.g. `A123.345.6` or just `123.345.6`) or `[A-Z]{1,2}-\d*[A-Z]{0,2}/(.{0,2})` (e.g. `A-300A/A` or `AB-6005B/7A`). I managed to clean up the data in column `['To']` up to a certain point (removing `TEXTA1`, whitespace, etc.), until I reached this:
[Screenshot: multiline string cell]
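As a quick sanity check, the two patterns can be tested line by line with Python's `re.fullmatch`. This is a minimal sketch that assumes the backslashes in the patterns were lost in formatting (i.e. `\d` and `\.` were intended); `is_valid` is a hypothetical helper name.

```python
import re

# Assumption: the two patterns from the question, with the backslashes
# that were lost in formatting restored.
CODE_PATTERN = re.compile(r"(.*)\.\d{3}\.\d")                     # 123.456.7, A123.456.7
PART_PATTERN = re.compile(r"[A-Z]{1,2}-\d*[A-Z]{0,2}/(.{0,2})")   # A-300A/A, AB-6005B/7A


def is_valid(line):
    """Return True if the entire line matches either pattern."""
    return bool(CODE_PATTERN.fullmatch(line) or PART_PATTERN.fullmatch(line))


for line in ["/7", "123.456.7", "A-300A/A", "TEXTA1", "A123.456.7", "AB-6005B/7A"]:
    print(f"{line:<12} {is_valid(line)}")
```

`fullmatch` is used rather than `search` so that a line only counts as valid when the pattern covers it from start to end.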
I used `df['To'] = df['To'].str.replace(r'/\d$', '', regex=True, flags=re.MULTILINE)`, both with and without the `MULTILINE` flag, but it either replaces only the last 7 or every line that ends in a slash and a digit. The result should look like this instead:
Output file:

| From | To |
| ----------- | -- |
| 123.456.7<br>A-300A/A<br>A123.456.7<br>A-6005B/7 | 123.456.7<br>B-300B/B<br>B123.456.7<br>B-6005B/7 |

This is what the output should look like.
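For reference, the behavior of the `str.replace` attempt described earlier can be reproduced on a small hypothetical cell. Without `re.MULTILINE`, `$` only matches at the end of the whole string; with it, `$` matches at the end of every line, which is why a legitimate entry such as `B-6005B/7` loses its suffix too.

```python
import re

import pandas as pd

# Hypothetical single cell, similar to the 'To' column in the question.
s = pd.Series(["123.456.7\nB-6005B/7\n/7"])

# Without re.MULTILINE, `$` matches only at the end of the string,
# so only the trailing "/7" is removed.
without_flag = s.str.replace(r"/\d$", "", regex=True)

# With re.MULTILINE, `$` matches at the end of every line, so the "/7"
# belonging to B-6005B/7 is stripped as well, and the lone "/7" line
# is left behind as an empty line.
with_flag = s.str.replace(r"/\d$", "", regex=True, flags=re.MULTILINE)

print(repr(without_flag[0]))
print(repr(with_flag[0]))
```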
Is there a way to simply delete every line that doesn't match either of the two patterns above, instead of replacing each wrong entry?
It feels too complicated to fix every mistake individually instead of just keeping the correct lines.
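One way to do this, sketched under the assumption that the lines in each cell are separated by `\n`: combine both patterns into one anchored regex, compile it with `re.MULTILINE`, and keep only the lines it matches in full. The sample DataFrame and the `keep_valid_lines` helper are hypothetical. Matching lines are kept verbatim, so for example `AB-6005B/7A` stays as it appears in the input.

```python
import re

import pandas as pd

# Hypothetical sample data mirroring the question's input; each cell is a
# multiline string whose lines are separated by "\n" (assumption).
df = pd.DataFrame({
    "From": ["/7\n/8\n123.456.7\nA-300A/A\nTEXTA1\n/8\n/7\nA123.456.7\nAB-6005B/7A\n/7"],
    "To": ["123.456.7\nB-300B/8\nTEXTB2\n/8\n/7\nB123.456.7\nB-6005B/7\n/7"],
})

# A line is kept only if it matches one of the two patterns from start to end.
# The groups are non-capturing so findall returns whole lines, not group contents.
VALID_LINE = re.compile(
    r"^(?:.*\.\d{3}\.\d|[A-Z]{1,2}-\d*[A-Z]{0,2}/.{0,2})$",
    flags=re.MULTILINE,
)


def keep_valid_lines(cell):
    """Return only the lines of a multiline cell that match a valid pattern."""
    if not isinstance(cell, str):
        return cell  # leave NaN / non-string cells untouched
    return "\n".join(VALID_LINE.findall(cell))


for col in ["From", "To"]:
    df[col] = df[col].map(keep_valid_lines)

print(df.loc[0, "From"])
print(df.loc[0, "To"])
```

Because `.` does not match a newline, each alternative stays within a single line, and the `^...$` anchors (which `re.MULTILINE` applies per line) ensure a line is kept only when it matches completely. This selects the correct lines rather than trying to enumerate every kind of wrong entry.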