I'm using Pandas for some data cleanup, and I have a very long regex which I would like to split into multiple lines. The following works fine in Pandas because it is all on one line:
df['REMARKS'] = df['REMARKS'].replace(to_replace=r'(?=[^\])}]*([\[({]|$))(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL)(?:\s*(?:,\s*)?(?:(?:or|and)\s+)?(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL))*', value=r'<\g<0>>', regex=True)
However, the one-line version is difficult to manage. I've tried the following verbose method, which works in regular Python:
df['REMARKS'] = df['REMARKS'].replace(to_replace=r"""(?=[^\])}]*([\[({]|$))
(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL)
(?:\s*(?:,\s*)?(?:(?:or|and)\s+)?
(?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL))*""", value=r'<\g<0>>', regex=True)
This does not work in Pandas, though. Any ideas what I'm missing?
Here is some sample text for testing:
GR, MDT, CMR, HLDS, NEXT, NGI @ 25273, COMPTG
FIT 13.72 ON 9-7/8 LNR, LWD[GR,RES,APWD,SONVIS], MDTS (PRESS & SAMP)
ROT SWC, TSTG BOP
LWD[GR,RES,APWD,SONVIS], GR, RES, NGI, PPC @ 31937, MDTS (PRESS &
SAMP) TKG ROT SWC
LWD[GR,RES] @ 12586, IND, FDC, CNL, GR @ 12586, SWC, RAN CSG, PF
12240-12252, RR (ADDED INFO)
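For reference, here is a minimal sketch of how the multi-line pattern can be checked in plain Python against two of the sample lines. It assumes the `re.VERBOSE` flag is what makes the multi-line layout work outside Pandas. The names `PATTERN`, `samples` and `tag_tools` are only illustrative and are not part of my actual code.

```python
import re

# Illustrative name: the same pattern as above, split over several lines.
# Assumption: re.VERBOSE is passed, so the regex engine ignores the
# newlines and indentation inside the pattern string.
PATTERN = re.compile(
    r"""(?=[^\])}]*([\[({]|$))
    (?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL)
    (?:\s*(?:,\s*)?(?:(?:or|and)\s+)?
    (?:GR|MDT|CMR|HLDS|NEXT|NGI|MDTS|RES|PPC|IND|FDC|CNL))*""",
    re.VERBOSE,
)

# Two lines taken from the sample text.
samples = [
    "GR, MDT, CMR, HLDS, NEXT, NGI @ 25273, COMPTG",
    "LWD[GR,RES] @ 12586, IND, FDC, CNL, GR @ 12586, SWC, RAN CSG, PF",
]


def tag_tools(text):
    """Wrap each run of tool names outside brackets in angle brackets."""
    return PATTERN.sub(r"<\g<0>>", text)


for line in samples:
    print(tag_tools(line))
```

The tool names inside `LWD[...]` are left alone because the lookahead rejects positions that are followed by a closing bracket before any opening one.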
Thanks!