I am trying to "append" new information to a CSV file. The problem is that the information is not in a dataframe structure; it is extracted from a text using regular expressions. The sample text is the following:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Etiam id diam
posuere, eleifend diam at, condimentum justo. Pellentesque mollis a
diam id consequat.
TITLE-SDFSD-DFDS-SFDS-01-01: This is the title 1 that
is split into two lines with a blank line in the middle
Conditions Pellentesque blandit scelerisque pellentesque. Sed nec quam
purus. Quisque nec tellus sed neque accumsan lacinia sit amet sit amet
tellus. Etiam venenatis nibh vel pellentesque elementum. Nullam eget
tortor quam. Morbi sed leo et arcu aliquet luctus.
Opening date 15 Apr 2021
Deadline 26 Aug 2021
Indicative budget: The total indicative budget for the topic is EUR
20.00 million.
TITLE-SDFSD-DFDS-SFDS-01-02; This is the title2 in one single line
Conditions Cras egestas consectetur sapien at dignissim. Maecenas
commodo purus nibh, a tempus augue vestibulum feugiat. Vestibulum
dolor neque, sagittis ut tortor et, lobortis faucibus quam.
Opening date 15 March 2021
Deadline 17 Aug 2021
Indicative budget: The total indicative budget for the topic is EUR
15.00 million.
TITLE-SDFSD-DFDS-SFDS-01-03: This is the title3 that is too long and takes
two lines
Conditions Cras egestas consectetur sapien at dignissim. Maecenas
commodo purus nibh, a tempus augue vestibulum feugiat. Vestibulum
dolor neque, sagittis ut tortor et, lobortis faucibus quam.
Opening date 15 May 2021
Deadline 26 Sep 2021
Indicative budget: The total indicative budget for the topic is EUR
5.00 million.
To extract all the information, I have to make several iterations over the text. I know it is possible to do it in a single iteration by splitting what I need into several groups, but it is very hard for me to find a single regular expression that works. Instead, I am using several of them:
import re
import csv
with open('doubt2.txt', 'r', encoding="utf-8") as f:
    f_contents = f.read()
regexHOR = r'\n(TITLE-\S+-\d{2}-\d{2})[:|;](.*?)^Conditions'
regexOD = r'^Opening date\s+(\d{1,2} \w+ \d{4})\s*?'
regexDL = r'^Deadline\s+(\d+ \w+ \d+)'
patternHOR = re.compile(regexHOR, re.MULTILINE | re.DOTALL)
patternOD = re.compile(regexOD, re.MULTILINE | re.DOTALL)
patternDL = re.compile(regexDL, re.MULTILINE | re.DOTALL)
matchesHOR = patternHOR.finditer(f_contents)
matchesOD = patternOD.finditer(f_contents)
matchesDL = patternDL.finditer(f_contents)
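
For comparison, the single-iteration idea mentioned above might look roughly like the sketch below. It is only a sketch under stated assumptions: the names `topic_pattern` and `extract_topics` are made up for illustration, the `sample` string is a shortened stand-in for the text above, and every topic is assumed to contain a `Conditions`, an `Opening date`, and a `Deadline` line in that order.

```python
import re

# Shortened stand-in for the sample text above (two topics only).
sample = """Intro text.
TITLE-SDFSD-DFDS-SFDS-01-01: This is the title 1 that
is split into two lines
Conditions Pellentesque blandit.
Opening date 15 Apr 2021
Deadline 26 Aug 2021
Indicative budget: EUR 20.00 million.
TITLE-SDFSD-DFDS-SFDS-01-02; This is the title2 in one single line
Conditions Cras egestas.
Opening date 15 March 2021
Deadline 17 Aug 2021
"""

# Hypothetical single pattern with four named groups.
# MULTILINE makes ^ match at each line start; DOTALL lets the lazy .*?
# cross line breaks between the fields of one topic.
topic_pattern = re.compile(
    r'^(?P<topic_id>TITLE-\S+-\d{2}-\d{2})[:;]\s*(?P<title>.*?)\s*'
    r'^Conditions.*?'
    r'^Opening date\s+(?P<opening>\d{1,2} \w+ \d{4}).*?'
    r'^Deadline\s+(?P<deadline>\d{1,2} \w+ \d{4})',
    re.MULTILINE | re.DOTALL,
)


def extract_topics(text):
    """Return one [topic_id, title, opening, deadline] list per topic."""
    rows = []
    for m in topic_pattern.finditer(text):
        # Collapse line breaks and repeated spaces inside a wrapped title.
        title = ' '.join(m.group('title').split())
        rows.append([m.group('topic_id'), title,
                     m.group('opening'), m.group('deadline')])
    return rows
```

Each returned list could then be passed to `csv.writer.writerow`, which would put one topic on one row. If a topic were missing one of the expected lines, the lazy `.*?` would run on into the next topic, so this is not robust as written.
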
matchesHOR yields matches with two groups, whereas the other patterns have just one group each. Once I have the matches, I export them to a CSV file with the following code:
with open("result.csv", "w", newline='') as outfile:
    csvfile = csv.writer(outfile)
    csvfile.writerow(['Topic ID', 'Title', 'Opening date', 'Deadline'])
    for match in matchesHOR:
        csvfile.writerow([match.group(1), match.group(2).replace('\n', ' '), '', ''])
    for match in matchesOD:
        csvfile.writerow(['', '', match.group(1), ''])
    for match in matchesDL:
        csvfile.writerow(['', '', '', match.group(1)])
The problem is that the rows I write after the matchesHOR rows are placed below them instead of alongside them, as you can see in this table: