I have a file of paths called test.txt:
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_1_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_2_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001T_1_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001T_2_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002G_1_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002G_2_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002T_1_Clean.fastq.gz
/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002T_2_Clean.fastq.gz
Notice that the number of lines is always even. My final goal is to parse this file and create a new one, looping through these paths two at a time. I am trying the enumerate function, but it does not process the lines two by two. Furthermore, I'm going out of range because the way I'm indexing is wrong. It would also be great if someone could tell me how to index properly with enumerate.
import re

with open('./src/test.txt') as f:
    for index, line in enumerate(f):
        sample = re.search(r'pfg[\dGT]+', line)
        sample_string = sample.group(0)
        #print(sample_string)
        print('{{"name":"{0}","readgroup":"{0}","platform_unit":"{0}","fastq_1":"{1}","fastq_2":"{2}","library":"{0}"}},'.format(sample_string, line, line[index+1]))
The result is something like this:
{"name":"pfg001G","readgroup":"pfg001G","platform_unit":"pfg001G","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_1_Clean.fastq.gz
","fastq_2":"g","library":"pfg001G"},
{"name":"pfg001G","readgroup":"pfg001G","platform_unit":"pfg001G","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_2_Clean.fastq.gz
","fastq_2":"r","library":"pfg001G"},
{"name":"pfg001T","readgroup":"pfg001T","platform_unit":"pfg001T","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001T_1_Clean.fastq.gz
","fastq_2":"o","library":"pfg001T"},
{"name":"pfg001T","readgroup":"pfg001T","platform_unit":"pfg001T","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001T_2_Clean.fastq.gz
","fastq_2":"u","library":"pfg001T"},
{"name":"pfg002G","readgroup":"pfg002G","platform_unit":"pfg002G","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002G_1_Clean.fastq.gz
","fastq_2":"p","library":"pfg002G"},
{"name":"pfg002G","readgroup":"pfg002G","platform_unit":"pfg002G","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002G_2_Clean.fastq.gz
","fastq_2":"s","library":"pfg002G"},
{"name":"pfg002T","readgroup":"pfg002T","platform_unit":"pfg002T","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002T_1_Clean.fastq.gz
","fastq_2":"/","library":"pfg002T"},
{"name":"pfg002T","readgroup":"pfg002T","platform_unit":"pfg002T","fastq_1":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg002T_2_Clean.fastq.gz","fastq_2":"c","library":"pfg002T"},
Clearly the indexing is wrong: it steps through the individual characters of the current path (g, r, etc.) instead of printing the next path. For the first iteration, the next path printed should be: "fastq_2":"/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_2_Clean.fastq.gz".
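To make the symptom concrete, here is a minimal sketch of what that indexing does. It uses the first two paths from the file as an in-memory list (the names lines and seen are only for illustration). Because line is a string, line[index + 1] picks a single character from the current line, whereas indexing a list of lines gives the following line.

```python
# Stand-in for the file contents: the first two paths from test.txt.
lines = [
    "/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_1_Clean.fastq.gz\n",
    "/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/pfg001G_2_Clean.fastq.gz\n",
]

seen = []
for index, line in enumerate(lines):
    # `line` is a str, so this selects one character of the current line.
    char = line[index + 1]
    # Indexing the list instead gives the following line, when there is one.
    next_line = lines[index + 1].strip() if index + 1 < len(lines) else None
    seen.append((char, next_line))
    print(repr(char), next_line)
```

This also suggests where an out-of-range error can come from: index keeps growing with the line count, so on a long enough file index + 1 eventually exceeds the length of a line.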
I believe the problem can be tackled more elegantly with itertools; I just don't know how to do it. It would also be great if someone could tell me whether indexing with enumerate could work as well.
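One possible way to walk the file two lines at a time is sketched below. It uses the built-in zip on a single shared iterator rather than anything from itertools, and it assumes the two reads of a sample always sit on consecutive lines. The helper name paired_records and the io.StringIO stand-in for the real file are hypothetical, added only so the example runs on its own.

```python
import io
import json
import re


def paired_records(handle):
    """Yield one dict per pair of consecutive non-blank lines."""
    # Strip newlines and skip blank lines so they cannot leak into the output.
    paths = (line.strip() for line in handle if line.strip())
    # zip(paths, paths) pulls from the same iterator twice per step,
    # so it yields (line 1, line 2), (line 3, line 4), and so on.
    for fastq_1, fastq_2 in zip(paths, paths):
        match = re.search(r"pfg[\dGT]+", fastq_1)
        if match is None:
            continue  # assumption: skip pairs without a recognisable sample ID
        sample = match.group(0)
        yield {
            "name": sample,
            "readgroup": sample,
            "platform_unit": sample,
            "fastq_1": fastq_1,
            "fastq_2": fastq_2,
            "library": sample,
        }


# Stand-in for open('./src/test.txt'), holding the first four paths.
prefix = "/groups/cgsd/javed/validation_set/LeungSY_Targeted_SS-190528-01a/Clean/"
demo = io.StringIO(
    prefix + "pfg001G_1_Clean.fastq.gz\n"
    + prefix + "pfg001G_2_Clean.fastq.gz\n"
    + prefix + "pfg001T_1_Clean.fastq.gz\n"
    + prefix + "pfg001T_2_Clean.fastq.gz\n"
)

records = list(paired_records(demo))
for record in records:
    # Compact separators keep the output close to the hand-built format string.
    print(json.dumps(record, separators=(",", ":")) + ",")
```

An index-based variant should also work: read everything into a list with f.readlines(), loop with range(0, len(lines), 2), and use lines[i] and lines[i + 1]. With enumerate the same idea would need a list to index into, since the file object itself cannot be subscripted.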
question from:
https://stackoverflow.com/questions/65913089/python-enumerate-out-of-range-when-looping-through-a-file