bash - find and replace substrings in a file which match strings in another file

Question

asked Feb 21, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have two txt files. File1 is a TSV with 9 columns.

The following is its first row (SRR6691737.359236/0_14228//11999_12313 is the first column, and everything from Repeat onward is the 9th column):

SRR6691737.359236/0_14228//11999_12313  Censor  repeat  5       264     1169    +       .       Repeat BOVA2 SINE 1 260 9

File2 is a TSV with 9 columns.

The following is its first row (everything from Read onward is the 9th column):

CM011822.1  reefer  discordance 63738705    63738727    .   +   .   Read SRR6691737.359236 11999 12313; Dup 277

File1 contains the read name (SRR6691737.359236), the read length (0_14228), and the coordinates (11999_12313), while file2 contains only the read name and coordinates.

All read names and coordinates in file1 are present in file2, but file2 may also contain the same read names with different coordinates.

File2 also contains read names that are not present in file1.

I want to write a script that finds the read names and coordinates in file2 that match those in file1 and adds the read length from file1 to file2,

i.e. it changes the last column of file2 from:

Read SRR6691737.359236 11999 12313; Dup 277

to:

Read SRR6691737.359236/0_14228//11999_12313; Dup 277

Any help?

asked by Mani Ghani poor Samami (translated from Stack Overflow)

1 Answer

深蓝 · Answer 1 · 2021-02-21T04:09:39+0000

It is unclear what your input files look like.

You write:

I have two txt files: File1 is a tsv with 9 columns.
Following is its first row ( SRR6691737.359236/0_14228//11999_12313 is the first column and after Repeat is the 9th column):
SRR6691737.359236/0_14228//11999_12313 Censor repeat 5 264 1169 + . Repeat BOVA2 SINE 1 260 9

If I try to check the columns (and put them in a 'Column,Value' pair):

Column,Value
1,SRR6691737.359236/0_14228//11999_12313
2,Censor
3,repeat
4,5
5,264
6,1169
7,+
8,.
9,Repeat
10,BOVA2
11,SINE
12,1
13,260
14,9

That seems to be 14 columns, but you specify 9 columns...

Can you edit your question and be clear about this?

i.e. specify it as CSV: SRR6691737.359236/0_14228//11999_12313,Censor,repeat,5,.....
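
The 14-versus-9 count can be checked directly. By default awk splits on any run of whitespace, so the spaces inside the last column are treated as separators; splitting on tabs only keeps that column whole. A minimal sketch, assuming the file really is tab-delimited (count_fields and the temporary sample file are hypothetical names used only for illustration):

```shell
# Count the fields on each line two ways: tab-delimited vs. whitespace-delimited.
# count_fields is a hypothetical helper name, not something from the question.
count_fields() {
    # $1: path of the file to inspect
    awk -F'\t' '{ print "tab-separated fields: " NF }' "$1"
    awk '{ print "whitespace-separated fields: " NF }' "$1"
}

# Rebuild the first row of File1 with real tab characters and inspect it.
sample=$(mktemp)
printf 'SRR6691737.359236/0_14228//11999_12313\tCensor\trepeat\t5\t264\t1169\t+\t.\tRepeat BOVA2 SINE 1 260 9\n' > "$sample"
count_fields "$sample"
```

For this row the first count is 9 and the second is 14, which would mean the file is a 9-column TSV whose last column simply contains spaces.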

Added info, after feedback: file1 contains the following fields (tab-separated):

SRR6691737.359236/0_14228//11999_12313
Censor
repeat
5
264
1169
+
.
Repeat BOVA2 SINE 1 260 9

You want to convert this (using a script) to a tab-separated file:

CM011822.1
reefer
discordance
63738705
63738727
+
.
Read SRR6691737.359236 11999 12313
Dup 277

More info is needed to solve this!

field 1: How/where is the info for 'CM011822.1' coming from?

field 2 and 3: 'reefer'/'discordance'. Is this fixed text? Should these fields always contain these texts, or are there exceptions?

field 4 and 5: Where are these values (63738705; 63738727) coming from?

OK, it's clear that there are more questions to be asked than can be answered here …
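
If the intent is only what the question states, that is, to rewrite column 9 of file2 using the matching first column of file1, a minimal sketch might look like the following. It assumes both files are tab-separated, that column 1 of file1 always has the form name/length//start_end, and that column 9 of file2 always starts with "Read name start end", optionally followed by a "; ..." part. The function name add_read_length is hypothetical.

```shell
# add_read_length FILE1 FILE2
# Prints FILE2 with column 9 rewritten wherever the read name and coordinates
# match an entry in FILE1. Hypothetical helper; assumes tab-separated input.
add_read_length() {
    awk -F'\t' -v OFS='\t' '
        # First file: index column 1 by "name start end".
        FILENAME == ARGV[1] {
            # Column 1 is assumed to look like name/0_14228//11999_12313,
            # which splits on "/" into name, length, an empty part, coordinates.
            n = split($1, part, "/")
            if (n == 4) {
                split(part[4], coord, "_")
                full[part[1] " " coord[1] " " coord[2]] = $1
            }
            next
        }
        # Second file: column 9 is assumed to look like
        # "Read name start end; Dup 277".
        {
            semi = index($9, ";")
            head = (semi > 0) ? substr($9, 1, semi - 1) : $9
            tail = (semi > 0) ? substr($9, semi) : ""
            if (split(head, w, " ") == 4 && w[1] == "Read") {
                key = w[2] " " w[3] " " w[4]
                if (key in full) $9 = "Read " full[key] tail
            }
            print
        }
    ' "$1" "$2"
}

# Usage (file names are placeholders):
# add_read_length file1.tsv file2.tsv > file2_with_length.tsv
```

Lines in file2 whose read name and coordinates have no match in file1 are printed unchanged, which covers both the extra read names and the same read names with different coordinates.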
