python - Comparing two columns of a csv and outputting string similarity ratio in another csv

Question

Welcome To Ask or Share your Answers For Others

python - Comparing two columns of a csv and outputting string similarity ratio in another csv

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Comparing two columns of a csv and outputting string similarity ratio in another csv

I am very new to python programming. I am trying to take a csv file that has two columns of string values and want to compare the similarity ratio of the string between both columns. Then I want to take the values and output the ratio in another file.

The csv may look like this:

Column 1|Column 2 
tomato|tomatoe 
potato|potatao 
apple|appel

I want the output file to show for each row, how similar the string in Column 1 is to Column 2. I am using difflib to output the ratio score.

This is the code I have so far:

import csv
import difflib

f = open('test.csv')

csf_f = csv.reader(f)

row_a = []
row_b = []

for row in csf_f:
    row_a.append(row[0])
    row_b.append(row[1])

a = row_a
b = row_b

def similar(a, b):
    return difflib.SequenceMatcher(a, b).ratio()

match_ratio = similar(a, b)

match_list = []
for row in match_ratio:
    match_list.append(row)

with open("output.csv", "wb") as f:
    writer = csv.writer(f, delimiter=',')
    writer.writerows(match_list)

f.close()

I get the error:

Traceback (most recent call last):
  File "comparison.py", line 24, in <module>
    for row in match_ratio:
TypeError: 'float' object is not iterable

I feel like I am not importing the column list correctly and running it against the sequencematcher function.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:25:01+0000

The for loop you're setting up here expects something like an array where you have match_ratio, and judging by the error you're getting, that's not what you have. It looks like you're missing the first argument for difflib.SequenceMatcher, which should probably be None. See 6.3.1 here: https://docs.python.org/3/library/difflib.html

Without that first argument specified, I think you're getting back 0.0 from difflib.SequenceMatcher and then trying to run ratio off of that. Even if you correct your SequenceMatcher call, I think you'll still be trying to iterate on a single float value that ratio is returning. I think you need to call SequenceMatcher inside the loop for each set of values you're comparing.

So you'd wind up with a call more like this in your function: difflib.SequenceMatcher(None, a, b). Or if you'd prefer, since these are named arguments, you could do something like this: difflib.SequenceMatcher(a=a, b=b).

Categories

python - Comparing two columns of a csv and outputting string similarity ratio in another csv

python - Comparing two columns of a csv and outputting string similarity ratio in another csv

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags