Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
653 views
in Technique[技术] by (71.8m points)

string - Word comparison algorithm

I am doing a CSV Import tool for the project I'm working on. The client needs to be able to enter the data in excel, export them as CSV and upload them to the database. For example I have this CSV record:

   1,   John Doe,     ACME Comapny   (the typo is on purpose)

Of course, the companies are kept in a separate table and linked with a foreign key, so I need to discover the correct company ID before inserting. I plan to do this by comparing the company names in the database with the company names in the CSV. the comparison should return 0 if the strings are exactly the same, and return some value that gets bigger as the strings get more different, but strcmp doesn't cut it here because:

"Acme Company" and "Acme Comapny" should have a very small difference index, but "Acme Company" and "Cmea Mpnyaco" should have a very big difference index Or "Acme Company" and "Acme Comp." should also have a small difference index, even though the character count is different. Also, "Acme Company" and "Company Acme" should return 0.

So if the client makes a type while entering data, i could prompt him to choose the name he most probably wanted to insert.

Is there a known algorithm to do this, or maybe we can invent one :) ?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You might want to check out the Levenshtein Distance algorithm as a starting point. It will rate the "distance" between two words.

This SO thread on implementing a Google-style "Do you mean...?" system may provide some ideas as well.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...