Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.4k views
in Technique[技术] by (71.8m points)

linq - Most efficient way to compare two generic lists based on id elements contained within nested list (C#)

I have two generic lists of items each containing a list of suppliers and their id's:

List<ExisitingItems>
    List<Suppliers>

List <PotentialMatches>
    List<Suppliers>

Suppliers
    SupplierId
    Name

I need to compare the list of existing items with a list of potential matches based on the nested supplier id and name in each.

I currently compare these two lists with the desired result like so:

  foreach (var potentialMatch in _potentialMatches)
            {
foreach (var supplier in potentialMatch.Suppliers)
                {
                    var match = ExistingItems.Find
                       (e => e.Suppliers.Any
                        (s => s.SupplierId == supplier.SupplierItemId && s.Name == supplier.Name));

//Do stuff with match
                }
             }

However, when processing large amounts of records >500k it is not efficient and performs very slowly.

How can I perform the same type of compare but more efficiently?

question from:https://stackoverflow.com/questions/65939495/most-efficient-way-to-compare-two-generic-lists-based-on-id-elements-contained-w

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Your current algorithm seem to be O(n*m*s*s) where n = number of existing items, m = number number of potential matches and s = average number of suppliers for each existingItem/PotentialMatch. You could reduce the running time to O(n*m*s) by using a hash-set for the matching of suppliers.

A generic version would look like this

public static IEnumerable<(T1, T2)> SetJoin<T1, T2, TKey>(
        IEnumerable<T1> t1s,
        IEnumerable<T2> t2s,
        Func<T1, IEnumerable<TKey>> t1Key,
        Func<T2, IEnumerable<TKey>> t2Key) where TKey : IEquatable<TKey>
    {
        foreach (var t1 in t1s)
        {
            var t1Keys = new HashSet<TKey>(t1Key(t1));
            foreach (var t2 in t2s)
            {
                // t2Key(t2) would be called many times, 
                // might be worth pre-computing it for each t2.
                if (t2Key(t2).Any(t1Keys.Contains))
                {
                    yield return (t1, t2);
                }
            }    
        }
    }

And call it like

SetJoin<ExistingItems, PotentialMatches, int>(
              existingItems, 
              potentialMatches,
              e=> e.Suppliers.Select(s => s.Id),
              p => p.Suppliers.Select(s => s.Id))

Also, while linq result in compact and nice code, it is often faster to write the equivalent logic using regular loops if performance is important.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...