Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
613 views
in Technique[技术] by (71.8m points)

sql - LINQ Joins - Performance

I am curious on how exactly LINQ (not LINQ to SQL) is performing is joins behind the scenes in relation to how Sql Server performs joins.

Sql Server before executing a query, generates an Execution Plan. The Execution Plan is basically an Expression Tree on what it believes is the best way to execute the query. Each node provides information on whether to do a Sort, Scan, Select, Join, ect.

On a 'Join' node in our execution plan, we can see three possible algorithms; Hash Join, Merge Join, and Nested Loops Join. Sql Server will choose which algorithm to for each Join operation based on expected number of rows in Inner and Outer tables, what type of join we are doing (some algorithms don't support all types of joins), whether we need data ordered, and probably many other factors.

Join Algorithms:

Nested Loop Join: Best for small inputs, can be optimized with ordered inner table.

Merge Join: Best for medium to large inputs sorted inputs, or an output that needs to be ordered.

Hash Join: Best for medium to large inputs, can be parallelized to scale linearly.

LINQ Query:

DataTable  firstTable, secondTable;

...

var rows = from firstRow in firstTable.AsEnumerable ()
                join secondRow in secondTable.AsEnumerable ()
                    on firstRow.Field<object> (randomObject.Property)
                    equals secondRow.Field<object> (randomObject.Property)
           select new {firstRow, secondRow};

SQL Query:

SELECT *
FROM firstTable fT
    INNER JOIN secondTable sT ON fT.Property = sT.Property

Sql Server might use a Nested Loop Join if it knows there are a small number of rows from each table, a merge join if it knows one of the tables has an index, and Hash join if it knows there are a lot of rows on either table and neither has an index.

Does Linq choose its algorithm for joins? or does it always use one?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The methods on System.Linq.Enumerable are performed in the order they are issued. There is no query optimizer at play.

Many methods are very lazy, which allows you to not fully enumerate the source by putting .First or .Any or .Take at the end of the query. That is the easiest optimization to be had.

For System.Linq.Enumerable.Join specifically, the docs state that this is a hash join.

The default equality comparer, Default, is used to hash and compare keys.

So examples:

//hash join (n+m) Enumerable.Join
from a in theAs
join b in theBs on a.prop equals b.prop

//nestedloop join (n*m)  Enumerable.SelectMany
from a in theAs
from b in theBs
where a.prop == b.prop

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...