I'm trying to do a "database side" bulk copy (i.e. SELECT INTO/INSERT INTO) using linq2db. However, my code is trying to bring the dataset over the wire which is not possible given the size of the DB in question.
My code looks like this:
using (var db = new MyDb()) {
var list = db.SourceTable.
Where(s => s.Year > 2012).
GroupBy(s => new { s.Column1, s.Column2 }).
Select(g => new DestinationTable {
Property1 = 'Constant Value',
Property2 = g.First().Column1,
Property3 = g.First().Column2,
Property4 = g.Count(s => s.Column3 == 'Y')
});
db.Execute("TRUNCATE TABLE DESTINATION_TABLE");
db.BulkCopy(new BulkCopyOptions {
BulkCopyType = BulkCopyType.MultipleRows
}, list);
}
The generated SQL looks like this:
BeforeExecute
-- DBNAME SqlServer.2017
TRUNCATE TABLE DESTINATION_TABLE
DataConnection
Query Execution Time (AfterExecute): 00:00:00.0361209. Records Affected: -1.
DataConnection
BeforeExecute
-- DBNAME SqlServer.2017
DECLARE @take Int -- Int32
SET @take = 1
DECLARE @take_1 Int -- Int32
SET @take_1 = 1
DECLARE @take_2 Int -- Int32
...
SELECT
(
SELECT TOP (@take)
[p].[YEAR]
FROM
[dbo].[SOURCE_TABLE] [p]
WHERE
(([p_16].[YEAR] = [p].[YEAR] OR [p_16].[YEAR] IS NULL AND [p].[YEAR] IS NULL) AND ...
...)
FROM SOURCE_TABLE p_16
WHERE p_16.YEAR > 2012
GROUP BY
...
DataConnection
That is all that is logged as the bulkcopy fails with a timeout, i.e. SqlException "Execution Timeout Expired".
Please note that running this query as an INSERT INTO statement takes less than 1 second directly in the DB.
PS: Anyone have any recommendations as to good code based ETL tools to do large DB (+ 1 TB) ETL. Given the DB size I need things to run in the database and not bring data over the wire. I've tried pyspark, python bonobo, c# etlbox and they all move too much data around. I thought linq2db had potential, i.e. basically just act like a C# to SQL transpiler but it is also trying to move data around.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…