Sample package using SSIS 2008 R2 that inserts or updates using batch operations:
Here is a sample package written in SSIS 2008 R2 that illustrates how to perform inserts and updates between two databases using batch operations.
- Using OLE DB Command will slow down the update operations in your package because it does not perform batch operations: every row is updated individually.
The sample uses two databases named Source and Destination. In my example, both databases reside on the same server, but the logic can still be applied to databases residing on different servers and locations.
I created a table named dbo.SourceTable in my source database Source.
CREATE TABLE [dbo].[SourceTable](
[RowNumber] [bigint] NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[ModifiedOn] [datetime] NOT NULL,
[IsActive] [bit] NULL
)
I also created two tables named dbo.DestinationTable and dbo.StagingTable in my destination database Destination.
CREATE TABLE [dbo].[DestinationTable](
[RowNumber] [bigint] NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[ModifiedOn] [datetime] NOT NULL
)
GO
CREATE TABLE [dbo].[StagingTable](
[RowNumber] [bigint] NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[ModifiedOn] [datetime] NOT NULL
)
GO
I inserted about 1.4 million rows into the table dbo.SourceTable with unique values in the RowNumber column. The tables dbo.DestinationTable and dbo.StagingTable were empty to begin with. All the rows in dbo.SourceTable initially have the flag IsActive set to false (0).
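If you want to reproduce the setup, here is a minimal sketch of one way to seed dbo.SourceTable with unique RowNumber values (the cross join source and the GETDATE() defaults are my assumptions; this is not necessarily how the original test data was generated):
;WITH Numbers AS
(
    -- Generate 1,400,000 sequential numbers from a large cross join
    SELECT TOP (1400000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNumber
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.SourceTable (RowNumber, CreatedOn, ModifiedOn, IsActive)
SELECT RowNumber, GETDATE(), GETDATE(), 0
FROM Numbers;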
I created an SSIS package with two OLE DB connection managers, one connecting to the Source database and the other to the Destination database. I designed the Control Flow as shown below:
The first Execute SQL Task executes the statement TRUNCATE TABLE dbo.StagingTable against the destination database to truncate the staging table.
The next section explains how the Data Flow Task is configured.
The second Execute SQL Task executes the SQL statement given below, which updates data in dbo.DestinationTable using the data available in dbo.StagingTable, assuming that there is a unique key that matches rows between those two tables. In this case, the unique key is the column RowNumber.
Script to update:
UPDATE D
SET D.CreatedOn = S.CreatedOn
, D.ModifiedOn = S.ModifiedOn
FROM dbo.DestinationTable D
INNER JOIN dbo.StagingTable S
ON D.RowNumber = S.RowNumber
I have designed the Data Flow Task as shown below.
The OLE DB Source reads data from dbo.SourceTable using the SQL command SELECT RowNumber, CreatedOn, ModifiedOn FROM Source.dbo.SourceTable WHERE IsActive = 1
The Lookup transformation is used to check whether the RowNumber value already exists in the table dbo.DestinationTable.
If the record does not exist, it is redirected to the OLE DB Destination named Insert into destination table, which inserts the row into dbo.DestinationTable.
If the record exists, it is redirected to the OLE DB Destination named Insert into staging table, which inserts the row into dbo.StagingTable. The data in the staging table is used by the second Execute SQL Task to perform the batch update, as shown in the script above.
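A minimal sketch of the query you could use as the reference dataset in the Lookup transformation is shown below (this assumes the lookup only needs the RowNumber column to detect matches; the original configuration is not shown):
SELECT RowNumber FROM dbo.DestinationTable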
To activate a few rows for the OLE DB Source, I ran the query below:
UPDATE dbo.SourceTable
SET IsActive = 1
WHERE (RowNumber % 9 = 1)
OR (RowNumber % 9 = 2)
The first execution of the package looked as shown below. All the rows were directed to the destination table because it was empty. The execution of the package on my machine took about 3 seconds.
I then ran a row count query to find the row counts in all three tables.
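The exact row count query is not shown here; a minimal sketch of what it could look like is:
SELECT 'dbo.SourceTable' AS TableName, COUNT(*) AS [RowCount] FROM Source.dbo.SourceTable
UNION ALL
SELECT 'dbo.DestinationTable', COUNT(*) FROM Destination.dbo.DestinationTable
UNION ALL
SELECT 'dbo.StagingTable', COUNT(*) FROM Destination.dbo.StagingTable;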
To activate a few more rows for the OLE DB Source, I ran the query below:
UPDATE dbo.SourceTable
SET IsActive = 1
WHERE (RowNumber % 9 = 3)
OR (RowNumber % 9 = 5)
OR (RowNumber % 9 = 6)
OR (RowNumber % 9 = 7)
The second execution of the package looked as shown below. 314,268 rows that were previously inserted during the first execution were redirected to the staging table. 628,766 new rows were inserted directly into the destination table. The execution of the package on my machine took about 12 seconds. The 314,268 rows in the destination table were then updated by the second Execute SQL Task using the data in the staging table.
I ran the row count query again to find the row counts in all three tables.
I hope that gives you an idea of how to implement your solution.