Skip to content

Bulk Insert in SQL server

February 11, 2012

Hi All,

Bulk insert command in SQL server provides a easy and fast way to load the table/Flat file. One of the common process performed in ETL processes using BCP or BULK INSERT to import large amounts of data into a empty database.

The Syntax for the Bulk-Insert is as follows:
 
BULK INSERT
[ database_name . [ schema_name ] . | schema_name . ] [ table_name | view_name ]
FROM ‘data_file’
[ WITH
(
[ [ , ] BATCHSIZE = batch_size ]
[ [ , ] CHECK_CONSTRAINTS ]
[ [ , ] CODEPAGE = { ‘ACP’ | ‘OEM’ | ‘RAW’ | ‘code_page’ } ]
[ [ , ] DATAFILETYPE =
{ ‘char’ | ‘native’| ‘widechar’ | ‘widenative’ } ]
[ [ , ] FIELDTERMINATOR = ‘field_terminator’ ]
[ [ , ] FIRSTROW = first_row ]
[ [ , ] FIRE_TRIGGERS ]
[ [ , ] FORMATFILE = ‘format_file_path’ ]
[ [ , ] KEEPIDENTITY ]
[ [ , ] KEEPNULLS ]
[ [ , ] KILOBYTES_PER_BATCH = kilobytes_per_batch ]
[ [ , ] LASTROW = last_row ]
[ [ , ] MAXERRORS = max_errors ]
[ [ , ] ORDER ( { column [ ASC | DESC ] } [ ,…n ] ) ]
[ [ , ] ROWS_PER_BATCH = rows_per_batch ]
[ [ , ] ROWTERMINATOR = ‘row_terminator’ ]
[ [ , ] TABLOCK ]
[ [ , ] ERRORFILE = ‘file_name’ ]
)]

 
There are a lot of  parameters that can be used but I will describe few of them here that are significant.

Before I start few things that need to remember while practicing BULK INSERT

  • You have rights to execute the BULK INSERT command (System admin,Bulk admin role)
  • The Bulk-Insert command won’t accept a string concatenation or variable in the FROM clause. If you need to do this, it will be possible only with the help of dynamic SQL statement.
  • Since BULK INSERT is dependent on the column position of both the source file and the destination file, it is the best practice to use a view as an abstraction layer between the BULK INSERT command and the table.
  • If the structure of the source file or the destination file is altered, then modifying the view can keep the BULK INSERT running without having to change the other object’s structure.
  • The file is present on the server or a shared drive that can be accessed from the SQL server. This is best if the file is present on the itself as network traffic may slow down the process.

Now, it’s time to discuss some significant parameters in the BULK INSERT:

  • Field Terminator: It specifies the character used to delimit or separate columns in the source file. It can be “,” (comma) or “|” (pipe).
  • Row Terminator: It specifies the character that ends a row in the source file. ‘\n’ means end of row and is the typical setting. Files from mainframe don’t use a clean end of line. So we have to use hex editor to view the actual end of the line characters and specify the row terminator in hex. A hex value of ‘0A’ is quoted as ‘0x0A’.
  • FirstRow: It is useful when specifying whether the incoming file has column headers.
  • TabLock: It places an exclusive lock on the entire table and saves SQL Server the trouble of having to lock the table’s data pages being created.
  • Rows per Batch: It tells SQL Server to insert n number of rows in a single batch, rather than the entire file.
  • Max Errors: It specifies how many rows can fail before the bulk insert fails.

Example:

BULK INSERT model.TEST.BULK_INSERT FROM ‘\\RAHUL-PC\TEST\BULK_INSERT.txt’
WITH (
  ROWTERMINATOR = ‘\n’,
  FIELDTERMINATOR = ‘|’,
  FIRSTROW = 2,
  MAXERRORS = 1,
  ERRORFILE = ‘\\RAHUL-PC\TEST\Err_BULK_INSERT.txt’,
  ROWS_PER_BATCH = 100000,
  ORDER (ID ASC),
  TABLOCK
)

 
Syntax of the BULK INSERT is very clear and easy to understand.The statement starts with the BULK INSERT keyword and then followed by the name of the target table name. Then the FROM clause includes the file name with full path(path may be your local folder path or the network path where the file resides).

The statement also contains optional WITH clause with multiple options that can be altered or removed as per the requirement. The TABLOCK option locks the table while loading it may cause unavailibility of data but enhances the load performance.

In some cases, however, you might want to use a format file when importing your data. In such situations, rather than specifying the formatting details in the WITH clause, you specify the name of the format file, as shown in the following example:

BULK INSERT model.TEST.BULK_INSERT
FROM ‘\\RAHUL-PC\TEST\BULK_INSERT.txt’
WITH
(
FORMATFILE = ‘\\RAHUL-PC\TEST\BULK_INSERT.fmt’
);
 

When you use a format file to import data, you don’t need to know the format of the source data; you need to know only the location of the format file.

Below is an example I used to create a dynamic SQL to perform BULK INSERT. Here the file name is similar to the the table name (like if the table name is Employee then the file name will be like Employee.txt)

 

CREATE PROCEDURE [TEST].[BulkInsert]
   @FILE AS NVARCHAR(128) = ‘Employee’,
   @DRIVE AS NVARCHAR(128) = ‘TEST’,
   @SCHEMA NVARCHAR(128) =’TEST’
AS
BEGIN
SET NOCOUNT ON
DECLARE
    @DBNAME NVARCHAR(100),
    @SERVER_NAME NVARCHAR(100),
    @HOST NVARCHAR(100),
    @ERROR_FILE NVARCHAR(128),
    @TABLE NVARCHAR(128),
    @SQLTRUNC NVARCHAR(4000),
    @SQL NVARCHAR(4000),
    @ROWCNT INT,
    @FLAG INT = 1
    /*** Variable for ORDER BY clause ***/

    ,@CNT INT,
    @MAX_CNT INT = 1,
    @COL_NAME VARCHAR(100),
    @KEY_ORDER VARCHAR(20),
    @ALL_COLS VARCHAR(100)

    SET @DBNAME = (SELECT DB_NAME() )
    SET @HOST = (SELECT HOST_NAME() )
    SET @SERVER_NAME = (SELECT @@SERVERNAME)
    SET @TABLE = @DBNAME+’.’+@SCHEMA+’.’+@FILE

/*** START : Section for creating Dynamic ORDER BY ***/

SELECT idx.object_id AS object_id ,Ix.column_id AS column_id,Ix.is_descending_key
INTO #INDEXED_COLUMNS — DROP TABLE #INDEXED_COLUMNS
FROM sys.indexes idx
INNER JOIN (
SELECT object_id,column_id,index_id,is_descending_key
FROM sys.index_columns
WHERE object_id = OBJECT_ID(N”+@TABLE+”)
    ) Ix
ON idx.object_id = Ix.object_id
AND idx.index_id = Ix.index_id
WHERE type_desc = ‘CLUSTERED’
SELECT name,ix.column_id,is_descending_key
INTO #IDX_NAME — DROP TABLE #IDX_NAME
FROM sys.all_columns cols
INNER JOIN #INDEXED_COLUMNS ix
ON ix.object_id = cols.object_id
AND ix.column_id = cols.column_id

SELECT @CNT = COUNT(1) FROM #IDX_NAME — SELECT * FROM #IDX_NAME

WHILE @MAX_CNT
BEGIN
 SELECT @ALL_COLS = COALESCE(@ALL_COLS+’, ‘ , ”) + name +’ ‘+
   CASE WHEN is_descending_key = 0
     THEN ‘ASC’ ELSE ‘DESC’ END
   FROM #IDX_NAME WHERE COLUMN_ID = @MAX_CNT

    SET @MAX_CNT = @MAX_CNT +1
END

/*** END : Section for creating Dynamic ORDER BY ***/

    SET @SQLTRUNC = ‘TRUNCATE TABLE ‘+@TABLE
    SET @SQL = ‘BULK INSERT ‘+@TABLE+’ FROM ”\\’+@HOST+’\’+@DRIVE+’\’+@FILE+’.txt”’
+’ WITH (
    ROWTERMINATOR = ‘+”’\n”’+’,
    FIELDTERMINATOR = ‘+”’|”’+’,
    FIRSTROW = 2,
    MAXERRORS = 1,
    ERRORFILE = ‘+”’\\’+@HOST+’\’+@DRIVE+’\Err_’+@FILE+’.txt”’+’,
    ROWS_PER_BATCH = 100000,
    ORDER (‘+@ALL_COLS+’),
    TABLOCK
    )’

BEGIN TRY

     EXEC (@SQLTRUNC)

END TRY

BEGIN CATCH

     PRINT ‘Error while truncation’
     SET @FLAG = 0

END CATCH

BEGIN TRY

     EXEC (@SQL)
     SET @ROWCNT = @@ROWCOUNT

     DROP TABLE #IDX_NAME
     DROP TABLE #INDEXED_COLUMNS

END TRY

BEGIN CATCH

     PRINT ‘Error while loading the table ‘
     SET @FLAG = 0

END CATCH

    IF @FLAG = 0
    BEGIN
      RAISERROR(@@ERROR,18,1)
    END

SET NOCOUNT OFF
END
GO

 
That’s all for the BULK INSERT. Details of other parameters can be used can be accessed from msdn site or online books.

Advertisements

From → SQL Server 2008

Leave a Comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: