Monday, August 24, 2015

How do you eliminate special characters from a text source data file

Source File: Comma Delimited CSV file

The file contains special characters that need to be eliminated before loading into the database or performing any further transformations. This can be done with the TEXT QUALIFIER feature of the Flat File Connection Manager.






Change the TEXT QUALIFIER property to #

The data is now cleaned of the special character #.
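As a rough illustration of what a text qualifier does, here is a minimal Python sketch (the sample row and field values are invented; in SSIS the Flat File Connection Manager does this for you):

```python
import csv
import io

# An invented CSV row where each field is wrapped in the qualifier '#'
raw = "#CASE001#,#John#,#Details, with a comma#\n"

# Treating '#' as the text qualifier (quote character) strips it from
# each field and keeps the embedded comma inside the last field
rows = list(csv.reader(io.StringIO(raw), quotechar="#"))
print(rows[0])  # ['CASE001', 'John', 'Details, with a comma']
```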

Package Design which checks the existence of the data file: execute the package only if the file exists; if not found, exit the package

Using the same package from the earlier post:

http://simplyssis.blogspot.com/2015/08/how-to-use-begin-tran-in-ssis-rollback.html

Let's now add a requirement that the package check for the existence of the data file. If the data file exists, the package executes its tasks; if not, it exits.

To check the existence of the data file, we use a SCRIPT TASK and a user variable holding the full name (name & location) of the data file.


The C# code to check for the existence of the file is:

// Succeed if the file named in User::file_name exists; fail otherwise
if (System.IO.File.Exists(Dts.Variables["User::file_name"].Value.ToString()))
    { Dts.TaskResult = (int)ScriptResults.Success; }  // downstream tasks run
else
    { Dts.TaskResult = (int)ScriptResults.Failure; }  // downstream tasks skipped


Then connect the SCRIPT TASK with a precedence constraint on Success to the BEGIN TRAN SQL command task. If the Script Task doesn't find the data file, it fails and the following tasks are not executed.

Also change the DelayValidation property of the transaction container to True. If it is set to False, the entire package will fail at the pre-execute stage and even the Script Task won't execute. The DelayValidation property delays validation until the execution stage.
 

If the Script Task finds the data file, the following tasks are executed.


Sunday, August 23, 2015

How to use "BEGIN TRAN" in SSIS-ROLLBACK on failure/COMMIT on success.

Scenario: Truncate the table before loading the data. However, if the load fails, roll back the truncate; if the load succeeds, commit the transaction.

Data Source: Excel File

OLEDB Destination:
A normal package to truncate case_details and load the data from the source file would look like this.
However, in the event the load fails, the SQL command (the truncate) has still been executed.

Hence a normal package with a SQL COMMAND task & DATA FLOW TASK will not work.

In such a situation, where we need to roll back/commit tasks, BEGIN TRAN & a CONTAINER need to be used.

Resolved Package:

The first task is a SQL COMMAND TASK (BEGIN TRAN) connected to the container.

The transaction container has a precedence constraint on Success to a SQL COMMAND TASK (COMMIT). It has a precedence constraint on Failure to a SQL COMMAND TASK (ROLLBACK).

For the successful execution of the package, set the RetainSameConnection property of the OLE DB Connection Manager to True, so that all the tasks run on the same connection and fall within the same transaction.

On Execution:
If the load is successful, the transaction commits.

If the load fails, the transaction rolls back.
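The three Execute SQL commands can be sketched as plain T-SQL (a sketch assuming SQL Server and the case_details table from the scenario, with the truncate running inside the container; each statement lives in its own SQL COMMAND task and all tasks share one connection):

```sql
-- Task 1 (BEGIN TRAN): open the transaction; the truncate and the
-- data flow inside the container then run under this transaction
BEGIN TRAN;

-- Inside the container, before the data flow:
TRUNCATE TABLE case_details;

-- Task on the container's Success constraint:
COMMIT TRAN;

-- Task on the container's Failure constraint:
ROLLBACK TRAN;
```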



Error "[Lookup [102]] Error: The cache is being written to by Cache Transform, and cannot be read by Lookup" when using Cache Transform


The package fails with the error:
[Lookup [102]] Error: The cache is being written to by Cache Transform, and cannot be read by Lookup.
[Lookup [102]] Error: Failed to prepare the cache for providing data.
[SSIS.Pipeline] Error: Lookup failed the pre-execute phase and returned error code 0xC0010204.


The package fails because the Cache Transform and the Lookup are used in the same data flow. The Cache Transform and the Lookup need to be in different data flows, with a precedence constraint ensuring the Cache Transform data flow runs first, for the package to work.

Resolved Package:

 Data Flow with Cache Transform:
Data Flow with Lookup:



Saturday, August 22, 2015

How To Extract Duplicate Data from a Source Excel/CSV File in SSIS

Scenario: The source Excel/CSV file must have unique case numbers. Since the alphanumeric case numbers are not auto-generated, there can be duplicate case numbers with different details.

Requirement: The SSIS package has to identify the duplicate case numbers and load their details into a separate destination file.





Case 1: Only the duplicate Case_numbers are required




 SSIS Package Transformation: Aggregate



 SSIS Package Transformation: Conditional Split



Destination File with Duplicate Case Numbers
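The Aggregate + Conditional Split logic can be sketched in a few lines of Python (the case numbers are invented; in the package the Aggregate performs the Group By with a Count, and the Conditional Split applies the count > 1 condition):

```python
from collections import Counter

# Invented case numbers standing in for the source file column
case_numbers = ["C001", "C002", "C001", "C003", "C002", "C001"]

# Aggregate: Group By case_number with a Count operation
counts = Counter(case_numbers)

# Conditional Split: keep only case numbers whose count is greater than 1
duplicates = sorted(num for num, cnt in counts.items() if cnt > 1)
print(duplicates)  # ['C001', 'C002']
```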




Case 2: Duplicate Case_numbers with details are required

The SSIS package will be different from the previous one.
Additional data transformations (Multicast, Sort, and Merge Join) will be used.


The Merge Join transformation uses an inner join to extract the details from the master file for the matching rows coming from the Conditional Split transformation.


A Sort transformation is required for both input sources of the Merge Join.
After the package executes, the destination file has all the details for the duplicate case numbers.




The important question here is: why did I use a Merge Join over a Cache Transform & Lookup approach?

A Cache Transform & Lookup approach uses an equi-join and fits cases where, for each input row, there is a single matching row in the lookup/cache reference; it returns the first matching row it finds.
In our case there were multiple rows for the same case numbers. When a Lookup is used, the destination file looks like the one below, which is not the desired output.
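A rough Python sketch of the difference (the rows are invented for illustration): a lookup keeps one row per key, while an inner join returns every matching row.

```python
# Invented detail rows; case number C001 appears twice
details = [
    ("C001", "first filing"),
    ("C001", "amended filing"),
    ("C002", "single filing"),
]
duplicate_keys = ["C001"]

# Lookup-style: one row per key, so only the first match for C001 survives
lookup = {}
for key, detail in details:
    lookup.setdefault(key, detail)
lookup_result = [(k, lookup[k]) for k in duplicate_keys]

# Merge Join-style (inner join): every matching row for C001 is returned
join_result = [(k, d) for k, d in details if k in duplicate_keys]

print(lookup_result)  # [('C001', 'first filing')]
print(join_result)    # [('C001', 'first filing'), ('C001', 'amended filing')]
```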