Saturday, March 19, 2016

How to delete Duplicate records from a SQL Server table with Identity Column


Scenario: A package with a Data Flow task runs multiple times and inserts duplicate records into the destination table. The destination table is a transactional table with a numeric identity column, transaction_id, assigned to each record inserted. Although the records are duplicates, they are still unique because of the identity property of the transaction_id column.
 
The following is a reliable way to delete the duplicate records while preserving the identity values of the first load.

The package loads data into the destination table TBL_Transaction_Details as below.

After the package execution, the data in the table looks like:

The data from transaction_id 6 onwards is duplicated and needs to be deleted.

Using a CTE with ROW_NUMBER() partitioned by the duplicated column values and ordered by the transaction_id will remove the duplicate transactions while preserving the IDs of the first load.

WITH CTE_Transaction_Details AS
(
SELECT [Customer_NAME]
      ,[Customer_address]
      ,[Transaction_amount]
      ,[Transaction_discount]
      ,[discount_percentage]
      ,[New_customer]
      ,[Transaction_Fees]
      ,ROW_NUMBER() OVER(PARTITION BY [Customer_NAME]
      ,[Customer_address]
      ,[Transaction_amount]
      ,[Transaction_discount]
      ,[discount_percentage]
      ,[New_customer]
      ,[Transaction_Fees] ORDER BY transaction_id) AS Count_duplicate
  FROM [dbo].[TBL_Transaction_Details]
)
DELETE FROM CTE_Transaction_Details
WHERE Count_duplicate > 1;
GO

When we delete from a CTE, SQL Server deletes the rows from the underlying table, since the CTE is simply a named result set over [dbo].[TBL_Transaction_Details].

After the above code is executed, the table is cleaned of the duplicate data.
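As a quick sanity check after the delete, the following sketch (grouping by the same columns used in the CTE's PARTITION BY) should return no rows:

```sql
-- Any row returned here is a duplicate key that survived the cleanup
SELECT [Customer_NAME], [Customer_address], [Transaction_amount],
       [Transaction_discount], [discount_percentage], [New_customer],
       [Transaction_Fees], COUNT(*) AS occurrences
FROM [dbo].[TBL_Transaction_Details]
GROUP BY [Customer_NAME], [Customer_address], [Transaction_amount],
         [Transaction_discount], [discount_percentage], [New_customer],
         [Transaction_Fees]
HAVING COUNT(*) > 1;
```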




Friday, March 4, 2016

Error SSIS Variable (Cannot assign value to variable): The result of the expression cannot be written to the property. The expression was evaluated, but cannot be set on the property.

I recently encountered an error while dynamically returning the value of a connection string in a parent package used to execute child packages in a Foreach Loop. The parent (master) package retrieves the names of the child packages from a database table and passes each value to a connection-string variable (User::packagename) in the Foreach Loop container; each iteration of the loop supplies the package name for the Execute Package task inside the container.

However, the package failed at runtime with the error "The result of the expression "@variablename" on property cannot be written to the property. The expression was evaluated, but cannot be set on the property."

When a Script Task was used to debug the issue, it correctly displayed all the package names in the loop, which proved that the value was being written to the variable correctly. However, when the Script Task was replaced by the Execute Package task, the package failed with the above error. After much troubleshooting, I found that the variable User::packagename must be initialised with the connection string of an existing package connection manager in the parent package before execution. Once that was done, the parent package worked perfectly, executing the child packages with the dynamic connection-string variable.

Saturday, September 26, 2015

How to use Merge/Merge Join Transformation Without a SORT Transformation.

Merge/Merge Join transformations need to be preceded by sorted input, usually a Sort transformation. However, it is good SSIS practice to sort the data at the source instead, because Sort transformations can slow down the ETL process when dealing with large volumes of data.

Assuming that the data source is an OLE DB Source, a simple ORDER BY clause in the SQL statement of the source will sort the data at the source itself.
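As a minimal sketch (the table and column names here are assumptions; the post's screenshots sort on emp_id), the source query could be:

```sql
-- Sorting at the source: the OLE DB Source query replaces the Sort transformation
SELECT emp_id, emp_name, emp_location
FROM dbo.Emp_Details      -- hypothetical table name
ORDER BY emp_id;          -- sort column must match the configured SortKeyPosition
```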

In order to bypass the Sort transformation, two properties need to be configured on the source:
IsSorted
SortKeyPosition

These two properties are configured through the Advanced Editor of the source.




Once the IsSorted property is set to True, configure the SortKeyPosition. To configure the SortKeyPosition, go to the Output Columns.

Select the column on which the data is sorted in the source and set the SortKeyPosition to 1.


This configuration tells the data flow engine that the data has already been sorted by emp_id at the source.






How to use a Cache Transform with Excel/CSV Source File

In order to compare the data from two Excel files, a Lookup transformation is used. Add a Data Flow task with an Excel source.



Configure the Cache Transform for the Excel source.



Set the index position of the source column to be looked up to 1.

Configure the column mappings between the source Excel file and the cache.


The Cache connection manager is now configured to be used in the Lookup transformation. The same procedure would be followed for a CSV source file.

Tuesday, September 22, 2015

Using OLE DB Command to insert new records & update Existing Records in SQL SERVER Database Table

The OLE DB Command transformation runs an SQL statement for each row in a data flow. It is used to execute an SQL statement that inserts, updates, or deletes rows in a database table.

Package Requirement: For a given source file, if a record exists in the database table, update it with the new values; if not, insert it as a new record.

Source File:
Lookup/Destination Table:
The package should insert records 4 and 5 and update record 3 with the new emp_location.

Package Design:

Configuring the text file source:


The Data Conversion transformation converts the source data types to the column data types of the lookup reference table.




Lookup Transformation:





For the OLE DB Command for matching rows (update):
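The SQLCommand for the update branch might look like this sketch (the table name dbo.Emp_Details is an assumption; the ? placeholders map positionally to the source columns):

```sql
-- Each ? is an ordered parameter mapped in the Column Mappings tab
UPDATE dbo.Emp_Details
SET emp_location = ?
WHERE emp_id = ?;
```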



Map the parameters to the fields in the source file.




For the OLE DB Command for non-matching rows (insert):
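Correspondingly, the insert branch's SQLCommand could be sketched as (same hypothetical table):

```sql
-- Parameters again map positionally to the source columns
INSERT INTO dbo.Emp_Details (emp_id, emp_name, emp_location)
VALUES (?, ?, ?);
```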


 




Tuesday, September 15, 2015

Archival of Data Files using Control Tasks

In a real-time scenario there can be multiple data files which are used to load data into staging tables and then need to be archived. Archival of data files can be done using two control flow tasks:
Foreach Loop Container
File System Task

Source Files:
Package: First, let's create a user variable, filename. Now add a Foreach Loop container to the package design.

The next step is the configuration of the Foreach Loop container. Since we will loop through files, select the Foreach File enumerator.
Select the source folder and the file type. Since we only require the file name, select Name only. If the source directory contains subfolders with files that need to be archived, select Traverse subfolders.
In the variable mappings, select the user variable filename with index 0. With each iteration of the loop, the name of the current file is stored in the user variable.

After configuring the Foreach Loop container, let's add a File System Task.
Configuration of File System Task:
Create a File Connection Manager for Destination Directory

If the DelayValidation property of the File System Task is set to False, the task shows the error "variable user::filename is used as a source or destination and is empty." It receives this error because the variable filename gets its value only during the execution of the Foreach Loop.

Hence, set the DelayValidation property to True.
Before execution of the package:
Source Directory:
Destination Directory:





 


Monday, August 24, 2015

How do you eliminate special characters from a text source data file?

Source File: Comma Delimited CSV file

The file has special characters which need to be eliminated before loading into the database or applying any further transformations. This can be done with the Text Qualifier feature of the Flat File connection manager.






Change the Text Qualifier to #.

The data is now cleaned of the special character #.