A pipeline that automatically creates a view of the data in Synapse whenever new data arrives in ADLS Gen2.

ayush9892/SynapseSQLPool-DynamicView

SynapseSQLPool-DynamicView

Purpose

The primary purpose of this solution is to get the work done cost-effectively. The goal could be achieved simply by using a Stored Procedure activity in the pipeline, but the Stored Procedure activity is not supported for the Synapse Serverless SQL Pool. Using a Dedicated SQL Pool instead would be a costly approach.

Scope and Goals

The scope is to create views over files that are stored in ADLS Gen2. The catch is that the views must be created so that the date in the file name is handled dynamically. The goal is a solution that automatically creates a view in the Synapse Serverless SQL Pool whenever a new file arrives in ADLS Gen2.

Used Technologies

  • Azure Synapse

Steps

A. Get the latest added file in the folder

Inside the Data Lake, I have a folder in a container that receives files pushed by an external source every day. However, I want to process only the most recently added file in that folder. To do this:

  1. Create a Dataset pointing to that folder.

  2. Create another Dataset that points to the files in that folder.

  3. Create two variables in the pipeline (the later steps refer to them as filename and maxtime).

  4. Use a Get Metadata activity to get the metadata of that folder.

  5. Then use a ForEach activity to iterate over all files in that folder.

  6. Inside the ForEach activity, add another Get Metadata activity for the individual files.

  7. Then add an If Condition activity to check for the latest file. The dynamic expression I used (shown in the repository's screenshot) works on the file name, because my files are named like <filename_yyyy-mm-dd> and I determine the latest file from its name.

  8. Then add Set Variable activities to the True activities of the If Condition, to set the latest filename and maxtime variables.
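The loop in steps 4 to 8 can be pictured outside the pipeline as a plain scan for the newest file. The Python sketch below is only an illustration of that idea, not the pipeline's actual dynamic expression. It assumes file names of the form <filename_yyyy-mm-dd> followed by an extension; the function names `date_suffix` and `latest_file` and the sample file names are hypothetical.

```python
from datetime import date, datetime
from typing import Iterable, Optional


def date_suffix(filename: str) -> date:
    """Parse the yyyy-mm-dd suffix of a name such as 'sales_2024-01-31.csv'."""
    stem = filename.rsplit(".", 1)[0]   # drop the extension, if any
    suffix = stem.rsplit("_", 1)[-1]    # text after the last underscore
    return datetime.strptime(suffix, "%Y-%m-%d").date()


def latest_file(filenames: Iterable[str]) -> Optional[str]:
    """Mimic ForEach + If Condition + Set Variable: keep the newest name seen so far."""
    latest_name = None  # plays the role of the 'filename' variable
    max_time = None     # plays the role of the 'maxtime' variable
    for name in filenames:  # ForEach over the files in the folder
        current = date_suffix(name)
        if max_time is None or current > max_time:  # If Condition
            latest_name, max_time = name, current   # Set Variable (True branch)
    return latest_name


# Hypothetical sample names, in arbitrary arrival order.
files = ["sales_2024-01-30.csv", "sales_2024-02-01.csv", "sales_2024-01-31.csv"]
newest = latest_file(files)
```

The sketch visits the files one at a time. A running maximum like this is only reliable when iterations do not overlap, so in the pipeline the ForEach activity would probably need its Sequential option enabled.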

B. Create the View in Azure Synapse Serverless SQL Pool

  1. Create a Linked Service to Azure Synapse Analytics.

    • There is one problem: the Linked Service expects Dedicated SQL pools and does not display the databases of the Synapse Serverless SQL Pool in the list.
    • As a workaround, you can enter your Workspace SQL endpoint manually. The serverless endpoint typically has the form <workspace-name>-ondemand.sql.azuresynapse.net.
  2. Then add a Script activity to the pipeline to execute the CREATE VIEW command in the Synapse Serverless SQL Pool.

    • Enter this code in the Script:
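    -- @viewName and @filename are supplied as Script activity parameters.
    DECLARE @sql NVARCHAR(MAX);

    -- Build the CREATE VIEW statement as a string.
    -- QUOTENAME brackets the view name and wraps the file path in single quotes.
    SET @sql = N'CREATE VIEW ' + QUOTENAME(@viewName) + N' AS
    SELECT * FROM OPENROWSET(
    BULK ' + QUOTENAME(@filename, '''') + N',
    DATA_SOURCE = ''raw'',
    FORMAT = ''CSV'',
    FIELDTERMINATOR = '','',
    ROWTERMINATOR = ''\n'',
    PARSER_VERSION = ''2.0''
    ) AS [r]';

    -- Execute the generated statement.
    EXEC sp_executesql @sql;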

    NOTE: SQL Server does not allow a variable or parameter to be used directly as the view name in a CREATE VIEW statement (or similar commands), because object names must be constants. This can be worked around with dynamic SQL: construct the SQL command as a string and then execute it.

    • Enter this Dynamic Expression in the viewName parameter:
    @substring(variables('filename'), 0, indexOf(variables('filename'), '.'))
    
    • Enter this Dynamic Expression in the filename parameter (named to match @filename in the script):
    @concat('dynamic_view_files/', variables('filename'))
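
To see what the Script activity ends up executing, the two dynamic expressions and the string concatenation can be mirrored in Python. This is a sketch under stated assumptions: `quotename` only approximates T-SQL QUOTENAME (it doubles the closing delimiter but does not reproduce the 128-character input limit), and the helper names and the sample file name `sales_2024-01-31.csv` are hypothetical.

```python
def quotename(value: str, quote_char: str = "[") -> str:
    """Approximate T-SQL QUOTENAME for the two delimiters used in the script."""
    if quote_char == "[":
        return "[" + value.replace("]", "]]") + "]"
    if quote_char == "'":
        return "'" + value.replace("'", "''") + "'"
    raise ValueError("only '[' and single-quote delimiters are sketched here")


def view_name(filename: str) -> str:
    """Mirror @substring(filename, 0, indexOf(filename, '.'))."""
    # Text before the first '.'; raises ValueError if the name has no '.'.
    return filename[: filename.index(".")]


def file_path(filename: str) -> str:
    """Mirror @concat('dynamic_view_files/', filename)."""
    return "dynamic_view_files/" + filename


def build_create_view(view: str, path: str) -> str:
    """Assemble the statement text that the T-SQL script concatenates."""
    quoted_path = quotename(path, "'")
    lines = [
        f"CREATE VIEW {quotename(view)} AS",
        "SELECT * FROM OPENROWSET(",
        f"BULK {quoted_path},",
        "DATA_SOURCE = 'raw',",
        "FORMAT = 'CSV',",
        "FIELDTERMINATOR = ',',",
        "ROWTERMINATOR = '\\n',",  # a literal backslash followed by n, as in the T-SQL
        "PARSER_VERSION = '2.0'",
        ") AS [r]",
    ]
    return "\n".join(lines)


# Hypothetical sample file name.
sample = "sales_2024-01-31.csv"
statement = build_create_view(view_name(sample), file_path(sample))
```

For the sample name, the generated statement starts with `CREATE VIEW [sales_2024-01-31] AS` and reads from `'dynamic_view_files/sales_2024-01-31.csv'`. Because the view name is cut at the first `.`, a file name with an earlier dot would give a shorter view name than intended.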
    

