I’ve been using Azure Data Factory for a few years now, and I think it’s a good time to start sharing some thoughts on how you could build an ETL framework using that platform. More specifically, a framework for orchestration, the details of which I will get to in the next blog post.





The back-story

Like many other analytics professionals in the Microsoft world, I got my start way back in the good old days of Data Transformation Services (DTS). Yes, it was SQL Server 2000 and I am dating myself here :-/

DTS was a “good enough” tool when you were just trying to move data around, but any in-transit transformations required messy VBScript code that was difficult to maintain. This forced your hand to move any transformations in your ETL process to something else like stored procedures, assuming you were using SQL Server as the back-end database of course. Stored procedures provided a much cleaner and more efficient mechanism to write, store and document set-based logic for your ETL processes, making them the preferred choice for most of us…and for some of us even today still.

Fast forward to SQL Server 2005 and we saw the release of Integration Services (SSIS), a tool that provided better options for transformations without the need to write as much custom code. Although these built-in transformation tasks were a great addition, they didn’t solve all of the performance issues related to row-by-row processing. To be fair, SSIS did a much better job than its predecessor of processing certain things in memory-resident batches, but it still fell short of set-based languages like SQL.

Integration Services is still a popular choice for companies that use SQL Server on-premises, and although I developed my own SSIS framework a few years ago using BIML, I chose to focus my attention on Azure Data Factory (ADF) instead. All things considered, my personal opinion is that ADF will become the de facto ETL tool for Microsoft’s on-premises customers in the future too…but that’s a conversation for a different day and platform.





Why a framework?

Most people will describe a framework as the “building blocks” of something. I see it more as the foundation and outer frame of a structure, if you’d like to use the building of a house as an analogy. There are a few rigid pieces (the foundation) that provide restrictions outside of which you are not prepared to go, and then some less rigid pieces (the frame) which provide more general guidelines that can change.

However you interpret it, at its core a framework provides a boundary within which you construct something, along with guidelines for consistency and standards. It is closely related to coding standards, but we shouldn’t confuse one with the other. Both are equally important pieces of the proverbial puzzle, though, and they should go hand in hand if you want to set yourself up for success.

When I prepare to build an ETL framework, the following characteristics are important to me:

  • Repeatable Functionality – Is it able to do what I need it to do, repeatedly and without reinventing the wheel?
  • Efficiency & Performance – Is it performing tasks as efficiently as possible, with options to tweak performance by adjusting resources?
  • Scalability – Can the framework keep up with demand and grow as my environment gets larger, while still allowing me to start small and get things off the ground quickly?
  • Component/Code Reuse – Will the framework save development time in future by reusing components?
  • Simplicity – Complexity is sometimes unavoidable, but I follow the mantra of making things as simple as possible and only as complex as they need to be.
  • Cost – In the cloud world we live in today, keeping costs down is more of a priority than it was when everything was on-premises. A cost-effective process is no longer a luxury, but a necessity if you want to maximize your investments.




The “secret sauce” is to find the balance between these items, and this series will focus on the components required to build such a framework in Azure Data Factory. We’ll cover the following areas:





Stay tuned for the next few weeks, and I hope you find the series useful!

