This blog post is part of the Building a framework for orchestration in Azure Data Factory series.








Controllers are pipelines that initiate the execution of a single process or task in a specific order and with constraints. Whereas everything else in this framework is largely automated, this layer is entirely manual.

Why? Well, when I started thinking about the design of this framework, I knew I needed something at the “highest level” that would execute an entire daily ETL process, or a modified ETL process that only loads specific data during the day. I wanted to maximize the flexibility of the framework, and that meant either adding another level to the metadata structure or creating this layer of pipelines that sit at the top. I opted for the second, because I did not feel it was worth the complexity of adding another layer to the metadata structure. That being said, it doesn’t mean it cannot or shouldn’t be done…it was a personal choice I made to keep things as simple as I could.

A typical Controller pipeline may look something like the image below, where each Execute Pipeline task initiates the main orchestrator (or main entry point) with different parameters. The benefit of using default values for items such as the Subscription ID and Resource Group should also be more apparent now, because I don’t have to provide those values every time. I can simply override the Process, Task & Environment parameters as required, and the orchestration layer will take care of the rest.


*(Image: a Controller pipeline with several Execute Pipeline activities, each invoking the main orchestrator with different parameters.)*

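As an illustration, a single Execute Pipeline activity inside a Controller might be defined as in the sketch below. The activity, pipeline, and parameter names (“Orchestrator - Main”, “Load Sales”, and so on) are assumptions for the example; the point is that only the Process, Task & Environment parameters are overridden, so values such as the Subscription ID and Resource Group fall back to their defaults:

```json
{
  "name": "Execute Orchestrator - Load Sales",
  "type": "ExecutePipeline",
  "dependsOn": [
    {
      "activity": "Execute Orchestrator - Load Customers",
      "dependencyConditions": [ "Succeeded" ]
    }
  ],
  "typeProperties": {
    "pipeline": {
      "referenceName": "Orchestrator - Main",
      "type": "PipelineReference"
    },
    "waitOnCompletion": true,
    "parameters": {
      "Process": "Load Sales",
      "Task": "All",
      "Environment": "Production"
    }
  }
}
```

The `dependsOn` entries are what give the Controller its “specific order and constraints”: each activity only fires once the activities it depends on have reached the required state, such as Succeeded.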
An additional benefit you get from this layer is visibility in the logs. Because it is all wrapped up in a pipeline, seeing how long an entire ETL process takes to run is as simple as isolating that item in the logs, as opposed to having to identify specific process executions and aggregate the results.
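For example, if your Data Factory sends diagnostic logs to a Log Analytics workspace in resource-specific mode, isolating a Controller can be as simple as a query along these lines (the Controller name is illustrative, and the `ADFPipelineRun` table and its columns assume that diagnostic setting is in place):

```kusto
ADFPipelineRun
| where PipelineName == "Controller - Daily ETL"   // illustrative Controller name
| where Status in ("Succeeded", "Failed")          // completed runs only
| project RunId, Start, End, Status,
          DurationMinutes = datetime_diff("minute", End, Start)
| order by Start desc
```

Each row then represents one full ETL run, with its duration, rather than a scattering of individual process executions.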

And that’s it! There’s really nothing more to say about this layer and no template to provide, although you could probably create an ARM template for this if you’d like. We’ll wrap up the series with a recap next week…





Reminder: Data Factory templates

As part of this blog series, I am publishing the templates of my framework in my public GitHub repo. They are free for anyone to use, either as a starting point for their own framework or as a complete solution to play around with.

I recommend that you attempt to deploy these in a test environment first, and please read the documentation before doing so. It contains important information you will need for a successful deployment.

The documentation for the orchestrator pipelines: Readme – Orchestrators

The documentation for the worker pipelines: Readme – Workers

The ARM templates: Data Factory Templates