Martin's Blog

Building a framework for orchestration in Azure Data Factory: Recap

This blog post is part of the Building a framework for orchestration in Azure Data Factory series.





We’re wrapping up this series with a short recap of the most important bits and pieces…

Frameworks are extremely useful when they are thoughtfully designed and implemented. I have seen both sides of the coin, but what I see most often is no framework at all. Typically there are some naming conventions and coding standards, but many companies miss the opportunity to take it one step further and reduce the inefficiency of repetitive tasks. There’s a ton of repetition in ETL processes, and in my opinion that gives us a really good opportunity to streamline the way we do things with a well-designed framework.





If I have to highlight the most valuable things I’ve learned from building this framework, it would be the following:





Where to from here?

Once you’ve successfully built a metadata-driven framework, the most tedious task is maintaining the metadata itself. Let’s face it, storing all the source queries and column mappings in a database is super useful, but updating them when you need to add a new attribute, for instance, will be very painful…and this is where some big-picture thinking helps.

Think about the ways in which you can address that challenge. I have, and I developed a process whereby I maintain the metadata in an Excel workbook and use PowerShell to generate source queries and column mappings. I take that one step further and generate statements to recreate staging tables, and I generate and deploy the ARM templates when I’m ready with some changes or have a new environment. Say what you’d like about Excel, but using it in this fashion to maintain the metadata and document your system at the same time is incredibly useful, and much less frustrating than formatting JSON code that’s stored in a database table.
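To make the idea a bit more concrete, here is a minimal sketch of that kind of generation step. It is not my actual tooling: I use Excel and PowerShell, whereas this sketch uses Python and assumes the worksheet has been exported to CSV. The column names (source_table, source_column, sink_column) and the sample table are hypothetical. The mapping output follows the TabularTranslator shape that the copy activity accepts for explicit column mappings.

```python
import csv
import io
import json

# Hypothetical metadata, as it might look after exporting one worksheet to CSV.
# The column names and the sample table are assumptions for illustration only.
METADATA_CSV = """\
source_table,source_column,sink_column
dbo.Customer,Id,CustomerID
dbo.Customer,FullName,CustomerName
dbo.Customer,CreatedOn,CreatedDate
"""


def load_rows(text):
    """Parse the CSV text into a list of dicts, one per column mapping."""
    return list(csv.DictReader(io.StringIO(text)))


def build_source_query(rows):
    """Build a SELECT statement listing every source column of one table."""
    tables = {row["source_table"] for row in rows}
    if len(tables) != 1:
        raise ValueError("expected rows for exactly one source table")
    columns = ", ".join(f"[{row['source_column']}]" for row in rows)
    return f"SELECT {columns} FROM {tables.pop()}"


def build_column_mapping(rows):
    """Build a copy activity translator in the TabularTranslator shape."""
    return {
        "type": "TabularTranslator",
        "mappings": [
            {
                "source": {"name": row["source_column"]},
                "sink": {"name": row["sink_column"]},
            }
            for row in rows
        ],
    }


rows = load_rows(METADATA_CSV)
source_query = build_source_query(rows)
column_mapping = json.dumps(build_column_mapping(rows))

# Both strings are what you would then store in the metadata database.
print(source_query)
print(column_mapping)
```

Adding a new attribute then becomes a matter of adding one row to the sheet and regenerating, rather than hand-editing a query and a JSON document stored in a table.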

There are plenty of options, and I hope that this series has given you some ideas on how you could go about creating your own framework…or at least a head start towards that goal.





<side note> I am thinking about doing some training based on this series, and I’d love to hear your feedback. Please send me a DM on social media or leave a comment on this blog post if that is something that would interest you. </side note>





Reminder: Data Factory templates

As part of this blog series, I have published the templates of my framework in my public GitHub repo. They are free for anyone to use, either as a starting point for their own framework or as a complete solution to play around with.

I recommend that you attempt to deploy these in a test environment first, and please read the documentation before doing so. It contains important information you will need for a successful deployment.

The documentation for the orchestrator pipelines: Readme – Orchestrators

The documentation for the worker pipelines: Readme – Workers

The ARM templates: Data Factory Templates
