One of the most appealing features in Azure Data Factory (ADF) is implicit mapping. The benefit is that I can create one dataset and reuse it multiple times without explicitly mapping the source and destination columns. Compare that to Integration Services (SSIS), where columns have to be explicitly mapped and metadata is validated before execution (forcing you to create granular-level packages), and you can understand why it is so appealing to anyone coming from that world…the SSIS world, I mean.

With implicit mapping you don’t specify a schema in the dataset or mapping in the copy activity, and as long as your destination column names are the same as the source (and in the same order), it works beautifully and is a welcome time-saver especially when you’re staging data.
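
To make this concrete, here is a minimal sketch of what an implicitly mapped copy activity looks like in the pipeline JSON. The dataset names are placeholders and the DynamicsSource/AzureSqlSink connector types are just an assumption (a Dynamics-to-SQL copy like the one later in this post); the point is what’s missing, namely any translator/mapping section:

{
	"name": "Copy businessunit",
	"type": "Copy",
	"inputs": [ { "referenceName": "CRM_Businessunit", "type": "DatasetReference" } ],
	"outputs": [ { "referenceName": "Staging_Businessunit", "type": "DatasetReference" } ],
	"typeProperties": {
		"source": { "type": "DynamicsSource" },
		"sink": { "type": "AzureSqlSink" }
	}
}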

It’s all about the source

The catch is that implicit mapping only works if the data types of the source can be successfully interpreted and mapped to the destination. As you can imagine, Azure Data Factory maps data types from one SQL Server database to another SQL Server database exceptionally well, while your mileage will vary with less structured sources. ADF actually converts the source’s native data types to interim data types before converting to the destination’s native data types. This double translation works well enough for most relational database sources, but not so well with others, even if the documentation suggests otherwise.

To illustrate this point, let’s extract some data out of Microsoft Dynamics 365 CRM and see what the difference would be between implicit and explicit mapping. Create two similar staging tables and define the source query (FetchXml) in the following way:

-- Staging Table 1
create table [Staging_CRM].[businessunit_Test1](
	[businessunitid] uniqueidentifier null,
	[name] varchar(255) null
); 

-- Staging Table 2
create table [Staging_CRM].[businessunit_Test2](
	[businessunitid] varchar(50) null,
	[name] varchar(255) null
); 

-- Source Query (FetchXml)
<fetch mapping="logical">
	<entity name="businessunit">
		<attribute name="businessunitid"/>
		<attribute name="name"/>
	</entity>
</fetch>

The businessunitid attribute shown above is a guid, while the name attribute is alpha-numeric. An Azure Data Factory copy activity into businessunit_Test1 works as expected when using implicit mapping, but try to do the same into businessunit_Test2 where the data type of businessunitid is slightly different and you get the following error:

[Screenshot: error returned by the copy activity]

Change the copy activity from an implicit to an explicit mapping while still using the businessunit_Test2 staging table, and everything suddenly works:

[Screenshot: copy activity with explicit column mapping]

What’s happening here?!

It’s difficult to say exactly what is happening, especially considering that the documentation indicates a GUID interim data type should convert to a String without any issues. My guess is that the interim data type (inferred by ADF in this case) is not as compatible as it would seem, and the only way around it is to either change the data type of the staging table column or implement explicit mapping. Depending on your scenario, neither of these may be an ideal solution for you.
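
If changing the staging table is an option, that first workaround can be as simple as aligning the column’s data type with the Guid that ADF infers from the source. A sketch against the tables created earlier:

-- Workaround: align the staging column with the inferred Guid type
alter table [Staging_CRM].[businessunit_Test2] alter column [businessunitid] uniqueidentifier null;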

Dynamic mapping to the rescue

There is an option that gives us the best of both worlds: more control over the data type mapping while still maintaining the flexibility of implicit mapping. But before we dive into the solution, let’s have a look at what happens under the covers when we explicitly map columns. If we look at the JSON source code of the pipeline with the explicit mapping shown earlier, this is what you will see:

[Screenshot: JSON source code of the pipeline with the explicit mapping]

The typeProperties section of the source code contains the translation (or mapping) between the source and target (sink), and the interface allows us to parameterize this and add it as dynamic content:

[Screenshot: parameterizing the mapping and adding it as dynamic content]

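As a rough illustration (an assumption on my part, not taken from the screenshot above), the dynamic content on the mapping property can be as simple as a reference to the pipeline parameter. The json() function converts a string into an object, which is useful here because the mapping is passed in as a string:

@json(pipeline().parameters.ColumnMapping)

If the value already arrives as an object, a direct reference such as @pipeline().parameters.ColumnMapping works as well.
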
The most important thing to remember is that the ColumnMapping parameter needs to be defined as type Object, even though you’re just passing a string as the value. In our example, the parameter value will include everything except for the “translator”: part and will look like the value below. I prefer to store the mapping in a metadata table somewhere, and remove all special characters to conserve some space:

{"type": "TabularTranslator","mappings": [{"source": {"name": "businessunitid","type": "Guid"},"sink": {"name": "businessunitid","type": "String"}}, {"source": {"name": "name","type": "String"},"sink": {"name": "name","type": "String"}}]}

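A metadata table for this purpose does not need to be complicated. The sketch below is purely illustrative (the schema, table and column names are made up); a Lookup activity can then read the relevant row at runtime and pass the mapping into the copy activity:

-- Hypothetical metadata table holding one column mapping per source entity
create table [Metadata].[ColumnMapping](
	[entity_name] varchar(100) not null,
	[column_mapping] varchar(max) not null
);

insert into [Metadata].[ColumnMapping] ([entity_name], [column_mapping])
values ('businessunit', '{"type":"TabularTranslator","mappings":[{"source":{"name":"businessunitid","type":"Guid"},"sink":{"name":"businessunitid","type":"String"}},{"source":{"name":"name","type":"String"},"sink":{"name":"name","type":"String"}}]}');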

My advice when implementing dynamic mapping is to first create an explicit mapping that is representative of all the expected data types. After making sure that your data type mappings work, adding the mapping as a parameter should be fairly straightforward. If you’re dealing with JSON source data (think APIs) that contains nested levels, this first step becomes even more valuable, as it will show you how to traverse the JSON structure successfully.
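
For nested JSON sources, the explicit mapping created in that first step will typically use path-based source references instead of plain column names. The example below is only a sketch (the address/city attributes are hypothetical), but it shows the general shape of a hierarchical-to-tabular mapping:

{"type": "TabularTranslator","mappings": [{"source": {"path": "$['businessunitid']"},"sink": {"name": "businessunitid","type": "String"}},{"source": {"path": "$['address']['city']"},"sink": {"name": "city","type": "String"}}]}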

12 thoughts on “Dynamic column mapping in Azure Data Factory”

  1. Anonymous says:

    How to do the same in Databricks?

  2. Joyce says:

    Does this work when I want to load older parquet files that do not have the same columns anymore, into 1 staging table? 4 columns have recently been added to the source table, but the older parquet files do not have these. I want to load the full data lake at night, and right now this returns errors because column mapping is not the same for all files.

    1. I am not familiar enough with Parquet sources to definitively answer your question. I know it works for JSON sources (returning null for attributes that don’t exist), and given the fact that Parquet makes provision for optional attributes my initial gut feeling would be yes…but you’d have to test it to make sure.

  3. wajih says:

    very helpful. Thanks

  4. Anonymous says:

    I have come across very strange behaviour. I am using FetchXml, specifying selective columns and using dynamic mapping for the source and target columns. But in the result set I am getting additional fields. These are system-generated fields (like the fields Dynamics CRM creates in the background when you create a lookup field). Because of these additional fields the pipeline fails with the error “The given ColumnMapping does not match up with any column in the source or destination”, as these additional fields are not present in the target table (the target table is an MS-SQL table). The strange thing is that these additional columns appear randomly (probably when there is some update or change in their value). If they are not in the result set, the pipeline works perfectly fine with the provided dynamic column mapping.

    Your help on this will be highly appreciated. This is causing a lot of trouble for our clients.

    1. I have seen similar behavior with FetchXML…it will return some additional fields, especially when option sets are used. You will encounter these errors if you use implicit column mapping, and the most effective way to deal with this scenario is to use dynamic column mapping, where you pass the mapping as a parameter/variable into the copy activity.

  5. Anonymous says:

    How do you actually set the parameter? Once you create it, there isn’t any further dynamic content shown to paste in the script.

    1. I store the JSON in a metadata table, which then gets passed in as parameter during runtime. I’m actually busy with a series on my ETL framework, which will cover that part. Follow along if you like: https://martinschoombee.com/2024/04/02/building-a-framework-for-orchestration-in-azure-data-factory-a-series/

  6. Anonymous says:

    I’m having an issue with the max limit of 8512 characters within the expression builder in the mapping section. I’m extracting XML files with upwards of 500 columns and am not able to do it for the above reason. Any workaround for this?

    1. None apart from the obvious: Map only the attributes you need, or break it up into smaller chunks, or import the XML as is and use TSQL to parse it.

  7. Anonymous says:

    Hi Martin, as an ADF newbie I’m trying to get a pipeline working with dynamic column mappings. Do you have any articles or blogs on setting up a simple pipeline that copies data between 2 databases with dynamic column mappings, with the ColumnMappings stored in a metadata table? Thks

    1. Yes, I have a series on ADF frameworks that I can recommend: Building a framework for orchestration in Azure Data Factory: A series 🙂
