Working with OAuth 2.0 APIs in Azure Data Factory: Refreshing tokens

This blog post is part of a “Working with OAuth 2.0 APIs in Azure Data Factory” series, and you can find a list of the other posts here.

As part of the authorization code flow you’ll receive two very important tokens. The access token is what you will use for authentication when sending API requests, but access tokens are only valid for a limited time. How long an access token is valid usually depends on the vendor, and it could be anything from a few minutes to a few hours.

Once the access token has expired, you’ll typically use the refresh token along with some other identifiers (which also differ depending on the API and vendor) to get a new access and refresh token. The need to refresh tokens periodically means that you have to build this into your ETL process somehow, and there are two ways in which you can approach it:

  1. Assume that a request failure is due to an expired token, and build some error logic into your process to then refresh the tokens (see the sketch after this list).
  2. Assume that your token has expired every time the process runs, and refresh the tokens as a first step.

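As a rough Python sketch, option 1 might look something like the following, where send() and refresh_tokens() are hypothetical helpers standing in for the actual pipeline activities. The cap on refresh attempts is there to avoid the endless-loop problem discussed below:

# Option 1: treat a failed request as an expired token and refresh, capped
# so a persistent failure cannot retry forever. send() and refresh_tokens()
# are hypothetical placeholders.
def send_with_retry(request, tokens, max_refreshes=1):
    for attempt in range(max_refreshes + 1):
        response = send(request, tokens["access_token"])
        if response.status_code != 401 or attempt == max_refreshes:
            return response
        # Assume the 401 means the access token has expired.
        tokens = refresh_tokens()
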
The first option is not completely fail-safe, because API requests can fail for other reasons too, and the error messages are not always very helpful. If you were to build a loop into your error logic, your process could end up retrying endlessly. Option two is not perfect either, and will cause problems if you try to execute it concurrently as part of a parallel process, or if you have multiple long-running API requests that extend beyond the validity of the tokens.

I usually implement token refresh as a first step (option 2) and before each series of API requests (i.e. each ETL process). It may seem like overkill, especially if you’re sending a bunch of requests…but I have found it to be the most reliable option, and API servers don’t mind refreshing tokens often.
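
The pattern itself is simple. Here is a minimal Python sketch, in which refresh_tokens(), save_tokens(), build_api_requests() and send() are all hypothetical placeholders for the actual pipeline activities:

# Option 2: always refresh first, then run the extract.
def run_etl():
    tokens = refresh_tokens()   # assume the old access token has expired
    save_tokens(tokens)         # persist the new refresh token immediately
    for request in build_api_requests():
        send(request, tokens["access_token"])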

If you have long-running processes, you may even have to refresh the tokens before each request, and enforce sequential execution to avoid any concurrency pitfalls.

Here’s an example of my token refresh pipeline in ADF:

[Image: the token refresh pipeline in Azure Data Factory]

Saving tokens

Azure Key Vault is a good place to securely store your tokens, and I use it to store the client ID and secret too. Once you have the necessary items in your Key Vault, give the Azure Data Factory managed identity the necessary access to it, and follow this reference to extract specific items from your Key Vault.
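
In the pipeline this is a Web activity calling the Key Vault REST API with the managed identity. For testing outside Data Factory, the equivalent with the Azure SDK for Python looks like this; the vault URL and secret names are assumptions for illustration:

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Read the stored items from Key Vault. The vault URL and secret names
# below are placeholders; use your own.
client = SecretClient(
    vault_url="https://my-vault.vault.azure.net",
    credential=DefaultAzureCredential(),
)
client_id = client.get_secret("xero-client-id").value
client_secret = client.get_secret("xero-client-secret").value
refresh_token = client.get_secret("xero-refresh-token").value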

Important: You are extracting sensitive information and sending it to a subsequent step in your process, and these values are written to the logs in clear text unless you secure them. Make sure that you secure both the input and output of the steps that send and receive the sensitive information (image below).

[Image: Secure input and Secure output enabled on the activities that handle the tokens]

The most important step is sending the request to refresh the tokens, and this is where our experimentation in Postman will pay dividends. From that blog post, we needed to do the following in order to get a new token:

  1. Send a POST request to the vendor’s token endpoint.
  2. Authenticate the request with a Basic authorization header: the client ID and secret, separated by a colon and base64-encoded.
  3. Send grant_type=refresh_token and the current refresh token in the request body.

The details in Postman will help us troubleshoot any issues in Data Factory, as those will most likely be due to an incorrectly formatted request. Every little detail is important here, and even though it may not be apparent at first, you also have to set the content type in the header (especially if you encode the values):

[Image: setting the Content-Type header on the Web activity]

The expression to formulate the authorization header is the following. As you can see, we use the output from the previous steps to concatenate the client ID and secret (separated by a colon), and encode the entire string with the base64() function:

Basic @{base64(concat(activity('Get Xero Client ID').output.value, ':', activity('Get Xero Client Secret').output.value))}

The expression for the request body looks like this:

grant_type=refresh_token&refresh_token=@{activity('Get Xero Refresh Token').output.value}
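
To sanity-check both expressions outside of Data Factory, here is the equivalent raw request as a Python sketch. The token endpoint shown is Xero’s (since the activities above reference Xero); substitute your vendor’s endpoint and your own credentials:

import base64
import requests

# Placeholder values; in the pipeline these come from the Key Vault lookups.
client_id = "<client-id>"
client_secret = "<client-secret>"
refresh_token = "<refresh-token>"

# Mirrors the base64(concat(...)) expression for the authorization header.
auth = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()

response = requests.post(
    "https://identity.xero.com/connect/token",
    headers={
        "Authorization": f"Basic {auth}",
        "Content-Type": "application/x-www-form-urlencoded",  # the easily forgotten header
    },
    data={"grant_type": "refresh_token", "refresh_token": refresh_token},
)
response.raise_for_status()
new_tokens = response.json()  # contains the new access and refresh tokens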

The last two steps in the process replace the tokens in Azure Key Vault, and we are now ready to extract some data from the API.
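
In the pipeline those last two steps are again Web activities calling the Key Vault REST API. The SDK equivalent, continuing from the sketches above (new_tokens comes from the refresh request, and the secret names are the same assumptions), would be:

from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Write the new tokens back so the next run picks them up.
client = SecretClient(
    vault_url="https://my-vault.vault.azure.net",
    credential=DefaultAzureCredential(),
)
client.set_secret("xero-access-token", new_tokens["access_token"])
client.set_secret("xero-refresh-token", new_tokens["refresh_token"])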
