If you’re working with Azure AI’s large language models (LLMs), you’ve probably wondered how to keep track of how many tokens you’re actually using—and more importantly, how to manage and monitor that usage smartly. Unfortunately, Azure doesn’t give us a built-in, easy way to see detailed token usage out of the box. That’s where Azure API Management (APIM) comes in. In this blog, we will walk through how to set up APIM to track token usage for Azure AI services. Whether you’re trying to keep costs under control or just want better visibility into what your apps are doing under the hood, this approach can help you get the insights you need—without making major changes to your existing setup.
Let’s Dive In!
Before we dive in, make sure you have the following in place:
- An active Azure subscription
- An Azure OpenAI resource with a model deployment
- An Azure API Management (APIM) instance
- A Log Analytics workspace (we will use it later to query the logs)
To measure and monitor token usage for Azure OpenAI, we need to route requests through Azure API Management (APIM). APIM acts as a gateway, allowing us to intercept and log requests, analyze headers, and even apply policies, like logging the token count returned by Azure OpenAI. But to do that, we first need to wrap our Azure OpenAI endpoint inside a custom API within APIM. This gives us full control over the request/response flow and enables advanced observability without modifying the application.
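Once the API is published, your application calls the APIM gateway URL instead of the Azure OpenAI endpoint directly. A minimal sketch of such a request, assuming the imported API mirrors the Azure OpenAI route (the host, deployment name, and subscription key header name are placeholders that depend on how you configured the API in APIM):

    POST https://<your-apim-name>.azure-api.net/openai/deployments/<deployment-name>/chat/completions?api-version=2024-02-01
    Content-Type: application/json
    Ocp-Apim-Subscription-Key: <your-apim-subscription-key>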
When creating the API, the latest GA API version (2024-02-01) is recommended unless your app depends on older behavior. You can also assign the API to a product (for example, the built-in starter product, if available); you can always add this later. Once the API is created, test it with a sample chat completions request body such as the following:
{
    "temperature": 1,
    "top_p": 1,
    "stream": false,
    "stop": null,
    "max_tokens": 4096,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "logit_bias": {},
    "messages": [
        {
            "role": "user",
            "content": "Tell me about France."
        }
    ]
}
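If the call succeeds, a non-streaming chat completions response carries a usage object alongside the generated message; this is the token data that the policies in the next section build on. The numbers below are illustrative:

    {
        "choices": [
            {
                "message": {
                    "role": "assistant",
                    "content": "France is a country in Western Europe..."
                }
            }
        ],
        "usage": {
            "prompt_tokens": 12,
            "completion_tokens": 310,
            "total_tokens": 322
        }
    }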
If you’ve followed the steps above and your API is working, great! If you’re facing any issues, feel free to leave a comment below.
Since Azure OpenAI Service does not directly surface token usage per request in APIM, we need to modify the policy of the API we created and add specific tags to track token usage. To update the policy, follow the steps below:
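Open your API in APIM, go to the Design tab, and edit the inbound section in the policy code editor. A minimal sketch of what the policy could look like is shown below; the metric namespace, the Client IP dimension expression, and the exact header names are illustrative choices, not requirements:

    <policies>
        <inbound>
            <base />
            <!-- Emit token metrics to Azure Monitor with dimensions for breakdowns -->
            <llm-emit-token-metric namespace="openai-tokens">
                <dimension name="API ID" />
                <dimension name="Subscription ID" />
                <dimension name="Client IP" value="@(context.Request.IpAddress)" />
            </llm-emit-token-metric>
            <!-- Track and cap token consumption per subscription: 2,000,000 tokens per hour -->
            <llm-token-limit counter-key="@(context.Subscription.Id)"
                             token-quota="2000000"
                             token-quota-period="Hourly"
                             estimate-prompt-tokens="true"
                             tokens-consumed-header-name="consumed-tokens"
                             remaining-quota-tokens-header-name="remaining-tokens" />
        </inbound>
        <backend>
            <base />
        </backend>
        <outbound>
            <base />
        </outbound>
        <on-error>
            <base />
        </on-error>
    </policies>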
This policy configuration plays an important role in enabling token observability for your Azure OpenAI API in APIM. Here’s what it does:
<llm-emit-token-metric>: This section emits custom token metrics to Azure Monitor, enriched with useful dimensions like API ID, Client IP, Subscription ID, and more. These help break down token usage by client, product, or even geographic location.

<llm-token-limit>: This defines a token quota policy. It tracks and limits token consumption per subscription, based on an hourly quota (token-quota="2000000"). It also estimates prompt tokens in real time and returns headers (consumed-tokens, remaining-tokens) so you can inspect usage per request.

Together, these policies allow you to log and control how tokens are being consumed, making it easy to build usage reports, enforce limits, and optimize cost.
Now we need to explore the logs to find the token usage per request. But before that, we must enable diagnostic settings in the APIM instance to send logs to Log Analytics.
To enable diagnostic settings for Azure APIM, follow the steps below:
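If you prefer scripting over the portal, the same diagnostic setting can be created with the Azure CLI. A sketch with placeholder resource IDs and setting name; GatewayLogs is the log category that feeds the ApiManagementGatewayLogs table used below:

    az monitor diagnostic-settings create \
        --name "apim-to-log-analytics" \
        --resource "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.ApiManagement/service/<apim-name>" \
        --workspace "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>" \
        --logs '[{"category": "GatewayLogs", "enabled": true}]'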
We have now applied all the settings needed to track the tokens used per request. Next, we query the logs: run a KQL query in the Logs blade to filter the Azure OpenAI API entries out of the gateway logs. To do this, follow the steps below:
ApiManagementGatewayLogs
| where OperationId == "ChatCompletions_Create"
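The query above lists the chat completion calls. If you have also configured APIM to log the custom response headers set by the token-limit policy, you can extract per-request token counts from them. A sketch, assuming the consumed-tokens header is logged and exposed through the ResponseHeaders column:

    ApiManagementGatewayLogs
    | where OperationId == "ChatCompletions_Create"
    | extend ConsumedTokens = toint(ResponseHeaders["consumed-tokens"])
    | project TimeGenerated, ApimSubscriptionId, ResponseCode, ConsumedTokens
    | order by TimeGenerated desc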
Now that we can track the consumed tokens per request, we can create a dashboard that displays token usage per model. If different teams are using separate subscription keys, the dashboard can also show token usage per model for each subscription and track the cost of each model, as illustrated in the image below:
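As a starting point for such a dashboard, a query along these lines aggregates tokens per model deployment and per subscription; the regex over Url assumes the default deployments/<name>/ route, and ConsumedTokens relies on the same header logging as above:

    ApiManagementGatewayLogs
    | where OperationId == "ChatCompletions_Create"
    | extend Deployment = extract("deployments/([^/]+)/", 1, Url)
    | extend ConsumedTokens = toint(ResponseHeaders["consumed-tokens"])
    | summarize TotalTokens = sum(ConsumedTokens) by Deployment, ApimSubscriptionId, bin(TimeGenerated, 1d)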
If you’re looking to monitor token consumption across models or subscriptions, we can help you build a custom dashboard—just like the one shown below. This dashboard provides a clear breakdown of token usage:
📩 Contact us to get started with your own token usage dashboard or for assistance implementing token tracking in your environment. To reach us, click the button below or leave a comment in the comment box.