TechAnek

If you’re working with Azure AI’s large language models (LLMs), you’ve probably wondered how to keep track of how many tokens you’re actually using—and more importantly, how to manage and monitor that usage smartly. Unfortunately, Azure doesn’t give us a built-in, easy way to see detailed token usage out of the box. That’s where Azure API Management (APIM) comes in. In this blog, we will walk through how to set up APIM to track token usage for Azure AI services. Whether you’re trying to keep costs under control or just want better visibility into what your apps are doing under the hood, this approach can help you get the insights you need—without making major changes to your existing setup.

Let’s dive in!

Prerequisites

Before we dive in, make sure you have the following in place:

  • An active Azure subscription – You’ll need this to access Azure AI services and deploy resources.
  • Azure API Management (APIM) instance – This will be used to route and monitor requests to your Azure AI endpoint.
  • Access to Azure OpenAI – Make sure you have an Azure OpenAI service created with one or more models deployed.

Creating an API for Azure OpenAI in APIM

To measure and monitor token usage for Azure OpenAI, we need to route requests through Azure API Management (APIM). APIM acts as a gateway, allowing us to intercept and log requests, analyze headers, and even apply policies, such as logging the token count returned by Azure OpenAI. But to do that, we first need to wrap our Azure OpenAI endpoint inside a custom API within APIM. This gives us full control over the request/response flow and enables advanced observability without modifying the application.

Steps to Create the API in APIM

  • Go to the Azure Portal.
  • Search for API Management services and select your APIM instance.
  • In the left-hand menu, click on APIs.
  • Select Azure OpenAI Service. You can see an example in the image below.
  • Azure OpenAI instance:
    • Select your provisioned Azure OpenAI resource from the dropdown.
    • If nothing appears, make sure the resource exists in your subscription and region.
  • Azure OpenAI API version:
    • Choose the API version you want to use. For most use cases, the latest version (e.g., 2024-02-01) is recommended unless your app depends on older behavior.
  • Display name:
    • Enter a friendly name for this API as it will appear in APIM. Example: “Azure OpenAI LLM API”
  • Name:
    • This is the system name (used in URLs and identifiers).
    • Use lowercase letters and dashes/underscores. Example: “azure-openai-llm”
  • Description:
    • Optionally add a short description.
    • Example: Proxy for Azure OpenAI endpoints to enable monitoring and analytics via APIM.
  • Base URL:
    • This is automatically populated with your APIM gateway base URL.
    • You typically don’t need to change it.
  • Products:
    • You can associate this API with a product (which controls who can access it).
    • For initial testing, you can skip this or choose an existing public product (like starter if available). You can always add this later.
  • Skip the Policies section for now.
  • Click Next: Policies or Review + create once you’re done.
  • You can see an example in the image below.

Test the API from the Portal

  • Go to the API we created using the steps above.
  • At the top of the API pane, find the Test tab.
  • Then select the operation named “Creates a completion for the chat message”.
  • The following parameters are shown:
    • deployment-id: pass the deployment name of your model (e.g., gpt-4o-mini).
    • Note: you do not pass the model name; you must pass the deployment name.
    • Then pass the api-version for your model.
  • Now provide the payload in the Request body. An example request body is given below, or you can check the images below:
{
    "temperature": 1,
    "top_p": 1,
    "stream": false,
    "stop": null,
    "max_tokens": 4096,
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "logit_bias": {},
    "messages": [
        {
            "role": "user",
            "content": "Tell me about france."
        }
    ]
}

  • Now tick the “Bypass CORS proxy” checkbox.
  • After that, click the Send button.
  • Once sent, you will see the response as shown in the image below:
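Once the portal test works, you can call the same APIM-fronted endpoint from your own code. Below is a minimal Python sketch using only the standard library; the gateway URL, API path suffix, deployment name, and subscription key are placeholder assumptions you must replace with your own values. (APIM authenticates callers with the `Ocp-Apim-Subscription-Key` header by default.)

```python
import json
import urllib.request

# Hypothetical values -- replace with your own APIM gateway URL/API suffix,
# deployment name, and API version.
APIM_GATEWAY = "https://my-apim.azure-api.net/azure-openai-llm"
DEPLOYMENT = "gpt-4o-mini"   # the deployment name, NOT the model name
API_VERSION = "2024-02-01"

def build_chat_request(prompt: str):
    """Build the URL and payload for a chat completion call through APIM.

    The deployment-id goes in the URL path and api-version in the query
    string, matching the parameters filled in on the portal's Test tab."""
    url = (
        f"{APIM_GATEWAY}/deployments/{DEPLOYMENT}"
        f"/chat/completions?api-version={API_VERSION}"
    )
    payload = {
        "max_tokens": 4096,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, payload

def send_chat_request(prompt: str, subscription_key: str) -> dict:
    """Send the request through the APIM gateway and return the parsed JSON."""
    url, payload = build_chat_request(prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": subscription_key,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Calling `send_chat_request("Tell me about france.", "<your-key>")` sends the same request the portal test does, so its traffic shows up in the same APIM logs.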

If you’ve followed the above steps and your API is working fine, great! However, if you’re facing any issues, feel free to leave a comment below on this blog.

Update the API Policy to Add the Token Usage Info in the Logs

Since APIM does not surface per-request token usage in its logs out of the box, we need to modify the policy of the API we created and add specific policy elements that track token usage. To update the policy, follow the steps below:

  • Go to the APIM instance.
  • Select the API we have created.
  • Then choose the operation named “Creates a completion for the chat message”.
  • Find the Inbound processing section in the Design tab.
  • Then click on the policy.
  • See the image below for more detail.
  • Add the following elements to the policy, or replace the whole policy with the one below:
					<policies>
    <inbound>
        <base />
        <llm-emit-token-metric>
            <dimension name="API ID" />
            <dimension name="Client IP address" value="@(context.Request.IpAddress)" />
            <dimension name="Gateway ID" />
            <dimension name="Location" />
            <dimension name="Operation ID" />
            <dimension name="Product ID" />
            <dimension name="Subscription ID" />
            <dimension name="User ID" />
        </llm-emit-token-metric>
        <llm-token-limit remaining-quota-tokens-header-name="remaining-tokens" token-quota="2000000" token-quota-period="Hourly" counter-key="@(context.Subscription.Id)" estimate-prompt-tokens="true" tokens-consumed-header-name="consumed-tokens" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

Policy Explanation

This policy configuration plays an important role in enabling token observability for your Azure OpenAI API in APIM. Here’s what it does:

  • <llm-emit-token-metric>: This section emits custom token metrics to Azure Monitor, enriched with useful dimensions like API ID, Client IP, Subscription ID, and more. These help break down token usage by client, product, or even geographic location.
  • <llm-token-limit>: This defines a token quota policy. It tracks and limits token consumption per subscription, based on an hourly quota (token-quota="2000000"). It also estimates prompt tokens in real time and returns headers (consumed-tokens, remaining-tokens) so you can inspect usage per request.

Together, these policies allow you to log and control how tokens are being consumed, making it easy to build usage reports, enforce limits, and optimize cost.
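To make the quota behavior concrete, here is a toy Python model of what `<llm-token-limit>` does conceptually: it keeps a running token counter per counter-key (our policy uses the subscription ID) and rejects calls once the quota of 2,000,000 tokens for the period is exhausted, while reporting usage back via headers. This is an illustration of the accounting only, not APIM's actual implementation (the real policy also resets the counter each hour and estimates prompt tokens up front).

```python
class TokenQuota:
    """Toy model of APIM's llm-token-limit accounting: one token counter
    per counter-key, checked against a fixed quota for the period."""

    def __init__(self, quota: int):
        self.quota = quota
        self.consumed = {}  # counter-key (e.g. subscription ID) -> tokens used

    def try_consume(self, counter_key: str, tokens: int) -> dict:
        used = self.consumed.get(counter_key, 0)
        if used + tokens > self.quota:
            # APIM would reject the call here until the period resets.
            return {"allowed": False, "remaining-tokens": self.quota - used}
        self.consumed[counter_key] = used + tokens
        # Mirrors the consumed-tokens / remaining-tokens headers the
        # policy attaches to each response.
        return {
            "allowed": True,
            "consumed-tokens": tokens,
            "remaining-tokens": self.quota - self.consumed[counter_key],
        }
```

Because the counter key is the subscription ID, each team with its own subscription key draws down its own independent quota.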

Track the Token Usage in Logs

Now we need to explore the logs to find the token usage per request. But before that, we must enable diagnostic settings in the APIM instance to send logs to Log Analytics. 

Enable the Diagnostic Setting

To enable diagnostic settings for Azure APIM, follow the steps below:

  • Go to the APIM Instance.
  • Then find Diagnostic settings under the Monitoring section, or search for “Diagnostic settings” in the search bar.
  • Click on Diagnostic settings.
  • Give the diagnostic setting an appropriate name.
  • Then select all the audit and metrics log categories.
  • Under Destination details, select “Send to Log Analytics workspace”.
    • Note: if you do not have a Log Analytics workspace, create a new one for this.
  • Then select your workspace.
  • Finally, save the setting by clicking the Save button.

Explore the Logs of the Azure OpenAI API

We have now applied all the settings needed to track tokens used per request. Next, we need to find that usage in the logs. To do that, we run a KQL query that filters the Azure OpenAI API entries from the logs table. Follow the steps below:

  • Go to the Azure APIM Instance.
  • First, run a test on the API we created to generate some logs (the “Test the API from the Portal” section above covers the steps).
  • After a successful test, find Logs under the Monitoring section.
  • At the top right, switch from Simple mode to KQL mode.
  • Now run the query below to get the logs for the Azure OpenAI API:

ApiManagementGatewayLogs
| where OperationId == "ChatCompletions_Create"
  • After running the query, you will see output like the example below in the logs.
  • In each log entry you can find the ResponseHeaders section; under that you can see the consumed-tokens and remaining-tokens values.
  • Summing consumed-tokens across requests gives the total token usage.
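As a sketch of how those headers can be turned into a total once the entries land in Log Analytics (or are exported), here is a small Python example that sums the consumed-tokens response header across log rows. The row shape is an assumption loosely modeled on ApiManagementGatewayLogs entries, with ResponseHeaders serialized as a JSON string:

```python
import json

# Hypothetical log rows, shaped like ApiManagementGatewayLogs entries where
# ResponseHeaders is a JSON string containing the headers our policy emits.
rows = [
    {"OperationId": "ChatCompletions_Create",
     "ResponseHeaders": '{"consumed-tokens": "512", "remaining-tokens": "1999488"}'},
    {"OperationId": "ChatCompletions_Create",
     "ResponseHeaders": '{"consumed-tokens": "230", "remaining-tokens": "1999258"}'},
]

def total_consumed_tokens(rows) -> int:
    """Sum the consumed-tokens header across chat completion log entries,
    skipping any rows for other operations."""
    total = 0
    for row in rows:
        if row.get("OperationId") != "ChatCompletions_Create":
            continue
        headers = json.loads(row["ResponseHeaders"])
        total += int(headers.get("consumed-tokens", 0))
    return total

print(total_consumed_tokens(rows))  # 742 for the two sample rows above
```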

Visualize The Data In Dashboard

Now that we can track the consumed tokens per request, we can create a dashboard that displays token usage per model. If different teams use separate subscription keys, the dashboard can also show token usage per model for each subscription, and even track the cost of each model, as illustrated in the image below:
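The aggregation behind such a dashboard is straightforward. As an illustration, here is a hedged Python sketch that groups token totals per subscription and model from hypothetical flattened log records (the record shape and team/model names are assumptions for the example):

```python
from collections import defaultdict

# Hypothetical flattened log records: one per request, carrying the
# subscription that made the call, the deployment (model) used, and the
# consumed-tokens value logged by our policy.
records = [
    {"subscription": "team-a", "model": "gpt-4o-mini", "tokens": 512},
    {"subscription": "team-a", "model": "gpt-4o", "tokens": 1800},
    {"subscription": "team-b", "model": "gpt-4o-mini", "tokens": 230},
]

def usage_by_subscription_and_model(records):
    """Group token totals per (subscription, model) pair -- the shape a
    per-team dashboard panel would chart."""
    totals = defaultdict(int)
    for r in records:
        totals[(r["subscription"], r["model"])] += r["tokens"]
    return dict(totals)
```

Multiplying each model's total by its per-token price then gives the per-model cost view mentioned above.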

Need Help Tracking Token Usage?

If you’re looking to monitor token consumption across models or subscriptions, we can help you build a custom dashboard—just like the one shown below. This dashboard provides a clear breakdown of token usage:

  • Per Model: Understand how each OpenAI model is being utilized.
  • Per Subscription: Ideal for organizations where different teams use separate subscription keys.
  • Visual Insights: Leverage charts and metrics to identify usage trends and optimize costs.

📩 Contact us to get started with your own token usage dashboard or for assistance in implementing token tracking in your environment. To contact us, click the button below or leave a comment in the comment box.
