bug: ai-rate-limiting is applied globally instead of per consumer

Current Behavior

When I add a consumer (e.g. georg) with an ai-rate-limitin configuration as shown in the example, and then create a second consumer (e.g. martin) with the same plugin configuration but a different key-auth API key, both consumers end up sharing the same rate limit for an instance.
Once georg reaches the configured token limit, request from martin are also rejected with 429 "Configured rate limit reached", even though martin has his own consumer entry and API key.
The rate limitin appears to be applied globally per model, rather than per consumer, as described in the documentation.

{
  "username":"georg",
  "plugins":{
    "key-auth":{
      "key":"Bearer "
    },
    "ai-rate-limiting":{
      "instances":[
        {
          "name":"gpt-oss-120b",
          "limit_strategy":"prompt_tokens",
          "time_window":60,
          "limit":400
        },
        {
          "name":"bge-m3",
          "limit_strategy":"prompt_tokens",
          "time_window":60,
          "limit":10000
        },
        {
          "name":"gpt-oss-120b",
          "limit_strategy":"completion_tokens",
          "time_window":60,
          "limit":400
        }
      ],
      "rejected_code":429,
      "rejected_msg":"Configured rate limit reached",
      "show_limit_quota_header":true
    }
  }
}

Expected Behavior

Each consumer should have an independent rate limit quota.
When multiple consumers (e.g. georg and marting) are configured with the same ai-rate-limiting plugin settings but different key-auth API keys, the token limit should be enforced per consumer, not shared globally.
If georg reaches his configured token limit, only request from georg should be rejected with 429 ..., while martin should still be able to make requests within his own quota.

Error Logs

No response

Steps to Reproduce

Create a Consumer named georg with key-auth enabled and configure the ai-rate-limiting plugin with a token limit (e.g. 400 tokens per 60 seconds for an instance).
Create a second Consumer named martin with the same ai-rate-limiting configuration, but with a different key-auth API key.
Send requests using georg’s API key until the configured token limit is reached and requests start returning: "429 Configured rate limit reached"
Immediately send requests using martin’s API key.
Observe that martin’s requests are also rejected with: "429 Configured rate limit reached"

Environment

APISIX version (run apisix version):
3.14.1
Operating system (run uname -a):
Linux (Kubernetes container, official APISIX Docker image)
OpenResty / Nginx version (run openresty -V or nginx -V):
OpenResty (bundled with APISIX Docker image)
etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
v3.6.0 (self-deployed in Kubernetes)
APISIX Dashboard version, if relevant:
Not used
Plugin runner version, for issues related to plugin runners:
Not used (using a serverless-post-function)
LuaRocks version, for installation issues (run luarocks --version):
Not applicable (using official Docker image)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug: ai-rate-limiting is applied globally instead of per consumer #12896

Current Behavior

Expected Behavior

Error Logs

Steps to Reproduce

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

bug: ai-rate-limiting is applied globally instead of per consumer #12896

Description

Current Behavior

Expected Behavior

Error Logs

Steps to Reproduce

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions