bug: ai-rate-limiting is applied globally instead of per consumer #12896

@jman0815

Description

Current Behavior

When I add a consumer (e.g. georg) with an ai-rate-limiting configuration as shown in the example, and then create a second consumer (e.g. martin) with the same plugin configuration but a different key-auth API key, both consumers end up sharing the same rate limit for an instance.
Once georg reaches the configured token limit, requests from martin are also rejected with 429 "Configured rate limit reached", even though martin has his own consumer entry and API key.
The rate limiting appears to be applied globally per model, rather than per consumer as described in the documentation.

{
  "username":"georg",
  "plugins":{
    "key-auth":{
      "key":"Bearer "
    },
    "ai-rate-limiting":{
      "instances":[
        {
          "name":"gpt-oss-120b",
          "limit_strategy":"prompt_tokens",
          "time_window":60,
          "limit":400
        },
        {
          "name":"bge-m3",
          "limit_strategy":"prompt_tokens",
          "time_window":60,
          "limit":10000
        },
        {
          "name":"gpt-oss-120b",
          "limit_strategy":"completion_tokens",
          "time_window":60,
          "limit":400
        }
      ],
      "rejected_code":429,
      "rejected_msg":"Configured rate limit reached",
      "show_limit_quota_header":true
    }
  }
}

Expected Behavior

Each consumer should have an independent rate limit quota.
When multiple consumers (e.g. georg and martin) are configured with the same ai-rate-limiting plugin settings but different key-auth API keys, the token limit should be enforced per consumer, not shared globally.
If georg reaches his configured token limit, only requests from georg should be rejected with 429 ..., while martin should still be able to make requests within his own quota.
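
To make the difference concrete, here is a toy sketch of the two behaviors. It is not APISIX's implementation; the class name, the fixed-window counting and the per_consumer switch are all assumptions made for illustration. The only point is that the counter key has to include the consumer for the quotas to be independent.

```python
import time


class FixedWindowTokenLimiter:
    """Toy fixed-window token counter (hypothetical, NOT the plugin's code)."""

    def __init__(self, limit, time_window, per_consumer=True, clock=time.monotonic):
        self.limit = limit                # tokens allowed per window
        self.time_window = time_window    # window length in seconds
        self.per_consumer = per_consumer  # False reproduces the shared quota
        self.clock = clock
        self.windows = {}                 # key -> (window_start, tokens_used)

    def _key(self, consumer, instance):
        # Expected behavior: the consumer name is part of the counter key.
        # Reported behavior: the key effectively depends on the instance only.
        return (consumer, instance) if self.per_consumer else (None, instance)

    def allow(self, consumer, instance, tokens):
        """Return False if the quota is already used up, else record the usage."""
        now = self.clock()
        key = self._key(consumer, instance)
        start, used = self.windows.get(key, (now, 0))
        if now - start >= self.time_window:
            start, used = now, 0          # window expired, start a new one
        if used >= self.limit:
            return False                  # would map to the 429 response
        self.windows[key] = (start, used + tokens)
        return True


expected = FixedWindowTokenLimiter(limit=400, time_window=60, per_consumer=True)
expected.allow("georg", "gpt-oss-120b", 400)            # uses up georg's quota
georg_ok = expected.allow("georg", "gpt-oss-120b", 10)
martin_ok = expected.allow("martin", "gpt-oss-120b", 10)

observed = FixedWindowTokenLimiter(limit=400, time_window=60, per_consumer=False)
observed.allow("georg", "gpt-oss-120b", 400)
martin_blocked = not observed.allow("martin", "gpt-oss-120b", 10)
```

With per_consumer=True, georg is rejected while martin still gets through; with per_consumer=False, martin is rejected as well, which matches what I am seeing.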

Error Logs

No response

Steps to Reproduce

  • Create a Consumer named georg with key-auth enabled and configure the ai-rate-limiting plugin with a token limit (e.g. 400 tokens per 60 seconds for an instance).
  • Create a second Consumer named martin with the same ai-rate-limiting configuration, but with a different key-auth API key.
  • Send requests using georg’s API key until the configured token limit is reached and requests start returning: "429 Configured rate limit reached"
  • Immediately send requests using martin’s API key.
  • Observe that martin’s requests are also rejected with: "429 Configured rate limit reached"
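
The steps above can be scripted roughly as follows. This is only a sketch: the helper names (reproduce, make_http_send) are made up, and the route URL, the header that carries the key and the request body are parameters because they depend on the local setup and are not part of this report.

```python
import urllib.error
import urllib.request


def make_http_send(url, header_name, body):
    """Build a send(api_key) function that POSTs `body` (bytes) to `url`.

    url, header_name and body are placeholders to be filled in for the
    local route; none of them are taken from this issue.
    """
    def send(api_key):
        headers = {header_name: api_key, "Content-Type": "application/json"}
        req = urllib.request.Request(url, data=body, headers=headers, method="POST")
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code               # e.g. 429 once the limit is reached
    return send


def reproduce(send, georg_key, martin_key, max_requests=50):
    """Send as georg until a 429 comes back, then send once as martin.

    Returns martin's status code, or None if georg never got a 429
    within max_requests attempts.
    """
    for _ in range(max_requests):
        if send(georg_key) == 429:
            return send(martin_key)
    return None
```

If reproduce(...) returns 429, martin was rejected right after georg ran out of quota, which is the behavior reported here. A 2xx status would mean the quotas are independent.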

Environment

  • APISIX version (run apisix version):
    3.14.1

  • Operating system (run uname -a):
    Linux (Kubernetes container, official APISIX Docker image)

  • OpenResty / Nginx version (run openresty -V or nginx -V):
    OpenResty (bundled with APISIX Docker image)

  • etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
    v3.6.0 (self-deployed in Kubernetes)

  • APISIX Dashboard version, if relevant:
    Not used

  • Plugin runner version, for issues related to plugin runners:
    Not used (using a serverless-post-function)

  • LuaRocks version, for installation issues (run luarocks --version):
    Not applicable (using official Docker image)

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Status: ✅ Done
Milestone: No milestone