LuaJIT-based OpenResty services frequently exhibit a specific failure pattern in production: RSS climbs continuously while Lua GC metrics remain healthy, until the process is OOM killed by Kubernetes.

This is not a code bug. It is a structural limitation of LuaJIT’s default memory allocator — freed objects are reclaimed by the GC, but the underlying physical memory pages are never returned to the operating system. The result is a process that accumulates memory in one direction only.

LuaJIT-plus resolves this at the allocator level by introducing proactive memory reclamation, eliminating RSS inflation without application restarts or code changes.

What Is the LuaJIT “Pseudo Memory Leak”?

The LuaJIT pseudo memory leak occurs when the garbage collector successfully reclaims Lua objects internally, but the allocator retains the underlying physical memory pages rather than releasing them back to the operating system. The result: RSS grows continuously while collectgarbage("count") reports a healthy low value.

Why GC Metrics Look Fine While RSS Keeps Growing

In typical scenarios involving long-lived connections, traffic spikes, or intensive computation, LuaJIT rapidly creates a vast number of short-lived objects (tables, strings, closures). The Lua GC reclaims these objects effectively and marks their memory as available for reuse, but the operating system sees the situation quite differently:

  • Application Layer Perspective (Lua VM): Memory is considered freed and immediately available for reuse. The collectgarbage("count") function returns a healthy low value.
  • System Layer Perspective (OS): The process continues to hold physical memory pages, and its Resident Set Size (RSS) remains high.
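The two perspectives can be put side by side with a small script. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/status exposes a VmRSS line, and assuming the Lua-side figure (the value of collectgarbage("count"), in KB) has been exported from the same worker process by some means of your own. The function names are hypothetical and are not part of lj-resty-memory or LuaJIT-plus.

```python
import re


def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("VmRSS line not found")
    return int(match.group(1))


def read_status(pid="self"):
    """Read the raw status text for a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()


def rss_to_gc_ratio(status_text, lua_gc_count_kb):
    """How many times larger the OS view (RSS) is than the Lua VM view."""
    return parse_vm_rss_kb(status_text) / lua_gc_count_kb


# Illustrative sample: 512 MB resident while the Lua VM reports about 20 MB.
SAMPLE_STATUS = "Name:\tnginx\nVmPeak:\t  700000 kB\nVmRSS:\t  524288 kB\n"
SAMPLE_GC_COUNT_KB = 20480.0  # hypothetical collectgarbage("count") reading

print(f"RSS is {rss_to_gc_ratio(SAMPLE_STATUS, SAMPLE_GC_COUNT_KB):.1f}x the GC-reported heap")
```

A ratio that keeps growing over time, while the GC figure stays flat, is the signature described above. On a live system, pass read_status(pid) for the worker in question instead of the sample text.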

The Allocator-OS Communication Gap Explained

The root of this decoupling is that freeing objects is not the same as returning physical memory to the OS. Reclaimed objects leave free space scattered across the allocator’s pages, so the process’s memory pool becomes heavily fragmented. LuaJIT’s default allocator retains these pages for future use rather than releasing them back to the operating system. Consequently, the process becomes a “resource black hole”: a one-way street for memory that only consumes and never releases.

Diagnosing the Problem with Real Production Data

To quantify this behavior, we analyzed a production process with an RSS of 512 MB using lj-resty-memory. The results are unambiguous.

Step 1 — Who Is Consuming the 512MB RSS?

We conducted a snapshot analysis on a process with an RSS of 512MB:

71% of RSS — 363 MB — is held entirely by LuaJIT’s internal allocator. Business logic accounts for the remainder. The leak is in the runtime, not the application.

Step 2 — Inside LuaJIT’s Memory: 94% Is Fragmented Free Pages

Next, we drilled down into this 71% memory region, revealing a startling discovery:

Of the 363 MB held by LuaJIT (the 71% share identified in Step 1), only 5.9% is actively used by live GC objects. The remaining 94.1% consists of fragmented free pages that the allocator retains but never returns to the OS, which is the direct cause of RSS inflation.

If your service shows similar RSS behavior, this ratio is the number to look for.
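To make the numbers concrete, the arithmetic behind the two steps can be worked through in a few lines. This sketch uses only the percentages quoted above (71% of a 512 MB RSS held by LuaJIT, of which 5.9% is live); the function name is illustrative.

```python
def breakdown(rss_mb, luajit_share, live_share):
    """Split RSS into LuaJIT-held, live, and idle-but-retained megabytes."""
    luajit_mb = rss_mb * luajit_share
    live_mb = luajit_mb * live_share
    idle_mb = luajit_mb - live_mb
    return luajit_mb, live_mb, idle_mb


luajit_mb, live_mb, idle_mb = breakdown(512, 0.71, 0.059)

print(f"held by LuaJIT allocator: {luajit_mb:.1f} MB")          # 363.5 MB
print(f"live GC objects:          {live_mb:.1f} MB")            # 21.4 MB
print(f"idle, retained pages:     {idle_mb:.1f} MB")            # 342.1 MB
print(f"idle share of total RSS:  {idle_mb / 512 * 100:.1f}%")  # 66.8%
```

In other words, roughly two thirds of the entire process footprint in this snapshot is memory that nothing is using.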

What This Means for Your Kubernetes Pod Limits

When the majority of LuaJIT-held memory consists of fragmented free pages, RSS bears little relation to the actual working set. Kubernetes enforces memory limits against the memory the container physically holds, which RSS reflects, not against GC metrics, so pods get OOM killed even though collectgarbage("count") looks fine. Before raising limits or over-provisioning, check whether your RSS-to-GC ratio matches the pattern above.
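One way to see how close a pod is to its limit from inside the container is to read the cgroup accounting files directly. The sketch below assumes a cgroup v2 unified hierarchy mounted at /sys/fs/cgroup (memory.current and memory.max); on cgroup v1 hosts the file names differ. The helper names are hypothetical.

```python
def parse_limit(text):
    """memory.max holds a byte count, or the literal 'max' when unlimited."""
    text = text.strip()
    return None if text == "max" else int(text)


def headroom_fraction(current_bytes, limit_bytes):
    """Fraction of the limit still unused, or None when there is no limit."""
    if limit_bytes is None:
        return None
    return 1 - current_bytes / limit_bytes


def read_cgroup_v2(base="/sys/fs/cgroup"):
    """Read (current usage, limit) in bytes; assumes cgroup v2 is mounted at base."""
    with open(f"{base}/memory.current") as f:
        current = int(f.read())
    with open(f"{base}/memory.max") as f:
        limit = parse_limit(f.read())
    return current, limit


# Illustrative values: 512 MiB in use against a 2 GiB limit.
print(headroom_fraction(512 * 1024**2, parse_limit("2147483648\n")))  # 0.75
```

Tracking this headroom next to the Lua GC figure shows whether shrinking headroom is driven by live data or by retained free pages.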

Why Standard Fixes Don’t Work

Before adopting LuaJIT-plus, engineering teams typically work through a series of standard optimization techniques. Because the problem sits at the allocator level, these methods fall short:

Object Pooling and GC Tuning: Necessary but Not Sufficient

Reusing Tables through Object Pooling or manually triggering GC are indeed good programming practices that can reduce GC pressure. Yet, this only addresses the issue of “object reuse” and does not solve the problem of “physical memory reclamation.” This is akin to packing up the garbage in your room (GC), but not taking the garbage bag out of the house (returning it to the OS); the room remains cluttered.

Why Swapping the System Allocator Doesn’t Help

This is a common debugging pitfall. Engineers often try to fix the problem by swapping the system allocator (for example, to jemalloc or tcmalloc) or by tuning system-level memory management settings. However, high-performance runtimes such as LuaJIT, in their quest for maximum efficiency, typically bypass the standard system allocator and manage memory with their own custom allocator. The fragmented free pages live inside that custom allocator, so system-level memory optimizations never reach them.

Scheduled Restarts: A Band-Aid That Costs You Availability

Setting scheduled tasks or Liveness Probes to force container restarts represents the ultimate operational compromise. While this masks the symptom of RSS growth, it comes at the cost of sacrificing long-connection stability, losing runtime state, and introducing service jitter. This is a “band-aid solution,” not an engineering solution.

The core difficulty we face lies in the lack of visibility and control. For a long time, LuaJIT’s memory allocator has been a black box for developers. We lacked both the tools to observe the degree of fragmentation within the internal memory pool and the mechanisms to actively intervene in memory page reclamation strategies at runtime.

This is precisely the problem LuaJIT-plus aims to solve: by providing in-depth observability and fine-grained control over the memory allocator, it transfers the responsibility of memory management back to the business teams, thereby definitively addressing the architectural risks posed by “pseudo memory leaks.”

How LuaJIT-plus Fixes This at the Allocator Level

From Passive Retention to Proactive Memory Reclamation

LuaJIT-plus changes the allocator’s behavior in one specific way: instead of retaining freed memory pages indefinitely, it actively evaluates fragmentation at runtime and returns idle physical pages to the OS via explicit system calls.

This happens continuously in the background, with no application restarts and no impact on live connections. From the OS perspective, the process memory footprint now tracks actual workload — it rises under load and recedes during idle periods.
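The article does not name the system call involved. As an illustration of the general idea only, the sketch below uses madvise with MADV_DONTNEED, one common Linux mechanism for handing idle pages back to the kernel while keeping the address range mapped. It assumes Linux and Python 3.8 or later, and it is not a description of LuaJIT-plus internals.

```python
import mmap

PAGE = mmap.PAGESIZE


def release_idle_pages(region, first_page, page_count):
    """Hint to the kernel that a page range is idle. Returns True if the hint was sent."""
    advice = getattr(mmap, "MADV_DONTNEED", None)  # not defined on every platform
    if advice is None:
        return False
    region.madvise(advice, first_page * PAGE, page_count * PAGE)
    return True


def demo(pages=256):
    # Private anonymous mapping: plain process memory, not backed by a file.
    region = mmap.mmap(-1, pages * PAGE, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        for i in range(pages):
            region[i * PAGE] = 1      # touch every page so it becomes resident
        released = release_idle_pages(region, 0, pages)
        first_byte = region[0]        # on Linux, released pages read back zero-filled
        region[0] = 7                 # the mapping stays valid and can be reused
        return released, first_byte, region[0]
    finally:
        region.close()


print(demo())
```

The key property is that the address range remains usable after the call: the process gives up the physical pages, not the mapping, which is why no restart is needed.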

Three operational outcomes follow directly:

  • OOM kills caused by RSS inflation are eliminated
  • Memory limits can be set based on actual working set, not worst-case peaks
  • Capacity planning and HPA thresholds become predictable

How the Fragmentation-Aware Signaling Mechanism Works

LuaJIT-plus evaluates memory page fragmentation at runtime rather than blindly accumulating pages. When the system identifies large blocks of physical memory that are held but no longer logically in use, it actively initiates a system call, signaling the operating system that these physical resources can be safely reclaimed.
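The decision step can be pictured with a toy model. The sketch below treats the allocator's memory as a list of pages, each with a count of live bytes, and looks for contiguous runs of completely idle pages that are long enough to be worth releasing. The data layout, the function name, and the min_run_pages threshold are all assumptions made for illustration; the actual LuaJIT-plus algorithm is not described in this article.

```python
def find_idle_runs(live_bytes_per_page, min_run_pages):
    """Return (start_page, length) for each run of fully idle pages
    that is at least min_run_pages long."""
    runs = []
    start = None
    for index, live in enumerate(live_bytes_per_page):
        if live == 0:
            if start is None:
                start = index          # a new idle run begins here
        else:
            if start is not None and index - start >= min_run_pages:
                runs.append((start, index - start))
            start = None               # a live page ends the current run
    total = len(live_bytes_per_page)
    if start is not None and total - start >= min_run_pages:
        runs.append((start, total - start))
    return runs


# 10 pages: two live pages (128 and 64 bytes in use) split the idle space.
pages = [0, 0, 0, 128, 0, 0, 0, 0, 64, 0]
print(find_idle_runs(pages, min_run_pages=3))  # [(0, 3), (4, 4)]
```

Note how a single small live object pins its whole page: the trailing idle page is too short a run to qualify, which is exactly how fragmentation keeps RSS high.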

Before vs. After: The “Breathing” Memory Curve

This fundamental shift in the underlying mechanism brings a profound difference to higher-level applications. It’s not merely the introduction of a tool, but a transformation in the resource governance paradigm:

  • Constructive vs. Destructive: Scheduled restarts forcibly release memory by “killing processes,” which constitutes a destructive reset. In contrast, LuaJIT-plus performs millisecond-level, imperceptible memory reclamation while business operations continue uninterrupted and long-lived connections remain online. This represents surgical precision in resource management, rather than a violent tear-down and rebuild.
  • Separation of Concerns: Application-layer code optimization focuses on “reducing garbage generation,” whereas LuaJIT-plus addresses “how to efficiently manage already idle resources.” This division of labor allows business developers to concentrate solely on the correctness of business logic, free from the heavy burden of low-level memory management.

Our most immediate gain is the transformation of the previously ever-increasing, anxiety-inducing “staircase-like” memory curve into a healthy “breathing curve” that dynamically adjusts with business load.

  • During traffic peaks, memory scales on demand to support business throughput;
  • During troughs, memory quickly recedes to baseline levels, freeing up valuable resources.
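The difference between the two curves can be reproduced with a deliberately simplified model: a retaining allocator never gives pages back, so its footprint is the running maximum of demand, while a reclaiming allocator tracks demand. The load figures are made up, and real reclamation would lag behind demand rather than follow it instantly.

```python
def simulate_rss(demand_mb, reclaim):
    """Model RSS over time for a given memory demand series (in MB)."""
    rss = []
    held = 0
    for demand in demand_mb:
        if reclaim:
            held = demand              # idle pages are returned to the OS
        else:
            held = max(held, demand)   # pages are kept once acquired
        rss.append(held)
    return rss


load = [100, 400, 150, 600, 120]       # hypothetical peaks and troughs

print("staircase:", simulate_rss(load, reclaim=False))  # [100, 400, 400, 600, 600]
print("breathing:", simulate_rss(load, reclaim=True))   # [100, 400, 150, 600, 120]
```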

Production Impact After Deployment

For large-scale production environments striving for “five nines” (99.999%) availability, the impact of this unpredictable memory behavior goes far beyond a simple restart.

No More OOM Kills Without Over-Provisioning

To mitigate occasional RSS peaks, operations teams are often forced to allocate memory limits for services that far exceed their actual needs. For instance, a gateway service typically requiring only 200MB of memory might be configured with a 2GB resource limit due to uncontrolled RSS growth. Under the cloud-native pay-as-you-go cost model, this 10x resource redundancy directly drives up the infrastructure’s TCO (Total Cost of Ownership).

Predictable Memory for HPA and Capacity Planning

Unpredictable memory behavior undermines capacity planning baselines. When we cannot accurately estimate the upper limit of memory consumption for a single instance, setting the threshold for Horizontal Pod Autoscaling (HPA) becomes a guessing game. This uncertainty significantly limits the system’s elasticity in the face of sudden traffic spikes.
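To see why inflated RSS distorts autoscaling, consider the core scaling formula from the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric). The sketch below applies it to hypothetical utilization figures and ignores the HPA's tolerance band and min/max replica bounds.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric):
    """Core HPA formula, without tolerance or min/max replica clamping."""
    return math.ceil(current_replicas * current_metric / target_metric)


replicas = 4
target_utilization = 70     # percent of requested memory, hypothetical

inflated = 90               # utilization as seen through inflated RSS
actual = 30                 # utilization if idle pages were returned

print(desired_replicas(replicas, inflated, target_utilization))  # 6
print(desired_replicas(replicas, actual, target_utilization))    # 2
```

With the same real workload, the inflated reading asks for three times as many replicas, so a memory-based HPA threshold ends up reacting to allocator behavior rather than to traffic.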

Engineering Time Saved on Phantom Leak Investigations

This problem is often hidden and difficult to reproduce. Like a phantom, it drains the energy of senior engineers. Teams spend considerable time troubleshooting code but often misdiagnose the problem (attempting to fix application-level memory leaks instead of allocator behavior), which renders their efforts fruitless and severely impedes the iteration speed of core business.

Predictable memory behavior is the cornerstone for building large-scale, high-reliability services. It not only eliminates the risk of OOM kills and brings inflated TCO down to a realistic level, but also liberates senior engineers from endless troubleshooting of “phantom issues.” LuaJIT-plus is more than just a memory optimization tool; it’s a more robust, modern underlying runtime environment, providing a rock-solid infrastructure foundation for your core business.

LuaJIT-plus is an enterprise-grade LuaJIT runtime meticulously developed by our team, drawing on years of experience maintaining large-scale OpenResty services. Beyond addressing the memory fragmentation problem thoroughly analyzed in this article, it incorporates a suite of performance optimization and stability enhancement features, designed to offer robust and reliable foundational support for your critical operations.

If you are facing similar challenges or wish to further enhance system performance and predictability, we invite you to explore and try LuaJIT-plus, and let this professional tool empower your success.

FAQ

Q: Does LuaJIT-plus require application code changes? A: No. LuaJIT-plus is a drop-in runtime enhancement. It adds proactive memory reclamation at the allocator level without application restarts or code changes.

Q: Is this compatible with standard OpenResty deployments? A: Yes. LuaJIT-plus is an enterprise-grade LuaJIT runtime built on years of experience maintaining large-scale OpenResty services, designed as a direct replacement for the standard LuaJIT runtime in OpenResty deployments.

Q: How is LuaJIT-plus different from using jemalloc or tcmalloc? A: High-performance runtimes like LuaJIT typically use custom allocators that bypass standard system memory management. Swapping the system allocator does not reach LuaJIT’s internal allocator, which is where the fragmented free pages accumulate. LuaJIT-plus addresses reclamation inside LuaJIT’s own allocator.

About The Author

Yichun Zhang (GitHub handle: agentzh) is the original creator of the OpenResty® open-source project and the CEO of OpenResty Inc.

Yichun is one of the earliest advocates and leaders of “open-source technology”. He has worked at many internationally renowned tech companies, such as Yahoo! and Cloudflare. He is a pioneer of “edge computing”, “dynamic tracing” and “machine coding”, with over 22 years of programming and 16 years of open source experience. Yichun is well-known in the open-source space as the project leader of OpenResty®, adopted by more than 40 million global website domains.

OpenResty Inc., the enterprise software start-up founded by Yichun in 2017, has customers from some of the biggest companies in the world. Its flagship product, OpenResty XRay, is a non-invasive profiling and troubleshooting tool that significantly enhances and utilizes dynamic tracing technology. Its OpenResty Edge product is a powerful distributed traffic management and private CDN software product.

As an avid open-source contributor, Yichun has contributed more than a million lines of code to numerous open-source projects, including the Linux kernel, Nginx, LuaJIT, GDB, SystemTap, LLVM, and Perl. He has also authored more than 60 open-source software libraries.