Scaling of PHP Application in Kubernetes Based on FPM Workers Utilization

Roman Ushakov
6 min read · Sep 27, 2023

Running your application in Kubernetes offers numerous advantages, one of the most significant of which is the HorizontalPodAutoscaler (HPA). This powerful tool automatically scales your application horizontally based on metrics. Out of the box, however, Kubernetes provides only resource utilization metrics, such as CPU utilization and memory consumption. While these metrics are invaluable in many scenarios, there are situations where they fall short. In this article, we delve into the need for custom metrics and explore how to leverage them to scale PHP applications effectively in Kubernetes.

When Do You Need Custom Metrics?

Custom metrics in Kubernetes become essential in scenarios where resource utilization metrics like CPU and memory are insufficient to make accurate scaling decisions. One such scenario arises when running PHP applications in Kubernetes.

In today’s Kubernetes landscape, the most popular way to run PHP is by using a combination of PHP-FPM and Nginx containers.

To understand why custom metrics are necessary in PHP applications, let’s take a closer look at how PHP-FPM operates within Kubernetes.

PHP-FPM runs a pool of workers under a single master process; each worker can process one request at a time, and requests that arrive while all workers are busy wait in a queue. A typical pool has 10–20 workers. As traffic grows, CPU or memory consumption also increases, leading to HPA adding new pods to your deployment.
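For context, the pool size and the status endpoint (which we will need later for metrics) are controlled by the pm.* directives in the FPM pool configuration. A minimal sketch with illustrative values, not a recommendation:

; www.conf — illustrative pool settings
pm = dynamic
pm.max_children = 10         ; hard cap on concurrent workers
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 6
pm.status_path = /fpm_status ; exposes the status page an exporter can scrape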

While resource-based scaling works well in many cases, it may not address a crucial challenge: external requests to other services. Consider the following scenario:

Your PHP application relies on external HTTP calls to other services, for example to validate or check authorization tokens. At some point, the remote service starts responding slowly, taking 5–10 seconds or more to reply. When a worker in your PHP-FPM pool makes such an external call, it waits for the response. While waiting, the worker consumes almost no CPU or memory, yet it remains blocked and cannot process any other requests. Eventually all your workers can end up occupied waiting for responses, and every new incoming request, even one unrelated to the external service, gets stuck in the queue. This cascade effect brings down the entire application, all because of an issue in one component!

How To Fix This?

The problem is that CPU and memory utilization remain below the scaling threshold even though the workers can't process any new requests. The most obvious solution, aside from adding timeouts to your external calls (sketched below), is to scale your application based on the number of free workers.
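On the timeout side, a minimal sketch, assuming the application uses Guzzle; the endpoint URL is hypothetical, and any HTTP client with timeout support works the same way:

<?php
// Illustrative only: cap how long a worker may stay blocked on a remote call.
$client = new \GuzzleHttp\Client([
    'connect_timeout' => 1.0, // seconds to establish the connection
    'timeout'         => 3.0, // seconds for the whole request/response cycle
]);
$response = $client->get('https://auth.example.com/check-token'); // hypothetical endpoint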

Export PHP-FPM metrics

Use the php-fpm-exporter or a similar tool to scrape and expose relevant PHP-FPM metrics.

Two of the exported metrics are of particular interest:

  • phpfpm_active_processes — the number of active processes (workers that are processing a request at this moment)
  • phpfpm_total_processes — the total number of processes (active + idle workers)

By calculating the utilization of workers using these metrics (active processes divided by total processes), you can precisely determine the need for scaling.
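In Prometheus terms, per-pod worker utilization is simply the ratio of the two series (a sketch; the exact label names depend on your scrape configuration):

# Fraction of busy workers per pod; 1.0 means the pool is saturated
sum(phpfpm_active_processes) by (pod)
/
sum(phpfpm_total_processes) by (pod)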

Here’s an example of a scraper container that can be added to your pod:

php-fpm-exporter:
  image: 'hipages/php-fpm_exporter:2.0.3'
  env:
    - name: PHP_FPM_WEB_LISTEN_ADDRESS
      value: ":{{ .Values.services.TCP.prometheusPort }}"
    - name: PHP_FPM_SCRAPE_URI
      value: "unix:/run/php/php-fpm.sock;/fpm_status"
    - name: PHP_FPM_FIX_PROCESS_COUNT
      value: "true"
  ports:
    - containerPort: '{{ .Values.services.TCP.prometheusPort }}'
  resources:
    requests:
      cpu: '100m'
      memory: '64M'
    limits:
      cpu: '150m'
      memory: '128M'
  volumeMounts:
    - name: '{{ (index .Values.volumes 0).name }}'
      mountPath: '/run/php'
  restartPolicy: Always
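If your Prometheus discovers scrape targets through pod annotations (a common convention, though your setup may differ), the exporter port also needs to be announced on the pod, along these lines:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "{{ .Values.services.TCP.prometheusPort }}"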

Configure Prometheus Adapter

To make these metrics available for HorizontalPodAutoscaler (HPA) in Kubernetes, you need to configure the Prometheus Adapter. Create a rule for Prometheus Adapter to expose the custom metric for HPA. Here’s an example configuration for the rule:

- seriesQuery: 'phpfpm_active_processes{namespace!="",pod_name!=""}'
  resources:
    overrides:
      namespace:
        resource: namespace
      pod_name:
        resource: pod
  name:
    matches: "phpfpm_active_processes"
    as: "phpfpm_active_processes_utilization"
  metricsQuery: avg(phpfpm_active_processes{<<.LabelMatchers>>} / phpfpm_total_processes{<<.LabelMatchers>>}) by (<<.GroupBy>>)
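Once the adapter has picked up the rule, you can verify that the metric is served by the custom metrics API (replace the namespace with your own):

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/phpfpm_active_processes_utilization"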

Use the Custom Metric in HPA

Now that you have exposed the phpfpm_active_processes_utilization metric, you can use it in your HPA configuration.

Define an HPA rule that utilizes this metric to determine when to scale your PHP application:

- type: Pods
  pods:
    metric:
      name: phpfpm_active_processes_utilization
    target:
      type: AverageValue
      averageValue: 0.7

In this example, the HPA will add pods when, on average, more than 70% of all FPM workers are busy processing requests.
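For reference, here is how that snippet fits into a complete autoscaling/v2 HPA manifest (a sketch; the Deployment name and replica bounds are placeholders):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-app        # placeholder: your PHP deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Pods
      pods:
        metric:
          name: phpfpm_active_processes_utilization
        target:
          type: AverageValue
          averageValue: 0.7   # 70% of workers busy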

Does It Actually Work?

To validate the effectiveness of the custom metric-based scaling configuration, a real-world stress test was conducted. The scenario emulated a situation where a remote API responded in 2 seconds (rather than the expected 200ms). The application was subjected to stress testing with up to 50 simultaneous users, while the HorizontalPodAutoscaler (HPA) was configured with a maximum of 5 replicas, each having 10 PHP-FPM workers.

Baseline Scenario (Without Custom Metrics)

  • Median response time increased from 2 seconds to a staggering 12 seconds over time.
  • HPA did not scale the application effectively.
  • The application spent a significant portion of its time waiting for responses rather than executing logic and utilizing CPU or memory resources.

Custom Metric-Based Configuration

  • Response times consistently remained around 2–3 seconds.
  • The application dynamically scaled up to 5 pods as needed.

The alternative configuration, which relied on custom metrics, demonstrated its effectiveness in addressing the challenge of PHP-FPM workers being blocked while awaiting responses from external services. The findings from this test scenario highlight several important points:

  • The custom metric-based approach significantly improved application responsiveness.
  • Scaling based on the number of free workers prevented all workers from becoming blocked, ensuring that requests unrelated to the slow external service did not degrade.

Conclusions

Implementing custom metrics in Kubernetes is a game-changer for optimizing the scaling of any kind of application and addressing specific challenges. Custom metrics provide the granularity needed to make informed scaling decisions, ensuring your application remains responsive and resilient, even in complex real-world scenarios.

The approach we described above for the pool of PHP-FPM workers can also be applied elsewhere, for example to scale message broker consumers based on queue size (e.g. RabbitMQ), as sketched below.
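As a sketch of that idea, assuming a RabbitMQ exporter already publishes a rabbitmq_queue_messages_ready series, an analogous Prometheus Adapter rule could look like this:

- seriesQuery: 'rabbitmq_queue_messages_ready{namespace!="",queue!=""}'
  resources:
    overrides:
      namespace:
        resource: namespace
  name:
    matches: "rabbitmq_queue_messages_ready"
    as: "rabbitmq_queue_depth"
  metricsQuery: sum(rabbitmq_queue_messages_ready{<<.LabelMatchers>>}) by (<<.GroupBy>>)

Consumers could then scale on queue depth via an Object or External metric in the HPA, just as the worker pool scales on utilization.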


The author has used AI tools for writing this article. Although the initial idea and structure were their own, inspired by real-life experience of working in IT, AI was used to produce grammatically correct text with varied vocabulary.
