OpenShift container running out of PIDs all the time


My Java application, deployed on OpenShift, failed spectacularly. After humming along happily for a good while, it got completely stuck, botching the standard-out stream in the process and leaving me in the dark without even a log line to tell me why Maven was hanging.

It turned out that my JVM eventually failed to create new threads: OpenShift had created a cgroup with the maximum number of PIDs set to 1024. Each native thread needs its own PID, so the JVM bails out once it hits that limit:

$ oc exec -ti dc/my-app bash
1001480000@my-app-n8shg-8lggr:/$ cat /sys/fs/cgroup/pids/pids.max
1024
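
To see what that looks like from the JVM's side, here is a minimal sketch (not my-app's code, just an illustration) that keeps starting threads which block forever. Run it in a container where pids.max is 1024 and Thread.start() eventually throws an OutOfMemoryError complaining that it cannot create another native thread:

import java.util.concurrent.CountDownLatch;

// Illustration only: exhaust the PID budget by starting threads that never exit.
public class PidLimitDemo {
    public static void main(String[] args) {
        CountDownLatch blockForever = new CountDownLatch(1);
        int started = 0;
        try {
            while (true) {
                Thread t = new Thread(() -> {
                    try {
                        blockForever.await(); // keep the thread (and its PID) alive
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                t.start(); // fails with OutOfMemoryError once no native thread can be created
                started++;
            }
        } catch (OutOfMemoryError e) {
            System.out.println("Gave up after " + started + " threads: " + e.getMessage());
        }
    }
}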

Watching the JUnit tests run as /sys/fs/cgroup/pids/pids.current creeps towards 1024 beats anything Netflix has on offer.

The aforementioned cgroup is created by a Kubernetes feature called SupportPodPidsLimit, which is off by default in Kubernetes and on by default in OpenShift.

The reason so many threads were created was that the java.net.http.HttpClient instances couldn't release their threads fast enough to stay below the 1024 boundary set by OpenShift (CRI-O). In my case, the fix was to re-use the clients: my-app now has one dedicated client for unauthenticated requests and one HTTP client per user/pass combination. With these changes, my-app never exceeds 61 native threads on OpenShift.
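
The actual my-app code isn't shown here, but the re-use boils down to something like the sketch below: one shared java.net.http.HttpClient for unauthenticated requests and a cached client per user/pass combination. Class and method names are made up for illustration:

import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.net.http.HttpClient;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustration only: hand out shared HttpClient instances instead of
// building a new one (with its own thread pool) for every request.
public final class HttpClients {

    // Single client for requests that need no credentials.
    private static final HttpClient ANONYMOUS = HttpClient.newHttpClient();

    // One client per user/pass combination, created lazily and then re-used.
    // A real implementation should not keep raw passwords in map keys.
    private static final Map<String, HttpClient> PER_CREDENTIALS = new ConcurrentHashMap<>();

    private HttpClients() {
    }

    public static HttpClient anonymous() {
        return ANONYMOUS;
    }

    public static HttpClient forUser(String user, String password) {
        return PER_CREDENTIALS.computeIfAbsent(user + ':' + password, key ->
            HttpClient.newBuilder()
                .authenticator(new Authenticator() {
                    @Override
                    protected PasswordAuthentication getPasswordAuthentication() {
                        return new PasswordAuthentication(user, password.toCharArray());
                    }
                })
                .build());
    }
}

Since each HttpClient has its own selector and worker threads, handing out the same instance for the same credentials keeps the thread count small and stable instead of growing with the number of requests.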

