-
Couldn't load subscription status.
- Fork 6.1k
8370572: Cgroups hierarchical memory limit is not honored after JDK-8322420 #28006
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
8370572: Cgroups hierarchical memory limit is not honored after JDK-8322420 #28006
Conversation
|
👋 Welcome back shade! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
Webrevs
|
|
Logs from local reproducer (see bug for details) -- asking to run with 1G in parent slice, and 25% of container memory as heap size. The goal is to have 256M heap size then. Current mainline (broken): This fix: |
|
ECS team also reports positive results with this patch. @jerboaa, take a look, please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert in this area, but after looking at JDK-8322420 and its two rather lengthy review threads for PR #17198 (which was abandoned) and PR #20646 which was finally approved and which contained the changes which led to this regression it seems that @jerboaa's comment in the first PR:
It is the hope to no longer needing to do this hierarchical look-up since we know where in the hierarchy we ought to look for the memory limit.
which referred to this code:
bool is_ok = reader()->read_numerical_key_value("/memory.stat", "hierarchical_memory_limit", &hier_memlimit);
if (!is_ok) {
return OSCONTAINER_ERROR;
}
log_trace(os, container)("Hierarchical Memory Limit is: " JULONG_FORMAT, hier_memlimit);
if (hier_memlimit < phys_mem) {
verbose_log(hier_memlimit, phys_mem);
return (jlong)hier_memlimit;
}
log_trace(os, container)("Hierarchical Memory Limit is: Unlimited");was a little bit too optimistic, and finally led to this regression.
I'm therefore inclined to approve this fix after sleeping on it one more night :)
Without a proper regression test this is bound to fall through the cracks again. So are you sure this cannot be tested? It should be fine if the test needs root privileges (we could skip it if not root). But it would be better than not having one. |
Yes, I tried to write a test, but it was not simple at all. AFAICS, you need to configure the host in a particular way to get to the interesting configuration, when part of hierarchy is hidden. So not only it would require root, it would also make changes to the host cgroup config (and properly revert them at the end of testing!). It would be better if we could come up with something like Docker-in-Docker kind of test, but that is probably a headache as well. Anyway, we are dealing with the real-world, customer-facing breakage here, so I reasoned it was unwise to delay the immediately deployable fix, just because it was unclear how to write a reliable regression test for it :) |
See the bug for more discussion.
We are seeing customer regressions in 21.0.9, notably on ECS Fargate. We root-caused it to JDK-8322420. That patch removed the handling of
hierarchical_memory_limit, look at this hunk.But at least cgroupv1 still needs them in some conditions, notably in ECS. There is a way to reproduce it with local Docker as well. The key is to set up host cgroup that would not be visible to the container, and so that the only way for container to know the memory limits would be to look into
hierarchical_*values that kernel computes itself.Unfortunately, it is not easy to revert the offending hunks from 21.0.9, as there were follow-up refactoring backports. So, to make it work, this PR reinstantiates the hunks using the new cgroups support code. It also makes code (subjectively) easier to read, and is in the spirit of past refactorings.
We are planning to pick this patch up to 21.0.9, at least into Corretto downstream as soon as possible to unbreak users. Therefore, the patch is also kept as crisp as possible.
I tried to come up with a regression test for it, but could not: local reproducers require amending host configuration, which requires superuser privileges, among other hassle it introduces.
Additional testing:
containers/passes on cgroupsv1 hostcontainers/passes on cgroupsv2 hostProgress
Issue
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28006/head:pull/28006$ git checkout pull/28006Update a local copy of the PR:
$ git checkout pull/28006$ git pull https://git.openjdk.org/jdk.git pull/28006/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28006View PR using the GUI difftool:
$ git pr show -t 28006Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28006.diff
Using Webrev
Link to Webrev Comment