
Conversation

@aemerson
Contributor

This commit introduces a new --drop-exec option that allows users to
drop the first N execution samples when running with --exec-multisample.
This is useful for mitigating warmup effects that can skew performance
measurements.

The option accepts an integer N specifying how many initial samples to
drop, and works with all execution modes (normal, --exec, and
--exec-interleaved-builds).

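Conceptually, the option amounts to slicing the warmup runs off the head of the collected sample list before aggregation. A minimal sketch of that behavior (the helper name and error handling are illustrative, not the patch's actual code):

def drop_warmup_samples(samples, drop_exec):
    """Discard the first drop_exec execution samples so that warmup
    effects do not skew the aggregated statistics."""
    if drop_exec >= len(samples):
        raise ValueError("--drop-exec would discard every sample")
    return samples[drop_exec:]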
@aemerson marked this pull request as ready for review October 25, 2025 05:14
Contributor Author

aemerson commented Oct 25, 2025

This stack of pull requests is managed by Graphite.

@aemerson requested a review from ldionne October 25, 2025 05:41
Member

@ldionne left a comment


General question: one could argue that the handling of warmups should be done at the benchmark level. For example, Google Benchmark does this: it discards some number of warmup runs and then iterates until the result is stable. That way, when you run a benchmark, you get a result that is immediately usable.

Is that not the case for the benchmarks that are part of the LLVM test suite? Do you see an actual difference between results with and without dropping "warmup" runs?
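For reference, a harness-level version of the policy described above might look like the following Python sketch; the function names and the stability criterion are illustrative assumptions, not Google Benchmark's actual algorithm:

import statistics

def measure_until_stable(run_once, warmup=2, min_samples=5,
                         max_samples=50, rel_tolerance=0.02):
    """Throw away `warmup` runs, then keep sampling until the relative
    standard deviation of the timings drops below `rel_tolerance`."""
    for _ in range(warmup):
        run_once()  # warmup runs; results are discarded
    samples = []
    while len(samples) < max_samples:
        samples.append(run_once())
        if len(samples) >= min_samples:
            mean = statistics.mean(samples)
            if mean > 0 and statistics.stdev(samples) / mean < rel_tolerance:
                break
    return samples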

# Check for incompatible options
if opts.only_compile:
    self._fatal("--drop-exec cannot be used with --only-compile")
if opts.build:
Member

I don't think --build means that we're necessarily skipping the tests, right? Can't you have both --build and --exec, in which case it would make sense to have --build and --drop-exec at the same time?

Contributor Author

I think --build combined with --exec is redundant as a concept, because running both phases is the default behavior; these options are only useful as stopping points.

That said, there's an inconsistency that I don't think the current implementation resolves: --exec (née --test-prebuilt) is supposed to skip the preceding steps, which is fine, but --build doesn't skip the --configure phase (and making it do so seems like bad UX).

@aemerson
Contributor Author

> General question: one could argue that the handling of warmups should be done at the benchmark level. For example, Google Benchmark does this: it discards some number of warmup runs and then iterates until the result is stable. That way, when you run a benchmark, you get a result that is immediately usable.
>
> Is that not the case for the benchmarks that are part of the LLVM test suite? Do you see an actual difference between results with and without dropping "warmup" runs?

For some benchmark suites that are plugged into the test-suite as an "external" suite, the current best practice is to drop the first run. I don't have evidence of whether this makes a meaningful difference, but it is the recommendation I'm hearing.

Intuitively it makes some sense: for example, when I run some industry-standard benchmarks I build with high parallelism (e.g. --build-threads 10) and then execute serially (-j1), so there's a chance that during the first execution the machine is still thermally recovering from the build phase.
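As a concrete illustration, that workflow might be invoked roughly like this (the runner name is a placeholder and the flag values are arbitrary; only --build-threads, -j, --exec-multisample, and --drop-exec are taken from this discussion):

<runner> --build-threads 10 -j1 --exec-multisample 5 --drop-exec 1

i.e. build in parallel, execute serially, take multiple samples, and discard the first as warmup.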
