Conversation

@konstin (Member) commented Oct 17, 2025

When a process is running and another calls `uv cache clean` or `uv cache prune`, we currently deadlock, sometimes until the CI timeout (astral-sh/setup-uv#588). To avoid this, we add a default 5 min timeout when waiting for a lock. 5 min balances allowing in-progress builds to finish, especially with larger native dependencies, while still giving timely errors for genuine deadlocks on (remote) systems.

Commit 1 is a refactoring.

@konstin added the `bug` label on Oct 17, 2025
@konstin force-pushed the konsti/locked-file-timeout branch from 2287b53 to 2a5b779 on October 17, 2025
```rust
    timeout: Duration,
) -> Option<Output> {
    let (sender, receiver) = std::sync::mpsc::channel();
    thread::spawn(move || {
```
@konstin (Member, Author):

This should happen rarely and already involves waiting, so we can spawn a thread. I quickly looked into making it generally async but it didn't seem worth the churn.
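For reference, a minimal sketch of what such a thread-based helper can look like (the exact body in the PR may differ; the generic bound and channel choice here are assumptions):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Run a blocking closure on a background thread, giving up after `timeout`.
fn run_with_timeout<Output: Send + 'static>(
    f: impl FnOnce() -> Output + Send + 'static,
    timeout: Duration,
) -> Option<Output> {
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // If the receiver already timed out and was dropped, the send fails;
        // the detached thread just exits in that case.
        let _ = sender.send(f());
    });
    receiver.recv_timeout(timeout).ok()
}
```

Note that the spawned thread keeps blocking on the lock even after the timeout fires; it is simply abandoned.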

A maintainer (Member) replied:

Hm, we only have a couple of calls to the blocking/sync versions of the lock APIs. I'd be tempted to make them async.

I wonder if we should move this timeout handling to the API functions, so we can use `run_with_timeout` in the blocking/sync versions and just use an async timeout in the async versions? I'm wary of spawning a thread just for a timeout in the async case.
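For comparison, the async side needs no extra thread at all; a minimal sketch, assuming a tokio runtime and a generic acquisition future:

```rust
use std::future::Future;
use std::time::Duration;

/// Bound any async lock acquisition by a timeout; no thread is spawned.
async fn acquire_with_timeout<F, T>(acquire: F, timeout: Duration) -> Option<T>
where
    F: Future<Output = T>,
{
    tokio::time::timeout(timeout, acquire).await.ok()
}
```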

Comment on lines +15 to +35
```rust
/// Parsed value of `UV_LOCK_TIMEOUT`, with a default of 5 min.
static LOCK_TIMEOUT: LazyLock<Duration> = LazyLock::new(|| {
    let default_timeout = Duration::from_secs(300);
    let Some(lock_timeout) = env::var_os(EnvVars::UV_LOCK_TIMEOUT) else {
        return default_timeout;
    };

    if let Some(lock_timeout) = lock_timeout
        .to_str()
        .and_then(|lock_timeout| lock_timeout.parse::<u64>().ok())
    {
        Duration::from_secs(lock_timeout)
    } else {
        warn!(
            "Could not parse value of {} as integer: {:?}",
            EnvVars::UV_LOCK_TIMEOUT,
            lock_timeout
        );
        default_timeout
    }
});
```
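For illustration, the parsed value would then bound the wait wherever a lock is taken. A sketch of one possible call site, assuming the `run_with_timeout` helper above and an flock-style blocking API such as `fs2::FileExt::lock_exclusive` (the real call sites in uv may look different):

```rust
use std::fs::File;
use std::path::Path;

use fs2::FileExt;

/// Hypothetical call site: give up instead of deadlocking when another
/// process holds the lock for longer than the configured timeout.
fn lock_file_blocking(path: &Path) -> Option<File> {
    let path = path.to_path_buf();
    run_with_timeout(
        move || {
            let file = File::create(&path).expect("failed to open lock file");
            // Blocks until the exclusive lock is acquired.
            file.lock_exclusive().expect("failed to acquire lock");
            file
        },
        // `Duration` is `Copy`, so the `LazyLock` can be dereferenced here.
        *LOCK_TIMEOUT,
    )
}
```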
A maintainer (Member) commented:

It'd be nice to move this to our standard environment variable parsing in `EnvironmentOptions` instead; I don't want to keep adding ad-hoc parsing like this.

If you want to defer it to reduce churn, that's okay, but we should add it to the tracking issue and make sure it's moved.

@konstin (Member, Author) replied on Oct 22, 2025:

I've added it to #14720; do you want a separate tracking issue?

I had looked into parsing this centrally, but the locks are called in a lot of locations, including e.g. a `LazyLock` in a `Default` impl (at `match TextCredentialStore::read(&path) {`).

```rust
#[derive(Debug, Error)]
pub enum LockedFileError {
    #[error(
        "Timeout ({}s) when waiting for lock on `{}` at `{}`, is another uv process running? Set `{}` to increase the timeout.",
```
A maintainer (Member) commented:

We might want to say "You can set ... to increase the timeout" instead of "Set", which makes it sound like you should do that as the solution.

Comment on lines 239 to 273
```rust
// Write a test package that builds for a while
let child_pyproject_toml = context.temp_dir.child("pyproject.toml");
child_pyproject_toml.write_str(indoc! {r#"
    [project]
    name = "child"
    version = "0.1.0"
    requires-python = ">=3.9"
    [build-system]
    requires = []
    backend-path = ["."]
    build-backend = "build_backend"
"#})?;

// File to wait until the lock is acquired from starting the build.
let ready_file = context.temp_dir.child("ready_file.txt");
let build_backend = context.temp_dir.child("build_backend.py");
build_backend.write_str(&formatdoc! {r#"
    import time
    from pathlib import Path
    Path(r"{}").touch()
    # Make the test fail quickly if something goes wrong
    time.sleep(10)
    "#,
    // Don't run tests in directories with double quotes, please.
    ready_file.display(),
})?;

let mut child = context.pip_install().arg(".").spawn()?;

// Wait until we've acquired the lock in the first process.
while !ready_file.exists() {
    std::thread::sleep(std::time::Duration::from_millis(1));
}
```
A maintainer (Member) commented:

I think this is more complicated than it needs to be. We can just do

```rust
let _cache = uv_cache::Cache::from_path(context.cache_dir.path()).with_exclusive_lock();
```

@zanieb (Member) left a review:

#16342 (comment) is my main remaining caveat.

We should probably also add a note in https://docs.astral.sh/uv/concepts/cache since that's the main place this will be relevant.

@zanieb added the `enhancement` label and removed the `bug` label on Oct 22, 2025
@zanieb (Member) commented Oct 22, 2025

On the timing, I guess I might expect something like 60s rather than 5m? 5m is nice and conservative though; we could reduce it later once we see that 5m doesn't break anything.

```diff
 ) -> anyhow::Result<Vec<PathBuf>> {
-    let cache = Cache::from_path(temp_dir.child("cache").to_path_buf()).init()?;
+    let cache = Cache::from_path(temp_dir.child("cache").to_path_buf())
+        .init_no_wait()?
```
@konstin (Member, Author):

That's a somewhat riskier change, because it assumes tests don't lock or spawn something in the background and then operate on Python versions.

@konstin (Member, Author):

The alternative is making every integration test async.

@konstin force-pushed the konsti/locked-file-timeout branch from e15037e to 762fbf6 on October 27, 2025
@konstin (Member, Author) commented Oct 27, 2025

I rewrote it to be entirely async and removed the duplication between the sync and async variants, as well as between shared and exclusive locks.

> On the timing, I guess I might expect something like 60s rather than 5m? 5m is nice and conservative though; we could reduce it later once we see that 5m doesn't break anything.

I can see some builds (e.g. Rust) taking >60s, so I'd like to go with a higher timeout.
