Conversation

@andrewbranch (Member)

Fixes a race noticed by @sheetalkamat
Probably fixes #1983

@jakebailey (Member) left a comment

I never noticed that we had two things with nearly the same implementation

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR addresses race conditions in reference-counted cache implementations by adding checks for deleted entries and adjusting lock timing. The changes ensure that operations handle cases where entries are marked for deletion (refCount <= 0) between the time they're looked up and when their locks are acquired.

Key Changes

  • Added resurrection logic in Ref() methods to handle entries deleted while acquiring locks
  • Added recursive retry logic in loadOrStoreNewLockedEntry() for deleted entries
  • Moved mu.Unlock() in Deref() methods to after Delete() operations to prevent race conditions
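
For illustration, here is a minimal sketch of the Ref/Deref pattern these changes aim for. The names and types (cache, entry) are invented for the example and use a plain sync.Map; the real parsecache.go and extendedconfigcache.go differ in detail.

package cachesketch

import "sync"

// entry and cache are illustrative stand-ins, not the real cache types.
type entry struct {
	mu       sync.Mutex
	refCount int
}

type cache struct {
	entries sync.Map // string -> *entry
}

// Ref bumps the refCount for key, creating an entry if needed. If the
// loaded entry was deleted (refCount <= 0) between the Load and acquiring
// its lock, the lookup is retried instead of handing out a dead entry.
func (c *cache) Ref(key string) *entry {
	for {
		if v, ok := c.entries.Load(key); ok {
			e := v.(*entry)
			e.mu.Lock()
			if e.refCount <= 0 {
				// Entry was deleted while we were acquiring the lock; retry.
				e.mu.Unlock()
				continue
			}
			e.refCount++
			e.mu.Unlock()
			return e
		}
		e := &entry{refCount: 1}
		if _, loaded := c.entries.LoadOrStore(key, e); loaded {
			// Someone else stored an entry first; go back and ref theirs.
			continue
		}
		return e
	}
}

// Deref drops a reference and removes the entry once the count hits zero.
// The entry lock is released only after the map Delete, mirroring the
// unlock-ordering fix described above: a concurrent Ref that already loaded
// the entry will block on the lock and then observe refCount <= 0.
func (c *cache) Deref(key string) {
	v, ok := c.entries.Load(key)
	if !ok {
		return
	}
	e := v.(*entry)
	e.mu.Lock()
	e.refCount--
	if e.refCount <= 0 {
		c.entries.Delete(key)
	}
	e.mu.Unlock()
}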

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

Files reviewed:

  • internal/project/parsecache.go: Adds race condition handling for deleted parse cache entries and fixes unlock ordering in Deref
  • internal/project/extendedconfigcache.go: Adds race condition handling for deleted config cache entries and fixes unlock ordering in Deref

Comments suppressed due to low confidence (1)

internal/project/extendedconfigcache.go:1 (code context: package project)

  • This recursive retry could loop indefinitely under sustained contention. Consider adding a maximum retry count to prevent potential stack overflow or infinite loops in edge cases where entries are repeatedly deleted.

Comment on lines 106 to 110

if existing.refCount <= 0 {
	// Existing entry was deleted while we were acquiring the lock
	existing.mu.Unlock()
	return c.loadOrStoreNewLockedEntry(key)
}

Copilot AI Oct 30, 2025

This recursive retry pattern could potentially loop indefinitely if there's sustained contention where entries are constantly being deleted. Consider adding a retry limit or exponential backoff to prevent unbounded recursion in pathological cases.
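
As a rough illustration of the bounded-retry idea only (not the actual extendedconfigcache.go code, whose signature and contract may differ), the recursion could be flattened into a loop with a cap:

package cachesketch

import (
	"errors"
	"sync"
)

type entry struct {
	mu       sync.Mutex
	refCount int
}

type cache struct {
	entries sync.Map // string -> *entry
}

const maxRetries = 8

// loadOrStoreNewLockedEntry returns an entry for key with its mutex held,
// retrying in a bounded loop (rather than recursing) when the loaded entry
// turns out to have been deleted while its lock was being acquired.
func (c *cache) loadOrStoreNewLockedEntry(key string) (*entry, error) {
	for attempt := 0; attempt < maxRetries; attempt++ {
		fresh := &entry{refCount: 1}
		fresh.mu.Lock()
		v, loaded := c.entries.LoadOrStore(key, fresh)
		if !loaded {
			return fresh, nil // stored our new, already-locked entry
		}
		fresh.mu.Unlock()

		existing := v.(*entry)
		existing.mu.Lock()
		if existing.refCount > 0 {
			existing.refCount++
			return existing, nil // still alive; returned locked
		}
		// Existing entry was deleted while we were acquiring its lock;
		// Deref removes it from the map, so loop and try again.
		existing.mu.Unlock()
	}
	return nil, errors.New("loadOrStoreNewLockedEntry: gave up after repeated deletions")
}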

@andrewbranch (Member, Author)

Oh, it's blowing up because we never delete entries in our test suites since they're likely to come back

@jakebailey (Member)

Doesn't that imply we aren't testing this new code?

@andrewbranch (Member, Author)

Yes. AFAIK it's not possible to deterministically trigger the race, because there's no callback into test code that could occur between the sync map load and the mutex lock.
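
To make that window concrete: the race sits between c.entries.Load and entry.mu.Lock, and hitting it deterministically would require a seam like the hypothetical testHookAfterLoad sketched below, which the real cache does not have.

package cachesketch

import "sync"

type entry struct {
	mu       sync.Mutex
	refCount int
}

type cache struct {
	entries sync.Map // string -> *entry

	// testHookAfterLoad, if non-nil, runs between the map Load and the
	// entry lock acquisition. No such seam exists in the real cache,
	// which is why the deleted-entry branch can't be forced from a test.
	testHookAfterLoad func()
}

// tryRef reports whether key could be referenced; it takes the deleted-entry
// branch only if a concurrent Deref lands inside the Load-to-Lock window.
func (c *cache) tryRef(key string) bool {
	v, ok := c.entries.Load(key)
	if !ok {
		return false
	}
	e := v.(*entry)
	if c.testHookAfterLoad != nil {
		c.testHookAfterLoad() // a test could run Deref here deterministically
	}
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.refCount <= 0 {
		// Entry was deleted while we were acquiring the lock.
		return false
	}
	e.refCount++
	return true
}

Without such a hook, the branch is only reachable when a concurrent Deref happens to land in that window.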

if entry, ok := c.entries.Load(path); ok {
	entry.mu.Lock()
	if entry.refCount <= 0 {
		// Entry was deleted while we were acquiring the lock

Member

Isn't it possible that the entry got deleted before the call to c.entries.Load(path) as well?

Development

Successfully merging this pull request may close these issues.

textDocument/diagnostic failed & panic
