@fabiovincenzi
Contributor

@fabiovincenzi fabiovincenzi commented Oct 13, 2025

Issue: #1207

Summary

Two-tier cache system for faster git operations: persistent bare repos + ephemeral working copies.

Architecture

  • .remote/cache/ - Shared bare repositories (persistent)
  • .remote/work/<push-id>/ - Per-push working copies (temporary)
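As a rough illustration of the two directories above, the sketch below derives both paths for a push. The helper names (`bareRepoPath`, `workingCopyPath`, `assertSafeSegment`), the `.git` suffix on cached repositories, and the character restriction on names are assumptions for illustration, not the PR's actual code.

```typescript
import * as path from 'node:path';

const REMOTE_ROOT = './.remote';

// Assumption: repo names and push IDs are limited to safe characters so they
// cannot escape their parent directory (for example via "../").
function assertSafeSegment(segment: string): void {
  if (!/^[A-Za-z0-9._-]+$/.test(segment) || segment === '.' || segment === '..') {
    throw new Error(`Unsafe path segment: ${segment}`);
  }
}

// Persistent tier: one shared bare repository per upstream repo.
function bareRepoPath(repoName: string): string {
  assertSafeSegment(repoName);
  return path.join(REMOTE_ROOT, 'cache', `${repoName}.git`);
}

// Ephemeral tier: one working copy per push, removed once the push completes.
function workingCopyPath(pushId: string): string {
  assertSafeSegment(pushId);
  return path.join(REMOTE_ROOT, 'work', pushId);
}
```

Keeping the per-push directory keyed only by the push ID means concurrent pushes never share a working tree, while all of them can clone from the same bare repository.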

Configuration

Add to proxy.config.json:

{
  "cache": {
    "maxSizeGB": 2,
    "maxRepositories": 50,
    "cacheDir": "./.remote/cache"
  }
}
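A minimal sketch of how this block might be typed and validated on load. The `CacheConfig` interface, `resolveCacheConfig`, and `maxSizeBytes` are hypothetical names, and treating the example values above as defaults is an assumption.

```typescript
// Assumed shape, mirroring the JSON keys in the example config.
interface CacheConfig {
  maxSizeGB: number;
  maxRepositories: number;
  cacheDir: string;
}

// Assumption: the example values double as defaults.
const DEFAULT_CACHE_CONFIG: CacheConfig = {
  maxSizeGB: 2,
  maxRepositories: 50,
  cacheDir: './.remote/cache',
};

// Merges user-supplied values over the defaults and rejects nonsensical limits.
function resolveCacheConfig(partial: Partial<CacheConfig> = {}): CacheConfig {
  const config: CacheConfig = { ...DEFAULT_CACHE_CONFIG, ...partial };
  if (!Number.isFinite(config.maxSizeGB) || config.maxSizeGB < 0) {
    throw new Error('cache.maxSizeGB must be a non-negative number');
  }
  if (!Number.isInteger(config.maxRepositories) || config.maxRepositories < 0) {
    throw new Error('cache.maxRepositories must be a non-negative integer');
  }
  return config;
}

// Converts the configured limit to bytes so size checks need no rounding.
function maxSizeBytes(config: CacheConfig): number {
  return config.maxSizeGB * 1024 ** 3;
}
```

Allowing `0` for either limit keeps the option of configuring a cache that evicts everything, which is useful in tests.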

Security

  • Each push gets a unique working copy (.remote/work/<push-id>/)
  • Bare cache contains only git objects (no user files, no credentials)
  • sanitizeRepositoryConfig() immediately removes credentials from git config
  • Working copies deleted after push completes
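To make the credential point concrete, here is a small sketch of stripping userinfo from a remote URL before it is written anywhere. `stripCredentials` is a hypothetical helper, not the PR's `sanitizeRepositoryConfig()`, whose implementation is not shown in this conversation.

```typescript
// Removes any username/password embedded in an http(s) remote URL.
// Assumption: remotes are full URLs; scp-style remotes such as
// "git@host:org/repo.git" are not valid URLs and will make `new URL` throw.
function stripCredentials(remoteUrl: string): string {
  const url = new URL(remoteUrl);
  url.username = '';
  url.password = '';
  return url.toString();
}

const cleaned = stripCredentials('https://user:token@github.com/org/repo.git');
console.log(cleaned); // https://github.com/org/repo.git
```

Letting the function throw on unparseable input is deliberate in this sketch: silently passing an unrecognized remote through would defeat the purpose of sanitizing.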

@netlify

netlify bot commented Oct 13, 2025

Deploy Preview for endearing-brigadeiros-63f9d0 canceled.

Name Link
🔨 Latest commit 133e5e6
🔍 Latest deploy log https://app.netlify.com/projects/endearing-brigadeiros-63f9d0/deploys/68ff7d335b586300080bbc85

@codecov

codecov bot commented Oct 13, 2025

Codecov Report

❌ Patch coverage is 83.24873% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.66%. Comparing base (6f56f15) to head (627137b).

Files with missing lines                              Patch %   Lines
src/proxy/processors/push-action/clearBareClone.ts    37.50%    9 Missing and 1 partial ⚠️
src/proxy/processors/push-action/git-operations.ts    81.63%    6 Missing and 3 partials ⚠️
src/proxy/processors/push-action/pullRemote.ts        85.41%    7 Missing ⚠️
src/proxy/processors/push-action/cache-manager.ts     90.62%    6 Missing ⚠️
src/config/index.ts                                   83.33%    0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1246      +/-   ##
==========================================
+ Coverage   82.64%   82.66%   +0.02%     
==========================================
  Files          70       73       +3     
  Lines        3007     3185     +178     
  Branches      501      537      +36     
==========================================
+ Hits         2485     2633     +148     
- Misses        419      444      +25     
- Partials      103      108       +5     

@fabiovincenzi fabiovincenzi marked this pull request as ready for review October 22, 2025 14:13
Contributor

@06kellyjac 06kellyjac left a comment

I'm looking forward to faster pulls!

Caches are a common place for logic or security problems so we may want added scrutiny and testing for this PR to be extra safe :)

I've added some thoughts, but I haven't run this locally just yet.

Comment on lines 10 to 22
if (process.env.NODE_ENV === 'test') {
  // TEST: Full cleanup (bare cache + all working copies)
  try {
    if (fs.existsSync('./.remote')) {
      fs.rmSync('./.remote', { recursive: true, force: true });
      step.log('Test environment: Full .remote directory cleaned');
    } else {
      step.log('Test environment: .remote directory already clean');
    }
  } catch (err) {
    step.log(`Warning: Could not clean .remote directory: ${err}`);
  }
} else {
Contributor

I feel a safer option, and one that keeps what we're testing closer to prod, would be either to do separate cleanup in the tests as necessary, or to adjust the config to a much lower max (or even 0) so we can see whether CacheManager itself is doing the cleanup.
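The suggestion above can be sketched with a pure eviction-selection function that a test drives with very low limits. `CachedRepo`, `CacheLimits`, and `selectEvictions` are hypothetical names for illustration; the real CacheManager API may differ.

```typescript
interface CachedRepo {
  name: string;
  sizeBytes: number;
  lastAccessed: Date;
}

interface CacheLimits {
  maxSizeBytes: number;
  maxRepositories: number;
}

// Returns the names of repositories to evict, least recently used first,
// stopping as soon as both limits are satisfied.
function selectEvictions(repos: CachedRepo[], limits: CacheLimits): string[] {
  const oldestFirst = [...repos].sort(
    (a, b) => a.lastAccessed.getTime() - b.lastAccessed.getTime(),
  );
  let remainingCount = oldestFirst.length;
  let remainingBytes = oldestFirst.reduce((sum, repo) => sum + repo.sizeBytes, 0);
  const evicted: string[] = [];

  for (const repo of oldestFirst) {
    const withinLimits =
      remainingCount <= limits.maxRepositories && remainingBytes <= limits.maxSizeBytes;
    if (withinLimits) break;
    evicted.push(repo.name);
    remainingCount -= 1;
    remainingBytes -= repo.sizeBytes;
  }
  return evicted;
}

// With maxRepositories set to 0, every cached repo is selected for eviction,
// so a test exercises the same cleanup path production would use.
const repos: CachedRepo[] = [
  { name: 'old', sizeBytes: 10, lastAccessed: new Date(1_000) },
  { name: 'new', sizeBytes: 10, lastAccessed: new Date(2_000) },
];
console.log(selectEvictions(repos, { maxSizeBytes: 100, maxRepositories: 0 }));
```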

Comment on lines 92 to 95
// Sort repositories by last accessed (oldest first for removal)
const reposToEvaluate = [...stats.repositories].sort(
  (a, b) => a.lastAccessed.getTime() - b.lastAccessed.getTime(),
);
Contributor

Nice

Contributor

I can't remember if we can use toSorted yet. If we still can't, a comment to use toSorted in the future would help us find this later :)

Contributor Author

I think it should be available from Node 20, so yes, I'll use it.
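For reference, `Array.prototype.toSorted` returns a sorted copy and leaves the original array untouched, which replaces the spread-then-sort pattern. The sketch below uses it when the runtime provides it and falls back to copy-and-sort otherwise; `RepoStat` and `oldestFirst` are hypothetical names, and the cast only exists so the snippet compiles without an ES2023 `lib` setting.

```typescript
interface RepoStat {
  name: string;
  lastAccessed: Date;
}

type SortedCopy<T> = (this: T[], compare: (a: T, b: T) => number) => T[];

// Returns a new array ordered by last access time, oldest first.
function oldestFirst(repos: RepoStat[]): RepoStat[] {
  const byAccessTime = (a: RepoStat, b: RepoStat): number =>
    a.lastAccessed.getTime() - b.lastAccessed.getTime();

  // Look the method up dynamically so older runtimes take the fallback path.
  const toSorted = (repos as unknown as { toSorted?: SortedCopy<RepoStat> }).toSorted;
  return typeof toSorted === 'function'
    ? toSorted.call(repos, byAccessTime)
    : [...repos].sort(byAccessTime);
}
```

Once the minimum supported Node version guarantees `toSorted`, the whole body can collapse to `repos.toSorted(byAccessTime)`.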

return 0;
}

return Math.round(totalBytes / (1024 * 1024)); // Convert to MB
Contributor

Suggested change
- return Math.round(totalBytes / (1024 * 1024)); // Convert to MB
+ return Math.ceil(totalBytes / (1024 * 1024)); // Convert to MB

Math.round can round down, which would hide up to 511 KB. Not the end of the world, but it's safest to always round up rather than be shortchanged.
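A quick worked example of the difference, using an arbitrary size of 10 MB plus 300 KB (the variable names are illustrative only):

```typescript
const BYTES_PER_MB = 1024 * 1024;
const totalBytes = 10 * BYTES_PER_MB + 300 * 1024; // 10 MB + 300 KB

// 10.29... rounds to the nearest integer, so the extra 300 KB disappears.
const rounded = Math.round(totalBytes / BYTES_PER_MB); // 10

// Rounding up never under-reports the space actually used.
const ceiled = Math.ceil(totalBytes / BYTES_PER_MB); // 11

console.log({ rounded, ceiled });
```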

Contributor Author

There's no conversion anymore since we are using bytes everywhere

Contributor

@jescalada jescalada left a comment

Just a code review for now - I'll be testing the code later!

I was wondering if you could do some profiling to verify/document the speed improvements with this PR. Would be nice to try it out with massive repos (for example https://github.com/backstage/backstage which is >10GB) to see the difference.

I'll try to do some profiling soon as well 🙂

}

export class CacheManager {
  private cacheDir: string;
Contributor

Is this cacheDir related to the cachedir library in package.json or the cacheDir used in the ConfigLoader? If not, we might want to pick a different name to disambiguate them.

Contributor

I'm wondering what would happen if two pulls happen simultaneously and execute touchRepository or enforceLimits at the same time - wouldn't this cause some kind of race condition? Especially with cache statistics in the mix, we wouldn't be able to accurately pick up the latest accessed repository.
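One possible way to address this concern is to serialize cache bookkeeping per repository with an in-process mutex. The sketch below is an assumption about how that could look, not something the PR contains; `KeyedMutex` and `runExclusive` are hypothetical names, and it only protects a single Node process (multiple proxy instances sharing a cache directory would need file-based locking instead).

```typescript
// Serializes async tasks that share a key; tasks with different keys still
// run concurrently.
class KeyedMutex {
  private tails = new Map<string, Promise<void>>();

  async runExclusive<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();

    // `current` settles when this task finishes; later callers wait on it.
    let release!: () => void;
    const current = new Promise<void>((resolve) => {
      release = resolve;
    });
    const tail = previous.then(() => current);
    this.tails.set(key, tail);

    await previous;
    try {
      return await task();
    } finally {
      release();
      // Drop the entry if nobody queued behind us, so the map does not grow.
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

Calls such as touchRepository or enforceLimits could then be wrapped in `mutex.runExclusive(repoName, ...)` so that two simultaneous pulls never update the same repository's statistics at once.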

let freedMB = 0;

// Sort repositories by last accessed (oldest first for removal)
const reposToEvaluate = [...stats.repositories].sort(
Contributor

Is there a reason to sort the repositories twice? The first sort happens inside the getCacheStats() call right above.

/**
 * Calculate directory size in MB
 */
private getDirectorySize(dirPath: string): number {
Contributor

Should the error clauses here be logging something instead of failing silently? Such as when there's an error on calculateSize(dirPath).
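As an illustration of the logging suggestion, here is a sketch of a directory-size helper that reports failures through an injectable logger instead of swallowing them. `getDirectorySizeBytes` and its `log` parameter are hypothetical; the PR's `getDirectorySize` and `calculateSize` may be structured differently.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Recursively sums regular file sizes in bytes. Entries that cannot be read
// are logged and skipped; symlinks are ignored.
function getDirectorySizeBytes(
  dirPath: string,
  log: (message: string) => void = console.warn,
): number {
  let entries: fs.Dirent[] = [];
  try {
    entries = fs.readdirSync(dirPath, { withFileTypes: true });
  } catch (err) {
    log(`Could not read directory ${dirPath}: ${String(err)}`);
    return 0;
  }

  let total = 0;
  for (const entry of entries) {
    const entryPath = path.join(dirPath, entry.name);
    try {
      if (entry.isDirectory()) {
        total += getDirectorySizeBytes(entryPath, log);
      } else if (entry.isFile()) {
        total += fs.statSync(entryPath).size;
      }
    } catch (err) {
      log(`Could not stat ${entryPath}: ${String(err)}`);
    }
  }
  return total;
}
```

Passing the logger in makes it straightforward for a test to assert that an unreadable path produced a warning rather than a silent zero.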

Contributor

Would be nice to have a short README on how the pullRemote works and what the hybrid cache architecture is doing. This'll make things much clearer for future development and finding potential flaws in our logic 🙂

@06kellyjac
Contributor

why on earth is backstage's repo 10GB? 😭
Even nixpkgs is only 6.1GB ATM on my machine

@jescalada
Contributor

@06kellyjac It might be because of the tons of plugins that come with the repo (many of which have redundant components). Pulling with git clone --filter=blob:none https://github.com/backstage/backstage.git seems to make things a lot speedier. However, I'm not sure if we want to do things that way in pullRemote. Perhaps a config option to filter out blobs could help with these massive repos?
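A sketch of what such a config option might look like when building the clone command. The `partialCloneFilter` option and `buildBareCloneArgs` helper are hypothetical and not part of this PR; only the `git clone` flags themselves (`--bare`, `--filter=<spec>`, `--`) are real.

```typescript
interface CloneOptions {
  // Hypothetical option, e.g. 'blob:none' for a blobless partial clone.
  partialCloneFilter?: string;
}

// Builds the argument list for `git clone` into the bare cache.
function buildBareCloneArgs(
  remoteUrl: string,
  targetDir: string,
  options: CloneOptions = {},
): string[] {
  const args = ['clone', '--bare'];
  if (options.partialCloneFilter) {
    args.push(`--filter=${options.partialCloneFilter}`);
  }
  // "--" ends option parsing so a URL can never be interpreted as a flag.
  args.push('--', remoteUrl, targetDir);
  return args;
}

console.log(
  buildBareCloneArgs('https://github.com/backstage/backstage.git', './.remote/cache/backstage.git', {
    partialCloneFilter: 'blob:none',
  }).join(' '),
);
```

One trade-off to weigh: a blobless clone fetches missing blobs on demand, so any later step that reads file contents would trigger additional network requests.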
