feat(ENG-21324): Implement hosted_git_info.zig #24138

markovejnovic · 2025-10-28T00:02:02Z

What does this PR do?

How did you verify your code works?

linear · 2025-10-28T00:02:05Z

ENG-21324 Implement `hosted_git_info.zig`

robobun · 2025-10-28T00:02:12Z

^{Updated 10:23 AM PT - Oct 28th, 2025}

@markovejnovic, your commit ae765a0 is building: #30429

coderabbitai · 2025-10-28T00:10:02Z

Walkthrough

Adds a hosted-git-info Zig implementation, integrates it into dependency parsing, introduces URL fragment accessors, several string/path utilities, and extensive tests for hosted-git-info parsing and parsing helpers.

Changes

Cohort / File(s)	Summary
URL Fragment Accessors `src/bun.js/bindings/BunString.cpp`, `src/bun.js/bindings/URL.zig`	Added two C externs `URL__hash` and `URL__fragmentIdentifier` and Zig externs plus public wrapper methods `hash` and `fragmentIdentifier` to return URL fragment with and without leading `#`.
Hosted Git Info Implementation `src/install/hosted_git_info.zig`	New module `HostedGitInfo` with parsing, normalization, provider configs, representation formatting, JS testing APIs, and memory/error handling for Git URL/shortcut support across multiple providers.
Dependency Resolution Updates `src/install/dependency.zig`	Removed `isGitHubRepoPath`; switched to `hosted_git_info` detection (`isGitHubShorthand` / `fromUrl`) across parse/infer paths, added allocator-backed slow-paths and bridging logic for non-GitHub hosts.
String Utilities & Buffers `src/string/immutable.zig`, `src/string/immutable/paths.zig`	Added helpers: `lastIndexBeforeChar`, `charAtT`, `substring`, `countLeadingChar`, case-insensitive `endsWith` variants, `SlicedBuffer` type; removed `toWDirNormalized`; updated concat signatures to `bun.OOM`.
Path Separator Detection `src/resolver/resolve_path.zig`	Added `pub fn hasPlatformPathSeparators(input_path: []const u8) bool` to detect platform-specific path separators.
Comptime Map Helper `src/comptime_string_map.zig`	Added `pub fn getKey(value: V) ?[]const KeyType` to lookup a key by value via linear search.
JS Testing Exports `src/js/internal-for-testing.ts`	Exposed `hostedGitInfo` testing object with `parseUrl` and `fromUrl` Zig-backed functions.
Tests / Fixtures `test/cli/install/hosted-git-info/cases.ts`, `test/cli/install/hosted-git-info/from-url.test.ts`, `test/cli/install/hosted-git-info/parse-url.test.ts`	Added comprehensive valid/invalid URL datasets and tests asserting `parseUrl`/`fromUrl` behavior across providers and formats.

Possibly related PRs

fix(npa): Cleanup NPA code #23358: Adds the same URL fragment accessor functions (URL__hash, URL__fragmentIdentifier) and corresponding Zig bindings, indicating a direct code-level overlap.

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The pull request description contains only the template section headers "What does this PR do?" and "How did you verify your code works?" without any substantive content. Both required sections remain empty placeholders, failing to explain what changes were made or how the implementation was verified. The template structure is present but unfilled, making the description incomplete and providing no meaningful information about the PR.	The author should fill in both sections of the description. The "What does this PR do?" section should explain that this implements a Zig clone of the npm hosted-git-info library, including URL parsing, normalization, and multi-provider support. The "How did you verify your code works?" section should describe the testing approach, such as the test files added (parse-url.test.ts, from-url.test.ts) and any manual verification performed.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title "feat(ENG-21324): Implement hosted_git_info.zig" is specific and concise, clearly identifying the main change without noise or vague terminology. It directly corresponds to the primary objective outlined in the linked issue ENG-21324, which is to implement the hosted_git_info.zig library. The title accurately reflects that this is a feature addition for implementing this specific module.
Linked Issues Check	✅ Passed	The PR successfully implements the primary objective from ENG-21324: creating a Zig library that clones the npm hosted-git-info package. The new src/install/hosted_git_info.zig file provides comprehensive functionality including HostedGitInfo struct, fromUrl parsing, toString representation, URL normalization, and multi-provider support (github, bitbucket, gitlab, gist, sourcehut). Supporting changes integrate this library into the codebase (dependency.zig refactoring, testing APIs in internal-for-testing.ts) and include test cases validating the parsing logic.
Out of Scope Changes Check	✅ Passed	The PR includes supporting infrastructure changes beyond the core hosted_git_info.zig implementation: URL binding additions (BunString.cpp, URL.zig), string utility functions (immutable.zig), path resolution utilities (resolve_path.zig), and data structure enhancements (comptime_string_map.zig). These appear to be foundational utilities needed to support URL parsing, normalization, and string manipulation within the hosted_git_info implementation, representing necessary supporting infrastructure rather than unrelated scope creep.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3f7668a and de14245.

📒 Files selected for processing (1)

src/install/dependency.zig (10 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.zig

📄 CodeRabbit inference engine (.cursor/rules/javascriptcore-class.mdc)

**/*.zig: Declare the extern C symbol in Zig and export a Zig-friendly alias for use
Wrap the Bun____toJS extern in a Zig method that takes a JSGlobalObject and returns JSC.JSValue

Files:

src/install/dependency.zig

src/**/*.zig

📄 CodeRabbit inference engine (.cursor/rules/building-bun.mdc)

When adding debug logs in Zig, create a scoped logger and log via Bun APIs: const log = bun.Output.scoped(.${SCOPE}, .hidden); then log("...", .{})

src/**/*.zig: Use Zig private fields with the # prefix for encapsulation (e.g., struct { #foo: u32 })
Prefer Decl literals for initialization (e.g., const decl: Decl = .{ .binding = 0, .value = 0 };)
Place @import statements at the bottom of the file (formatter will handle ordering)

src/**/*.zig: In Zig code, manage memory carefully: use appropriate allocators and defer for cleanup
Cache JavaScriptCore class structures in ZigGlobalObject when adding new classes

Files:

src/install/dependency.zig

🔇 Additional comments (2)

src/install/dependency.zig (2)

820-828: JS bridge: conversion path looks good

Converts JSValue -> BunString -> UTF‑8 and defers cleanup, then calls Tag.infer. Solid and safe.

1505-1505: Import placement matches project guidance

@import("./hosted_git_info.zig") is at the bottom with other imports, aligning with src/**/*.zig guideline.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 14

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 668eba0 and 3f7668a.

📒 Files selected for processing (12)

src/bun.js/bindings/BunString.cpp (1 hunks)
src/bun.js/bindings/URL.zig (1 hunks)
src/comptime_string_map.zig (1 hunks)
src/install/dependency.zig (10 hunks)
src/install/hosted_git_info.zig (1 hunks)
src/js/internal-for-testing.ts (1 hunks)
src/resolver/resolve_path.zig (1 hunks)
src/string/immutable.zig (10 hunks)
src/string/immutable/paths.zig (0 hunks)
test/cli/install/hosted-git-info/cases.ts (1 hunks)
test/cli/install/hosted-git-info/from-url.test.ts (1 hunks)
test/cli/install/hosted-git-info/parse-url.test.ts (1 hunks)

💤 Files with no reviewable changes (1)

src/string/immutable/paths.zig

🧰 Additional context used

📓 Path-based instructions (13)

**/*.zig

📄 CodeRabbit inference engine (.cursor/rules/javascriptcore-class.mdc)

**/*.zig: Declare the extern C symbol in Zig and export a Zig-friendly alias for use
Wrap the Bun____toJS extern in a Zig method that takes a JSGlobalObject and returns JSC.JSValue

Files:

src/comptime_string_map.zig
src/bun.js/bindings/URL.zig
src/resolver/resolve_path.zig
src/install/dependency.zig
src/install/hosted_git_info.zig
src/string/immutable.zig

src/**/*.zig

📄 CodeRabbit inference engine (.cursor/rules/building-bun.mdc)

When adding debug logs in Zig, create a scoped logger and log via Bun APIs: const log = bun.Output.scoped(.${SCOPE}, .hidden); then log("...", .{})

src/**/*.zig: Use Zig private fields with the # prefix for encapsulation (e.g., struct { #foo: u32 })
Prefer Decl literals for initialization (e.g., const decl: Decl = .{ .binding = 0, .value = 0 };)
Place @import statements at the bottom of the file (formatter will handle ordering)

src/**/*.zig: In Zig code, manage memory carefully: use appropriate allocators and defer for cleanup
Cache JavaScriptCore class structures in ZigGlobalObject when adding new classes

Files:

src/comptime_string_map.zig
src/bun.js/bindings/URL.zig
src/resolver/resolve_path.zig
src/install/dependency.zig
src/install/hosted_git_info.zig
src/string/immutable.zig

test/**

📄 CodeRabbit inference engine (.cursor/rules/writing-tests.mdc)

Place all tests under the test/ directory

Files:

test/cli/install/hosted-git-info/from-url.test.ts
test/cli/install/hosted-git-info/parse-url.test.ts
test/cli/install/hosted-git-info/cases.ts

test/cli/**/*.{js,ts}

📄 CodeRabbit inference engine (.cursor/rules/writing-tests.mdc)

test/cli/**/*.{js,ts}: Place CLI command tests (e.g., bun install, bun init) under test/cli/
When testing Bun as a CLI, use spawn with bunExe() and bunEnv from harness, and capture stdout/stderr via pipes

Files:

test/cli/install/hosted-git-info/from-url.test.ts
test/cli/install/hosted-git-info/parse-url.test.ts
test/cli/install/hosted-git-info/cases.ts

test/**/*.{js,ts}

📄 CodeRabbit inference engine (.cursor/rules/writing-tests.mdc)

test/**/*.{js,ts}: Write tests in JavaScript or TypeScript using Bun’s Jest-style APIs (test, describe, expect) and run with bun test
Prefer data-driven tests (e.g., test.each) to reduce boilerplate
Use shared utilities from test/harness.ts where applicable

Files:

test/cli/install/hosted-git-info/from-url.test.ts
test/cli/install/hosted-git-info/parse-url.test.ts
test/cli/install/hosted-git-info/cases.ts

test/**/*.test.{ts,js,jsx,tsx,mjs,cjs}

📄 CodeRabbit inference engine (test/CLAUDE.md)

test/**/*.test.{ts,js,jsx,tsx,mjs,cjs}: Use bun:test for files ending with *.test.{ts,js,jsx,tsx,mjs,cjs}
Prefer concurrent tests (test.concurrent/describe.concurrent) over sequential when feasible
Organize tests with describe blocks to group related tests
Use utilities like describe.each, toMatchSnapshot, and lifecycle hooks (beforeAll, beforeEach, afterEach) and track resources for cleanup

Files:

test/cli/install/hosted-git-info/from-url.test.ts
test/cli/install/hosted-git-info/parse-url.test.ts

test/**/*.{ts,tsx,js,jsx,mjs,cjs}

📄 CodeRabbit inference engine (test/CLAUDE.md)

For large/repetitive strings, use Buffer.alloc(count, fill).toString() instead of "A".repeat(count)

Files:

test/cli/install/hosted-git-info/from-url.test.ts
test/cli/install/hosted-git-info/parse-url.test.ts
test/cli/install/hosted-git-info/cases.ts

test/**/*.test.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

test/**/*.test.{ts,tsx}: Test files must live under test/ and end with .test.ts or .test.tsx
In tests, always use port: 0; do not hardcode ports or invent random port functions
Prefer snapshot assertions and use normalizeBunSnapshot for snapshot output in tests
Never write tests that assert absence of crashes (e.g., no "panic" or "uncaught exception") in output
Use tempDir from "harness" for temporary directories; do not use tmpdirSync or fs.mkdtempSync in tests
When spawning processes in tests, assert on stdout before asserting exitCode
Do not use setTimeout in tests; await conditions instead to avoid flakiness

Files:

test/cli/install/hosted-git-info/from-url.test.ts
test/cli/install/hosted-git-info/parse-url.test.ts

test/**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Avoid shell commands in tests (e.g., find, grep); use Bun's Glob and built-in tools

Files:

test/cli/install/hosted-git-info/from-url.test.ts
test/cli/install/hosted-git-info/parse-url.test.ts
test/cli/install/hosted-git-info/cases.ts

src/bun.js/**/*.zig

📄 CodeRabbit inference engine (.cursor/rules/zig-javascriptcore-classes.mdc)

src/bun.js/**/*.zig: In Zig binding structs, expose generated bindings via pub const js = JSC.Codegen.JS and re-export toJS/fromJS/fromJSDirect
Constructors and prototype methods should return bun.JSError!JSC.JSValue to integrate Zig error handling with JS exceptions
Use parameter name globalObject (not ctx) and accept (*JSC.JSGlobalObject, *JSC.CallFrame) in binding methods/constructors
Implement getters as get(this, globalObject) returning JSC.JSValue and matching the .classes.ts interface
Provide deinit() for resource cleanup and finalize() that calls deinit(); use bun.destroy(this) or appropriate destroy pattern
Access JS call data via CallFrame (argument(i), argumentCount(), thisValue()) and throw errors with globalObject.throw(...)
For properties marked cache: true, use the generated Zig accessors (NameSetCached/GetCached) to work with GC-owned values
In finalize() for objects holding JS references, release them using .deref() before destroy

Files:

src/bun.js/bindings/URL.zig

**/*.{cpp,h}

📄 CodeRabbit inference engine (.cursor/rules/javascriptcore-class.mdc)

**/*.{cpp,h}: When exposing a JS class with public Constructor and Prototype, define three C++ types: class Foo : public JSC::DestructibleObject (if it has C++ fields), class FooPrototype : public JSC::JSNonFinalObject, and class FooConstructor : public JSC::InternalFunction
If the class has C++ data members, inherit from JSC::DestructibleObject and provide proper destruction; if it has no C++ fields (only JS properties), avoid a class and use JSC::constructEmptyObject(vm, structure) with putDirectOffset
Prefer placing the subspaceFor implementation in the .cpp file rather than the header when possible

Files:

src/bun.js/bindings/BunString.cpp

**/*.cpp

📄 CodeRabbit inference engine (.cursor/rules/javascriptcore-class.mdc)

**/*.cpp: Include "root.h" at the top of C++ binding files to satisfy lints
Define prototype properties using a const HashTableValue array and declare accessors/functions with JSC_DECLARE_* macros
Prototype classes should subclass JSC::JSNonFinalObject, provide create/createStructure, DECLARE_INFO, finishCreation that reifies static properties, and set mayBePrototype on the Structure
Custom getters should use JSC_DEFINE_CUSTOM_GETTER, jsDynamicCast to validate this, and throwThisTypeError on mismatch
Custom setters should use JSC_DEFINE_CUSTOM_SETTER, validate this via jsDynamicCast, and store via WriteBarrier/set semantics
Prototype functions should use JSC_DEFINE_HOST_FUNCTION, validate this with jsDynamicCast, and return encoded JSValue
Constructors should subclass JSC::InternalFunction, return internalFunctionSpace in subspaceFor, set the prototype property as non-configurable/non-writable, and provide create/createStructure
Provide a setup function that builds the Prototype, Constructor, and Structure, and assigns them to the LazyClassStructure initializer
Use the cached Structure via globalObject->m_.get(globalObject) when constructing instances
Expose constructors to Zig via an extern "C" function that returns the constructor from the LazyClassStructure
Provide an extern "C" Bun____toJS function that creates an instance using the cached Structure and returns an EncodedJSValue

Files:

src/bun.js/bindings/BunString.cpp

src/bun.js/bindings/**/*.cpp

📄 CodeRabbit inference engine (CLAUDE.md)

src/bun.js/bindings/**/*.cpp: When implementing a JavaScript class with a public constructor in C++, create Foo, FooPrototype, and FooConstructor classes
Define class properties using HashTableValue arrays in C++ bindings
Add iso subspaces for classes with C++ fields in JavaScriptCore bindings

Files:

src/bun.js/bindings/BunString.cpp

🧬 Code graph analysis (2)

test/cli/install/hosted-git-info/from-url.test.ts (2)

test/cli/install/hosted-git-info/cases.ts (2)

validGitUrls (26-2407)

invalidGitUrls (2409-2472)

src/js/internal-for-testing.ts (1)

hostedGitInfo (214-217)

test/cli/install/hosted-git-info/parse-url.test.ts (1)

src/js/internal-for-testing.ts (1)

hostedGitInfo (214-217)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Format

🔇 Additional comments (11)

src/comptime_string_map.zig (1)

307-312: Let me examine the actual file directly:

Add test coverage for the new getKey method.

This new public API method lacks test coverage. The existing test suite validates forward lookups (get, has, indexOf) but doesn't test the reverse lookup functionality.

Consider adding tests that verify:

Finding a key for an existing value

Returning null for a non-existent value

Behavior when multiple keys map to the same value (returns first in sorted order)

src/string/immutable.zig (7)

417-422: LGTM: lastIndexBeforeChar utility is correct and efficient.

476-481: LGTM: charAtT is bounds-safe and generic.

1028-1036: LGTM: case-insensitive suffix checks reuse existing ASCII-insensitive compare.

1561-1575: LGTM: countLeadingChar is correct and clear.

1815-1818: LGTM: trimSpaces wraps trim with shared whitespace set.

1825-1832: LGTM: containsUppercaseAscii is fine; consider SIMD later.

2084-2097: API change verified: all call sites properly handle bun.OOM error type.

The verification confirms that the concatWithLength and concat functions now correctly return bun.OOM![]u8, and all identified call sites are properly handling the error type:

Try-wrapped calls propagate the error to callers (e.g., src/linker.zig:311, src/cli/Arguments.zig:1101/1105/1109)

Unreachable assertions are used when allocation is guaranteed not to fail (e.g., src/string/immutable.zig:2131)

handleOom wrappers provide explicit error handling (e.g., src/install/hosted_git_info.zig:345)

Tail calls correctly propagate the error type (e.g., src/string/immutable.zig:2107)

No call sites ignore or mismatch the new return type.

src/js/internal-for-testing.ts (1)

214-217: LGTM: HostedGitInfo testing hooks correctly exposed.

Verification complete. The symbols TestingAPIs.jsParseUrl and TestingAPIs.jsFromUrl are correctly exported in src/install/hosted_git_info.zig (lines 1737 and 1767 respectively, within the TestingAPIs struct at line 1736). The filename-only reference pattern used here matches all other $newZigFunction calls throughout the file (e.g., patch.zig, shell.zig, socket.zig), confirming the bindings are correct.

src/bun.js/bindings/URL.zig (1)

19-33: URL fragment getters look good

Bindings mirror the C++ externs and match existing getter patterns. No issues found.

src/install/dependency.zig (1)

820-830: JS glue looks fine

Conversion to UTF‑8 and delegation to Tag.infer is straightforward.

coderabbitai · 2025-10-28T00:10:05Z

src/bun.js/bindings/BunString.cpp

+extern "C" BunString URL__hash(WTF::URL* url)
+{
+    const auto& fragment = url->fragmentIdentifier().isEmpty()
+        ? emptyString()
+        : url->fragmentIdentifierWithLeadingNumberSign().toString();
+    return Bun::toStringRef(fragment);
+}
+
+extern "C" BunString URL__fragmentIdentifier(WTF::URL* url)
+{
+    const auto& fragment = url->fragmentIdentifier().isEmpty()
+        ? emptyString()
+        : url->fragmentIdentifier().toString();
+    return Bun::toStringRef(fragment);
+}
+


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Fix namespace for emptyString() to avoid compile error

emptyString() is in the WTF namespace; unqualified calls will not compile here. Qualify with WTF::emptyString(). Optionally use toStringWithoutCopying() to avoid an extra copy.

- const auto& fragment = url->fragmentIdentifier().isEmpty() - ? emptyString() - : url->fragmentIdentifierWithLeadingNumberSign().toString(); + const auto& fragment = url->fragmentIdentifier().isEmpty() + ? WTF::emptyString() + : url->fragmentIdentifierWithLeadingNumberSign().toString(); @@ - const auto& fragment = url->fragmentIdentifier().isEmpty() - ? emptyString() - : url->fragmentIdentifier().toString(); + const auto& fragment = url->fragmentIdentifier().isEmpty() + ? WTF::emptyString() + : url->fragmentIdentifier().toString();

Nit: consider replacing both .toString() calls above with .toStringWithoutCopying() for consistency with nearby helpers.

🏁 Script executed:

#!/bin/bash # Check includes and namespace context at top of BunString.cpp echo "=== First 50 lines of BunString.cpp (includes/namespace) ===" head -50 src/bun.js/bindings/BunString.cpp echo -e "\n=== Lines 620-650 to see target code and context ===" sed -n '620,650p' src/bun.js/bindings/BunString.cpp echo -e "\n=== Check for 'using namespace' or 'using WTF' declarations ===" grep -n "using namespace\|using WTF" src/bun.js/bindings/BunString.cpp || echo "No 'using namespace/WTF' found" echo -e "\n=== Check for emptyString usage elsewhere in file ===" grep -n "emptyString()" src/bun.js/bindings/BunString.cpp | head -20

Length of output: 2795

🏁 Script executed:

#!/bin/bash # Check for toStringWithoutCopying usage and fragmentIdentifier() API echo "=== Search for toStringWithoutCopying in file ===" grep -n "toStringWithoutCopying" src/bun.js/bindings/BunString.cpp || echo "Not found in this file" echo -e "\n=== Search for fragmentIdentifier calls to understand return type ===" grep -n "fragmentIdentifier\|fragmentIdentifierWithLeadingNumberSign" src/bun.js/bindings/BunString.cpp echo -e "\n=== Check URL header for fragmentIdentifier() return type ===" grep -A 2 "fragmentIdentifier\|fragmentIdentifierWithLeadingNumberSign" /usr/include/wtf/URL.h 2>/dev/null | head -20 || echo "Standard header not available, checking includes in file"

Length of output: 795

Apply both the namespace fix and optional optimization for consistency

The unqualified emptyString() calls at lines 630 and 638 will fail to compile—the file does not have using namespace WTF; and already uses WTF::emptyString() elsewhere (lines 793, 883, 918). The fix is critical.

Additionally, other URL accessors in this file (protocol, query, host, path at lines 655, 680, 692, 720) use .toStringWithoutCopying(), so aligning fragmentIdentifier methods to the same pattern is a valid consistency improvement, not just a nit.

extern "C" BunString URL__hash(WTF::URL* url) { const auto& fragment = url->fragmentIdentifier().isEmpty() - ? emptyString() - : url->fragmentIdentifierWithLeadingNumberSign().toString(); + ? WTF::emptyString() + : url->fragmentIdentifierWithLeadingNumberSign().toStringWithoutCopying(); return Bun::toStringRef(fragment); } extern "C" BunString URL__fragmentIdentifier(WTF::URL* url) { const auto& fragment = url->fragmentIdentifier().isEmpty() - ? emptyString() - : url->fragmentIdentifier().toString(); + ? WTF::emptyString() + : url->fragmentIdentifier().toStringWithoutCopying(); return Bun::toStringRef(fragment); }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

extern "C" BunString URL__hash(WTF::URL* url)

{

const auto& fragment = url->fragmentIdentifier().isEmpty()

? emptyString()

: url->fragmentIdentifierWithLeadingNumberSign().toString();

return Bun::toStringRef(fragment);

}

extern "C" BunString URL__fragmentIdentifier(WTF::URL* url)

{

const auto& fragment = url->fragmentIdentifier().isEmpty()

? emptyString()

: url->fragmentIdentifier().toString();

return Bun::toStringRef(fragment);

}

extern "C" BunString URL__hash(WTF::URL* url)

{

const auto& fragment = url->fragmentIdentifier().isEmpty()

? WTF::emptyString()

: url->fragmentIdentifierWithLeadingNumberSign().toStringWithoutCopying();

return Bun::toStringRef(fragment);

}

extern "C" BunString URL__fragmentIdentifier(WTF::URL* url)

{

const auto& fragment = url->fragmentIdentifier().isEmpty()

? WTF::emptyString()

: url->fragmentIdentifier().toStringWithoutCopying();

return Bun::toStringRef(fragment);

}

🤖 Prompt for AI Agents

In src/bun.js/bindings/BunString.cpp around lines 627-642, the code uses unqualified emptyString() (which will not compile here) and returns .toString()/toStringWithLeadingNumberSign(), which is inconsistent with other URL accessors; change emptyString() to WTF::emptyString() at both sites and replace .toString()/toStringWithLeadingNumberSign().toString() with .toStringWithoutCopying() (or .toStringWithoutCopyingWithLeadingNumberSign() if available) so the fragment branches match the pattern used by protocol/query/host/path, then return Bun::toStringRef of that non-copying string.

coderabbitai · 2025-10-28T00:10:05Z

src/comptime_string_map.zig

+        /// Lookup the first-defined string key for a given value.
+        ///
+        /// Linear search.
+        pub fn getKey(value: V) ?[]const KeyType {


🧹 Nitpick | 🔵 Trivial

Clarify "first-defined" in documentation.

The documentation states "first-defined string key," but since kvs is sorted by length and then lexicographically (line 40), this method returns the first key in sorted order, not definition order. When multiple keys map to the same value, the result is deterministic but based on sorted order rather than the order keys were defined in the input list.

Consider revising to "first matching key (in sorted order)" for accuracy.

🤖 Prompt for AI Agents

In src/comptime_string_map.zig around lines 303 to 306, the doc comment "Lookup the first-defined string key" is misleading because the internal kvs array is sorted by length then lexicographically (see line 40), so the function actually returns the first matching key in sorted order, not definition order; update the doc comment to read something like "Lookup the first matching key (in sorted order)" and mention that when multiple keys map to the same value the choice is deterministic based on the map's sort order rather than the input definition order.

coderabbitai · 2025-10-28T00:10:05Z

src/install/dependency.zig

+            // Allocator necessary for slow paths.
+            var stackFallback = std.heap.stackFallback(1024, bun.default_allocator);
+            var allocator = stackFallback.get();
+


🧹 Nitpick | 🔵 Trivial

Minor: reduce duplicate slow-path parsing and allocations

Several branches dupe dependency and call HostedGitInfo.fromUrl just to select .github vs .git. Consider a small helper that:

accepts allocator and dependency,

returns { is_github: bool, is_shortcut: bool } or the provider enum,

encapsulates the dupe/free/fromUrl/deinit dance.

This removes repetition and centralizes edge-case handling.

Also applies to: 561-589, 616-640, 680-688, 708-717, 785-803

🤖 Prompt for AI Agents

In src/install/dependency.zig around lines 497-500 (and also affecting 561-589, 616-640, 680-688, 708-717, 785-803), multiple branches duplicate the slow-path parsing/allocations by calling HostedGitInfo.fromUrl, inspecting .github vs .git, and repeating free/deinit logic; introduce a small helper function (e.g., parseDependencyProvider) that accepts the allocator and the dependency string and returns a compact result (either {is_github: bool, is_shortcut: bool} or a provider enum) and internally runs HostedGitInfo.fromUrl, handles error paths, and performs deinit/free before returning; replace the duplicated blocks with calls to this helper and update callers to use the returned flags/enum, removing repeated allocation/deinit/free code and centralizing edge-case handling.

src/install/dependency.zig

coderabbitai · 2025-10-28T00:10:06Z

src/install/hosted_git_info.zig

+                    sb.count(user_part);
+                    sb.count(project);
+                    if (committish) |c| sb.count(c);
+
+                    sb.allocate(allocator) catch return null;
+
+                    const user_slice = blk: {
+                        const writable = sb.writable();
+                        var stream = std.io.fixedBufferStream(writable);
+                        const decoded_len = PercentEncoding.decode(
+                            @TypeOf(stream.writer()),
+                            stream.writer(),
+                            user_part,
+                        ) catch return null;
+                        sb.len += decoded_len;
+                        break :blk writable[0..decoded_len];
+                    };
+                    const project_slice = blk: {
+                        const writable = sb.writable();
+                        var stream = std.io.fixedBufferStream(writable);
+                        const decoded_len = PercentEncoding.decode(
+                            @TypeOf(stream.writer()),
+                            stream.writer(),
+                            project,
+                        ) catch return null;
+                        sb.len += decoded_len;
+                        break :blk writable[0..decoded_len];
+                    };
+                    const committish_slice = if (committish) |c| blk: {
+                        const writable = sb.writable();
+                        var stream = std.io.fixedBufferStream(writable);
+                        const decoded_len = PercentEncoding.decode(
+                            @TypeOf(stream.writer()),
+                            stream.writer(),
+                            c,
+                        ) catch return null;
+                        sb.len += decoded_len;
+                        break :blk writable[0..decoded_len];
+                    } else null;
+
+                    return .{
+                        .user = user_slice,
+                        .project = project_slice,
+                        .committish = committish_slice,
+                        ._owned_buffer = sb.allocatedSlice(),
+                        ._allocator = allocator,
+                    };
+                }


🧹 Nitpick | 🔵 Trivial

Consistency: reuse decodeAndAppend in sourcehut extractor

The sourcehut extractor inlines percent-decoding three times. For consistency and less code, consider using HostedGitInfo.decodeAndAppend like other extractors.

🤖 Prompt for AI Agents

In src/install/hosted_git_info.zig around lines 1467 to 1514, the sourcehut extractor duplicates percent-decoding logic for user, project, and committish; replace the three inline decoding blocks with calls to the existing HostedGitInfo.decodeAndAppend helper to avoid repetition and ensure consistency with other extractors. Specifically, after sb.allocate(allocator) use decodeAndAppend(sb, allocator, user_part) and decodeAndAppend(sb, allocator, project), and if committish is present call decodeAndAppend(sb, allocator, committish) (handling and propagating errors the same way decodeAndAppend does), then use sb.allocatedSlice() and sb._allocator as before to return the struct. Ensure sb.len updates and error handling are delegated to decodeAndAppend so the duplicate fixedBufferStream/PercentEncoding.decode code is removed.

coderabbitai · 2025-10-28T00:10:07Z

test/cli/install/hosted-git-info/cases.ts

+export const validGitUrls: { [K in Provider]: { [K in string]: object } } = {
+  bitbucket: {


🧹 Nitpick | 🔵 Trivial

Type tightening for dataset shape

The mapped type can be simplified to improve readability:

-export const validGitUrls: { [K in Provider]: { [K in string]: object } } = { +export const validGitUrls: Record<Provider, Record<string, Record<string, unknown>>> = {

If you prefer stricter value typing later, replace Record<string, unknown> with a dedicated interface.

🤖 Prompt for AI Agents

In test/cli/install/hosted-git-info/cases.ts around lines 26-27, the mapped type { [K in Provider]: { [K in string]: object } } is overly verbose and uses the broad object type; change the declaration to use Record for clarity, e.g. Record<Provider, Record<string, unknown>>, or swap the inner Record<string, unknown> for a dedicated interface if you want stricter value typing, and update the variable's type annotation accordingly.

coderabbitai · 2025-10-28T00:10:07Z

test/cli/install/hosted-git-info/cases.ts

+    // shortucts
+    //
+    // NOTE auth is accepted but ignored
+    "bitbucket:foo/bar": { ...defaults.bitbucket, default: "shortcut" },


🧹 Nitpick | 🔵 Trivial

Nit: fix typos in comments

“shortucts” → “shortcuts”

“protocos” → “protocols”

No behavior change, just clarity.

- // shortucts + // shortcuts @@ - // shortucts + // shortcuts @@ - // invalid protocos + // invalid protocols

Also applies to: 2358-2365, 2467-2471

🤖 Prompt for AI Agents

In test/cli/install/hosted-git-info/cases.ts around lines 28-31 (and also update occurrences at 2358-2365 and 2467-2471), there are typos in comment text: change “shortucts” to “shortcuts” and “protocos” to “protocols”; update each affected comment occurrence accordingly with no code behavior changes.

coderabbitai · 2025-10-28T00:10:08Z

test/cli/install/hosted-git-info/from-url.test.ts

+describe("fromUrl", () => {
+  describe("valid urls", () => {
+    describe.each(Object.entries(validGitUrls))("%s", (_, urlset: object) => {
+      it.each(Object.entries(urlset))("parses %s", (url, expected) => {
+        expect(hostedGitInfo.fromUrl(url)).toMatchObject({
+          ...(expected.type && { type: expected.type }),
+          ...(expected.domain && { domain: expected.domain }),
+          ...(expected.user && { user: expected.user }),
+          ...(expected.project && { project: expected.project }),
+          ...(expected.committish && { committish: expected.committish }),
+          ...(expected.default && { default: expected.default }),
+        });
+      });
+    });
+  });


🧹 Nitpick | 🔵 Trivial

Prefer concurrent tests; tighten TS typing for expected.

Use it.concurrent.each for faster runs.

Replace loose object typing with Record<string, any> to avoid TS property access warnings in editors.

Apply:

-describe.each(Object.entries(validGitUrls))("%s", (_, urlset: object) => { - it.each(Object.entries(urlset))("parses %s", (url, expected) => { +describe.each(Object.entries(validGitUrls))("%s", (_, urlset: Record<string, any>) => { + it.concurrent.each(Object.entries(urlset))("parses %s", (url, expected: any) => {

🤖 Prompt for AI Agents

In test/cli/install/hosted-git-info/from-url.test.ts around lines 5 to 19, change the test to use concurrent runs and tighten the TypeScript types: replace it.each with it.concurrent.each to run tests in parallel, and replace the loose object typing for the iterated values with Record<string, any> (e.g., urlset: Record<string, any> and expected: Record<string, any>) so property access like expected.type does not emit TS warnings; keep the same assertion structure.

coderabbitai · 2025-10-28T00:10:08Z

test/cli/install/hosted-git-info/from-url.test.ts

+  // TODO(markovejnovic): Unskip these tests.
+  describe.skip("invalid urls", () => {
+    describe.each(Object.entries(invalidGitUrls))("%s", (_, urls: (string | null | undefined)[]) => {
+      it.each(urls)("does not permit %s", url => {
+        expect(() => {
+          hostedGitInfo.fromUrl(url);
+        }).toThrow();
+      });
+    });
+  });
+});


🧹 Nitpick | 🔵 Trivial

Track enabling invalid-url tests; consider returning null instead of throw.

Tests are skipped. Decide whether fromUrl should throw or return null for invalid inputs and align tests accordingly. Document behavior either way.

I can adjust tests once the desired behavior is confirmed.

🤖 Prompt for AI Agents

In test/cli/install/hosted-git-info/from-url.test.ts around lines 21-31, the invalid-url tests are skipped and the review asks to standardize behavior; decide to have hostedGitInfo.fromUrl return null for invalid inputs, then unskip these tests and change their expectations to assert returned value is null (instead of expecting a throw), and update the fromUrl implementation and its JSDoc/type definitions to explicitly return null for invalid or non-matching inputs and add a brief note in the module README describing this behavior.

coderabbitai · 2025-10-28T00:10:08Z

test/cli/install/hosted-git-info/parse-url.test.ts

+/**
+ * Mimics https://github.com/npm/hosted-git-info/blob/main/test/parse-url.js
+ */
+import { hostedGitInfo } from "bun:internal-for-testing";
+import { describe, expect, it } from "bun:test";
+
+const okCases = [
+  // These come straight out of the hosted-git-info tests
+  "git+ssh://git@abc:frontend/utils.git#6d45447e0c5eb6cd2e3edf05a8c5a9bb81950c79",
+  // These are custom cases added for Bun
+  "ssh://:password@bitbucket.org:foo/bar.git",
+  "git@bitbucket.org:foo/bar",
+  "gist:user:password@/feedbeef#branch",
+  "github:foo/bar#branch with space",
+];
+
+describe("parseUrl", () => {
+  it.each(okCases)("parses %s", url => {
+    expect(hostedGitInfo.parseUrl(url)).not.toBeNull();
+  });
+});


🧹 Nitpick | 🔵 Trivial

Make tests concurrent and centralize cases.

Prefer concurrent tests for speed.

Consider moving okCases into cases.ts for consistency with from-url tests.

Apply:

-import { describe, expect, it } from "bun:test"; +import { describe, expect, it } from "bun:test"; @@ -describe("parseUrl", () => { - it.each(okCases)("parses %s", url => { +describe("parseUrl", () => { + it.concurrent.each(okCases)("parses %s", url => { expect(hostedGitInfo.parseUrl(url)).not.toBeNull(); }); });

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

/**

* Mimics https://github.com/npm/hosted-git-info/blob/main/test/parse-url.js

*/

import { hostedGitInfo } from "bun:internal-for-testing";

import { describe, expect, it } from "bun:test";

const okCases = [

// These come straight out of the hosted-git-info tests

"git+ssh://git@abc:frontend/utils.git#6d45447e0c5eb6cd2e3edf05a8c5a9bb81950c79",

// These are custom cases added for Bun

"ssh://:password@bitbucket.org:foo/bar.git",

"git@bitbucket.org:foo/bar",

"gist:user:password@/feedbeef#branch",

"github:foo/bar#branch with space",

];

describe("parseUrl", () => {

it.each(okCases)("parses %s", url => {

expect(hostedGitInfo.parseUrl(url)).not.toBeNull();

});

});

/**

* Mimics https://github.com/npm/hosted-git-info/blob/main/test/parse-url.js

*/

import { hostedGitInfo } from "bun:internal-for-testing";

import { describe, expect, it } from "bun:test";

const okCases = [

// These come straight out of the hosted-git-info tests

"git+ssh://git@abc:frontend/utils.git#6d45447e0c5eb6cd2e3edf05a8c5a9bb81950c79",

// These are custom cases added for Bun

"ssh://:password@bitbucket.org:foo/bar.git",

"git@bitbucket.org:foo/bar",

"gist:user:password@/feedbeef#branch",

"github:foo/bar#branch with space",

];

describe("parseUrl", () => {

it.concurrent.each(okCases)("parses %s", url => {

expect(hostedGitInfo.parseUrl(url)).not.toBeNull();

});

});

🤖 Prompt for AI Agents

In test/cli/install/hosted-git-info/parse-url.test.ts lines 1-21, the test should run concurrently and reuse centralized test cases; replace the local okCases array by importing the shared cases from test/cli/install/hosted-git-info/cases.ts (or appropriate cases.ts used by other from-url tests) and change the test runner from it.each to it.concurrent.each (or the Bun equivalent) so the table-driven tests run in parallel for speed while reusing the canonical cases file.

feat(ENG-21324): Implement hosted_git_info.zig

f9e4acf

coderabbitai bot reviewed Oct 28, 2025

View reviewed changes

feat(ENG-21327): Integrate hosted_git_info.zig into dependency.zig

ae765a0

markovejnovic force-pushed the user/markovejnovic/hosted-git-info branch from de14245 to ae765a0 Compare October 28, 2025 17:23

		export const validGitUrls: { [K in Provider]: { [K in string]: object } } = {
		bitbucket: {

Uh oh!

feat(ENG-21324): Implement hosted_git_info.zig #24138

Are you sure you want to change the base?

feat(ENG-21324): Implement hosted_git_info.zig #24138

Conversation

markovejnovic commented Oct 28, 2025

What does this PR do?

How did you verify your code works?

Uh oh!

linear bot commented Oct 28, 2025

Uh oh!

robobun commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Oct 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Possibly related PRs

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Oct 28, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

robobun commented Oct 28, 2025 •

edited

Loading

coderabbitai bot commented Oct 28, 2025 •

edited

Loading