Skip to content

Conversation

@maxi297
Copy link
Contributor

@maxi297 maxi297 commented Nov 7, 2025

…add sensible defaults

What

We've seen auth error being flagged system_error (example with source-google-analytics-data-api). It seems like we only support validating the oauth response if a refresh_token_updater is defined.

How

I've moved the refresh_token_error on the root of the OAuthAuthenticatorModel. I've also added sensible defaults so that we don't have to configure this for all the connectors.

Summary by CodeRabbit

  • New Features

    • OAuth2 authenticator gains three configurable refresh-token error settings: status codes (default [400]), error key (default "error"), and error values (default ["invalid_grant","invalid_permissions"]).
  • Deprecations

    • Refresh-token error configuration fields in nested components are deprecated; use the new top-level OAuth authenticator fields instead.

@github-actions github-actions bot added the enhancement New feature or request label Nov 7, 2025
@github-actions
Copy link

github-actions bot commented Nov 7, 2025

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/support_config_error_on_oauth_authentication#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/support_config_error_on_oauth_authentication

Helpful Resources

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

📝 Edit this welcome message.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 7, 2025

📝 Walkthrough

Walkthrough

The PR promotes refresh-token error configuration from nested RefreshTokenUpdater to the top-level OAuthAuthenticator, adding three new fields (refresh_token_error_status_codes, refresh_token_error_key, refresh_token_error_values) and marking the nested fields as deprecated.

Changes

Cohort / File(s) Summary
OAuth Authenticator Implementation
airbyte_cdk/sources/declarative/auth/oauth.py
Added three new fields to DeclarativeOauth2Authenticator: refresh_token_error_status_codes: Tuple[int, ...], refresh_token_error_key: str, refresh_token_error_values: Tuple[str, ...]; updated __post_init__ to pass these values to super().__init__.
Declarative Component Schema (YAML)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml
Added refresh_token_error_status_codes, refresh_token_error_key, and refresh_token_error_values properties to OAuthAuthenticator with defaults ([400], "error", ["invalid_grant","invalid_permissions"]) and marked the corresponding nested RefreshTokenUpdater fields as deprecated in titles.
Pydantic Model Schema
airbyte_cdk/sources/declarative/models/declarative_component_schema.py
Introduced refresh_token_error_status_codes: Optional[List[int]], refresh_token_error_key: Optional[str], and refresh_token_error_values: Optional[List[str]] on OAuthAuthenticator (with defaults) and updated RefreshTokenUpdater field titles to indicate deprecation.
Model → Component Factory
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
Added _get_refresh_token_error_information(model) static helper to centralize retrieval/prioritization of refresh-token error config (from RefreshTokenUpdater or OAuthAuthenticator) with defaults; updated OAuth authenticator creation paths to use this helper.

Sequence Diagram(s)

sequenceDiagram
    participant Factory as ModelToComponentFactory
    participant Model as OAuthAuthenticatorModel
    participant RTU as RefreshTokenUpdaterModel
    participant Auth as DeclarativeOauth2Authenticator

    Factory->>Factory: _get_refresh_token_error_information(model)
    alt RTU defines error config (deprecated)
        Factory->>RTU: read refresh_token_error_status_codes/key/values
        RTU-->>Factory: refresh-token error info (deprecated source)
    else OAuthAuthenticator defines error config
        Factory->>Model: read refresh_token_error_status_codes/key/values
        Model-->>Factory: refresh-token error info (authenticator-level)
    else Neither defined
        Factory->>Factory: use defaults [400], "error", ["invalid_grant","invalid_permissions"]
    end
    Factory->>Auth: __init__(..., refresh_token_error_status_codes=..., refresh_token_error_key=..., refresh_token_error_values=...)
    Auth-->>Factory: authenticator instance
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Heterogeneous edits: schema (YAML), pydantic models, runtime factory logic, and authenticator class changes.
  • Areas to review closely:
    • Does the factory correctly prioritize OAuthAuthenticator fields over deprecated RefreshTokenUpdater fields when both are present, and is that precedence intentional—wdyt?
    • Are the default values ([400], "error", ["invalid_grant","invalid_permissions"]) consistent across YAML, model, and runtime code?
    • Confirm type choices (tuples vs lists) and their use in DeclarativeOauth2Authenticator/superclass match expected typing and serialization behavior.

Possibly related PRs

Suggested labels

oauth, declarative_oauth

Suggested reviewers

  • bazarnov
  • bnchrch
  • dbgold17

Would you like me to run a quick grep of occurrences for the old nested field names to ensure no references were missed, wdyt?

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly conveys the main change: moving refresh_token_error configuration to the OAuthAuthenticatorModel level, which aligns with the PR's primary objective of improving error classification.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch maxi297/support_config_error_on_oauth_authentication

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c74c37a and b553004.

📒 Files selected for processing (1)
  • airbyte_cdk/sources/declarative/auth/oauth.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • airbyte_cdk/sources/declarative/auth/oauth.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-shopify
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/auth/oauth.py (1)

83-89: The comment on line 83 seems misleading, wdyt?

The comment mentions "Convert lists to tuples for parent class compatibility", but no conversion is actually happening in this __post_init__ method. The values are passed directly to the parent class as-is.

Looking at the factory code in model_to_component_factory.py, the conversion from lists to tuples happens in the _get_refresh_token_error_information helper method before the dataclass is instantiated. Would it be clearer to either:

  1. Remove this comment entirely since it describes work done elsewhere, or
  2. Rephrase it to something like "Pass refresh token error parameters to parent class"?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

2928-2930: Consider documenting the default values more prominently, wdyt?

The hardcoded defaults (400,), "error", ("invalid_grant", "invalid_permissions") represent a behavior change from the previous implementation. Previously, if no refresh_token_updater was defined, these values would be empty tuples/strings. Now, error checking will always happen with these sensible defaults.

While this aligns with the PR objective of providing "sensible defaults so connectors do not need explicit configuration," it might be worth:

  1. Adding a comment explaining why these specific values were chosen as defaults
  2. Mentioning this behavior in the docstring of create_oauth_authenticator or in the PR description

This would help future maintainers understand the rationale and make it easier to adjust if different defaults are needed.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6504148 and c74c37a.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/auth/oauth.py (3 hunks)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (3 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (5 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2024-12-11T16:34:46.319Z
Learnt from: pnilan
Repo: airbytehq/airbyte-python-cdk PR: 0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the `declarative_component_schema.py` file is auto-generated from `declarative_component_schema.yaml` and should be ignored in the recommended reviewing order.

Applied to files:

  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
📚 Learning: 2024-11-18T23:40:06.391Z
Learnt from: ChristoGrab
Repo: airbytehq/airbyte-python-cdk PR: 58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.

Applied to files:

  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py
📚 Learning: 2025-01-14T00:20:32.310Z
Learnt from: aaronsteers
Repo: airbytehq/airbyte-python-cdk PR: 174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the `airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py` file, the strict module name checks in `_get_class_from_fully_qualified_class_name` (requiring `module_name` to be "components" and `module_name_full` to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.

Applied to files:

  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
🧬 Code graph analysis (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
  • RefreshTokenUpdater (398-440)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Check: source-shopify
  • GitHub Check: Check: destination-motherduck
  • GitHub Check: Check: source-pokeapi
  • GitHub Check: Check: source-intercom
  • GitHub Check: Check: source-hardcoded-records
  • GitHub Check: SDM Docker Image Build
  • GitHub Check: Manifest Server Docker Image Build
  • GitHub Check: Pytest (All, Python 3.11, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.13, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.12, Ubuntu)
  • GitHub Check: Pytest (All, Python 3.10, Ubuntu)
  • GitHub Check: Pytest (Fast)
🔇 Additional comments (10)
airbyte_cdk/sources/declarative/auth/oauth.py (1)

49-51: LGTM! Well-structured field additions.

The new refresh token error configuration fields are properly typed and documented. Using tuples for immutability is a good choice, and the empty defaults are sensible fallbacks.

Also applies to: 78-80

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (4)

21-21: LGTM! Necessary imports for the new functionality.

The Tuple and RefreshTokenUpdaterModel imports are correctly placed and required for the new helper method.

Also applies to: 404-406


2883-2906: Nice validation logic for mutual exclusivity!

The check ensuring refresh_token_error information is defined in only one place (either on RefreshTokenUpdaterModel or OAuthAuthenticatorModel) is excellent defensive programming. This prevents configuration errors and makes the deprecation path clear.


2893-2901: Clarification question: Is the truthy check for empty lists intentional?

The logic uses truthy checks to determine if error information is defined:

is_defined_on_refresh_token_updated = refresh_token_updater and (
    refresh_token_updater.refresh_token_error_status_codes
    or refresh_token_updater.refresh_token_error_key
    or refresh_token_updater.refresh_token_error_values
)

This means that:

  • refresh_token_error_status_codes = [] (empty list) is treated as "not defined"
  • refresh_token_error_key = "" (empty string) is treated as "not defined"

Is this the intended behavior? It seems reasonable (treating empty values as "not configured"), but it might be surprising to users who explicitly set empty lists/strings. If this is intentional, it might be worth adding a brief comment to clarify this behavior.


2796-2798: LGTM! Clean integration of the helper method.

The tuple unpacking and passing of refresh_token_error parameters to both authenticator types is consistent and correct. The values are properly extracted once via the helper method and reused for both DeclarativeSingleUseRefreshTokenOauth2Authenticator and DeclarativeOauth2Authenticator.

Also applies to: 2849-2851, 2878-2880

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

1430-1454: New OAuth refresh token error fields are well-defined with sensible defaults.

The three new top-level fields follow the existing schema patterns and the defaults ([400] for status codes, "error" for key, ["invalid_grant", "invalid_permissions"] for values) align well with common OAuth error scenarios. This eliminates the need for explicit configuration in most connectors.


1495-1519: Clear deprecation strategy with empty defaults on nested fields.

The nested fields are properly marked as deprecated with titles referencing the OAuthAuthenticator level. The empty defaults ([], "") are appropriate for deprecated fields—they avoid conflicting with the top-level values.

One question: Does the implementation logic ensure that when both nested and top-level fields are specified, the top-level takes precedence (or vice versa if that's the intended behavior)? This would be worth verifying in the implementation code to ensure smooth migration for users who might have configured these fields at the nested level. Wdyt?

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

1-2: Note: This is an auto-generated file.

As noted in the learnings, this file is auto-generated from declarative_component_schema.yaml. The changes here reflect modifications to the source YAML file. If this PR includes the YAML file, that would be the more appropriate place to review the schema design decisions. The generated Python code here looks consistent with typical Pydantic model patterns.


423-440: Backward compatibility is properly handled via explicit factory migration logic—no action needed.

The concern is addressed. The _get_refresh_token_error_information() method in model_to_component_factory.py (lines 2884-2929) explicitly handles the migration between deprecated RefreshTokenUpdater fields and new OAuthAuthenticator fields.

The strategy:

  • If fields are defined on both, a ValueError is raised
  • The factory prioritizes values from RefreshTokenUpdater if set, otherwise from OAuthAuthenticator, with fallback defaults of (400,), "error", ("invalid_grant", "invalid_permissions")
  • The intentional default discrepancy ([] vs [400]) preserves old opt-in behavior while enabling sensible defaults for new adopters

The title-based deprecation (without deprecated=True) is paired with factory-level enforcement, which is sufficient for this codebase's migration pattern.


1901-1918: Defaults look good—these fields are intentionally shifting from opt-in to opt-out, and the values align with OAuth2 standards.

The verification shows these are new fields on OAuthAuthenticator that replace the deprecated equivalents on RefreshTokenUpdater (which had empty defaults [], "", etc.). The intentional behavioral shift from opt-in to opt-out makes sense here because:

  1. The defaults are RFC 6749–compliant: [400] is the standard HTTP status for OAuth errors, "error" is the standard response key per the spec, and ["invalid_grant", "invalid_permissions"] are the most common token refresh failures.

  2. Backward compatibility is preserved: The model factory's _get_refresh_token_error_information() method handles both old and new locations, prevents conflicts, and gracefully defaults to empty values when nothing is configured.

  3. The deprecation notices on RefreshTokenUpdater fields show this is a deliberate migration path, not an accidental breaking change.

Existing connectors without explicit refresh_token_updater configuration will now get automatic error detection out-of-the-box, which should generally be beneficial. The implementation is sound—this aligns well with the PR's goal of sensible defaults.

information is defined only once and return the right fields.
"""
refresh_token_updater = model.refresh_token_updater
is_defined_on_refresh_token_updated = refresh_token_updater and (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggle a bit with is_defined_on_refresh_token_updated and is_defined_on_oauth_authenticator because if someone wanted to disable the error handling to flag these as system_errors, they would have to set impossible values instead of defining them with none values. That being said, it seems like a weird case to handle and there is an escape path so I'm not too worried

if is_defined_on_refresh_token_updated:
not_optional_refresh_token_updater: RefreshTokenUpdaterModel = refresh_token_updater # type: ignore # we know from the condition that this is not None
return (
tuple(not_optional_refresh_token_updater.refresh_token_error_status_codes)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this will have the same behavior as the previous code but is a bit different. The previous code would pass refresh_token_updater.refresh_token_error_status_codes directly which felt very dangerous because the typing was different (Tuple[int, ...] vs Optional[List[int]]). I fear that the previous code could lead to None exception here if refresh_token_error_status_codes was not defined. I don't have signals to believe this actually happen in prod though...

@maxi297 maxi297 requested a review from tolik0 November 7, 2025 15:28
@github-actions
Copy link

github-actions bot commented Nov 7, 2025

PyTest Results (Fast)

1 212 tests   - 2 605   1 200 ✅  - 2 605   5m 8s ⏱️ - 1m 35s
    1 suites ±    0      11 💤  -     1 
    1 files   ±    0       1 ❌ +    1 

For more details on these failures, see this check.

Results for commit b553004. ± Comparison against base commit 6504148.

This pull request removes 2605 tests.
unit_tests.sources.declarative.parsers.test_model_to_component_factory ‑ test_api_budget
unit_tests.sources.declarative.parsers.test_model_to_component_factory ‑ test_api_budget_fixed_window_policy
unit_tests.sources.declarative.parsers.test_model_to_component_factory ‑ test_client_side_incremental
unit_tests.sources.declarative.parsers.test_model_to_component_factory ‑ test_client_side_incremental_with_partition_router
unit_tests.sources.declarative.parsers.test_model_to_component_factory ‑ test_config_with_defaults
unit_tests.sources.declarative.parsers.test_model_to_component_factory ‑ test_create_async_retriever
unit_tests.sources.declarative.parsers.test_model_to_component_factory ‑ test_create_composite_error_handler
unit_tests.sources.declarative.parsers.test_model_to_component_factory ‑ test_create_composite_error_handler_with_custom_error_handler
unit_tests.sources.declarative.parsers.test_model_to_component_factory ‑ test_create_concurrent_cursor_from_datetime_based_cursor[test_no_cursor_granularity_exception]
unit_tests.sources.declarative.parsers.test_model_to_component_factory ‑ test_create_concurrent_cursor_from_datetime_based_cursor[test_no_lookback_window0]
…

@github-actions
Copy link

github-actions bot commented Nov 7, 2025

PyTest Results (Full)

3 820 tests   3 807 ✅  11m 18s ⏱️
    1 suites     12 💤
    1 files        1 ❌

For more details on these failures, see this check.

Results for commit b553004.

or model.refresh_token_error_values
)
if is_defined_on_refresh_token_updated and is_defined_on_oauth_authenticator:
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests are failing because the default value are for OAuthAuthenticatorModel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants