@kaovilai kaovilai commented Oct 21, 2025

Fixes ESO-234: NodeAgent was restarting every ~30s when External Secrets
Operator managed the cloud-credentials secret. ESO's metadata-only updates
were triggering unnecessary DPA reconciliations.

Changes:

  • Updated labelHandler.Update() to skip reconciliation for Secret objects
    when only metadata changes (ResourceVersion, annotations, etc.)
  • Added comprehensive unit tests for labelHandler covering all scenarios
  • Maintains backward compatibility for non-Secret resources

This prevents unnecessary NodeAgent daemonset updates while preserving
reconciliation for actual data or label changes.
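The skip rule described above can be sketched in isolation. This is a minimal illustration, not the PR's actual handler: `Secret` here is a hypothetical stand-in for `corev1.Secret` that keeps only the relevant fields so the example runs with the standard library, and `metadataOnlyUpdate` is an assumed helper name. It mirrors the three `reflect.DeepEqual` comparisons quoted in the review excerpt further down. As far as I know, `StringData` is a write-only field that the API server does not return on reads, so that comparison is likely always equal for watched objects.

```go
package main

import (
	"fmt"
	"reflect"
)

// Secret is a hypothetical stand-in for corev1.Secret that keeps only the
// fields this sketch needs, so the example runs with the standard library.
type Secret struct {
	ResourceVersion string
	Annotations     map[string]string
	Labels          map[string]string
	Data            map[string][]byte
	StringData      map[string]string
}

// metadataOnlyUpdate reports whether an update left Data, StringData, and
// Labels untouched. Anything else (ResourceVersion, annotations, ...) is
// treated as metadata that should not trigger a reconcile.
func metadataOnlyUpdate(oldSecret, newSecret *Secret) bool {
	return reflect.DeepEqual(oldSecret.Data, newSecret.Data) &&
		reflect.DeepEqual(oldSecret.StringData, newSecret.StringData) &&
		reflect.DeepEqual(oldSecret.Labels, newSecret.Labels)
}

func main() {
	base := &Secret{
		ResourceVersion: "100",
		Labels:          map[string]string{"example.com/managed": "true"},
		Data:            map[string][]byte{"cloud": []byte("key-v1")},
	}
	// Same payload and labels; only ResourceVersion and an annotation moved.
	touched := &Secret{
		ResourceVersion: "101",
		Annotations:     map[string]string{"example.com/last-sync": "now"},
		Labels:          map[string]string{"example.com/managed": "true"},
		Data:            map[string][]byte{"cloud": []byte("key-v1")},
	}
	// The credential payload actually changed.
	rotated := &Secret{
		ResourceVersion: "102",
		Labels:          map[string]string{"example.com/managed": "true"},
		Data:            map[string][]byte{"cloud": []byte("key-v2")},
	}

	fmt.Println(metadataOnlyUpdate(base, touched)) // prints true: skip reconcile
	fmt.Println(metadataOnlyUpdate(base, rotated)) // prints false: reconcile
}
```

In a handler, a `true` result would mean returning early without enqueuing a reconcile request; non-Secret objects would bypass this check entirely.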

🤖 Generated with Claude Code

Co-Authored-By: Claude noreply@anthropic.com

Why the changes were made

How to test the changes made

Continuation of #1998 on a thawed oadp-dev branch.

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 21, 2025
Contributor

@weshayutin weshayutin left a comment

Thanks @kaovilai! Please do not cherry-pick until 1.5.3 is GA :)

openshift-ci bot commented Oct 21, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai, weshayutin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

// This filters out metadata-only updates (ResourceVersion, ManagedFields, etc.)
if reflect.DeepEqual(oldSecret.Data, newSecret.Data) &&
	reflect.DeepEqual(oldSecret.StringData, newSecret.StringData) &&
	reflect.DeepEqual(oldSecret.Labels, newSecret.Labels) {
Member Author

@weshayutin although... do we know if they are changing labels every 30 seconds? Are we supposed to ignore label changes?

Contributor

A couple of notes:

  1. I could be wrong, but it's my understanding that the External Secrets Operator is tech-preview.
  2. I'm not the smartest guy in the world, but it seems to me that ESO should check the secret and, if no change is required, ANY metadata on the secret should remain UNCHANGED, which is why I moved the bug to ESO.
  3. I DO like being defensive in this case; we either need to set it up and test it, or enquire w/ the ESO team re: labels.
  4. Based on my understanding above, I don't think this is a priority for our attention atm. I could be wrong, though.
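On the open question of whether label changes should be ignored, one defensive option is to compare only the label keys the handler actually cares about, so that unrelated label churn from another controller does not trigger a reconcile. The sketch below is an illustration under assumptions, not what the PR does: `watchedLabelsEqual`, `dataEqual`, and the `example.com/watched` key are hypothetical names. It uses `maps.EqualFunc` (Go 1.21+) with `bytes.Equal`, which, unlike `reflect.DeepEqual`, treats a nil map and an empty map as equal.

```go
package main

import (
	"bytes"
	"fmt"
	"maps"
)

// watchedLabelsEqual compares only the label keys listed in watched.
// Changes to any other label are ignored. A key that is present on one
// side and absent on the other counts as a change.
func watchedLabelsEqual(oldLabels, newLabels map[string]string, watched []string) bool {
	for _, key := range watched {
		oldVal, oldOK := oldLabels[key]
		newVal, newOK := newLabels[key]
		if oldOK != newOK || oldVal != newVal {
			return false
		}
	}
	return true
}

// dataEqual compares secret payloads byte-for-byte. A nil map and an
// empty map are considered equal here.
func dataEqual(a, b map[string][]byte) bool {
	return maps.EqualFunc(a, b, bytes.Equal)
}

func main() {
	// Hypothetical label key; the real handler's keys are not shown in this thread.
	watched := []string{"example.com/watched"}

	oldLabels := map[string]string{"example.com/watched": "true", "sync-hash": "abc"}
	newLabels := map[string]string{"example.com/watched": "true", "sync-hash": "def"}

	// Only an unwatched label changed, so the watched set is still equal.
	fmt.Println(watchedLabelsEqual(oldLabels, newLabels, watched)) // prints true

	// nil and empty payloads compare equal with this helper.
	fmt.Println(dataEqual(nil, map[string][]byte{})) // prints true
}
```

Whether this narrower comparison is appropriate depends on what ESO actually writes to the secret, which the thread leaves to be tested or confirmed with the ESO team.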

@kaovilai
Member Author

/retest

ai-retester: The end-to-end test e2e-test-aws failed because the MySQL application KOPIA test timed out. Specifically, the todolist container in pod todolist-1-vcz9l showed a ContainersNotReady condition and failed to start after an extended period. This appears to be the primary cause, although the mysql container also triggered warnings about failed liveness probes.

@kaovilai
Member Author

/retest

ai-retester: The e2e test failed because the MySQL application KOPIA test timed out: the todolist container in pod todolist-1-9znqw repeatedly failed readiness checks (containers not ready, "PodInitializing"). The underlying issue may have been the liveness probe for the mysql database failing with an i/o timeout.

openshift-ci bot commented Oct 23, 2025

@kaovilai: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
