- 
                Notifications
    
You must be signed in to change notification settings  - Fork 721
 
feat: Add full-featured Iceberg MERGE INTO conditional merges support and argument validation #3201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Add full-featured Iceberg MERGE INTO conditional merges support and argument validation #3201
Conversation
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
| 
           Currently failing 1 Athena test: 
 
 Plan to fix on next commit.  | 
    
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
      
        
              This comment was marked as outdated.
        
        
      
    
  This comment was marked as outdated.
          AWS CodeBuild CI Report
 Powered by github-codebuild-logs, available on the AWS Serverless Application Repository  | 
    
| 
           Hi @pedrorfdez it looks like there is an error in   | 
    
          AWS CodeBuild CI Report
 Powered by github-codebuild-logs, available on the AWS Serverless Application Repository  | 
    
| "Cannot specify both merge_cols and merge_on_clause. Use either merge_cols for simple equality matching or merge_on_clause for custom logic." | ||
| ) | ||
| 
               | 
          ||
| if merge_on_clause and merge_match_nulls: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly better to also check if merge_cols - as it will validate that merge cols are actually set
| raise exceptions.InvalidArgumentValue(f"merge_conditional_clauses[{i}] must contain 'action' field.") | ||
| if clause["when"] not in ["MATCHED", "NOT MATCHED", "NOT MATCHED BY SOURCE"]: | ||
| raise exceptions.InvalidArgumentValue( | ||
| f"merge_conditional_clauses[{i}]['when'] must be one of ['MATCHED', 'NOT MATCHED', 'NOT MATCHED BY SOURCE']." | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe indicate in the error message that the operators are case sensitive
| ) | ||
| if clause["action"] not in ["UPDATE", "DELETE", "INSERT"]: | ||
| raise exceptions.InvalidArgumentValue( | ||
| f"merge_conditional_clauses[{i}]['action'] must be one of ['UPDATE', 'DELETE', 'INSERT']." | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe indicate in the error message that the action is case sensitive
| - 'update': Update matched rows and insert non-matched rows. | ||
| - 'ignore': Only insert non-matched rows. | ||
| - 'conditional_merge': Use custom conditional clauses for merge actions. | ||
| merge_conditional_clauses : List[dict], optional | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the type hint be list[_MergeClause]?
| UPDATE SET {", ".join([f'"{x}" = source."{x}"' for x in df.columns])}""" | ||
| else: | ||
| match_condition = "" | ||
| if merge_cols or merge_on_clause: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The if code becomes unwieldy imho, maybe refactor to a helper function to formulate the conditions
Feature or Bugfix
Detail
MERGE INTOoperations in _write_iceberg.py/to_icebergmerge_on_clause: CustomONstatement in theMERGE INTO ... USING ... ON [custom_expression]to allow<,<=,>and>=operators. Until now, only column equality was allowed. Risk of having more than one match in target table is warned in stringdocs.merge_condition: Added new accepted valueconditional_mergemerge_conditional_clauses: List of dictionaries specifying custom conditional clauses for theMERGE INTOstatement.Each dictionary should have:
- 'when': One of ['MATCHED', 'NOT MATCHED', 'NOT MATCHED BY SOURCE']
- 'action': One of ['UPDATE', 'DELETE', 'INSERT']
- 'condition': (optional) Additional SQL condition for the clause
- 'columns': (optional) List of columns to update or insert
Used only when merge_condition is 'conditional_merge'.
merge_cols,merge_on_clause,merge_match_nulls,merge_condition,merge_conditional_clauses,Relates
This is the first draft of the implementation, feel free to suggest any changes in the approach. I am open to suggestions.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.