Pass Standard Tests #35
Open
This PR makes the changes needed for our integrations to pass the standard tests offered in langchain-tests. Changes include:
Batch error handling: Previously, inserting documents with duplicate IDs could raise a unique constraint error and fail the entire batch. We now pass `batcherrors=True` (https://python-oracledb.readthedocs.io/en/latest/user_guide/batch_statement.html#handling-data-errors) so that per-row errors don't invalidate the other inserts. Only successfully inserted IDs are returned.
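A minimal sketch of the batch-error pattern described above, not the code in this PR. `executemany(..., batcherrors=True)` and `getbatcherrors()` (whose errors carry an `offset`) are python-oracledb cursor calls; the helper name `insert_rows_skipping_errors` and the `FakeCursor` stand-in are hypothetical and exist only so the example runs without a database.

```python
from typing import Any, List, Sequence, Tuple


def insert_rows_skipping_errors(
    cursor: Any, sql: str, rows: Sequence[Tuple[Any, ...]]
) -> List[Any]:
    """Insert rows and return the IDs (first column) that were actually inserted."""
    rows = list(rows)
    if not rows:
        return []
    # With batcherrors=True, a failing row is recorded instead of aborting the batch.
    cursor.executemany(sql, rows, batcherrors=True)
    # Each recorded error carries the offset of the offending row in `rows`.
    failed_offsets = {error.offset for error in cursor.getbatcherrors()}
    return [row[0] for i, row in enumerate(rows) if i not in failed_offsets]


class _FakeError:
    """Hypothetical stand-in for a driver error object."""

    def __init__(self, offset: int, message: str) -> None:
        self.offset = offset
        self.message = message


class FakeCursor:
    """Hypothetical test double: enforces a unique constraint on the first column."""

    def __init__(self, existing_ids: Sequence[Any] = ()) -> None:
        self.stored = set(existing_ids)
        self._errors: List[_FakeError] = []

    def executemany(self, sql: str, rows: Sequence[Tuple[Any, ...]], batcherrors: bool = False) -> None:
        self._errors = []
        for offset, row in enumerate(rows):
            if row[0] in self.stored:
                if not batcherrors:
                    raise RuntimeError("unique constraint violated")
                self._errors.append(_FakeError(offset, "unique constraint violated"))
            else:
                self.stored.add(row[0])

    def getbatcherrors(self) -> List[_FakeError]:
        return list(self._errors)
```

With a real connection, `cursor` would come from `connection.cursor()` and `sql` would be the table's `INSERT` statement; the table and column names are left out here because the source does not state them.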
Optional upsert behavior: Standard tests expect rows with duplicate IDs to be updated rather than erroring. To preserve backward compatibility, we introduced a constructor parameter, `mutate_on_duplicate`:
`False` (default): preserve previous behavior (no updates on duplicate IDs).
`True`: update existing rows (texts, metadata, etc.) when duplicate IDs are provided.
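The two `mutate_on_duplicate` modes can be illustrated with a toy in-memory model. This is only a sketch of the semantics, not the integration's implementation: `ToyStore` is hypothetical, it keeps rows in a dict rather than a table, and returning updated IDs alongside inserted ones when the flag is `True` is an assumption.

```python
from typing import Dict, List, Sequence


class ToyStore:
    """Hypothetical in-memory model of the duplicate-ID behavior."""

    def __init__(self, mutate_on_duplicate: bool = False) -> None:
        self.mutate_on_duplicate = mutate_on_duplicate
        self.rows: Dict[str, str] = {}

    def add_texts(self, texts: Sequence[str], ids: Sequence[str]) -> List[str]:
        written: List[str] = []
        for text, doc_id in zip(texts, ids):
            if doc_id in self.rows and not self.mutate_on_duplicate:
                # Default mode: leave the existing row alone and skip the duplicate.
                continue
            # New ID, or upsert mode: write (or overwrite) the row.
            self.rows[doc_id] = text
            written.append(doc_id)
        return written


keep = ToyStore()  # mutate_on_duplicate=False
keep.add_texts(["old"], ["1"])
keep_result = keep.add_texts(["new", "other"], ["1", "2"])

upsert = ToyStore(mutate_on_duplicate=True)
upsert.add_texts(["old"], ["1"])
upsert_result = upsert.add_texts(["new", "other"], ["1", "2"])
```

In the default store, row `"1"` keeps its original text and only `"2"` is reported as written; in the upsert store, row `"1"` is overwritten.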
New methods: Added `get_by_ids` and `aget_by_ids`.
ID handling and hashing
When IDs are not provided to `add_texts`, we generate them via `uuid.uuid4()` and store a hashed version in a RAW column. Users need these generated IDs to call `delete` or `get_by_ids`, so `add_texts` is expected to return them. Previously, `add_texts` returned the hashed IDs, which could not be used in `delete` or `get_by_ids`, as we hash them again to search in the documents. This behaviour is fixed to return the unhashed versions.
The `Document`s returned by the `similarity_search` functions did not have the `id` field, because the original unhashed IDs were not saved to the DB. To keep the table structure the same for users with existing tables, these original IDs are added to the `metadata` under the key `"__orcl_internal_doc_id"`, which is then used to return `Document`s that include the `id` field.
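The ID round trip described above can be sketched as follows. This is an illustration under stated assumptions, not the integration's code: the hash function (truncated SHA-256), the helper names `hash_id`, `prepare_row` and `restore_id`, and the choice to strip the internal key from the metadata on read are all hypothetical. Only `uuid.uuid4()` and the key `"__orcl_internal_doc_id"` come from the PR description.

```python
import hashlib
import uuid
from typing import Any, Dict, Optional, Tuple

INTERNAL_ID_KEY = "__orcl_internal_doc_id"


def hash_id(doc_id: str) -> bytes:
    """Derive the RAW-column key from an ID (assumed: first 8 bytes of SHA-256)."""
    return hashlib.sha256(doc_id.encode("utf-8")).digest()[:8]


def prepare_row(
    text: str,
    metadata: Optional[Dict[str, Any]] = None,
    doc_id: Optional[str] = None,
) -> Tuple[str, bytes, Dict[str, Any]]:
    """Return (id to hand back to the caller, hashed key to store, metadata to store)."""
    original_id = doc_id if doc_id is not None else str(uuid.uuid4())
    stored_metadata = dict(metadata or {})
    # Keep the unhashed ID alongside the user's metadata so it can be recovered later.
    stored_metadata[INTERNAL_ID_KEY] = original_id
    return original_id, hash_id(original_id), stored_metadata


def restore_id(stored_metadata: Dict[str, Any]) -> Tuple[Optional[str], Dict[str, Any]]:
    """Split stored metadata back into (original id, user metadata)."""
    metadata = dict(stored_metadata)
    original_id = metadata.pop(INTERNAL_ID_KEY, None)  # None for rows written before this change
    return original_id, metadata
```

Because the caller receives the unhashed ID, passing it back to a lookup only requires hashing it once more with the same function, which matches how `delete` and `get_by_ids` are described as searching.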