Describe the bug
A `PipelineSnapshot` is a dataclass representing a snapshot of a `Pipeline` at a certain point in its execution. It's meant to be an object that is easily serializable and deserializable, so users can inspect the snapshot as well as restart a pipeline from it.
It appears we overlooked serializing the `pipeline_outputs` before adding them to the `PipelineSnapshot`, which can cause JSON serialization errors even when saving Haystack dataclasses.
For example, this fails (wrapped in a test function so pytest provides the `tmp_path` fixture):

```python
import pytest

from haystack.core.pipeline.breakpoint import _create_pipeline_snapshot, _save_pipeline_snapshot
from haystack.dataclasses import ByteStream, Document
from haystack.dataclasses.breakpoints import Breakpoint


def test_snapshot_with_raw_bytes_in_outputs(tmp_path):
    snapshot = _create_pipeline_snapshot(
        inputs={},
        component_inputs={},
        break_point=Breakpoint(component_name="comp2", snapshot_file_path=str(tmp_path)),
        component_visits={"comp1": 1, "comp2": 0},
        original_input_data={},
        ordered_component_names=["comp1", "comp2"],
        include_outputs_from={"comp1"},
        # The Document's ByteStream holds raw bytes, which json cannot encode.
        pipeline_outputs={"comp1": {"result": Document(blob=ByteStream(data=b"test"))}},
    )
    with pytest.raises(TypeError):
        _save_pipeline_snapshot(snapshot)
```

NOTE: Please use this branch to reproduce the error, since it contains a slight refactor to `_create_pipeline_snapshot`. To be clear, this bug does exist in `main` and is not unique to the branch.
Update: The branch above has been merged into `main`, so there is no need to use a special branch.
Error message
```
E       TypeError: Object of type bytes is not JSON serializable
```
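For context, this is standard-library behavior underneath the failure: `json.dumps` has no encoder for raw `bytes`, which is exactly what the `ByteStream` payload contributes when the outputs are stored unserialized. A minimal illustration (not code from the issue):

```python
import json

# The snapshot dict still contains raw bytes at this point, so the
# default JSON encoder raises the same TypeError seen above.
json.dumps({"comp1": {"result": b"test"}})
# TypeError: Object of type bytes is not JSON serializable
```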
Expected behavior
For the pipeline snapshot to be successfully saved.
- We should use `_serialize_value_with_schema` to serialize the pipeline outputs before adding them to the `PipelineSnapshot`, like we do for the pipeline inputs. A sketch follows this list.
- Additionally, when loading a pipeline snapshot in `Pipeline.run`, we should deserialize the pipeline outputs with `_deserialize_value_with_schema`, like we do for the pipeline inputs.
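A minimal sketch of the proposed change, assuming `_serialize_value_with_schema` / `_deserialize_value_with_schema` are the same helpers already applied to the pipeline inputs and live in `haystack.utils.base_serialization` (the exact call sites in `_create_pipeline_snapshot` and `Pipeline.run` would differ):

```python
import json

from haystack.dataclasses import ByteStream, Document
# Assumed import path: the helpers used on the inputs path.
from haystack.utils.base_serialization import (
    _deserialize_value_with_schema,
    _serialize_value_with_schema,
)

pipeline_outputs = {"comp1": {"result": Document(blob=ByteStream(data=b"test"))}}

# Before storing on the PipelineSnapshot: the schema-aware form uses only
# plain JSON types, so saving the snapshot no longer raises TypeError.
serialized = _serialize_value_with_schema(pipeline_outputs)
json.dumps(serialized)  # succeeds

# When resuming in Pipeline.run: reverse the transform so downstream code
# sees the original Document/ByteStream objects again.
restored = _deserialize_value_with_schema(serialized)
```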