Missing serialization of pipeline_outputs when creating a PipelineSnapshot #9872

@sjrl

Describe the bug
A PipelineSnapshot is a dataclass representing a snapshot of a Pipeline at a certain point in its execution. It's meant to be an object that is easily serializable and deserializable, so users can inspect the snapshot as well as restart a pipeline from it.

It appears we overlooked serializing the pipeline_outputs before adding them to the PipelineSnapshot, which can cause JSON serialization errors even when saving Haystack dataclasses.

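For example, this fails:

import pytest

from haystack.core.pipeline.breakpoint import _create_pipeline_snapshot, _save_pipeline_snapshot
from haystack.dataclasses import ByteStream, Document
from haystack.dataclasses.breakpoints import Breakpoint

# tmp_path is pytest's built-in temporary-directory fixture, so run this inside a test function.
snapshot = _create_pipeline_snapshot(
    inputs={},
    component_inputs={},
    break_point=Breakpoint(component_name="comp2", snapshot_file_path=str(tmp_path)),
    component_visits={"comp1": 1, "comp2": 0},
    original_input_data={},
    ordered_component_names=["comp1", "comp2"],
    include_outputs_from={"comp1"},
    # The raw bytes inside the ByteStream are what the JSON encoder rejects.
    pipeline_outputs={"comp1": {"result": Document(blob=ByteStream(data=b"test"))}},
)

with pytest.raises(TypeError):
    _save_pipeline_snapshot(snapshot)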

NOTE: Please use this branch to reproduce the error, since it contains a slight refactor to _create_pipeline_snapshot. To be clear, this bug also exists in main and is not unique to the branch.

Update: The above branch has been merged into main, so there is no need to use a special branch.

Error message
E TypeError: Object of type bytes is not JSON serializable

Expected behavior
For the pipeline snapshot to be successfully saved.

  • We should use _serialize_value_with_schema to serialize the pipeline outputs before adding them to the PipelineSnapshot, like we do for the pipeline inputs.
  • Additionally, when loading a pipeline snapshot in Pipeline.run, we should deserialize the pipeline outputs with _deserialize_value_with_schema, like we do for the pipeline inputs.
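
To illustrate the idea outside of Haystack, here is a minimal standard-library sketch of the failure and of a serialize-before-save / deserialize-on-load round trip. The helpers `to_jsonable` and `from_jsonable` are hypothetical stand-ins invented for this example; they are not `_serialize_value_with_schema` / `_deserialize_value_with_schema` and do not reflect how those functions are implemented. The nested dict only imitates the shape of the outputs in the reproduction above.

```python
import base64
import json
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Hypothetical helper: recursively convert bytes into a JSON-friendly marker dict."""
    if isinstance(value, bytes):
        return {"__bytes__": base64.b64encode(value).decode("ascii")}
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    return value


def from_jsonable(value: Any) -> Any:
    """Hypothetical helper: reverse to_jsonable after json.loads."""
    if isinstance(value, dict):
        if set(value) == {"__bytes__"}:
            return base64.b64decode(value["__bytes__"])
        return {key: from_jsonable(item) for key, item in value.items()}
    if isinstance(value, list):
        return [from_jsonable(item) for item in value]
    return value


# Plain-dict imitation of outputs that contain raw bytes (assumed shape).
pipeline_outputs = {"comp1": {"result": {"blob": {"data": b"test"}}}}

# Dumping the raw outputs fails because the JSON encoder cannot handle bytes.
try:
    json.dumps(pipeline_outputs)
    raw_dump_failed = False
except TypeError:
    raw_dump_failed = True

# Converting first makes the dump succeed, and the reverse step restores the bytes.
encoded = json.dumps(to_jsonable(pipeline_outputs))
restored = from_jsonable(json.loads(encoded))
```

The point of the sketch is the ordering: conversion happens before the data reaches the JSON encoder, and the inverse conversion happens right after loading, which mirrors what the two bullets above propose for the pipeline outputs.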

Labels: P1 (High priority, add to the next sprint)
