
Conversation

@ChenZiHong-Gavin
Collaborator

This PR adds a protein QA pipeline. It will help standardize and streamline protein-related QA generation tasks.

Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces a protein QA generation pipeline by refactoring the knowledge graph building architecture. The changes consolidate separate text and multi-modal KG building functions into a unified build_kg function that handles both types internally, and adds scaffolding for multi-omics (protein) knowledge graph extraction.
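A minimal sketch of how a unified `build_kg` might route chunks by type. The `Chunk` dataclass, its `type` field and values, and the two handler bodies below are hypothetical stand-ins, not the PR's actual API; the real handlers would call an LLM and merge results into graph storage, and may well be async.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Hypothetical stand-in for GraphGen's chunk type.
    id: str
    content: str
    type: str = "text"  # assumed values: "text", "image", "table", ...
    metadata: dict = field(default_factory=dict)


def build_text_kg(chunks):
    # Placeholder: a real handler would extract entities/relations from text.
    return [f"text:{c.id}" for c in chunks]


def build_mm_kg(chunks):
    # Placeholder: a real handler would extract from images, tables, etc.
    return [f"mm:{c.id}" for c in chunks]


def build_kg(chunks):
    """Split chunks by type and send each group to the matching builder."""
    text_chunks = [c for c in chunks if c.type == "text"]
    mm_chunks = [c for c in chunks if c.type != "text"]
    results = []
    if text_chunks:
        results.extend(build_text_kg(text_chunks))
    if mm_chunks:
        results.extend(build_mm_kg(mm_chunks))
    return results
```

For example, `build_kg([Chunk("a", "..."), Chunk("b", "...", type="image")])` returns `["text:a", "mm:b"]`. Callers no longer need to know which builder applies; the dispatch lives in one place.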

Key Changes:

  • Consolidated build_text_kg and build_mm_kg into a single build_kg operator that dispatches based on chunk type
  • Refactored GraphGen class from dataclass to standard class with __init__ method
  • Added placeholder implementations for multi-omics KG builder (MOKGBuilder) and operator (build_mo_kg)
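The dataclass-to-class refactor trades the generated `__init__` for an explicit one. The sketch below contrasts the two forms with hypothetical fields (`working_dir`, `unique_id`, `namespace` are illustrative, not GraphGen's real attributes). A dataclass can achieve the same with `__post_init__`; the explicit form simply keeps validation and derived state in a single constructor.

```python
from dataclasses import dataclass


@dataclass
class GraphGenAsDataclass:
    # Before: the dataclass decorator generates __init__ from these fields.
    working_dir: str = "cache"
    unique_id: int = 0


class GraphGenAsClass:
    # After: an explicit __init__ validates input and derives state in one place.
    def __init__(self, working_dir: str = "cache", unique_id: int = 0):
        if not working_dir:
            raise ValueError("working_dir must be non-empty")
        self.working_dir = working_dir
        self.unique_id = unique_id
        # Derived attribute computed once at construction (hypothetical).
        self.namespace = f"{working_dir}/{unique_id}"
```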

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Summary per file:

  • graphgen/operators/build_kg/build_mo_kg.py: Adds a new multi-omics KG building operator with a placeholder implementation
  • graphgen/operators/build_kg/build_kg.py: Creates a unified KG building function that routes text and multi-modal chunks to the appropriate handlers
  • graphgen/operators/build_kg/__init__.py: Updates exports to expose only the new unified build_kg function
  • graphgen/operators/__init__.py: Updates imports to use the consolidated build_kg instead of separate functions
  • graphgen/models/kg_builder/mo_kg_builder.py: Adds a multi-omics KG builder class with placeholder extraction logic
  • graphgen/graphgen.py: Refactors GraphGen from a dataclass to a regular class and simplifies document insertion to use the unified KG builder
  • graphgen/configs/protein_qa_config.yaml: Adds a configuration file for the protein QA pipeline with anchor-based partitioning


Comment on lines +19 to +20
:param kg_instance
:param chunks

Copilot AI Oct 24, 2025


Missing descriptions for the kg_instance and chunks parameters in the docstring. These should document the expected types and purposes of these parameters.

Suggested change
- :param kg_instance
- :param chunks
+ :param kg_instance: BaseGraphStorage instance where the multi-omics knowledge graph will be merged.
+ :param chunks: List of Chunk objects representing the input data to extract entities and relationships from.
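Applied in context, the suggestion yields a docstring like the one below. The function name follows the file in this PR, but the signature, the `:return:` line, and the loop body are assumptions; `BaseGraphStorage` and `Chunk` are replaced with minimal stubs so the snippet runs on its own.

```python
from typing import List


class BaseGraphStorage:
    """Minimal stub standing in for GraphGen's graph storage."""

    def __init__(self):
        self.nodes = {}


class Chunk:
    """Minimal stub standing in for GraphGen's chunk type."""

    def __init__(self, id: str, content: str):
        self.id = id
        self.content = content


def build_mo_kg(kg_instance: BaseGraphStorage, chunks: List[Chunk]) -> BaseGraphStorage:
    """
    Build a multi-omics knowledge graph from the given chunks (placeholder).

    :param kg_instance: BaseGraphStorage instance where the multi-omics
        knowledge graph will be merged.
    :param chunks: List of Chunk objects representing the input data to
        extract entities and relationships from.
    :return: The same kg_instance, updated in place.
    """
    for chunk in chunks:
        # Placeholder: record each chunk as a node until real extraction exists.
        kg_instance.nodes[chunk.id] = chunk.content
    return kg_instance
```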

Comment on lines +23 to +24
:param kg_instance
:param chunks

Copilot AI Oct 24, 2025


Missing descriptions for the kg_instance and chunks parameters in the docstring. These should document the expected types and purposes of these parameters.

Suggested change
- :param kg_instance
- :param chunks
+ :param kg_instance: BaseGraphStorage instance where the extracted knowledge graph will be merged.
+ :param chunks: List of Chunk objects to process for entity and relation extraction.
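The same gap recurs across files, so a small check can catch it before review. The helper below is a hypothetical sketch, not part of the repository: it flags `:param name` lines (optionally ending in a bare colon) that carry no description. It does not handle the typed `:param Type name:` form.

```python
import re
from typing import List

# A ":param name" line with nothing after the name except an optional colon.
_BARE_PARAM = re.compile(r"^[ \t]*:param[ \t]+(\w+)[ \t]*:?[ \t]*$", re.MULTILINE)


def undocumented_params(docstring: str) -> List[str]:
    """Return names of :param entries that have no description text."""
    return _BARE_PARAM.findall(docstring or "")
```

Running it over the original lines in these comments would report both `kg_instance` and `chunks`.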

Step2: Get more details about the protein by querying external databases if necessary.
Step3: Construct entities and relationships for the protein knowledge graph.
Step4: Return the entities and relationships.
:param chunk

Copilot AI Oct 24, 2025


Missing description for the chunk parameter in the docstring. Should document the expected type and purpose of this parameter.

Suggested change
- :param chunk
+ :param chunk: Chunk: The input data chunk containing information to extract protein entities and relationships from.
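The quoted docstring outlines the placeholder extraction flow (Step1 falls outside the quoted lines). The sketch below covers Step2 to Step4 for one chunk. `extract`, the `lookup` callback standing in for an external database query, and the entity and relation shapes are all hypothetical; it assumes the chunk content already holds a single protein identifier and calls no real database API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Entity = Dict[str, str]
Relation = Tuple[str, str, str]  # (source, target, relation label)


@dataclass
class Chunk:
    # Hypothetical stand-in; content is assumed to hold a protein identifier.
    id: str
    content: str


def extract(
    chunk: Chunk,
    lookup: Optional[Callable[[str], Dict[str, str]]] = None,
) -> Tuple[List[Entity], List[Relation]]:
    """Sketch of Step2 to Step4 for one chunk.

    :param chunk: Chunk whose content names a single protein (assumption).
    :param lookup: Optional callback standing in for an external database query.
    :return: (entities, relations) built from this chunk.
    """
    protein = chunk.content.strip()
    # Step2: optionally fetch more details (stubbed via the callback).
    details = lookup(protein) if lookup else {}
    # Step3: one entity for the protein, plus one per retrieved attribute.
    entities: List[Entity] = [{"name": protein, "type": "protein"}]
    relations: List[Relation] = []
    for key, value in details.items():
        entities.append({"name": value, "type": key})
        relations.append((protein, value, key))
    # Step4: return the results; merging into storage happens elsewhere.
    return entities, relations
```

Passing the lookup as a callback keeps the sketch testable offline; a real implementation would decide where network access and caching live.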

progress_bar=self.progress_bar,
)
if not _add_entities_and_relations:
logger.warning("No entities or relations extracted from text chunks")

Copilot AI Oct 24, 2025


The warning message refers to 'text chunks' but this code path handles all chunk types (both text and multi-modal). The message should be updated to 'No entities or relations extracted from chunks' to accurately reflect the unified processing.

Suggested change
- logger.warning("No entities or relations extracted from text chunks")
+ logger.warning("No entities or relations extracted from chunks")
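On the unified path the empty-result check runs after text and multi-modal chunks have been processed together, so it cannot tell which type produced nothing; hence the type-neutral wording. A small sketch of that check, using a hypothetical `merge_results` helper, a hypothetical logger name, and the standard `logging` module:

```python
import logging
from typing import List

logger = logging.getLogger("graphgen.build_kg")  # hypothetical logger name


def merge_results(results: List[object]) -> bool:
    """Return True if anything was extracted; otherwise log a warning."""
    if not results:
        # Type-neutral wording: this path serves text and multi-modal chunks alike.
        logger.warning("No entities or relations extracted from chunks")
        return False
    return True
```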


2 participants