feat: add protein_qa generation #73
base: main
Pull Request Overview
This PR introduces a protein QA generation pipeline by refactoring the knowledge graph building architecture. The changes consolidate separate text and multi-modal KG building functions into a unified build_kg function that handles both types internally, and adds scaffolding for multi-omics (protein) knowledge graph extraction.
Key Changes:
- Consolidated build_text_kg and build_mm_kg into a single build_kg operator that dispatches based on chunk type
- Refactored GraphGen class from dataclass to standard class with __init__ method
- Added placeholder implementations for multi-omics KG builder (MOKGBuilder) and operator (build_mo_kg)
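
The dispatch described in the first bullet could look roughly like the sketch below. It is an illustration only, not the PR's code: the Chunk dataclass, its type field, the "text" value, and the return values of the two handlers are assumptions made for this example, and the real functions may be asynchronous and take additional arguments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical stand-in for the project's chunk type."""
    id: str
    type: str  # assumed: "text" for text chunks, anything else is multi-modal
    content: str


def build_text_kg(chunks: List[Chunk]) -> List[str]:
    # Placeholder handler: the real one would extract entities and relations.
    return [f"text:{c.id}" for c in chunks]


def build_mm_kg(chunks: List[Chunk]) -> List[str]:
    # Placeholder handler for multi-modal chunks.
    return [f"mm:{c.id}" for c in chunks]


def build_kg(chunks: List[Chunk]) -> List[str]:
    """Route each chunk to the handler for its type and merge the results."""
    text_chunks = [c for c in chunks if c.type == "text"]
    mm_chunks = [c for c in chunks if c.type != "text"]

    results: List[str] = []
    if text_chunks:
        results.extend(build_text_kg(text_chunks))
    if mm_chunks:
        results.extend(build_mm_kg(mm_chunks))
    return results
```

Callers then need only one entry point, for example build_kg(all_chunks), and no longer have to split the chunks by type themselves.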
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| graphgen/operators/build_kg/build_mo_kg.py | Adds new multi-omics KG building operator with placeholder implementation |
| graphgen/operators/build_kg/build_kg.py | Creates unified KG building function that routes text vs multi-modal chunks to appropriate handlers |
| graphgen/operators/build_kg/__init__.py | Updates exports to expose only the new unified build_kg function |
| graphgen/operators/__init__.py | Updates imports to use consolidated build_kg instead of separate functions |
| graphgen/models/kg_builder/mo_kg_builder.py | Adds multi-omics KG builder class with placeholder extraction logic |
| graphgen/graphgen.py | Refactors GraphGen from dataclass to regular class and simplifies document insertion to use unified KG builder |
| graphgen/configs/protein_qa_config.yaml | Adds configuration file for protein QA pipeline with anchor-based partitioning |
Copilot AI commented on Oct 24, 2025:

    :param kg_instance
    :param chunks

Missing descriptions for the kg_instance and chunks parameters in the docstring. These should document the expected types and purposes of these parameters.

Suggested change:

    - :param kg_instance
    - :param chunks
    + :param kg_instance: BaseGraphStorage instance where the multi-omics knowledge graph will be merged.
    + :param chunks: List of Chunk objects representing the input data to extract entities and relationships from.

Copilot AI commented on Oct 24, 2025:

    :param kg_instance
    :param chunks

Missing descriptions for the kg_instance and chunks parameters in the docstring. These should document the expected types and purposes of these parameters.

Suggested change:

    - :param kg_instance
    - :param chunks
    + :param kg_instance: BaseGraphStorage instance where the extracted knowledge graph will be merged.
    + :param chunks: List of Chunk objects to process for entity and relation extraction.

Copilot AI commented on Oct 24, 2025:

    Step2: Get more details about the protein by querying external databases if necessary.
    Step3: Construct entities and relationships for the protein knowledge graph.
    Step4: Return the entities and relationships.
    :param chunk

Missing description for the chunk parameter in the docstring. It should document the expected type and purpose of this parameter.

Suggested change:

    - :param chunk
    + :param chunk: Chunk: The input data chunk containing information to extract protein entities and relationships from.

Copilot AI commented on Oct 24, 2025:

        progress_bar=self.progress_bar,
    )
    if not _add_entities_and_relations:
        logger.warning("No entities or relations extracted from text chunks")

The warning message refers to 'text chunks' but this code path handles all chunk types (both text and multi-modal). The message should be updated to 'No entities or relations extracted from chunks' to accurately reflect the unified processing.

Suggested change:

    - logger.warning("No entities or relations extracted from text chunks")
    + logger.warning("No entities or relations extracted from chunks")

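
For reference, the guard with the corrected message can be exercised in isolation as in the minimal sketch below. The function name report_extraction and the logger name are hypothetical and do not come from the PR; only the warning text is taken from the suggestion above.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("graphgen_sketch")


def report_extraction(added) -> bool:
    """Return True if anything was extracted; otherwise log a warning."""
    if not added:
        # Chunk-type-neutral wording, since the unified path handles
        # both text and multi-modal chunks.
        logger.warning("No entities or relations extracted from chunks")
        return False
    return True
```

An empty or None result triggers the warning and returns False, so the caller can stop early without assuming which chunk type produced the empty result.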
This PR lays the groundwork for the protein QA pipeline, which is intended to standardize and streamline protein-related QA generation tasks.