[DMP 2025]: AI Code generation for lesson plans and model abstraction layer #4671

@Joshithach18

Ticket Contents

Description

This feature aims to develop and integrate an AI-powered code generation system within the Music Blocks platform to automatically produce project code snippets from lesson plans. The goal is to make it easier for educators and students to create, understand, and extend Music Blocks projects using natural language inputs.

By training an open-source Large Language Model (LLM) on curated lesson plans and project data, the system will allow seamless generation of Music Blocks-compatible code. A model abstraction layer will be introduced to ensure flexibility—so different AI models can be plugged in over time without changing the core application logic.

This will make the Music Blocks platform more accessible, intuitive, and sustainable for future generations of learners, educators, and developers.

Goals & Mid-Point Milestone

Goals

  • Train an open-source Large Language Model (LLM) to generate Music Blocks project code from lesson plans.
  • Implement a model abstraction layer to keep the AI system flexible and model-agnostic.
  • Expand the dataset by adding more Music Blocks lesson plans and project metadata.
  • Integrate Approximate Nearest Neighbor (ANN) algorithms for efficient code/context retrieval.
  • Create and document FastAPI endpoints for deploying the AI model.
  • Develop strategies and safeguards to minimize AI hallucinations and ensure accurate outputs.
  • Document the technical setup, dataset structure, and contributor guidelines for future maintainers.

Goals Achieved By Mid-point Milestone (1.5 Months)

  • LLM is selected, fine-tuned, and generates basic Music Blocks code from a prompt.
  • A functional model abstraction layer is implemented with at least two interchangeable models.
  • At least 30–50 new lesson plans or projects added to the dataset.
  • Initial version of ANN-based retrieval is working with sample queries.
  • Initial FastAPI service is set up and running locally or in a test environment.

Setup/Installation

No response

Expected Outcome

The final product will be an integrated AI-assisted code generation system within the Music Blocks platform that empowers educators and learners to automatically generate Music Blocks project code from natural language lesson plans or prompts.

Key Features and Behaviors:

  • Code Generation Interface
    Users can input lesson objectives or prompts (e.g., "Create a loop-based melody with tempo variation") and receive ready-to-use Music Blocks code snippets.

  • Model-Agnostic AI Backend
    A model abstraction layer allows swapping or upgrading LLMs (e.g., switching from a fine-tuned GPT-J to Mistral or another open-source model) without changing the front-end or API structure.

  • Expanded and Searchable Dataset
    The system is trained on a diverse set of Music Blocks lesson plans and project files, organized and indexed for fast retrieval using Approximate Nearest Neighbor (ANN) algorithms.

  • FastAPI Deployment
    The AI code generation engine is exposed via RESTful FastAPI endpoints that support local or cloud-based deployment, enabling integration with both the Music Blocks app and external tools.

  • Reduced Hallucination & High Relevance
    Responses from the AI model are grounded in actual project data via Retrieval-Augmented Generation (RAG), minimizing hallucinations and ensuring accuracy.

  • Well-Documented for Open Source
    All components—datasets, model training scripts, APIs, and abstraction layers—are clearly documented with setup guides and contribution instructions to help new contributors onboard quickly.

Final Behavior Summary:

When a user (teacher, student, or developer) types a lesson idea, the system will:

  1. Retrieve similar past lesson plans or code blocks using ANN.
  2. Use the LLM to generate relevant Music Blocks code with annotations.
  3. Display or export the code for immediate use or customization in the Music Blocks app.
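As a rough illustration of this flow, the sketch below (Python) wires the three steps together. The `retrieve` and `generate_code` callables stand in for the ANN retrieval and LLM adapter components described under Implementation Details, and the prompt template is purely illustrative.

```python
from typing import Callable

def generate_from_lesson_idea(
    idea: str,
    retrieve: Callable[[str, int], list[str]],  # step 1: ANN retrieval of similar lessons
    generate_code: Callable[[str], str],        # step 2: LLM code generation
    k: int = 3,
) -> str:
    examples = retrieve(idea, k)
    context = "\n\n".join(examples)
    prompt = (
        "You are generating Music Blocks project code.\n"
        f"Similar lesson plans:\n{context}\n\n"
        f"Lesson idea: {idea}\n"
        "Annotated Music Blocks code:"
    )
    # step 3: the caller displays or exports the returned code in the app
    return generate_code(prompt)
```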

Acceptance Criteria

The feature will be considered complete and accepted when the following criteria are met:

  • The open-source LLM is trained/fine-tuned and generates relevant Music Blocks code based on natural language prompts.
  • A model abstraction layer is implemented, tested, and allows switching between at least two different models without modifying the core API.
  • At least 50 high-quality lesson plans and project examples are added to the training dataset.
  • Approximate Nearest Neighbor (ANN)-based retrieval is integrated and improves the relevance of generated code through context-aware grounding.
  • A FastAPI backend is available with endpoints for:
    • Prompt submission
    • Code snippet retrieval
    • Model switching (optional)
  • The AI-generated code runs without errors in the Music Blocks application and is pedagogically meaningful.
  • A technical guide and contributor documentation are available in the repository or wiki.
  • Unit tests and integration tests are written for major components (model, API, abstraction layer).
  • Clear instructions for setup, usage, and contribution are provided for future developers and contributors.
  • AI hallucinations are minimized by using Retrieval-Augmented Generation (RAG) or other grounding techniques.

Implementation Details

The implementation of this feature will involve several technical components across AI model training, API development, and system integration. Below are the key details:

Technologies & Tools

  • Programming Languages: Python (backend, AI), JavaScript (Music Blocks frontend)
  • Frameworks/Libraries:
    • FastAPI – for building RESTful APIs to serve the AI model
    • Transformers (Hugging Face) – for working with and fine-tuning open-source LLMs (e.g., Mistral, GPT-J, LLaMA)
    • Faiss or Annoy – for implementing Approximate Nearest Neighbor (ANN) search for retrieving similar lesson plans
    • LangChain or Haystack (optional) – for building Retrieval-Augmented Generation (RAG) pipelines
    • Pandas, NumPy – for data handling and preprocessing
    • Docker – for containerizing the application and model server
  • Deployment Tools: Uvicorn (FastAPI server), optionally Hugging Face Spaces or local server

AI Model Training

  • Use an open-source LLM as the base (e.g., GPT-J, Mistral, Phi-2).
  • Fine-tune the model on a curated dataset of Music Blocks lesson plans and project descriptions.
  • Preprocess lesson plans and code into prompt–completion pairs for supervised fine-tuning.
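A minimal sketch of the preprocessing step, assuming lesson plans are stored as JSON files; the field names (`title`, `objective`, `blocks_code`) and paths are placeholders until the dataset schema is fixed:

```python
import json
from pathlib import Path

def build_pairs(lesson_dir: str, out_file: str) -> None:
    """Turn lesson-plan JSON files into prompt-completion pairs (JSONL) for fine-tuning."""
    pairs = []
    for path in sorted(Path(lesson_dir).glob("*.json")):
        lesson = json.loads(path.read_text())
        prompt = (
            f"Lesson: {lesson['title']}\n"
            f"Objective: {lesson['objective']}\n"
            "Generate the Music Blocks code."
        )
        pairs.append({"prompt": prompt, "completion": lesson["blocks_code"]})
    with open(out_file, "w") as f:
        for pair in pairs:
            f.write(json.dumps(pair) + "\n")

if __name__ == "__main__":
    build_pairs("data/lesson_plans", "data/train.jsonl")
```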

Model Abstraction Layer

  • Design an abstraction class/interface (e.g., ModelInterface) with methods like generate_code(prompt) and get_model_info().
  • Implement separate adapters for each model backend (e.g., MistralAdapter, GPTJAdapter).
  • Allow dynamic switching of models via config or API call without affecting the front-end logic.
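A sketch of what this interface could look like; the class and method names follow the bullets above, while the Hugging Face model ID and the registry mechanism are illustrative choices, not final decisions:

```python
from abc import ABC, abstractmethod

class ModelInterface(ABC):
    @abstractmethod
    def generate_code(self, prompt: str) -> str:
        """Return Music Blocks code for a natural language prompt."""

    @abstractmethod
    def get_model_info(self) -> dict:
        """Return metadata (name, backend, version) about the active model."""

class MistralAdapter(ModelInterface):
    def __init__(self, model_name: str = "mistralai/Mistral-7B-Instruct-v0.2"):
        from transformers import pipeline  # lazy import keeps other adapters lightweight
        self._pipe = pipeline("text-generation", model=model_name)
        self._name = model_name

    def generate_code(self, prompt: str) -> str:
        return self._pipe(prompt, max_new_tokens=512)[0]["generated_text"]

    def get_model_info(self) -> dict:
        return {"name": self._name, "backend": "transformers"}

# Switching models becomes a config/API concern rather than a code change.
MODEL_REGISTRY: dict[str, type[ModelInterface]] = {"mistral": MistralAdapter}

def load_model(key: str) -> ModelInterface:
    return MODEL_REGISTRY[key]()
```

A GPTJAdapter (or any other backend) would implement the same two methods and register itself under a new key, leaving the front-end and API untouched.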

Approximate Nearest Neighbor (ANN) Integration

  • Convert lesson plans and code examples into embeddings using SentenceTransformers or OpenAI-compatible models.
  • Index embeddings using FAISS or Annoy for fast similarity search.
  • Use top-k retrieved examples as additional context in RAG.
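A small sketch of the indexing and retrieval step using sentence-transformers and FAISS; `all-MiniLM-L6-v2` is a common lightweight embedding model chosen here only for illustration. Note that `IndexFlatL2` performs exact search, which is fine for a small corpus; an IVF or HNSW index would give true approximate search as the dataset grows.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def build_index(documents: list[str]) -> faiss.IndexFlatL2:
    """Embed lesson-plan texts and index them for similarity search."""
    embeddings = encoder.encode(documents, convert_to_numpy=True).astype(np.float32)
    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(embeddings)
    return index

def retrieve(index: faiss.IndexFlatL2, documents: list[str], query: str, k: int = 3) -> list[str]:
    """Return the top-k most similar documents to use as RAG context."""
    query_vec = encoder.encode([query], convert_to_numpy=True).astype(np.float32)
    _, ids = index.search(query_vec, k)
    return [documents[i] for i in ids[0]]
```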

API Layer

  • Expose endpoints like:
    • POST /generate – accepts a prompt and returns generated code
    • GET /lesson-plan/<id> – returns code from stored lesson plan
    • POST /switch-model – switches between supported models (if enabled)
  • Validate and sanitize all input/output.
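A sketch of the FastAPI layer wiring these endpoints to the abstraction layer; the `model_layer` module name refers to the abstraction-layer sketch above and is hypothetical, and the in-memory lesson-plan store is a placeholder for whatever persistence is chosen:

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

from model_layer import load_model  # hypothetical module holding the abstraction-layer sketch

app = FastAPI(title="Music Blocks AI Composer")
model = None  # set at startup, e.g. model = load_model("mistral")

LESSON_PLANS: dict[str, str] = {}  # plan_id -> stored code (placeholder store)

class GenerateRequest(BaseModel):
    prompt: str

class SwitchRequest(BaseModel):
    name: str  # registry key, e.g. "mistral"

@app.post("/generate")
def generate(req: GenerateRequest):
    if model is None:
        raise HTTPException(status_code=503, detail="No model loaded")
    return {"code": model.generate_code(req.prompt)}

@app.get("/lesson-plan/{plan_id}")
def get_lesson_plan(plan_id: str):
    if plan_id not in LESSON_PLANS:
        raise HTTPException(status_code=404, detail="Unknown lesson plan")
    return {"code": LESSON_PLANS[plan_id]}

@app.post("/switch-model")
def switch_model(req: SwitchRequest):
    global model
    model = load_model(req.name)
    return model.get_model_info()
```

Run locally with `uvicorn main:app --reload` (assuming the file is saved as `main.py`).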

Testing & Validation

  • Write unit tests for all components (model adapters, API routes, retrieval logic).
  • Evaluate model output quality using real lesson plan prompts and a rubric covering relevance, correctness, and usability.
  • Include regression tests to prevent degradation when updating models.
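As a starting point, the sketch below uses a fake adapter to unit-test the abstraction-layer contract and smoke-test the `/generate` route with FastAPI's TestClient; the `api` module name is hypothetical.

```python
from fastapi.testclient import TestClient

class FakeAdapter:
    """Deterministic stand-in for a real LLM adapter, keeping tests fast."""
    def generate_code(self, prompt: str) -> str:
        return "start\nrepeat 4\n  note C4\nend"

    def get_model_info(self) -> dict:
        return {"name": "fake", "backend": "test"}

def test_adapter_contract():
    adapter = FakeAdapter()
    assert isinstance(adapter.generate_code("loop-based melody"), str)
    assert "name" in adapter.get_model_info()

def test_generate_endpoint(monkeypatch):
    import api  # hypothetical module containing the FastAPI app sketched above
    monkeypatch.setattr(api, "model", FakeAdapter())
    client = TestClient(api.app)
    resp = client.post("/generate", json={"prompt": "loop-based melody"})
    assert resp.status_code == 200
    assert "code" in resp.json()
```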

Documentation

  • Provide setup instructions, API usage examples, and dataset schema in the project wiki or README.
  • Include a contributor guide to onboard new developers easily.

Mockups/Wireframes

No response

Product Name

Music Blocks AI Composer

Organisation Name

Sugar Labs

Domain

Education

Tech Skills Needed

Artificial Intelligence

Mentor(s)

@walterbender
@sumitsrv (Sumit Srivastava)
@devinulibarri

This issue proposes a high-impact feature to enhance Music Blocks with AI-powered code generation and model-agnostic architecture. The feature aligns with the mission of Sugar Labs and Music Blocks to make learning engaging, creative, and accessible.

I request your review and guidance on this ticket. If approved, this would be a great issue for GSoC contributors or open-source AI enthusiasts interested in the intersection of education, music, and AI. Kindly assign the issue if suitable or suggest modifications.

Thank you for your support and mentorship!
Joshitha Chennamsetty

Category

AI
