Build an embeddable RAG Chatbot for your website using Cloudflare Workers. The JavaScript widget is stored locally for easy maintenance.
Data source: The content/ directory (Markdown) of your Hugo website repository.
Model backend: Switchable between Gemini and Qwen (Tongyi Qianwen).
- Markdown -> Plain text -> Chunking -> Embedding -> Write to Vectorize
- /chat: Retrieve Top-K chunks + assemble prompt + call LLM to generate Chinese answers
- Returns source references (source, url)
- Embeddable frontend widget.js for your Hugo site
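The /chat flow listed above (retrieve Top-K chunks, assemble a prompt, return source references) can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `RetrievedChunk` shape, the function names, and the prompt wording are all assumptions, and the LLM call itself is left out.

```typescript
// Hypothetical shape of a retrieved chunk; the field names mirror the
// metadata mentioned in this README (text, source, url).
interface RetrievedChunk {
  text: string;
  source: string;
  url: string;
  score: number;
}

// Keep the k highest-scoring chunks (assumes a higher score means more relevant).
function topK(chunks: RetrievedChunk[], k: number): RetrievedChunk[] {
  return [...chunks].sort((a, b) => b.score - a.score).slice(0, k);
}

// Assemble a prompt that numbers each chunk so the answer can refer to it.
function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks.map((c, i) => `[${i + 1}] ${c.text}`).join("\n\n");
  return [
    "Answer in Chinese, using only the context below.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Source references returned alongside the answer, de-duplicated by URL.
function toSources(chunks: RetrievedChunk[]): { source: string; url: string }[] {
  const seen = new Set<string>();
  const out: { source: string; url: string }[] = [];
  for (const c of chunks) {
    if (seen.has(c.url)) continue;
    seen.add(c.url);
    out.push({ source: c.source, url: c.url });
  }
  return out;
}
```

Numbering the chunks keeps the prompt compact, and de-duplicating by URL avoids listing the same page several times when multiple chunks come from one post.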
See the repository tree (src/, scripts/).
- Cloudflare account + wrangler installed
- Create a Vectorize index (dimension consistent with wrangler.toml, default 1024), and bind it in wrangler.toml:
  [[vectorize]]
  binding = "VECTORIZE"
  index_name = "website-rag"
- Set Secrets / Vars:
  wrangler secret put ADMIN_TOKEN      # For Vectorize admin API
  wrangler secret put GOOGLE_API_KEY   # If PROVIDER=gemini
  wrangler secret put QWEN_API_KEY     # If PROVIDER=qwen
  # (Optional) wrangler secret put QWEN_BASE
  # (Optional) wrangler secret put QWEN_EMBED_MODEL
  And set PROVIDER, EMBED_DIM, and LLM_MODEL in [vars] of wrangler.toml.
- Install dependencies:
  npm i
- Local development (Cloudflare login required):
  npm run dev
- Deploy to Cloudflare:
  npm run deploy
- Save widget.js locally and ensure your site references it:
  - Reference the local widget.js path in your HTML.
Run locally (Node 20+):
# Example:
PROVIDER=gemini \
GOOGLE_API_KEY=your_google_api_key \
ADMIN_TOKEN=your_admin_token \
WORKER_URL=https://<your-worker>.workers.dev \
CONTENT_DIR=../website/content \
BASE_URL=https://your-site.com \
EMBED_DIM=1024 \
npm run ingest
Or switch to Qwen:
PROVIDER=qwen \
QWEN_API_KEY=your_qwen_api_key \
ADMIN_TOKEN=your_admin_token \
WORKER_URL=https://<your-worker>.workers.dev \
CONTENT_DIR=../website/content \
BASE_URL=https://your-site.com \
EMBED_DIM=1024 \
npm run ingest
Tip: Ensure the Vectorize index dimension (e.g., 1024) matches the embedding dimension.
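A dimension mismatch is easier to diagnose when it fails before any vector is uploaded. The sketch below shows one way to check this during ingestion; `parseEmbedDim` and `assertEmbeddingDim` are hypothetical helper names, not functions from this repository, and the 1024 default simply follows the default mentioned above.

```typescript
// Parse EMBED_DIM from an environment-style string, defaulting to 1024.
function parseEmbedDim(raw: string | undefined): number {
  const dim = Number(raw ?? "1024");
  if (!Number.isInteger(dim) || dim <= 0) {
    throw new Error(`Invalid EMBED_DIM: ${raw}`);
  }
  return dim;
}

// Throw early if an embedding's length differs from the configured dimension.
function assertEmbeddingDim(vector: number[], expectedDim: number): void {
  if (vector.length !== expectedDim) {
    throw new Error(
      `Embedding has ${vector.length} dimensions, but EMBED_DIM is ${expectedDim}`
    );
  }
}

// Usage inside an ingest script (assumed flow):
// const dim = parseEmbedDim(process.env.EMBED_DIM);
// assertEmbeddingDim(embedding, dim);
```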
In your Hugo template (e.g., layouts/partials/footer.html), add the following, referencing your local JavaScript path:
<script
src="/path/to/your/local/widget.js"
data-endpoint="https://<your-worker>.workers.dev"
defer
></script>
This will display the chat widget in the bottom-right corner of your site.
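For illustration, here is how a widget script could turn the data-endpoint attribute into the /chat URL it calls. This is a sketch under assumptions: `chatUrlFromEndpoint` is a hypothetical helper, and the real widget.js may read and use the attribute differently.

```typescript
// Hypothetical helper: validate the data-endpoint value and derive the /chat URL.
function chatUrlFromEndpoint(endpoint: string | undefined): string {
  if (!endpoint) {
    throw new Error("widget.js: missing data-endpoint attribute");
  }
  const url = new URL(endpoint); // throws on malformed values
  const base = url.href.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/chat`;
}

// In the browser, the value would typically come from the script tag itself:
// const script = document.currentScript as HTMLScriptElement | null;
// const chatUrl = chatUrlFromEndpoint(script?.dataset.endpoint);
```

Normalising the trailing slash means both `https://<your-worker>.workers.dev` and the same value with a trailing `/` resolve to a single /chat URL.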
- Rerank: Call a rerank model on retrieval results to improve relevance.
- Chunking strategy: Optimize chunk length based on Chinese punctuation and headings.
- Source links: The mapping in scripts/ingest.ts's toUrlFromPath can be further refined with Hugo routing rules.
- Conversation memory: Integrate KV / D1 to store user chat history for summarization and compression.
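As one possible starting point for the chunking idea above, the sketch below starts a new chunk at every Markdown heading and packs sentences, split at Chinese sentence-ending punctuation, up to a character budget. `chunkText` and `maxChars` are hypothetical names, and the existing ingest script may chunk quite differently.

```typescript
// Split text into chunks: headings always start a new chunk, and sentences
// (ended by 。！？；) are packed until maxChars would be exceeded.
// A single sentence longer than maxChars is kept whole as an oversized chunk.
function chunkText(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  const flush = (): void => {
    if (current.trim().length > 0) chunks.push(current.trim());
    current = "";
  };

  for (const line of text.split("\n")) {
    if (/^#{1,6}\s/.test(line)) {
      flush(); // close the previous chunk at the heading boundary
      current = line.replace(/^#{1,6}\s+/, "") + "\n";
      continue;
    }
    // Each match is a run of text plus its closing punctuation, if any.
    const sentences = line.match(/[^。！？；]+[。！？；]?/g) ?? [];
    for (const s of sentences) {
      if (current.length > 0 && current.length + s.length > maxChars) flush();
      current += s;
    }
    // Keep line breaks between source lines inside a chunk.
    if (current.length > 0 && !current.endsWith("\n")) current += "\n";
  }
  flush();
  return chunks;
}
```

Keeping the heading text at the top of its chunk gives the embedding some topical context even when the sentences below it are short.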
To completely rebuild the vector index, follow these steps:
- Clear the index:
  # Use the admin API to clear all vector data
  curl -X DELETE "https://<your-worker-url>/admin/clear-all" \
    -H "Authorization: Bearer $ADMIN_TOKEN"
- Perform full reindex:
  npm run full-reindex
  This command will automatically clear the database and reindex all content.
- ADMIN_TOKEN Permission Notes:
  - ADMIN_TOKEN authorizes admin operations (clear DB, batch upload, etc.)
  - Store it securely, usually as an environment variable or Cloudflare Secret
  - It grants full DB read/write permissions, so keep it safe
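Since the admin endpoints rely on a bearer token, a check along the following lines is one way a Worker could guard them. This is a minimal sketch, not the repository's implementation; `isAdminRequest` and `safeEqual` are hypothetical names.

```typescript
// Compare two strings without returning early on the first mismatching character.
function safeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Returns true only for an "Authorization: Bearer <ADMIN_TOKEN>" header value.
function isAdminRequest(authHeader: string | null, adminToken: string): boolean {
  if (!adminToken) return false; // refuse everything if the secret is unset
  const prefix = "Bearer ";
  if (!authHeader || !authHeader.startsWith(prefix)) return false;
  return safeEqual(authHeader.slice(prefix.length), adminToken);
}

// Usage inside a Worker fetch handler (assumed shape):
// if (!isAdminRequest(request.headers.get("Authorization"), env.ADMIN_TOKEN)) {
//   return new Response("Unauthorized", { status: 401 });
// }
```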
For first deployment, complete these steps:
- Create Vectorize index:
  # Create vector index in Cloudflare console
  # Or use wrangler command (if supported)
  wrangler vectorize create website-rag --dimensions=1024
- Configure embedding dimension: In wrangler.toml:
  [vars]
  EMBED_DIM = 1024   # Must match Vectorize index dimension
  PROVIDER = "qwen"  # Or "gemini"
- Initial run of indexing:
  # Run after configuring all required env variables
  PROVIDER=qwen \
  QWEN_API_KEY=your_qwen_api_key \
  ADMIN_TOKEN=your_admin_token \
  WORKER_URL=https://<your-worker>.workers.dev \
  CONTENT_DIR=../website/content \
  EMBED_DIM=1024 \
  npm run ingest
Supports extracting and updating new Chinese/English bilingual blog content:
- Extract new blogs: Use manual-ingest.ts to extract new bilingual blogs, ensuring the vector DB contains the latest content.
  # Extract a single Chinese blog
  npm run manual-ingest ../../content/zh/blog/new-post/index.md
  # Extract a single English blog
  npm run manual-ingest ../../content/en/blog/new-post/index.md
  # Extract both Chinese and English versions
  npm run manual-ingest ../../content/zh/blog/new-post/index.md ../../content/en/blog/new-post/index.md
- Update title dictionary: After adding new bilingual blogs, regenerate the title mapping file to support title translation.
  npm run generate-titles
On initialization, the system processes bilingual blogs as follows:
- Deduplication strategy:
  - Scans all blogs under content/zh/blog/ and content/en/blog/
  - For blogs with both Chinese and English versions, the Chinese version is prioritized for vectorization
  - The English version is extracted only if the Chinese version is absent
  - Each vector entry includes a language metadata field for language filtering during retrieval
- Title mapping file generation:
  - generate-title-dictionary.ts scans all blogs with both Chinese and English versions
  - Extracts title or Title from frontmatter
  - Generates a mapping from Chinese to English titles
  - Saves as both JSON and TypeScript:
    - src/rag/title-dictionary.json
    - src/rag/title-dictionary.ts
- Recommended bilingual blog update workflow:
  For new bilingual blogs, follow this order:
  # Step 1: Extract vector data
  npm run manual-ingest ../../content/zh/blog/new-post/index.md ../../content/en/blog/new-post/index.md
  # Step 2: Update title dictionary
  npm run generate-title-dict
  This ensures the vector DB has the latest content and title translation works properly.
- Language retrieval mechanism:
  - On user query, the system filters by current page language (zh/en)
  - Returns content in the corresponding language first; falls back to all languages if not found
  - Supports auto-detecting language by URL path (/en/ for English, others for Chinese)
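The bilingual rules above (Chinese preferred at ingest time, language detected from the URL path, fallback to all languages at query time) can be sketched as three small functions. All names here are hypothetical, and matching posts by slug is an assumption about how the two language trees correspond.

```typescript
type Lang = "zh" | "en";

// Detect the page language from a URL path: /en/... is English, anything else Chinese.
function detectLanguage(pathname: string): Lang {
  return pathname === "/en" || pathname.startsWith("/en/") ? "en" : "zh";
}

// Given post slugs found under content/zh/blog/ and content/en/blog/,
// pick one version per post: Chinese when present, otherwise English.
function pickVersions(zhSlugs: string[], enSlugs: string[]): Map<string, Lang> {
  const picked = new Map<string, Lang>();
  for (const slug of zhSlugs) picked.set(slug, "zh");
  for (const slug of enSlugs) {
    if (!picked.has(slug)) picked.set(slug, "en");
  }
  return picked;
}

// Prefer matches in the page language; fall back to all matches if none exist.
function preferLanguage<T extends { language: Lang }>(matches: T[], lang: Lang): T[] {
  const sameLang = matches.filter((m) => m.language === lang);
  return sameLang.length > 0 ? sameLang : matches;
}
```

Because only one version of a bilingual post is vectorized, the fallback in `preferLanguage` is what lets an English page still surface posts that were indexed from their Chinese version.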
For updating the index with a single file or a few files:
- Upload a single file with script:
  # Upload a specific Markdown file
  npm run manual-ingest ../website/content/blog/new-post.md
  # Upload multiple files
  npm run manual-ingest file1.md file2.md file3.md
- Upload directly via API (advanced):
  # Call the Worker's admin API directly
  curl -X POST "https://<your-worker-url>/admin/upsert" \
    -H "Authorization: Bearer $ADMIN_TOKEN" \
    -H "Content-Type: application/json" \
    -d '{
      "items": [{
        "id": "doc-1",
        "vector": [0.1, 0.2, ...],
        "text": "Document content",
        "source": "blog/example.md",
        "title": "Sample Article",
        "url": "https://your-site.com/blog/example/"
      }]
    }'
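The same upsert call can be made from a script instead of curl. The sketch below builds the request for the /admin/upsert endpoint shown above; the `UpsertItem` type mirrors the fields in the curl example, while `buildUpsertRequest` is a hypothetical helper rather than part of the repository.

```typescript
// Fields taken from the curl example above.
interface UpsertItem {
  id: string;
  vector: number[];
  text: string;
  source: string;
  title: string;
  url: string;
}

// Build the URL and fetch() options for POST /admin/upsert.
function buildUpsertRequest(workerUrl: string, adminToken: string, items: UpsertItem[]) {
  return {
    url: `${workerUrl.replace(/\/+$/, "")}/admin/upsert`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${adminToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ items }),
    },
  };
}

// Usage (performs a real network call, so it is left commented out):
// const { url, init } = buildUpsertRequest(WORKER_URL, ADMIN_TOKEN, items);
// const res = await fetch(url, init);
// if (!res.ok) throw new Error(`Upsert failed: ${res.status}`);
```

Each vector must have the same length as the index dimension, so real embeddings should replace the placeholder values from the curl example.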
Full Cloudflare environment setup steps:
- Wrangler authentication:
  # Log in to Cloudflare
  wrangler login
  # Verify login status
  wrangler whoami
- Configure wrangler.toml:
  name = "website-rag-worker"
  main = "src/worker.ts"
  compatibility_date = "2024-07-01"
  # Environment variables
  [vars]
  PROVIDER = "qwen"                       # Model provider: gemini or qwen
  EMBED_DIM = 1024                        # Embedding dimension, must match index
  LLM_MODEL = "qwen-turbo-latest"         # LLM model name
  QWEN_EMBED_MODEL = "text-embedding-v4"  # Qwen embedding model
  # Vectorize binding
  [[vectorize]]
  binding = "VECTORIZE"
  index_name = "website-rag"              # Index name
  # Optional: KV storage binding (for chat memory, etc.)
  # [[kv_namespaces]]
  # binding = "CHAT_HISTORY"
  # id = "your-kv-namespace-id"
- Set environment variables and secrets:
  # Required secrets
  wrangler secret put ADMIN_TOKEN      # Admin token
  # Set according to PROVIDER
  wrangler secret put GOOGLE_API_KEY   # Gemini API key
  wrangler secret put QWEN_API_KEY     # Qwen API key
  # Optional
  wrangler secret put QWEN_BASE        # Custom Qwen API endpoint
- Deploy and test:
  # Build project
  npm run build
  # Local test
  npm run dev
  # Deploy to production
  npm run deploy
- Billing notes:
  - Vectorize: Billed by vector count and number of queries
  - Workers: Billed by request count and CPU time
  - KV (if used): Billed by storage and operation count
  - Monitor usage and set appropriate limits
  - A free tier is available for development/testing
- Embedding dimension mismatch: Ensure EMBED_DIM matches the Vectorize index dimension
- API quota exceeded: Adjust MAX_CONCURRENT_EMBEDDINGS and batch size parameters
- Permission errors: Check if ADMIN_TOKEN is set correctly and not expired
- Network proxy: Set the https_proxy environment variable if needed
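To illustrate what a setting like MAX_CONCURRENT_EMBEDDINGS controls, the sketch below runs an async function over a list with a cap on in-flight calls. It is a generic pattern under assumptions: `mapWithConcurrency` is a hypothetical helper, and the repository's scripts may throttle embedding requests differently.

```typescript
// Run fn over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be an integer >= 1");
  }
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // no race: JavaScript runs this synchronously between awaits
      results[i] = await fn(items[i], i);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Usage (embedChunk is an assumed function that calls the embedding API):
// const vectors = await mapWithConcurrency(chunks, 4, (chunk) => embedChunk(chunk));
```

Lowering the limit reduces the request rate against the provider at the cost of longer ingestion time.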
Apache License, Version 2.0