This is an example Node.js application that uses embeddings and the LLaMA model for text retrieval and response generation. It processes a text corpus, generates embeddings for "chunks" of that text, and uses those embeddings to perform a "similarity search" in response to queries. The system consists of a Node.js server that handles API requests and a p5.js sketch for client interaction.
- `server.js`: Server file that handles API requests and integrates with the Replicate API.
- `save-embeddings.js`: Processes a text file and generates embeddings.
- `test-embeddings.js`: Tests the embeddings search functionality without all that client/server stuff (see the sketch after this list).
- `embeddings.json`: Precomputed embeddings generated from the text corpus.
- `public/`: p5.js sketch.
- `.env`: API token.
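As a rough illustration of how the similarity search in `test-embeddings.js` and `server.js` might work, here is a minimal sketch written as an ES module. The shape of `embeddings.json` (entries with `text` and `embedding` fields) and the helper names `cosineSimilarity` and `topChunks` are assumptions for this sketch, not the actual code in this repo.

```js
// Hypothetical sketch: rank stored chunks by cosine similarity to a query embedding.
// Assumes embeddings.json looks like: [{ "text": "...", "embedding": [0.12, -0.03, ...] }, ...]
import fs from 'fs';

function cosineSimilarity(a, b) {
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

function topChunks(queryEmbedding, entries, k = 3) {
  return entries
    .map((entry) => ({ ...entry, score: cosineSimilarity(queryEmbedding, entry.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}

// Load the precomputed embeddings and find the chunks closest to a query vector.
// `queryEmbedding` would come from the same embedding model used in save-embeddings.js.
const entries = JSON.parse(fs.readFileSync('embeddings.json', 'utf-8'));
// const matches = topChunks(queryEmbedding, entries);
```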
- Using open-source models for faster and cheaper text embeddings (see the sketch after this list)
- How to use retrieval augmented generation
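Roughly what generating one of those embeddings looks like with the Replicate Node.js client, as a hedged sketch rather than the actual code in `save-embeddings.js`: the model id, input field, and output shape below are assumptions, so check the model's page on Replicate for its real identifier and input schema.

```js
// Hedged sketch: embed one chunk of text with an open-source model on Replicate.
// The model id and input field are illustrative; some models also require an
// explicit version hash in the form "owner/name:version".
import Replicate from 'replicate';

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

async function embed(text) {
  // For embedding models the output is typically an array of numbers
  // (or an array of arrays for batched input).
  const output = await replicate.run('replicate/all-mpnet-base-v2', {
    input: { text },
  });
  return output;
}

// const vector = await embed('Some chunk of the text corpus.');
```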
- Install Dependencies

  ```
  npm install
  ```

- Set up the `.env` file with your Replicate API token:

  ```
  REPLICATE_API_TOKEN=your_api_token_here
  ```

- Generate the `embeddings.json` file by running `save-embeddings.js`. (You'll need to hard-code a text filename and adjust how the text is split up depending on the format of your data.)

  ```js
  const raw = fs.readFileSync('text-corpus.txt', 'utf-8');
  let chunks = raw.split(/\n+/);
  ```

  ```
  node save-embeddings.js
  ```

- Run the Server

  ```
  node server.js
  ```

  Open browser to: http://localhost:3000 (or whatever port is specified).
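Once the server has the most similar chunks, retrieval augmented generation mostly amounts to pasting those chunks into the prompt sent to the LLaMA model. Here is a hedged sketch of what that step might look like; the model id (`meta/meta-llama-3-8b-instruct`), the prompt format, and the output handling are assumptions for illustration, not the actual code in `server.js`.

```js
// Hypothetical sketch of the generation step: put the retrieved chunks into the
// prompt and ask a LLaMA model on Replicate to answer using only that context.
import Replicate from 'replicate';

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

async function generateAnswer(query, topChunks) {
  const context = topChunks.map((c) => c.text).join('\n\n');
  const prompt = `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${query}\nAnswer:`;

  const output = await replicate.run('meta/meta-llama-3-8b-instruct', {
    input: { prompt },
  });

  // Language models on Replicate commonly return output as an array of strings.
  return Array.isArray(output) ? output.join('') : output;
}
```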