Next.js OpenAI Doc Search Starter
This starter project helps you build a custom ChatGPT-style document search with Next.js and OpenAI’s GPT-3. It finds the content in your documents that is most relevant to a question and uses it to generate an answer. Key details:
Deployment: You can deploy this starter to Vercel. The Supabase integration automatically sets the required environment variables and configures your database schema; you only need to provide your own OpenAI API key.
Technical Details:
- Build Time:
  - Pre-process the knowledge base (your .mdx files in the pages folder).
  - Store embeddings in PostgreSQL with pgvector.
At build time, the generate-embeddings script runs over your .mdx files and performs the following tasks:
- Splits .mdx pages into sections.
- Creates and stores embeddings for these sections.
- Generates a checksum for each .mdx file to detect changes.
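The build-time steps above can be sketched in a few lines. This is a minimal illustration, not the starter's actual generate-embeddings script: the `Section` type and the `checksum`, `splitIntoSections`, and `needsRefresh` helpers are hypothetical names, splitting on Markdown headings is an assumed strategy, and the call that creates embeddings is left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape for one section of a page; not the starter's actual type.
interface Section {
  heading: string | null; // null for text that appears before the first heading
  content: string;
}

// Checksum of the full file contents, used to detect changes between builds.
function checksum(source: string): string {
  return createHash("sha256").update(source).digest("base64");
}

// Split a document into sections at Markdown headings (lines starting with 1-6 '#').
// Simplification: '#' lines inside fenced code blocks are also treated as headings.
function splitIntoSections(source: string): Section[] {
  const sections: Section[] = [];
  let current: Section = { heading: null, content: "" };
  for (const line of source.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      if (current.content.trim() !== "") sections.push(current);
      current = { heading: match[1].trim(), content: line + "\n" };
    } else {
      current.content += line + "\n";
    }
  }
  if (current.content.trim() !== "") sections.push(current);
  return sections;
}

// A page only needs new embeddings when its checksum differs from the stored one.
function needsRefresh(source: string, storedChecksum: string | undefined): boolean {
  return storedChecksum !== checksum(source);
}
```

Comparing checksums before embedding means unchanged pages can be skipped, so only edited files incur new embedding requests.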
- Runtime:
  - Perform vector similarity search to find relevant content.
  - Inject content into OpenAI GPT-3 text completion prompts and stream the response to the client.
At runtime, when a user submits a question, the following sequence of tasks occurs:
- The query is received from the client.
- An Edge Function creates an embedding for the query.
- The Edge Function performs a vector similarity search in the database.
- Relevant document content is retrieved.
- Content is injected into an OpenAI GPT-3 prompt, and the response is streamed to the client.
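To make the runtime sequence concrete, here is a small sketch of the search and prompt-assembly steps. In the starter the similarity search runs inside PostgreSQL with pgvector; the in-memory version below only illustrates the idea. The `StoredSection` type, the helper names, the cosine metric, and the prompt wording are all assumptions for illustration, and the embedding and completion calls to OpenAI are omitted.

```typescript
// Hypothetical in-memory stand-in for a stored page section and its embedding.
interface StoredSection {
  content: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k sections whose embeddings are most similar to the query embedding.
function topMatches(query: number[], sections: StoredSection[], k: number): StoredSection[] {
  return sections
    .map((section) => ({ section, score: cosineSimilarity(query, section.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.section);
}

// Inject the retrieved content into a completion prompt (illustrative wording).
function buildPrompt(question: string, matches: StoredSection[]): string {
  const context = matches.map((m) => m.content.trim()).join("\n---\n");
  return [
    "Answer the question using only the context sections below.",
    "",
    "Context sections:",
    context,
    "",
    `Question: ${question}`,
    "",
    "Answer:",
  ].join("\n");
}
```

Limiting the prompt to the top few matches keeps it within the model's context window while still grounding the answer in your own documents.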
Local Development:
- Configuration:
  - Copy .env.example to .env.
  - Set your OPENAI_KEY in the newly created .env file.
  - Set NEXT_PUBLIC_SUPABASE_ANON_KEY and SUPABASE_SERVICE_ROLE_KEY once Supabase is running (see the next step).
- Start Supabase:
  - Ensure Docker is installed and running.
  - Run supabase start.
  - Retrieve NEXT_PUBLIC_SUPABASE_ANON_KEY and SUPABASE_SERVICE_ROLE_KEY using supabase status.
- Start the Next.js App:
  - In a new terminal window, run pnpm dev.
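A missing key is an easy mistake to make during local setup, so a small pre-flight check can help. The sketch below is not part of the starter; `missingEnvVars` is a hypothetical helper that reports which of the three variables named above are unset or blank.

```typescript
// The three variables named in the configuration steps.
const REQUIRED_ENV_VARS: readonly string[] = [
  "OPENAI_KEY",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
];

// Return the names of required variables that are undefined or only whitespace.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV_VARS
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js script you could call `missingEnvVars(process.env)` after loading .env and stop with a clear message if the returned list is not empty.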
Using Your Custom .mdx Docs: By default, the starter expects your documentation in .mdx format. You can convert existing markdown (.md) files by renaming them to .mdx. To regenerate embeddings, run pnpm run embeddings.
In short, this starter combines Next.js, PostgreSQL with pgvector, and OpenAI’s GPT-3 to build an AI-assisted search over your own documents.
For more details, you can read the associated blog post on how ChatGPT was built for the Supabase Docs and explore the pgvector documentation for information on embeddings and vector similarity. Additionally, you can watch Greg’s “How I built this” video on the Rabbit Hole Syndrome YouTube Channel.
The project is licensed under Apache 2.0, making it open-source and flexible for customization. Happy building! 🚀