Chat with a Document Using TypeScript, OpenAI, and Pinecone

Learn how to build a chatbot that uses a PDF document as a database, extracting text and answering questions using TypeScript, OpenAI, and Pinecone.

Let's dive into a high-level explainer on how you can use TypeScript to chat with a document. For this, we'll use two main tools: OpenAI and Pinecone.

There are four main steps that I'm going to walk you through:

  1. Document Setup: I've got a social media study in PDF format, about 75 pages, stored in S3.

  2. Data Extraction: We download this document and use a library called pdf-parse to extract its text. The text is then split by paragraph or by a set character limit. We also have logic to remove any chunk that's 10 words or fewer because, let's be real, there's not likely to be much meaningful information in such short snippets.

  3. Creating Embeddings: We then iterate over these chunks and create an embedding for each one. I'll take a deeper dive into what embeddings are in a more detailed video soon. The embeddings are packaged as vectors and upserted into Pinecone.

  4. Interacting with the Document: When a user poses a question, we create a new embedding from that prompt. We use it to build a query request, which is sent to the Pinecone index. Pinecone returns the top five results. We construct a context prompt from these, feed it to OpenAI, and present the response to the user.
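
To make step 2 a bit more concrete, here's a minimal sketch of the chunking logic: split on blank lines, break up anything over a character limit, and drop chunks of 10 words or fewer. The 1000-character limit and the function names are my own assumptions for illustration, not necessarily what the real code uses.

```typescript
// Assumed default: the walkthrough only says "a set character limit".
const MAX_CHARS = 1000;
// From the walkthrough: chunks of 10 words or fewer are dropped.
const MIN_WORDS = 10;

function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Break one long paragraph into pieces of at most maxChars characters,
// splitting on whitespace. A single word longer than maxChars is kept whole.
function splitByLimit(paragraph: string, maxChars: number): string[] {
  const pieces: string[] = [];
  let current = "";
  for (const word of paragraph.split(/\s+/)) {
    if (word === "") continue;
    const candidate = current === "" ? word : current + " " + word;
    if (candidate.length > maxChars && current !== "") {
      pieces.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") pieces.push(current);
  return pieces;
}

// Split extracted text into chunks and filter out the very short ones.
function chunkText(
  text: string,
  maxChars: number = MAX_CHARS,
  minWords: number = MIN_WORDS
): string[] {
  const chunks: string[] = [];
  // Paragraphs are assumed to be separated by one or more blank lines.
  for (const raw of text.split(/\n\s*\n/)) {
    const paragraph = raw.replace(/\s+/g, " ").trim();
    const pieces =
      paragraph.length > maxChars ? splitByLimit(paragraph, maxChars) : [paragraph];
    for (const piece of pieces) {
      // Keep only chunks with more than minWords words.
      if (countWords(piece) > minWords) chunks.push(piece);
    }
  }
  return chunks;
}
```

In the real flow, the input to something like `chunkText` would be the text that pdf-parse pulls out of the PDF.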

[Figure: diagram of the process]
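
The embedding step in this flow can be sketched roughly like this. To keep the example self-contained, the embedding call is passed in as a function (`embed`), which in practice would wrap a request to OpenAI's embeddings endpoint. The record shape (`id`, `values`, `metadata`) mirrors what a Pinecone upsert takes, but the ID scheme, the metadata field, and the batching helper are assumptions of mine.

```typescript
// Hypothetical signature: wraps whatever call produces an embedding for a string.
type EmbedFn = (text: string) => Promise<number[]>;

// One vector record: an ID, the embedding values, and the chunk text as metadata
// so the original text can be read back from a query result.
interface VectorRecord {
  id: string;
  values: number[];
  metadata: { text: string };
}

// Iterate over the chunks one at a time and build a record for each.
async function buildVectors(
  docId: string,
  chunks: string[],
  embed: EmbedFn
): Promise<VectorRecord[]> {
  const records: VectorRecord[] = [];
  let index = 0;
  for (const chunk of chunks) {
    const values = await embed(chunk);
    // Assumed ID scheme: document ID plus the chunk's position.
    records.push({ id: docId + "-" + index, values, metadata: { text: chunk } });
    index++;
  }
  return records;
}

// Group records into batches so each upsert request stays a manageable size.
// Batching is an assumption here; the walkthrough doesn't mention it.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Each batch from `toBatches` would then be handed to the Pinecone client's upsert call.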

I've set up two endpoints, which I call from Postman, to interact with this document AI:

  • Summary: This endpoint processes the document, splits the text into chunks, and upserts the data into Pinecone.
  • Chat: Using this, we can ask questions against our processed document. It grabs a user's query, makes an embedding, and fetches the top results from Pinecone to generate a relevant answer.
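
Here's a rough sketch of what the Chat flow could look like in code. The OpenAI and Pinecone calls are injected as plain functions (`embed`, `queryIndex`, `complete`) so the example runs on its own. Those names, the `Match` shape, and the prompt wording are all assumptions of mine, not the actual implementation.

```typescript
// Hypothetical shape for one query result: a similarity score plus the chunk text.
interface Match {
  score: number;
  text: string;
}

// From the walkthrough: the top five results are used.
const TOP_K = 5;

// Build the prompt sent to the model: the best-scoring chunks as context,
// followed by the user's question. The instruction wording is an assumption.
function buildContextPrompt(
  question: string,
  matches: Match[],
  topK: number = TOP_K
): string {
  const context = matches
    .slice() // copy so the caller's array isn't reordered
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((m, i) => "[" + (i + 1) + "] " + m.text)
    .join("\n\n");
  return [
    "Answer the question using only the context below.",
    "If the answer is not in the context, say you don't know.",
    "",
    "Context:",
    context,
    "",
    "Question: " + question,
  ].join("\n");
}

// Hypothetical dependencies: thin wrappers around the OpenAI and Pinecone clients.
interface ChatDeps {
  embed: (text: string) => Promise<number[]>;
  queryIndex: (vector: number[], topK: number) => Promise<Match[]>;
  complete: (prompt: string) => Promise<string>;
}

// Embed the question, fetch the closest chunks, and ask the model to answer.
async function answerQuestion(question: string, deps: ChatDeps): Promise<string> {
  const vector = await deps.embed(question);
  const matches = await deps.queryIndex(vector, TOP_K);
  const prompt = buildContextPrompt(question, matches);
  return deps.complete(prompt);
}
```

Keeping the prompt builder separate from the network calls makes it easy to tweak the instructions or the number of chunks without touching the rest of the endpoint.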