Document Question Answering Chatbot with the help of OpenAI, LangChain, VectorDB, and Gradio UI.

Get in touch with Karthikeyan Rathinam: LinkedIn, GitHub, YouTube
Building an OpenAI Chatbot with Gradio UI Using LangChain
In this tutorial, we’ll walk through the process of creating a chatbot powered by OpenAI, integrated into a Gradio UI, and enhanced with LangChain for document handling. This powerful combination allows for intelligent document searching and question-answering capabilities.
Prerequisites
Before we dive into the code, make sure you have the following dependencies installed:
- langchain
- unstructured
- pandas
- chromadb
- tiktoken
- openai
- gradio
- adaptive
- pdf2image
- pytesseract
Create a requirements.txt file with these dependencies and install them using pip:
requirements.txt
langchain
unstructured
pandas
chromadb
tiktoken
openai
gradio
adaptive
pdf2image
pytesseract
pip install -r requirements.txt
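After installing, it can help to confirm that the packages are actually importable before running the app. The following is a minimal sketch using only the standard library. It assumes each package imports under the same name used in requirements.txt; the helper name missing_modules is hypothetical and not part of the original project.

```python
import importlib.util

# Assumption: every package below imports under its requirements.txt name.
REQUIRED_MODULES = [
    "langchain", "unstructured", "pandas", "chromadb", "tiktoken",
    "openai", "gradio", "adaptive", "pdf2image", "pytesseract",
]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

If the script reports missing modules, rerun the pip command in the same Python environment that will run app.py.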
The Code
app.py
Import libraries and packages
import gradio as gr
from langchain.document_loaders import UnstructuredPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
import pinecone
import os
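Install system packages and set secret keys:
os.system("sudo apt-get install tesseract-ocr")  # no notebook-style "!" prefix inside os.system
os.system("sudo apt-get install poppler-utils")
os.system("pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html")
os.system("pip install -qU pinecone-client")
OPENAI_API_KEY = '---'  # replace each '---' placeholder with your own value
PINECONE_API_KEY = '---'
PINECONE_API_ENV = '---'
index_name = "---"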
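Hardcoding secrets in app.py makes them easy to leak through version control. One common alternative, sketched here under assumptions, is to read them from environment variables. The variable names mirror the constants used in this tutorial (OPENAI_API_KEY, PINECONE_API_KEY, PINECONE_API_ENV); the read_keys helper is hypothetical and not part of the original code.

```python
import os

# Assumption: the environment variables use the same names as the
# constants in this tutorial.
REQUIRED_KEYS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_API_ENV")


def read_keys(env=None):
    """Return the required keys from env (defaults to os.environ).

    Raises KeyError listing every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_KEYS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_KEYS}
```

With this in place, the constants could be assigned from read_keys() at startup, and the app would fail early with a clear message if a key is not set.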
Init Pinecone keys :
pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_API_ENV
)
Set LLM and Chain :
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
chain = load_qa_chain(llm, chain_type="stuff")
Load PDF Document :
def load_pdf_document(file_path):
    loader = UnstructuredPDFLoader(file_path)
    return loader.load()
PDF Document to Chunks:
def split_document_to_chunks(documents):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    return text_splitter.split_documents(documents)
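To get a feel for what chunk_size and chunk_overlap control, here is a simplified sketch in plain Python. It is not LangChain's algorithm: RecursiveCharacterTextSplitter prefers to break on separators such as paragraph and line breaks, while this hypothetical split_text helper simply slides a fixed-width window over the text.

```python
def split_text(text, chunk_size=1000, chunk_overlap=0):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Simplified
    stand-in for illustration, not LangChain's splitting logic.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[start:start + chunk_size]
            for start in range(0, len(text), step)]
```

With chunk_overlap=0, as in this tutorial, chunks never share text, so a sentence that straddles a boundary is cut in two. A small overlap is a common way to keep such sentences retrievable, at the cost of embedding some text twice.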
Search Document :
def documentsearch(texts, embeddings, index_name):
    return Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)
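Conceptually, the vector store keeps one embedding per chunk and answers a query by returning the chunks whose embeddings are closest to the query's embedding. The sketch below shows that idea with toy two-dimensional vectors and cosine similarity. It assumes a cosine metric (the actual metric depends on how the Pinecone index was created), and the names cosine_similarity and top_k are hypothetical helpers, not Pinecone or LangChain APIs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, indexed, k=2):
    """Return the texts of the k entries most similar to query_vector.

    indexed is a list of (text, vector) pairs standing in for the index.
    """
    ranked = sorted(indexed,
                    key=lambda item: cosine_similarity(query_vector, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy data: real embeddings have hundreds or thousands of dimensions.
toy_index = [("cats", [1.0, 0.0]), ("dogs", [0.9, 0.1]), ("stocks", [0.0, 1.0])]
nearest = top_k([1.0, 0.0], toy_index, k=2)
```

A real vector database uses approximate nearest-neighbor structures rather than sorting every entry, but the ranking idea is the same.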
LLM Response Block:
def responces(query, docsearch):
    docs = docsearch.similarity_search(query, include_metadata=True)
    return chain.run(input_documents=docs, question=query)
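The "stuff" chain type places all retrieved chunks into a single prompt alongside the question. The sketch below illustrates that idea only; the prompt wording is made up for this example and is not LangChain's actual template, and build_stuff_prompt is a hypothetical helper.

```python
def build_stuff_prompt(chunks, question):
    """Join every retrieved chunk into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Because every chunk goes into one prompt, this approach is simple and needs a single model call, but the retrieved chunks plus the question must fit within the model's context window.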
Clear Chat:
def clear_chat():
    # gr.Interface has no update_chat method; reset the history and return it instead.
    global history
    history = []
    return history
ChatBot Block :
docsearch = None
history = []
def chatbot(file, question):
    global history
    global docsearch
    if file is not None:
        data = load_pdf_document(file.name)
        texts = split_document_to_chunks(data)
        embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
        docsearch = documentsearch(texts, embeddings, index_name)
    if docsearch is not None and question is not None:
        history.append(("User", question))
        response = responces(question, docsearch)
        history.append(("Bot", response))
    return history
iface = gr.Interface(fn=chatbot, inputs=["file", "text"], outputs="list")
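As written, chatbot reloads, splits, and embeds the PDF every time it receives a file, so asking several questions about the same upload repeats that work. One possible refinement is to cache the index per file. This is a sketch under assumptions: get_index and build_index are hypothetical names, and keying the cache by file path assumes the path stays the same between questions, which may not hold for every Gradio version.

```python
# Cache of already-built indexes, keyed by file path (an assumption).
_index_cache = {}


def get_index(file_path, build_index):
    """Return the index for file_path, building it only on first use.

    build_index is a hypothetical callable that loads, splits, and
    embeds the file, for example a wrapper around the tutorial's
    load_pdf_document, split_document_to_chunks, and documentsearch.
    """
    if file_path not in _index_cache:
        _index_cache[file_path] = build_index(file_path)
    return _index_cache[file_path]
```

Inside chatbot, the four lines under "if file is not None" could then be replaced by a single call such as docsearch = get_index(file.name, build_index).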
Launch Chatbot :
iface.launch()
Complete Code Block :
import gradio as gr
from langchain.document_loaders import UnstructuredPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain
import pinecone
import os
# os.system passes the string to the shell, so the notebook-style "!" prefix is dropped.
os.system("sudo apt-get install tesseract-ocr")
os.system("sudo apt-get install poppler-utils")
os.system("pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html")
os.system("pip install -qU pinecone-client")
OPENAI_API_KEY = '---'
PINECONE_API_KEY = '---'
PINECONE_API_ENV = '---'
index_name = "---"
pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_API_ENV
)
llm = OpenAI(temperature=0, openai_api_key=OPENAI_API_KEY)
chain = load_qa_chain(llm, chain_type="stuff")
def load_pdf_document(file_path):
    loader = UnstructuredPDFLoader(file_path)
    return loader.load()
def split_document_to_chunks(documents):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    return text_splitter.split_documents(documents)
def documentsearch(texts, embeddings, index_name):
    return Pinecone.from_texts([t.page_content for t in texts], embeddings, index_name=index_name)
def responces(query, docsearch):
    docs = docsearch.similarity_search(query, include_metadata=True)
    return chain.run(input_documents=docs, question=query)
docsearch = None
history = []
def chatbot(file, question):
    global history
    global docsearch
    if file is not None:
        data = load_pdf_document(file.name)
        texts = split_document_to_chunks(data)
        embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
        docsearch = documentsearch(texts, embeddings, index_name)
    if docsearch is not None and question is not None:
        history.append(("User", question))
        response = responces(question, docsearch)
        history.append(("Bot", response))
    return history
iface = gr.Interface(fn=chatbot, inputs=["file", "text"], outputs="list")
def clear_chat():
    # gr.Interface has no update_chat method; reset the history and return it instead.
    global history
    history = []
    return history
iface.launch()
Explanation of key components:
- Loading Dependencies: We start by importing necessary libraries and installing required packages.
- Setting Up API Keys: Replace each '---' placeholder with your actual OpenAI API key, Pinecone API key, Pinecone environment, and index name.
- Initializing Pinecone and OpenAI: Setting up Pinecone for document search and OpenAI for language understanding.
- Defining the Chatbot Function: The chatbot function orchestrates document loading, text splitting, and question-answering using LangChain.
- Gradio Interface Setup: Creating a Gradio interface to interact with the chatbot, allowing users to upload a file and input text queries.

Document Handling with LangChain
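LangChain handles the document pipeline in this app: UnstructuredPDFLoader loads the PDF, RecursiveCharacterTextSplitter splits it into chunks of up to 1,000 characters with no overlap, and OpenAIEmbeddings embeds each chunk so that Pinecone can retrieve the chunks most similar to a question.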
Running the Application
To run the chatbot, execute the following commands:
sudo apt-get install tesseract-ocr poppler-utils
pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.10/index.html
pip install -qU pinecone-client
python app.py
Visit the provided Gradio UI link in your browser to interact with the chatbot.
Conclusion
Congratulations! You’ve built a sophisticated OpenAI-powered chatbot with a user-friendly Gradio interface, enhanced by LangChain for seamless document handling. This versatile system can be further customized and expanded to meet your specific requirements.
Feel free to experiment, add more features, or integrate additional functionalities to make your chatbot even more intelligent and useful.
Happy coding! 🚀
GitHub Repository : https://github.com/karthikeyanrathinam/Langchain-chatbot-with-openai
If you have any questions, feel free to ask!