YouTube Transcript Analyzer: Auto_YT GPT

Chat about a video to OpenAI's GPT-3.5 Turbo Model, using Langchain's LLM

ยท

10 min read

YouTube Transcript Analyzer: Auto_YT GPT

#Overview

Have you ever watched a YouTube video and wanted to quickly find a specific piece of information mentioned in the video? It can be a time-consuming and frustrating process to manually search through the transcript of the video to find what you're looking for. But what if there was a way to automatically generate a transcript and then search through it for specific information? That's where LangChain comes in.

The YouTube Transcript Analyzer is a mini-app that uses LangChain, an open-source library for natural language processing (NLP), to generate transcripts using OpenAI's GPT-3.5 Turbo model. The app then uses the generated transcripts to search for specific information in the video using LangChain's text similarity search capabilities.

The app uses LangChain's document loaders, text splitters, embeddings, chat models, chains, and vector stores to analyze the transcripts of YouTube videos. The app also provides answers to users' questions based on the factual information present in the video's transcript.

In this tutorial, we will build a mini-app that uses LangChain to generate youtube video transcripts using OpenAI's GPT-3 based Turbo model. We will then use the generated transcripts to search for specific information in the video using LangChain's text similarity search capabilities.

#Directory Structure

auto_YT
โ”‚---README.md
โ”‚---requirements.txt
|---app.py

#Requirements

Before we begin, make sure you have the following libraries installed:

  • langchain

  • faiss-cpu

  • youtube-transcript-api

  • validators

  • openai

  • tiktoken

You can install these libraries by running the following command:

$pip install langchain faiss-cpu youtube-transcript-api validators openai tiktoken

Alternatively, you can install all the required packages at once by running:

$pip install -r requirements.txt

#Let's Start Coding

Setting up the App >

First, we will import the necessary modules that we need for this project.

import streamlit as st
import os
import textwrap
import validators

from langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.vectorstores import FAISS
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

The first three modules, streamlit, os, and textwrap, are standard Python libraries for building interactive web applications, working with operating systems, and formatting text, respectively. The validators module is a third-party package for validating and sanitizing data.

The other modules, imported from langchain are specific to natural language processing. They include:

  • YoutubeLoader: a class for loading text from YouTube videos.

  • RecursiveCharacterTextSplitter: a class for splitting text into segments based on characters.

  • OpenAIEmbeddings: a class for generating word embeddings using OpenAI's GPT-3 model.

  • ChatOpenAI: a class for generating responses to text input using OpenAI's GPT-3 model.

  • LLMChain: a class for generating text based on a language model trained on a large corpus of text.

  • FAISS: a class for storing and querying vectors using the FAISS library.

  • ChatPromptTemplate: a class for generating prompts for chat conversations.

  • SystemMessagePromptTemplate: a class for generating prompts for system messages in a chat conversation.

  • HumanMessagePromptTemplate: a class for generating prompts for human messages in a chat conversation.

Loading OpenAI key from .env file (Optional) >

# This will Load the environment variables
# OPTIONAL: This is optional as we are requesting the API key from the user
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

Setting up the UI Layout using Streamlit >

After importing the necessary packages, the next step is to set up the UI layout using Streamlit. Streamlit is a popular Python library that simplifies the process of building web applications for data science and machine learning projects.

To create your own modifications you can refer to Streamlit documentation: https://docs.streamlit.io/

st.set_page_config(
    page_title="LangChain Quickstart",
    page_icon="snake"
)
# Force responsive layout for columns also on mobile
st.write("""
<style>
    [data-testid="column"] {
        width: calc(50% - 1rem);
        flex: 1 1 calc(50% - 1rem);
        min-width: calc(50% - 1rem);
    }
</style>
""",
    unsafe_allow_html=True,
)

# Title and Description of the App
st.title("๐Ÿฆœ๏ธ๐Ÿ”— Youtube Transcript Analyzer")
st.markdown(
    "DESCRIPTION"
)

Handling the OPENAI_API_KEY >

This code snippet allows the user to enter their OpenAI API key using Streamlit's text_input function. The type argument is set to "password" so that the user's input is hidden as they type, and the placeholder argument provides a hint to the user on the format expected for the API key.

The API key is then assigned to the openai_api_key variable and added to the environment variables using os.environ so that it can be accessed by other parts of the code.

openai_api_key = st.text_input(
    label="Enter your OpenAI API key",
    type="password",
    placeholder="sk-..."
)
os.environ["OPENAI_API_KEY"] = openai_api_key

Create OpenAI Embeddings >

The embeddings variable is an instance of the OpenAIEmbeddings class, which is used to load and access pre-trained embeddings from OpenAI's GPT models.

By passing the openai_api_key variable, which we obtained from the user input using Streamlit, we authenticate our access to OpenAI's API and can proceed to load the embeddings. We can use these embeddings to compute similarity scores between text inputs and generate responses to user prompts using the GPT models.

embeddings = OpenAIEmbeddings(openai_api_key= openai_api_key)

Loading the Transcript >

Next, we'll load the transcript of the YouTube video. We'll be using the YoutubeLoader class from LangChain's document_loaders module to load the transcript from the YouTube video URL. We'll also be using LangChain's RecursiveCharacterTextSplitter class to split the transcript into smaller chunks of text that can be more easily processed by our NLP model.

pythonCopy codefrom langchain.document_loaders import YoutubeLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load transcript from the YouTube video URL
def yt_loader(video_url):
    loader = YoutubeLoader.from_youtube_url(video_url)
    transcript = loader.load()

    # Split transcript into smaller chunks for easier processing
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    docs = text_splitter.split_documents(transcript)

    return docs

Handling User Inputs and Responses >

This is a function called app() that creates the Streamlit app interface. The function defines various input variables, such as video_url and query, which are displayed as text input fields in the Streamlit app. The input fields allow the user to enter a YouTube video URL and a question about the video.

The function first checks if the video_url is a valid URL using the validators library. If the URL is not valid, it displays an error message using the st.text_error method and returns from the function.

If the user clicks on the "Get Answer" button, the function retrieves the video data using the yt_loader() function and generates a response using the response_fromQuery() function. The response is then displayed in a container using the st.expander() method.

def app():
    # Input VIDEO_URL from the USER
    video_url = st.text_input(
        label="Enter the YouTube video URL.",
        placeholder="https://www.youtube.com/watch..."
    )

    # Validate if the entered input is a URL
    if not validators.url(video_url):
        st.text_error = "Invalid URL. Please enter a valid URL."
        return

    # get QUESTIONS from USER
    query = st.text_input(
        label="Ask your question about the video",
        placeholder="Describe video in 1 sentence or 10 words"
    )

    if st.button(label="Get Answer"):
        db = yt_loader(video_url)

        # Shows a LOADING SPINNER untill it gets the response
        with st.spinner("Generating response..."):
            response, docs = reponse_fromQuery(
                                db,
                                query,
                                openai_api_key
                            )

        # DISPLAY the generated Response
        st.markdown("""---""")
        with st.container():
            with st.expander("ANSWER >", expanded=True):
                st.info(textwrap.fill(response, width=80))

Process the Transcripted Data >

yt_loader function is responsible for loading the YouTube video using YoutubeLoader, extracting its transcript, splitting it into chunks of 1000 characters using RecursiveCharacterTextSplitter, and creating an FAISS index using FAISS.from_documents with the help of OpenAI embeddings to be used in the response_fromQuery function later.

Transcripts from video can be very long, containing thousands of words or sentences. Feeding such a long document to a machine-learning model can be computationally expensive and may also lead to performance issues.

By dividing the transcript into smaller chunks, the machine learning model can process the data more efficiently and in a timely manner. Moreover, it allows the model to focus on smaller sections of the transcript at a time, which helps in producing more accurate results.

def yt_loader(video_url):
    loader = YoutubeLoader.from_youtube_url(video_url)

    # COLLECT the TRANSCRIPT
    transcript = loader.load()

    # TEXT_SPLITTER divides TEXT into CHUNKS of 1000 characters
    text_splitter = RecursiveCharacterTextSplitter(
                        chunk_size=1000,
                        # CHUNK_OVERLAP makes sure that no data is     missed, by overlapping 100 chars from each chunk to the previous/next chunk
                        chunk_overlap=100
                    )

    docs = text_splitter.split_documents(transcript)

    db = FAISS.from_documents(docs, embeddings)
    return db

Then, the FAISS object is used to create an index of the documents, which is a high-dimensional vector space representation of the text. The index is used to search for relevant documents based on the user's query. The embeddings object is used to convert the text of each document into a vector representation that can be used to build the index.

The resulting index (db) is used to retrieve the most relevant document to the user's query and generate a response.

Extracting Answers from db Vectors >

Using the YouTube video transcript that we extracted earlier, we can now query the database using a user input query. This is where the reponse_fromQuery function comes into play.

The reponse_fromQuery function takes in the following parameters:

  • db: The database of the video transcript split into smaller documents and converted into a FAISS index for efficient similarity search.

  • query: The user input query, which is the question they want to ask about the video.

  • openai_api_key: The OpenAI API key required for the language model used to generate the response.

  • k: The number of documents to retrieve from the database that are most similar to the user's query.

The function begins by using the FAISS index to find the k most similar documents to the user's query. It then concatenates the page content of these documents into a single string. This concatenated string of document content is passed into a chatbot model based on GPT-3 called ChatOpenAI.

The ChatOpenAI model is initialized with a low temperature value of 0.2 to encourage conservative responses that are more factually accurate, and a prompt template is provided that informs the model what it is being used for and what it can and cannot do.

A chat prompt is then created that consists of two message templates: one for the system message and another for the human message. The system message template informs the user what they can ask and how they should phrase their question, while the human message template is where the user's question is inserted. The ChatPromptTemplate is created using these two message templates.

def reponse_fromQuery(db, query, openai_api_key, k=4):
    docs = db.similarity_search(query, k=k)
    docs_page_content = " ".join([d.page_content for d in docs])

    chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.2)

    template = """
        You are a helpful assistant that that can answer questions about youtube videos based on the video's transcript: {docs}

        Only use the factual information from the transcript to answer the question.

        If you feel like you don't have enough information to answer the question, say "I don't know".

        Your answers should be verbose, and detailed.
    """

    system_message_prompt = SystemMessagePromptTemplate.from_template(template)

    # Human question prompt
    human_template = "Answer the following question: {question}"
    human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)

    chat_prompt = ChatPromptTemplate.from_messages(
        [system_message_prompt, human_message_prompt]
    )

    chain = LLMChain(llm=chat, prompt=chat_prompt)

    response = chain.run(question=query, docs=docs_page_content)
    response = response.replace("\n", "")
    return response, docs

Finally, an LLMChain (Language Model Markov Chain) is created with the ChatOpenAI model as the language model and the ChatPromptTemplate as the prompt. The LLMChain is then used to generate a response to the user's query. The response is returned from the function along with the k most similar documents that were retrieved from the database.

#Possible future use cases and modifications

  • Developing a Chrome extension that automatically generates video summaries and key highlights whenever a video is played

  • Additionally, this tool could be designed to support multiple languages

  • Another possible feature would be to generate a transcript of the audio if no captions are available.

#Conclusion

To conclude, YouTube Transcript Analyzer GPT is a powerful tool for generating accurate transcripts quickly and efficiently. With its use of OpenAI's advanced GPT-3-based Turbo model, LLM(Large Language Model) has been shown to be a reliable and effective solution for natural language processing. We have explored the app's code and features, and hope that this blog has been informative and helpful in showcasing LLM's capabilities.

#Source Code

#References

ย