
Introduction to OpenAI and LLMs


I focus most of my blog posts on the data platform and how companies can make better business decisions using structured data (think SQL tables), but I’m seeing more and more customers interested in OpenAI and how they can make better business decisions with OpenAI using unstructured data (think text in documents). They also want to know whether it is possible to use OpenAI on structured data. This is the first blog in a three-part series on the topic.

This first blog will focus on using OpenAI on unstructured data, where the ideal solution is a bot, like ChatGPT, that you can use to ask questions about documents from your company.

First I want to explain in layman’s terms what OpenAI, ChatGPT and Bing’s Copilot are. ChatGPT and Copilot are basically bots that work with what are called generative AI models, commonly known as Large Language Models or LLMs, which were “trained” on multiple data sets that include essentially the entire web as well as millions of digital books. The models were built using OpenAI’s technology (OpenAI is a leading artificial intelligence research lab that focuses on developing advanced AI technologies), so these models are very smart! Sitting on top of these LLMs are “bots” that allow you to ask questions (via prompts), and the bot returns answers using the LLM. The more detail you put in the question, the better the answer will be (this technique is called prompt engineering – designing prompts for LLMs that improve the accuracy and relevancy of responses, optimizing the performance of the model). An example question would be “What are the best cities in the USA?”, and the LLM would return an answer based on all the websites, blog posts, Reddit posts, books, etc. that it found that talked about the best USA cities.
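To make the idea of prompts and prompt engineering concrete, here is a minimal sketch using the OpenAI Python SDK. The model name and prompt wording are just illustrative placeholders, not a recommendation:

```python
# Minimal sketch of asking an LLM a question via the OpenAI Python SDK.
# The model name and prompt text are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model works here
    messages=[
        # A system message is one form of prompt engineering:
        # it guides the style and scope of the answer.
        {"role": "system", "content": "Answer concisely, as a travel expert."},
        {"role": "user", "content": "What are the best cities in the USA?"},
    ],
)

print(response.choices[0].message.content)
```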

But what if you wanted to ask questions on data sets that the LLMs did not use for their training, such as PDFs your company has that are not available publicly (not on its website)? For example, maybe you are a company that makes refrigerators and you have a bunch of material such as refrigerator user guides, model specifications, repair manuals, customer problems and solutions, etc. You would like to train an LLM on the text in those documents and put a bot on top of it so that customers can ask questions of that material. Just think of the improved customer service: customers would not need to talk to a customer service person and could get quick, accurate answers about the features of a refrigerator, as well as quick answers to fix problems they are having with it.

LLMs, when it comes to using them in a real-world production scenario, have some limitations, mainly because they can answer questions related only to the data they were trained on (called the base model or pre-trained LLM). This means they do not know facts that happened after their training date, and they do not have access to data protected by firewalls or not reachable from the internet. So how do you get LLMs to also use PDFs from your company? There are two approaches that can supplement the base model: further training of the base model with new data, called fine-tuning, or Retrieval-Augmented Generation (RAG), which uses prompt engineering to supplement or guide the model in real time.

Let’s first talk about RAG. RAG supplements the base model by providing the LLM with the relevant and freshest data to answer a user question by injecting the new information through the prompt. This means RAG works with pre-trained LLMs and your own data to generate responses. Your own data can be PDF documents.

A system that implements the RAG pattern has in its architecture a knowledge base hosting the validated docs (usually private data) on which the model should base its answers. Each time a user question comes into the system, the following steps occur (see the sketch after this list):

  1. Information Retrieval: The user question is converted into a query to search the knowledge base for relevant docs, which are your private docs such as the previously mentioned refrigerator user guides. An index is commonly used to optimize the search process.
  2. Prompt Engineering: The matching docs are combined with the user question and a system message and injected into the pre-trained LLM. The system message contains instructions that guide the LLM in generating the desired output, such as “the user is a 5th grader” so its answer will be simpler to understand.
  3. LLM Generation: The LLM, trained on a massive dataset of text, generates text based on the prompt and the information retrieved from the knowledge base.
  4. Output Response: The generated text is then presented to the user, written in natural language, providing them with insights and assistance based on their private docs.
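Here is a minimal sketch of those four steps in Python, assuming the OpenAI Python SDK and a hypothetical `search_index` helper standing in for your retrieval layer (which in practice might be Azure AI Search, a vector database, etc.):

```python
# Minimal RAG sketch: retrieve relevant chunks, inject them into the prompt,
# and let the LLM generate an answer. search_index() is a hypothetical helper
# standing in for whatever vector/keyword index you actually use.
from openai import OpenAI

client = OpenAI()

def search_index(query: str, top_k: int = 3) -> list[str]:
    """Hypothetical retrieval step: embed the query and return the top
    matching document chunks from your private knowledge base."""
    embedding = client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    # ...look up the nearest chunks in your vector index using `embedding`...
    return ["<matching text from a refrigerator user guide>"]

def answer_with_rag(question: str) -> str:
    # 1. Information Retrieval
    docs = search_index(question)
    # 2. Prompt Engineering: combine the docs, a system message, and the question
    system_message = (
        "Answer using only the provided documents. "
        "If the answer is not in them, say you could not find it."
    )
    context = "\n\n".join(docs)
    # 3. LLM Generation
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {question}"},
        ],
    )
    # 4. Output Response
    return response.choices[0].message.content

print(answer_with_rag("How do I reset the ice maker on my refrigerator?"))
```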

Note that you can choose to have user questions answered only with the knowledge base of private docs, or also with the text that was used to train the LLM (“the internet”). For example, if a user question is for an older refrigerator model that is not part of the private docs, you can decide to return an answer of “not found”, or you can choose to search the pre-trained LLM and return what is found from the public information. You can also choose to combine the two: for example, if the user question is for a model you have in your private docs, you can return information from the private docs and combine it with public information to give a more detailed answer, perhaps with the public information giving customer reviews that the private docs do not have (the system message is used to indicate this).
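As a rough illustration of how the system message controls this behavior (the exact wording is up to you and depends on the service you use, so treat these as assumptions, not a prescribed API):

```python
# Illustrative system messages for the two behaviors described above.
ONLY_PRIVATE_DOCS = (
    "Answer only from the provided company documents. "
    "If the answer is not in them, reply 'not found'."
)

COMBINE_WITH_PUBLIC = (
    "Prefer the provided company documents, but you may supplement the answer "
    "with general knowledge, such as customer reviews, and clearly label which "
    "parts come from public information."
)
```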

The other approach, fine-tuning, enhances an existing pre-trained LLM using example data, like your refrigerator user guides (a domain-specific dataset). This results in a new “custom” LLM, or fine-tuned LLM, that has been optimized for the provided example data. The main issue with fine-tuning is the time and cost it takes to enhance (“retrain”) the LLM, and it will still only have information from when it was retrained, as opposed to RAG, which is “real-time”.
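For a sense of what fine-tuning looks like in practice, here is a sketch of kicking off a fine-tuning job with the OpenAI Python SDK. The file name, model name, and example content are placeholders, and the details will differ if you do this through Azure OpenAI instead:

```python
# Sketch of starting a fine-tuning job with the OpenAI Python SDK.
# File name, model name, and example content are placeholders.
from openai import OpenAI

client = OpenAI()

# Training data is a JSONL file of chat examples, e.g. one line like:
# {"messages": [{"role": "user", "content": "How do I defrost my refrigerator?"},
#               {"role": "assistant", "content": "Press and hold the defrost button..."}]}
training_file = client.files.create(
    file=open("refrigerator_guides.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # a base model that supports fine-tuning
)

print(job.id)  # poll this job; when it finishes you get a custom model name to use
```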

When deciding between RAG and fine-tuning, it’s essential to consider the distinct advantages each offers. RAG, by leveraging existing models to intelligently process new inputs through prompts, facilitates in-context learning without the significant costs associated with fine-tuning. This approach allows businesses to precisely tailor their solutions, maintaining data relevance and optimizing expenses. In contrast, fine-tuning enables models to adapt specifically to new domains, markedly enhancing their performance but often at a higher cost due to the extensive resources required. Employing RAG enables companies to harness the analytical capabilities of LLMs to interpret and respond to novel information efficiently, supporting the periodic incorporation of fresh data into the model’s framework without undergoing the fine-tuning process. This strategy simplifies the integration and maintenance of LLMs in business settings, effectively balancing performance improvement with cost efficiency.

Bing’s Copilot uses RAG to give you the most up-to-date answers to your questions (by scraping web pages, so it is real-time), as opposed to rebuilding the LLM using fine-tuning, which would take tons of hours, be impractical to do each day, and still lag behind real time. Microsoft’s Copilot in its Office 365 products also uses RAG on your data (PowerPoint, Word files, etc.) – see How Microsoft Copilot Incorporates Private Enterprise Data. Think of Office 365 Copilot as a customized bot for a specific purpose (working with Office 365 files).

Two popular choices for building RAG are Azure AI Studio and Microsoft Copilot Studio – see Building your own copilot with low-code approach: a comparison between Azure AI Studio and Microsoft Copilot Studio.

Now that you understand the “what” – what OpenAI and LLMs are – the next blog post will talk about the “how” (via Azure OpenAI On Your Data), and the third blog post will be about using OpenAI on structured data (or on both unstructured and structured data at the same time).

More info:

What is Retrieval Augmented Generation (RAG)?

Retrieval Augmented Generation using Azure Machine Learning prompt flow (preview)

Full Fine-Tuning, PEFT, Prompt Engineering, and RAG: Which One Is Right for You?

RAG vs. fine-tuning: A comparison of two techniques for enhancing LLMs

Building your own copilot – yes, but how? (Part 1 of 2)

How Microsoft 365 Copilot works

The Fashionable Truth About AI

