
Small Log Models: GenAI in Observability

Digital & Software Solutions

December 19, 2023

Introduction: Information in a Sequence and the Large Language Model (LLM)

Mathematically, a sequence is a collection of objects in which repetitions are allowed and order matters. Natural language is a sequence of words and thus carries information. During training, a Transformer neural network extracts the information in the sequence and stores it in the weights of an LLM. During inference, a prompt, which is a sequence of words, is passed through the LLM, which predicts the next word in the sequence.

Thus, all that a Transformer needs is a sequence. Logs generated by an application or system are a sequence. The chronological log entries are not random and, therefore, contain information that a Transformer can extract and store in a generative model. Once we create the model, real-time logs can be pushed into it as input, and it will predict the next log entry. Here, the real-time streaming logs act as the prompt, and we can call the model a Small Log Model: small, because the number of parameters will be far smaller than an LLM's.
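To make the analogy concrete, here is a minimal sketch in Python of framing a log stream exactly like next-word prediction. The event IDs and the context window size are invented for illustration:

```python
# A toy log stream: each entry is already reduced to an event ID (invented here).
log_stream = ["EVT4624", "EVT4672", "EVT7036", "EVT4624", "EVT7036", "EVT6008"]

# The vocabulary of unique log entry types plays the role of unique words.
vocab = {event: idx for idx, event in enumerate(sorted(set(log_stream)))}
print("vocabulary:", vocab)

# Build (context, next-entry) pairs, exactly as next-word prediction does.
context_size = 3
pairs = [
    (log_stream[i : i + context_size], log_stream[i + context_size])
    for i in range(len(log_stream) - context_size)
]

for context, target in pairs:
    print(f"prompt: {context} -> predict: {target}")
```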

Thus, with the same steps we use to create an LLM on natural language, we can create a model on any sequence of objects.

The Proposed Approach to Building a Small Log Model

Defining the purpose — what to predict?

The first step will be to decide what we want the Small Log Model to do. Instead of predicting the next log entry given a sequence of log entries, as an LLM does for words, it will be more helpful (and potentially cheaper to train) to predict when specific errors will occur. Examples of specific errors to predict include event IDs, ITSM tickets, etc.

Preparing the training dataset

It is important to ensure that every record in the dataset has a timestamp, because timestamps create the sequence. In the case of an application, you can create the dataset using the following:

  • Application logs
  • Application Performance Management (APM) data
  • Timestamped performance telemetry (e.g., CPU utilization)
  • OS and container logs
  • Virtualization logs
  • Hardware logs
  • Load balancer logs
  • End user activity logs
  • Business activity monitoring logs


We can merge all the above chronologically to create the input to the transformer during training.
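As a sketch, assuming each source is already sorted by timestamp (true for append-only logs), the chronological merge can be a simple k-way heap merge. The record shapes and timestamps below are invented for illustration:

```python
import heapq
from datetime import datetime

# Three invented telemetry sources, each already sorted by timestamp.
app_logs = [
    ("2023-12-01T10:00:01", "app", "OrderService started"),
    ("2023-12-01T10:00:05", "app", "DB connection pool exhausted"),
]
apm_data = [
    ("2023-12-01T10:00:02", "apm", "p99 latency 1200ms on /checkout"),
]
telemetry = [
    ("2023-12-01T10:00:03", "infra", "CPU utilization 97% on vm-42"),
]

def parse_ts(record):
    return datetime.fromisoformat(record[0])

# heapq.merge lazily interleaves the pre-sorted sources into one
# chronological stream -- the training-time input to the Transformer.
merged = heapq.merge(app_logs, apm_data, telemetry, key=parse_ts)

for ts, source, message in merged:
    print(ts, source, message)
```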

Next, we need to create the ground truth, i.e., the output vectors to predict. The size of each output vector will be the number of unique error codes or ITSM tickets to predict, and there should be roughly one such vector per log entry in the training dataset.
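A minimal sketch of what these ground-truth vectors could look like, assuming a multi-hot encoding over the unique problem codes and an illustrative look-ahead window:

```python
# Invented problem codes: the unique errors / tickets we want to predict.
problem_codes = ["EVT6008", "EVT1074", "ITSM-DB-OUTAGE"]
code_index = {code: i for i, code in enumerate(problem_codes)}

def problem_vector(upcoming_entries):
    """Multi-hot vector: 1 if the error occurs within the look-ahead window."""
    vec = [0] * len(problem_codes)
    for entry in upcoming_entries:
        if entry in code_index:
            vec[code_index[entry]] = 1
    return vec

# One label vector per position in the log sequence, looking ahead a fixed window.
log_sequence = ["EVT4624", "EVT7036", "EVT6008", "EVT4624"]
window = 2
labels = [
    problem_vector(log_sequence[i + 1 : i + 1 + window])
    for i in range(len(log_sequence))
]
print(labels)
```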

Processing the training dataset

Natural language, upon which an LLM is trained, has potentially billions of words in a sequence but uses a finite set of unique words. We need to convert the log training dataset into a similar format. Even though we can have billions of individual log entries, not all of them are unique. For example, all Windows servers in an enterprise can generate billions of logs using just the predefined set of event IDs. So, if we replace each Windows log with its event ID, we get a log dataset with the same shape as a natural language dataset: billions of entries drawn from a finite vocabulary.
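A toy sketch of this reduction, with an invented Windows-style log format (real exports vary by collector):

```python
import re

raw_logs = [
    "2023-12-01 10:00:01 EventID=4624 An account was successfully logged on",
    "2023-12-01 10:00:09 EventID=7036 The Print Spooler service entered the running state",
    "2023-12-01 10:01:12 EventID=4624 An account was successfully logged on",
]

event_id_pattern = re.compile(r"EventID=(\d+)")

# Collapse each raw line to just its event ID: billions of lines, few unique "words".
tokens = [m.group(1) for line in raw_logs if (m := event_id_pattern.search(line))]
print(tokens)       # ['4624', '7036', '4624']
print(set(tokens))  # the finite vocabulary, analogous to unique words
```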


Of course, not all logs will have easily identifiable IDs. In that case, we can cluster the logs with text clustering algorithms using the number of identical words as the distance metric. In this case, each unique cluster acts as a word. Each cluster will also have an associated regular expression. We can use these regular expressions to determine which cluster an incoming log entry belongs to.
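A toy sketch of this idea. Production systems often use log template miners such as Drain; here we simply mask variable fields to form clusters and derive a matching regex per cluster:

```python
import re

logs = [
    "Connection to 10.0.0.5 timed out after 30s",
    "Connection to 10.0.0.9 timed out after 45s",
    "User alice logged in from 10.0.0.7",
]

def to_template(line):
    # Mask IPs and numbers so lines differing only in variables share a template.
    line = re.sub(r"\d+\.\d+\.\d+\.\d+", "<IP>", line)
    line = re.sub(r"\d+", "<NUM>", line)
    return line

# Group lines by template: each unique cluster acts as one "word".
clusters = {}
for line in logs:
    clusters.setdefault(to_template(line), []).append(line)

# Turn each template into a regex that recognizes members of its cluster.
cluster_regexes = {
    re.compile(
        re.escape(t).replace("<IP>", r"\d+\.\d+\.\d+\.\d+").replace("<NUM>", r"\d+")
    ): t
    for t in clusters
}

incoming = "Connection to 10.0.0.12 timed out after 60s"
for pattern, template in cluster_regexes.items():
    if pattern.fullmatch(incoming):
        print("matched cluster:", template)
```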

Training

The training of an LLM starts with generating embeddings for each word. Similarly, to create a Small Log Model, the first step will be to generate embeddings for each unique log entry type.

You can generate embeddings for log entries in two ways:

  1. Generate from scratch.
  2. Use an existing LLM to create initial embeddings for the log entries and then fine-tune them on the training dataset (potentially using less compute); see the sketch below.

So, each event ID or log cluster will have its own embedding.
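A minimal sketch of the second option, assuming the sentence-transformers package as the existing embedding model (any text-embedding model would do) and invented representative texts per event ID:

```python
from sentence_transformers import SentenceTransformer

# A pretrained text-embedding model supplies the initial vectors.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

# One representative text per event ID or log cluster (invented examples).
log_types = {
    "4624": "An account was successfully logged on",
    "7036": "A service entered the running state",
}

initial_embeddings = {
    event_id: encoder.encode(text)
    for event_id, text in log_types.items()
}

# These vectors seed the model's embedding table and are then
# fine-tuned along with the rest of the network during training.
```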

During training, we need only the encoder stack of a Transformer. We will train it on the training data to predict the problem vectors.
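A minimal sketch of such an encoder-only model in PyTorch. All layer sizes are illustrative; the multi-label head emits logits for the problem vector, so several problems can be predicted at once:

```python
import torch
import torch.nn as nn

class SmallLogModel(nn.Module):
    def __init__(self, vocab_size, num_problems, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        # Embedding table: one row per event ID / log cluster.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        # Multi-label head: one logit per unique problem code.
        self.head = nn.Linear(d_model, num_problems)

    def forward(self, log_ids):            # log_ids: (batch, seq_len)
        h = self.encoder(self.embed(log_ids))
        return self.head(h.mean(dim=1))    # logits for the problem vector

model = SmallLogModel(vocab_size=500, num_problems=20)
loss_fn = nn.BCEWithLogitsLoss()           # multi-label: problems may co-occur

# One training step on a dummy batch of log-ID windows.
logits = model(torch.randint(0, 500, (8, 64)))
loss = loss_fn(logits, torch.zeros(8, 20))
loss.backward()
```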

Making an inference

As discussed above, the incoming real-time logs will be used as a prompt to predict the problem vectors. The first step will be to convert the incoming real-time log stream into either event IDs or log clusters (using regex). Then, pass the embeddings through the model to get the problem vector.
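A sketch of this inference path, reusing the hypothetical SmallLogModel and cluster_regexes from the earlier sketches:

```python
import torch

def stream_to_ids(lines, cluster_regexes, cluster_to_id):
    """Map raw log lines to cluster IDs using the per-cluster regexes."""
    ids = []
    for line in lines:
        for pattern, template in cluster_regexes.items():
            if pattern.fullmatch(line):
                ids.append(cluster_to_id[template])
                break
    return ids

def predict_problems(model, ids, problem_codes, threshold=0.5):
    """Run the encoder on the current log window and threshold the problem vector."""
    with torch.no_grad():
        logits = model(torch.tensor([ids]))   # a batch of one window
        probs = torch.sigmoid(logits)[0]
    return [code for code, p in zip(problem_codes, probs) if p > threshold]
```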

The Right Approach

One can imagine that IT components (such as applications, VMs, containers, network devices, etc.) talk to humans using logs as their language. So, the question arises: how many log models do we need to build? Do we pool all logs from the entire enterprise and create a single log model, or do we build log models per application stack? Also, will models created for one organization work for another?

I think the right approach will be to build many models, each focused on a specific component of the IT landscape. This targeted approach can still yield valuable predictions from smaller datasets, and the resulting models are easier, faster, and cheaper to train.

About the Author

Vineet Gangwar

Vineet has over 26 years of industry experience. As VP of Observability, he brings the latest innovations in AI to observability for Hexaware. He takes pride in being hands-on and has built a large-scale observability platform. He also builds private clouds, automations, and serverless apps. His interests include reading history, woodworking, and hiking.
