
Guide Labs is launching a new type of interpretable LLM

The challenge of wrangling a deep learning model is often understanding why it does what it does: Whether it's xAI's repeated struggles to fine-tune Grok's strange politics, ChatGPT's bouts of sycophancy, or outright fabrications, steering a neural network with billions of parameters isn't easy.

Guide Labs, a San Francisco startup founded by CEO Julius Adebayo and chief scientific officer Aya Abdelsalam Ismail, is offering an answer to that problem. On Monday, the company open-sourced an 8-billion-parameter LLM, Stirling-8B, trained with a new architecture designed to make its behavior easily interpretable: Every token the model produces can be traced back to its origins in the training data.

That can be as simple as identifying the reference material behind a fact the model cites, or as complex as understanding how the model represents concepts like humor or gender.

“If I have a billion ways to code gender, and I code it into 1 billion of the 1 trillion things that I have, you have to make sure you get all of those 1 billion things that I’ve coded, and then you have to be able to reliably turn that on, off,” Adebayo told TechCrunch. “You can do it with current models, but they’re very fragile … It’s kind of one of the holy grail questions.”

Adebayo began this work while pursuing his PhD at MIT, co-authoring a widely cited 2018 paper that showed existing methods for understanding deep learning models were unreliable. That work eventually led to a new way of building LLMs: Engineers insert a conceptual layer into the model that buckets data into trackable categories. This required annotating the training data in advance, but by enlisting other AI models to help, the team was able to train Stirling-8B as its biggest proof of concept yet.
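The idea of routing predictions through named, trackable categories resembles what the research literature calls a concept bottleneck. The sketch below is a toy illustration of that general technique, not Stirling-8B's actual architecture: the concept names, dimensions, and weights are all illustrative assumptions.

```python
import numpy as np

# Illustrative concept labels -- assumptions, not Guide Labs' real taxonomy.
CONCEPTS = ["finance", "medicine", "humor", "quantum computing"]

rng = np.random.default_rng(0)
hidden_dim = 16

# Learned projection from a hidden state onto named concept scores.
W_concept = rng.normal(size=(len(CONCEPTS), hidden_dim))
# The output head reads ONLY the concept scores, so every prediction can be
# attributed to -- and intervened on via -- human-readable concepts.
W_out = rng.normal(size=(len(CONCEPTS),))

def forward(hidden_state, suppress=None):
    """Score each concept, optionally zero ('turn off') some, then predict."""
    concept_scores = W_concept @ hidden_state        # shape: (num_concepts,)
    if suppress:
        for name in suppress:
            concept_scores[CONCEPTS.index(name)] = 0.0
    return concept_scores, float(W_out @ concept_scores)

h = rng.normal(size=hidden_dim)                      # stand-in hidden state
scores, y = forward(h)
_, y_no_humor = forward(h, suppress=["humor"])       # intervene on one concept
```

Because the output is a function of the concept scores alone, suppressing a concept changes the prediction by exactly that concept's contribution, which is the kind of reliable on/off control Adebayo describes.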

"The kind of interpretability work that people are doing … is neuroscience on the model, and we're inverting that," Adebayo said. "What we're actually doing is engineering the model from the ground up so you don't have to do the neuroscience."

Image Credits: Guide Labs

One concern with this approach is that it could eliminate some of the emergent behaviors that make LLMs so interesting: their ability to make novel connections between things they weren't explicitly trained on. Adebayo says that's still happening in his company's model: His team is tracking what he calls "discovery ideas," concepts the model surfaces on its own, like quantum computing.


Adebayo says this kind of interpretable architecture will eventually be something everyone needs. For consumer-facing LLMs, it should let model makers do things like restrict the use of copyrighted material, or better control output on topics such as violence or drug abuse. Regulated industries will require regulated LLMs: in finance, for example, a model that evaluates loan applicants needs to consider things like financial records but not race. There's also a need for interpretability in scientific work, another area where Guide Labs has developed expertise. Protein folding has been a huge success for deep learning models, but scientists want to understand why the software identified a particular combination as promising.

"This model shows that training interpretable models is no longer a science; it is now an engineering problem," Adebayo said. "We've got the science and we can scale, and there's no reason why this type of model shouldn't work for frontier models," which have more parameters.

Guide Labs claims that Stirling-8B achieves 90% of the performance of comparable models while using less training data, thanks to its novel architecture. The next step for the company, which emerged from Y Combinator and raised a $9 million seed round led by Initialized Capital in November 2024, is to build a larger-scale model and begin offering API and agent access to users.

"The way models are currently trained is very primitive, so interpretability will be a long-term positive for our role within humanity," Adebayo told TechCrunch. "As these models become very intelligent, you don't want something making decisions on your behalf that you don't understand."
