Building Thinking Models with Chain of Thought (CoT)

Do you know how large language models (LLMs) started to show reasoning abilities?

They aren’t actually thinking like humans. Instead, they generate words step by step, predicting the next token. Yet, when guided properly, they can mimic a kind of reasoning process that looks surprisingly close to human problem solving. That’s where Chain of Thought (CoT) comes in.

CoT is like forcing the AI to show its work. You might remember in school when your math teacher wouldn’t give you marks unless you wrote all steps, not just the final answer. Same principle here. Instead of asking an AI to spit out the answer directly, you push it to reason step by step. This often leads to more accurate, more interpretable, and frankly more useful answers.

Why Chain of Thought?

Let’s be honest: LLMs can make things up. Ask them a complex math question or a tricky logical puzzle and they might mess up. But when you ask them to “think step by step,” something changes. They slow down, break the problem apart, and walk through it. It doesn’t make them flawless, but the improvement is noticeable.

The reason is simple: reasoning is easier when decomposed. If you try to hold an entire equation in your head at once, mistakes creep in. Humans rely on scratch paper. CoT is that scratch paper for AI.

For example:

User: Can you solve 3+4*10-4*3

Without CoT:

3+4*10-4*3 = 7

(Wrong. Ouch.)

With CoT:

ASSISTANT: { "step": "START", "content": "The user wants me to solve the maths problem 3 + 4 * 10 - 4 * 3" }

ASSISTANT: { "step": "THINK", "content": "This is a typical math problem where we use the BODMAS rule for calculation" }

ASSISTANT: { "step": "THINK", "content": "Let's break down the problem step by step" }

ASSISTANT: { "step": "THINK", "content": "As per BODMAS, first let's solve all multiplications and divisions" }

ASSISTANT: { "step": "THINK", "content": "So, first we need to solve 4 * 10, that is 40" }

ASSISTANT: { "step": "THINK", "content": "Great, now the equation looks like 3 + 40 - 4 * 3" }

ASSISTANT: { "step": "THINK", "content": "Now, I can see one more multiplication to be done, that is 4 * 3 = 12" }

ASSISTANT: { "step": "THINK", "content": "Great, now the equation looks like 3 + 40 - 12" }

ASSISTANT: { "step": "THINK", "content": "As we have done all multiplications, let's do the addition and subtraction" }

ASSISTANT: { "step": "THINK", "content": "So, 3 + 40 = 43" }

ASSISTANT: { "step": "THINK", "content": "The new equation looks like 43 - 12, which is 31" }

ASSISTANT: { "step": "THINK", "content": "Great, all steps are done and the final result is 31" }

ASSISTANT: { "step": "OUTPUT", "content": "3 + 4 * 10 - 4 * 3 = 31" }

Final Answer = 31

Notice the difference? Not only do we get the right result, but also the explanation of how the result was reached. If the model messed up, we’d see where the mistake happened, just like spotting a wrong calculation in your friend’s notebook.
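To make the "multiplications first, then add and subtract" order concrete, here is a minimal Python sketch that evaluates the same expression in two passes and records each step. The function name `solve_with_steps` is my own, and it assumes the input contains only non-negative integers joined by +, - and *.

```python
import re


def solve_with_steps(expression: str) -> tuple[int, list[str]]:
    """Evaluate integers joined by +, -, * in BODMAS order, logging each step."""
    tokens = re.findall(r"\d+|[+\-*]", expression)
    steps: list[str] = []

    # Pass 1: resolve every multiplication, left to right.
    i = 0
    while i < len(tokens):
        if tokens[i] == "*":
            product = int(tokens[i - 1]) * int(tokens[i + 1])
            steps.append(f"{tokens[i - 1]} * {tokens[i + 1]} = {product}")
            # Replace "a * b" with its product and re-check the same index.
            tokens[i - 1 : i + 2] = [str(product)]
        else:
            i += 1

    # Pass 2: only + and - remain, so fold them left to right.
    total = int(tokens[0])
    for op, num in zip(tokens[1::2], tokens[2::2]):
        new_total = total + int(num) if op == "+" else total - int(num)
        steps.append(f"{total} {op} {num} = {new_total}")
        total = new_total

    return total, steps


result, steps = solve_with_steps("3+4*10-4*3")
for step in steps:
    print(step)
print("Final Answer =", result)
```

The recorded steps line up with the THINK entries above: 4 * 10, then 4 * 3, then the addition and the subtraction.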

How Does CoT Actually Work?

When you ask a model a question, you can phrase the prompt in two ways:

Direct prompting: “What is 38 * 47?” → Model outputs a single number.
CoT prompting: “Solve step by step: 38 * 47.” → Model explains multiplication, partial sums, carries, then final number.

The second approach encourages the model to generate intermediate reasoning tokens. These aren’t random. They serve as scaffolding, like the bamboo poles used while constructing a building. Once the reasoning is scaffolded, the final answer tends to be sturdier.
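The two prompting styles can be sketched as a tiny helper. This is only an illustration: `build_prompt` and the exact wording of both prompts are my own assumptions, not a required format, and no model API is called here.

```python
def build_prompt(question: str, use_cot: bool) -> str:
    """Wrap a question as either a direct prompt or a CoT prompt."""
    if use_cot:
        # Ask for intermediate reasoning before the final answer.
        return (
            f"Solve step by step: {question}\n"
            "Show each intermediate step, then give the final answer on its own line."
        )
    # Ask for the answer only, with no visible reasoning.
    return f"Answer with only the final result: {question}"


print(build_prompt("38 * 47", use_cot=False))
print(build_prompt("38 * 47", use_cot=True))
```

You would send whichever string fits the situation to the model of your choice. The only difference is whether the prompt leaves room for the scaffolding.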

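There’s even research backing this up. Work on zero-shot CoT prompting (Kojima et al., 2022) found that simply appending “Let’s think step by step” to a question noticeably improved accuracy on arithmetic and logic benchmarks. A model that fails a puzzle when answering directly can often solve it once it is nudged to reason first. Almost like giving it permission to slow down.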

Example: Word Problem with and without CoT

Problem: A train leaves city A at 2 PM at 60 km/h. Another train leaves city B at 3 PM at 90 km/h, heading towards city A. The distance between A and B is 300 km. When do they meet?

Without CoT (direct answer):

They meet at 5 PM.

(Wrong, unless you like teleporting trains.)

With CoT:

ASSISTANT: { "step": "START", "content": "We need to calculate when the two trains meet" }
ASSISTANT: { "step": "THINK", "content": "Train A leaves at 2 PM at 60 km/h. By 3 PM, it has traveled 60 km." }
ASSISTANT: { "step": "THINK", "content": "So at 3 PM, distance left = 300 - 60 = 240 km." }
ASSISTANT: { "step": "THINK", "content": "Now both trains are moving towards each other. Combined speed = 60 + 90 = 150 km/h." }
ASSISTANT: { "step": "THINK", "content": "Time to meet = distance / speed = 240 / 150 = 1.6 hours." }
ASSISTANT: { "step": "THINK", "content": "1.6 hours = 1 hour 36 minutes." }
ASSISTANT: { "step": "OUTPUT", "content": "So, they will meet at 4:36 PM." }

Final Answer = 4:36 PM
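If you want to double-check the model’s arithmetic, the same reasoning fits in a few lines of Python. This is a rough sketch: the function name `meeting_time` and the placeholder date are my own, and it assumes train A has not already reached city B before train B departs.

```python
from datetime import datetime, timedelta


def meeting_time(distance_km, speed_a, depart_a, speed_b, depart_b):
    """Return when two trains heading toward each other meet (A departs first)."""
    # Distance train A covers alone before train B starts.
    head_start_h = (depart_b - depart_a).total_seconds() / 3600
    remaining_km = distance_km - speed_a * head_start_h
    # Once both are moving, the gap closes at the sum of their speeds.
    minutes_to_meet = remaining_km * 60 / (speed_a + speed_b)
    return depart_b + timedelta(minutes=minutes_to_meet)


day = datetime(2024, 1, 1)  # arbitrary date, only the clock time matters
meet = meeting_time(300, 60, day.replace(hour=14), 90, day.replace(hour=15))
print(meet.strftime("%H:%M"))  # 16:36, i.e. 4:36 PM
```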

That’s the magic of CoT. It didn’t just dump an answer. It worked through the timeline one step at a time and gave us something we can verify.

Why This Matters in Building Apps

If you’re a developer playing with AI APIs (like me), you’ll notice one pattern: users don’t just want answers, they want transparency. Imagine you’re building a tutoring app. If the AI simply blurts out “31” when asked about math, the student learns nothing. But if the AI shows the chain of thought, suddenly the app becomes an actual learning companion.

This pattern is not just for math. It’s for any domain where reasoning matters:

Debugging code (showing thought process of bug hunting)
Medical advice (explaining why a certain symptom suggests X instead of Y)
Legal reasoning (tracing laws step by step)
Decision making (pros and cons breakdown)

I use this in my own experiments. For example, when I built a small CLI tool, I asked the AI to “show steps” in JSON format. That way, I could log intermediate reasoning and display it in a nice UI.
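Here is roughly what the parsing side of that can look like. Treat it as a sketch under assumptions: it expects one `ASSISTANT: { ... }` object per line, as in the examples in this post, and the helper names `parse_steps` and `final_answer` are made up for illustration.

```python
import json


def parse_steps(raw: str) -> list[dict]:
    """Turn 'ASSISTANT: {...}' lines into a list of step dictionaries."""
    steps = []
    for line in raw.splitlines():
        # Keep only the JSON object that follows the speaker label.
        _, _, payload = line.partition("{")
        if not payload:
            continue
        try:
            step = json.loads("{" + payload)
        except json.JSONDecodeError:
            continue  # skip lines the model formatted badly
        if isinstance(step, dict) and "step" in step and "content" in step:
            steps.append(step)
    return steps


def final_answer(steps: list[dict]):
    """Return the content of the first OUTPUT step, or None if there is none."""
    for step in steps:
        if step["step"] == "OUTPUT":
            return step["content"]
    return None


raw = """ASSISTANT: { "step": "START", "content": "The user is asking about React app crash" }
ASSISTANT: { "step": "THINK", "content": "Let's ask the user for error logs." }
ASSISTANT: { "step": "OUTPUT", "content": "Please share your console error message so I can help." }"""

steps = parse_steps(raw)
for step in steps:
    print(f'{step["step"]:<6} {step["content"]}')
print("Answer:", final_answer(steps))
```

Skipping malformed lines instead of crashing is a deliberate choice here, since models do not always return clean JSON.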

Example:

User: Why is my React app crashing?


ASSISTANT: { "step": "START", "content": "The user is asking about React app crash" }
ASSISTANT: { "step": "THINK", "content": "Common reasons are missing dependencies, incorrect hooks usage, or syntax errors." }
ASSISTANT: { "step": "THINK", "content": "Let’s ask the user for error logs." }
ASSISTANT: { "step": "OUTPUT", "content": "Please share your console error message so I can help." }

This not only helps debugging, but also gives the user confidence that the AI isn’t pulling answers out of thin air.

Flaws of Chain of Thought

Now, don’t get me wrong. CoT isn’t a silver bullet. Models can still make silly mistakes while showing steps. Sometimes they even hallucinate steps — confidently writing wrong math. It’s like watching a friend explain a problem with a lot of confidence, only to realize they multiplied wrong early on.

Also, CoT can be slower and wordier. If your app requires instant responses (like a chatbot for small talk), showing every tiny step might overwhelm the user. So you have to balance it. Maybe only enable CoT in “explain mode.”
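One way to sketch that “explain mode” switch: keep every step internally, but decide at render time how much to show. The `render` function and its `explain_mode` flag are hypothetical names, and the step dictionaries follow the START/THINK/OUTPUT shape used throughout this post.

```python
def render(steps: list[dict], explain_mode: bool) -> str:
    """Show every step in explain mode, otherwise only the final output."""
    if explain_mode:
        return "\n".join(f'[{s["step"]}] {s["content"]}' for s in steps)
    # Default mode: hide the reasoning and return just the answer text.
    return "\n".join(s["content"] for s in steps if s["step"] == "OUTPUT")


steps = [
    {"step": "START", "content": "The user wants to solve 3 + 4 * 10 - 4 * 3"},
    {"step": "THINK", "content": "4 * 10 = 40 and 4 * 3 = 12"},
    {"step": "THINK", "content": "3 + 40 - 12 = 31"},
    {"step": "OUTPUT", "content": "3 + 4 * 10 - 4 * 3 = 31"},
]

print(render(steps, explain_mode=False))
print(render(steps, explain_mode=True))
```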

And one more thing: CoT inflates token usage. Since the model writes more, output tokens go up and costs go up with them. For hobby projects it’s fine, but at production scale, you need to optimize.
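To get a feel for the overhead, you can compare a direct answer with a CoT answer. The sketch below is deliberately crude: it counts whitespace-separated words as a stand-in for tokens (real tokenizers count differently), and the price passed in is a made-up number, not any provider’s actual rate.

```python
def rough_token_count(text: str) -> int:
    """Crude proxy for tokens: whitespace-separated words."""
    return len(text.split())


def estimated_cost(text: str, price_per_1k_tokens: float) -> float:
    """Estimate output cost from the rough token count and a given price."""
    return rough_token_count(text) / 1000 * price_per_1k_tokens


direct_answer = "31"
cot_answer = (
    "First solve 4 * 10 = 40. Then solve 4 * 3 = 12. "
    "Now 3 + 40 = 43 and 43 - 12 = 31. Final answer: 31"
)

HYPOTHETICAL_PRICE = 0.002  # assumed price per 1,000 output tokens

for label, text in [("direct", direct_answer), ("cot", cot_answer)]:
    tokens = rough_token_count(text)
    cost = estimated_cost(text, HYPOTHETICAL_PRICE)
    print(f"{label}: ~{tokens} tokens, ~{cost:.6f} per response")
```

For real numbers, swap the word count for your provider’s tokenizer and plug in the current pricing.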

Bringing It All Together

When building apps with AI, Chain of Thought is like adding a window into the model’s mind. It’s not real cognition, but it sure feels closer to it. It makes AI output more reliable, transparent, and useful.

Whether you’re building an educational tool, a debugger, or a decision support system, CoT can make the difference between a “black box answer” and a “trustworthy assistant.”

So next time you prompt a model, don’t just ask for the final result. Ask it to show its work. Give it that digital scratch paper. You’ll be surprised how much smarter it suddenly seems.

Final Thoughts

AI isn’t magic. It’s probabilities, patterns, and training data. But with tools like Chain of Thought prompting, we can shape those patterns into something resembling reasoning. Not perfect, but good enough to be useful in real-world apps.

And who knows? Maybe someday, when models genuinely reason, we’ll look back at CoT as the training wheels — the stage when we taught our machines how to “think out loud.”

One last point worth mentioning: CoT is becoming the backbone of AI agents. When agents plan actions, call functions, or decide which API to hit, they rely on step-by-step reasoning to avoid dumb mistakes. CoT isn’t just for math problems anymore — it’s the hidden engine behind decision-making pipelines in modern AI agents. Without it, most agents would just guess blindly. With it, they can plan, reflect, and act more like intelligent assistants.
