⚙️ Part 2: How Neural Networks Learn
From Guessing to Learning — The Journey of a Neural Network
🔰 Introduction & Background
In the first part of this series, we explored the very foundation of Artificial Intelligence — the neuron.
We learned how a neuron takes inputs, assigns importance through weights, adds a small bias, and then decides whether to “fire” using an activation function like ReLU (Rectified Linear Unit), Sigmoid, or Softmax.
Free link: in case your Medium quota has expired, feel free to read here.
In simple terms, we understood how AI makes one decision — just like how your brain decides whether to take an umbrella when clouds start to gather. ☁️
But here’s the thing:
A single neuron can only make simple decisions.
To solve real-world problems — like recognizing faces, translating languages, or predicting diseases — we need many neurons working together in layers, learning from mistakes, and improving over time.
That’s where Part 2 begins.
In this article, we’ll go beyond what a neuron is and discover how a network of neurons actually learns.
We’ll explore:
- How information flows through a network (Forward Propagation)
- How the model learns from its mistakes (Backward Propagation)
- What loss functions and optimizers do behind the scenes
- And how techniques like learning rate tuning make training faster and smarter
If you haven’t read Part 1 yet, I’d recommend starting there —
👉 Understanding Neurons — The Building Blocks of AI
It sets the perfect foundation for what you’re about to learn here.
1️⃣ Let’s Start with a Thought
Think of a child learning to throw a basketball 🏀.
At first, every throw misses the hoop. The child adjusts — throws a little higher, a little softer, a little faster — until eventually, it lands perfectly.
That “trial → error → correction” loop is exactly how neural networks learn.
2️⃣ From Neurons to Networks
In Part 1, we saw how a neuron:
- Takes inputs,
- Weighs them,
- Adds a bias, and
- Decides via an activation function.
Now imagine stacking hundreds or thousands of neurons across layers:
- The input layer receives data (like pixels of an image).
- The hidden layers process patterns (edges, shapes, textures).
- The output layer makes the final decision (e.g., “cat” or “dog”).
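To make the idea concrete, here's a minimal sketch of such a stacked network in PyTorch. The layer sizes (784 inputs for a 28×28 image, 10 output classes) are illustrative choices, not anything fixed:

```python
import torch.nn as nn

# A tiny feed-forward network: input layer -> two hidden layers -> output layer.
# Sizes are illustrative: 784 inputs fit a 28x28 grayscale image, 10 outputs fit 10 classes.
model = nn.Sequential(
    nn.Linear(784, 128),  # input layer -> first hidden layer
    nn.ReLU(),            # activation: lets the network learn non-linear patterns
    nn.Linear(128, 64),   # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 10),    # output layer: one score per class
)
```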
But here’s the real question:
How does the network know which weights are right?
3️⃣ The Guessing Game — Forward Propagation 🎯
Every time we show an input (say, a cat image) to the network:
- The input moves forward through each layer.
- Every neuron performs its weigh + add + activate operation.
- The output layer makes a guess — like [cat: 0.8, dog: 0.2].
This entire process is called Forward Propagation.
It’s like the network saying, “Here’s my current best guess.”
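In code, forward propagation is just this "weigh + add + activate" step, repeated layer by layer. A minimal sketch in PyTorch, with made-up numbers for a two-class cat/dog output:

```python
import torch

# "Weigh + add + activate" for one layer, written out by hand.
# Shapes are illustrative: 4 input features -> 2 output classes (cat, dog).
x = torch.tensor([0.5, 0.1, 0.9, 0.3])   # input features
W = torch.randn(2, 4)                    # weights (randomly initialized)
b = torch.zeros(2)                       # biases

logits = W @ x + b                       # weigh + add
probs = torch.softmax(logits, dim=0)     # activate: scores -> probabilities
print(probs)                             # something like [0.8, 0.2] for [cat, dog]
```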
4️⃣ Measuring Mistakes — The Loss Function 📉
Once the guess is made, we need a way to measure how wrong it is.
That’s where the Loss Function comes in. Think of it as the distance between prediction and reality:
- If the network predicts 0.8 for cat but the actual answer is 1 (cat = true), the difference is small → good.
- If it predicts 0.1 for cat, the difference is large → bad.
Common loss functions:
- Mean Squared Error (MSE) → for regression (continuous values).
- Cross-Entropy Loss → for classification problems.
The lower the loss, the better the predictions.
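Both losses are one-liners in PyTorch. The predictions and targets below are made up purely for illustration:

```python
import torch
import torch.nn.functional as F

# Regression: Mean Squared Error between predicted and actual values.
pred = torch.tensor([2.5, 0.0, 2.1])
target = torch.tensor([3.0, -0.5, 2.0])
print(F.mse_loss(pred, target))          # small value -> good predictions

# Classification: Cross-Entropy between raw scores and the true class.
logits = torch.tensor([[1.4, -0.2]])     # scores for [cat, dog]
label = torch.tensor([0])                # true class: cat (index 0)
print(F.cross_entropy(logits, label))    # lower = a more confident, correct guess
```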
5️⃣ The Fixing Process — Backward Propagation 🔁
Now comes the “learning” part — Backpropagation.
After computing the loss, the network works backward:
- It calculates how much each weight contributed to the error.
- Then it adjusts the weights slightly in the direction that reduces the loss.
This is like the basketball player realizing,
“I threw too hard — next time, a bit softer.”
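In practice you rarely implement backpropagation by hand; frameworks like PyTorch compute the gradients automatically. A minimal sketch with a single weight and made-up numbers:

```python
import torch

# A single weight, tracked by autograd so PyTorch can compute its gradient.
w = torch.tensor(2.0, requires_grad=True)
x, y_true = torch.tensor(3.0), torch.tensor(9.0)

y_pred = w * x                     # forward pass: the network's guess (6.0)
loss = (y_pred - y_true) ** 2      # squared error (9.0)

loss.backward()                    # backward pass: how much did w contribute?
print(w.grad)                      # dLoss/dw = 2 * (y_pred - y_true) * x = -18.0
```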
6️⃣ Gradient Descent — The Learning Engine ⚙️
To know which direction to move, the network uses a mathematical tool called Gradient Descent.
Imagine standing on a hill (the loss surface).
Your goal: reach the lowest point (minimum loss).
The gradient tells you the slope — the direction of steepest descent.
So, step by step:
- Move against the gradient (downhill).
- Stop when you can’t get any lower.
Each step’s size is controlled by the learning rate (η):
- Too large → you may overshoot the valley.
- Too small → you may crawl forever.
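The update rule itself is a single line: move each weight a small step against its gradient. A self-contained sketch of one step, reusing the toy loss from above (the learning rate of 0.01 is an arbitrary choice):

```python
import torch

# One gradient-descent step on a single weight: w_new = w - lr * gradient.
w = torch.tensor(2.0, requires_grad=True)
loss = (w * 3.0 - 9.0) ** 2        # toy loss with its minimum at w = 3.0
loss.backward()                    # fills w.grad with the slope at the current w

lr = 0.01                          # learning rate (eta): too large overshoots, too small crawls
with torch.no_grad():
    w -= lr * w.grad               # move against the gradient (downhill)
w.grad.zero_()                     # clear the gradient before the next step
print(w)                           # w moved from 2.0 toward the minimum at 3.0
```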
7️⃣ Optimizers — Smarter Hill Climbers ⛰️
Plain Gradient Descent can be slow, especially on large or noisy datasets.
That's why we use optimizers that adapt and accelerate learning:
- SGD with Momentum: builds up speed along consistent gradient directions.
- RMSProp: adapts each weight's step size based on recent gradient magnitudes.
- Adam: combines momentum with adaptive step sizes.
🧠 Adam is the “default optimizer” for most modern networks — quick, stable, and reliable.
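In PyTorch, switching optimizers is a one-line change. The model and learning rates below are illustrative:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # any model works here; a single layer keeps the sketch small

# Same interface, different update rules under the hood.
sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
adam = torch.optim.Adam(model.parameters(), lr=0.001)  # a common default choice
```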
8️⃣ Epochs, Batches & Mini-Batches ⏱️
One complete pass of the dataset through the network = 1 epoch.
Because datasets can be huge, we train in smaller groups:
- Batch Gradient Descent: all samples at once — slow but precise.
- Stochastic Gradient Descent: one sample at a time — fast but noisy.
- Mini-Batch Gradient Descent: a sweet spot — small batches (like 32 or 64) for balanced speed + stability.
Each mini-batch performs one forward pass and one backward update; over many epochs, the network improves steadily. Now, let's look at each of these terms in more detail.
When a neural network learns, it repeatedly goes through a cycle of:
- Making a prediction (forward pass),
- Measuring the error (loss),
- Adjusting weights (backpropagation).
Each time this full cycle happens, it’s called an iteration — but depending on how much data you process, there are related terms you should know.
Let’s break them down 👇
🧩 1️⃣ Iteration
➡️ One weight update step — meaning the network processes one batch of data (forward + backward pass).
- If you have 1,000 training samples and a batch size of 100,
→ one epoch will contain 10 iterations.
🧠 2️⃣ Epoch
➡️ One complete pass through the entire training dataset (i.e., all batches).
After one epoch, the network has seen every training example once.
We typically train for multiple epochs (like 10, 50, or 100) until performance stabilizes.
Think of it like:
“One round of training over all data.”
⚙️ 3️⃣ Batch
➡️ A subset of the training data processed before weights are updated.
- Example: batch size = 64 means the model looks at 64 samples at a time.
- Balances speed and stability — smaller batches = noisier updates, larger batches = slower but smoother.
🎯 Analogy:
Think of training like studying a textbook:
- One batch = reading one chapter.
- One iteration = reviewing what you learned in that chapter.
- One epoch = finishing the entire book once.
You usually need multiple epochs (re-reads) to truly master it. 📚
So, in short:
🔁 Iteration = one update cycle,
🧮 Epoch = one full pass over all training data,
📦 Batch = the group of samples used in one iteration.
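A skeletal training loop ties all three terms together. The data here is randomly generated just for the sketch; with 1,000 samples and a batch size of 100, each epoch runs exactly 10 iterations:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 1,000 made-up samples with 4 features each, plus binary labels.
X, y = torch.randn(1000, 4), torch.randint(0, 2, (1000,))
loader = DataLoader(TensorDataset(X, y), batch_size=100, shuffle=True)

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(10):                          # 1 epoch = one full pass over all 1,000 samples
    for batch_X, batch_y in loader:              # 1 batch = 100 samples; 10 iterations per epoch
        optimizer.zero_grad()                    # clear old gradients
        loss = loss_fn(model(batch_X), batch_y)  # forward pass + loss
        loss.backward()                          # backward pass
        optimizer.step()                         # 1 iteration = one weight update
```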
9️⃣ Learning Rate Schedulers & Early Stopping 🧭
Sometimes the network gets stuck or learns too fast.
We control this with:
- Learning Rate Schedulers: gradually reduce η when improvement stalls (e.g., ReduceLROnPlateau).
- Early Stopping: stop training when validation accuracy stops improving — prevents overfitting.
These make training efficient and prevent wasted computation.
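Here's how both techniques might look in PyTorch. ReduceLROnPlateau is a real scheduler from torch.optim.lr_scheduler; the patience counter for early stopping is a hand-rolled sketch, since plain PyTorch has no built-in early stopping:

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Halve the learning rate when the watched metric stops improving for 3 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3
)

best_val_loss, patience, stall = float("inf"), 5, 0
for epoch in range(100):
    val_loss = 1.0 / (epoch + 1)       # stand-in for a real validation loss
    scheduler.step(val_loss)           # the scheduler watches the metric itself
    if val_loss < best_val_loss:       # early-stopping bookkeeping
        best_val_loss, stall = val_loss, 0
    else:
        stall += 1
        if stall >= patience:          # no improvement for 5 straight epochs: stop
            break
```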
In Short
A neural network learns by guessing, measuring, and correcting.
Each iteration makes it a bit less wrong — until it becomes impressively right.
Forward Pass → Calculate Loss → Backpropagate Error → Update Weights → Repeat
That’s the heartbeat of every deep learning model.
🚀 Coming Next — Part 3: Regularization and Generalization
In the next article, we’ll talk about how to stop networks from overfitting (memorizing data) and help them generalize to new situations.
We’ll explore techniques like L2 regularization, Dropout, DropConnect, and Batch Normalization, and see how they make learning more robust.
In the meantime, if you'd like to give your career a boost, here are some courses that can help.
- Building Amazon Style Full Stack Microservices 30+ hours course
- Mastering React 18: Build a Swiggy-Style Food App 6+ hours course
- Building FullStack E-Commerce App using SpringBoot & React 17+ hours course
- Building FullStack E-Commerce App using SpringBoot & Angular 16+ hours course
- Creating .Net Core Microservices using Clean Architecture 51+ hours course
- Docker & Kubernetes for .Net and Angular Developers 7+ hours course
📲 Stay Connected & Keep Learning!
If you enjoyed this post and want to keep growing as a tech architect, let’s connect!
👉 Join my LinkedIn network for regular insights, architecture tips, and deep dives into real-world software systems:
🔗 linkedin.com/in/rahulsahay19
📺 Subscribe to my YouTube channel for tutorials, code walkthroughs, and clean architecture explainers:
🔗 youtube.com/@rahulsahay19
🐦 Follow me on Twitter for bite-sized tech tips, threads, and quick updates:
👉 twitter.com/rahulsahay19
Let’s grow, learn, and build better software — together! 🚀
