Back to AI Basics: A Refresher on Neural Network Fundamentals
Date Created: May 14, 2025
Date Modified:

Here are the links to every blog post in this series:
- Simple Neural Network
- Vanilla MLP on MNIST
- Convolutional Neural Network
- More coming soon...
Stay tuned for more posts in this series, where we'll dive deeper into each of these architectures and their applications!
We sometimes take for granted how effortlessly our brains do things. We learn to recognize faces, understand language, and make decisions without even thinking about it. But when it comes to teaching machines to do the same, things get complicated fast. Or at least, that was the story 7-10 years ago.
Today, tasks like image recognition and language translation feel routine. We have LLMs that can take on almost any role and produce almost anything we ask for. What once seemed out of reach is now part of everyday tools and products.
However, no matter how advanced these models become, they're all built on the same core principles. This blog post serves as a refresher on neural network fundamentals — a guide I created both for myself and for anyone who needs to revisit these essential concepts.
The Elegant Simplicity Behind Complex AI
Here's something that took me a while to fully appreciate: all neural networks, no matter how sophisticated — whether they're powering your smartphone's voice assistant or generating photorealistic images — are fundamentally just clever arrangements of dot products, non-linearities, and parameter updates.
That's it. The rest is architecture and engineering.
This simplicity is both reassuring and empowering. Understanding this core foundation helps cut through the complexity that can make AI seem intimidating.
A Single Neuron
Despite all the biological terminology, a neuron in a neural network is just a mathematical function that acts as a building block:
output = activation(weights · inputs + bias)
Breaking this down:
- Weights (w): The parameters that determine how important each input is
- Inputs (x): The data features being processed
- Bias (b): An offset value that helps the model fit the data better
- Dot product (w · x): Multiplying each input by its corresponding weight and summing them
- Activation: A non-linear function that allows the network to model complex relationships
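To make this concrete, here's a minimal NumPy sketch of a single neuron. The input values, weights, and bias are arbitrary placeholders, and ReLU (introduced in the list just below) is used as the activation:

```python
import numpy as np

# A single neuron: output = activation(weights · inputs + bias)
inputs = np.array([0.5, -1.2, 3.0])    # x: arbitrary example features
weights = np.array([0.8, 0.1, 0.4])    # w: one weight per input
bias = 0.2                             # b: scalar offset

pre_activation = np.dot(weights, inputs) + bias  # w · x + b
output = np.maximum(0.0, pre_activation)         # ReLU activation
print(output)
```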
The most commonly used activation functions include:
- ReLU: max(0, x) - Simple, computationally efficient, and works surprisingly well
- Sigmoid: 1/(1 + e^(-x)) - Maps outputs to a range between 0 and 1
- Tanh: (e^x - e^(-x))/(e^x + e^(-x)) - Similar to sigmoid but maps to a range of -1 to 1
- Softmax: Normalizes outputs into a probability distribution - Often used for classification
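Each of these can be written in a few lines of NumPy. This is a sketch for clarity rather than a production implementation (real frameworks use more carefully optimized versions):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def softmax(x):
    # Subtract the max before exponentiating for numerical stability
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)
```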
Scaling Up: Building a Neural Network
A single neuron can only do so much. The real power comes from connecting many neurons together:
- Layer: Multiple neurons that process inputs in parallel (a matrix-vector multiplication)
- Deep Neural Network: Multiple layers stacked together
- Forward Pass: Data flows through the network, with each layer processing the outputs from the previous layer
This structure allows neural networks to learn increasingly complex representations. The early layers might detect simple features (like edges in an image), while deeper layers combine these to recognize more abstract concepts (like faces or objects).
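As an illustration, here's a small sketch of a two-layer forward pass in NumPy. The layer sizes and random weights are placeholders, not values from any trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# A tiny 2-layer network: 4 inputs -> 8 hidden units -> 3 outputs
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

def forward(x):
    h = relu(W1 @ x + b1)   # layer 1: matrix-vector product + non-linearity
    return W2 @ h + b2      # layer 2: raw output scores

print(forward(rng.normal(size=4)))
```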
The Training Process: How Networks Learn
But how do networks acquire these capabilities in the first place? Neural networks aren't born knowing how to solve problems; they learn through a process called training:
- Forward Pass: Process inputs to generate predictions
- Loss Calculation: Compare predictions with actual targets using a loss function
- Backpropagation: Calculate gradients, essentially answering "how much would changing each weight affect our error?"
- Parameter Update: Adjust weights to reduce the error, typically using an optimizer like SGD or Adam
The beauty of backpropagation is that it efficiently applies the chain rule from calculus to calculate these gradients, allowing even very deep networks to learn.
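These four steps map almost one-to-one onto a framework like PyTorch. Here's a sketch of the training loop on made-up data with a tiny regression model, meant to show the loop structure rather than serve as a complete training script:

```python
import torch
import torch.nn as nn

# Made-up data: 32 samples with 4 features each, one regression target
x = torch.randn(32, 4)
y = torch.randn(32, 1)

model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(100):
    predictions = model(x)            # 1. forward pass
    loss = loss_fn(predictions, y)    # 2. loss calculation
    optimizer.zero_grad()
    loss.backward()                   # 3. backpropagation (chain rule)
    optimizer.step()                  # 4. parameter update
```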
Neural Network Variants: Different Applications, Same Foundation
While frameworks like TensorFlow and PyTorch handle most of the heavy lifting, it's valuable to understand the underlying principles.
What amazes me is how the same core principles adapt to different problems:
- CNNs (Convolutional Neural Networks): Apply dot products through sliding filters over inputs. Perfect for image processing since they capture spatial relationships.
- RNNs/LSTMs (Recurrent Neural Networks): Apply dot products over sequences with shared weights, enabling them to process sequential data like text or time series.
- Transformers: Use dot products in the attention mechanism (Query · Key^T) to capture relationships between all elements in a sequence. These power models like BERT, GPT, and Claude; a small sketch of this computation follows after this list.
- GNNs (Graph Neural Networks): Nodes perform dot products with their neighbors' embeddings, allowing them to process data represented as graphs.
All of these architectures, despite their differences, rely on the same fundamental operations.
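To show that "same fundamental operations" point concretely, here's a minimal NumPy sketch of scaled dot-product attention, the Query · Key^T operation mentioned in the Transformers bullet. The shapes are arbitrary, and details like masking and multiple heads are omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # dot products between queries and keys
    weights = softmax(scores)         # normalize into attention weights
    return weights @ V                # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16
Q, K, V = (rng.normal(size=(seq_len, d_model)) for _ in range(3))
print(attention(Q, K, V).shape)  # (5, 16)
```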
The Big Picture: Why This Matters
Understanding these fundamentals has several benefits:
- Debugging: When models aren't performing as expected, knowing the basics helps identify where things might be going wrong.
- Architecture Design: Better understanding leads to better design decisions when creating or modifying models.
- Optimization: Knowing what's happening under the hood allows for more effective parameter tuning.
- Innovation: All advances in AI build upon these foundations. Understanding them opens the door to developing new techniques and methodologies.
Simplicity in Complexity
As I've gone deeper into AI and machine learning, I've come to appreciate the simplicity at the core of these seemingly complex systems. Neural networks, despite their remarkable capabilities, are built on straightforward mathematical principles.
The next time you're overwhelmed by a new AI architecture paper or the latest breakthrough model with billions of parameters, remember: at its heart, it's still just a clever arrangement of dot products, non-linearities, and parameter updates.
This perspective not only demystifies AI but also empowers us to engage with it more confidently and creatively. After all, the best innovations often come from deeply understanding the fundamentals.
This post is part of my ongoing exploration of AI fundamentals. I'm planning follow-up posts diving deeper into specific aspects like optimization algorithms, regularization techniques, and modern architectures. In the meantime, you can view code snippets and implementations on my GitHub repository.