A refresher on neural network fundamentals
- III.3.a Simple Neural Network
- III.3.b Vanilla MLP on MNIST
- III.3.c Convolutional Neural Network
- III.3.d Deep ConvNet
- … More coming as I write them.
We sometimes take for granted how effortlessly our brains do things. We learn to recognize faces, understand language, and make decisions without even thinking about it. But when it comes to teaching machines to do the same, things get complicated fast — at least, that was the story 7–10 years ago.
Today, image recognition and language translation are routine. LLMs can take on a wide range of roles and tasks on demand. What once seemed out of reach is now part of everyday tools and products.
However advanced these models become, they're all built on the same core principles. This plate is a refresher on neural network fundamentals — a guide I wrote both for myself and for anyone who needs to revisit the essentials.
Series repo on GitHub
The elegant simplicity behind complex AI
Here's something that took me a while to fully appreciate: all neural networks, no matter how sophisticated — whether they're powering your smartphone's voice assistant or generating photorealistic images — are fundamentally just clever arrangements of dot products, non-linearities, and parameter updates.
That's it. The rest is architecture and engineering.
That simplicity is both reassuring and powerful. Understanding the foundation helps cut through the complexity that makes AI seem intimidating.
A single neuron
Despite all the biological terminology, a neuron in a neural network is just a mathematical function — a building block:
output = activation(weights · inputs + bias)
Breaking that down:
- Weights (w) — parameters that determine how important each input is.
- Inputs (x) — the data features being processed.
- Bias (b) — an offset value that helps the model fit the data better.
- Dot product (w · x) — multiplying each input by its corresponding weight and summing them.
- Activation — a non-linear function that lets the network model complex relationships.
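To make the formula concrete, here is a minimal sketch of a single neuron in plain Python. The function names (`dot`, `sigmoid`, `neuron`) and the numbers are my own illustrative choices, not code from the series repo, and sigmoid is just one possible activation.

```python
import math

def dot(weights, inputs):
    """Multiply each input by its corresponding weight and sum the results."""
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, inputs, bias, activation=sigmoid):
    """output = activation(weights . inputs + bias)"""
    return activation(dot(weights, inputs) + bias)

# Illustrative values only (assumptions, chosen so the arithmetic is easy to follow).
weights = [0.5, -1.0, 2.0]
inputs = [1.0, 2.0, 0.5]
bias = 0.5

pre_activation = dot(weights, inputs) + bias  # 0.5 - 2.0 + 1.0 + 0.5 = 0.0
output = neuron(weights, inputs, bias)        # sigmoid(0.0) = 0.5
print(pre_activation, output)                 # 0.0 0.5
```

Swapping `sigmoid` for another function changes the neuron's behavior without touching the dot product or the bias.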
The most commonly used activation functions include:
- ReLU: max(0, x). Simple, computationally efficient, and works surprisingly well.
- Sigmoid: 1 / (1 + e^(-x)). Maps outputs to a range between 0 and 1.
- Tanh: similar to sigmoid, but maps to a range of −1 to 1.
- Softmax: normalizes outputs into a probability distribution. Often used for classification.
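The four activations above are short enough to write by hand. This is a rough sketch using only the standard library. Subtracting the maximum inside `softmax` is a common numerical-stability trick that I am adding here as an assumption; it does not change the result.

```python
import math

def relu(x):
    """max(0, x): pass positives through, clamp negatives to zero."""
    return max(0.0, x)

def sigmoid(x):
    """1 / (1 + e^(-x)): maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Like sigmoid, but maps into (-1, 1)."""
    return math.tanh(x)

def softmax(xs):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(xs)  # shift by the max so exp() cannot overflow
    exps = [math.exp(x - largest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(relu(-2.0), relu(3.0))  # 0.0 3.0
print(sigmoid(0.0))           # 0.5
print(tanh(0.0))              # 0.0
probs = softmax([1.0, 2.0, 3.0])
print(probs)                  # three increasing probabilities that sum to 1
```

ReLU, sigmoid, and tanh act on one number at a time, while softmax needs the whole vector of scores because each output depends on all the others.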
Scaling up: building a neural network
A single neuron can only do so much. The real power comes from connecting many of them:
- Layer — multiple neurons that process inputs in parallel (a matrix-vector multiplication).
- Deep Neural Network — multiple layers stacked together.
- Forward Pass — data flows through the network, with each layer processing the previous layer's outputs.
This structure allows neural networks to learn increasingly complex representations. Early layers might detect simple features (like edges in an image); deeper layers combine those into more abstract concepts (faces, objects).
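As a sketch of how those pieces stack, the snippet below builds a layer from several neurons and chains two layers into a forward pass. The tiny network and its weights are made up for illustration; a real model would learn them during training.

```python
def dot(weights, inputs):
    return sum(w * x for w, x in zip(weights, inputs))

def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def layer(weight_rows, biases, inputs, activation):
    """One layer: each row of weights is one neuron (a matrix-vector product)."""
    return [activation(dot(row, inputs) + b) for row, b in zip(weight_rows, biases)]

def forward(network, inputs):
    """Forward pass: each layer consumes the previous layer's outputs."""
    activations = inputs
    for weight_rows, biases, activation in network:
        activations = layer(weight_rows, biases, activations, activation)
    return activations

# Hypothetical network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 linear output.
network = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 2.0]], [0.5], identity),
]

result = forward(network, [2.0, 3.0])
# Hidden layer: relu(2 - 3) = 0.0 and relu(1 + 1.5) = 2.5
# Output layer: 1.0 * 0.0 + 2.0 * 2.5 + 0.5 = 5.5
print(result)  # [5.5]
```

Making the network deeper is just a matter of appending more `(weights, biases, activation)` entries to the list.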
The training process: how networks learn
But how do they actually learn? Neural networks aren't born knowing how to solve problems — they learn through training:
- Forward pass — process inputs to generate predictions.
- Loss calculation — compare predictions to actual targets using a loss function.
- Backpropagation — calculate gradients, essentially answering "how much would changing each weight affect our error?"
- Parameter update — adjust weights to reduce the error, typically with an optimizer like SGD or Adam.
The beauty of backpropagation is that it efficiently applies the chain rule from calculus to compute those gradients, allowing even very deep networks to learn.
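Here is a minimal sketch of the four steps for a single sigmoid neuron, with the chain rule written out by hand. The dataset (logical OR), the squared-error loss, the learning rate, and the epoch count are all assumptions I picked to keep the example small; frameworks compute these gradients automatically.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, inputs):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def mean_squared_error(weights, bias, data):
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)

# Toy dataset (assumption for illustration): the logical OR function.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

weights, bias = [0.0, 0.0], 0.0
learning_rate = 1.0
initial_loss = mean_squared_error(weights, bias, data)

for epoch in range(2000):
    for inputs, target in data:
        # 1. Forward pass: compute the prediction.
        prediction = predict(weights, bias, inputs)
        # 2. Loss for this example: (prediction - target) ** 2.
        # 3. Backpropagation: chain rule, one factor per step of the forward pass.
        dloss_dpred = 2.0 * (prediction - target)     # derivative of the squared error
        dpred_dz = prediction * (1.0 - prediction)    # derivative of sigmoid
        dloss_dz = dloss_dpred * dpred_dz
        # 4. Parameter update (plain SGD): dz/dw_i = x_i and dz/db = 1.
        weights = [w - learning_rate * dloss_dz * x for w, x in zip(weights, inputs)]
        bias -= learning_rate * dloss_dz

final_loss = mean_squared_error(weights, bias, data)
predictions = [predict(weights, bias, x) for x, _ in data]
print(initial_loss, final_loss)
print([round(p, 2) for p in predictions])
```

The loss should drop well below its starting value of 0.25, and the predictions should land on the correct side of 0.5 for all four inputs. An optimizer like Adam would change only step 4.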
Neural network variants: different applications, same foundation
Frameworks like TensorFlow and PyTorch handle most of the heavy lifting, but it's still valuable to understand the underlying principles. What amazes me is how the same core operations adapt to different problems:
- CNNs (Convolutional Neural Networks): apply dot products through filters that slide over the input. Well suited to image processing because they capture spatial relationships.
- RNNs / LSTMs (Recurrent Neural Networks): apply dot products with the same shared weights at every step of a sequence, which lets them process sequential data like text or time series.
- Transformers: use dot products in the attention mechanism (Query · Keyᵀ) to capture relationships between every pair of elements in a sequence. These power models like BERT, GPT, or Claude.
- GNNs (Graph Neural Networks): each node aggregates its neighbors' embeddings and transforms the result with learned weights (more dot products), which lets the network process graph-structured data.
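To show what "filters sliding over inputs" means, here is a rough one-dimensional sketch. Real CNNs usually slide 2D filters over images and learn the filter values. The hand-picked `edge_kernel` and the 1D signal are simplifications of mine.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conv1d(signal, kernel):
    """Slide the kernel along the signal; each output is one dot product.

    No padding and a stride of 1, so the output is shorter than the input.
    """
    k = len(kernel)
    return [dot(signal[i:i + k], kernel) for i in range(len(signal) - k + 1)]

signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
edge_kernel = [-1.0, 1.0]  # hypothetical filter: responds to changes between neighbors

print(conv1d(signal, edge_kernel))  # [0.0, 1.0, 0.0, 0.0, -1.0]
```

The output is non-zero only where the signal steps up or down, which is the 1D version of an edge detector.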
All of these architectures, despite their differences, rely on the same fundamental operations.
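Attention is a good example of that claim. The sketch below computes Query · Keyᵀ scores, turns them into weights with softmax, and uses the weights to mix the values. Dividing by the square root of the key dimension is the standard scaled dot-product form, which I am assuming here since the text above only mentions the dot product. The vectors are made up.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    largest = max(xs)
    exps = [math.exp(x - largest) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention for small lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs, weight_rows = [], []
    for q in queries:
        scores = [dot(q, k) / scale for k in keys]  # one row of Query . Key^T
        weights = softmax(scores)                   # how much to attend to each key
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]    # weighted sum of the values
        outputs.append(mixed)
        weight_rows.append(weights)
    return outputs, weight_rows

# Illustrative vectors (assumptions): two queries attending over three keys.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

outputs, weight_rows = attention(queries, keys, values)
print(weight_rows)
print(outputs)
```

Each query puts the most weight on the key it is most aligned with, and each row of weights sums to 1.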
The big picture: why this matters
Understanding the fundamentals has a few concrete benefits:
- Debugging — when models aren't performing as expected, knowing the basics helps identify where things might be going wrong.
- Architecture design — better understanding leads to better design decisions when creating or modifying models.
- Optimization — knowing what's happening under the hood allows for more effective parameter tuning.
- Innovation — all advances in AI build upon these foundations. Understanding them opens the door to new techniques.
Simplicity in complexity
As I've gone deeper into AI and machine learning, I've come to appreciate the simplicity at the core of these seemingly complex systems. Neural networks, despite their remarkable capabilities, are built on straightforward mathematical principles.
The next time you're overwhelmed by a new AI architecture paper or the latest breakthrough model with billions of parameters, remember: at its heart, it's still just a clever arrangement of dot products, non-linearities, and parameter updates.
This perspective demystifies AI and empowers us to engage with it more confidently and creatively. The best innovations often come from a deep understanding of the fundamentals.
This plate is part of my ongoing exploration of AI fundamentals. Follow-ups will dive deeper into optimization, regularization, and modern architectures. In the meantime, you can browse code and notebooks in the repo.