lecture 6
ML applications,
+
neural networks part 2 yay:
multilayer perceptron
SeAts APp SEAtS ApP SEaTS APP
gentle reminder of your
presentation
start early by thinking about these questions:
- what are the problems/design questions that can be potentially solved by ML models?
- if so, what ML models to use and how? (i'll help with this part)
lecture plan
- multilayer perceptron (finally)
- ML applications
- convolutional neural networks
most ambitious lecture of this unit so far
every previous lecture is a preparation for this one
after this lecture, (hopefully) "neural network" is no longer a buzzword but a source of knowledge and curiosity
GAME TIME!
the 12
Mini Recap{
relu and sigmoid
chaining function
vector shape
matrix shape
vector is a special case of matrix
computation rules
vector, matrix addition: it has to be the exact same shape
vector multiplication: dot product
matrix multiplication: the shape rule
use the shape rule to verify why dot product has one single number output
/*end of recap*/
}
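the recap above can be tried out in a few lines of NumPy. this is only a sketch: the names v, w and M and all the numbers in them are made up for illustration, they are not from the slides.

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and replaces negative values with 0
    return np.maximum(0, x)

def sigmoid(x):
    # sigmoid squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-x))

# made-up example values
v = np.array([[1.0], [-2.0], [3.0]])   # column vector, shape (3, 1)
w = np.array([[0.5, 1.0, -1.0]])       # row vector, shape (1, 3)
M = np.ones((2, 3))                    # matrix, shape (2, 3)

# addition: both sides must have the exact same shape
v_sum = v + v                          # (3, 1) + (3, 1) -> (3, 1)

# shape rule: (a, b) times (b, c) gives (a, c), the inner sizes must match
Mv = M @ v                             # (2, 3) @ (3, 1) -> (2, 1)

# dot product through the shape rule: (1, 3) @ (3, 1) -> (1, 1), one single number
dot = w @ v

print(relu(v).ravel())        # [1. 0. 3.]
print(Mv.shape, dot.shape)    # (2, 1) (1, 1)
print(dot)                    # [[-4.5]]
```

chaining functions is then just nesting, e.g. relu(M @ v).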
let's forget about math for now
the story starts from a real biological neuron (a simulation)
as humans we have roughly 100 billion neurons
they are the fundamental units of the brain and nervous system.
the cells are responsible for receiving sensory input, for sending motor commands to our muscles, and for transforming and relaying the electrical signals at every step in between.
Neurons communicate with each other via electrical impulses
one neuron with dendrites
think of your happiest moment in memory, and
this is possibly what was happening in your brain exactly during that moment
recap of the simulated neural process:
-- it is charged by signals from other connected neurons,
-- there are usually different levels of signals from different neurons,
-- once it is sufficiently charged,
-- it fires off a signal to the next neuron(s)
the myth of grandma neuron
what are the mAtHY parts in the neural process?
let's do something truly interdisciplinary
--- extracting maths ideas from dat biology class --->
mathy extraction 01
-- charging:
accumulation, addition, as opposed to taking stuff away (subtraction)
mathy extraction 02
-- it does NOT fire immediately on whatever input it receives,
instead it waits until it is sufficiently charged and then fires:
activation, a sense of thresholding
hint hint relu, sigmoid
mathy extraction 03
-- connectivity with different connection strengths:
1. not any two neurons are connected directly
2. different strengths => different weights?
mathy extraction 04
-- bird's-eye view of connectivity:
a hierarchical process
from sensory inputs to noodling to motor outputs
one single neuron is not taking care of everything, it is part of a structured decomposition
introducing now: (artificial) neural networks, the anatomy
finally!!!
starting from (artificial) neuron anatomy {
one neuron holds a single number indicating signal level (not a fixed number), like the real neuron storing the signal level, let's call it activation
on whiteboard
neurons are grouped in layers, reflecting the hierarchical structure
let's draw another two layers of neurons because i want to
connectivity between !consecutive! layers
ps: it is actually connectivity between neurons in !consecutive! layers
this first neuron receives signals(numbers) from all neurons in the previous layer, let's draw a link
and so does every single neuron!
note: neurons inside the same layer are NOT connected
different connection strengths: every link indicates a different connection strength
that is to say every link also indicates a number, let's call it a weight
note that a weight is different from an activation that is stored in each neuron
one activation is contextualised in one single neuron,
whereas one weight is contextualised between two connected neurons
/*end of (artificial) neuron anatomy */
}
now that we know what neurons are and that they are grouped in layers
let's look from the perspective of layers and build our first multilayer perceptron by hand
multilayer perceptron MLP {
aka vanilla neural networks,
aka fully connected feedforward neural network (don't memorise this, MLP sounds way cooler, but this lengthy name has some meaning, as we'll see shortly)
summoning the "hello world" in ML: MNIST, handwritten digits recognition
through the lens of layers{
neurons are holding numbers, so in one layer there is a vertical layout of one column of numbers. Does this one vertical column of numbers sound familiar?
neurons in one layer form a vector, or matrix in a general sense, let's call this "neurons vector"
there are different types of layers:
input and output layers
input layer is where the input data is loaded (e.g. one neuron holds one pixel's grayscale value)
because it has to be a vertical col vector, the flattening giant has stepped over...
what is the shape of the neurons vector of input layer ?
784*1
what if i have a dataset of (small)images of size 20*20?
400*1
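a quick way to see the flattening in code. a minimal sketch: the images here are random numbers standing in for real grayscale pixels (an assumption, no actual MNIST data is loaded), and the variable names are made up.

```python
import numpy as np

# stand-in for one 28*28 grayscale image (random values, NOT real MNIST data)
image = np.random.rand(28, 28)

# flattening: line up all pixels in one vertical column, row after row
v_in = image.reshape(-1, 1)
print(v_in.shape)                    # (784, 1)

# same idea for a (small) 20*20 image
small_image = np.random.rand(20, 20)
small_v_in = small_image.reshape(-1, 1)
print(small_v_in.shape)              # (400, 1)
```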
output layer is where the output is held
for classification tasks, the output is categorical and how do we encode categorical data?
one-hot encoding: it depends on how many classes there are
another way to interpret one-hot encoding output: each neuron holds the "probability" of the output belonging to that class
it is just another number container anyway
what is the shape of the neurons vector of output layer ?
10*1 (10 classes of digits)
what if my task changes to recognise if the digit is zero or non-zero?
2*1 (only 2 classes of digits!)
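here is what those output shapes look like as one-hot vectors. a small sketch: the helper one_hot is a made-up name for illustration (not a library function), and putting "zero" at index 0 and "non-zero" at index 1 is an assumption.

```python
import numpy as np

def one_hot(label, num_classes):
    # hypothetical helper: a column of zeros with a single 1 at the label's position
    v = np.zeros((num_classes, 1))
    v[label, 0] = 1.0
    return v

digit = 3

# 10 classes of digits -> output shape 10*1
ten_class = one_hot(digit, 10)
print(ten_class.shape)            # (10, 1)

# zero vs non-zero -> output shape 2*1
# assumed convention: index 0 means "zero", index 1 means "non-zero"
two_class = one_hot(0 if digit == 0 else 1, 2)
print(two_class.ravel())          # [0. 1.]

# "probability" reading: the predicted class is the neuron holding the biggest number
print(int(np.argmax(ten_class)))  # 3
```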
input and output layer shapes are determined by the task and the dataset
hidden layers: any layer in between the input and output layers
shape of the neuron vectors of hidden layers? is it strictly data- and task- dependent like input/output layers?
No, it is all up to you woo hoo! it is part of the fun NN design process
here i choose...
let's connect these layers following our previous connection rule: only consecutive layers are *directly* linked
ATTENTION
the last piece of puzzle
recall the biological accumulation and charging process
let's simulate the ANN process with biological analogies, from input to output
recall that each link has a number ("weight") for connection strength
what we just did: activations in each layer's neuron vector are computed using previous layer's neurons vector and the corresponding weights
to calculate the first neuron's number in the hidden layer: the previous layer's neuron activations are multiplied by the corresponding connection weights, then summed up
wait did that look just like a dot product process?
indeed, we can formalise the "charging" process using matrix and multiplication
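to check that "multiply and sum up" really is a dot product, here is a tiny sketch for one single neuron. the three activations and three weights are made-up numbers, just for illustration.

```python
import numpy as np

# activations of 3 neurons in the previous layer (made-up numbers)
prev_activations = np.array([0.2, 0.9, 0.5])
# weights of the 3 links going into ONE neuron of the next layer (made-up numbers)
link_weights = np.array([1.0, -2.0, 4.0])

# "charging" by hand: multiply each activation by its link weight, then sum up
by_hand = sum(a * w for a, w in zip(prev_activations, link_weights))

# the same computation written as a dot product
charged = np.dot(link_weights, prev_activations)

print(np.isclose(by_hand, charged))   # True
```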
weights matrix: place the connection weights in order, that's it
demonstration of weights matrix multiplied with input layer's activation vector on whiteboard
for the "wait till sufficiently charged, or thresholding" part, let's introduce the bias vector and activation function
demonstration of neuron vector added with bias vector on whiteboard (adding or removing extra difficulty for accumulated neuron value to reach "active zone" in activation function)
demonstration of neuron vector wrapped with activation function on whiteboard
puzzle almost finished!
/*end of through the lens of layers */
}
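the three whiteboard steps above (weights matrix, bias vector, activation function) can be replayed on a toy layer. a sketch only: the layer sizes (3 neurons feeding 2 neurons) and every number below are made up, and ReLU is picked as the activation function just as an example.

```python
import numpy as np

def relu(x):
    # ReLU keeps positive values and replaces negative values with 0
    return np.maximum(0, x)

# previous layer's neurons vector, shape (3, 1) (made-up numbers)
v_in = np.array([[1.0], [2.0], [3.0]])

# weights matrix: one row per neuron in THIS layer,
# one column per neuron in the PREVIOUS layer -> shape (2, 3)
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5,  0.5]])

# bias vector: one number per neuron in this layer, shape (2, 1)
b = np.array([[1.0], [-1.0]])

charged = W @ v_in        # step 1, accumulation: (2, 3) @ (3, 1) -> (2, 1)
shifted = charged + b     # step 2, bias makes the "active zone" easier or harder to reach
v_out = relu(shifted)     # step 3, activation function decides what gets passed on

print(charged.ravel())    # [-2.  3.]
print(shifted.ravel())    # [-1.  2.]
print(v_out.ravel())      # [0. 2.]
```

the first neuron stays silent (its value ends up below zero), the second one fires.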
let's write down what just happened using function expression
1. each layer as a function
-- function input: a vector, previous layer's neurons vector
-- function output: a vector, this layer's neurons vector
-- the function body: multiplied by weights matrix, added with bias vector, wrapped with activation function
V_out = Relu(Weights * V_in + Bias)
what is the scaffold of this layer function?
-- something like a linear function, wrapped with activation
what is the muscle (parameters) in this layer function?
-- numbers in the weights matrix (weights)
-- numbers in the bias vector (bias)
-- activation function does not have parameters
-- input is NOT parameter
each layer has its own set of parameters
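to get a feel for how much "muscle" each layer has, we can count the parameters. a sketch under one loud assumption: the hidden layer sizes (two hidden layers of 16 neurons) are made up for illustration, only 784 and 10 come from the MNIST task.

```python
# layer sizes from input to output
# 784 and 10 come from the task, the two 16s are an ASSUMPTION for illustration
layer_sizes = [784, 16, 16, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    n_weights = n_out * n_in   # weights matrix has shape (n_out, n_in)
    n_biases = n_out           # bias vector has shape (n_out, 1)
    total += n_weights + n_biases
    print(f"{n_in} -> {n_out}: {n_weights} weights + {n_biases} biases")

print(total)   # 13002
```

the activation function and the input contribute nothing to this count.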
to connect different layers using function expression?
chaining!!! demonstration on whiteboard
puzzle finished, recall that a model is roughly a big function?
MLP is a model, and a function. Let's write down the final BIG function for this neural network
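one possible way to write the BIG function in code, as a sketch. assumptions: two hidden layers of 16 neurons (made-up sizes), random untrained weights and biases, a random vector standing in for a flattened image, and ReLU on every layer simply to mirror the layer formula. the names layer, mlp and params are made up for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def layer(v_in, W, b):
    # one layer function: V_out = Relu(Weights * V_in + Bias)
    return relu(W @ v_in + b)

rng = np.random.default_rng(0)

# 784 and 10 come from the task, the hidden sizes (16, 16) are an ASSUMPTION
sizes = [784, 16, 16, 10]

# each layer has its own weights matrix and bias vector (random, i.e. untrained)
params = [(rng.normal(size=(n_out, n_in)), rng.normal(size=(n_out, 1)))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def mlp(v_in, params):
    # chaining: the output of one layer is the input of the next
    v = v_in
    for W, b in params:
        v = layer(v, W, b)
    return v

x = rng.random((784, 1))   # stand-in for one flattened image
out = mlp(x, params)
print(out.shape)           # (10, 1)

# the same thing written out as one BIG nested function
(W1, b1), (W2, b2), (W3, b3) = params
big_function = relu(W3 @ relu(W2 @ relu(W1 @ x + b1) + b2) + b3)
print(np.allclose(out, big_function))   # True
```

with random parameters the output numbers mean nothing yet, finding good parameters is the job of training.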
note: being able to write down the function is just to internalise the maths understanding of MLP! when you actually use/train MLP, it is still better to think of it in the graphical way
index: 3 months
it is a legit part of MSc-level MLP walk through
/*end of MLP */
}
some leftover questions:
now i know that numbers in neurons are computed, but how about numbers in weights matrix and bias vector(the parameters)?
"to find good parameters" is the process of training (which we have not touched upon so far in this lecture).
We have actually talked about this process before, let's revise with refreshed knowledge:
curiosity on training process will be answered next semester (stay tuned)
its major idea is "derivative", aka the mathematics of "change"
for now I will just use this lingo "backpropagation"
today's simulation process of going from input to output is what we call "forward pass"
fun ML applications
Kaggle
very practical problems, with datasets and code
chatGPT
simply the best chatbot released recently
have a guess at what the data used to train BirdNET looks like
this question is basically equivalent to "what are the input and output of BirdNET?"
first step to prepare dataset for a SL task: formalise the problem, identify the input and output (we'll walk through this process together once you have found your TOI)
start thinking about the presentation, aka dreaming regardless of feasibility
here's my take on "what are the problems that can be potentially solved by machine learning?"
- mundane, repetitive work that i don't bother doing (transcribe handwriting into text on computer, etc.)
- "it'd be great to have that" but i don't know how to implement that just using programming (face recognition, etc.)
- just for fun (tomorrow's lecture)
summary of today
MLP
- neurons hold numbers
- neurons grouped in layers
- consecutive layers are connected
- input/output layers (shapes are determined by task and data)
- hidden layers (shapes are free to choose )
- layers as functions
-- in one layer function: weights matrix, bias vector and activation function
-- chaining layer functions together (to get the final MLP big function)