Model
- Each pair of input data and the desired answer is called an example.
- With the help of the examples, the training process automatically discovers the rules.
- A human engineer provides a blueprint for the rules at the outset of training. The
blueprint is encapsulated in a model, which defines a hypothesis space for the rules the
machine may possibly learn.
- Models vary in terms of how many layers the neural network consists of, what types of
layers they are, and how they are wired together.
- With the training data and the model architecture, the training process produces the learned rules, encapsulated in a trained model.
- Training Phase -> Inference Phase
Neural network and Deep learning
- Neural networks are a subfield of machine learning, one in which the transformation of
the data representation is done by a system with an architecture loosely inspired by
how neurons are connected in human and animal brains.
- A frequently encountered theme of neuronal connection is the layer organization.
Many parts of the mammalian brain are organized in a layered fashion. Examples include
the retina, the cerebral cortex, and the cerebellar cortex.
- Neural network layers are different from pure mathematical functions in that they are generally stateful.
- A layer’s memory is captured in its weights.
- weight: a set of numerical values that belong to the layer and govern the details of
how each input representation is transformed by the layer into an output
representation.
- When a neural network is trained through exposure to training data, the weights get
altered systematically in a way that minimizes the value computed by the loss function.
- Generally, backpropagation in a neural network computes the gradient of the loss function with respect to the weights of the network for a single input-output example.
- Basically, a dense (fully connected) layer changes the dimension of its input vectors, with
every output unit connected to every input element.
- Activation: In neural networks, the activation function is a function that transforms the
input values of neurons. Basically, it introduces non-linearity into the network so that
the network can learn the relationship between the input and output values.
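A minimal sketch of both ideas using the TensorFlow.js API (the layer sizes and input values below are made up for illustration):

    const tf = require('@tensorflow/tfjs');

    const model = tf.sequential();
    // A dense layer changes the vector dimension: each of the 2 output units
    // is connected to all 4 input elements. The 'relu' activation applies a
    // non-linearity element by element.
    model.add(tf.layers.dense({units: 2, inputShape: [4], activation: 'relu'}));

    const input = tf.tensor2d([[1, 2, 3, 4]]);  // shape [1, 4]
    const output = model.predict(input);        // shape [1, 2]
    output.print();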
- Deep Learning is the study and application of deep neural networks, which are, quite
simply, neural networks with many layers (typically, from a dozen to hundreds of layers)
- Deep learning (layered representation learning) vs. Shallow learning
- Feature Engineering
- Deep learning automates this feature engineering
- with deep learning, you learn all features in one pass rather than having to engineer them yourself.
- Two essential characteristics
- the incremental, layer-by-layer way in which increasingly complex representations are developed
- the fact that these intermediate incremental representations are learned jointly, each
layer being updated to follow both the representational needs of the layer above and the needs
of the layer below.
- CUDA (2007): Compute Unified Device Architecture
- If hardware and algorithms are the steam engine of the deep-learning revolution, then
data is its coal.
- TensorFlow was made open source in November 2015 by a team of engineers working on deep learning at Google.
- data representations called tensors flow through layers and other data-processing nodes,
allowing inference and training to happen on machine-learning models.
- tensor: a multidimensional array. In neural networks and deep learning, every piece of data
and every computation result is represented as a tensor.
- Each tensor has two basic properties: the data type (such as float32 or int32) and the shape
- The tensor is the lingua franca of deep-learning models.
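A quick sketch of the two properties in TensorFlow.js (the values are arbitrary):

    const tf = require('@tensorflow/tfjs');

    const t = tf.tensor2d([[1, 2, 3], [4, 5, 6]]);
    console.log(t.dtype);  // 'float32' -- the default data type
    console.log(t.shape);  // [2, 3] -- 2 rows, 3 columns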
- TensorFlow and Keras form an ecosystem that leads the field of deep-learning frameworks in
terms of industrial and academic adoption.
- deeplearn.js: released 2017.09
Layer: a data processing module
- You can think of a layer as a tunable function from tensors to tensors.
- the kernel and bias = weights
- To find a good setting for the kernel and bias, we need two things:
- a measure that tells us how well we are doing
- a method to update the weights’ values so that next time we will do better than we currently are doing,
according to the measure previously mentioned.
model compilation
- a loss function: an error measurement
- an optimizer: the algorithm by which the network will update its weights (kernel and bias) based on
the data and the loss function
epoch
- each iteration through the complete training set is called an epoch
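A minimal sketch that ties compilation and epochs together in TensorFlow.js (the data and the epoch count are made up; the model tries to learn y = 2 * x):

    const tf = require('@tensorflow/tfjs');

    const model = tf.sequential();
    model.add(tf.layers.dense({units: 1, inputShape: [1]}));

    // Compilation: choose a loss function (the error measure) and an
    // optimizer (the weight-update algorithm).
    model.compile({loss: 'meanSquaredError', optimizer: 'sgd'});

    const xs = tf.tensor2d([[1], [2], [3], [4]]);
    const ys = tf.tensor2d([[2], [4], [6], [8]]);

    // Each complete pass through xs/ys is one epoch; we train for 200 epochs.
    model.fit(xs, ys, {epochs: 200}).then(() => {
      model.predict(tf.tensor2d([[5]])).print();  // should be close to 10
    });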
model’s evaluate method
- it is similar to the fit() method in that it calculates the same loss, but evaluate() does not update
the model’s weights.
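Continuing the sketch above (the test values are made up), evaluate() reports the same loss on held-out data but leaves the weights untouched:

    const xsTest = tf.tensor2d([[6], [7]]);
    const ysTest = tf.tensor2d([[12], [14]]);
    const testLoss = model.evaluate(xsTest, ysTest);  // no weight updates
    testLoss.print();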
backpropagation
- The directions of the weight updates are critical to the neural network’s learning process. They are determined
by the gradients of the loss with respect to the weights, and the algorithm for computing the gradients
is called backpropagation
- Invented in the 1960s
- is one of the foundations of neural networks and deep learning.
gradient of loss
- y’ = v * x
- loss = square(y’ - y) = square(v * x - y)
- how much change in the loss will we get if v is increased by a unit amount
Why do we need the gradient?
- Because once we have it, we can alter v in the direction opposite to it, so we can get a decrease
in the loss value.
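A plain-JavaScript sketch of this idea (the values are made up). For loss = square(v * x - y), the chain rule gives d(loss)/dv = 2 * (v * x - y) * x, and each step moves v against that gradient:

    const x = 2, y = 6;  // one training example; the true rule is y = 3 * x
    let v = 0.5;         // initial guess
    const learningRate = 0.05;

    for (let step = 0; step < 50; step++) {
      const grad = 2 * (v * x - y) * x;  // d(loss)/dv by the chain rule
      v -= learningRate * grad;          // move opposite to the gradient
    }
    console.log(v);  // converges toward 3, where the loss is minimal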
MSE (Mean Squared Error)
- If your application might be sensitive to very incorrect outliers, MSE could be a better choice than MAE,
because squaring penalizes large errors much more heavily.
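A tiny sketch of that sensitivity (the error values are made up): one outlier barely moves MAE but dominates MSE:

    const errors = [1, 1, 1, 10];  // one large outlier among small errors
    const mae = errors.reduce((s, e) => s + Math.abs(e), 0) / errors.length;
    const mse = errors.reduce((s, e) => s + e * e, 0) / errors.length;
    console.log(mae);  // 3.25  -- the outlier counts linearly
    console.log(mse);  // 25.75 -- the squared outlier dominates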
- Standard transformation or z-score normalization
we will scale our features so that they have zero mean and unit standard deviation.
- Refer to this site for more information on zero mean and unit standard deviation.
https://stats.stackexchange.com/questions/305672/what-is-unit-standard-deviation
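A sketch of the transformation in TensorFlow.js, using tf.moments to get per-feature mean and variance (the feature matrix is made up):

    const tf = require('@tensorflow/tfjs');

    // 3 examples, 2 features with very different scales.
    const data = tf.tensor2d([[1, 100], [2, 200], [3, 300]]);

    // Mean and variance per feature (along axis 0, i.e., across examples).
    const {mean, variance} = tf.moments(data, 0);
    const normalized = data.sub(mean).div(tf.sqrt(variance));

    normalized.print();  // each column now has zero mean and unit std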
Adding nonlinearity: Beyond weighted sums
- The primary enhancement we will introduce is nonlinearity – a mapping between input and output that
isn’t a simple weighted sum of the input’s elements.
- MLP: Multilayer Perceptron
- an oft-used term that describes neural networks that (1) have a simple topology without loops (what
are referred to as feedforward neural networks) and (2) have at least one hidden layer.
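A sketch of a small MLP in TensorFlow.js (the layer sizes are arbitrary): a feedforward topology with one hidden layer:

    const tf = require('@tensorflow/tfjs');

    const model = tf.sequential();
    // The hidden layer is what makes this a multilayer perceptron rather
    // than a single weighted sum.
    model.add(tf.layers.dense({units: 8, inputShape: [4], activation: 'sigmoid'}));
    model.add(tf.layers.dense({units: 1}));  // output layer
    model.summary();  // prints each layer's output shape and weight count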
- The number of weight parameters for each layer
- This is a count of all the individual numbers that make up the layer’s weights.
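As a worked example, take the hidden layer of the MLP sketched above (4 inputs, 8 units): its kernel has shape [4, 8] and its bias has shape [8], so the layer has 4 * 8 + 8 = 40 weight parameters.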
- Activation Function
- is an element-by-element transform.
- Sigmoid function
- is a “squashing” nonlinearity, in the sense that it “squashes” all real values from -infinity to +infinity
into a much smaller range (0 to +1).
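For reference, the sigmoid function is sigmoid(x) = 1 / (1 + e^(-x)): as x goes to -infinity the output approaches 0, and as x goes to +infinity it approaches 1.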