Generative Deep Learning

What is Generative Modeling?

Generative modeling is a branch of machine learning that involves training a model to produce new data that is similar to a given dataset.

figure 1-1

We can sample from this model to create novel, realistic images of horses that did not exist in the original dataset.

One data point in the training data is called an observation. Each observation consists of many features.

A generative model must be probabilistic rather than deterministic, because we want to be able to sample many different variations of the output, rather than get the same output every time.
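To see the difference, compare a deterministic model that always returns the same output with a probabilistic one we can sample from. A minimal illustrative sketch, assuming the "model" is just a fitted one-dimensional Gaussian (the names and values are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "model": a single Gaussian fitted to the training data.
mu, sigma = 0.5, 0.1

def deterministic_model():
    return mu                      # the same output every time

def generative_model():
    return rng.normal(mu, sigma)   # a different sample on every call

print([deterministic_model() for _ in range(3)])  # [0.5, 0.5, 0.5]
print([generative_model() for _ in range(3)])     # three different values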

Generative Versus Discriminative Modeling

figure 1-2

Figure 1-2 shows the discriminative modeling process.

Generative modeling doesn’t require the dataset to be labeled because it concerns itself with generating entirely new images, rather than trying to predict a label of a given image.

Conditional Generative Models

A conditional generative model allows us to control the type of output that is generated. For example, if our dataset contains different types of fruit, we could tell our generative model to specifically generate an image of an apple.

The Rise of Generative Modeling

Until recently, discriminative modeling has been the driving force behind most progress in machine learning. This is because the corresponding generative modeling problem is typically much more difficult to tackle.

However, as machine learning technologies have matured, this assumption has gradually weakened.

In the last 10 years many of the most interesting advancements in the field have come through novel applications of machine learning to generative modeling tasks.

Generative services that target specific business problems include those that:

  • generate original blog posts given a particular subject matter
  • produce a variety of images of your product in any setting
  • write social media content and ad copy to match your brand
  • assist game design and cinematography by generating video and music

Generative Modeling and AI

There are three deeper reasons why generative modeling can be considered the key to unlocking a far more sophisticated form of artificial intelligence that goes beyond what discriminative modeling alone can achieve.

Firstly, we shouldn’t limit our machine training to simply categorizing data; we should also be concerned with training models that capture a more complete understanding of the data distribution. Many of the same techniques that have driven development in discriminative modeling, such as deep learning, can be utilized by generative models too.

Secondly, generative modeling is now being used to drive progress in other fields of AI, such as reinforcement learning. A traditional approach (for example, training a robot to walk across a terrain) is fairly inflexible, because the agent is trained to optimize a policy for one particular task. An alternative approach that has recently gained traction is to use a generative model to train the agent on a learned world model of the environment, rather than on the real environment itself, independent of any particular task.

Finally, if we are to say that we have built a machine that has acquired a form of intelligence, generative modeling must surely be part of the solution. Current neuroscientific theory suggests that our perception of reality is not a highly complex discriminative model operating on our sensory input to produce predictions of what we are experiencing, but is instead a generative model that is trained from birth to produce simulations of our surroundings that accurately match the future.

The Generative Modeling Framework

  • We have a dataset of observations 𝐗.
  • We assume that the observations have been generated according to some unknown distribution, Pdata.
  • We want to build a generative model, Pmodel, that mimics Pdata. If we achieve this goal, we can sample from Pmodel to generate observations that appear to have been drawn from Pdata.
  • Therefore, the desirable properties of Pmodel are:
    • Accuracy
      If Pmodel is high for a generated observation, it should look like it has been drawn from Pdata. If Pmodel is low for a generated observation, it should not look like it has been drawn from Pdata.
    • Generation
      It should be possible to easily sample a new observation from Pmodel.
    • Representation
      It should be possible to understand how different high-level features in the data are represented by Pmodel.
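To make this framework concrete, here is a minimal toy sketch, assuming a two-dimensional dataset and using a multivariate Gaussian as Pmodel (a deliberately simple choice for illustration, not one from the book): we estimate the model from 𝐗 and then sample new observations from it.

import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for observations X drawn from an unknown Pdata:
# 1,000 observations with two features each.
X = rng.normal(loc=[2.0, -1.0], scale=[0.5, 1.5], size=(1000, 2))

# Fit Pmodel: here, simply a multivariate Gaussian estimated from X.
mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)

# Sample from Pmodel: new observations that appear to have been
# drawn from Pdata, but were not in the original dataset.
new_observations = rng.multivariate_normal(mu, cov, size=5)
print(new_observations)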

Representation Learning

Instead of trying to model the high-dimensional sample space directly, we describe each observation in the training set using some lower dimensional latent space and then learn a mapping function that can take a point in the latent space and map it to a point in the original domain. In other words, each point in the latent space is a representation of some high-dimensional observation.

One of the benefits of training models that utilize a latent space is that we can perform operations that affect high-level properties of the image by manipulating its representation vector within the more manageable latent space.

The concept of encoding the training dataset into a latent space so that we can sample from it and decode the point back to the original domain is common to many generative modeling techniques.

figure 1-9 The dog manifold in high-dimensional pixel space is mapped to a simpler latent space that can be sampled from
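As a minimal sketch of such a mapping function, the "decoder" below maps a point in an assumed 2-dimensional latent space back to a 32 × 32 × 1 pixel domain. The architecture and all sizes are purely illustrative assumptions, not taken from the book:

import numpy as np
from tensorflow.keras import layers, models

# A mapping function ("decoder") from a 2D latent space back to the
# original 32 x 32 x 1 pixel domain.
latent_input = layers.Input(shape=(2,))
x = layers.Dense(units=8 * 8 * 64)(latent_input)
x = layers.Reshape((8, 8, 64))(x)
x = layers.Conv2DTranspose(
    32, (3, 3), strides=2, padding='same', activation='relu')(x)   # 8x8 -> 16x16
image_output = layers.Conv2DTranspose(
    1, (3, 3), strides=2, padding='same', activation='sigmoid')(x)  # 16x16 -> 32x32
decoder = models.Model(latent_input, image_output)

# Any point in the latent space now decodes to a point in pixel space:
image = decoder.predict(np.array([[0.3, -1.2]]))  # one 32 x 32 x 1 image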

Core Probability Theory

Five key terms

Sample space

  • The sample space is the complete set of all values an observation x can take.

Probability density function (or simply density function)

  • A function p(x) that maps a point x in the sample space to a number between 0 and 1. The integral of the density function over all points in the sample space must equal 1, so that it is a well-defined probability distribution.
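As a quick numerical check of the "integrates to 1" requirement, we can take a standard normal density as an illustrative p(x):

import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# An illustrative density function p(x): the standard normal.
p = norm(loc=0.0, scale=1.0).pdf

# Integrating p(x) over the whole sample space must equal 1.
total, _ = quad(p, -np.inf, np.inf)
print(total)  # ~1.0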

Parametric modeling

  • A technique that we can use to structure our approach to finding a suitable Pmodel(x). A parametric model is a family of density functions pθ(x) that can be described using a finite number of parameters, θ.

Likelihood

  • The likelihood ℒ(θ|𝐱) of a parameter set θ is a function that measures the plausibility of θ, given some observed point 𝐱. It is defined as ℒ(θ|𝐱) = pθ(𝐱). For a whole dataset 𝐗 of independent observations, it is computationally easier to work with the log-likelihood ℓ(θ|𝐗) = Σ log pθ(𝐱).

Maximum likelihood estimation

  • The technique that allows us to estimate θ̂, the set of parameters θ that is most likely to explain some observed data 𝐗: θ̂ = argmaxθ ℒ(θ|𝐗). Equivalently, generative models can be trained by minimizing the negative log-likelihood.
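As a worked sketch of maximum likelihood estimation, the snippet below fits a one-dimensional Gaussian Pmodel with parameters θ = (μ, σ) to some toy data. For a Gaussian, the closed-form maximum likelihood estimates are the sample mean and standard deviation, which should beat any other parameter choice on log-likelihood; the data and names here are illustrative, not from the book:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=500)   # the observed data

def log_likelihood(mu, sigma):
    # log L(theta | X) = sum over the dataset of log p_theta(x)
    return norm(mu, sigma).logpdf(X).sum()

# Closed-form maximum likelihood estimates for a Gaussian: the sample
# mean and the (uncorrected) sample standard deviation.
mu_hat, sigma_hat = X.mean(), X.std()

# Any other parameter set should score a lower log-likelihood.
assert log_likelihood(mu_hat, sigma_hat) >= log_likelihood(2.0, 2.0)
print(mu_hat, sigma_hat)   # close to the true values (3.0, 2.0)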

Generative Model Taxonomy

While all types of generative models ultimately aim to solve the same task, they all take slightly different approaches to modeling the density function. There are three possible approaches:

  1. Explicitly model the density function, but constrain the model in some way, so that the density function is tractable (i.e., it can be calculated).
  2. Explicitly model a tractable approximation of the density function.
  3. Implicitly model the density function, through a stochastic process that directly generates data.
figure 1-10 A taxonomy of generative modeling approaches

The best-known example of an implicit generative model is a generative adversarial network.

Approximate density models include variational autoencoders.

A common thread that runs through all of these generative model families is deep learning.

Deep Learning

Deep Neural Networks

The majority of deep learning systems are artificial neural networks (ANNs, or just neural networks for short) with multiple stacked hidden layers.

What is a Neural Network?

A neural network consists of a series of stacked layers. Each layer contains units that are connected to the previous layer’s units through a set of weights.

One of the most common layers is the fully connected (or dense) layer that connects all units in the layer directly to every unit in the previous layer.

Neural networks where all adjacent layers are fully connected are called multilayer perceptrons (MLPs).
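A single fully connected layer is easy to write out by hand. Here is a minimal numpy sketch (the layer sizes are arbitrary) of the weighted sum, bias, and activation that each unit computes:

import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=4)        # outputs of the previous layer (4 units)
W = rng.normal(size=(3, 4))   # one weight per connection (3 units here)
b = np.zeros(3)               # one bias per unit

# Each unit computes a weighted sum of the previous layer's outputs,
# adds its bias, and passes the result through an activation function.
output = np.maximum(0, W @ x + b)   # ReLU activation
print(output)                       # 3 values, one per unit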

figure 2-2 An example of a multilayer perceptron that predicts if a face is smiling

Let’s walk through the network shown in Figure 2-2:

  1. Unit A receives the value for an individual channel of an input pixel.
  2. Unit B combines its input values so that it fires strongest when a particular low-level feature, such as an edge, is present.
  3. Unit C combines the low-level features so that it fires strongest when a higher-level feature, such as teeth, is seen in the image.
  4. Unit D combines the high-level features so that it fires strongest when the person in the original image is smiling.

The layers between the input and output layers are called hidden layers.

Multilayer Perceptron (MLP)

The MLP is a discriminative (rather than generative) model, but supervised learning will still play a role in many types of generative models.

Preparing the Data -> Building the Model (Layers, Activation functions) -> Compiling the Model (Loss functions, Optimizers) -> Training the Model -> Evaluating the Model

  • There are many kinds of activation functions, but three of the most important are ReLU (rectified linear unit), sigmoid, and softmax.
  • Three of the most commonly used loss functions are mean squared error, categorical cross-entropy, and binary cross-entropy.
  • One of the most commonly used and stable optimizers is Adam (Adaptive Moment Estimation).
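Putting that workflow together, here is a minimal end-to-end sketch in Keras. The choice of CIFAR-10 and the layer sizes are illustrative assumptions:

from tensorflow.keras import datasets, layers, models, optimizers

# Preparing the data: scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Building the model: two hidden dense layers with ReLU, softmax output.
input_layer = layers.Input(shape=(32, 32, 3))
x = layers.Flatten()(input_layer)
x = layers.Dense(units=200, activation='relu')(x)
x = layers.Dense(units=150, activation='relu')(x)
output_layer = layers.Dense(units=10, activation='softmax')(x)
model = models.Model(input_layer, output_layer)

# Compiling the model: a loss function and the Adam optimizer.
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=optimizers.Adam(learning_rate=0.0005),
    metrics=['accuracy'],
)

# Training and evaluating the model.
model.fit(x_train, y_train, batch_size=32, epochs=10)
model.evaluate(x_test, y_test)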

Convolutional Neural Network (CNN)

To make the network perform well, the spatial structure of the input images needs to be taken into account.

Convolutional Layers

The convolution is performed by multiplying the filter pixel-wise with the portion of the image, and summing the results. The output is more positive when the portion of the image closely matches the filter and more negative when the portion of the image is the inverse of the filter.

figure 2-10 A 3 x 3 convolutional filter applied to two portions of a grayscale image

If we move the filter across the entire image from left to right and top to bottom, recording the convolutional output as we go, we obtain a new array that picks out a particular feature of the input, depending on the values in the filter.
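This multiply-and-sum operation is simple to reproduce in numpy. The sketch below slides an illustrative 3 x 3 vertical-edge filter over a tiny 5 x 5 "image" with stride 1 and no padding:

import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(5, 5))   # a tiny grayscale "image"

# An illustrative 3 x 3 filter that responds to vertical edges.
filt = np.array([[1.0, 0.0, -1.0],
                 [1.0, 0.0, -1.0],
                 [1.0, 0.0, -1.0]])

output = np.zeros((3, 3))         # (5 - 3 + 1) = 3 in each dimension
for i in range(3):
    for j in range(3):
        # Multiply the filter pixel-wise with this portion and sum.
        output[i, j] = (image[i:i + 3, j:j + 3] * filt).sum()

print(output)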

figure 2-11 Two convolutional filters applied to a grayscale image

A convolutional layer is simply a collection of filters, where the values stored in the filters are the weights that are learned by the neural network through training. Initially these are random, but gradually the filters adapt their weights to start picking out interesting features such as edges or particular color combinations.

figure 2-12 A 3 x 3 x 1 kernel (gray) being passed over a 5 x 5 x 1 input image (blue), with padding = "same" and strides = 1, to generate the 5 x 5 x 1 output (green)

Stride

The strides parameter is the step size used by the layer to move the filters across the input. Increasing the stride therefore reduces the size of the output tensor.

Padding

The padding = "same" input parameter pads the input data with zeros so that the output size from the layer is exactly the same as the input size when strides = 1.
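The effect of strides and padding on the output size can be sanity-checked with the standard shape arithmetic (these formulas follow the conventions of frameworks like Keras, not anything specific to this book):

import math

def conv_output_size(input_size, kernel_size, stride, padding):
    if padding == 'same':
        # Zero-padded so that stride 1 preserves the input size.
        return math.ceil(input_size / stride)
    # padding = 'valid': no zero-padding at all.
    return math.floor((input_size - kernel_size) / stride) + 1

print(conv_output_size(32, 4, 1, 'same'))    # 32: same size as the input
print(conv_output_size(32, 4, 2, 'same'))    # 16: stride 2 halves it
print(conv_output_size(32, 4, 2, 'valid'))   # 15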

Stacking convolutional layers

The output of a Conv2D layer is another four-dimensional tensor, now of shape (batch_size, height, width, filters), so we can stack Conv2D layers on top of each other to grow the depth of our neural network and make it more powerful.

from tensorflow.keras import layers, models

# Input layer: 32 x 32 RGB images.
input_layer = layers.Input(shape=(32, 32, 3))

# First convolutional layer: 10 filters, so the output has depth 10.
conv_layer_1 = layers.Conv2D(
    filters=10,
    kernel_size=(4, 4),
    strides=2,
    padding='same',
)(input_layer)

# Second convolutional layer stacked on top, growing the depth to 20.
conv_layer_2 = layers.Conv2D(
    filters=20,
    kernel_size=(3, 3),
    strides=2,
    padding='same',
)(conv_layer_1)

# Flatten to a vector and classify into 10 categories.
flatten_layer = layers.Flatten()(conv_layer_2)
output_layer = layers.Dense(units=10, activation='softmax')(flatten_layer)

model = models.Model(input_layer, output_layer)
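If the shape arithmetic above is right, calling model.summary() on this model should report the following output shapes (shown here as comments; the batch dimension appears as None):

model.summary()
# Expected output shapes:
#   InputLayer   (None, 32, 32, 3)
#   Conv2D       (None, 16, 16, 10)   stride 2, 'same': 32 -> 16
#   Conv2D       (None, 8, 8, 20)     stride 2 again: 16 -> 8
#   Flatten      (None, 1280)         8 * 8 * 20 = 1,280
#   Dense        (None, 10)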
figure 2-13 A diagram of a convolutional neural network
