Learn machine learning through TensorFlow, and achieve a simple NST work.
What is TensorFlow?
Tensorflow is a machine learning framework, created by Google; it can be used to design, build and train deep.
What is tensor?
We can compare a tensor to a video file. A video consists of a series of frames, and each frame includes many individual color values. The collection of frames can be regarded as a tensor. Each individual color value can be regarded as a scalar, and a group of color values can be regarded as a vector, for example, RGBA can be regarded as a 4D vector (xyzw). Vector blocks can be thought as tensors.
Another simple explanation: tensors are data. They can be lists of numbers or vectors of any dimensionality. They can be multidimensional data blocks, such as a set of images or videos. Tensorflow essentially allows users to construct a large amount of digital data, and then use CPU, GPU and even TPU to process the data, but in essence it is still a large number of multiplications and additions.
This is a small exercise to implement NST using Python and tensorflow, you also can find it in colab:
Extract the intermediate layer values using the Keras functional API tf.keras.applications.
1 2 3 4 5 6 7 8 9 10
defvgg_layers(layer_names): """Creates a vgg model that returns a list of intermediate output values.""" # Load our model and pretrained VGG, trained on tmagenet data vgg = tf.keras.applications.VGG19(include_top=False, weights='imagenet') vgg.trainable = False
outputs = [vgg.get_layer(name).output for name in layer_names]
model = tf.keras.Model([vgg.input], outputs) return model
# Look at the statistics of each layer's output for name, output inzip(style_layers, style_outputs): print(name) print(" shape: ", output.numpy().shape) print(" min: ", output.numpy().min()) print(" max: ", output.numpy().max()) print(" mean: ", output.numpy().mean()) print()
The style of images can be described by the means and correlations across the different feature maps. We can tf.linalg.einsum function to achieve that.
# Define a `tf.Variable` to contain the image to optimize. image = tf.Variable(content_image)
# Define a function to keep the pixel values between 0 and 1. defclip_0_1(image): return tf.clip_by_value(image, clip_value_min = 0.0, clip_value_max = 1.0)
# Create an optimizer with `Adam`. opt = tf.optimizers.Adam(learning_rate=0.02, beta_1=0.99, epsilon=1e-1)
# Use a weighted combination of the two losses to get the total loss. style_weight = 1e-2 content_weight=1e4
defstyle_content_loss(outputs): style_outputs = outputs['style'] content_outputs = outputs['content'] style_loss = tf.add_n([tf.reduce_mean((style_outputs[name]-style_targets[name])**2) for name in style_outputs.keys()]) style_loss *= style_weight / num_style_layers
content_loss = tf.add_n([tf.reduce_mean((content_outputs[name]-content_targets[name])**2) for name in content_outputs.keys()]) content_loss *= content_weight / num_content_layers loss = style_loss + content_loss return loss
Use tf.GradientTape to update the image.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
@tf.function() deftrain_step(image): with tf.GradientTape() as tape: outputs = extractor(image) loss = style_content_loss(outputs)
grad = tape.gradient(loss, image) opt.apply_gradients([(grad, image)]) image.assign(clip_0_1(image))
step = 0 for n inrange(epochs): for m inrange(steps_per_epoch): step += 1 train_step(image) print(".", end='') display.clear_output(wait=True) display.display(tensor_to_image(image)) print("Train step: {}".format(step))
end = time.time() print("Total time: {:.1f}".format(end-start))
It produces alot of high frequency artifacts. Using an explicit regularization term to decrease those.
# Rerun the Optimization. total_variation_weight = 30
@tf.function() deftrain_step(image): with tf.GradientTape() as tape: outputs = extractor(image) loss = style_content_loss(outputs) loss += total_variation_weight*tf.image.total_variation(image)
grad = tape.gradient(loss, image) opt.apply_gradients([(grad, image)]) image.assign(clip_0_1(image))
# Reinitialize the optimization variable. image = tf.Variable(content_image)
# Rerun the optimization. import time start = time.time()
epochs = 10 steps_per_epoch = 50
step = 0 for n inrange(epochs): for m inrange(steps_per_epoch): step += 1 train_step(image) print(".", end='') display.clear_output(wait=True) display.display(tensor_to_image(image)) print("Train step: {}".format(step))
end = time.time() print("Total time: {:.1f}".format(end-start))
About this Post
This post is written by Siqi Shu, licensed under CC BY-NC 4.0.