Image Recognition with a Convolutional Neural Network (CIFAR-10)

Convolutional Neural Networks (CNNs) are a class of feedforward neural networks that contain convolution operations and have a deep structure; they are one of the representative algorithms of deep learning. CNNs have representation learning capability and can classify inputs in a shift-invariant way according to their hierarchical structure, which is why they are also called "Shift-Invariant Artificial Neural Networks" (SIANN).

In this article I will use TensorFlow's Keras API to build a slimmed-down AlexNet to demonstrate convolutional neural networks.

Importing the CIFAR-10 dataset

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Load CIFAR-10 and scale pixel values from [0, 255] to [0, 1]
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# The labels are loaded with shape (N, 1); drop the extra axis
print(train_labels.shape)
train_labels = train_labels.squeeze(axis=1)
print(train_labels.shape)
test_labels = test_labels.squeeze(axis=1)
(50000, 1)
(50000,)

The training and test labels come with an extra, useless axis (shape (50000, 1) instead of (50000,)); to make later processing easier, we remove it with squeeze.

Check that the data was imported correctly

plt.imshow(train_images[0])
plt.show()
print(class_names[train_labels[0]])

Building the Keras neural network

model=tf.keras.Sequential([
    tf.keras.layers.Conv2D(64,input_shape=(32,32,3),kernel_size=(3,3),activation='relu',padding='same'),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(256,kernel_size=(3,3),activation='relu',padding='same'),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Conv2D(256,kernel_size=(3,3),activation='relu',padding='same'),
    tf.keras.layers.Conv2D(128,kernel_size=(3,3),activation='relu',padding='same'),
    tf.keras.layers.Conv2D(128,kernel_size=(3,3),activation='relu',padding='same'),
    tf.keras.layers.MaxPool2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256,activation='relu'),
    tf.keras.layers.Dense(128,activation='relu'),
    tf.keras.layers.Dense(10)
])

The network has 12 layers in total. The first layer is a convolutional layer with a 3*3 kernel, using ReLU as the activation function.
The second layer is a pooling layer, which by default halves the spatial size (a quick shape check follows this layer list).
The third layer is another convolutional layer.
The fourth layer is a pooling layer.
The fifth, sixth, and seventh layers apply three convolutions in a row.
The eighth layer is a pooling layer.
The ninth layer is a flatten layer, which unrolls the 2D feature maps into a 1D vector.
The tenth and eleventh layers are fully connected layers, again using ReLU activations.
The twelfth layer is the output layer; since the samples fall into 10 classes, it has 10 neurons.
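Below is a minimal sketch (my own check, not part of the original walkthrough) confirming that MaxPool2D() with its default pool_size=(2, 2) halves the spatial dimensions each time, and that Flatten() then unrolls the remaining feature maps into a vector:

import tensorflow as tf

x = tf.random.uniform((1, 32, 32, 3))       # one fake 32x32 RGB image
pool = tf.keras.layers.MaxPool2D()          # defaults: pool_size=(2, 2)
for step in range(3):
    x = pool(x)
    print(x.shape)                          # (1, 16, 16, 3) -> (1, 8, 8, 3) -> (1, 4, 4, 3)
print(tf.keras.layers.Flatten()(x).shape)   # (1, 48): 4 * 4 * 3 values per image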

AlexNet replaced Sigmoid with ReLU, which trains faster and also avoids the vanishing-gradient problem that sigmoid suffers from when training deeper networks.
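As a rough illustration (my own sketch, not from the post): the sigmoid gradient never exceeds 0.25 and quickly approaches 0 for large |x|, while the ReLU gradient is exactly 1 for every positive input, so stacks of ReLU layers pass gradients back much better:

import tensorflow as tf

x = tf.constant([-4.0, -1.0, 0.5, 4.0])

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.math.sigmoid(x)
print(tape.gradient(y, x).numpy())   # roughly [0.018, 0.197, 0.235, 0.018] -- always <= 0.25

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.nn.relu(x)
print(tape.gradient(y, x).numpy())   # [0., 0., 1., 1.] -- 1 for every positive input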

Model summary

model.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d (Conv2D)             (None, 32, 32, 64)        1792      
                                                                 
 max_pooling2d (MaxPooling2D  (None, 16, 16, 64)       0         
 )                                                               
                                                                 
 conv2d_1 (Conv2D)           (None, 16, 16, 256)       147712    
                                                                 
 max_pooling2d_1 (MaxPooling  (None, 8, 8, 256)        0         
 2D)                                                             
                                                                 
 conv2d_2 (Conv2D)           (None, 8, 8, 256)         590080    
                                                                 
 conv2d_3 (Conv2D)           (None, 8, 8, 128)         295040    
                                                                 
 conv2d_4 (Conv2D)           (None, 8, 8, 128)         147584    
                                                                 
 max_pooling2d_2 (MaxPooling  (None, 4, 4, 128)        0         
 2D)                                                             
                                                                 
 flatten (Flatten)           (None, 2048)              0         
                                                                 
 dense (Dense)               (None, 256)               524544    
                                                                 
 dense_1 (Dense)             (None, 128)               32896     
                                                                 
 dense_2 (Dense)             (None, 10)                1290      
                                                                 
=================================================================
Total params: 1,740,938
Trainable params: 1,740,938
Non-trainable params: 0
_________________________________________________________________      

From the summary you can see that by the time the data reaches the flatten layer, the repeated convolution and pooling have already shrunk the feature maps down to 4*4 (the later steps show the model still trains fine even so). This is also why I removed one convolution stage from the full AlexNet: with a 32*32 input, there is simply no room for any further pooling.
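A quick sketch of the arithmetic behind this (my own addition): each default MaxPool2D halves the spatial size, so starting from 32*32 only a few pooling stages fit before the feature maps collapse to 1*1:

size = 32
for n in range(1, 6):
    size //= 2
    print(n, size)    # 16, 8, 4, 2, 1 -- after five poolings nothing is left to convolve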

Training the model

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
Epoch 1/10
1563/1563 [==============================] - 11s 7ms/step - loss: 1.5650 - accuracy: 0.4183 - val_loss: 1.3051 - val_accuracy: 0.5351
Epoch 2/10
1563/1563 [==============================] - 10s 7ms/step - loss: 1.1022 - accuracy: 0.6071 - val_loss: 1.0242 - val_accuracy: 0.6373
Epoch 3/10
1563/1563 [==============================] - 10s 7ms/step - loss: 0.8881 - accuracy: 0.6858 - val_loss: 0.9532 - val_accuracy: 0.6645
Epoch 4/10
1563/1563 [==============================] - 11s 7ms/step - loss: 0.7554 - accuracy: 0.7325 - val_loss: 0.8001 - val_accuracy: 0.7246
Epoch 5/10
1563/1563 [==============================] - 10s 7ms/step - loss: 0.6523 - accuracy: 0.7707 - val_loss: 0.8201 - val_accuracy: 0.7166
Epoch 6/10
1563/1563 [==============================] - 10s 7ms/step - loss: 0.5700 - accuracy: 0.7990 - val_loss: 0.8234 - val_accuracy: 0.7247
Epoch 7/10
1563/1563 [==============================] - 10s 7ms/step - loss: 0.4890 - accuracy: 0.8260 - val_loss: 0.8609 - val_accuracy: 0.7231
Epoch 8/10
1563/1563 [==============================] - 10s 7ms/step - loss: 0.4236 - accuracy: 0.8495 - val_loss: 0.8407 - val_accuracy: 0.7398
Epoch 9/10
1563/1563 [==============================] - 10s 7ms/step - loss: 0.3664 - accuracy: 0.8699 - val_loss: 0.8994 - val_accuracy: 0.7366
Epoch 10/10
1563/1563 [==============================] - 10s 7ms/step - loss: 0.3265 - accuracy: 0.8831 - val_loss: 0.9323 - val_accuracy: 0.7352

We use adam as the optimizer (the optimizer decides how gradient descent is carried out) and SparseCategoricalCrossentropy as the loss function (the loss function measures the error between the model's output and the true labels).
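As a small sketch (my own example, not from the post) of what this loss computes: SparseCategoricalCrossentropy(from_logits=True) takes integer class labels and the raw logits, applies softmax internally, and returns the mean negative log-probability assigned to the true class:

import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
labels = tf.constant([3, 0])                      # integer class ids, no one-hot encoding needed
logits = tf.constant([[0.1, 0.2, 0.3, 2.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
                      [2.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])
print(loss_fn(labels, logits).numpy())            # mean negative log-probability of the true classes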

Evaluating the model's accuracy on the test data

test_loss, test_acc = model.evaluate(test_images,  test_labels, verbose=2)

print('\nTest accuracy:', test_acc)
313/313 - 1s - loss: 0.9323 - accuracy: 0.7352 - 807ms/epoch - 3ms/step

Test accuracy: 0.7351999878883362

Adding softmax as a final activation converts the raw outputs (logits) into an easier-to-interpret range, i.e. probabilities.

probability_model = tf.keras.Sequential([model, 
                                         tf.keras.layers.Softmax()])
predictions = probability_model.predict(test_images)
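To make the softmax step concrete, here is a short sketch (my own, reusing the trained model above) of how the raw logits for one image are turned into a probability distribution over the 10 classes:

import numpy as np

logits = model.predict(test_images[:1])                 # raw logits for a single image
shifted = logits - logits.max(axis=1, keepdims=True)    # subtract the max for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
print(probs.sum())                                      # 1.0 -- a proper probability distribution
print(class_names[int(np.argmax(probs))])               # the most likely class name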

Preparing the plotting helpers

def plot_image(i, predictions_array, true_label, img):
  predictions_array, true_label, img = predictions_array, true_label[i], img[i]
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])

  plt.imshow(img, cmap=plt.cm.binary)

  predicted_label = np.argmax(predictions_array)
  if predicted_label == true_label:
    color = 'blue'
  else:
    color = 'red'

  plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                100*np.max(predictions_array),
                                class_names[true_label]),
                                color=color)

def plot_value_array(i, predictions_array, true_label):
  predictions_array, true_label = predictions_array, true_label[i]
  plt.grid(False)
  plt.xticks(range(10))
  plt.yticks([])
  thisplot = plt.bar(range(10), predictions_array, color="#777777")
  plt.ylim([0, 1])
  predicted_label = np.argmax(predictions_array)

  thisplot[predicted_label].set_color('red')
  thisplot[true_label].set_color('blue')

i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i],  test_labels)
plt.show()
i = 1
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions[i], test_labels, test_images)
plt.subplot(1,2,2)
plot_value_array(i, predictions[i],  test_labels)
plt.show()
num_rows = 5
num_cols = 3
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
  plt.subplot(num_rows, 2*num_cols, 2*i+1)
  plot_image(i, predictions[i], test_labels, test_images)
  plt.subplot(num_rows, 2*num_cols, 2*i+2)
  plot_value_array(i, predictions[i], test_labels)
plt.tight_layout()
plt.show()

From the test results above we can see that the model predicts most of the images correctly.
