Machine Learning You Can See, Lesson 5: Choosing the Regularisation Hyperparameter Value Visually

Oct 5, 2021

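The regularisation strength is one of the important hyperparameters when building a machine learning model. Setting it correctly helps prevent overfitting during training and improves how well the model generalises. But how do you choose a good value? In this article we use visualisation so that the most suitable regularisation value can be read off at a glance.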

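The code in this article is adapted from Chapters 6 and 7 of the book 「自學機器學習 - 上Kaggle接軌世界，成為資料科學家」, published by Flag (旗標), with some modifications and an added way of evaluating the regularisation hyperparameter. For details of the code, please refer to the book.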

I. Preparing the libraries and the dataset

The next two listings import the libraries and prepare the dataset.

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras import optimizers, regularizers
from sklearn.model_selection import KFold

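This example uses the CIFAR-10 data bundled with Keras, a multi-class image classification problem with 10 classes. Pixel values are scaled to the range 0 to 1 and the labels are one-hot encoded.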

def prepare_data():
    (x, y), (x_test, y_test) = cifar10.load_data()
    x, x_test = x.astype('float32'), x_test.astype('float32')
    x, x_test = x/255.0, x_test/255.0
    y, y_test = to_categorical(y), to_categorical(y_test)
    return x, y, x_test, y_test

II. Building the model

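Because CIFAR-10 is an image classification problem, we use a convolutional neural network (CNN): 3 convolutional layers (each followed by max pooling), 1 flatten layer, 1 dense layer, and 1 output layer. The optimizer is Adam with a learning rate of 0.001.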

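To keep the model from overfitting, we use L2 regularisation here. For details of the method, see Chapter 13 of the book 「資料科學的建模基礎 - 別急著coding！你知道模型的陷阱嗎？」, also published by Flag (旗標).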
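As a rough illustration of what L2 regularisation adds to the loss, the sketch below computes the penalty term alpha * sum(w^2) for a handful of weights, which is the quantity Keras's `regularizers.l2(alpha)` adds for each regularised kernel. The weight values and the helper name `l2_penalty` are made up for this example and do not come from the model in this article.

```python
def l2_penalty(weights, alpha):
    """Return alpha * sum(w^2), the term an L2 regulariser adds to the loss."""
    return alpha * sum(w * w for w in weights)

# Hypothetical kernel weights, for illustration only.
weights = [0.5, -1.0, 2.0]

# A larger alpha penalises large weights more strongly; alpha = 0 disables it.
for alpha in (0.0, 0.00075, 0.0015):
    print(alpha, l2_penalty(weights, alpha))
```

Because the penalty grows with the square of each weight, raising alpha pushes the optimizer towards smaller weights, which is what limits overfitting.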

def build_model(x_train, 
                y_train, 
                x_valid, 
                y_valid, 
                batch_size, 
                epochs, 
                alpha):
    # alpha is the L2 regularisation strength applied to every layer's kernel.
    model = Sequential()
    model.add(Conv2D(filters = 64, 
                     kernel_size = 3, 
                     padding = 'same',
                     activation = 'relu', 
                     input_shape = (32,32,3),
                     kernel_regularizer = regularizers.l2(alpha)))
    model.add(MaxPooling2D(pool_size = 2))
    model.add(Conv2D(filters = 128, 
                     kernel_size = 3, 
                     padding = 'same',
                     activation = 'relu',
                     kernel_regularizer = regularizers.l2(alpha)))
    model.add(MaxPooling2D(pool_size = 2))
    model.add(Conv2D(filters = 256, 
                     kernel_size = 3, 
                     padding = 'same',
                     activation = 'relu',
                     kernel_regularizer = regularizers.l2(alpha))) 
    model.add(MaxPooling2D(pool_size = 2))
    model.add(Flatten()) 
    model.add(Dense(512, 
                    activation = 'relu',
                    kernel_regularizer = regularizers.l2(alpha)))
    model.add(Dense(10, 
                    activation = 'softmax',
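                    kernel_regularizer = regularizers.l2(alpha)))
    # Recent Keras versions expect learning_rate; the older lr argument is deprecated.
    model.compile(loss = "categorical_crossentropy",
                  optimizer = optimizers.Adam(learning_rate = 0.001),
                  metrics = ["accuracy"])
    # Train silently and return the per-epoch metrics for later aggregation.
    history = model.fit(x_train,
                        y_train,
                        batch_size = batch_size,
                        epochs = epochs,
                        verbose = 0,
                        validation_data = (x_valid, y_valid))
    return history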
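To get a feel for the size of the network defined above, the following sketch traces the feature-map size through the three convolution and pooling stages and counts the trainable parameters by hand. It assumes the layer settings shown in the listing ('same' padding, 3x3 kernels, 2x2 pooling); the helper names `conv_params` and `dense_params` are introduced here only for illustration.

```python
def conv_params(kernel_size, in_channels, filters):
    # One kernel_size x kernel_size x in_channels kernel plus one bias per filter.
    return kernel_size * kernel_size * in_channels * filters + filters

def dense_params(inputs, units):
    # One weight per input-unit pair plus one bias per unit.
    return inputs * units + units

size, channels = 32, 3  # CIFAR-10 images are 32x32 RGB
total = 0
for filters in (64, 128, 256):
    total += conv_params(3, channels, filters)  # 'same' padding keeps the spatial size
    size //= 2                                   # 2x2 max pooling halves width and height
    channels = filters

flat = size * size * channels     # length of the vector produced by Flatten
total += dense_params(flat, 512)  # hidden dense layer
total += dense_params(512, 10)    # softmax output layer

print(size, flat, total)
```

Most of the parameters sit in the first dense layer, so that is where the L2 penalty has the most weights to act on.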

III. Training the model

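We train the model with 11 different regularisation values. As noted in the previous article, however, a model carries uncertainty, so training it only once per value is not enough. We therefore use 10-fold cross-validation: for each regularisation value we train on all 10 folds, and for each fold we record the training and validation accuracy of the last epoch. Finally, we compute the mean and standard deviation of the accuracy over the 10 folds to see how the model's accuracy varies.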

x, y, x_test, y_test = prepare_data()
batch_size = 128
epochs = 20
alpha_val = []
train_avg = []
train_std = []
valid_avg = []
valid_std = []
for i in range(11):
    alpha = 0.00015 * i
    train_score = []
    valid_score = []
    kf = KFold(n_splits = 10)
    for train_index, valid_index in kf.split(x):
        x_train, x_valid = x[train_index], x[valid_index]
        y_train, y_valid = y[train_index], y[valid_index]
        history = build_model(x_train, 
                              y_train, 
                              x_valid, 
                              y_valid, 
                              batch_size, 
                              epochs, 
                              alpha)
        train_score.append(history.history['accuracy'][-1])
        valid_score.append(history.history['val_accuracy'][-1])
    
    train_avg.append(np.mean(train_score))
    train_std.append(np.std(train_score))
    valid_avg.append(np.mean(valid_score))
    valid_std.append(np.std(valid_score))
    alpha_val.append(alpha)
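The splitting and aggregation steps in the loop above can be checked on a small scale without training anything. The sketch below mimics how an unshuffled K-fold split partitions the sample indices into contiguous folds, then summarises a list of per-fold scores. The function `kfold_indices` is a simplified stand-in written for this example, not scikit-learn's implementation, and the accuracy values are made-up numbers.

```python
from statistics import mean, pstdev

def kfold_indices(n_samples, n_splits):
    """Yield (train, valid) index lists for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds get one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        valid = list(range(start, stop))
        train = list(range(start)) + list(range(stop, n_samples))
        yield train, valid
        start = stop

# With 50,000 training images and 10 folds, each fold validates on 5,000 images.
fold_sizes = [(len(train), len(valid)) for train, valid in kfold_indices(50000, 10)]

# Hypothetical final-epoch validation accuracies, one per fold (made-up numbers).
fold_scores = [0.74, 0.76, 0.75, 0.73, 0.77, 0.75, 0.74, 0.76, 0.75, 0.75]
avg = mean(fold_scores)
std = pstdev(fold_scores)  # population std, the same definition np.std uses by default

print(fold_sizes[0], round(avg, 4), round(std, 4))
```

Every sample is used for validation exactly once, so the mean and standard deviation describe how the accuracy varies across 10 different train/validation splits of the same data.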

IV. Visualising the regularisation values

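We plot the mean model accuracy against the regularisation hyperparameter and use the errorbar function to show the model's uncertainty: each error bar spans one standard deviation above and below the mean over the 10 folds.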

plt.figure(figsize = (15, 10))
plt.errorbar(alpha_val, 
             train_avg, 
             yerr = train_std, 
             label = "Train")
plt.errorbar(alpha_val, 
             valid_avg, 
             yerr = valid_std, 
             label = "Validation")
plt.legend()
plt.grid()
plt.xlabel('L2 regularisation value (alpha)')
plt.ylabel('Accuracy')

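Figure 1 shows that without regularisation the training accuracy climbs to nearly 1.0 while the validation accuracy is only about 0.75, a clear sign of overfitting. To explain how to choose a better regularisation value, we next plot only the validation accuracy against the regularisation value; the result is shown in Figure 2.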

plt.figure(figsize = (15, 10))
plt.errorbar(alpha_val, 
             valid_avg, 
             yerr = valid_std, 
             label = "Validation")
plt.legend()
plt.grid()
plt.xlabel('L2 regularisation value (alpha)')
plt.ylabel('Accuracy')

There are two common ways to choose the value of the regularisation hyperparameter:

1. Choose the regularisation value that gives the best model performance.

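2. Among the regularisation values whose performance is within one standard deviation of the best performance, choose the largest one. The argument for this method is that the model usually still overfits slightly at the best-performing regularisation value.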

In Figure 2, method 1 would select a value of 0.00075, and method 2 would select 0.0012.
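The two selection rules can be sketched as small helper functions that take the same kind of lists the training loop produces (regularisation values, mean validation accuracies, and their standard deviations). The function names and the numbers below are hypothetical and are not the results behind Figure 2. For method 2, this sketch assumes the threshold is the best mean accuracy minus the standard deviation measured at that best value; the article does not spell out which standard deviation to use.

```python
def pick_best(alphas, avg):
    """Method 1: the alpha with the highest mean validation accuracy."""
    best = max(range(len(alphas)), key=lambda i: avg[i])
    return alphas[best]

def pick_one_std(alphas, avg, std):
    """Method 2: the largest alpha whose mean accuracy is within one
    standard deviation (taken at the best alpha) of the best mean accuracy."""
    best = max(range(len(alphas)), key=lambda i: avg[i])
    threshold = avg[best] - std[best]
    candidates = [a for a, m in zip(alphas, avg) if m >= threshold]
    return max(candidates)

# Made-up example values, for illustration only.
alphas = [0.0, 0.0005, 0.001, 0.0015]
avg = [0.70, 0.78, 0.775, 0.74]
std = [0.02, 0.01, 0.01, 0.01]

print(pick_best(alphas, avg))          # method 1
print(pick_one_std(alphas, avg, std))  # method 2
```

Method 2 deliberately trades a small amount of measured accuracy for a stronger penalty, on the reasoning given in the list above that the best-scoring value tends to leave some overfitting in place.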

About the author

Chia-Hao Li received an M.S. degree in computer science from Durham University, United Kingdom. He works on computer algorithms, machine learning, and hardware/software co-design. He was formerly a senior engineer at MediaTek, Taiwan. His current research topic is the application of machine learning techniques to fault detection in high-performance computing systems.