
[Rust] Training a Two-Layer Neural Network on MNIST - 'Deep Learning from Scratch' Chapter 4

NeuroWhAI 2018. 7. 15. 12:45


※ The full working source code is available on GitHub.


This time we combine the pieces implemented so far into a two-layer neural network.
However, since it relies on numerical differentiation it is very slow, so I could not verify how well it actually learns; treat this as an overview of the general flow.
The code below does not include the MNIST dataset loading code; if you are interested, you can find it on GitHub.
Also, to speed things up, only a fraction of the dataset is used for training,
and the analytical-gradient (backpropagation) version I used for testing is left commented out.
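
For reference, the numerical gradient used below is the standard central-difference approximation from Chapter 4. The following is a minimal sketch consistent with how gradient::numerical_gradient is called in this post (a loss closure plus a mutable parameter matrix); the actual implementation in the repository may differ in details.

use rulinalg::matrix::{Matrix, BaseMatrix};

// Central-difference estimate of dL/dW for every element of `x`:
// perturb one element by ±h, re-evaluate the loss, then restore it.
pub fn numerical_gradient<F>(mut f: F, x: &mut Matrix<f32>) -> Matrix<f32>
where
    F: FnMut(&Matrix<f32>) -> f32,
{
    let h = 1e-4_f32;
    let mut grad = Matrix::zeros(x.rows(), x.cols());

    for row in 0..x.rows() {
        for col in 0..x.cols() {
            let original = x[[row, col]];

            x[[row, col]] = original + h;
            let fxh1 = f(x); // loss with the parameter nudged up

            x[[row, col]] = original - h;
            let fxh2 = f(x); // loss with the parameter nudged down

            grad[[row, col]] = (fxh1 - fxh2) / (2.0 * h);
            x[[row, col]] = original; // restore the parameter
        }
    }

    grad
}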

Code:
use std::f32;
use rulinalg::matrix::{Matrix, BaseMatrix, BaseMatrixMut};
use rand;
use ch03::activation;
use common::utils;
use super::{gradient, loss as loss_function};

pub struct TwoLayerNet {
    pub w1: Matrix<f32>,
    pub b1: Matrix<f32>,
    pub w2: Matrix<f32>,
    pub b2: Matrix<f32>,
}

impl TwoLayerNet {
    pub fn new(input_size: usize, hidden_size: usize, output_size: usize,
        w_init_std: f32) -> Self {
        
        TwoLayerNet {
            w1: Matrix::from_fn(input_size, hidden_size, |_, _| rand::random::<f32>() * w_init_std),
            b1: Matrix::zeros(1, hidden_size),
            w2: Matrix::from_fn(hidden_size, output_size, |_, _| rand::random::<f32>() * w_init_std),
            b2: Matrix::zeros(1, output_size),
        }
    }
    
    pub fn predict(&self, x: &Matrix<f32>) -> Matrix<f32> {
        // Layer 1: affine transform (x·w1 + b1, bias added row by row) then sigmoid.
        let mut a1 = x * &self.w1;
        for mut row in a1.row_iter_mut() {
            *row += &self.b1;
        }
        
        let z1 = activation::sigmoid(a1);
        
        // Layer 2: affine transform (z1·w2 + b2) then softmax.
        let mut a2 = &z1 * &self.w2;
        for mut row in a2.row_iter_mut() {
            *row += &self.b2;
        }
        
        let y = activation::softmax(a2);
        
        y
    }
    
    pub fn loss(&self, x: &Matrix<f32>, t: &Matrix<f32>) -> f32 {
        let y = self.predict(x);
        loss_function::cross_entropy_error(&y, t)
    }
    
    pub fn accuracy(&self, x: &Matrix<f32>, t: &Matrix<f32>) -> f32 {
        let y = self.predict(x);
        let y = utils::argmax(&y);
        
        let t = utils::argmax(t);
        
        let mut correct = 0;
        
        for (v1, v2) in y.iter().zip(t.iter()) {
            if (v1 - v2).abs() < f32::EPSILON {
                correct += 1;
            }
        }
        
        correct as f32 / t.rows() as f32
    }
    
    pub fn numerical_gradient(&mut self, x: &Matrix<f32>, t: &Matrix<f32>)
        -> (Matrix<f32>, Matrix<f32>, Matrix<f32>, Matrix<f32>) {
        
        // Raw-pointer work-around: the loss closure needs to borrow `self`
        // immutably while each parameter matrix is borrowed mutably, which
        // the borrow checker would otherwise reject.
        let p_net = self as *mut TwoLayerNet;
        unsafe {
            (gradient::numerical_gradient(|_| (*p_net).loss(x, t), &mut (*p_net).w1),
            gradient::numerical_gradient(|_| (*p_net).loss(x, t), &mut (*p_net).b1),
            gradient::numerical_gradient(|_| (*p_net).loss(x, t), &mut (*p_net).w2),
            gradient::numerical_gradient(|_| (*p_net).loss(x, t), &mut (*p_net).b2))
        }
    }
    
    /*pub fn gradient(&mut self, x: &Matrix<f32>, t: &Matrix<f32>)
        -> (Matrix<f32>, Matrix<f32>, Matrix<f32>, Matrix<f32>) {
        
        let f_batch_size = t.rows() as f32;
        
        
        let mut a1 = x * &self.w1;
        for mut row in a1.row_iter_mut() {
            *row += &self.b1;
        }
        
        let z1 = activation::sigmoid(utils::copy_matrix(&a1));
        
        let mut a2 = &z1 * &self.w2;
        for mut row in a2.row_iter_mut() {
            *row += &self.b2;
        }
        
        let y = activation::softmax(a2);
        
        
        let dy = (y - t) / f_batch_size;
        let grad_w2 = &z1.transpose() * &dy;
        let grad_b2 = Matrix::new(1, self.b2.cols(), dy.sum_rows().into_iter().collect::<Vec<_>>());
        
        let da1 = &dy * &self.w2.transpose();
        // Sigmoid derivative, element-wise: σ'(a1) = (1 - σ(a1)) ⊙ σ(a1).
        let dz1 = (-activation::sigmoid(utils::copy_matrix(&a1)) + 1.0)
            .elemul(&activation::sigmoid(a1));
        let dz1 = dz1.elemul(&da1);
        let grad_w1 = &x.transpose() * &dz1;
        let grad_b1 = Matrix::new(1, self.b1.cols(), dz1.sum_rows().into_iter().collect::<Vec<_>>());
        
        (grad_w1, grad_b1, grad_w2, grad_b2)
    }*/
}

// Training driver with mini-batch SGD. `Mnist` loads the dataset and is
// defined in the GitHub repository (not shown here).
fn test_net() {
    let mnist = Mnist::new();
    let mut net = TwoLayerNet::new(784, 100, 10, 0.01);
    
    let iters_num = 100;
    // Use only 1/100 of the training set so the numerical gradient
    // finishes in a reasonable time.
    let train_size = mnist.train_x.rows() / 100;
    let batch_size = 100;
    let learning_rate = 0.1;
    
    for _ in 0..iters_num {
        let mut step = 0.0;
        let mut loss = 0.0;
        let mut acc = 0.0;
    
        let mut batch_offset = 0;
        
        while batch_offset < train_size {
            let batch_range = (batch_offset..(batch_offset + batch_size).min(train_size))
                .collect::<Vec<_>>();
            let batch_x = mnist.train_x.select_rows(&batch_range[..]);
            let batch_y = mnist.train_y.select_rows(&batch_range[..]);
                
            step += 1.0;
            loss += net.loss(&batch_x, &batch_y);
            acc += net.accuracy(&batch_x, &batch_y);
            
            // Numerical gradient (very slow); the commented-out analytic
            // version below was only used for testing.
            let (w1, b1, w2, b2) = net.numerical_gradient(&batch_x, &batch_y);
            //let (w1, b1, w2, b2) = net.gradient(&batch_x, &batch_y);
            
            // Gradient descent update: param -= learning_rate * gradient.
            net.w1 -= w1 * learning_rate;
            net.b1 -= b1 * learning_rate;
            net.w2 -= w2 * learning_rate;
            net.b2 -= b2 * learning_rate;
            
            batch_offset += batch_size;
        }
        
        println!("Loss: {}, Acc: {}", loss / step, acc / step);
    }
}
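
For context, loss() above delegates to loss_function::cross_entropy_error implemented in an earlier post. Assuming one-hot targets (which is how the labels are used here), a minimal batched version looks roughly like this; the repository's implementation may differ:

use rulinalg::matrix::{Matrix, BaseMatrix};

// Average cross-entropy over the batch for one-hot targets t and
// softmax outputs y: L = -(1/N) * Σ t_ij · ln(y_ij + ε).
pub fn cross_entropy_error(y: &Matrix<f32>, t: &Matrix<f32>) -> f32 {
    let eps = 1e-7_f32; // avoids ln(0)
    let batch_size = y.rows() as f32;

    let total: f32 = y.iter()
        .zip(t.iter())
        .map(|(&yv, &tv)| tv * (yv + eps).ln())
        .sum();

    -total / batch_size
}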

Result:
Loss: 2.0029867, Acc: 0.096
Loss: 2.0055854, Acc: 0.076
Loss: 2.0045586, Acc: 0.076
Loss: 2.0037422, Acc: 0.076
...

You can see the loss gradually decreasing as training goes on.
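
The accuracy above is measured on the training batches themselves. To check generalization, the same accuracy() method could be evaluated on a held-out split once training finishes; the field names below (test_x, test_y) are assumptions about the Mnist loader, not something shown in this post:

// Hypothetical: evaluate on held-out data after the training loop.
// `test_x` / `test_y` are assumed fields of the Mnist loader.
let test_acc = net.accuracy(&mnist.test_x, &mnist.test_y);
println!("Test Acc: {}", test_acc);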



