
Equations

  • Sigmoid Function for Sigmoid Neurons is given by {$\sigma$} in (1); its derivative is (2)

{$$ \tag{1} \sigma(z) = { 1 \over 1 + e^{-z}} $$} {$$ \tag{2} {\partial \sigma \over \partial z } = \sigma(z) (1 - \sigma(z)) $$}
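A quick sanity check of (1) and (2): a minimal Python sketch (assuming NumPy is available) that evaluates {$\sigma$} and confirms the derivative identity numerically.

[@
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # equation (2): sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-5, 5, 11)
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
print(np.allclose(numeric, sigmoid_prime(z)))  # True
@]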


  • Activation Function for a Layer is given by {$a$} in (3); {$z$} is simply shorthand for the weighted input to the layer

{$$ z^l = w^l a^{l-1} + b^l $$} {$$ \tag{3} a^l = \sigma(z^l) $$}
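A single feedforward layer per (3), sketched in Python; the shapes chosen here (weights as a matrix of size outputs × inputs, activations as vectors) are assumptions for the example, not fixed by the notes.

[@
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward_layer(w, a_prev, b):
    # z^l = w^l a^{l-1} + b^l ; a^l = sigma(z^l)
    z = w @ a_prev + b
    return sigmoid(z)

w = np.array([[0.2, -0.5], [1.0, 0.3]])  # 2 neurons, 2 inputs
b = np.array([0.1, -0.1])
a_prev = np.array([0.5, 0.9])
print(feedforward_layer(w, a_prev, b))
@]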

  • Knowing how the activation changes with respect to a bias or a weight will be important later

{$$ {\partial a^l \over \partial b^l} = \sigma^\prime(z^l) $$} {$$ {\partial a^l \over \partial w^l} = a^{l-1} \sigma^\prime(z^l) $$}
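A hedged finite-difference check of the two partials above, for a single neuron (scalar {$w$}, {$a^{l-1}$}, {$b$}); the specific values are arbitrary.

[@
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, a_prev, b = 0.7, 0.4, -0.2
z = w * a_prev + b
sp = sigmoid(z) * (1 - sigmoid(z))          # sigma'(z)

eps = 1e-6
da_db = (sigmoid(w * a_prev + b + eps) - sigmoid(w * a_prev + b - eps)) / (2 * eps)
da_dw = (sigmoid((w + eps) * a_prev + b) - sigmoid((w - eps) * a_prev + b)) / (2 * eps)

print(np.isclose(da_db, sp))            # matches sigma'(z)
print(np.isclose(da_dw, a_prev * sp))   # matches a^{l-1} * sigma'(z)
@]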


  • Once all the partials have been computed, we can update the weights and biases; in (4), {$v$} stands for a particular weight or bias and {$\eta$} is the learning rate

{$$ \tag{4} {v^\prime = v - \eta {\partial C \over \partial v } } $$}
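Equation (4) is plain gradient descent; a minimal sketch of one update step (the parameter and gradient values are illustrative).

[@
import numpy as np

def gradient_step(v, dC_dv, eta):
    # v' = v - eta * dC/dv, applied elementwise to a weight or bias array
    return v - eta * dC_dv

weights = np.array([[0.2, -0.5], [1.0, 0.3]])
grad = np.array([[0.01, -0.02], [0.03, 0.00]])
print(gradient_step(weights, grad, eta=0.5))
@]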

  • Computing the partials depends on the chosen cost function. We'll discuss some here.

  • Mean Square Error {$C(w,b)$} in (5)

{$$ \tag{5} C(w,b) = \frac{1}{2n} \sum_x \| y(x) - a \|^2 $$}
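Equation (5) in code, assuming the training outputs {$y(x)$} and network outputs {$a$} are stacked as rows of two NumPy arrays.

[@
import numpy as np

def mse_cost(y, a):
    # C = 1/(2n) * sum_x || y(x) - a ||^2, one training sample per row
    n = y.shape[0]
    return np.sum((y - a) ** 2) / (2 * n)

y = np.array([[1.0, 0.0], [0.0, 1.0]])
a = np.array([[0.8, 0.1], [0.3, 0.7]])
print(mse_cost(y, a))
@]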

  • We'll need the changes in the cost function with respect to each bias and weight later
  • Here {$L$} denotes the last layer and {$l$} a general layer

{$$ \tag{6} \delta^l = \nabla C_{a^l} \odot \sigma^\prime(z^l) $$} {$$ \tag{7} \nabla C_{a^L} = a^L - y $$} {$$ \tag{8} \nabla C_{a^l} = (w^{l+1})^T \delta^{l+1} $$} {$$ \tag{9} {\partial C \over \partial b^l} = \delta^l $$} {$$ \tag{10} {\partial C \over \partial w^l} = a^{l-1} \delta^l $$}
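A compact backward pass over equations (6)-(10) for a two-layer sigmoid network with the quadratic cost, for a single training example; the layer sizes and random initialisation are arbitrary assumptions for the sketch.

[@
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
sizes = [3, 4, 2]                                   # input, hidden, output
w = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
b = [rng.normal(size=m) for m in sizes[1:]]

x = np.array([0.2, 0.7, -0.1])
y = np.array([1.0, 0.0])

# forward pass, storing z^l and a^l
a, zs, acts = x, [], [x]
for wl, bl in zip(w, b):
    z = wl @ a + bl
    zs.append(z)
    a = sigmoid(z)
    acts.append(a)

# (6)+(7): output error delta^L = (a^L - y) * sigma'(z^L)
delta = (acts[-1] - y) * sigmoid_prime(zs[-1])
grad_b = [delta]                                    # (9)
grad_w = [np.outer(delta, acts[-2])]                # (10), as the matrix delta (a^{l-1})^T

# (6)+(8): propagate the error backwards through earlier layers
for l in range(2, len(sizes)):
    delta = (w[-l + 1].T @ delta) * sigmoid_prime(zs[-l])
    grad_b.insert(0, delta)
    grad_w.insert(0, np.outer(delta, acts[-l - 1]))

print([g.shape for g in grad_w], [g.shape for g in grad_b])
@]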


  • Cross-Entropy Cost Function

{$$ C = - \frac{1}{n} \sum_x \left[ y \ln (a) + (1-y) \ln(1-a) \right] $$}
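The cross-entropy cost in Python (NumPy assumed), summed over output neurons and averaged over training samples arranged as rows.

[@
import numpy as np

def cross_entropy_cost(y, a):
    # C = -1/n * sum_x [ y ln(a) + (1-y) ln(1-a) ]
    n = y.shape[0]
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a)) / n

y = np.array([[1.0, 0.0], [0.0, 1.0]])
a = np.array([[0.8, 0.1], [0.3, 0.7]])
print(cross_entropy_cost(y, a))
@]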

Derivation of {$ C^\prime(b) $} for a single sigmoid neuron

{$$ C(b) = - D( b ) $$} {$$ D(b) = y E(b) + (1-y) G(b) $$} {$$ E(b) = \ln ( \sigma(wa+b) ) $$} {$$ G(b) = \ln ( 1 - \sigma(wa+b) ) $$} {$$ \sigma^\prime(wa+b) = a(1-a) $$} {$$ C^\prime(b) = - D^\prime(b) $$} {$$ D^\prime(b) = y E^\prime(b) + (1-y) G^\prime(b) $$} {$$ E^\prime(b) = {1 \over a} \sigma^\prime = 1-a $$} {$$ G^\prime(b) = {1 \over 1-a} (- \sigma^\prime) = -a $$} {$$ D^\prime(b) = y(1-a) + (1-y)(-a) = y - ya - a + ya = y-a $$}

{$$ C^\prime(b_j^L) = a_j^L - y_j $$} {$$ C^\prime(w_{jk}^L) = a_k^{L-1}(a_j^L - y_j) $$}
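A numerical spot check of the result above for a single sigmoid neuron: the finite-difference derivative of the cross-entropy with respect to {$b$} should match {$a - y$}. The scalar values are arbitrary.

[@
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, a_prev, b, y):
    a = sigmoid(w * a_prev + b)
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

w, a_prev, b, y = 0.6, 0.9, -0.3, 1.0
a = sigmoid(w * a_prev + b)

eps = 1e-6
numeric = (cost(w, a_prev, b + eps, y) - cost(w, a_prev, b - eps, y)) / (2 * eps)
print(np.isclose(numeric, a - y))   # True: C'(b) = a - y
@]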


  • Softmax - Final Activation Layer

{$$ a_j^L = { e^{z_j^L} \over \sum_k e^{z_k^L} } $$}
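Softmax in Python; subtracting the maximum of {$z$} before exponentiating is a standard numerical-stability trick, not something required by the formula.

[@
import numpy as np

def softmax(z):
    # a_j = e^{z_j} / sum_k e^{z_k}
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])
print(softmax(z), softmax(z).sum())   # probabilities summing to 1
@]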

  • Log-Likelihood Cost Function

{$$ C = -\ln( a_y^L ) $$}
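The log-likelihood cost picks out the activation of the desired class {$y$}; a tiny sketch, assuming {$y$} is an integer class index into the softmax output.

[@
import numpy as np

def log_likelihood_cost(a, y):
    # C = -ln(a_y^L), where y indexes the correct class
    return -np.log(a[y])

a = np.array([0.7, 0.2, 0.1])   # softmax outputs
print(log_likelihood_cost(a, y=0))
@]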

Derivation of {$ {\partial C \over \partial b} $} in Softmax with Log-Likelihood

{$$ x_k = e^{z_k^L} $$} {$$ {\partial x_k \over \partial b_k^L} = x_k $$} Here {$y$} is the index of the desired classification (its target activation is 1). {$x_y$} and the remaining {$x_j$} are kept separate for the derivation: {$$ f = \sum_{j \neq y} x_j $$} {$$ s = f + x_y $$} {$$ a_y = {x_y \over s } $$} {$$ D_y(z) = a_y^L = { e^{z_y} \over e^{z_y} + f} $$} {$$ D_y^\prime(z) = { f e^{z_y} \over (f + e^{z_y})^2} $$} {$$ C^\prime(b_y^L) = - { e^{z_y} + f \over e^{z_y} } \cdot { f e^{z_y} \over (e^{z_y} + f)^2 } = { -f \over e^{z_y} + f } $$} {$$ C^\prime(b_y^L) = { x_y - s \over s} $$} {$$ C^\prime(b_y^L) = a_y - 1 $$} {$$ C^\prime(b_k^L) = a_k \quad (k \neq y) $$}

{$$ C^\prime(b_j^L) = a^L_j - y_j(x) $$} {$$ C^\prime(w_{jk}^L) = a^{L-1}_k (a^L_j - y_j(x)) $$}
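A numerical check of the final result: with a softmax output layer and the log-likelihood cost, {$ {\partial C \over \partial b_j^L} $} should come out as {$ a_j^L - y_j $}, where {$y$} is the one-hot target. The sizes and values below are illustrative.

[@
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

w = np.array([[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]])   # 3 classes, 2 inputs
b = np.array([0.0, 0.1, -0.1])
a_prev = np.array([0.6, -0.2])
target = 1                                             # index of the correct class

def cost(b_vec):
    a = softmax(w @ a_prev + b_vec)
    return -np.log(a[target])

a = softmax(w @ a_prev + b)
y = np.zeros(3); y[target] = 1.0

eps = 1e-6
numeric = np.array([
    (cost(b + eps * np.eye(3)[j]) - cost(b - eps * np.eye(3)[j])) / (2 * eps)
    for j in range(3)
])
print(np.allclose(numeric, a - y))   # True: dC/db_j = a_j - y_j
@]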



Description of variables in the equations above (in particular equation (5))

||border=1
||! Variable ||! Description ||
|| {$ w $} || the entire matrix of weights (for all connections between layers) ||
|| {$ b $} || the matrix of biases (for all layers except the first) ||
|| {$ x $} || inputs to the neural network; {$x$} is a matrix of inputs for a number of training samples ||
|| {$ n $} || the number of training samples ||
|| {$ y(x) $} || the trained/expected output from the net given the inputs {$x$}; each {$y(x)$} is a vector of outputs ||
|| {$ a $} || the computed outputs from the net given a specific input {$x$}; the last activations are {$a^L$} ||

