WIP

This repository will have code and slides for my upcoming ScalaX talk: a neural network from scratch in Scala.

key components/concepts:

  • fs2 for concurrency & streaming; compare with vectorized Python
  • "supervised learning"
  • load and visualise your data: http://yann.lecun.com/exdb/mnist/
    • Scala is pretty bad at this
    • load them as a stream? (see the fs2 sketch after this list)
  • forward propagation
  • backward propagation
  • cost function
  • cross validation; training error, testing error
  • optimizations
    • initialization
  • latest technologies/algorithms
    • evolution of error rates
    • convnet
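
A minimal sketch of that streaming idea, assuming fs2 3.x with cats-effect 3 (the file path, object name, and layout are hypothetical, not the talk's code): stream the raw MNIST IDX file and emit one normalised image per chunk.

```scala
import cats.effect.{IO, IOApp}
import fs2.Stream
import fs2.io.file.{Files, Path}

// Hypothetical sketch: stream MNIST training images from the IDX file,
// one Vector[Double] of 28*28 normalised pixels per image.
object MnistStream extends IOApp.Simple {
  val imageSize = 28 * 28

  def images(path: Path): Stream[IO, Vector[Double]] =
    Files[IO].readAll(path)
      .drop(16L)                                     // skip the 16-byte IDX header
      .chunkN(imageSize)                             // one Chunk per image
      .map(_.toVector.map(b => (b & 0xff) / 255.0))  // bytes -> [0, 1] doubles

  def run: IO[Unit] =
    images(Path("train-images-idx3-ubyte"))
      .take(1)
      .evalMap(img => IO.println(s"first image: ${img.size} pixels"))
      .compile
      .drain
}
```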

Check the TensorFlow interface; implement MNIST in TensorFlow and compare speed.

What's new in neural networks:

  • nonlinear function approximation (see the identity below)
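
A one-line reason this is the new ingredient (standard linear algebra, my addition rather than the notes'): stacking affine layers without a nonlinearity collapses to a single affine map, so the nonlinear activation is what makes depth buy anything.

```math
W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2)
```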

Computation graph references:

  • TensorFlow whitepaper: https://www.tensorflow.org/about/bib#tensorflow_a_system_for_large-scale_machine_learning
  • Calculus on computational graphs (colah): http://colah.github.io/posts/2015-08-Backprop/
  • TensorFlow control flow whitepaper: http://download.tensorflow.org/paper/white_paper_tf_control_flow_implementation_2017_11_1.pdf

Backprop:

  • makes training neural networks feasible
  • a technique to calculate derivatives quickly
  • derivatives are unintuitively cheap to compute
  • use the State monad to store calculated derivatives (see the sketch after this list)
  • a node is definitely a container
  • for the graph, Scala is great for data modelling
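
A minimal sketch of that State-monad idea, assuming cats and Scala 2.13 (all names here are hypothetical, not the talk's actual code): model the graph as a plain ADT, and let reverse-mode differentiation thread a Map of accumulated gradients through State instead of mutating nodes.

```scala
import cats.data.State
import cats.syntax.all._

// Hypothetical sketch: an expression graph as an ADT.
sealed trait Node { def id: Int }
final case class Const(id: Int, value: Double)  extends Node
final case class Add(id: Int, l: Node, r: Node) extends Node
final case class Mul(id: Int, l: Node, r: Node) extends Node

object Backprop {
  type Grads = Map[Int, Double] // node id -> accumulated d(output)/d(node)

  def eval(n: Node): Double = n match {
    case Const(_, v)  => v
    case Add(_, l, r) => eval(l) + eval(r)
    case Mul(_, l, r) => eval(l) * eval(r)
  }

  // Push the upstream gradient `g` into this node's slot, then recurse
  // on the children (recomputing forward values for simplicity).
  def grad(n: Node, g: Double): State[Grads, Unit] =
    State.modify[Grads](m => m.updated(n.id, m.getOrElse(n.id, 0.0) + g)) *>
      (n match {
        case Const(_, _)  => State.pure[Grads, Unit](())
        case Add(_, l, r) => grad(l, g) *> grad(r, g)
        case Mul(_, l, r) => grad(l, g * eval(r)) *> grad(r, g * eval(l))
      })
}

object Demo extends App {
  // d(x*y + x)/dx at x = 3, y = 3 is y + 1 = 4
  val x    = Const(0, 3.0)
  val y    = Const(1, 3.0)
  val expr = Add(2, Mul(3, x, y), x)
  println(Backprop.grad(expr, 1.0).runS(Map.empty).value) // x's slot (id 0) holds 4.0
}
```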

Types are great: easier to debug! Interfaces and inheritance are great!

Problems with Scala:

  • Scala/JVM's image libraries are embarrassing; e.g. no pyplot.imshow(numpy array) equivalent
  • Scala's file I/O is embarrassing: not easy to read/write a file (compared to Python's with open())
  • no good way to "vectorize" -- cf. deeplearning4j

But why?

  • research code is error-prone (quote: "RL does not work yet"; OpenAI's RND paper on a bug)
  • an ideal research language:
    • native support for parallel numerical computation on the GPU
    • typed, with higher-kinded types and basic FP support
    • easy notebook workflow: plotting; notebooks should be git-friendly

Goal in talk

  • demystify neural networks
  • show how "general" and extensible this idea is: you can stack up arbitrary neurons and activation functions

composability: matmul + add => Linear layer (see the sketch below); compare with the Keras API: https://www.tensorflow.org/tutorials/
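
A minimal sketch of that composition with hypothetical types (not the Keras API itself): treat a layer as a plain function Matrix => Matrix, so matmul and add fuse into a Linear layer and layers chain with andThen, much like stacking layers in a Keras Sequential model.

```scala
object Layers {
  type Matrix = Vector[Vector[Double]]

  // (m x k) times (k x n): dot each row of a with each column of b.
  def matmul(a: Matrix, b: Matrix): Matrix = {
    val cols = b.transpose
    a.map(row => cols.map(col => row.zip(col).map { case (x, y) => x * y }.sum))
  }

  // Add the bias vector to every row.
  def add(a: Matrix, bias: Vector[Double]): Matrix =
    a.map(row => row.zip(bias).map { case (x, b) => x + b })

  // matmul + add => Linear layer
  def linear(w: Matrix, b: Vector[Double]): Matrix => Matrix =
    x => add(matmul(x, w), b)

  val relu: Matrix => Matrix =
    m => m.map(row => row.map(v => math.max(0.0, v)))

  // Layers compose like ordinary functions:
  def model(w1: Matrix, b1: Vector[Double], w2: Matrix, b2: Vector[Double]): Matrix => Matrix =
    linear(w1, b1) andThen relu andThen linear(w2, b2)
}
```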

possible questions:

  • SGD
  • what is the use of an optimizer? (see the sketch after this list)
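
One possible framing, sketched under my own assumptions (hypothetical function, not from the talk): the optimizer is simply the rule that turns gradients into parameter updates, and plain SGD is the simplest such rule.

```scala
// Hypothetical sketch: one plain-SGD step over a flat parameter vector.
// params and grads must have the same length; lr is the learning rate.
def sgdStep(params: Vector[Double], grads: Vector[Double], lr: Double): Vector[Double] =
  params.zip(grads).map { case (p, g) => p - lr * g }

// sgdStep(Vector(1.0, 2.0), Vector(0.5, -0.5), lr = 0.1) == Vector(0.95, 2.05)
```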

can't really shuffle a stream (a full shuffle needs the whole dataset in memory)

anecdote: I generalised from Double (scalar) to Vector, then to Matrix, and both refactors succeeded in one go; guess that's what types give you

notice that one hidden layer reduces the loss very quickly

plot loss

test accuracy: 98.99%

explain why one hidden layer is worse: not enough training data (6000 data points, x parameters to tune)
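
To make the parameter-count point concrete (my own numbers, assuming a hypothetical 784–100–10 MLP on MNIST, not figures from the talk):

```math
784 \times 100 + 100 + 100 \times 10 + 10 = 79{,}510 \;\text{parameters (hypothetical net)} \gg 6{,}000 \;\text{training examples}
```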

show the wrongly predicted images?