Google Deep Learning in Practice

Dr. Chung-Cheng Chiu (邱中鎮), Google Brain

A public talk covering introductory deep learning techniques and how to build deep learning models with TensorFlow.

The Google Brain project started in 2011. Initial emphasis:

  • Use large datasets
  • Use large amounts of computation
  • Push the boundaries of what is possible in perception and language understanding

What is deep learning

  • loosely based on (what little) we know about the brain
  • Each "neuron" is connected to some neurons in the prior layer
  • Based on what it sees, each neuron decides what it wants to say
  • Neurons learn to cooperate to accomplish the task

A simple illustrative formula

ReLU: rectified linear unit

Minimize the difference between the target and the output, with respect to x (sketched below)
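The slide's formula is not reproduced in the notes; here is a minimal reconstruction in my own notation, consistent with the ReLU layer and the objective described above:

```latex
% ReLU applied to a linear transform of the input
y = \mathrm{ReLU}(Wx + b) = \max(0,\; Wx + b)
% Training: minimize the difference between target t and output y
\min_{W,\,b} \; \lVert t - y \rVert^2
```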

Optimization

  • Gradient descent: calculate the gradients via the chain rule with back-propagation, as sketched below
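A minimal NumPy sketch of this procedure for the one-layer ReLU network above; the shapes, learning rate, and squared-error loss are illustrative choices, not taken from the talk:

```python
# Gradient descent with back-propagation for y = ReLU(Wx + b),
# minimizing L = ||y - t||^2 (illustrative sketch).
import numpy as np

rng = np.random.RandomState(0)
W, b = rng.randn(3, 3), np.zeros(3)
x, t = rng.randn(3), np.array([1.0, 0.0, 0.0])  # input and target

for step in range(100):
    z = W.dot(x) + b            # pre-activation
    y = np.maximum(0.0, z)      # ReLU
    # Chain rule (back-propagation): dL/dy -> dL/dz -> dL/dW, dL/db
    dL_dy = 2.0 * (y - t)            # from L = ||y - t||^2
    dL_dz = dL_dy * (z > 0)          # ReLU passes gradient only where z > 0
    W -= 0.1 * np.outer(dL_dz, x)    # dL/dW = dL/dz * x^T
    b -= 0.1 * dL_dz
```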

The architecture is inspired by actual biological neurons

Related topics and projects mentioned: reinforcement learning (Deep Q), Neural Art, Char-RNN, Keras, Neuro, tensorflow-zh

Machine learning makes sense

Some areas we've published in

  • Object recognition in images (Erhan et al., 2014)
  • Object category discovery in video (Le et al., ICML 2012)
  • Speech recognition (Vanhoucke et al., NIPS Workshop 2011)
  • Annotating images with text (Vinyals et al., arXiv 2014)
  • Pedestrian detection for self-driving cars (Angelova et al., 2014)
  • OCR: reading text from images (Goodfellow et al., ICLR 2014)

Universal Machine Learning that works better than the alternatives

Machine learning is the rocket engine and data is the rocket fuel: big data is what powers machine learning.

Algorithms that scale up easily

  • conditional probability

Large datasets + powerful Models

Experiment turnaround time and research productivity

  • Minutes, hours
    • interactive research! instant gratification!
  • 1–4 days: tolerable
    • interactivity replaced by running many experiments in parallel
  • 1–4 weeks: progress stalls
    • high-value experiments only
  • More than 1 month: don't even try

  • Model Parallelism

    • single core (SIMD)
    • across cores
      • thread parallelism (QPI on Intel)
    • across devices
      • for GPUs, often limited by PCIe bandwidth
    • across machines
      • limited by network bandwidth/latency
  • Data Parallelism (see the sketch after this list)
    • use multiple model replicas to process different examples at the same time
    • all replicas collaborate to update the model state
      • Parameter Server
        • each replica computes SGD gradients on its own subset of the data and sends them to the parameter server
    • synchronously: N replicas are equivalent to one replica with an N-times larger batch size
    • asynchronously: replicas update the parameters without waiting for each other
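A toy sketch of the asynchronous, parameter-server-style data parallelism described above; the linear model, learning rate, and shard sizes are illustrative, not from the talk:

```python
# Asynchronous data parallelism: each "replica" (thread) computes
# gradients on its own shard, possibly from stale parameters, and
# applies them to shared state without waiting for the others.
import threading
import numpy as np

params = {"w": np.zeros(3)}            # the "parameter server" state
lock = threading.Lock()

def replica(shard):
    for x, t in shard:
        w = params["w"]                    # read (possibly stale) parameters
        grad = 2 * (w.dot(x) - t) * x      # gradient of (w.x - t)^2
        with lock:                         # atomic in-place update
            params["w"] -= 0.01 * grad

rng = np.random.RandomState(0)
shards = [[(rng.randn(3), 1.0) for _ in range(100)] for _ in range(4)]
threads = [threading.Thread(target=replica, args=(s,)) for s in shards]
for th in threads: th.start()
for th in threads: th.join()
```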

Recent progress in Deep Learning

  • Image Recognition (6.66% error rate; the error rate halved within a year)
  • Speech Recognition
  • Sequence to Sequence
    • generate image captions from pixels
    • machine translation
  • Gmail spam filtering
  • Smart Reply
  • Google Photos search
  • text detection in Street View (automatically extracting street-level information)
  • RankBrain
  • DeepDream

Design decisions: parameters and model architecture; precision can often be reduced from 32-bit down to 8-bit

Deep learning on perception needs large datasets; this holds for text as well

Building an appropriate deep learning model architecture requires familiarity with the domain

  • design the model architecture based on the domain's limitations and parameters

Deep learning

  • does the feature extraction itself

This removes the hand-crafted domain knowledge. How do you create a deep learning recipe?

TensorFlow implementation

  • an open software standard for machine learning / deep learning
  • makes it easier to exchange research ideas

Tensor

  • a vector is a 1-D tensor
  • a matrix is a 2-D tensor

Apply simple ReLU network

  • per-neuron operation
  • matrix operation

Define the computation flow

  • import tensorflow as tf
  • y = tf.matmul(x, w)
  • out = tf.nn.relu(y)

Define tensors

  • w = tf.Variable(tf.random_normal([3, 3]), name='w')
  • y = tf.matmul(x, w)
  • variables store the state of the current execution
  • everything else is an operation

Session graph

  • sess = tf.Session()
  • results = sess.run(relu_out)

Fetch

  • retrieve the content of a node
  • analogy: fetch the liquid out of the pipe

Initialize variables

  • variables must be explicitly initialized before the graph is run
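Continuing the session above, a one-line sketch; tf.initialize_all_variables is the 0.x-era name (later renamed tf.global_variables_initializer):

```python
# Variables hold state, so they must be initialized before any run
# that touches them (TF 0.x-era API).
sess.run(tf.initialize_all_variables())
```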

Placeholder

  • its content will be filled in at run time
  • x = tf.placeholder("float", [1, 3])

Feed

  • pump liquid into the pipe
  • import numpy as np
  • feed_dict = {x: np.array([[1.0, 2.0, 3.0]])} (full sketch below)
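Putting placeholder, feed, and fetch together; a self-contained sketch with the 0.x-era API and illustrative values:

```python
# End-to-end: define the graph, initialize, feed data in, fetch results out.
import numpy as np
import tensorflow as tf

x = tf.placeholder("float", [1, 3])                   # filled at run time
w = tf.Variable(tf.random_normal([3, 3]), name='w')
relu_out = tf.nn.relu(tf.matmul(x, w))

sess = tf.Session()
sess.run(tf.initialize_all_variables())
feed_dict = {x: np.array([[1.0, 2.0, 3.0]])}          # feed: pump data in
print(sess.run(relu_out, feed_dict=feed_dict))        # fetch: pull results out
```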

Session management

Prediction

  • softmax: produces predictions over n targets that sum to 1

Prediction difference

  • answer - prediction

Define the loss function

  • tf.nn.softmax_cross_entropy_with_logits (see the sketch below)
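A sketch of the prediction and loss, continuing the relu_out graph above; the 10-class output size and the names softmax_w and labels are illustrative (biases come in a later step):

```python
# Softmax prediction plus cross-entropy loss on top of the ReLU layer.
softmax_w = tf.Variable(tf.random_normal([3, 10]))
logits = tf.matmul(relu_out, softmax_w)
prediction = tf.nn.softmax(logits)             # 10 outputs that sum to 1
labels = tf.placeholder("float", [1, 10])      # one-hot target
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))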

Optimization

  • Gradient descent, learning rate = 0.1

Iterative update

  • each update step recomputes all the variables (see the sketch below)
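A sketch of the optimizer and update loop, continuing the snippets above; the dummy batch is illustrative:

```python
# Gradient descent (learning rate 0.1); each sess.run of train_op
# adjusts all trainable variables.
import numpy as np

train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
sess.run(tf.initialize_all_variables())
batch_x = np.array([[1.0, 2.0, 3.0]])    # dummy input
batch_y = np.eye(10)[[3]]                # dummy one-hot target
for step in range(1000):
    sess.run(train_op, feed_dict={x: batch_x, labels: batch_y})
```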

Add biases

  • logits = tf.matmul(relu_out, softmax_w) + softmax_b

Add layers

  • build the stack with a Python loop: for layer in range(...) (see the sketch below)
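A hedged sketch of stacking ReLU layers in a loop; num_layers and hidden_size are illustrative names, not from the talk:

```python
# Stack several ReLU layers; the output of one layer feeds the next.
import tensorflow as tf

num_layers, hidden_size = 3, 64
h = tf.placeholder("float", [None, hidden_size])
for layer in range(num_layers):
    w = tf.Variable(tf.random_normal([hidden_size, hidden_size]),
                    name='w%d' % layer)
    b = tf.Variable(tf.zeros([hidden_size]), name='b%d' % layer)
    h = tf.nn.relu(tf.matmul(h, w) + b)
```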

Visualize the graph

  • TensorBoard
  • tf.train.SummaryWriter
  • improve naming, improve visualization
  • add name_scope to improve visualization (see the sketch below)
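A sketch of the TensorBoard hookup with the 0.x-era names (tf.train.SummaryWriter was later renamed tf.summary.FileWriter); the log directory is illustrative:

```python
# name_scope groups nodes so the graph view stays readable.
import tensorflow as tf

with tf.name_scope('relu_layer'):
    x = tf.placeholder("float", [1, 3], name='x')
    w = tf.Variable(tf.random_normal([3, 3]), name='w')
    relu_out = tf.nn.relu(tf.matmul(x, w))

sess = tf.Session()
writer = tf.train.SummaryWriter('/tmp/tf_logs', sess.graph)
# then inspect with: tensorboard --logdir=/tmp/tf_logs
```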

Add regularization to the loss

  • TensorFlow tensors automatically support gradients, so the extra term needs no additional back-propagation code (see the sketch below)
  • tools: Sublime Text
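A one-line sketch, continuing the loss defined above; lambda_ is an illustrative hyperparameter name:

```python
# L2 weight penalty added to the loss; the optimizer differentiates
# the whole expression automatically.
lambda_ = 1e-4
loss = loss + lambda_ * tf.nn.l2_loss(w)
```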

Use activation as bias

Residual learning

  • winner of the ILSVRC 2015 classification task (sketched below)
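A minimal sketch of the residual idea (from ResNet, He et al.; this code is my illustration, not from the talk): the block learns a residual F(x) and adds the input back through an identity shortcut:

```python
import tensorflow as tf

# Residual block: output = ReLU(F(x) + x), where F is two transforms.
def residual_block(x, w1, w2):
    f = tf.matmul(tf.nn.relu(tf.matmul(x, w1)), w2)   # F(x)
    return tf.nn.relu(f + x)                          # identity shortcut
```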

Add summaries
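A sketch with the 0.x-era summary ops (tf.scalar_summary / tf.merge_all_summaries; later moved under tf.summary.*), continuing the writer and training loop from the sketches above:

```python
# Record the loss at each step so TensorBoard can plot it.
tf.scalar_summary('loss', loss)
merged = tf.merge_all_summaries()
for step in range(1000):
    summary, _ = sess.run([merged, train_op],
                          feed_dict={x: batch_x, labels: batch_y})
    writer.add_summary(summary, step)   # writer from the TensorBoard step
```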

Save and load models

  • restoring a saved model replaces initialization; that's why initialization is run as a separate step (see the sketch below)
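A sketch of checkpointing with tf.train.Saver; the path is illustrative:

```python
# Save the current variable values, then restore them later
# instead of re-initializing.
saver = tf.train.Saver()
save_path = saver.save(sess, '/tmp/model.ckpt')   # write checkpoint
# ... later, in a fresh session over the same graph:
saver.restore(sess, save_path)                    # load instead of init
```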

Convolution layer: Convolution2D

LSTM: BasicLSTMCell (both sketched below)
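Convolution2D is Keras-style naming; a rough sketch of both building blocks using the 0.x-era TensorFlow equivalents, with illustrative shapes:

```python
import tensorflow as tf

# Convolution: 5x5 kernels producing 32 feature maps (NHWC layout).
images = tf.placeholder("float", [None, 28, 28, 1])
filters = tf.Variable(tf.random_normal([5, 5, 1, 32]))
conv = tf.nn.conv2d(images, filters, strides=[1, 1, 1, 1], padding='SAME')

# LSTM cell with 128 units (tf.nn.rnn_cell in the 0.x-era API;
# later moved in TF 1.x).
cell = tf.nn.rnn_cell.BasicLSTMCell(128)
```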

Deep learning for text

  • word embeddings: dense vectors (often shown in a 3-D visualization)
  • Skip-gram text model: "You shall know a word by the company it keeps" (J.R. Firth)

Word2Vec (*)

  • tf.nn.embedding_lookup
  • tf.nn.nce_loss (both used in the sketch below)
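A hedged skip-gram sketch built around these two ops, loosely following the shape conventions of the official word2vec tutorial; vocab_size, embed_dim, and the batch placeholders are illustrative:

```python
# Skip-gram: look up embeddings for center words, train them with
# noise-contrastive estimation against context words.
import tensorflow as tf

vocab_size, embed_dim = 50000, 128
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))
nce_w = tf.Variable(tf.random_normal([vocab_size, embed_dim]))
nce_b = tf.Variable(tf.zeros([vocab_size]))

inputs = tf.placeholder(tf.int32, [None])       # center word ids
labels = tf.placeholder(tf.int32, [None, 1])    # context word ids
embed = tf.nn.embedding_lookup(embeddings, inputs)
loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_w, biases=nce_b,
                                     labels=labels, inputs=embed,
                                     num_sampled=64, num_classes=vocab_size))
```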

Image recognition

  • Inception-v3, open-sourced and ready to use

TensorFlow - distributed

  • distribution configuration (see the sketch below)
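A hedged sketch of a distributed configuration with the tf.train.ClusterSpec / tf.train.Server API (available from roughly TF 0.8); the host addresses are placeholders:

```python
# Describe the cluster, start this process's server, and pin the
# shared variables to the parameter-server job.
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],              # parameter server(s)
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222"],      # model replicas
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)
with tf.device("/job:ps/task:0"):
    w = tf.Variable(tf.zeros([3, 3]))            # lives on the PS
```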

Placement algorithm, network optimization

  • hashing reduces the number of word2vec indices
  • AlexNet, VGG
  • Spark distributed computing and deep learning on Spark
  • DistBelief
  • most impressive application: search ranking
  • TensorFlow: data size
  • text needs larger datasets than image and speech recognition
  • synthesis: music, images

Learning a model with a loss function
