Google Deep Learning in Practice

Dr. Chung-Cheng Chiu (邱中鎮), Google Brain

A public talk covering introductory deep learning techniques and how to build deep learning models with TensorFlow.

The Google Brain project started in 2011. Initial emphasis:

  • Use large datasets
  • Use large amounts of computation
  • Push the boundaries of what is possible in perception and language understanding

What is deep learning

  • loosely based on (what little) we know about the brain
  • Each "neuron" is connected to some neurons in the prior layer
  • Based on what it sees, each neuron decides what it wants to say
  • Neurons learn to cooperate to accomplish the task

A simple illustrative formula

ReLU: rectified linear unit

Minimize the difference between the target and the output, with respect to x (sketched below)
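The slide's formula is not reproduced in the notes; here is a minimal reconstruction in my own notation, consistent with the ReLU layer and the objective described above:

```latex
% ReLU applied to a linear transform of the input
y = \mathrm{ReLU}(Wx + b) = \max(0,\; Wx + b)
% Training: minimize the difference between target t and output y
\min_{W,\,b} \; \lVert t - y \rVert^2
```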

Optimization

  • Gradient descent: calculate the gradients via the chain rule with back-propagation, as sketched below
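A minimal NumPy sketch of this procedure for the one-layer ReLU network above; the shapes, learning rate, and squared-error loss are illustrative choices, not taken from the talk:

```python
# Gradient descent with back-propagation for y = ReLU(Wx + b),
# minimizing L = ||y - t||^2 (illustrative sketch).
import numpy as np

rng = np.random.RandomState(0)
W, b = rng.randn(3, 3), np.zeros(3)
x, t = rng.randn(3), np.array([1.0, 0.0, 0.0])  # input and target

for step in range(100):
    z = W.dot(x) + b            # pre-activation
    y = np.maximum(0.0, z)      # ReLU
    # Chain rule (back-propagation): dL/dy -> dL/dz -> dL/dW, dL/db
    dL_dy = 2.0 * (y - t)            # from L = ||y - t||^2
    dL_dz = dL_dy * (z > 0)          # ReLU passes gradient only where z > 0
    W -= 0.1 * np.outer(dL_dz, x)    # dL/dW = dL/dz * x^T
    b -= 0.1 * dL_dz
```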

The architecture is inspired by actual biological neurons

Related topics and projects mentioned: reinforcement learning (Deep Q), Neural Art, Char-RNN, Keras, Neuro, tensorflow-zh

Machine learning makes sense

Some areas we've published in

  • Object recognition in images (Erhan et al., 2014)
  • Object category discovery in video (Le et al., ICML 2012)
  • Speech recognition (Vanhoucke et al., NIPS Workshop 2011)
  • Annotating images with text (Vinyals et al., arXiv 2014)
  • Pedestrian detection for self-driving cars (Angelova et al., 2014)
  • OCR: reading text from images (Goodfellow et al., ICLR 2014)

Universal Machine Learning that works better than the alternatives

Machine learning is the rocket engine and data is the rocket fuel: big data is what powers machine learning.

Algorithms that scale up easily

  • conditional probability

Large datasets + powerful Models

Experiment turnaround time and research productivity

  • Minutes, hours
    • interactive research! instant gratification!
  • 1–4 days: tolerable
    • interactivity replaced by running many experiments in parallel
  • 1–4 weeks: progress stalls
    • high-value experiments only
  • More than 1 month: don't even try

  • Model Parallelism

    • single core (SIMD)
    • across cores
      • thread parallelism (QPI on Intel)
    • across devices
      • for GPUs, often limited by PCIe bandwidth
    • across machines
      • limited by network bandwidth/latency
  • Data Parallelism (see the sketch after this list)
    • use multiple model replicas to process different examples at the same time
    • all replicas collaborate to update the model state
      • Parameter Server
        • each replica computes SGD gradients on its own subset of the data and sends them to the parameter server
    • synchronously: N replicas are equivalent to one replica with an N-times larger batch size
    • asynchronously: replicas update the parameters without waiting for each other
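A toy sketch of the asynchronous, parameter-server-style data parallelism described above; the linear model, learning rate, and shard sizes are illustrative, not from the talk:

```python
# Asynchronous data parallelism: each "replica" (thread) computes
# gradients on its own shard, possibly from stale parameters, and
# applies them to shared state without waiting for the others.
import threading
import numpy as np

params = {"w": np.zeros(3)}            # the "parameter server" state
lock = threading.Lock()

def replica(shard):
    for x, t in shard:
        w = params["w"]                    # read (possibly stale) parameters
        grad = 2 * (w.dot(x) - t) * x      # gradient of (w.x - t)^2
        with lock:                         # atomic in-place update
            params["w"] -= 0.01 * grad

rng = np.random.RandomState(0)
shards = [[(rng.randn(3), 1.0) for _ in range(100)] for _ in range(4)]
threads = [threading.Thread(target=replica, args=(s,)) for s in shards]
for th in threads: th.start()
for th in threads: th.join()
```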

Recent progress in Deep Learning

  • Image Recognition (6.66% error rate; the error rate halved within a year)
  • Speech Recognition
  • Sequence to Sequence
    • generate image captions from pixels
    • machine translation
  • Gmail spam filtering
  • Smart Reply
  • Google Photos search
  • text detection in Street View (automatically extracting street-level information)
  • RankBrain
  • DeepDream

Design decisions: parameters and model architecture; precision can often be reduced from 32-bit down to 8-bit

Deep learning on perception needs large datasets; this holds for text as well

Building an appropriate deep learning model architecture requires familiarity with the domain

  • design the model architecture based on the domain's limitations and parameters

Deep learning

  • does the feature extraction itself

This removes the hand-crafted domain knowledge. How do you create a deep learning recipe?

TensorFlow implementation

  • an open software standard for machine learning / deep learning
  • makes it easier to exchange research ideas

Tensor

  • a vector is a 1-D tensor
  • a matrix is a 2-D tensor

Apply simple ReLU network

  • per-neuron operation
  • matrix operation

Define the computation flow

  • import tensorflow as tf
  • y = tf.matmul(x, w)
  • out = tf.nn.relu(y)

Define tensors

  • w = tf.Variable(tf.random_normal([3, 3]), name='w')
  • y = tf.matmul(x, w)
  • variables store the state of the current execution
  • everything else is an operation

Session graph

  • sess = tf.Session()
  • results = sess.run(relu_out)

Fetch

  • retrieve the content of a node
  • analogy: fetch the liquid out of the pipe

Initialize variables

  • variables must be explicitly initialized before the graph is run
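Continuing the session above, a one-line sketch; tf.initialize_all_variables is the 0.x-era name (later renamed tf.global_variables_initializer):

```python
# Variables hold state, so they must be initialized before any run
# that touches them (TF 0.x-era API).
sess.run(tf.initialize_all_variables())
```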

Placeholder

  • its content will be filled in at run time
  • x = tf.placeholder("float", [1, 3])

Feed

  • pump liquid into the pipe
  • import numpy as np
  • feed_dict = {x: np.array([[1.0, 2.0, 3.0]])} (full sketch below)
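Putting placeholder, feed, and fetch together; a self-contained sketch with the 0.x-era API and illustrative values:

```python
# End-to-end: define the graph, initialize, feed data in, fetch results out.
import numpy as np
import tensorflow as tf

x = tf.placeholder("float", [1, 3])                   # filled at run time
w = tf.Variable(tf.random_normal([3, 3]), name='w')
relu_out = tf.nn.relu(tf.matmul(x, w))

sess = tf.Session()
sess.run(tf.initialize_all_variables())
feed_dict = {x: np.array([[1.0, 2.0, 3.0]])}          # feed: pump data in
print(sess.run(relu_out, feed_dict=feed_dict))        # fetch: pull results out
```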

Session management

Prediction

  • softmax: produces predictions over n targets that sum to 1

Prediction difference

  • answer - prediction

Define the loss function

  • tf.nn.softmax_cross_entropy_with_logits (see the sketch below)
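A sketch of the prediction and loss, continuing the relu_out graph above; the 10-class output size and the names softmax_w and labels are illustrative (biases come in a later step):

```python
# Softmax prediction plus cross-entropy loss on top of the ReLU layer.
softmax_w = tf.Variable(tf.random_normal([3, 10]))
logits = tf.matmul(relu_out, softmax_w)
prediction = tf.nn.softmax(logits)             # 10 outputs that sum to 1
labels = tf.placeholder("float", [1, 10])      # one-hot target
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))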

Optimization

  • Gradient descent, learning rate = 0.1

Iterative update

  • each update step recomputes all the variables (see the sketch below)
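A sketch of the optimizer and update loop, continuing the snippets above; the dummy batch is illustrative:

```python
# Gradient descent (learning rate 0.1); each sess.run of train_op
# adjusts all trainable variables.
import numpy as np

train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
sess.run(tf.initialize_all_variables())
batch_x = np.array([[1.0, 2.0, 3.0]])    # dummy input
batch_y = np.eye(10)[[3]]                # dummy one-hot target
for step in range(1000):
    sess.run(train_op, feed_dict={x: batch_x, labels: batch_y})
```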

Add biases

  • logits = tf.matmul(relu_out, softmax_w) + softmax_b

Add layers

  • build the stack with a Python loop: for layer in range(...) (see the sketch below)
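A hedged sketch of stacking ReLU layers in a loop; num_layers and hidden_size are illustrative names, not from the talk:

```python
# Stack several ReLU layers; the output of one layer feeds the next.
import tensorflow as tf

num_layers, hidden_size = 3, 64
h = tf.placeholder("float", [None, hidden_size])
for layer in range(num_layers):
    w = tf.Variable(tf.random_normal([hidden_size, hidden_size]),
                    name='w%d' % layer)
    b = tf.Variable(tf.zeros([hidden_size]), name='b%d' % layer)
    h = tf.nn.relu(tf.matmul(h, w) + b)
```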

Visualize the graph

  • TensorBoard
  • tf.train.SummaryWriter
  • improve naming, improve visualization
  • add name_scope to improve visualization (see the sketch below)
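A sketch of the TensorBoard hookup with the 0.x-era names (tf.train.SummaryWriter was later renamed tf.summary.FileWriter); the log directory is illustrative:

```python
# name_scope groups nodes so the graph view stays readable.
import tensorflow as tf

with tf.name_scope('relu_layer'):
    x = tf.placeholder("float", [1, 3], name='x')
    w = tf.Variable(tf.random_normal([3, 3]), name='w')
    relu_out = tf.nn.relu(tf.matmul(x, w))

sess = tf.Session()
writer = tf.train.SummaryWriter('/tmp/tf_logs', sess.graph)
# then inspect with: tensorboard --logdir=/tmp/tf_logs
```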

Add regularization to the loss

  • TensorFlow tensors automatically support gradients, so the extra term needs no additional back-propagation code (see the sketch below)
  • tools: Sublime Text
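A one-line sketch, continuing the loss defined above; lambda_ is an illustrative hyperparameter name:

```python
# L2 weight penalty added to the loss; the optimizer differentiates
# the whole expression automatically.
lambda_ = 1e-4
loss = loss + lambda_ * tf.nn.l2_loss(w)
```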

Use activation as bias

Residual learning

  • winner of the ILSVRC 2015 classification task (sketched below)
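A minimal sketch of the residual idea (from ResNet, He et al.; this code is my illustration, not from the talk): the block learns a residual F(x) and adds the input back through an identity shortcut:

```python
import tensorflow as tf

# Residual block: output = ReLU(F(x) + x), where F is two transforms.
def residual_block(x, w1, w2):
    f = tf.matmul(tf.nn.relu(tf.matmul(x, w1)), w2)   # F(x)
    return tf.nn.relu(f + x)                          # identity shortcut
```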

Add summaries
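A sketch with the 0.x-era summary ops (tf.scalar_summary / tf.merge_all_summaries; later moved under tf.summary.*), continuing the writer and training loop from the sketches above:

```python
# Record the loss at each step so TensorBoard can plot it.
tf.scalar_summary('loss', loss)
merged = tf.merge_all_summaries()
for step in range(1000):
    summary, _ = sess.run([merged, train_op],
                          feed_dict={x: batch_x, labels: batch_y})
    writer.add_summary(summary, step)   # writer from the TensorBoard step
```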

Save and load models

  • restoring a saved model replaces initialization; that's why initialization is run as a separate step (see the sketch below)
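A sketch of checkpointing with tf.train.Saver; the path is illustrative:

```python
# Save the current variable values, then restore them later
# instead of re-initializing.
saver = tf.train.Saver()
save_path = saver.save(sess, '/tmp/model.ckpt')   # write checkpoint
# ... later, in a fresh session over the same graph:
saver.restore(sess, save_path)                    # load instead of init
```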

Convolution layer: Convolution2D

LSTM: BasicLSTMCell (both sketched below)
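Convolution2D is Keras-style naming; a rough sketch of both building blocks using the 0.x-era TensorFlow equivalents, with illustrative shapes:

```python
import tensorflow as tf

# Convolution: 5x5 kernels producing 32 feature maps (NHWC layout).
images = tf.placeholder("float", [None, 28, 28, 1])
filters = tf.Variable(tf.random_normal([5, 5, 1, 32]))
conv = tf.nn.conv2d(images, filters, strides=[1, 1, 1, 1], padding='SAME')

# LSTM cell with 128 units (tf.nn.rnn_cell in the 0.x-era API;
# later moved in TF 1.x).
cell = tf.nn.rnn_cell.BasicLSTMCell(128)
```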

Deep learning for text

  • word embeddings: dense vectors (often shown in a 3-D visualization)
  • Skip-gram text model: "You shall know a word by the company it keeps" (J.R. Firth)

Word2Vec (*)

  • tf.nn.embedding_lookup
  • tf.nn.nce_loss (both used in the sketch below)
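A hedged skip-gram sketch built around these two ops, loosely following the shape conventions of the official word2vec tutorial; vocab_size, embed_dim, and the batch placeholders are illustrative:

```python
# Skip-gram: look up embeddings for center words, train them with
# noise-contrastive estimation against context words.
import tensorflow as tf

vocab_size, embed_dim = 50000, 128
embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))
nce_w = tf.Variable(tf.random_normal([vocab_size, embed_dim]))
nce_b = tf.Variable(tf.zeros([vocab_size]))

inputs = tf.placeholder(tf.int32, [None])       # center word ids
labels = tf.placeholder(tf.int32, [None, 1])    # context word ids
embed = tf.nn.embedding_lookup(embeddings, inputs)
loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_w, biases=nce_b,
                                     labels=labels, inputs=embed,
                                     num_sampled=64, num_classes=vocab_size))
```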

Image recognition

  • Inception-v3, open-sourced and ready to use

TensorFlow - distributed

  • distribution configuration (see the sketch below)
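A hedged sketch of a distributed configuration with the tf.train.ClusterSpec / tf.train.Server API (available from roughly TF 0.8); the host addresses are placeholders:

```python
# Describe the cluster, start this process's server, and pin the
# shared variables to the parameter-server job.
import tensorflow as tf

cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],              # parameter server(s)
    "worker": ["worker0.example.com:2222",
               "worker1.example.com:2222"],      # model replicas
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)
with tf.device("/job:ps/task:0"):
    w = tf.Variable(tf.zeros([3, 3]))            # lives on the PS
```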

Placement algorithm, network optimization

  • hashing reduces the number of word2vec indices
  • AlexNet, VGG
  • Spark distributed computing and deep learning on Spark
  • DistBelief
  • most impressive application: search ranking
  • TensorFlow: data size
  • text needs larger datasets than image and speech recognition
  • synthesis: music, images

Learning a model with a loss function
