Google Deep Learning in Practice
Google Brain, Dr. Chung-Cheng Chiu (邱中鎮)
A public talk covering introductory deep learning techniques and how to build deep learning models with TensorFlow.
The Google Brain project started in 2011. Initial emphasis:
- Use large datasets
- Use large amounts of computation
- Push the boundaries of what is possible in perception and language understanding
What is deep learning?
- Loosely based on (what little) we know about the brain
- Each "neuron" is connected to some neurons in a prior layer
- Based on what it sees, each neuron decides what it wants to say
- Neurons learn to cooperate to accomplish the task
A simple illustrative formula
ReLU: Rectified Linear Unit
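In symbols (the standard definition, not taken from the slides): each ReLU unit computes $y = \max(0,\ w^\top x + b)$, i.e. $\mathrm{ReLU}(z) = \max(0, z)$.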
Training: minimize the difference between the target and the prediction with respect to the model parameters
Optimization
- Gradient descent: compute the gradients of the loss with respect to the parameters via the chain rule (back-propagation)
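For reference, the standard update rule is $\theta \leftarrow \theta - \eta\,\nabla_\theta L(\theta)$, where $\eta$ is the learning rate and $\nabla_\theta L$ is obtained by back-propagation.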
The architecture is inspired by actual biological neurons
Related projects mentioned: reinforcement learning (Deep Q), Neural Art, Char-RNN, Keras, Neuro, tensorflow-zh
Machine learning makes sense
Some areas we've published in
- Object recognition in images (Ethan et al., 2014)
- Object category discovery in video (Le et al., ICML 2012)
- Speech recognition (Vanhoucke et al., NIPS Workshop 2011)
- Annotating images with text (Vinyals et al., arXiv 2014)
- Pedestrian detection for self-driving cars (Angelova et al., 2014)
- OCR: reading text from images (Goodfellow et al., ICLR 2014)
Universal Machine Learning that works better than the alternatives
Machine learning is the rocket engine and data is the rocket fuel; big data is what makes machine learning take off
Algorithms that scale up easily
- conditional probability
Large datasets + powerful models
Experiment turnaround time and research productivity
- Minutes, hours: interactive research! Instant gratification!
- 1~4 days: tolerable; interactivity replaced by running many experiments in parallel
- 1~4 weeks: high-value experiments only; progress stalls
- >1 month: don't even try
Model Parallelism
- Single core (SIMD)
- Across cores
  - Thread parallelism (QPI on Intel)
- Across devices
  - For GPUs, often limited by PCIe bandwidth
- Across machines
  - Limited by network bandwidth/latency
Data Parallelism
- Use multiple model replicas to process different examples at the same time
- All replicas collaborate to update the model state
- Each replica samples a mini-batch and computes an SGD gradient
- Parameter Server
  - Synchronously: N replicas are equivalent to an N-times-larger batch size
  - Asynchronously: replicas push updates independently, so some gradients are computed from slightly stale parameters
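A toy sketch of the synchronous case in plain NumPy (everything here is illustrative, not the actual DistBelief/TensorFlow implementation): four "replicas" compute gradients on their own data shards and the "parameter server" applies the averaged update.

    import numpy as np

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
    params = np.zeros(5)                              # parameter-server state
    shards = np.array_split(np.arange(100), 4)        # one shard per replica

    def gradient(theta, idx):
        # mean-squared-error gradient on one replica's shard
        err = X[idx] @ theta - y[idx]
        return X[idx].T @ err / len(idx)

    for step in range(100):
        grads = [gradient(params, idx) for idx in shards]  # in parallel in practice
        params -= 0.1 * np.mean(grads, axis=0)  # synchronous: like an N-times-larger batch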
Recent progress in Deep Learning
- Image recognition (6.66% error rate; the error rate halved in one year)
- Speech recognition
- Sequence-to-sequence learning
- Generating image captions from pixels
- Translation
- Gmail spam detection
- Smart Reply
- Google Photos search
- Text detection in Street View (automatically extracting information from street imagery)
- RankBrain
- Deep Dream
Design: choosing parameters and the model architecture; for serving, 32-bit weights can be quantized down to 8-bit
Deep learning for perception needs large datasets; text needs even more data
You need domain familiarity to build a suitable deep learning model architecture
- Design the model architecture based on the problem's constraints and parameters
Deep learning
- learns feature extraction, taking hand-crafted domain knowledge out of the pipeline
How do you create a deep learning recipe?
TensorFlow implementation
- Open software standard for machine learning / deep learning
- Makes it easier to exchange research ideas
Tensor
- vector: a 1-D tensor
- matrix: a 2-D tensor
Apply a simple ReLU network
- per-neuron operation
- matrix operation
Define the computation flow
- import tensorflow as tf
- y = tf.matmul(x, w)
- out = tf.nn.relu(y)
Define tensors
- w = tf.Variable(tf.random_normal([3, 3]), name='w')
- y = tf.matmul(x, w)
- variables store the state of the current execution
- other nodes are operations
Session graph
- sess = tf.Session()
- results = sess.run(relu_out)
Fetch
- retrieve the content of a node
- (analogy: fetch the liquid out of the pipe)
Initialize variables
Placeholder
- its content will be filled at run time
- x = tf.placeholder("float", [1, 3])
Feed
- (analogy: pump liquid into the pipe)
- import numpy as np
- feed_dict = {x: np.array([[1.0, 2.0, 3.0]])}
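Putting the pieces above together, a minimal end-to-end sketch using the TF-1.x-era API (names are illustrative):

    import numpy as np
    import tensorflow as tf

    x = tf.placeholder("float", [1, 3])                    # filled at run time
    w = tf.Variable(tf.random_normal([3, 3]), name='w')    # stores state
    relu_out = tf.nn.relu(tf.matmul(x, w))

    sess = tf.Session()
    # variables must be initialized explicitly before the first run
    sess.run(tf.initialize_all_variables())  # later renamed tf.global_variables_initializer
    # feed pumps data into the pipe; fetch retrieves the node's content
    print(sess.run(relu_out, feed_dict={x: np.array([[1.0, 2.0, 3.0]])}))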
Session management
Prediction:
- softmax: produces predictions for n targets that sum to 1
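For reference, the standard definition: $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$, which makes the outputs non-negative and sum to 1.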
Prediction difference
- answer - prediction
Define the loss function
- tf.nn.softmax_cross_entropy_with_logits
Optimization
- Gradient descent, learning rate = 0.1
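A minimal sketch wiring the loss and optimizer together, continuing the snippet above; `softmax_w` and `labels` are illustrative names, and `relu_out` is the network output from before:

    labels = tf.placeholder("float", [1, 3])         # one-hot targets
    softmax_w = tf.Variable(tf.random_normal([3, 3]))
    logits = tf.matmul(relu_out, softmax_w)
    # cross entropy between softmax(logits) and the targets
    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))
    train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)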
Iterative update
- each training step computes and updates all the variables
Add biases
- logits = tf.matmul(relu_out, softmax_w) + softmax_b
Add layers
- for layer in range(num_layers): stack another matmul + ReLU (see the sketch below)
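The truncated loop might look like this (a hypothetical sketch; the sizes and layer count are illustrative):

    out = x
    num_layers = 3                                   # illustrative depth
    for layer in range(num_layers):
        w_l = tf.Variable(tf.random_normal([3, 3]), name='w%d' % layer)
        out = tf.nn.relu(tf.matmul(out, w_l))        # each iteration adds a layer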
Visualize the graph
- TensorBoard
- tf.train.SummaryWriter
- improve naming to improve the visualization
- add name_scope to improve the visualization (see the sketch below)
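A sketch using the era's API (tf.train.SummaryWriter was later renamed tf.summary.FileWriter; the log directory is illustrative):

    # name_scope groups ops so the TensorBoard graph view stays readable
    with tf.name_scope('hidden'):
        hidden = tf.nn.relu(tf.matmul(x, w))
    # write the session graph so TensorBoard can display it
    writer = tf.train.SummaryWriter('/tmp/tf_logs', sess.graph)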
Add regularization to the loss
- TensorFlow ops automatically support gradients, so an added penalty term is differentiated for free (sketch below)
- tools: Sublime Text
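For example, L2 weight decay can be added directly to the loss from the earlier snippet (the 1e-3 coefficient is illustrative):

    # because every op supports gradients, the added term is
    # differentiated automatically during training
    l2_penalty = 1e-3 * tf.reduce_sum(tf.square(softmax_w))
    loss = loss + l2_penalty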
Use activation as bias
Residual learning
- winner of the ILSVRC 2015 classification task (see the sketch below)
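The idea in a toy sketch (a hypothetical helper, not the actual ResNet code): the block learns a residual F(x) and adds the input back through a skip connection:

    # output = x + F(x); the skip connection lets the layers learn a residual
    def residual_block(x, w1, w2):
        h = tf.nn.relu(tf.matmul(x, w1))             # F(x), first half
        return x + tf.matmul(h, w2)                  # add the input back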
Add summaries
Save and load models
- restoring a saved model replaces variable initialization; that's why initialization is run as a separate step (sketch below)
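A sketch with tf.train.Saver (the checkpoint path is illustrative):

    saver = tf.train.Saver()
    saver.save(sess, '/tmp/model.ckpt')      # writes current variable values
    saver.restore(sess, '/tmp/model.ckpt')   # restore replaces re-initialization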
Convolution layer - Convolution2D
LSTM - BasicLSTMCell
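Era-appropriate constructors for both (shapes and sizes are illustrative):

    images = tf.placeholder("float", [None, 28, 28, 1])       # NHWC layout
    filters = tf.Variable(tf.random_normal([5, 5, 1, 32]))    # 5x5, 1 -> 32 channels
    conv = tf.nn.conv2d(images, filters, strides=[1, 1, 1, 1], padding='SAME')
    cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=128)         # moved in later versions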
Deep learning for text
- word embeddings: represent words as dense vectors (e.g., visualized in 3-D)
- Skip-gram text model; "You shall know a word by the company it keeps" (J.R. Firth)
Word2Vec (*)
- tf.nn.embedding_lookup
- tf.nn.nce_loss
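A condensed sketch in the spirit of the TensorFlow word2vec tutorial; all sizes are illustrative, and the nce_loss argument order follows the TF-0.x-era API:

    import tensorflow as tf

    vocab_size, embed_dim, num_sampled, batch = 10000, 128, 64, 32
    train_inputs = tf.placeholder(tf.int32, [batch])        # center word ids
    train_labels = tf.placeholder(tf.int32, [batch, 1])     # context word ids
    embeddings = tf.Variable(tf.random_uniform([vocab_size, embed_dim], -1.0, 1.0))
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)  # look up input vectors
    nce_w = tf.Variable(tf.truncated_normal([vocab_size, embed_dim]))
    nce_b = tf.Variable(tf.zeros([vocab_size]))
    # noise-contrastive estimation: distinguish true context words from noise samples
    loss = tf.reduce_mean(tf.nn.nce_loss(nce_w, nce_b, embed, train_labels,
                                         num_sampled, vocab_size))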
Image recognition
- Inception-v3, open-sourced and available for use
TensorFlow - distributed
- distribution configuration
- placement algorithm and network optimization
- Hashing to reduce the number of word2vec indices
- AlexNet, VGG
- Spark for distributed computation; deep learning on Spark
- DistBelief
- Impressive: search ranking
- TensorFlow: data size
- text needs larger datasets than image and speech recognition
- Synthesis: music, images
Learning a model with a loss function