Title Page
Copyright and Credits
Reinforcement Learning with TensorFlow
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Deep Learning – Architectures and Frameworks
Deep learning
Activation functions for deep learning
The sigmoid function
The tanh function
The softmax function
The rectified linear unit function
How to choose the right activation function
Logistic regression as a neural network
Notation
Objective
The cost function
The gradient descent algorithm
The computational graph
Steps to solve logistic regression using gradient descent
What is Xavier initialization?
Why do we use Xavier initialization?
The neural network model
Recurrent neural networks
Long Short Term Memory Networks
Convolutional neural networks
The LeNet-5 convolutional neural network
The AlexNet model
The VGG-Net model
The Inception model
Limitations of deep learning
The vanishing gradient problem
The exploding gradient problem
Overcoming the limitations of deep learning
Reinforcement learning
Basic terminologies and conventions
Optimality criteria
The value function for optimality
The policy model for optimality
The Q-learning approach to reinforcement learning
Asynchronous advantage actor-critic
Introduction to TensorFlow and OpenAI Gym
Basic computations in TensorFlow
An introduction to OpenAI Gym
The pioneers and breakthroughs in reinforcement learning
David Silver
Pieter Abbeel
Google DeepMind
The AlphaGo program
Libratus
Summary
Training Reinforcement Learning Agents Using OpenAI Gym
The OpenAI Gym
Understanding an OpenAI Gym environment
Programming an agent using an OpenAI Gym environment
Q-Learning
The Epsilon-Greedy approach
Using the Q-Network for real-world applications
Summary
Markov Decision Process
Markov decision processes
The Markov property
The S state set
Actions
Transition model
Rewards
Policy
The sequence of rewards - assumptions
The infinite horizons
Utility of sequences
The Bellman equations
Solving the Bellman equation to find policies
An example of value iteration using the Bellman equation
Policy iteration
Partially observable Markov decision processes
State estimation
Value iteration in POMDPs
Training the FrozenLake-v0 environment using MDP
Summary
Policy Gradients
The policy optimization method
Why policy optimization methods?
Why stochastic policy?
Example 1 - rock, paper, scissors
Example 2 - state aliased grid-world
Policy objective functions
Policy Gradient Theorem
Temporal difference rule
TD(1) rule
TD(0) rule
TD(λ) rule
Policy gradients
The Monte Carlo policy gradient
Actor-critic algorithms
Using a baseline to reduce variance
Vanilla policy gradient
Agent learning pong using policy gradients
Summary
Q-Learning and Deep Q-Networks
Why reinforcement learning?
Model-based learning and model-free learning
Monte Carlo learning
Temporal difference learning
On-policy and off-policy learning
Q-learning
The exploration-exploitation dilemma
Q-learning for the mountain car problem in OpenAI Gym
Deep Q-networks
Using a convolutional neural network instead of a single layer neural network
Use of experience replay
Separate target network to compute the target Q-values
Advancements in deep Q-networks and beyond
Double DQN
Dueling DQN
Deep Q-network for mountain car problem in OpenAI Gym
Deep Q-network for Cartpole problem in OpenAI Gym
Deep Q-network for Atari Breakout in OpenAI Gym
The Monte Carlo tree search algorithm
Minimax and game trees
The Monte Carlo Tree Search
The SARSA algorithm
SARSA algorithm for mountain car problem in OpenAI Gym
Summary
Asynchronous Methods
Why asynchronous methods?
Asynchronous one-step Q-learning
Asynchronous one-step SARSA
Asynchronous n-step Q-learning
Asynchronous advantage actor-critic
A3C for Pong-v0 in OpenAI Gym
Summary
Robo Everything – Real Strategy Gaming
Real-time strategy games
Reinforcement learning and other approaches
Online case-based planning
Drawbacks to real-time strategy games
Why reinforcement learning?
Reinforcement learning in RTS gaming
Deep autoencoder
How is reinforcement learning better?
Summary
AlphaGo – Reinforcement Learning at Its Best
What is Go?
Go versus chess
How did Deep Blue defeat Garry Kasparov?
Why is the game tree approach no good for Go?
AlphaGo – mastering Go
Monte Carlo Tree Search
Architecture and properties of AlphaGo
Energy consumption analysis – Lee Sedol versus AlphaGo
AlphaGo Zero
Architecture and properties of AlphaGo Zero
Training process in AlphaGo Zero
Summary
Reinforcement Learning in Autonomous Driving
Machine learning for autonomous driving
Reinforcement learning for autonomous driving
Creating autonomous driving agents
Why reinforcement learning?
Proposed frameworks for autonomous driving
Spatial aggregation
Sensor fusion
Spatial features
Recurrent temporal aggregation
Planning
DeepTraffic – MIT simulator for autonomous driving
Summary
Financial Portfolio Management
Introduction
Problem definition
Data preparation
Reinforcement learning
Further improvements
Summary
Reinforcement Learning in Robotics
Reinforcement learning in robotics
Evolution of reinforcement learning
Challenges in robot reinforcement learning
High dimensionality problem
Real-world challenges
Issues due to model uncertainty
What's the final objective a robot wants to achieve?
Open questions and practical challenges
Open questions
Practical challenges for robotic reinforcement learning
Key takeaways
Summary
Deep Reinforcement Learning in Ad Tech
Computational advertising challenges and bidding strategies
Business models used in advertising
Sponsored-search advertisements
Search-advertisement management
AdWords
Bidding strategies of advertisers
Real-time bidding by reinforcement learning in display advertising
Summary
Reinforcement Learning in Image Processing
Hierarchical object detection with deep reinforcement learning
Related works
Region-based convolutional neural networks
Spatial pyramid pooling networks
Fast R-CNN
Faster R-CNN
You Only Look Once
Single Shot Detector
Hierarchical object detection model
State
Actions
Reward
Model and training
Training specifics
Summary
Deep Reinforcement Learning in NLP
Text summarization
Deep reinforced model for Abstractive Summarization
Neural intra-attention model
Intra-temporal attention on input sequence while decoding
Intra-decoder attention
Token generation and pointer
Hybrid learning objective
Supervised learning with teacher forcing
Policy learning
Mixed training objective function
Text question answering
Mixed objective and deep residual coattention for Question Answering
Deep residual coattention encoder
Mixed objective using self-critical policy learning
Summary
Further topics in Reinforcement Learning
Continuous action space algorithms
Trust region policy optimization
Deterministic policy gradients
Scoring mechanism in sequential models in NLP
BLEU
What is BLEU score and what does it do?
ROUGE
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think