About Packt
Why subscribe?
Packt.com
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Q-Learning: A Roadmap
Brushing Up on Reinforcement Learning Concepts
What is RL?
States and actions
The decision-making process
RL, supervised learning, and unsupervised learning
States, actions, and rewards
States
Actions and rewards
Bellman equations
Key concepts in RL
Value-based versus policy-based iteration
Q-learning hyperparameters – alpha, gamma, and epsilon
Alpha – deterministic versus stochastic environments
Gamma – current versus future rewards
Epsilon – exploration versus exploitation
Decaying epsilon
SARSA versus Q-learning – on-policy or off?
SARSA and the cliff-walking problem
When to choose SARSA over Q-learning
Summary
Questions
Getting Started with the Q-Learning Algorithm
Technical requirements
Demystifying MDPs
Control processes
Markov chains
The Markov property
MDPs and state-action diagrams
Solving MDPs with RL
Your Q-learning agent in its environment
Solving the optimization problem
States and actions in Taxi-v2
Fine-tuning your model – learning, discount, and exploration rates
Decaying epsilon
Decaying alpha
Decaying gamma
MABP – a classic exploration versus exploitation problem
Setting up a bandit problem
Bandit optimization strategies
Other applications for bandit problems
Optimal versus safe paths – revisiting SARSA
Summary
Questions
Setting Up Your First Environment with OpenAI Gym
Technical requirements
Getting started with OpenAI Gym
What is Gym?
Setting up Gym
Gym environments
Setting up an environment
Exploring the Taxi-v2 environment
The state space and valid actions
Choosing an action manually
Setting a state manually
Creating a baseline agent
Stepping through actions
Creating a task loop
Baseline models in Q-learning and machine learning research
Summary
Questions
Teaching a Smartcab to Drive Using Q-Learning
Technical requirements
Getting to know your learning agent
Implementing your agent
The value function – calculating the Q-value of a state-action pair
Implementing Bellman equations
The learning parameters – alpha, gamma, and epsilon
Adding an updated alpha value
Adding an updated epsilon value
Model-tuning and tracking your agent's long-term performance
Comparing your models and statistical performance measures
Training your models
Decaying epsilon
Hyperparameter tuning
Summary
Questions
Section 2: Building and Optimizing Q-Learning Agents
Building Q-Networks with TensorFlow
Technical requirements
A brief overview of neural networks
Extensional versus intensional definitions
Taking a closer look
Input, hidden, and output layers
Perceptron functions
ReLU functions
Implementing a neural network with NumPy
Feedforward
Backpropagation
Neural networks and Q-learning
Policy agents versus value agents
Building your first Q-network
Defining the network
Training the network
Summary
Questions
Further reading
Digging Deeper into Deep Q-Networks with Keras and TensorFlow
Technical requirements
Introducing CartPole-v1
More about CartPole states and actions
Getting started with the CartPole task
Building a DQN to solve the CartPole problem
Gamma
Alpha
Epsilon
Building a DQN class
Choosing actions with epsilon-greedy
Updating the Q-values
Running the task loop
Testing and results
Adding in experience replay
About experience replay
Implementation
Experience replay results
Building further on DQNs
Calculating DQN loss
Fixed Q-targets
Double-deep Q-networks
Dueling deep Q-networks
Summary
Questions
Further reading
Section 3: Advanced Q-Learning Challenges with Keras, TensorFlow, and OpenAI Gym
Decoupling Exploration and Exploitation in Multi-Armed Bandits
Technical requirements
Probability distributions and ongoing knowledge
Iterative probability distributions
Revisiting a simple bandit problem
A sample two-armed bandit iteration
Multi-armed bandit strategy overview
Greedy strategy
Epsilon-greedy strategy
Upper confidence bound
Bandit regret
Utility functions and optimal decisions
Contextual bandits and state diagrams
Thompson sampling and the Bayesian control rule
Thompson sampling
Bayesian control rule
Solving a multi-armed bandit problem in Python – user advertisement clicks
Epsilon-greedy selection
Multi-armed bandits in experimental design
The testing process
Bandits with knapsacks – more multi-armed bandit applications
Summary
Questions
Further reading
Further Q-Learning Research and Future Projects
Google's DeepMind and the future of Q-learning
OpenAI Gym and RL research
The standardization of RL research practice with Gym
Tracking your scores with the Gym leaderboard
More OpenAI Gym environments
Pendulum
Acrobot
MountainCar
Continuous control tasks – MuJoCo
Continuous control tasks – Box2D
Robotics research and development
Algorithms
Toy text
Contextual bandits and probability distributions
Probability and intelligence
Updating probability distributions
State spaces
A/B testing versus multi-armed bandit testing
Testing methodologies
Summary
Questions
Further reading
Assessments
Chapter 1, Brushing Up on Reinforcement Learning Concepts
Chapter 2, Getting Started with the Q-Learning Algorithm
Chapter 3, Setting Up Your First Environment with OpenAI Gym
Chapter 4, Teaching a Smartcab to Drive Using Q-Learning
Chapter 5, Building Q-Networks with TensorFlow
Chapter 6, Digging Deeper into Deep Q-Networks with Keras and TensorFlow
Chapter 7, Decoupling Exploration and Exploitation in Multi-Armed Bandits
Chapter 8, Further Q-Learning Research and Future Projects
Other Books You May Enjoy
Leave a review - let other readers know what you think