
Deep Reinforcement Learning Hands-On (e-book)


Author: Maxim Lapan

Publisher: Packt Publishing

Publication date: 2018-06-21

Word count: approx. 3,426,000 characters

Category: Imported Books > Foreign-Language Originals > Computers/Networking



This practical guide will teach you how deep learning (DL) can be used to solve complex real-world problems.

About This Book

  • Explore deep reinforcement learning (RL), from first principles to the latest algorithms
  • Evaluate high-profile RL methods, including value iteration, deep Q-networks, policy gradients, TRPO, PPO, DDPG, D4PG, evolution strategies, and genetic algorithms
  • Keep up with the very latest industry developments, including AI-driven chatbots

Who This Book Is For

Some fluency in Python is assumed. Readers should be familiar with basic deep learning (DL) approaches, and some practical experience in DL will be helpful. This book is an introduction to deep reinforcement learning (RL) and requires no background in RL.

What You Will Learn

  • Understand the DL context of RL and implement complex DL models
  • Learn the foundations of RL: Markov decision processes
  • Evaluate RL methods including cross-entropy, DQN, Actor-Critic, TRPO, PPO, DDPG, D4PG, and others
  • Discover how to deal with discrete and continuous action spaces in various environments
  • Defeat Atari arcade games using the value iteration method
  • Create your own OpenAI Gym environment to train a stock trading agent
  • Teach your agent to play Connect4 using AlphaGo Zero
  • Explore the very latest deep RL research on topics including AI-driven chatbots

In Detail

Recent developments in reinforcement learning (RL), combined with deep learning (DL), have produced unprecedented progress in training agents to solve complex problems in a human-like way. Google's use of algorithms to play and defeat the well-known Atari arcade games propelled the field to prominence, and researchers are generating new ideas at a rapid pace. Deep Reinforcement Learning Hands-On is a comprehensive guide to the very latest DL tools and their limitations. You will evaluate methods including cross-entropy and policy gradients before applying them to real-world environments. Take on both the Atari set of virtual games and family favorites such as Connect4. The book provides an introduction to the basics of RL, giving you the know-how to code intelligent learning agents to take on a formidable array of practical tasks. Discover how to implement Q-learning on 'grid world' environments, teach your agent to buy and trade stocks, and find out how natural language models are driving the boom in chatbots.

Style and Approach

Deep Reinforcement Learning Hands-On explains the art of building self-learning agents using algorithms and practical examples. Experiment with famous examples, such as Google's defeat of well-known Atari arcade games.
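As a taste of the hands-on workflow the description refers to, below is a minimal sketch of a random agent interacting with the CartPole environment through OpenAI Gym, in the spirit of the book's early chapters. It assumes the classic gym API that was current at the time of publication (env.reset() returning an observation and env.step() returning a 4-tuple); it is an illustrative sketch, not code taken from the book.

import gym

# Create the classic CartPole environment (classic gym-style API assumed).
env = gym.make("CartPole-v0")
total_reward = 0.0
total_steps = 0
obs = env.reset()

while True:
    # Sample a random action from the environment's action space.
    action = env.action_space.sample()
    obs, reward, done, _ = env.step(action)
    total_reward += reward
    total_steps += 1
    if done:
        break

print("Episode finished in %d steps, total reward %.2f" % (total_steps, total_reward))

A random policy typically balances the pole for only a handful of steps; the learning methods covered in the chapters below replace the random choice with a trained policy.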
Table of Contents

Deep Reinforcement Learning Hands-On

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewers

Packt is Searching for Authors Like You

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Chapter 1. What is Reinforcement Learning?

Learning – supervised, unsupervised, and reinforcement

RL formalisms and relations

Reward

The agent

The environment

Actions

Observations

Markov decision processes

Markov process

Markov reward process

Markov decision process

Summary

Chapter 2. OpenAI Gym

The anatomy of the agent

Hardware and software requirements

OpenAI Gym API

Action space

Observation space

The environment

Creation of the environment

The CartPole session

The random CartPole agent

The extra Gym functionality – wrappers and monitors

Wrappers

Monitor

Summary

Chapter 3. Deep Learning with PyTorch

Tensors

Creation of tensors

Scalar tensors

Tensor operations

GPU tensors

Gradients

Tensors and gradients

NN building blocks

Custom layers

Final glue – loss functions and optimizers

Loss functions

Optimizers

Monitoring with TensorBoard

TensorBoard 101

Plotting stuff

Example – GAN on Atari images

Summary

Chapter 4. The Cross-Entropy Method

Taxonomy of RL methods

Practical cross-entropy

Cross-entropy on CartPole

Cross-entropy on FrozenLake

Theoretical background of the cross-entropy method

Summary

Chapter 5. Tabular Learning and the Bellman Equation

Value, state, and optimality

The Bellman equation of optimality

Value of action

The value iteration method

Value iteration in practice

Q-learning for FrozenLake

Summary

Chapter 6. Deep Q-Networks

Real-life value iteration

Tabular Q-learning

Deep Q-learning

Interaction with the environment

SGD optimization

Correlation between steps

The Markov property

The final form of DQN training

DQN on Pong

Wrappers

DQN model

Training

Running and performance

Your model in action

Summary

Chapter 7. DQN Extensions

The PyTorch Agent Net library

Agent

Agent's experience

Experience buffer

Gym env wrappers

Basic DQN

N-step DQN

Implementation

Double DQN

Implementation

Results

Noisy networks

Implementation

Results

Prioritized replay buffer

Implementation

Results

Dueling DQN

Implementation

Results

Categorical DQN

Implementation

Results

Combining everything

Implementation

Results

Summary

References

Chapter 8. Stocks Trading Using RL

Trading

Data

Problem statements and key decisions

The trading environment

Models

Training code

Results

The feed-forward model

The convolution model

Things to try

Summary

Chapter 9. Policy Gradients – An Alternative

Values and policy

Why policy?

Policy representation

Policy gradients

The REINFORCE method

The CartPole example

Results

Policy-based versus value-based methods

REINFORCE issues

Full episodes are required

High gradients variance

Exploration

Correlation between samples

PG on CartPole

Results

PG on Pong

Results

Summary

Chapter 10. The Actor-Critic Method

Variance reduction

CartPole variance

Actor-critic

A2C on Pong

A2C on Pong results

Tuning hyperparameters

Learning rate

Entropy beta

Count of environments

Batch size

Summary

Chapter 11. Asynchronous Advantage Actor-Critic

Correlation and sample efficiency

Adding an extra A to A2C

Multiprocessing in Python

A3C – data parallelism

Results

A3C – gradients parallelism

Results

Summary

Chapter 12. Chatbots Training with RL

Chatbots overview

Deep NLP basics

Recurrent Neural Networks

Embeddings

Encoder-Decoder

Training of seq2seq

Log-likelihood training

Bilingual evaluation understudy (BLEU) score

RL in seq2seq

Self-critical sequence training

The chatbot example

The example structure

Modules: cornell.py and data.py

BLEU score and utils.py

Model

Training: cross-entropy

Running the training

Checking the data

Testing the trained model

Training: SCST

Running the SCST training

Results

Telegram bot

Summary

Chapter 13. Web Navigation

Web navigation

Browser automation and RL

Mini World of Bits benchmark

OpenAI Universe

Installation

Actions and observations

Environment creation

MiniWoB stability

Simple clicking approach

Grid actions

Example overview

Model

Training code

Starting containers

Training process

Checking the learned policy

Issues with simple clicking

Human demonstrations

Recording the demonstrations

Recording format

Training using demonstrations

Results

TicTacToe problem

Adding text description

Results

Things to try

Summary

Chapter 14. Continuous Action Space

Why a continuous space?

Action space

Environments

The Actor-Critic (A2C) method

Implementation

Results

Using models and recording videos

Deterministic policy gradients

Exploration

Implementation

Results

Recording videos

Distributional policy gradients

Architecture

Implementation

Results

Things to try

Summary

Chapter 15. Trust Regions – TRPO, PPO, and ACKTR

Introduction

Roboschool

A2C baseline

Results

Videos recording

Proximal Policy Optimization

Implementation

Results

Trust Region Policy Optimization

Implementation

Results

A2C using ACKTR

Implementation

Results

Summary

Chapter 16. Black-Box Optimization in RL

Black-box methods

Evolution strategies

ES on CartPole

Results

ES on HalfCheetah

Results

Genetic algorithms

GA on CartPole

Results

GA tweaks

Deep GA

Novelty search

GA on Cheetah

Results

Summary

References

Chapter 17. Beyond Model-Free – Imagination

Model-based versus model-free

Model imperfections

Imagination-augmented agent

The environment model

The rollout policy

The rollout encoder

Paper results

I2A on Atari Breakout

The baseline A2C agent

EM training

The imagination agent

The I2A model

The Rollout encoder

Training of I2A

Experiment results

The baseline agent

Training EM weights

Training with the I2A model

Summary

References

Chapter 18. AlphaGo Zero

Board games

The AlphaGo Zero method

Overview

Monte-Carlo Tree Search

Self-play

Training and evaluation

Connect4 bot

Game model

Implementing MCTS

Model

Training

Testing and comparison

Connect4 results

Summary

References

Book summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index

A

B

C

D

E

F

G

H

I

K

L

M

N

O

P

Q

R

S

T

U

V

W

X
