Simulation for Data Science with R (e-book)

Author: Matthias Templ

Publisher: Packt Publishing

Publication date: 2016-06-01

Word count: 3.182 million

Category: Imported books > Foreign-language original editions > Computers/Internet

Harness actionable insights from your data with computational statistics and simulations using R

About This Book

  • Learn five different simulation techniques (Monte Carlo, Discrete Event Simulation, System Dynamics, Agent-Based Modeling, and Resampling) in depth using real-world case studies
  • A unique book that teaches you the essential and fundamental concepts in statistical modeling and simulation

Who This Book Is For

This book is for users who are familiar with computational methods. If you want to learn about the advanced features of R, including computer-intensive Monte Carlo methods and computational tools for statistical simulation, then this book is for you. Good knowledge of R programming is assumed.

What You Will Learn

  • Explore advanced R features for simulating data and extracting insights from it
  • Get to know the advanced features of R, including high-performance computing and advanced data manipulation
  • See how random number simulation is used to simulate distributions, data sets, and populations
  • Simulate close-to-reality populations as the basis for agent-based micro-, model-, and design-based simulations
  • Design statistical solutions with R for solving scientific and real-world problems
  • Get comprehensive coverage of several R statistical packages such as boot, simPop, VIM, data.table, dplyr, parallel, StatDA, simecol, simecolModels, deSolve, and many more

In Detail

Simulation for Data Science with R aims to teach you how to begin performing data science tasks by taking advantage of R's powerful ecosystem of packages. R is one of the most widely used programming languages in data science, and the two together make a powerful combination for handling the complexities of varied real-world data sets.

The book provides a computational and methodological framework for statistical simulation. Through it, you will get to grips with the R software environment. After getting to know the background of popular methods in computational statistics, you will see applications in R that help you understand the methods better and gain experience working with real-world data and real-world problems.

This book helps uncover the large-scale patterns in complex systems where interdependencies and variation are critical. An effective simulation is driven by data-generating processes that accurately reflect real physical populations. You will learn how to plan and structure a simulation project to aid the decision-making process as well as the presentation of results.

By the end of this book, you will have gotten to grips with the R software environment, learned the background of popular methods in the area, and seen them applied in R to real-world data and real-world problems.

Style and approach

This book takes a practical, hands-on approach to explaining statistical computing methods, gives advice on their usage, and provides computational tools to help you solve common problems in statistical simulation and computer-intensive methods.
Simulation for Data Science with R

Table of Contents

Credits

About the Author

About the Reviewer

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Introduction

What is simulation and where is it applied?

Why use simulation?

Simulation and big data

Choosing the right simulation technique

Summary

References

2. R and High-Performance Computing

The R statistical environment

Basics in R

Some very basic stuff about R

Installation and updates

Help

The R workspace and the working directory

Data types

Vectors in R

Factors in R

list

data.frame

array

Missing values

Generic functions, methods, and classes

Data manipulation in R

Apply and friends with basic R

Basic data manipulation with the dplyr package

dplyr – creating a local data frame

dplyr – selecting lines

dplyr – order

dplyr – selecting columns

dplyr – uniqueness

dplyr – creating variables

dplyr – grouping and aggregates

dplyr – window functions

Data manipulation with the data.table package

data.table – variable construction

data.table – indexing or subsetting

data.table – keys

data.table – fast subsetting

data.table – calculations in groups

High performance computing

Profiling to detect computationally slow functions in code

Further benchmarking

Parallel computing

Interfaces to C++

Visualizing information

The graphics system in R

The graphics package

Warm-up example – a high-level plot

Control of graphics parameters

The ggplot2 package

References

3. The Discrepancy between Pencil-Driven Theory and Data-Driven Computational Solutions

Machine numbers and rounding problems

Example – the 64-bit representation of numbers

Convergence in the deterministic case

Example – convergence

Condition of problems

Summary

References

4. Simulation of Random Numbers

Real random numbers

Simulating pseudo random numbers

Congruential generators

Linear and multiplicative congruential generators

Lagged Fibonacci generators

More generators

Simulation of non-uniform distributed random variables

The inversion method

The alias method

Estimation of counts in tables with log-linear models

Rejection sampling

Simulating values from a normal distribution

Simulating random numbers from a Beta distribution

Truncated distributions

Metropolis-Hastings algorithm

A few words on Markov chains

The Metropolis sampler

The Gibbs sampler

The two-phase Gibbs sampler

The multiphase Gibbs sampler

Application in linear regression

The diagnosis of MCMC samples

Tests for random numbers

The evaluation of random numbers – an example of a test

Summary

References

5. Monte Carlo Methods for Optimization Problems

Numerical optimization

Gradient ascent/descent

Newton-Raphson methods

Further general-purpose optimization methods

Dealing with stochastic optimization

Simplified procedures (Star Trek, Spaceballs, and Spaceballs princess)

Metropolis-Hastings revisited

Gradient-based stochastic optimization

Summary

References

6. Probability Theory Shown by Simulation

Some basics on probability theory

Probability distributions

Discrete probability distributions

Continuous probability distributions

Winning the lottery

The weak law on large numbers

Emperor penguins and your boss

Limits and convergence of random variables

Convergence of the sample mean – weak law of large numbers

Showing the weak law of large numbers by simulation

The central limit theorem

Properties of estimators

Properties of estimators

Confidence intervals

A note on robust estimators

Summary

References

7. Resampling Methods

The bootstrap

A motivating example with odds ratios

Why the bootstrap works

A closer look at the bootstrap

The plug-in principle

Estimation of standard errors with bootstrapping

An example of a complex estimation using the bootstrap

The parametric bootstrap

Estimating bias with bootstrap

Confidence intervals by bootstrap

The jackknife

Disadvantages of the jackknife

The delete-d jackknife

Jackknife after bootstrap

Cross-validation

The classical linear regression model

The basic concept of cross validation

Classical cross validation – 70/30 method

Leave-one-out cross validation

k-fold cross validation

Summary

References

8. Applications of Resampling Methods and Monte Carlo Tests

The bootstrap in regression analysis

Motivation to use the bootstrap

The most popular but often worst method

Bootstrapping by draws from residuals

Proper variance estimation with missing values

Bootstrapping in time series

Bootstrapping in the case of complex sampling designs

Monte Carlo tests

A motivating example

The permutation test as a special kind of MC test

A Monte Carlo test for multiple groups

Hypothesis testing using a bootstrap

A test for multivariate normality

Size of the test

Power comparisons

Summary

References

9. The EM Algorithm

The basic EM algorithm

Some prerequisites

Formal definition of the EM algorithm

Introductory example for the EM algorithm

The EM algorithm by example of k-means clustering

The EM algorithm for the imputation of missing values

Summary

References

10. Simulation with Complex Data

Different kinds of simulation and software

Simulating data using complex models

A model-based simple example

A model-based example with mixtures

Model-based approach to simulate data

An example of simulating high-dimensional data

Simulating finite populations with cluster or hierarchical structures

Model-based simulation studies

Latent model example continued

A simple example of model-based simulation

A model-based simulation study

Design-based simulation

An example with complex survey data

Simulation of the synthetic population

Estimators of interest

Defining the sampling design

Using stratified sampling

Adding contamination

Performing simulations separately on different domains

Inserting missing values

Summary

References

11. System Dynamics and Agent-Based Models

Agent-based models

Dynamics in love and hate

Dynamic systems in ecological modeling

Summary

References

Index
