Pentaho Data Integration Cookbook Second Edition (e-book)

Author: Alex Meadows

Publisher: Packt Publishing

Publication date: 2013-12-02

Word count: 3.701 million

Category: Imported Books > Foreign-Language Originals > Computers/Internet

Pentaho Data Integration Cookbook Second Edition is written in a cookbook format, presenting examples in the style of recipes. This allows you to go directly to your topic of interest, or to follow topics throughout a chapter to gain thorough, in-depth knowledge. The book is designed for developers who are familiar with the basics of Kettle but who wish to move up to the next level. It is also aimed at advanced users who want to learn how to use the new features of PDI, as well as best practices for working with Kettle.
Pentaho Data Integration Cookbook Second Edition

Table of Contents

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Working with Databases

Introduction

Sample databases

Pentaho BI platform databases

Connecting to a database

Getting ready

How to do it...

How it works...

There's more...

Avoiding creating the same database connection over and over again

Avoiding modifying jobs and transformations every time a connection changes

Specifying advanced connection properties

Connecting to a database not supported by Kettle

Checking the database connection at runtime

Getting data from a database

Getting ready

How to do it...

How it works...

There's more...

See also

Getting data from a database by providing parameters

Getting ready

How to do it...

How it works...

There's more...

Parameters coming in more than one row

Executing the SELECT statement several times, each for a different set of parameters

See also

Getting data from a database by running a query built at runtime

Getting ready

How to do it...

How it works...

There's more...

See also

Inserting or updating rows in a table

Getting ready

How to do it...

How it works...

There's more...

Alternative solution if you just want to insert records

Alternative solution if you just want to update rows

Alternative way for inserting and updating

See also

Inserting new rows where a simple primary key has to be generated

Getting ready

How to do it...

How it works...

There's more...

Using the Combination lookup/update for looking up

See also

Inserting new rows where the primary key has to be generated based on stored values

Getting ready

How to do it...

How it works...

There's more...

See also

Deleting data from a table

Getting ready

How to do it...

How it works...

See also

Creating or altering a database table from PDI (design time)

Getting ready

How to do it...

How it works...

There's more...

See also

Creating or altering a database table from PDI (runtime)

How to do it...

How it works...

There's more...

See also

Inserting, deleting, or updating a table depending on a field

Getting ready

How to do it...

How it works...

There's more...

Insert, update, and delete all-in-one

Synchronizing after merge

See also

Changing the database connection at runtime

Getting ready

How to do it...

How it works...

There's more...

See also

Loading a parent-child table

Getting ready

How to do it...

How it works...

See also

Building SQL queries via database metadata

Getting ready

How to do it...

How it works...

See also

Performing repetitive database design tasks from PDI

Getting ready

How to do it...

How it works...

See also

2. Reading and Writing Files

Introduction

Reading a simple file

Getting ready

How to do it...

How it works...

There's more...

Alternative notation for a separator

About file format and encoding

About data types and formats

Altering the names, order, or metadata of the fields coming from the file

Reading files with fixed width fields

Reading several files at the same time

Getting ready

How to do it...

How it works...

There's more...

Reading semi-structured files

Getting ready

How to do it...

How it works...

There's more...

Master/detail files

Logfiles

See also

Reading files having one field per row

Getting ready

How to do it...

How it works...

There's more...

See also

Reading files with some fields occupying two or more rows

Getting ready

How to do it...

How it works...

See also

Writing a simple file

Getting ready

How to do it...

How it works...

There's more...

Changing headers

Giving the output fields a format

Writing a semi-structured file

Getting ready

How to do it...

How it works...

There's more...

Providing the name of a file (for reading or writing) dynamically

Getting ready

How to do it...

How it works...

There's more...

Get System Info

Generating several files simultaneously with the same structure, but different names

Using the name of a file (or part of it) as a field

Getting ready

How to do it...

How it works...

Reading an Excel file

Getting ready

How to do it...

How it works...

See also

Getting the value of specific cells in an Excel file

Getting ready

How to do it...

How it works...

There's more...

Looking for a given cell

Writing an Excel file with several sheets

Getting ready

How to do it...

How it works...

There's more...

See also

Writing an Excel file with a dynamic number of sheets

Getting ready

How to do it...

How it works...

See also

Reading data from an AWS S3 Instance

Getting ready

How to do it...

How it works...

See also

3. Working with Big Data and Cloud Sources

Introduction

Loading data into Salesforce.com

Getting ready

How to do it...

How it works...

See also

Getting data from Salesforce.com

Getting ready

How to do it...

How it works...

See also

Loading data into Hadoop

Getting ready

How to do it...

How it works...

There's more...

See also

Getting data from Hadoop

Getting ready

How to do it...

How it works...

See also

Loading data into HBase

Getting ready

How to do it...

How it works...

There's more...

See also

Getting data from HBase

Getting ready

How to do it...

How it works...

See also

Loading data into MongoDB

Getting ready

How to do it...

How it works...

See also

Getting data from MongoDB

Getting ready

How to do it...

How it works...

See also

4. Manipulating XML Structures

Introduction

Reading simple XML files

Getting ready

How to do it...

How it works...

There's more...

XML data in a field

XML file name in a field

See also

Specifying fields by using the Path notation

Getting ready

How to do it...

How it works...

There's more...

Getting data from a different path

Getting data selectively

Getting more than one node when the nodes share their Path notation

Saving time when specifying Path

Validating well-formed XML files

Getting ready

How to do it...

How it works...

See also

Validating an XML file against DTD definitions

Getting ready

How to do it...

How it works...

There's more...

See also

Validating an XML file against an XSD schema

Getting ready

How to do it...

How it works...

There's more...

See also

Generating a simple XML document

Getting ready

How to do it...

How it works...

There's more...

Generating fields with XML structures

See also

Generating complex XML structures

Getting ready

How to do it...

How it works...

See also

Generating an HTML page using XML and XSL transformations

Getting ready

How to do it...

How it works...

There's more...

See also

Reading an RSS Feed

Getting ready

How to do it...

How it works...

See also

Generating an RSS Feed

Getting ready

How to do it...

How it works...

There's more...

See also

5. File Management

Introduction

Copying or moving one or more files

Getting ready

How to do it...

How it works...

There's more...

Moving files

Detecting the existence of the files before copying them

Creating folders

See also

Deleting one or more files

Getting ready

How to do it...

How it works...

There's more...

Figuring out which files have been deleted

See also

Getting files from a remote server

How to do it...

How it works...

There's more...

Specifying files to transfer

Some considerations about connecting to an FTP server

Access via SFTP

Access via FTPS

Getting information about the files being transferred

See also

Putting files on a remote server

Getting ready

How to do it...

How it works...

There's more...

See also

Copying or moving a custom list of files

Getting ready

How to do it...

How it works...

See also

Deleting a custom list of files

Getting ready

How to do it...

How it works...

See also

Comparing files and folders

Getting ready

How to do it...

How it works...

There's more...

Comparing folders

Working with ZIP files

Getting ready

How to do it...

How it works...

There's more...

Avoiding zipping files

Avoiding unzipping files

See also

Encrypting and decrypting files

Getting ready

How to do it...

How it works...

There's more...

See also

6. Looking for Data

Introduction

Looking for values in a database table

Getting ready

How to do it...

How it works...

There's more...

Taking some action when the lookup fails

Taking some action when there are too many results

Looking for non-existent data

See also

Looking for values in a database with complex conditions

Getting ready

How to do it...

How it works...

There's more...

See also

Looking for values in a database with dynamic queries

Getting ready

How to do it...

How it works...

There's more...

See also

Looking for values in a variety of sources

Getting ready

How to do it...

How it works...

There's more...

Looking for alternatives when the Stream Lookup step doesn't meet your needs

Speeding up your transformation

Using the Value Mapper step for looking up from a short list of values

See also

Looking for values by proximity

Getting ready

How to do it...

How it works...

There's more...

Looking for values by using a web service

Getting ready

How to do it...

How it works...

There's more...

See also

Looking for values over intranet or the Internet

Getting ready

How to do it...

How it works...

There's more...

See also

Validating data at runtime

Getting ready

How to do it...

How it works...

There's more...

See also

7. Understanding and Optimizing Data Flows

Introduction

Splitting a stream into two or more streams based on a condition

Getting ready

How to do it...

How it works...

There's more...

Avoiding the use of Dummy steps

Comparing against the value of a Kettle variable

Avoiding the use of nested Filter rows steps

Overcoming the difficulties of complex conditions

Merging rows of two streams with the same or different structures

Getting ready

How to do it...

How it works...

There's more...

Making sure that the metadata of the streams is the same

Telling Kettle how to merge the rows of your streams

See also

Adding checksums to verify datasets

Getting ready

How to do it...

How it works...

Comparing two streams and generating differences

Getting ready

How to do it...

How it works...

There's more...

Using the differences to keep a table up-to-date

See also

Generating all possible pairs formed from two datasets

How to do it...

How it works...

There's more...

Getting variables in the middle of the stream

Limiting the number of output rows

See also

Joining two or more streams based on given conditions

Getting ready

How to do it...

How it works...

There's more...

See also

Interspersing new rows between existent rows

Getting ready

How to do it...

How it works...

See also

Executing steps even when your stream is empty

Getting ready

How to do it...

How it works...

There's more...

Processing rows differently based on the row number

Getting ready

How to do it...

How it works...

There's more...

Identifying specific rows

Identifying the last row in the stream

Avoiding using an Add sequence step to enumerate the rows

See also

Processing data into shared transformations via filter criteria and subtransformations

Getting ready

How to do it...

How it works...

See also

Altering a data stream with Select values

How to do it...

How it works...

Processing multiple jobs or transformations in parallel

How to do it...

How it works...

See also

8. Executing and Re-using Jobs and Transformations

Introduction

Sample transformations

Sample transformation – hello

Sample transformation – random list

Sample transformation – sequence

Sample transformation – file list

Launching jobs and transformations

How to do it...

How it works...

Executing a job or a transformation by setting static arguments and parameters

Getting ready

How to do it...

How it works...

There's more...

See also

Executing a job or a transformation from a job by setting arguments and parameters dynamically

Getting ready

How to do it...

How it works...

There's more...

See also

Executing a job or a transformation whose name is determined at runtime

Getting ready

How to do it...

How it works...

There's more...

See also

Executing part of a job once for every row in a dataset

Getting ready

How to do it...

How it works...

There's more...

Accessing the copied rows from jobs, transformations, and other entries

Executing a transformation once for every row in a dataset

Executing a transformation or part of a job once for every file in a list of files

See also

Executing part of a job several times until a condition is true

Getting ready

How to do it...

How it works...

There's more...

Implementing loops in a job

Using the JavaScript step to control the execution of the entries in your job

See also

Creating a process flow

Getting ready

How to do it...

How it works...

There's more...

Serializing/De-serializing data

Other means for transferring or sharing data between transformations

Moving part of a transformation to a subtransformation

Getting ready

How to do it...

How it works...

There's more...

Using Metadata Injection to re-use transformations

Getting ready

How to do it...

How it works...

There's more...

9. Integrating Kettle and the Pentaho Suite

Introduction

A sample transformation

Creating a Pentaho report with data coming from PDI

Getting ready

How to do it...

How it works...

There's more...

Creating a Pentaho report directly from PDI

Getting ready

How to do it...

How it works...

There's more...

See also

Configuring the Pentaho BI Server for running PDI jobs and transformations

Getting ready

How to do it...

How it works...

There's more...

See also

Executing a PDI transformation as part of a Pentaho process

Getting ready

How to do it...

How it works...

There's more...

Specifying the location of the transformation

Supplying values for named parameters, variables and arguments

Keeping things simple when it's time to deliver a plain file

See also

Executing a PDI job from the Pentaho User Console

Getting ready

How to do it...

How it works...

There's more...

See also

Generating files from the PUC with PDI and the CDA plugin

Getting ready

How to do it...

How it works...

There's more...

Populating a CDF dashboard with data coming from a PDI transformation

Getting ready

How to do it...

How it works...

There's more...

10. Getting the Most Out of Kettle

Introduction

Sending e-mails with attached files

Getting ready

How to do it...

How it works...

There's more...

Sending logs through an e-mail

Sending e-mails in a transformation

Generating a custom logfile

Getting ready

How to do it...

How it works...

There's more...

Filtering the logfile

Creating a clean logfile

Isolating logfiles for different jobs or transformations

See also

Running commands on another server

Getting ready

How to do it...

How it works...

See also

Programming custom functionality

Getting ready

How to do it...

How it works...

There's more...

Data type's equivalence

Generalizing your UDJC code

Looking up information with additional steps

Customizing logs

Scripting alternatives to the UDJC step

Generating sample data for testing purposes

How to do it...

How it works...

There's more...

Using a Data grid step to generate specific data

Working with subsets of your data

See also

Working with JSON files

Getting ready

How to do it...

How it works...

There's more...

Reading JSON files dynamically

Writing JSON files

Getting information about transformations and jobs (file-based)

Getting ready

How to do it...

How it works...

There's more...

Job XML nodes

Steps and entries information

See also

Getting information about transformations and jobs (repository-based)

Getting ready

How to do it...

How it works...

There's more...

Transformation tables

Job tables

Database connections tables

Using Spoon's built-in optimization tools

Getting ready

How to do it...

How it works...

There's more...

11. Utilizing Visualization Tools in Kettle

Introduction

Managing plugins with the Marketplace

Getting ready

How to do it...

How it works...

There's more...

See also

Data profiling with DataCleaner

Getting ready

How to do it...

How it works...

There's more...

See also

Visualizing data with AgileBI

Getting ready

How to do it...

How it works...

There's more...

See also

Using Instaview to analyze and visualize data

Getting ready

How to do it...

How it works...

There's more...

See also

12. Data Analytics

Introduction

Reading data from a SAS datafile

Why read a SAS file?

Getting ready

How to do it...

How it works...

See also

Studying data via stream statistics

Getting ready

How to do it...

How it works...

See also

Building a random data sample for Weka

Getting ready

How to do it...

How it works...

There's more...

See also

A. Data Structures

Books data structure

Books

Authors

museums data structure

museums

cities

outdoor data structure

products

categories

Steel Wheels data structure

Lahman Baseball Database

B. References

Books

Online

Index
