
Pentaho Data Integration 4 Cookbook (eBook)


Author: Adrian Sergio Pulvirenti

Publisher: Packt Publishing

Publication date: 2011-06-23

Word count: 3.011 million


This book provides step-by-step instructions, in the form of recipes, for solving data manipulation problems with PDI. It includes plenty of well-organized tips, screenshots, tables, and examples to aid quick and easy understanding. If you are a software developer or anyone involved or interested in developing ETL solutions or, more generally, in any kind of data manipulation, this book is for you. It does not cover PDI basics, SQL basics, or database concepts; you are expected to have a basic understanding of the PDI tool, the SQL language, and databases.


Table of Contents

Pentaho Data Integration 4 Cookbook

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Working with Databases

Introduction

Sample databases

Pentaho BI platform databases

Connecting to a database

Getting ready

How to do it...

How it works...

There's more...

Avoiding creating the same database connection over and over again

Avoiding modifying jobs and transformations every time a connection changes

Specifying advanced connection properties

Connecting to a database not supported by Kettle

Checking the database connection at run-time

Getting data from a database

Getting ready

How to do it...

How it works...

There's more...

See also

Getting data from a database by providing parameters

Getting ready

How to do it...

How it works...

There's more...

Parameters coming in more than one row

Executing the SELECT statement several times, each for a different set of parameters

See also

Getting data from a database by running a query built at runtime

Getting ready

How to do it...

How it works...

There's more...

See also

Inserting or updating rows in a table

Getting ready

How to do it...

How it works...

There's more...

Alternative solution if you just want to insert records

Alternative solution if you just want to update rows

Alternative way for inserting and updating

See also

Inserting new rows where a simple primary key has to be generated

Getting ready

How to do it...

How it works...

There's more...

Using the Combination lookup/update for looking up

See also

Inserting new rows where the primary key has to be generated based on stored values

Getting ready

How to do it...

How it works...

There's more...

See also

Deleting data from a table

Getting ready

How to do it...

How it works...

See also

Creating or altering a database table from PDI (design time)

Getting ready

How to do it...

How it works...

There's more...

See also

Creating or altering a database table from PDI (runtime)

How to do it...

How it works...

There's more...

See also

Inserting, deleting, or updating a table depending on a field

Getting ready

How to do it...

How it works...

There's more...

Insert, update, and delete all-in-one

Synchronizing after merge

See also

Changing the database connection at runtime

Getting ready

How to do it...

How it works...

There's more...

See also

Loading a parent-child table

Getting ready

How to do it...

How it works...

See also

2. Reading and Writing Files

Introduction

Reading a simple file

Getting ready

How to do it...

How it works...

There's more...

Alternative notation for a separator

About file format and encoding

About data types and formats

Altering the names, order, or metadata of the fields coming from the file

Reading files with fixed width fields

Reading several files at the same time

Getting ready

How to do it...

How it works...

There's more...

Reading unstructured files

Getting ready

How to do it...

How it works...

There's more...

Master/detail files

Log files

Reading files having one field per row

Getting ready

How to do it...

How it works...

There's more...

See also

Reading files with some fields occupying two or more rows

Getting ready

How to do it...

How it works...

See also

Writing a simple file

Getting ready

How to do it...

How it works...

There's more...

Changing headers

Giving the output fields a format

Writing an unstructured file

Getting ready

How to do it...

How it works...

There's more...

Providing the name of a file (for reading or writing) dynamically

Getting ready

How to do it...

How it works...

There's more...

Get System Info

Generating several files simultaneously with the same structure, but different names

Using the name of a file (or part of it) as a field

Getting ready

How to do it...

How it works...

Reading an Excel file

Getting ready

How to do it...

How it works...

See also

Getting the value of specific cells in an Excel file

Getting ready

How to do it...

How it works...

There's more...

Labels and values horizontally arranged

Looking for a given cell

Writing an Excel file with several sheets

Getting ready

How to do it...

How it works...

There's more...

See also

Writing an Excel file with a dynamic number of sheets

Getting ready

How to do it...

How it works...

See also

3. Manipulating XML Structures

Introduction

Reading simple XML files

Getting ready

How to do it...

How it works...

There's more...

XML data in a field

XML file name in a field

ECMAScript for XML

See also

Specifying fields by using XPath notation

Getting ready

How to do it...

How it works...

There's more...

Getting data from a different path

Getting data selectively

Getting more than one node when the nodes share their XPath notation

Saving time when specifying XPath

Validating well-formed XML files

Getting ready

How to do it...

How it works...

See also

Validating an XML file against DTD definitions

Getting ready

How to do it...

How it works...

There's more...

See also

Validating an XML file against an XSD schema

Getting ready

How to do it...

How it works...

There's more...

See also

Generating a simple XML document

Getting ready

How to do it...

How it works...

There's more...

Generating fields with XML structures

See also

Generating complex XML structures

Getting ready

How to do it...

How it works...

See also

Generating an HTML page using XML and XSL transformations

Getting ready

How to do it...

How it works...

There's more...

See also

4. File Management

Introduction

Copying or moving one or more files

Getting ready

How to do it...

How it works...

There's more...

Moving files

Detecting the existence of the files before copying them

Creating folders

See also

Deleting one or more files

Getting ready

How to do it...

How it works...

There's more...

Figuring out which files have been deleted

See also

Getting files from a remote server

Getting ready

How to do it...

How it works...

There's more...

Specifying files to transfer

Some considerations about connecting to an FTP server

Access via SFTP

Access via FTPS

Getting information about the files being transferred

See also

Putting files on a remote server

Getting ready

How to do it...

How it works...

There's more...

See also

Copying or moving a custom list of files

Getting ready

How to do it...

How it works...

See also

Deleting a custom list of files

Getting ready

How to do it...

How it works...

See also

Comparing files and folders

Getting ready

How to do it...

How it works...

There's more...

Comparing folders

Working with ZIP files

Getting ready

How to do it...

How it works...

There's more...

Avoiding zipping files

Avoiding unzipping files

See also

5. Looking for Data

Introduction

Looking for values in a database table

Getting ready

How to do it...

How it works...

There's more...

Taking some action when the lookup fails

Taking some action when there are too many results

Looking for non-existent data

See also

Looking for values in a database (with complex conditions or multiple tables involved)

Getting ready

How to do it...

How it works...

There's more...

See also

Looking for values in a database with extreme flexibility

Getting ready

How to do it...

How it works...

There's more...

See also

Looking for values in a variety of sources

Getting ready

How to do it...

How it works...

There's more...

Looking for alternatives when the Stream Lookup step doesn't meet your needs

Speeding up your transformation

Using the Value Mapper step for looking up from a short list of values

See also

Looking for values by proximity

Getting ready

How to do it...

How it works...

There's more...

Looking for values consuming a web service

Getting ready

How to do it...

How it works...

There's more...

See also

Looking for values over an intranet or Internet

Getting ready

How to do it...

How it works...

There's more...

See also

6. Understanding Data Flows

Introduction

Splitting a stream into two or more streams based on a condition

Getting ready

How to do it...

How it works...

There's more...

Avoiding the use of Dummy steps

Comparing against the value of a Kettle variable

Avoiding the use of nested Filter Rows steps

Overcoming the difficulties of complex conditions

Merging rows of two streams with the same or different structures

Getting ready

How to do it...

How it works...

There's more...

Making sure that the metadata of the streams is the same

Telling Kettle how to merge the rows of your streams

See also

Comparing two streams and generating differences

Getting ready

How to do it...

How it works...

There's more...

Using the differences to keep a table up to date

See also

Generating all possible pairs formed from two datasets

How to do it...

How it works...

There's more...

Getting variables in the middle of the stream

Limiting the number of output rows

See also

Joining two or more streams based on given conditions

Getting ready

How to do it...

How it works...

There's more...

See also

Interspersing new rows between existing rows

Getting ready

How to do it...

How it works...

See also

Executing steps even when your stream is empty

Getting ready

How to do it...

How it works...

There's more...

Processing rows differently based on the row number

Getting ready

How to do it...

How it works...

There's more...

Identifying specific rows

Identifying the last row in the stream

Avoiding using an Add sequence step to enumerate the rows

See also

7. Executing and Reusing Jobs and Transformations

Introduction

Sample transformations

Sample transformation: Hello

Sample transformation: Random list

Sample transformation: Sequence

Sample transformation: File list

Launching jobs and transformations

Executing a job or a transformation by setting static arguments and parameters

Getting ready

How to do it...

How it works...

There's more...

See also

Executing a job or a transformation from a job by setting arguments and parameters dynamically

Getting ready

How to do it...

How it works...

There's more...

See also

Executing a job or a transformation whose name is determined at runtime

Getting ready

How to do it...

How it works...

There's more...

See also

Executing part of a job once for every row in a dataset

Getting ready

How to do it...

How it works...

There's more...

Accessing the copied rows from jobs, transformations, and other entries

Executing a transformation once for every row in a dataset

Executing a transformation or part of a job once for every file in a list of files

See also

Executing part of a job several times until a condition is true

Getting ready

How to do it...

How it works...

There's more...

Implementing loops in a job

Using the JavaScript step to control the execution of the entries in your job

See also

Creating a process flow

Getting ready

How to do it...

How it works...

There's more...

Serializing/De-serializing data

Other means for transferring or sharing data between transformations

Moving part of a transformation to a subtransformation

Getting ready

How to do it...

How it works...

There's more...

8. Integrating Kettle and the Pentaho Suite

Introduction

A sample transformation

Creating a Pentaho report with data coming from PDI

Getting ready

How to do it...

How it works...

There's more...

Configuring the Pentaho BI Server for running PDI jobs and transformations

Getting ready

How to do it...

How it works...

There's more...

See also

Executing a PDI transformation as part of a Pentaho process

Getting ready

How to do it...

How it works...

There's more...

Specifying the location of the transformation

Supplying values for named parameters, variables and arguments

Keeping things simple when it's time to deliver a plain file

See also

Executing a PDI job from the Pentaho User Console

Getting ready

How to do it...

How it works...

There's more...

See also

Generating files from the PUC with PDI and the CDA plugin

Getting ready

How to do it...

How it works...

There's more...

Populating a CDF dashboard with data coming from a PDI transformation

Getting ready

How to do it...

How it works...

There's more...

See also

9. Getting the Most Out of Kettle

Introduction

Sending e-mails with attached files

Getting ready

How to do it...

How it works...

There's more...

Sending logs through an e-mail

Sending e-mails in a transformation

Generating a custom log file

Getting ready

How to do it...

How it works...

There's more...

Filtering the log file

Creating a clean log file

Isolating log files for different jobs or transformations

See also

Programming custom functionality

Getting ready

How to do it...

How it works...

There's more...

Data type equivalence

Generalizing your code

Looking up information with additional steps

Customizing logs

Scripting alternatives to the UDJC step

Generating sample data for testing purposes

How to do it...

How it works...

There's more...

Using the Data Grid step to generate specific data

Working with subsets of your data

See also

Working with JSON files

Getting ready

How to do it...

How it works...

There's more...

Reading JSON files dynamically

Writing JSON files

Getting information about transformations and jobs (file-based)

Getting ready

How to do it...

How it works...

There's more...

Transformation XML nodes

Job XML nodes

Steps and entries information

See also

Getting information about transformations and jobs (repository-based)

Getting ready

How to do it...

How it works...

There's more...

Transformation tables

Job tables

Database connections tables

A. Data Structures

Book's data structure

Books

Authors

Museum's data structure

Museums

Cities

Outdoor data structure

Products

Categories

Steel Wheels structure

Index
