Pentaho Data Integration Cookbook Second Edition
Table of Contents
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Working with Databases
Introduction
Sample databases
Pentaho BI platform databases
Connecting to a database
Getting ready
How to do it...
How it works...
There's more...
Avoiding creating the same database connection over and over again
Avoiding modifying jobs and transformations every time a connection changes
Specifying advanced connection properties
Connecting to a database not supported by Kettle
Checking the database connection at runtime
Getting data from a database
Getting ready
How to do it...
How it works...
There's more...
See also
Getting data from a database by providing parameters
Getting ready
How to do it...
How it works...
There's more...
Parameters coming in more than one row
Executing the SELECT statement several times, each for a different set of parameters
See also
Getting data from a database by running a query built at runtime
Getting ready
How to do it...
How it works...
There's more...
See also
Inserting or updating rows in a table
Getting ready
How to do it...
How it works...
There's more...
Alternative solution if you just want to insert records
Alternative solution if you just want to update rows
Alternative way for inserting and updating
See also
Inserting new rows where a simple primary key has to be generated
Getting ready
How to do it...
How it works...
There's more...
Using the Combination lookup/update for looking up
See also
Inserting new rows where the primary key has to be generated based on stored values
Getting ready
How to do it...
How it works...
There's more...
See also
Deleting data from a table
Getting ready
How to do it...
How it works...
See also
Creating or altering a database table from PDI (design time)
Getting ready
How to do it...
How it works...
There's more...
See also
Creating or altering a database table from PDI (runtime)
How to do it...
How it works...
There's more...
See also
Inserting, deleting, or updating a table depending on a field
Getting ready
How to do it...
How it works...
There's more...
Insert, update, and delete all-in-one
Synchronizing after merge
See also
Changing the database connection at runtime
Getting ready
How to do it...
How it works...
There's more...
See also
Loading a parent-child table
Getting ready
How to do it...
How it works...
See also
Building SQL queries via database metadata
Getting ready
How to do it...
How it works...
See also
Performing repetitive database design tasks from PDI
Getting ready
How to do it...
How it works...
See also
2. Reading and Writing Files
Introduction
Reading a simple file
Getting ready
How to do it...
How it works...
There's more...
Alternative notation for a separator
About file format and encoding
About data types and formats
Altering the names, order, or metadata of the fields coming from the file
Reading files with fixed width fields
Reading several files at the same time
Getting ready
How to do it...
How it works...
There's more...
Reading semi-structured files
Getting ready
How to do it...
How it works...
There's more...
Master/detail files
Logfiles
See also
Reading files having one field per row
Getting ready
How to do it...
How it works...
There's more...
See also
Reading files with some fields occupying two or more rows
Getting ready
How to do it...
How it works...
See also
Writing a simple file
Getting ready
How to do it...
How it works...
There's more...
Changing headers
Giving the output fields a format
Writing a semi-structured file
Getting ready
How to do it...
How it works...
There's more...
Providing the name of a file (for reading or writing) dynamically
Getting ready
How to do it...
How it works...
There's more...
Get System Info
Generating several files simultaneously with the same structure, but different names
Using the name of a file (or part of it) as a field
Getting ready
How to do it...
How it works...
Reading an Excel file
Getting ready
How to do it...
How it works...
See also
Getting the value of specific cells in an Excel file
Getting ready
How to do it...
How it works...
There's more...
Looking for a given cell
Writing an Excel file with several sheets
Getting ready
How to do it...
How it works...
There's more...
See also
Writing an Excel file with a dynamic number of sheets
Getting ready
How to do it...
How it works...
See also
Reading data from an AWS S3 Instance
Getting ready
How to do it...
How it works...
See also
3. Working with Big Data and Cloud Sources
Introduction
Loading data into Salesforce.com
Getting ready
How to do it...
How it works...
See also
Getting data from Salesforce.com
Getting ready
How to do it...
How it works...
See also
Loading data into Hadoop
Getting ready
How to do it...
How it works...
There's more...
See also
Getting data from Hadoop
Getting ready
How to do it...
How it works...
See also
Loading data into HBase
Getting ready
How to do it...
How it works...
There's more...
See also
Getting data from HBase
Getting ready
How to do it...
How it works...
See also
Loading data into MongoDB
Getting ready
How to do it...
How it works...
See also
Getting data from MongoDB
Getting ready
How to do it...
How it works...
See also
4. Manipulating XML Structures
Introduction
Reading simple XML files
Getting ready
How to do it...
How it works...
There's more...
XML data in a field
XML file name in a field
See also
Specifying fields by using the Path notation
Getting ready
How to do it...
How it works...
There's more...
Getting data from a different path
Getting data selectively
Getting more than one node when the nodes share their Path notation
Saving time when specifying Path
Validating well-formed XML files
Getting ready
How to do it...
How it works...
See also
Validating an XML file against DTD definitions
Getting ready
How to do it...
How it works...
There's more...
See also
Validating an XML file against an XSD schema
Getting ready
How to do it...
How it works...
There's more...
See also
Generating a simple XML document
Getting ready
How to do it...
How it works...
There's more...
Generating fields with XML structures
See also
Generating complex XML structures
Getting ready
How to do it...
How it works...
See also
Generating an HTML page using XML and XSL transformations
Getting ready
How to do it...
How it works...
There's more...
See also
Reading an RSS Feed
Getting ready
How to do it...
How it works...
See also
Generating an RSS Feed
Getting ready
How to do it...
How it works...
There's more...
See also
5. File Management
Introduction
Copying or moving one or more files
Getting ready
How to do it...
How it works...
There's more...
Moving files
Detecting the existence of the files before copying them
Creating folders
See also
Deleting one or more files
Getting ready
How to do it...
How it works...
There's more...
Figuring out which files have been deleted
See also
Getting files from a remote server
How to do it...
How it works...
There's more...
Specifying files to transfer
Some considerations about connecting to an FTP server
Access via SFTP
Access via FTPS
Getting information about the files being transferred
See also
Putting files on a remote server
Getting ready
How to do it...
How it works...
There's more...
See also
Copying or moving a custom list of files
Getting ready
How to do it...
How it works...
See also
Deleting a custom list of files
Getting ready
How to do it...
How it works...
See also
Comparing files and folders
Getting ready
How to do it...
How it works...
There's more...
Comparing folders
Working with ZIP files
Getting ready
How to do it...
How it works...
There's more...
Avoiding zipping files
Avoiding unzipping files
See also
Encrypting and decrypting files
Getting ready
How to do it...
How it works...
There's more...
See also
6. Looking for Data
Introduction
Looking for values in a database table
Getting ready
How to do it...
How it works...
There's more...
Taking some action when the lookup fails
Taking some action when there are too many results
Looking for non-existent data
See also
Looking for values in a database with complex conditions
Getting ready
How to do it...
How it works...
There's more...
See also
Looking for values in a database with dynamic queries
Getting ready
How to do it...
How it works...
There's more...
See also
Looking for values in a variety of sources
Getting ready
How to do it...
How it works...
There's more...
Looking for alternatives when the Stream Lookup step doesn't meet your needs
Speeding up your transformation
Using the Value Mapper step for looking up from a short list of values
See also
Looking for values by proximity
Getting ready
How to do it...
How it works...
There's more...
Looking for values by using a web service
Getting ready
How to do it...
How it works...
There's more...
See also
Looking for values over intranet or the Internet
Getting ready
How to do it...
How it works...
There's more...
See also
Validating data at runtime
Getting ready
How to do it...
How it works...
There's more...
See also
7. Understanding and Optimizing Data Flows
Introduction
Splitting a stream into two or more streams based on a condition
Getting ready
How to do it...
How it works...
There's more...
Avoiding the use of Dummy steps
Comparing against the value of a Kettle variable
Avoiding the use of nested Filter rows steps
Overcoming the difficulties of complex conditions
Merging rows of two streams with the same or different structures
Getting ready
How to do it...
How it works...
There's more...
Making sure that the metadata of the streams is the same
Telling Kettle how to merge the rows of your streams
See also
Adding checksums to verify datasets
Getting ready
How to do it...
How it works...
Comparing two streams and generating differences
Getting ready
How to do it...
How it works...
There's more...
Using the differences to keep a table up-to-date
See also
Generating all possible pairs formed from two datasets
How to do it...
How it works...
There's more...
Getting variables in the middle of the stream
Limiting the number of output rows
See also
Joining two or more streams based on given conditions
Getting ready
How to do it...
How it works...
There's more...
See also
Interspersing new rows between existent rows
Getting ready
How to do it...
How it works...
See also
Executing steps even when your stream is empty
Getting ready
How to do it...
How it works...
There's more...
Processing rows differently based on the row number
Getting ready
How to do it...
How it works...
There's more...
Identifying specific rows
Identifying the last row in the stream
Avoiding using an Add sequence step to enumerate the rows
See also
Processing data into shared transformations via filter criteria and subtransformations
Getting ready
How to do it...
How it works...
See also
Altering a data stream with Select values
How to do it...
How it works...
Processing multiple jobs or transformations in parallel
How to do it...
How it works...
See also
8. Executing and Re-using Jobs and Transformations
Introduction
Sample transformations
Sample transformation – hello
Sample transformation – random list
Sample transformation – sequence
Sample transformation – file list
Launching jobs and transformations
How to do it...
How it works...
Executing a job or a transformation by setting static arguments and parameters
Getting ready
How to do it...
How it works...
There's more...
See also
Executing a job or a transformation from a job by setting arguments and parameters dynamically
Getting ready
How to do it...
How it works...
There's more...
See also
Executing a job or a transformation whose name is determined at runtime
Getting ready
How to do it...
How it works...
There's more...
See also
Executing part of a job once for every row in a dataset
Getting ready
How to do it...
How it works...
There's more...
Accessing the copied rows from jobs, transformations, and other entries
Executing a transformation once for every row in a dataset
Executing a transformation or part of a job once for every file in a list of files
See also
Executing part of a job several times until a condition is true
Getting ready
How to do it...
How it works...
There's more...
Implementing loops in a job
Using the JavaScript step to control the execution of the entries in your job
See also
Creating a process flow
Getting ready
How to do it...
How it works...
There's more...
Serializing/De-serializing data
Other means for transferring or sharing data between transformations
Moving part of a transformation to a subtransformation
Getting ready
How to do it...
How it works...
There's more...
Using Metadata Injection to re-use transformations
Getting ready
How to do it...
How it works...
There's more...
9. Integrating Kettle and the Pentaho Suite
Introduction
A sample transformation
Creating a Pentaho report with data coming from PDI
Getting ready
How to do it...
How it works...
There's more...
Creating a Pentaho report directly from PDI
Getting ready
How to do it...
How it works...
There's more...
See also
Configuring the Pentaho BI Server for running PDI jobs and transformations
Getting ready
How to do it...
How it works...
There's more...
See also
Executing a PDI transformation as part of a Pentaho process
Getting ready
How to do it...
How it works...
There's more...
Specifying the location of the transformation
Supplying values for named parameters, variables and arguments
Keeping things simple when it's time to deliver a plain file
See also
Executing a PDI job from the Pentaho User Console
Getting ready
How to do it...
How it works...
There's more...
See also
Generating files from the PUC with PDI and the CDA plugin
Getting ready
How to do it...
How it works...
There's more...
Populating a CDF dashboard with data coming from a PDI transformation
Getting ready
How to do it...
How it works...
There's more...
10. Getting the Most Out of Kettle
Introduction
Sending e-mails with attached files
Getting ready
How to do it...
How it works...
There's more...
Sending logs through an e-mail
Sending e-mails in a transformation
Generating a custom logfile
Getting ready
How to do it...
How it works...
There's more...
Filtering the logfile
Creating a clean logfile
Isolating logfiles for different jobs or transformations
See also
Running commands on another server
Getting ready
How to do it...
How it works...
See also
Programming custom functionality
Getting ready
How to do it...
How it works...
There's more...
Data type equivalence
Generalizing your UDJC code
Looking up information with additional steps
Customizing logs
Scripting alternatives to the UDJC step
Generating sample data for testing purposes
How to do it...
How it works...
There's more...
Using a Data grid step to generate specific data
Working with subsets of your data
See also
Working with JSON files
Getting ready
How to do it...
How it works...
There's more...
Reading JSON files dynamically
Writing JSON files
Getting information about transformations and jobs (file-based)
Getting ready
How to do it...
How it works...
There's more...
Job XML nodes
Steps and entries information
See also
Getting information about transformations and jobs (repository-based)
Getting ready
How to do it...
How it works...
There's more...
Transformation tables
Job tables
Database connections tables
Using Spoon's built-in optimization tools
Getting ready
How to do it...
How it works...
There's more...
11. Utilizing Visualization Tools in Kettle
Introduction
Managing plugins with the Marketplace
Getting ready
How to do it...
How it works...
There's more...
See also
Data profiling with DataCleaner
Getting ready
How to do it...
How it works...
There's more...
See also
Visualizing data with AgileBI
Getting ready
How to do it...
How it works...
There's more...
See also
Using Instaview to analyze and visualize data
Getting ready
How to do it...
How it works...
There's more...
See also
12. Data Analytics
Introduction
Reading data from a SAS datafile
Why read a SAS file?
Getting ready
How to do it...
How it works...
See also
Studying data via stream statistics
Getting ready
How to do it...
How it works...
See also
Building a random data sample for Weka
Getting ready
How to do it...
How it works...
There's more...
See also
A. Data Structures
Books data structure
Books
Authors
museums data structure
museums
cities
outdoor data structure
products
categories
Steel Wheels data structure
Lahman Baseball Database
B. References
Books
Online
Index