KM423G Outline - IBM InfoSphere DataStage v11.5 - Advanced Data Processing

Detailed Course Outline

Unit 1 – Accessing databases
Topic 1: Connector stage overview
• Use Connector stages to read from and write to relational tables
• Working with the Connector stage properties
Topic 2: Connector stage functionality
• Before / After SQL
• Sparse lookups
• Optimize insert/update performance
Topic 3: Error handling in Connector stages
• Reject links
• Reject conditions
Topic 4: Multiple input links
• Designing jobs using Connector stages with multiple input links
• Ordering records across multiple input links
Topic 5: File Connector stage
• Read and write data to Hadoop file systems
Demonstration 1: Handling database errors
Demonstration 2: Parallel jobs with multiple Connector input links
Demonstration 3: Using the File Connector stage to read and write HDFS files

Unit 2 – Processing unstructured data
Topic 1: Using the Unstructured Data stage in DataStage jobs
• Extract data from an Excel spreadsheet
• Specify a data range for data extraction in an Unstructured Data stage
• Specify document properties for data extraction
Demonstration 1: Processing unstructured data

Unit 3 – Data masking
Topic 1: Using the Data Masking stage in DataStage jobs
• Data masking techniques
• Data masking policies
• Applying policies for masquerading context-aware data types
• Applying policies for masquerading generic data types
• Repeatable replacement
• Using reference tables
• Creating custom reference tables
Demonstration 1: Data masking

Unit 4 – Using data rules
Topic 1: Introduction to data rules
• Using the Data Rules Editor
• Selecting data rules
• Binding data rule variables
• Output link constraints
• Adding statistics and attributes to the output information
Topic 2: Use the Data Rules stage to validate foreign key references in source data
Topic 3: Create custom data rules
Demonstration 1: Using data rules

Unit 5 – Processing XML data
Topic 1: Introduction to the Hierarchical stage
• Hierarchical stage Assembly editor
• Use the Schema Library Manager to import and manage XML schemas
Topic 2: Composing XML data
• Using the HJoin step to create parent-child relationships between input lists
• Using the Composer step
Topic 3: Writing Hierarchical data to a relational table
Topic 4: Using the Regroup step
Topic 5: Consuming XML data
• Using the XML Parser step
• Propagating columns
Topic 6: Transforming XML data
• Using the Aggregate step
• Using the Sort step
• Using the Switch step
• Using the H-Pivot step
Demonstration 1: Importing XML schemas
Demonstration 2: Compose hierarchical data
Demonstration 3: Consume hierarchical data
Demonstration 4: Transform hierarchical data

Unit 6 – Updating a star schema database
Topic 1: Surrogate keys
• Design a job that creates and updates a surrogate key source key file from a dimension table
Topic 2: Slowly Changing Dimensions (SCD) stage
• Star schema databases
• SCD stage Fast Path pages
• Specifying purpose codes
• Dimension update specification
• Design a job that processes a star schema database with Type 1 and Type 2 slowly changing dimensions
Demonstration 1: Build a parallel job that updates a star schema database with two dimensions
