IBM InfoSphere DataStage v11.5 – Advanced Data Processing (KM423G) – Outline

Detailed Course Outline

Unit 1 – Accessing databases
Topic 1: Connector stage overview
• Use Connector stages to read from and write to relational tables
• Working with the Connector stage properties
Topic 2: Connector stage functionality
• Before / After SQL
• Sparse lookups
• Optimize insert/update performance
Topic 3: Error handling in Connector stages
• Reject links
• Reject conditions
Topic 4: Multiple input links
• Designing jobs using Connector stages with multiple input links
• Ordering records across multiple input links
Topic 5: File Connector stage
• Read and write data to Hadoop file systems
Demonstration 1: Handling database errors
Demonstration 2: Parallel jobs with multiple Connector input links
Demonstration 3: Using the File Connector stage to read and write HDFS files

Unit 2 – Processing unstructured data
Topic 1: Using the Unstructured Data stage in DataStage jobs
• Extract data from an Excel spreadsheet
• Specify a data range for data extraction in an Unstructured Data stage
• Specify document properties for data extraction
Demonstration 1: Processing unstructured data

Unit 3 – Data masking
Topic 1: Using the Data Masking stage in DataStage jobs
• Data masking techniques
• Data masking policies
• Applying policies for masquerading context-aware data types
• Applying policies for masquerading generic data types
• Repeatable replacement
• Using reference tables
• Creating custom reference tables
Demonstration 1: Data masking

Unit 4 – Using data rules
Topic 1: Introduction to data rules
• Using the Data Rules Editor
• Selecting data rules
• Binding data rule variables
• Output link constraints
• Adding statistics and attributes to the output information
Topic 2: Use the Data Rules stage to validate foreign key references in source data
Topic 3: Create custom data rules
Demonstration 1: Using data rules

Unit 5 – Processing XML data
Topic 1: Introduction to the Hierarchical stage
• Hierarchical stage Assembly editor
• Use the Schema Library Manager to import and manage XML schemas
Topic 2: Composing XML data
• Using the HJoin step to create parent-child relationships between input lists
• Using the Composer step
Topic 3: Writing hierarchical data to a relational table
Topic 4: Using the Regroup step
Topic 5: Consuming XML data
• Using the XML Parser step
• Propagating columns
Topic 6: Transforming XML data
• Using the Aggregate step
• Using the Sort step
• Using the Switch step
• Using the H-Pivot step
Demonstration 1: Importing XML schemas
Demonstration 2: Compose hierarchical data
Demonstration 3: Consume hierarchical data
Demonstration 4: Transform hierarchical data

Unit 6 – Updating a star schema database
Topic 1: Surrogate keys
• Design a job that creates and updates a surrogate key source key file from a dimension table
Topic 2: Slowly Changing Dimensions (SCD) stage
• Star schema databases
• SCD stage Fast Path pages
• Specifying purpose codes
• Dimension update specification
• Design a job that processes a star schema database with Type 1 and Type 2 slowly changing dimensions
Demonstration 1: Build a parallel job that updates a star schema database with two dimensions