Overview
| Course code | 1I002 | Skill level | Basic |
|---|---|---|---|
| Duration | 5.0 hours | Delivery type | Web Based Training |
| Course type | Public only | | |
| Public price | USD $340.00 plus tax | | |
Note: This is a self-paced online course. The Guided Tour Product Simulation requires no installation; everything runs in your IE 5.0+ browser. Please DO NOT make travel arrangements for this course. After you receive confirmation that you are registered, just follow the instructions to access the course.
This browser-based product simulation lets you "learn by doing" in a case-study scenario in which you solve a real-life business problem. No installation is needed, just a browser and a passion to learn.
By the end of this training you'll have a better understanding of this InfoSphere product's terminology, an overview of its architecture, and a better understanding of how it can be implemented.
You can learn more about the FlexLearning library:
http://www-304.ibm.com/jct03001c/services/learning/ites.wss/us/en?pageType=page&c=a0011797
USA IBM Training Registrar:
e-mail: iiseduc@us.ibm.com
Before the Guided Tour Product Simulation begins, this course will start by talking about the intent of the tutorial. Then we'll attempt to answer the question "What is data analysis?". We'll discuss some use cases and then we'll discuss what it means to understand information (from a process standpoint). In doing so, we'll try to answer questions such as "What is it that you are trying to understand?", "What are the core scenarios for understanding data?", "What are you looking at when you go through source system analysis?", "What are you trying to discover?", and "What are you trying to deliver?".
We'll then look at basic practices in detail. Specifically, we'll discuss the integrity of data. We'll do this by assessing and analyzing:
Metadata Integrity (How well the descriptions of the data can be understood)
Domain Integrity (Looking at discrete chunks of data, where Name, Credit Rating, or Creation Date are examples of domains of data, and then assessing whether the elements within a given domain are well understood, complete, and valid.) This section will discuss in detail the seven data classifications inferred by Information Analyzer:
- Identifiers
- Indicators
- Codes
- Quantifiers
- Dates/Times
- Text
- Unknowns
Structural Integrity (Looking at how well-defined a key is: looking at a table or a file and finding clear identification in it, and potentially across tables, thus ensuring no duplicated information.) Here, we'll also look at how well-defined pieces of data are with regard to length, data type, and other things that might affect ETL data processes.
Relational Integrity (Looking at two things. First, asking "How well do keys support each other, and are key relationships maintained across tables?", which is known as referential integrity. Second, the consistency and potential redundancy of information from one source to another. For example, if you have a State Code in one table, is it consistent with State Codes in other tables: do they have the same abbreviation, the same set of values, and the same format?)
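To make the integrity levels above concrete, here is a minimal Python sketch of three such checks: completeness of a domain, uniqueness of a key, and referential consistency of a State Code between two tables. It is an illustration only, not how Information Analyzer works. The function names, the column names (`customer_id`, `name`, `state_code`), and the sample rows are all hypothetical.

```python
from collections import Counter


def completeness(values):
    """Domain integrity: fraction of values that are neither None nor blank."""
    if not values:
        return 0.0
    filled = [v for v in values if v is not None and str(v).strip() != ""]
    return len(filled) / len(values)


def duplicate_keys(rows, key):
    """Structural integrity: key values that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def orphan_values(child_rows, child_col, parent_rows, parent_col):
    """Relational integrity: child values with no matching parent value."""
    parent_values = {row[parent_col] for row in parent_rows}
    child_values = {row[child_col] for row in child_rows}
    return sorted(child_values - parent_values)


# Hypothetical sample data, invented for this illustration.
states = [{"state_code": "NY"}, {"state_code": "CA"}, {"state_code": "TX"}]
customers = [
    {"customer_id": 1, "name": "Ann", "state_code": "NY"},
    {"customer_id": 2, "name": "", "state_code": "CA"},
    {"customer_id": 2, "name": "Bob", "state_code": "Calif."},
    {"customer_id": 3, "name": None, "state_code": "TX"},
]

name_completeness = completeness([c["name"] for c in customers])
dup_ids = duplicate_keys(customers, "customer_id")
orphans = orphan_values(customers, "state_code", states, "state_code")

print("Name completeness:", name_completeness)
print("Duplicate customer_id values:", dup_ids)
print("State codes missing from the states table:", orphans)
```

In this sample, half of the Name values are blank or missing, `customer_id` 2 appears twice, and the State Code "Calif." has no match in the reference table. These are the kinds of findings a profiling exercise would flag for follow-up.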
Finally, we'll wrap things up by talking about next steps. For example, once some analysis has been performed, how do you proceed? What additional analytic techniques can be brought to bear? From a project perspective, what best practices are available to you from the IPS Services group at IBM? Then we'll talk about how analysis might tie into a data integration project's life cycle, and how you might treat things from a broader project perspective within a methodology framework.
This tutorial is not going to teach you how to use the IBM Information Analyzer product. Instead, the intent of this tutorial is to teach you basic data profiling practices and show you how to go about understanding information through data profiling and data analysis.
Audience
This basic course is for:
- Business Analysts
- Data Analysts
- ETL Developers
- Database Administrators
- Data Architects
- Integration/Migration Team Members
- Project Leads and Managers
- IBM Information Analyzer Users
- Data Stewards trying to better understand data
- IT Professionals determining optimization of system resources
- Managers interested in data compliance and governance
Prerequisites
You should have:
- Familiarity with relational databases (Understand basic terminology, including the idea of a table and a column of information, or alternatively a file and a field of data. If it's a database, understand that it has some sort of key or identifier, and that keys may relate multiple tables to one another.)
- Some exposure to IBM Information Analyzer (IA), its navigation, and its procedural flow (This training focuses on what to do with the results of a previous profiling of the data). You might have gained this exposure by taking the IA FlexLearning course (entitled "Introduction to Information Analyzer"), by taking an Instructor Led Training course, or by reviewing the documentation.
Familiarity with SQL and familiarity with programming tools are not required.
Skills taught
- Understand the basic scenarios for data profiling
- Understand how to analyze data at several levels
- Determine the integrity of data
- Use Information Analyzer to learn by doing
Course outline
- Beginning Data Analysis
- Metadata Integrity
- Domain Integrity
- Data Classification
- Structural Integrity
- Relational Integrity
- Analysis in the Project Lifecycle
- Performing all of the above in a Guided Tour Product Simulation