Example: Determining what data the operational data is derived from
You can use queries and the SQL views to find out where your operational data comes from, which puts operational data, such as data rule execution results, into the context of the information assets this data has been derived from.
Computing statistics from facts requires a clear understanding about what business data the operational data is derived from. In the context of data rules, this business data equates to information such as what database, table, or column the data rule processed to obtain the results that are shown in the fact table.
IBM® InfoSphere® Information Governance Catalog represents database artifacts as assets of type implemented data resources. The model fragment that represents these assets is depicted in the following graphic:
The PDRDATABASESCHEMA-PDRDATABASE-PDRHOSTSYSTEM paths are the same for the PDRDATABASETABLE and PDRDATABASEVIEW views. This model fragment shows only the first path completely.
You can add this dimension to the dimensional model in one of two ways:
- By using the data rule binding.
- This method is based on how the data rules are defined and is always present in any environment in which data rules are used.
- You might prefer to use this method if the governs relationship is not used.
- By using the governed asset relationship at the information governance
rule level.
- This method requires you to create the governs relationship in InfoSphere Information Governance Catalog.
- This method provides a more flexible way to define the scope of a data rule.
If both methods are available, some additional consistency checking might be helpful to ensure that the scope assignments are consistent.
The following model fragment shows how the hierarchy of implemented data resources connects with the fact table to implement the first option. This model fragment restricts the dimension to assets of type table.
The following query lists implemented data resources that are bound to data rules. A SELECT DISTINCT statement is used as a simple way to aggregate related facts. In a real usage scenario, facts would typically be aggregated using computations, such as average or maximum.
SELECT DISTINCT
datarule_dim.NAME DataRule,
binding.VARNAME Variable,
dbcol.NAME Column,
dbtab.NAME Table,
dbschema.NAME Schema,
dbase.NAME Database,
dbhost.NAME Host
FROM IAVIEWS.IARULEEXECUTIONHISTORY fact
INNER JOIN IAVIEWS.IARULE datarule_dim
ON datarule_dim.RID = fact.RULERID
INNER JOIN IAVIEWS.IARULEBINDING binding
ON binding.RULERID = fact.RULERID
INNER JOIN IAVIEWS.IAREGISTEREDCOLUMN bridge1
ON bridge1.RID = binding.TARGETRID
INNER JOIN CMVIEWS.PDRDATABASECOLUMN dbcol
ON dbcol.RID = bridge1.DATAFIELDRID
INNER JOIN CMVIEWS.PDRDATABASETABLE dbtab
ON dbtab.RID = dbcol.OFDATABASETABLERID
INNER JOIN CMVIEWS.PDRDATABASESCHEMA dbschema
ON dbschema.RID = dbtab.OFDATASCHEMARID
INNER JOIN CMVIEWS.PDRDATABASE dbase
ON dbase.RID = dbschema.OFDATABASERID
INNER JOIN CMVIEWS.PDRHOSTSYSTEM dbhost
ON dbhost.RID = dbase.HOSTEDBYRID
ORDER BY
datarule_dim.NAME
The results from running this query might look like the data in the following table.
DataRule | Variable | Column | Table | Schema | Database | Host |
---|---|---|---|---|---|---|
BusAddZipCodeFormat | datavalue | POSTALCODE | AZ_ADDRESS | SAMPLE | MDMIASAM | Host1 |
BusNameOneExists | datavalue | GIVENNAME1 | AZ_NAME | SAMPLE | MDMIASAM | Host1 |
DataRule2_1 | config_parameter | PARAMETER | SETUP_PARAMETER | IGDCONFG | XMETA | Host1 |
DataRule2_1 | config_value | VALUE | SETUP_PARAMETER | IGDCONFG | XMETA | Host1 |
IGDConfigCheck_DR | parameter_value | VALUE | SETUP_PARAMETER | IGDCONFG | XMETA | Host1 |
MsgLocaleExistenceCheck_DR | locale_val | LOCALE | MSG_LIB | IGDCONFG | XMETA | Host1 |
The following model fragment shows how the hierarchy of implemented data resources connects with the information governance rule dimension by using the governed asset relationship at the information governance rule level.
The corresponding query to this model fragment looks like the following example. A SELECT DISTINCT statement is used as a simple way to aggregate related facts. In a real usage scenario, facts would typically be aggregated using computations, such as average or maximum.
SELECT DISTINCT
datarule_dim.NAME DataRule,
dbtab.NAME Table,
dbschema.NAME Schema,
dbase.NAME Database,
dbhost.NAME Host
FROM IAVIEWS.IARULEEXECUTIONHISTORY fact
INNER JOIN IAVIEWS.IARULE datarule_dim
ON datarule_dim.RID = fact.RULERID
INNER JOIN IGVIEWS.IGIMPLEMENTEDBYIARULE bridge1
ON bridge1.IMPLEMENTATIONRULERID = fact.RULERID
INNER JOIN IGVIEWS.IGBUSINESSRULE igrule_dim
ON igrule_dim.RID = bridge1.BUSINESSRULERID
INNER JOIN IGVIEWS.IGGOVERNEDOBJECTSOFARULE bridge2
ON bridge2.BUSINESSRULERID = igrule_dim.RID
INNER JOIN CMVIEWS.PDRDATABASETABLE dbtab
ON dbtab.RID = bridge2.GOVERNEDOBJECTRID
INNER JOIN CMVIEWS.PDRDATABASESCHEMA dbschema
ON dbschema.RID = dbtab.OFDATASCHEMARID
INNER JOIN CMVIEWS.PDRDATABASE dbase
ON dbase.RID = dbschema.OFDATABASERID
INNER JOIN CMVIEWS.PDRHOSTSYSTEM dbhost
ON dbhost.RID = dbase.HOSTEDBYRID
ORDER BY
datarule_dim.NAME
The results from running this query might look like the data in the following table.
DataRule | Variable | Table | Schema | Database | Host |
---|---|---|---|---|---|
BusAddZipCodeFormat | datavalue | AZ_ADDRESS | SAMPLE | MDMIASAM | Host1 |
BusAddZipCodeFormat | datavalue | AZ_PHONE | SAMPLE | MDMIASAM | Host1 |
BusAddZipCodeFormat | datavalue | AZ_POSTCODE_REF | SAMPLE | MDMIASAM | Host1 |
BusNameOneExists | datavalue | AZ_NAME | SAMPLE | MDMIASAM | Host1 |
BusNameOneExists | datavalue | AZ_NAME_REF | SAMPLE | MDMIASAM | Host1 |
DataRule2_1 | config_parameter | SETUP_PARAMETER | IGDCONFG | XMETA | Host1 |
DataRule2_1 | config_value | SETUP_PARAMETER | IGDCONFG | XMETA | Host1 |
IGDConfigCheck_DR | parameter_value | SETUP_PARAMETER | IGDCONFG | XMETA | Host1 |
A typical application of the technical scope dimension is to ensure the correct aggregation of facts. A query that aggregates the number of records for a dimension, such as by policy, might lead to improper results because a policy might have multiple data rules assigned. The fact that a data rule can cover multiple database objects can also contribute to improper results. Including the appropriate level of the database asset hierarchy helps to ensure that values of the NumRecordsProcessed fact contribute only to the average for the appropriate database asset.