Example: Determining what data the operational data is derived from

You can use queries and the SQL views to find out where your operational data comes from, which puts operational data, such as data rule execution results, into the context of the information assets this data has been derived from.

Computing statistics from facts requires a clear understanding about what business data the operational data is derived from. In the context of data rules, this business data equates to information such as what database, table, or column the data rule processed to obtain the results that are shown in the fact table.

IBM® InfoSphere® Information Governance Catalog represents database artifacts as assets of type implemented data resources. The model fragment that represents these assets is depicted in the following graphic:

Database assets model fragment

The PDRDATABASESCHEMA-PDRDATABASE-PDRHOSTSYSTEM paths are the same for the PDRDATABASETABLE and PDRDATABASEVIEW views. This model fragment shows only the first path completely.

You can add this dimension to the dimensional model in one of two ways:

  • By using the data rule binding.
    • This method is based on how the data rules are defined and is always present in any environment in which data rules are used.
    • You might prefer to use this method if the governs relationship is not used.
  • By using the governed asset relationship at the information governance rule level.
    • This method requires you to create the governs relationship in InfoSphere Information Governance Catalog.
    • This method provides a more flexible way to define the scope of a data rule.

If both methods are available, some additional consistency checking might be helpful to ensure that the scope assignments are consistent.

The following model fragment shows how the hierarchy of implemented data resources connects with the fact table to implement the first option. This model fragment restricts the dimension to assets of type table.

Model fragment that demonstrates the hierarchy of implemented data resources that connect with the fact table

The following query lists implemented data resources that are bound to data rules. A SELECT DISTINCT statement is used as a simple way to aggregate related facts. In a real usage scenario, facts would typically be aggregated using computations, such as average or maximum.

SELECT DISTINCT
   datarule_dim.NAME DataRule,
   binding.VARNAME Variable,
   dbcol.NAME Column,
   dbtab.NAME Table,
   dbschema.NAME Schema,
   dbase.NAME Database,
   dbhost.NAME Host
   
FROM IAVIEWS.IARULEEXECUTIONHISTORY fact
   INNER JOIN IAVIEWS.IARULE datarule_dim
      ON datarule_dim.RID = fact.RULERID
   INNER JOIN IAVIEWS.IARULEBINDING binding
      ON binding.RULERID = fact.RULERID
   INNER JOIN IAVIEWS.IAREGISTEREDCOLUMN bridge1
      ON bridge1.RID = binding.TARGETRID
   INNER JOIN CMVIEWS.PDRDATABASECOLUMN dbcol
      ON dbcol.RID = bridge1.DATAFIELDRID
   INNER JOIN CMVIEWS.PDRDATABASETABLE dbtab
      ON dbtab.RID = dbcol.OFDATABASETABLERID
   INNER JOIN CMVIEWS.PDRDATABASESCHEMA dbschema
      ON dbschema.RID = dbtab.OFDATASCHEMARID
   INNER JOIN CMVIEWS.PDRDATABASE dbase
      ON dbase.RID = dbschema.OFDATABASERID
   INNER JOIN CMVIEWS.PDRHOSTSYSTEM dbhost
      ON dbhost.RID = dbase.HOSTEDBYRID
   
ORDER BY
   datarule_dim.NAME

The results from running this query might look like the data in the following table.

DataRule Variable Column Table Schema Database Host
BusAddZipCodeFormat datavalue POSTALCODE AZ_ADDRESS SAMPLE MDMIASAM Host1
BusNameOneExists datavalue GIVENNAME1 AZ_NAME SAMPLE MDMIASAM Host1
DataRule2_1 config_parameter PARAMETER SETUP_PARAMETER IGDCONFG XMETA Host1
DataRule2_1 config_value VALUE SETUP_PARAMETER IGDCONFG XMETA Host1
IGDConfigCheck_DR parameter_value VALUE SETUP_PARAMETER IGDCONFG XMETA Host1
MsgLocaleExistenceCheck_DR locale_val LOCALE MSG_LIB IGDCONFG XMETA Host1

The following model fragment shows how the hierarchy of implemented data resources connects with the information governance rule dimension by using the governed asset relationship at the information governance rule level.

How the hierarchy of implemented data resources connects with the information governance rule dimension

The corresponding query to this model fragment looks like the following example. A SELECT DISTINCT statement is used as a simple way to aggregate related facts. In a real usage scenario, facts would typically be aggregated using computations, such as average or maximum.

SELECT DISTINCT
   datarule_dim.NAME DataRule,
   dbtab.NAME Table,
   dbschema.NAME Schema,
   dbase.NAME Database,
   dbhost.NAME Host
   
FROM IAVIEWS.IARULEEXECUTIONHISTORY fact
   INNER JOIN IAVIEWS.IARULE datarule_dim
      ON datarule_dim.RID = fact.RULERID
   INNER JOIN IGVIEWS.IGIMPLEMENTEDBYIARULE bridge1
      ON bridge1.IMPLEMENTATIONRULERID = fact.RULERID
   INNER JOIN IGVIEWS.IGBUSINESSRULE igrule_dim
      ON igrule_dim.RID = bridge1.BUSINESSRULERID
   INNER JOIN IGVIEWS.IGGOVERNEDOBJECTSOFARULE bridge2
      ON bridge2.BUSINESSRULERID = igrule_dim.RID
   INNER JOIN CMVIEWS.PDRDATABASETABLE dbtab
      ON dbtab.RID =  bridge2.GOVERNEDOBJECTRID
   INNER JOIN CMVIEWS.PDRDATABASESCHEMA dbschema
      ON dbschema.RID = dbtab.OFDATASCHEMARID
   INNER JOIN CMVIEWS.PDRDATABASE dbase
      ON dbase.RID = dbschema.OFDATABASERID
   INNER JOIN CMVIEWS.PDRHOSTSYSTEM dbhost
      ON dbhost.RID = dbase.HOSTEDBYRID
   
ORDER BY
   datarule_dim.NAME

The results from running this query might look like the data in the following table.

DataRule Variable Table Schema Database Host
BusAddZipCodeFormat datavalue AZ_ADDRESS SAMPLE MDMIASAM Host1
BusAddZipCodeFormat datavalue AZ_PHONE SAMPLE MDMIASAM Host1
BusAddZipCodeFormat datavalue AZ_POSTCODE_REF SAMPLE MDMIASAM Host1
BusNameOneExists datavalue AZ_NAME SAMPLE MDMIASAM Host1
BusNameOneExists datavalue AZ_NAME_REF SAMPLE MDMIASAM Host1
DataRule2_1 config_parameter SETUP_PARAMETER IGDCONFG XMETA Host1
DataRule2_1 config_value SETUP_PARAMETER IGDCONFG XMETA Host1
IGDConfigCheck_DR parameter_value SETUP_PARAMETER IGDCONFG XMETA Host1

A typical application of the technical scope dimension is to ensure the correct aggregation of facts. A query that aggregates the number of records for a dimension, such as by policy, might lead to improper results because a policy might have multiple data rules assigned. The fact that a data rule can cover multiple database objects can also contribute to improper results. Including the appropriate level of the database asset hierarchy helps to ensure that values of the NumRecordsProcessed fact contribute only to the average for the appropriate database asset.