Schema file elements for getting the results of a column analysis

The XML Schema Definition (XSD) file defines the elements of the XML document that is returned when you request the results of a column analysis. This topic describes those elements.

Name of the XSD file where the elements are defined

iaapi.xsd, which is in the following locations:
  • ASBServer/docs/IA/schema (server)
  • ASBNode/docs/IA/schema (client)

Commands that use these elements

GET columnAnalysis/results

Example of an XML document returned by the GET columnAnalysis/results command:
<?xml version="1.0" encoding="UTF-8"?>
<iaapi:Project xmlns:iaapi="http://www.ibm.com/investigate/api/iaapi" 
   name="project1">
  <DataSources>
    <DataSource name="SOURCE1">
      <Schema name="SCHEMA1">
        <Table name="TABLE1">
          <Column name="COL1">
            <ColumnAnalysisResults>
              <RuntimeMetadata sampleUsed="true" whereClause="GENDER='M'"
                      analysisDate="2010-02-09T18:56:31+01:00" runTime="120">
                <SampleOptions type="RANDOM" percent="0.10" size="100"/>
              </RuntimeMetadata>
              <Cardinality count="1000" percent="1.0"
                      inferredCardinalityType="UNIQUE"
                      maxValue="1900-01-01 23:41:00.000000"
                      minValue="1900-01-01 00:01:00.000000"
                      definedCardinalityType="NOT_CONSTRAINED"
                      selectedCardinalityType="UNIQUE"
                      sequence="4"
                      totalRows="1000" />
              <DataType definedType="STRING" definedLength="128"
                      inferredType="STRING" inferredLength="32"
                      selectedType="STRING" selectedLength="32"
                      definedNullability="true" inferredNullability="true"
                      selectedNullability="true"
                      definedIsEmpty="false" inferredIsEmpty="false"
                      selectedIsEmpty="false"
                      inferredIsConstant="false" definedIsConstant="false"/>
              <DataClass classificationDate="2010-02-09T18:56:31+01:00"
                      classificationStatus="REVIEWED"
                      inferredClass="Code"
                      selectedClass="Code"/>
              <Format analysisDate="2010-02-09T18:56:31+01:00"
                      analysisStatus="REVIEWED"
                      generalFormat="AAAA" generalFormatPercent="0.95"/>
              <CompletenessAnalysis analysisDate="2010-02-09T18:56:31+01:00"
                      analysisStatus="REVIEWED"/>
              <DomainAnalysis analysisDate="2010-02-09T18:56:31+01:00"
                      analysisStatus="REVIEWED"/>
              <Notes>
                <Note status="Opened" subject="Column Notes"
                      textContent="This is notes for the column" type="Action"/>
              </Notes>
              <Terms>
                <Term name="Category1/Term1"/>
                <Term name="Category1/Term2"/>
              </Terms>
            </ColumnAnalysisResults>
          </Column>
          <Column name="COL2">
            <ColumnAnalysisResults>
              (…)
            </ColumnAnalysisResults>
          </Column>
        </Table>
      </Schema>
    </DataSource>
  </DataSources>
</iaapi:Project>
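
A returned document of this shape can be read with any XML parser. The following is a minimal sketch in Python that uses only the standard-library xml.etree.ElementTree module. The function name list_column_cardinalities and the abbreviated sample document are illustrative assumptions, not part of the API. In the example above only the root element carries the iaapi prefix, so the sketch looks up the child elements by their unqualified names.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample with the same shape as the example document
# (hypothetical content, trimmed to the Cardinality element).
SAMPLE = """<iaapi:Project xmlns:iaapi="http://www.ibm.com/investigate/api/iaapi" name="project1">
  <DataSources>
    <DataSource name="SOURCE1">
      <Schema name="SCHEMA1">
        <Table name="TABLE1">
          <Column name="COL1">
            <ColumnAnalysisResults>
              <Cardinality count="1000" percent="1.0" totalRows="1000"
                           inferredCardinalityType="UNIQUE"/>
            </ColumnAnalysisResults>
          </Column>
          <Column name="COL2"/>
        </Table>
      </Schema>
    </DataSource>
  </DataSources>
</iaapi:Project>"""


def list_column_cardinalities(xml_text):
    """Map 'source.schema.table.column' to the distinct-value count.

    Columns without a <Cardinality> element map to None.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for source in root.iter("DataSource"):
        for schema in source.iter("Schema"):
            for table in schema.iter("Table"):
                for column in table.iter("Column"):
                    path = (source, schema, table, column)
                    name = ".".join(e.get("name") for e in path)
                    card = column.find("ColumnAnalysisResults/Cardinality")
                    result[name] = int(card.get("count")) if card is not None else None
    return result


cardinalities = list_column_cardinalities(SAMPLE)
print(cardinalities)
```

Because <ColumnAnalysisResults> and its children are optional (cardinality 0 or 1), the sketch returns None instead of failing when a column has no results.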

XSD file elements

<Project name="...">
Specifies the name of the project.

The following table shows the attributes of the <Project> element:

Table 1. Attributes of the <Project> element
Attribute Description
name The name of the project

The following table shows the children of the <Project> element:

Table 2. Children of the <Project> element
Element Cardinality Description
<DataSources> 0 or 1 A list of data sources that are registered for the project
<Column name="...">
Specifies a physical column to register in a physical or virtual table that is defined in a project.

The following table shows the attributes of the <Column> element:

Table 3. Attributes of the <Column> element
Attribute Description
name The name of the column

The following table shows the children of the <Column> element:

Table 4. Children of the <Column> element
Element Cardinality Description
<ColumnAnalysisResults> 0 or 1 The results of the column analysis for this column

The following table shows the children of the <ColumnAnalysisResults> element:

Table 5. Children of the <ColumnAnalysisResults> element
Element Cardinality Description
<RuntimeMetadata> 0 or 1 Specifies the elements that were used during the analysis
<Cardinality> 0 or 1 Specifies the number of distinct values that are found and the percentage of distinct values in the total number of records
<DataType> 0 or 1 Shows the defined, inferred, and selected data types
<DataClass> 0 or 1 Specifies the inferred and selected data classes
<Format> 0 or 1 Specifies the general information about the format analysis
<CompletenessAnalysis> 0 or 1 Specifies the date and status of the completeness analysis for this column
<DomainAnalysis> 0 or 1 Specifies the date and status of the domain analysis for this column
<FrequencyAnalysis> 0 or 1 Specifies the date and status of the frequency analysis for this column
<Notes> 0 or 1 Specifies any notes or annotations associated with the column.
<Terms> 0 or 1 Specifies the terms that are associated with the specified column. Includes the parent category for each term.
<Cardinality>
Specifies the number of distinct values in a column.

The following table shows the attributes of the <Cardinality> element:

Table 6. Attributes of the <Cardinality> element
Attribute Description
count The number of distinct values in a column
percent The total number of distinct values in a column divided by the total number of values in the same column
inferredCardinalityType The inferred cardinality type, based on the actual data type of the column. Possible values are:
  • unique_and_constant
  • unique
  • constant
  • not_constrained
maxValue Maximum value in the column
minValue Minimum value in the column
definedCardinalityType The defined cardinality type, as defined in the data source. Possible values are:
  • unique_and_constant
  • unique
  • constant
selectedCardinalityType The cardinality type selected for this column. Possible values are:
  • unique_and_constant
  • unique
  • constant
sequence Serial number of the column in a table
totalRows Total number of rows in the column
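
As a rough illustration of how count, percent, and totalRows relate, the following Python sketch recomputes the ratio from a <Cardinality> element and compares it with the reported percent attribute. The attribute values are copied from the example document in this topic. The helper name distinct_ratio is hypothetical, and the sketch assumes that percent is expressed as a fraction between 0 and 1 and that totalRows equals the total number of values in the column.

```python
import xml.etree.ElementTree as ET

# Attribute values taken from the example document in this topic.
CARDINALITY = ET.fromstring(
    '<Cardinality count="1000" percent="1.0" totalRows="1000" '
    'inferredCardinalityType="UNIQUE"/>'
)


def distinct_ratio(cardinality):
    """Recompute count / totalRows (assumed to be a fraction in 0..1)."""
    count = int(cardinality.get("count"))
    total = int(cardinality.get("totalRows"))
    # Guard against an empty column to avoid division by zero.
    return count / total if total else 0.0


ratio = distinct_ratio(CARDINALITY)
reported = float(CARDINALITY.get("percent"))
is_consistent = abs(ratio - reported) < 1e-9
print(ratio, reported, is_consistent)
```
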
<DataType>
Specifies the defined, inferred, and selected data types.

The following table shows the attributes of the <DataType> element:

Table 7. Attributes of the <DataType> element
Attribute Description
definedType The data type defined in the source. Possible values are:
  • Boolean
  • date
  • datetime
  • decimal
  • dfloat
  • int8
  • int16
  • int32
  • int64
  • sfloat
  • qfloat
  • time
  • string
inferredType The data type inferred from the actual data of the column
selectedType The data type selected for this column
definedLength The data length defined in the source
inferredLength The data length inferred from the actual data of the column
selectedLength The data length selected for this column
definedPrecision The data precision defined in the data source
inferredPrecision The data precision inferred from the actual data of the column
selectedPrecision The data precision selected for this column
definedScale The data scale defined in the data source
inferredScale The data scale inferred from the actual data of the column
selectedScale The data scale selected for this column
definedNullability The nullability defined in the data source
inferredNullability The nullability inferred from the actual data of the column
selectedNullability The nullability selected for this column
definedIsEmpty The empty flag defined in the data source
inferredIsEmpty The empty flag inferred from the data of the column
selectedIsEmpty The empty flag selected for this column
inferredIsConstant The constant flag inferred from the data of the column
selectedIsConstant The constant flag selected by the user for this column
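
The defined, inferred, and selected attributes follow a common naming pattern, which makes it straightforward to compare what the data source declares with what the analysis found. The following Python sketch is one possible way to do that. The function name defined_vs_inferred and the PROPERTIES tuple are illustrative assumptions, and the sample element reuses values from the example document in this topic.

```python
import xml.etree.ElementTree as ET

# Values taken from the example document in this topic.
DATA_TYPE = ET.fromstring(
    '<DataType definedType="STRING" definedLength="128" '
    'inferredType="STRING" inferredLength="32" '
    'selectedType="STRING" selectedLength="32"/>'
)

# Attribute name suffixes that appear with defined/inferred prefixes.
PROPERTIES = ("Type", "Length", "Precision", "Scale",
              "Nullability", "IsEmpty", "IsConstant")


def defined_vs_inferred(data_type):
    """Return {property: (defined, inferred)} for pairs that are present and differ."""
    differences = {}
    for prop in PROPERTIES:
        defined = data_type.get("defined" + prop)
        inferred = data_type.get("inferred" + prop)
        # Attributes may be absent, so compare only complete pairs.
        if defined is not None and inferred is not None and defined != inferred:
            differences[prop] = (defined, inferred)
    return differences


differences = defined_vs_inferred(DATA_TYPE)
print(differences)
```

Here the column is defined with a length of 128 but the data suggests a length of 32, so only the Length property is reported.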
<DataClass>
Specifies the inferred and selected data classes.

The following table shows the attributes of the <DataClass> element:

Table 8. Attributes of the <DataClass> element
Attribute Description
inferredClass The data class inferred by IBM® InfoSphere® Information Analyzer from the data of the column
selectedClass The data class selected for this column
classificationDate The date and time of the data classification
classificationStatus The status of the data classification. Possible values are:
  • not_done
  • processing_started
  • processing_completed
  • review_completed
  • error
  • not_found
  • not_analyzable
  • review_only
  • lightweight_review
<Format>
Represents the most frequent inferred format of a column.

The following table shows the attributes of the <Format> element:

Table 9. Attributes of the <Format> element
Attribute Description
generalFormat The most general format
generalFormatPercent The percentage of all column values that match the format
analysisDate The date and time of the analysis
analysisStatus The status of the format analysis. Possible values are:
  • not_done
  • processing_started
  • processing_completed
  • review_completed
  • error
  • not_found
  • not_analyzable
  • review_only
  • lightweight_review
<CompletenessAnalysis>
Specifies the last completeness analysis run.

The following table shows the attributes of the <CompletenessAnalysis> element:

Table 10. Attributes of the <CompletenessAnalysis> element
Attribute Description
analysisDate The date and time of the analysis
analysisStatus The status of the completeness analysis. Possible values are:
  • not_done
  • processing_started
  • processing_completed
  • review_completed
  • error
  • not_found
  • not_analyzable
  • review_only
  • lightweight_review
<DomainAnalysis>
Specifies the last domain analysis run.

The following table shows the attributes of the <DomainAnalysis> element:

Table 11. Attributes of the <DomainAnalysis> element
Attribute Description
analysisDate The date and time of the analysis
analysisStatus The status of the domain analysis. Possible values are:
  • not_done
  • processing_started
  • processing_completed
  • review_completed
  • error
  • not_found
  • not_analyzable
  • review_only
  • lightweight_review
<FrequencyDistribution>
Specifies the number of occurrences of each distinct data value of a column.

The following table shows the attributes of the <FrequencyDistribution> element:

Table 12. Attributes of the <FrequencyDistribution> element
Attribute Description
nbOfDistinctValues The number of distinct values that occur in a column

The following table shows the children of the <FrequencyDistribution> element:

Table 13. Children of the <FrequencyDistribution> element
Element Cardinality Description
<Value> 0 to unbounded Specifies a distinct value and the number of its occurrences
<Value>
Specifies the frequency and percentage of a particular distinct value.

The following table shows the attributes of the <Value> element:

Table 14. Attributes of the <Value> element
Attribute Description
frequency The absolute number of occurrences of the corresponding distinct value in a column
percent The percentage of all occurrences of a particular distinct value with respect to the total number of values in a column
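
The relationship between nbOfDistinctValues, frequency, and percent can be sketched in Python as follows. The sample fragment is hypothetical: this topic does not show a <FrequencyDistribution> example and does not describe how the distinct value itself is carried, so the sample uses only the documented attributes. The function name summarize is an assumption, and percent is assumed to be a fraction between 0 and 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment that uses only the attributes documented above.
DISTRIBUTION = ET.fromstring(
    '<FrequencyDistribution nbOfDistinctValues="3">'
    '<Value frequency="50" percent="0.5"/>'
    '<Value frequency="30" percent="0.3"/>'
    '<Value frequency="20" percent="0.2"/>'
    '</FrequencyDistribution>'
)


def summarize(distribution):
    """Recompute each value's share from the absolute frequencies."""
    values = distribution.findall("Value")
    frequencies = [int(v.get("frequency")) for v in values]
    total = sum(frequencies)
    shares = [f / total for f in frequencies] if total else []
    return {
        "declared": int(distribution.get("nbOfDistinctValues")),
        "found": len(values),
        "total": total,
        "shares": shares,
    }


summary = summarize(DISTRIBUTION)
reported = [float(v.get("percent")) for v in DISTRIBUTION.findall("Value")]
shares_match = all(abs(s - r) < 1e-9 for s, r in zip(summary["shares"], reported))
print(summary, shares_match)
```
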