Types of data that can be extracted from Microsoft Excel

You can use the Unstructured Data stage to extract several types of data from a Microsoft Excel file.

File properties
The following table lists the information that can be extracted as file properties:
Table 1. Data that can be extracted as file properties
Data | Description
File name | Name of the file. For example: Workbook1.xls
File path | Path of the file. For example: C:\excel\Workbook1.xls
File size | Size of the file in bytes.
Last modified date | The date and time that the file was last modified.
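The four values in Table 1 are ordinary file system metadata, so they can be previewed outside the stage. The following is a minimal sketch, not the stage's implementation: the function name `file_properties`, the dictionary keys, and the choice to report the timestamp in UTC are all assumptions made for illustration, and the sample file is a 5-byte placeholder rather than a real workbook.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_properties(path):
    """Collect the Table 1 values for any file (hypothetical helper)."""
    p = Path(path)
    info = p.stat()
    return {
        "file_name": p.name,               # for example: Workbook1.xls
        "file_path": str(p.resolve()),     # absolute path of the file
        "file_size": info.st_size,         # size in bytes
        # UTC is an assumption; the source does not state a time zone.
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Demo on a placeholder file (5 arbitrary bytes, not a real workbook).
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "Workbook1.xls"
    sample.write_bytes(b"12345")
    props = file_properties(sample)

print(props["file_name"], props["file_size"], props["last_modified"].isoformat())
```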
Document properties
The following table lists the information that can be extracted as document properties:
Table 2. Data that can be extracted as document properties
Data | Description
Authors | Authors of the document.
Document comments | Comments that are stored in the document properties.
Content creation date | The date and time that the document was created.
Key words | Keywords of the document.
Revision number | Revision number of the document.
Subject | Subject of the document.
Title | Title of the document.
Company | Company property value of the document.
Category | Category property value of the document.
Manager | Manager property value of the document.
Custom properties | Custom properties of the document. You must specify the name of each custom property that you want to extract.
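To see where most of these values live, it can help to look inside a workbook directly. The sketch below applies only to the newer .xlsx format (a ZIP package), not to binary .xls files, and it is not how the stage reads properties. In an .xlsx package, the core properties are stored in `docProps/core.xml`, while Company and Manager are stored in `docProps/app.xml` and are not covered here. The function name `core_properties`, the mapping `CORE_FIELDS`, and the sample values are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Table 2 label -> element in core.xml (illustrative mapping).
CORE_FIELDS = {
    "Authors": "dc:creator",
    "Document comments": "dc:description",
    "Content creation date": "dcterms:created",
    "Key words": "cp:keywords",
    "Revision number": "cp:revision",
    "Subject": "dc:subject",
    "Title": "dc:title",
    "Category": "cp:category",
}


def core_properties(xlsx_bytes):
    """Read core document properties from the bytes of an .xlsx package."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    result = {}
    for label, tag in CORE_FIELDS.items():
        node = root.find(tag, NS)
        result[label] = node.text if node is not None else None
    return result


# Demo: a stripped-down package that contains only core.xml (made-up values).
SAMPLE_CORE = (
    '<cp:coreProperties'
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/"'
    ' xmlns:dcterms="http://purl.org/dc/terms/">'
    '<dc:title>Quarterly report</dc:title>'
    '<dc:creator>A. Author</dc:creator>'
    '<cp:revision>3</cp:revision>'
    '</cp:coreProperties>'
)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("docProps/core.xml", SAMPLE_CORE)

props = core_properties(buffer.getvalue())
print(props["Title"], props["Authors"], props["Revision number"])
```

Properties that are absent from the file come back as `None` in this sketch; how the stage reports a missing property is not stated in the source.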
Sheet information
The following table lists the information that can be extracted as sheet information:
Table 3. Data that can be extracted as sheet information
Data | Description
Sheet name | Name of the Microsoft Excel sheet.
Header (left, center, right) | Header text at the specified position (left, center, or right).
Footer (left, center, right) | Footer text at the specified position (left, center, or right).
Row information
The following table lists the information that can be extracted as row information:
Table 4. Data that can be extracted as row information
Data | Description
Row number | Microsoft Excel row number within the sheet. The first row number is 1.
Is hidden | Whether the row is hidden. The stage writes true if the row, or the sheet that contains the row, is hidden.
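The two rules in Table 4 (row numbers start at 1, and a row counts as hidden when either the row or its sheet is hidden) can be expressed in a few lines. This is an illustrative sketch only: the `Row` class, its zero-based `index` field, and the `row_info` function are hypothetical names, not part of the stage.

```python
from dataclasses import dataclass


@dataclass
class Row:
    index: int            # zero-based position in the sheet (assumed model)
    hidden: bool = False  # whether the row itself is hidden


def row_info(row, sheet_hidden):
    """Derive the Table 4 values for one row."""
    return {
        "row_number": row.index + 1,                 # first row is 1
        "is_hidden": row.hidden or sheet_hidden,     # row OR sheet hidden
    }


visible_row_on_hidden_sheet = row_info(Row(index=0), sheet_hidden=True)
hidden_row_on_visible_sheet = row_info(Row(index=4, hidden=True), sheet_hidden=False)
plain_row = row_info(Row(index=9), sheet_hidden=False)

print(visible_row_on_hidden_sheet, hidden_row_on_visible_sheet, plain_row)
```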
Cell information
You can extract cell information based on either the Microsoft Excel column or the cell position. When you extract by column, you can specify the source column by its relative position within the data range.
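As an illustration of what a relative position within the data range means, the sketch below maps a relative position to an absolute column letter. It assumes that relative positions start at 1, so that position 1 is the first column of the data range; the source does not state the numbering base, and the function names are hypothetical.

```python
def column_letters(index):
    """Convert a 1-based column index to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def column_index(letters):
    """Convert Excel column letters to a 1-based index (A -> 1, AA -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def resolve_relative_column(range_first_column, relative_position):
    """Map a relative position in the data range to an absolute column.

    Assumption: relative positions are 1-based.
    """
    if relative_position < 1:
        raise ValueError("relative position must be >= 1")
    return column_letters(column_index(range_first_column) + relative_position - 1)


# A data range that starts at column C: position 1 is C, position 2 is D.
print(resolve_relative_column("C", 1))  # C
print(resolve_relative_column("C", 2))  # D
```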
The following table lists the information that can be extracted as cell information:
Table 5. Data that can be extracted as cell information
Data | Description
Value | Value of a cell. If the cell contains a formula, the stage extracts the cached result of the formula.
Comment | Comment of a cell.
Author of Comment | Author of the cell comment.
Formula | Formula of a cell, as text.
Hyperlink Type | Type of the hyperlink in a cell.
Hyperlink Address | Address that the hyperlink points to. The format depends on the type of the hyperlink.
Hyperlink label | Text label of the hyperlink.
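The difference between the Value and Formula entries is easiest to see in the file itself. In the .xlsx format (not binary .xls), a formula cell stores both the formula text and the last calculated result. The sketch below reads both from a hand-written worksheet fragment; it is not the stage's implementation, and the sample XML and the function name `cell_value_and_formula` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet parts in an .xlsx package.
MAIN_NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Hand-written fragment: C1 holds the formula A1+B1 and its cached result 3.
SAMPLE_SHEET = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<sheetData><row r="1">'
    '<c r="A1"><v>1</v></c>'
    '<c r="B1"><v>2</v></c>'
    '<c r="C1"><f>A1+B1</f><v>3</v></c>'
    '</row></sheetData></worksheet>'
)


def cell_value_and_formula(sheet_xml, ref):
    """Return (cached value text, formula text or None) for one cell."""
    root = ET.fromstring(sheet_xml)
    for cell in root.iterfind(".//m:c", MAIN_NS):
        if cell.get("r") == ref:
            value = cell.find("m:v", MAIN_NS)
            formula = cell.find("m:f", MAIN_NS)
            return (
                value.text if value is not None else None,
                formula.text if formula is not None else None,
            )
    raise KeyError(ref)


formula_cell = cell_value_and_formula(SAMPLE_SHEET, "C1")
plain_cell = cell_value_and_formula(SAMPLE_SHEET, "A1")
print(formula_cell, plain_cell)
```

Because the cached result is whatever was last saved with the file, it can be stale or missing if the workbook was written by a tool that does not calculate formulas.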