Importing CSV files to a collection

If you want to add documents to the index without crawling repositories, you can import files in comma-separated values (CSV) format. Users can query this content in the same way that they explore other content in the collection.

About this task

Data is imported successfully if the content follows the CSV file format regardless of the file extension, such as .csv, .dat, .text, or .txt. When you run the CSV file import wizard, you can verify that the format of the data is correct by previewing the content before you import it.
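Step 3 of the procedure says that files with more than 128 columns, or with records larger than 512 KB, cannot be imported. The following Java sketch shows one way to check records against those limits before you run the wizard. It is not part of the product. The class and method names are hypothetical, and it makes several assumptions: each line is one record, the separator never appears inside quoted values, 512 KB means 512 * 1024 bytes, and record size is measured in UTF-8 bytes. The product might measure these limits differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the product.
public class CsvPrecheck {
    // Limits taken from step 3 of the procedure.
    public static final int MAX_COLUMNS = 128;
    // Assumption: 512 KB is interpreted as 512 * 1024 bytes.
    public static final int MAX_RECORD_BYTES = 512 * 1024;

    // Naive column count: assumes the separator never occurs inside quoted values.
    public static int countColumns(String record, char separator) {
        int count = 1;
        for (int i = 0; i < record.length(); i++) {
            if (record.charAt(i) == separator) {
                count++;
            }
        }
        return count;
    }

    // Returns one message per limit that a record exceeds; an empty list means no problems were found.
    public static List<String> findProblems(List<String> records, char separator) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            String record = records.get(i);
            int columns = countColumns(record, separator);
            // Assumption: record size is measured in UTF-8 bytes.
            int bytes = record.getBytes(StandardCharsets.UTF_8).length;
            if (columns > MAX_COLUMNS) {
                problems.add("record " + (i + 1) + ": " + columns + " columns");
            }
            if (bytes > MAX_RECORD_BYTES) {
                problems.add("record " + (i + 1) + ": " + bytes + " bytes");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("id,title,date", "1,Hello,2024-01-31");
        System.out.println(findProblems(sample, ','));
    }
}
```

A check like this only flags likely problems. The preview in the import wizard remains the way to confirm that the system reads the file as you expect.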

Procedure

To add CSV files to a collection:

  1. On the Collections view, expand the collection that you want to add CSV files to.
  2. In the Import area, click Import CSV files.
  3. Specify the location of the files that you want to import. If you specify a directory path, all CSV files in the directory are imported. Files that have more than 128 columns, or that contain records larger than 512 KB, cannot be imported.
    Local path
    Select this option if the CSV file is on your local computer. Click Browse to select the file.
    Master server path
    Select this option if the CSV file or files are on the master server. Type the fully qualified path for the file or for the directory that contains the CSV files that you want to add.
  4. Specify whether you want to use the system default settings for importing the CSV files, reuse settings that you saved when you previously ran the CSV file import wizard, or reuse settings that are specified in a property file. If you reuse saved settings, you can select the settings that you want to apply for importing these documents. If you reuse settings from a property file, click Browse to select the file.
  5. Specify how the system is to read the CSV files. A preview of the CSV file content helps you configure this information. Specify:
    • The character encoding
    • The column separator character or characters
    • The line number where the parser is to begin reading a file
    • Whether the first line is to be treated as header information
  6. Map the columns in the CSV files to index fields so that users can search the CSV content. Also specify:
    • A unique request ID (which is used, for example, to monitor the import request and generate unique document URIs)
    • Whether you want to select the column or columns that uniquely identify documents or let the system generate a unique identifier
    • The column that you want to use as the document date column
    • The format to use for date data in the date column or in columns that are mapped to date index fields. This format is based on the Java SimpleDateFormat class, whose pattern letters are case-sensitive. In the format yyyy-MM-dd, yyyy represents the year, MM represents the month, and dd represents the day in the month. Lowercase mm represents minutes, not the month.
    • The format to use for numeric data in columns that are mapped to decimal parametric index fields. This format is based on the Java DecimalFormat class.
    • The time zone and locale to use for parsing date and decimal data
    • The language to use for parsing the imported documents. You can specify a default language to use and also specify that the parser is to try to detect the source language.
  7. Specify whether you want to save your configuration settings to reuse another time. To save your settings, specify a descriptive name so that you can select it when you configure CSV file import settings at another time. You can also save your settings to a property file, and reload the file when you run the import wizard again.
  8. In the Import area, monitor the progress of the import task. You can see the status of the current CSV document import task and the CSV document import tasks that are waiting to be processed.
  9. Optional: Open the CSV file import history to see information about CSV document import tasks that have been processed, such as how many records were read and whether a task completed successfully or errors occurred.

    If you delete an import task, all documents that were added to the index by the task are deleted from the index.
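
The date and numeric formats in step 6 are based on the Java SimpleDateFormat and DecimalFormat classes. The following Java sketch shows how a pattern, time zone, and locale affect parsing with those classes, which can help you try out a pattern before you enter it in the wizard. The class name, method names, and sample values are hypothetical, and the strict (non-lenient) date parsing is a choice made for this sketch. How the import wizard applies these settings internally is not shown here.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for trying out patterns; not part of the product.
public class FormatPreview {
    // Parses a date value with a SimpleDateFormat pattern, time zone, and locale.
    public static Date parseDate(String value, String pattern, TimeZone zone, Locale locale)
            throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
        format.setTimeZone(zone);
        // Assumption for this sketch: reject out-of-range values such as month 13.
        format.setLenient(false);
        return format.parse(value);
    }

    // Parses a numeric value with a DecimalFormat pattern and locale-specific symbols.
    public static Number parseDecimal(String value, String pattern, Locale locale)
            throws ParseException {
        DecimalFormat format = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(locale));
        return format.parse(value);
    }

    public static void main(String[] args) throws ParseException {
        TimeZone utc = TimeZone.getTimeZone("UTC");

        // MM is the month; the value is read as 15 March 2024.
        System.out.println(parseDate("2024-03-15", "yyyy-MM-dd", utc, Locale.US));
        // Lowercase mm is minutes, so the same value yields a different date.
        System.out.println(parseDate("2024-03-15", "yyyy-mm-dd", utc, Locale.US));

        // The same pattern reads different separators depending on the locale.
        System.out.println(parseDecimal("1,234.5", "#,##0.###", Locale.US));
        System.out.println(parseDecimal("1.234,5", "#,##0.###", Locale.GERMANY));
    }
}
```

The two date calls differ only in the case of the month letters, which is why the pattern must be typed exactly. The two decimal calls show why the locale setting matters when the source data uses a comma as the decimal separator.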