If you want to add documents to the index without crawling
repositories, you can import files in comma-separated value (CSV)
format. Users can query this content in the same way that they explore
other content in the collection.
About this task
Data is imported successfully if the content
follows the CSV file format regardless of the file extension, such
as .csv, .dat, .text, or .txt. When you run the CSV file import wizard,
you can verify that the format of the data is correct by previewing
the content before you import it.
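As a rough local counterpart to the wizard's preview, the sketch below shows what checking a delimited file before import might look like. It is not part of the product. The class name CsvPreview, the method preview, and the sample data are hypothetical, and the split is deliberately naive (it does not handle quoted fields that contain the separator).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the import wizard's implementation.
class CsvPreview {

    // Returns up to maxRows rows, starting at the 1-based line number startLine.
    // The separator is treated as a literal string, not as a regular expression.
    static List<List<String>> preview(List<String> lines, String separator,
                                      int startLine, int maxRows) {
        List<List<String>> rows = new ArrayList<>();
        for (int i = Math.max(startLine, 1) - 1;
             i < lines.size() && rows.size() < maxRows; i++) {
            // Limit -1 keeps trailing empty columns so the column count stays visible.
            rows.add(Arrays.asList(lines.get(i).split(Pattern.quote(separator), -1)));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // The file extension is irrelevant here: only the content is inspected.
        // UTF-8 is an assumption; use the encoding your files are actually in.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8)
                : Arrays.asList("# sample export", "id;title;date",
                                "1;First doc;2024-01-15");
        // Start at line 2 to skip the comment line; the first row returned
        // is then the header row.
        for (List<String> row : preview(lines, ";", 2, 5)) {
            System.out.println(row.size() + " columns: " + row);
        }
    }
}
```

Running the class with a file path as its argument prints the first few rows of that file, which makes a wrong separator or an unexpected column count easy to spot before you start the wizard.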
Procedure
To add CSV files to a collection:
- On the Collections view, expand the
collection that you want to add CSV files to.
- In the Import area, click Import
CSV files.
- Specify the location of the files that you want
to import. If you specify a directory path, all CSV files
in the directory are imported. Files with more than 128 columns, and
records that are larger than 512 KB, cannot be imported.
- Local path: Select this option if the CSV file is on your local computer.
Click Browse to select the file.
- Master server path: Select this option if the CSV file or files are on the master
server. Type the fully qualified path for the file or for the directory
that contains the CSV files that you want to add.
- Specify whether you want to use the system default settings
for importing the CSV files, reuse settings that you saved when you
previously ran the CSV file import wizard, or reuse settings that
are specified in a property file. If you reuse saved settings,
select the settings that you want to apply for importing these
documents. If you reuse settings from a property file, click Browse to
select the file.
- Specify how the system is to read the CSV files. A
preview of the CSV file content helps you configure this information.
Specify:
- The character encoding
- The column separator character or characters
- The line number where the parser is to begin reading a file
- Whether the first line is to be treated as header information
- Map the columns in the CSV files to index fields so that
users can search the CSV content. Also specify:
- A unique request ID (which is used, for example, to monitor the
import request and generate unique document URIs)
- Whether you want to select the column or columns that uniquely
identify documents or let the system generate a unique identifier
- The column that you want to use as the document date column
- The format to use for date data in the date column or in columns
that are mapped to date index fields. This format is based on the
pattern syntax of the Java SimpleDateFormat class, which is case
sensitive. In the format yyyy-MM-dd, yyyy represents the year, MM
represents the month, and dd represents the day in the month.
- The format to use for numeric data in columns that are mapped
to decimal parametric index fields. This format is based on the Java
DecimalFormat class.
- The time zone and locale to use for parsing date and decimal data
- The language to use for parsing the imported documents. You can
specify a default language to use and also specify that the parser
is to try to detect the source language.
- Specify whether you want to save your configuration settings
so that you can reuse them later. To save your settings, specify
a descriptive name so that you can select it when you configure CSV
file import settings at another time. You can also save your settings
to a property file, and reload the file when you run the import wizard
again.
- In the Import area, monitor the progress
of the import task. You can see the status of the current
CSV document import task and the CSV document import tasks that are
waiting to be processed.
- Optional: Open the CSV file import history
to see information about CSV document import tasks that have been
processed, such as how many records were read and whether a task completed
successfully or errors occurred.
If you delete an import
task, all documents that were added to the index by the task are deleted
from the index.
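
The date and decimal formats in the column-mapping step follow the pattern syntax of the Java SimpleDateFormat and DecimalFormat classes, so you can try a pattern against sample values before you enter it in the wizard. The following is a minimal sketch under that assumption. The class name ImportFormats, its methods, and the sample values are hypothetical, and the wizard's own parsing may differ in details such as leniency.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for trying out patterns; not the product's parser.
class ImportFormats {

    // Parses a date string with a SimpleDateFormat pattern, time zone, and locale.
    static Date parseDate(String text, String pattern, TimeZone zone, Locale locale) {
        SimpleDateFormat fmt = new SimpleDateFormat(pattern, locale);
        fmt.setTimeZone(zone);
        fmt.setLenient(false); // reject impossible dates such as month 13
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse date: " + text, e);
        }
    }

    // Parses a decimal string with a DecimalFormat pattern and locale-specific symbols.
    static Number parseDecimal(String text, String pattern, Locale locale) {
        DecimalFormat fmt = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(locale));
        try {
            return fmt.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse number: " + text, e);
        }
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");

        // Uppercase MM is the month.
        System.out.println(parseDate("2024-03-15", "yyyy-MM-dd", utc, Locale.US));

        // Lowercase mm is minutes, so "03" is not read as the month here.
        System.out.println(parseDate("2024-03-15", "yyyy-mm-dd", utc, Locale.US));

        // The same pattern reads different separators depending on the locale.
        System.out.println(parseDecimal("1,234.56", "#,##0.00", Locale.US));
        System.out.println(parseDecimal("1.234,56", "#,##0.00", Locale.GERMANY));
    }
}
```

The second date call shows why case matters: the pattern is accepted without error but produces a different date, which is the kind of mistake that is easier to catch on a sample value than after an import.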