Adding an Object Set for Stream File Data

The following stored procedure in the DB2® schema adds an object set for stream file data.

ADD_IFS_STMF_OBJECT_SET

Authorization

This stored procedure is created with public authority *EXCLUDE and is owned by the creator of the text search collection.

The procedure will adopt the authority of the text search collection owner's profile. Authority can be granted to other users to allow them to execute the procedure.

Syntax

This procedure allows a user to add an object set of stream files (STMF) in the Integrated File System (IFS).

Add an object set for stream file data (stream files in IFS):

ADD_IFS_STMF_OBJECT_SET (stmf_expression_string, output_set_id)

The schema qualifier is the name of the text search collection.

Parameters

stmf_expression_string
This parameter contains an absolute path to a directory containing the files that will be indexed.

This must be a valid directory (type *DIR) on a file system that is accessible. Stream file objects (type *STMF) within this directory will be indexed. The path name should be absolute and should not contain any regular expressions.

The data type for this parameter is VARCHAR(32000).

Stream files contained within the specified directory are indexed, with the following restrictions:
  • Symbolic links are NOT followed.
  • Sub-directories are NOT processed.
  • Path names must not be delimited. Characters such as '*' and '?' do not have any special meaning and should not be escaped.
  • Path names may or may not be case sensitive, depending on the attribute of the file system.
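
The restrictions above can be approximated before the object set is added. The following is a minimal Python sketch, not part of the product: the helper name `preview_indexed_files` is hypothetical, and it assumes the IFS directory is reachable as an ordinary path from wherever the script runs. It lists only regular files directly inside the directory, skipping symbolic links and sub-directories.

```python
import os

def preview_indexed_files(directory):
    """Approximate which entries a non-recursive object set would cover.

    Hypothetical helper: it mirrors the documented restrictions and does
    not call the text search server.
    """
    if not os.path.isabs(directory):
        raise ValueError("path must be absolute: " + directory)
    if not os.path.isdir(directory):
        raise NotADirectoryError(directory)
    names = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Regular files only; symbolic links and sub-directories are skipped.
            if entry.is_file(follow_symlinks=False):
                names.append(entry.name)
    return sorted(names)
```

For example, `preview_indexed_files('/home/ntl/stmf')` would return the sorted names of the stream files found directly in that directory. The actual set of indexed documents is decided by the update processing, so treat the result as an estimate only.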

A check will be performed when adding the object set to verify that a duplicate set does not already exist in the text search collection. This check does not consider equivalent paths to be duplicates.

In other words, the following paths could all represent the same directory but will be considered unique object sets. Furthermore, the objects in those sets will be indexed multiple times as unique objects.
/dir1/DIR2
/dir1//DIR2//
/DIR1/DIR2/  (if file system is case insensitive)
/dir1/DIR2/../DIR2
etc.
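
Because the duplicate check compares path strings as given, one way to avoid adding the same directory twice is to compare candidate paths in a canonical form before calling the procedure. The sketch below is an illustration only: the function name `path_key` and the `case_insensitive` flag are assumptions, and the caller must know whether the target file system is case insensitive. Note that `posixpath.normpath` resolves `..` lexically, which can give a wrong answer if a path component is a symbolic link.

```python
import posixpath

def path_key(path, case_insensitive=False):
    """Return a comparison key for an absolute IFS-style path.

    Hypothetical helper: equivalent spellings of one directory map to
    the same key. The key is for comparison, not for passing to the
    stored procedure.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute: " + path)
    key = posixpath.normpath(path)
    # normpath keeps exactly two leading slashes; collapse them to one.
    if key.startswith("//"):
        key = "/" + key.lstrip("/")
    if case_insensitive:
        key = key.casefold()
    return key

candidates = ["/dir1/DIR2", "/dir1//DIR2//", "/DIR1/DIR2/", "/dir1/DIR2/../DIR2"]
# On a case-insensitive file system all four spellings share one key.
print({path_key(p, case_insensitive=True) for p in candidates})
```

Keeping a record of the keys for paths already added, and skipping any candidate whose key is already present, would prevent the multiple indexing described above.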
output_set_id
An output integer value that returns the set ID for the object set that was added. This value can be used to remove the object set at a later time.

This parameter is optional.

The data type for this parameter is INTEGER.

Special Considerations for Update Processing

Non-existent file systems:

If a directory cannot be located during an update operation, the files associated with that directory will not be removed from the index. This avoids unnecessary re-indexing of documents when a file system is unmounted and later remounted.

If these files need to be removed from the index, several options exist:
  • Issue the remove object set stored procedure against the IFS Stream file object set. This will remove any documents associated with the object set.
  • Issue the REPRIME stored procedure against the collection. All data will be removed from the index, and only files that can be located will be reindexed.
  • Create the directory as an empty directory and issue the update.

CCSID Conversion

If the collection's FORMAT is TEXT:
  • The CCSID attribute of the file is used to convert the file's extracted data to UTF-8 for indexing. The CCSID attribute of the file must be correct in order for the file to be correctly indexed.
If the collection's FORMAT is INSO:
  • The data is extracted from the file and sent to the text search server for processing. No character set conversion occurs, and the CCSID attribute of the file is ignored. The text search server uses its rich text processing to determine the format and encoding of the document. This can be used to index rich text files (such as PDF) or ordinary text files. For some plain text documents, the text search server may not be able to determine the encoding of the document with enough confidence to index the data. This is more likely for very small documents, but it can also occur for larger documents that use a wide range of characters. If the format and encoding of the file cannot be determined, the file will not be indexed and a document error will be logged.
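
To illustrate why the CCSID attribute matters when FORMAT is TEXT, the following Python sketch converts file bytes to UTF-8 using a codec chosen from the CCSID. It is a simplified model, not the product's conversion routine: the function `to_utf8` and the small CCSID-to-codec table are assumptions made for this example and cover only a few common CCSIDs.

```python
# Hypothetical mapping from a few CCSIDs to Python codec names.
CCSID_TO_CODEC = {
    37: "cp037",     # EBCDIC, US English
    500: "cp500",    # EBCDIC, international
    819: "latin-1",  # ISO 8859-1
    1208: "utf-8",
    1252: "cp1252",  # Windows Latin-1
}

def to_utf8(data, ccsid):
    """Decode bytes according to the tagged CCSID and re-encode as UTF-8."""
    codec = CCSID_TO_CODEC[ccsid]
    return data.decode(codec).encode("utf-8")

# The same EBCDIC bytes converted with a correct and an incorrect CCSID tag.
ebcdic_bytes = "Hello".encode("cp037")
print(to_utf8(ebcdic_bytes, 37))   # tag matches the data
print(to_utf8(ebcdic_bytes, 819))  # tag is wrong, so the text is garbled
```

With the matching tag the text survives the conversion. With the wrong tag the conversion still succeeds but produces unreadable characters, which is the kind of silent indexing problem an incorrect CCSID attribute would cause.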

Authorities to Indexed Objects

When adding an IFS stream file object set, consider the authority requirements to read the stream files carefully. Adopted authorities are not honored when accessing the stream file's data. In addition, scheduled updates run under the user profile that owns the index. See the update stored procedure documentation for information on the authority requirements to indexed objects.

ADD_IFS_STMF_OBJECT_SET_WITH_SUBDIR

The syntax and authority requirements of this stored procedure are the same as those of ADD_IFS_STMF_OBJECT_SET. With this stored procedure, a user can add a directory as an object set to the collection. All the files and subdirectories under this directory will be indexed recursively.

Example

Add an object set to MYCOLLECTION to index all stream files in an IFS directory '/home/ntl/stmf':
> CALL MYCOLLECTION.ADD_IFS_STMF_OBJECT_SET('/home/ntl/stmf');
Add an object set to MYCOLLECTION to index all stream files and subdirectories in an IFS directory '/home/ntl/stmf':
> CALL MYCOLLECTION.ADD_IFS_STMF_OBJECT_SET_WITH_SUBDIR('/home/ntl/stmf');

To add an IFS path to a collection from IBM® Navigator for i, follow these steps:

  1. From IBM Navigator for i, expand IBM i Management > System > All Tasks.
  2. On the right panel, select System > OmniFind > Collection List.
  3. Right-click the collection and select Properties. On the Object tab, click the Add IFS Path button. Choose Include sub-directories to add all subdirectories under the specified IFS path.