URI formats in the index

The uniform resource identifier (URI) of each document in the index indicates the type of crawler that added the document to the collection.
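
Because the URI scheme identifies the crawler type, a simple lookup can classify a URI. The following Python sketch is illustrative only: the names SCHEME_TO_CRAWLER and crawler_for are hypothetical, and the table covers only the schemes that are described in this topic. SharePoint documents use ordinary http or https URLs, so they are not listed.

```python
# Hypothetical lookup table derived from the URI formats in this topic.
SCHEME_TO_CRAWLER = {
    "winfs": "Agent for Windows file systems",
    "boardreader": "BoardReader",
    "p8ce": "Case Manager or FileNet P8",
    "vbr": "Content Integrator",
    "cm": "Content Manager",
    "db2": "DB2",
    "exchadp": "Exchange Server",
    "jdbc": "JDBC database",
    "domino": "Notes",
    "quickplace": "Quickr for Domino",
    "seedlist": "Seed list",
    "file": "UNIX or Windows file system",
}


def crawler_for(uri):
    """Return the crawler type suggested by the scheme of a document URI."""
    # Everything before the first colon is treated as the scheme.
    scheme = uri.partition(":")[0].lower()
    return SCHEME_TO_CRAWLER.get(scheme, "unknown")


print(crawler_for("db2://LOCALDB/SCHEMA1.TABLE1/MODEL/ThinkPadA20"))  # DB2
print(crawler_for("domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf"))  # Notes
```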

You can specify URIs or URI patterns when you configure categories, scopes, and quick links for a collection. You also specify the URI when you need to remove documents from the index or view detailed status information about a specific URI.

Search the collection to determine the URIs or URI patterns for a document. You can click the URIs in the search results to retrieve documents that you are interested in. You can copy the URI from the search results to use the URI in the administration console. For example, you can specify a URI pattern to automatically associate documents that match that URI pattern with a quick link.

When you specify a URI or URI pattern, you must use the URL-encoded format and ensure that the URI contains only characters from the US-ASCII coded character set. For details, see RFC 1738, the Internet standard for URLs.

In the following example, you cannot specify the first URI, which contains Hebrew characters. You can, however, specify the second URI, which is the URL-encoded format of the first URI.

Incorrect URI

file:///c:/shared/hebrew/עברית

Correct URI

file:///c:/shared/hebrew/%D7%A2%D7%91%D7%A8%D7%99%D7%AA
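
One way to produce the URL-encoded form is with a standard library function. The following Python sketch uses urllib.parse.quote; the function name encode_uri is hypothetical, and the choice to leave "/" and ":" unencoded is an assumption based on the example above.

```python
from urllib.parse import quote


def encode_uri(uri):
    """Percent-encode a URI, keeping "/" and ":" so the scheme,
    drive letter, and path separators stay readable."""
    return quote(uri, safe="/:")


# The Hebrew directory name is written with escapes to avoid
# right-to-left display issues in the source code.
raw = "file:///c:/shared/hebrew/\u05e2\u05d1\u05e8\u05d9\u05ea"
print(encode_uri(raw))
# file:///c:/shared/hebrew/%D7%A2%D7%91%D7%A8%D7%99%D7%AA
```

Do not apply this function to a URI that is already encoded, because the percent signs would be encoded a second time.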

Archive files

The URI format for documents that are extracted from an archive file (such as a .zip or .tar file) and then crawled is:

Original_URI(?|&)ArchiveEntry=Entry_Name(&ArchiveEntry=Entry_Name)

Parameters

Original_URI: The location of the archive file on the data source.
Entry_Name: The URL-encoded name of the archive entry in the archive file.

Examples

file:///d:/Archive1.zip
  file:///d:/Archive1.zip?ArchiveEntry=Folder1/PowerPoint.ppt
  file:///d:/Archive1.zip?ArchiveEntry=Folder2/Text.txt
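
The format above can be sketched in Python as follows. The function name archive_entry_uri is hypothetical. The sketch assumes that "?" is used when the original URI has no query string and "&" otherwise, that additional ArchiveEntry parameters are appended in order, and that "/" in entry names is left unencoded as in the examples.

```python
from urllib.parse import quote_plus


def archive_entry_uri(original_uri, *entry_names):
    """Append one ArchiveEntry parameter per entry name to the archive URI."""
    uri = original_uri
    for name in entry_names:
        # Assumption: the first parameter starts the query string with "?",
        # later parameters are joined with "&".
        separator = "&" if "?" in uri else "?"
        uri += separator + "ArchiveEntry=" + quote_plus(name, safe="/")
    return uri


print(archive_entry_uri("file:///d:/Archive1.zip", "Folder1/PowerPoint.ppt"))
# file:///d:/Archive1.zip?ArchiveEntry=Folder1/PowerPoint.ppt
```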

Agent for Windows file systems crawlers

The URI format for documents that are crawled by an Agent for Windows file systems crawler is:

winfs://Host_Name/Drive:/Directory_Path/File_Name

Parameters

URL encoding is applied to all of the fields.

Host_Name: The host name or IP address of the server where the document is located.
Drive: The drive on the server where the document is located.
Directory_Path: The path for a shared directory in the Windows domain.
File_Name: The name of the file.

Example

winfs:////9.187.186.83/c:/temp/test/test2/Copy+%284%29+of+dumpstore_1.txt

BoardReader crawlers

The URI for documents that are crawled by a BoardReader crawler is derived from the original URL as follows:

Replace the protocol of the URL with boardreader://
Add the parameter boardreaderid= and the BoardReader ID to the URL
Add the parameter useSSL=true to the URL when the original protocol is https://

URL encoding is applied to all of the fields.

Example 1:

URL: http://www.facebook.com/1102412197/posts/10202426006027108
BoardReader ID: 17005669247
URI: boardreader://www.facebook.com/1102412197/posts/10202426006027108?boardreaderid=17005669247

Example 2:

URL: http://foursoftpaws.yuku.com/reply/5369/Kjara-Tockica#reply-5369
BoardReader ID: 12702129376
URI: boardreader://foursoftpaws.yuku.com/reply/5369/Kjara-Tockica%23reply-5369?boardreaderid=12702129376

Example 3:

URL: https://www.flashback.org/t2284857#p46743985
BoardReader ID: 23427574780
URI: boardreader://www.flashback.org/t2284857%23p46743985?boardreaderid=23427574780&useSSL=true
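
The conversion rules can be sketched in Python as follows. The function name to_boardreader_uri is hypothetical. The sketch encodes only the "#" character, which is the only encoding that the examples show, and it assumes that "&" is used instead of "?" when the original URL already has a query string.

```python
def to_boardreader_uri(url, boardreader_id):
    """Convert an http or https URL and a BoardReader ID to a boardreader:// URI."""
    scheme, separator, rest = url.partition("://")
    if not separator:
        raise ValueError("URL has no protocol: " + url)
    # The examples show the fragment marker "#" encoded as %23.
    rest = rest.replace("#", "%23")
    # Assumption: join with "&" if the URL already contains a query string.
    joiner = "&" if "?" in rest else "?"
    uri = "boardreader://" + rest + joiner + "boardreaderid=" + str(boardreader_id)
    if scheme.lower() == "https":
        uri += "&useSSL=true"
    return uri


print(to_boardreader_uri("https://www.flashback.org/t2284857#p46743985", 23427574780))
# boardreader://www.flashback.org/t2284857%23p46743985?boardreaderid=23427574780&useSSL=true
```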

Case Manager crawlers

The URI format for documents that are crawled by a Case Manager crawler is:

p8ce://host_name:port/object_store/version_series_id/hash_code[/element_number]?protocol=http

Parameters

URL encoding is applied to all of the fields.

host_name

A host name of a server on which the IBM® FileNet® Content Engine runs.

port

A port number on which the Content Engine Web Service runs.

object_store

A name of an object store in which a document is stored.

version_series_id

A unique document identifier. The version series ID is used because the document ID changes as the document is versioned while the version series ID does not change.

hash_code

To make a distinction between folders, a hash code is added to the path for the object URI. In the following example, 7584373 is the hash code of folder path /ObjectStore/CaseSolution/../CaseFolder/SubFolders:

p8ce://9.39.44.204:9080/wsi/FNCEWS40MTOM/ATOSAIX2/{2D09F43F-3392-485E-B338-E67D68F04FA6}.7584373?protocol=http

element_number

An index of content elements. This variable is appended only when a URI points to a document that contains multiple content elements.

protocol

A protocol for accessing the Web Service. Valid values are http or https.

Content Integrator crawlers

The URI format for documents that are crawled by a Content Integrator crawler in server access mode is:

vbr://Server_Name/Repository_System_ID/Repository_Persistent_ID
     /Item_ID/Version_ID
     /Item_Type/?[Page=Page_Number&] JNDI_properties

The URI format for documents that are crawled by a Content Integrator crawler in direct access mode is:

vbr:///Repository_System_ID/Repository_Persistent_ID
     /Item_ID/Version_ID
     /Item_Type/[?Page=Page_Number]

Parameters

URL encoding is applied to all of the fields.

Server_Name

The name of the IBM Content Integrator server.

Repository_System_ID

The system ID for the repository.

Repository_Persistent_ID

The persistent ID for the repository.

Item_ID

The ID for the item.

Version_ID

The ID for the version. A blank version ID indicates the latest version of the document.

Item_Type

The type of the item (CONTENT or FOLDER).

Page_Number

The page number.

JNDI_properties

The JNDI properties for the J2EE application client. There are two types of properties:

java.naming.factory.initial: The name of the class for the application server that is used to create the EJB handle.
java.naming.provider.url: The URL to the naming service for the application server that is used to request the EJB handle.

Examples

Documentum:

vbr://vbrsrv.ibm.com/Documentum/c06b/094e827780000302//CONTENT/?
java.naming.provider.url=iiop%3A%2F%2Fmyvbr.ibm.com%3A2809&
java.naming.factory.initial=com.ibm.websphere.naming.WsnInitContextFactory

FileNet PanagonCS:

vbr://vbrsrv.ibm.com/PanagonCS/4a4c/003671066//CONTENT/?Page=1&
java.naming.provider.url=iiop%3A%2F%2Fmyvbr.ibm.com%3A2809&
java.naming.factory.initial=com.ibm.websphere.naming.WsnInitContextFactory

Content Manager crawlers

The URI format for documents that are crawled by a Content Manager crawler is:

cm://Server_Name/Item_Type_Name/PID

Parameters

URL encoding is applied to the PID parameter.

Server_Name: The name of the IBM Content Manager Enterprise Edition library server.
Item_Type_Name: The name of the target item type.
PID: The Content Manager EE persistent identifier.

Example

cm://cmsrvctg/ITEMTYPE1/92+3+ICM8+icmnlsdb12+ITEMTYPE159+26+A1001001A
03F27B94411D1831718+A03F27B+94411D183171+14+1018

DB2 crawlers

The URI format for documents that are crawled by a DB2 crawler is:

db2://Database_Name/Table_Name
     /Unique_Identifier_Column_Name1/Unique_Identifier_Value1
     [/Unique_Identifier_Column_Name2/Unique_Identifier_Value2/...
     /Unique_Identifier_Column_NameN/Unique_Identifier_ValueN]

Parameters

URL encoding is applied to all of the fields.

Database_Name: The internal name of the database or the alias for the database.
Table_Name: The name of the target table, including the name of the schema.
Unique_Identifier_Column_Name1: The name of the first Unique Identifier column in the table.
Unique_Identifier_Value1: The value of the first Unique Identifier column.
Unique_Identifier_Column_NameN: The name of the nth Unique Identifier column in the table.
Unique_Identifier_ValueN: The value of the nth Unique Identifier column.

Examples

Local, cataloged database:

db2://LOCALDB/SCHEMA1.TABLE1/MODEL/ThinkPadA20

Remote, uncataloged database:

db2://myserver.mycompany.com:50001/REMOTEDB/SCHEMA2.TABLE2/NAME/DAVID
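
A URI in this format can be assembled from the database name, the table name, and the unique identifier columns. The following Python sketch is illustrative; the function name db2_uri is hypothetical. It assumes form-style encoding (a space becomes "+"), as in the Agent for Windows file systems example in this topic, and it leaves ":" and "/" unencoded in the database field so that the remote form host:port/database is preserved.

```python
from urllib.parse import quote_plus


def db2_uri(database, table, key_columns):
    """Build a db2:// URI from a database, a table, and (column, value) pairs."""
    if not key_columns:
        raise ValueError("at least one unique identifier column is required")
    parts = [quote_plus(database, safe=":/"), quote_plus(table)]
    for column, value in key_columns:
        parts.append(quote_plus(column))
        parts.append(quote_plus(str(value)))
    return "db2://" + "/".join(parts)


print(db2_uri("LOCALDB", "SCHEMA1.TABLE1", [("MODEL", "ThinkPadA20")]))
# db2://LOCALDB/SCHEMA1.TABLE1/MODEL/ThinkPadA20
```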

Exchange Server crawlers

Because Watson Content Analytics cannot obtain the URL of attachments through Outlook Web App (OWA), it shows alternate URLs for attached items. Because Exchange Server 2007 supports only the Internet Explorer browser, users can access OWA of Exchange Server 2007 only with that browser.

When users click titles in the results page of the enterprise search application or content analytics miner, the corresponding Exchange Server item is shown through OWA. If the user has MailboxPermission to the mailbox that contains the search results, the user can also open the item through OWA. However, if the user has MailboxFolderPermission or Delegation to the mailbox that contains the search results, the user must access the following URL before clicking the title to access the item, where user's_primarySmtpAddress is the primary SMTP address of the mailbox that the search results belong to:

https://hostname/OWA/user's_primarySmtpAddress/?cmd=contents

The Exchange Server crawler generates its own URIs for crawled documents. Each URI contains an ID that is unique among items and attachments. If a document is an item, the URI is formatted as follows:

exchadp://hostname/mailbox_name/itemId=itemId&owa=owaURL

If the document is an attachment, the URI is formatted as follows:

exchadp://hostname/mailbox_name/attachmentId=attachmentId&owa=owaURL

FileNet P8 crawlers

The URI format for documents that are crawled by a FileNet P8 crawler is:

p8ce://host_name:port/object_store/object_id[/element_number]?protocol=http

Parameters

URL encoding is applied to all of the fields.

host_name: A host name of a server on which the IBM FileNet Content Engine runs.
port: A port number on which the Content Engine Web Service runs.
object_store: A name of an object store in which a document is stored.
object_id: A globally unique identifier (GUID) assigned by the Content Engine to a stored object. A character string that contains 38 characters, the GUID consists of a left curly brace, 8 hexadecimal characters, a dash, 4 hexadecimal characters, a dash, 4 hexadecimal characters, a dash, 4 hexadecimal characters, a dash, 12 hexadecimal characters, and a right curly brace. Braces are encoded by URL encoding rules. For example:
%7B1234abcd-56ef-7a89-9fe8-7d65cd43ba21%7D
element_number: An index of content elements. This variable is appended only when a URI points to a document that contains multiple content elements.
protocol: A protocol for accessing the Web Service. Valid values are http or https.

Example

p8ce://host.filenet.com:9080/STORE1/{1234abcd-56ef-7a89-9fe8-7d65cd43ba21}/2
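
The object_id description states that the GUID has 38 characters and that its braces are URL-encoded. The following Python sketch checks the GUID layout and encodes the braces; the names GUID_PATTERN and encode_object_id are hypothetical.

```python
import re
from urllib.parse import quote

# Left brace, 8-4-4-4-12 hexadecimal characters separated by dashes, right brace.
GUID_PATTERN = re.compile(
    r"\{[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\}"
)


def encode_object_id(guid):
    """Validate a Content Engine GUID and return it with URL-encoded braces."""
    if not GUID_PATTERN.fullmatch(guid):
        raise ValueError("not a valid 38-character GUID: " + guid)
    # Hexadecimal characters and dashes are left as-is; only the braces change.
    return quote(guid, safe="")


print(encode_object_id("{1234abcd-56ef-7a89-9fe8-7d65cd43ba21}"))
# %7B1234abcd-56ef-7a89-9fe8-7d65cd43ba21%7D
```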

JDBC database crawlers

The URI format for documents that are crawled by a JDBC database crawler is:

jdbc://DB_URL/Table_Name
      /Unique_Identifier_Column_Name1/Unique_Identifier_Value1
      /[Unique_Identifier_Column_Name2/Unique_Identifier_Value2
     /.../Unique_Identifier_Column_NameN/Unique_Identifier_ValueN]

Parameters

URL encoding is applied to all of the fields.

DB_URL: The URL for the database.
Table_Name: The name of the target table, including the name of the schema.
Unique_Identifier_Column_Name1: The name of the first Unique Identifier column in the table.
Unique_Identifier_Value1: The value of the first Unique Identifier column.
Unique_Identifier_Column_NameN: The name of the nth Unique Identifier column in the table.
Unique_Identifier_ValueN: The value of the nth Unique Identifier column.

Examples

DB2 database:

jdbc:db2://host01.svl.ibm.com:50000/SAMPLE/DB2INST1.ORG/DEPTNUMB/51

Oracle database:

jdbc:oracle:thin:@/host01.svl.ibm.com:1521:ora/SCOTT.EMP/EMPNO/7934

MS SQL Server 2000 database:

jdbc:microsoft:sqlserver://host01.svl.ibm.com:1433;
DatabaseName=Northwind/dbo.Region/RegionID/100

MS SQL Server 2005 database:

jdbc:sqlserver://host01.svl.ibm.com:1433;
DatabaseName=Northwind/dbo.Region/RegionID/100

Notes crawlers

The URI format for documents that are crawled by a Notes crawler is:

domino://Server_Name[:Port_Number]/Database_Replica_ID/Database_Path_and_Name
     /[View_Universal_ID]/Document_Universal_ID
     [?AttNo=Attachment_Number&AttName=Attachment_File_Name]

Parameters

URL encoding is applied to all of the fields.

Server_Name: The name of the Lotus Notes® server.
Port_Number: The port number for the Lotus Notes server. The port number is optional.
Database_Replica_ID: The identifier for the database replica.
Database_Path_and_Name: The path and file name for the NSF database on the target Lotus Notes server.
View_Universal_ID: The View Universal ID that is defined on the target database. This ID is specified only when the document is selected from a view or folder. If you do not designate a view or folder to crawl (for example if you specify that you want to crawl all documents in a database), the View Universal ID is not specified.
Document_Universal_ID: The Document Universal ID that is defined in the document that is crawled by the crawler.
Attachment_Number: A consecutive number, starting from zero, for each attachment. The attachment number is optional.
Attachment_File_Name: The original name of the attachment file. The attachment file name is optional.

Examples

A document that was selected for crawling by view or folder:

domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf/
8178B1C14B1E9B6B8525624F0062FE9F/0205F44FA3F45A9049256DB20042D226

A document that was not selected for crawling by view or folder:

domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf//
0205F44FA3F45A9049256DB20042D226

A document attachment:

domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf//
0205F44FA3F45A9049256DB20042D226?AttNo=0&AttName=AttachedFile.doc
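
To illustrate how the fields map onto a URI, the following Python sketch splits a Notes URI into its components. The function name parse_domino_uri and the dictionary keys are hypothetical. The sketch assumes that any "/" inside a field is URL-encoded, so the path always contains exactly four segments, with an empty segment when no View Universal ID is specified.

```python
from urllib.parse import parse_qs, unquote, urlsplit


def parse_domino_uri(uri):
    """Split a domino:// URI into the fields described in this section."""
    parts = urlsplit(uri)
    if parts.scheme != "domino":
        raise ValueError("not a domino URI: " + uri)
    # Drop the empty string that precedes the leading "/" of the path.
    segments = parts.path.split("/")[1:]
    if len(segments) != 4:
        raise ValueError("expected 4 path segments, found %d" % len(segments))
    replica_id, database, view_id, document_id = (unquote(s) for s in segments)
    query = parse_qs(parts.query)
    return {
        "server": parts.hostname,
        "port": parts.port,  # None when the optional port is omitted
        "replica_id": replica_id,
        "database": database,
        "view_id": view_id or None,  # None when no view or folder was used
        "document_id": document_id,
        "attachment_number": int(query["AttNo"][0]) if "AttNo" in query else None,
        "attachment_name": query["AttName"][0] if "AttName" in query else None,
    }


fields = parse_domino_uri(
    "domino://dominosvr.ibm.com/49256D3A000A20DE/Database.nsf//"
    "0205F44FA3F45A9049256DB20042D226?AttNo=0&AttName=AttachedFile.doc"
)
print(fields["view_id"], fields["attachment_name"])
# None AttachedFile.doc
```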

Quickr for Domino crawlers

The URI format for documents that are crawled by a Quickr for Domino crawler is:

quickplace://Server_Name:Port_Number/Database_Replica_ID/Database_Path_and_Name
/View_Universal_ID/Document_Universal_ID
/?AttNo=Attachment_Number&AttName=Attachment_File_Name

Parameters

URL encoding is applied to all of the fields.

Server_Name: The host name of the Quickr for Domino server.
Port_Number: Optional: The port number for the Quickr for Domino server.
Database_Replica_ID: The identifier for the database replica.
Database_Path_and_Name: The path and file name for the document NSF database on the target Quickr for Domino server.
View_Universal_ID: The View Universal ID that is used to crawl documents.
Document_Universal_ID: The Document Universal ID that is defined in the crawled document.
Attachment_Number: Optional: A consecutive number, starting from zero, for each attachment.
Attachment_File_Name: Optional: The original name of the attachment file.

Examples

A document:

quickplace://ltwsvr.ibm.com/49257043000214B3/QuickPlace%5Csampleplace
%5CPageLibrary4925704300021490.nsf
/A7986FD2A9CD47090525670800167225
/2B02B1DE3A82B2CE49257043001C2498

A page attachment:

quickplace://ltwsvr.ibm.com/49257043000214B3/QuickPlace%5Csampleplace
%5CPageLibrary4925704300021490.nsf
/A7986FD2A9CD47090525670800167225
/2B02B1DE3A82B2CE49257043001C2498
?AttNo=0&AttName=QPCons3.ppt

Seed list crawlers

The URI format for documents that are crawled by a Seed list crawler is:

seedlist://Page_URL?pageID=Page_ID[&useSSL=true]

Parameters

URL encoding is applied to all of the fields.

Page_URL: The URL for the document (unique for each document).
Page_ID: The object identifier for the document.
useSSL: When the protocol is HTTPS, &useSSL=true is added to the URI. Otherwise, useSSL is omitted.

Example

HTTPS protocol:

seedlist://quickrserver.ibm.com:10035/lotus/mypoc?uri=dm:bec6090046f1cd5
2bc5cfcb06e9f4550&verb=view&pageID=NlFSZURlMkJQNjZSMDZQMUMwM1FPNjZCQzY
2SUw2SUhPNk1RQ0M2Uk80Nk9PNjVCRUM2UUs2TDFDMA==&useSSL=true

SharePoint crawlers

The SharePoint crawler does not generate its own format of document URI. It creates an accessible URL for the document URI. The accessible URL can be changed according to the Site and Form configuration of the SharePoint server. The crawler tries to retrieve the display form URL and append the document ID to it. If the crawler is configured to retrieve a URL from a specific field, the crawler tries to use the field value as the URI. This format is useful for crawling lists that do not generate URLs based on the primary key value. The default format is:

http://server/display_form_path?primary_key_field_name=primary_key_value

Parameters

URL encoding is applied to all of the fields.

server
display_form_path
primary_key_field_name
primary_key_value

Example

https://sharepoint.example.ibm.com:9999/rootDir/Shared%20Documents/
Forms/DispForm.aspx?ID=5

UNIX file system crawlers

The URI format for documents that are crawled by a UNIX file system crawler is:

file:///Directory_Name/File_Name

Parameters

URL encoding is applied to all of the fields.

Directory_Name: The absolute path name for the directory.
File_Name: The name of the file.

Example

file:///home/user/test.doc

Windows file system crawlers

The URI formats for documents that are crawled by a Windows file system crawler are:

file:///Directory_Name/File_Name
file:////Network_Folder_Name/Directory_Name/File_Name

Parameters

URL encoding is applied to all of the fields.

Directory_Name: The absolute path name for the directory.
File_Name: The name of the file.
Network_Folder_Name: For documents on remote servers only, the name of the shared folder on a Windows network.

Examples

Local file system:

file:///d:/directory/test.doc

Network file system:

file:////filesvr.ibm.com/directory/file.doc
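
The two formats can be produced from native Windows paths as in the following Python sketch. The function name windows_path_to_file_uri is hypothetical. The sketch assumes form-style encoding (a space becomes "+"), as in the Agent for Windows file systems example in this topic. It builds the URI by hand because the four-slash network form shown here differs from the file://server/share form that some libraries generate.

```python
from urllib.parse import quote_plus


def windows_path_to_file_uri(path):
    """Convert a local or UNC Windows path to the file URI formats above."""
    normalized = path.replace("\\", "/")
    if normalized.startswith("//"):
        # Network path: "file://" plus "//server/..." yields four slashes.
        return "file://" + quote_plus(normalized, safe="/")
    # Local path: keep ":" so the drive letter stays readable.
    return "file:///" + quote_plus(normalized, safe="/:")


print(windows_path_to_file_uri(r"d:\directory\test.doc"))
# file:///d:/directory/test.doc
print(windows_path_to_file_uri(r"\\filesvr.ibm.com\directory\file.doc"))
# file:////filesvr.ibm.com/directory/file.doc
```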