Attempts to run a Portal Search crawler result in "EJPJO0046E" errors when SPNEGO is enabled in an IBM WebSphere Portal environment. The crawler fails to authenticate, and no documents can be added to the collection.
After you define a new Portal or WCM content source for a search collection, or when you try to run the crawler, the operation fails with this error message:
PortalCollect E com.ibm.hrl.portlets.WsPse.PortalCollectionsService checkCrawler EJPJO0046E: Failed to connect to content source <b>Portal Content Source</b>. Either a wrong URL is defined, the content source's authentication info is incorrect, or the site is blocked by robot.txt.
If SPNEGO is enabled, you need to define a filter that lets the crawler bypass any SPNEGO challenge, so that the crawler can authenticate using BASIC AUTH instead.
Portal 6.1.x or 7.0.0.x on WebSphere Application Server 7.0.0.x
Diagnosing the problem
Start the crawler, then search the SystemOut.log files for the following error:
[13:14:19:406 GMT] 0000008d PortalCollect E com.ibm.hrl.portlets.WsPse.PortalCollectionsService checkCrawler EJPJO0046E: Failed to connect to content source <b>Portal Content Source</b>. Either a wrong URL is defined, the content source's authentication info is incorrect, or the site is blocked by robot.txt.
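The log search described above can also be scripted. The following is a minimal sketch in Python, assuming SystemOut.log is a plain-text file at a path you supply. The function names find_crawler_errors and scan_log are hypothetical helpers for illustration and are not part of any IBM tooling.

```python
from pathlib import Path

# Message ID reported by the Portal Search crawler in the error shown above.
ERROR_CODE = "EJPJO0046E"


def find_crawler_errors(lines, code=ERROR_CODE):
    """Return (line_number, text) pairs for every line that contains the code."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if code in line
    ]


def scan_log(path):
    """Scan one log file; the path depends on your own profile layout."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_crawler_errors(handle)
```

Calling scan_log with the path to a SystemOut.log file returns a list of matching lines with their line numbers. An empty list means that the error was not logged in that file.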
Resolving the problem
You will need to define a SPNEGO Web Authentication Filter as defined in this InfoCenter page:
You also need to identify the User-Agent HTTP request header that the crawler sends. When you start the crawler, a message like the following should appear in SystemOut.log:
"WARNING: Unknown User Browser to WCL DeviceContext. Dump UserAgent: javacrawler/1.1"
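If the log is large, the User-Agent value can be pulled out of these warnings with a short script. The following is a sketch in Python, assuming the warning text has the same layout as the message quoted above. The function name extract_user_agents is hypothetical, and other log layouts might need a different pattern.

```python
import re

# Pattern based on the warning text quoted above; this layout is an assumption.
USER_AGENT_PATTERN = re.compile(r"Dump UserAgent:\s*(\S+)")


def extract_user_agents(lines):
    """Collect the distinct User-Agent values found in 'Dump UserAgent' warnings."""
    found = []
    for line in lines:
        match = USER_AGENT_PATTERN.search(line)
        if match:
            # Drop a trailing quotation mark if the message was quoted in the log.
            agent = match.group(1).rstrip('"')
            if agent not in found:
                found.append(agent)
    return found
```

Passing the lines of SystemOut.log to extract_user_agents returns each reported User-Agent string once, in the order of first appearance.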
Taking this into account, you should set the filter in the following way:
1) Open the Administrative Console for the server
2) Click Security > Global Security
3) From Authentication, expand Web and SIP Security
4) Click SPNEGO Web Authentication
5) Under SPNEGO filters, click New or select an existing one to edit
6) For the value, set the following
7) Save and exit