Content Classification fails to extract text from Vectorworks PDFs

Technote (troubleshooting)


Problem (Abstract)

IBM Content Classification cannot extract text from PDF files that are generated by using Vectorworks.

Cause

A newer version of Apache PDFBox, a third-party component in Content Classification, is required to extract text from Vectorworks PDFs.

Resolving the problem

To resolve this problem, download and install Apache PDFBox version 1.7.1:

  1. Download the pdfbox-app-1.7.1.jar file from the PDFBox download page.
    Important: Do not download the pdfbox-1.7.1.jar file.
  2. Put the pdfbox-app-1.7.1.jar file into the Classification_Home\Filters\EmailFilter\Lib directory.
  3. Update references to the new JAR file.

    Windows:
    In the Classification_Home\Filters\EmailFilter directory, edit the EFServer.bat and EFDirect.bat files to refer to pdfbox-app-1.7.1.jar instead of pdfbox-app-1.2.1.jar.

    AIX, Linux, and Solaris:
    In the Classification_Home/Filters/EmailFilter directory, edit the EFServer.sh and EFDirect.sh files to refer to pdfbox-app-1.7.1.jar instead of pdfbox-app-1.2.1.jar.
  4. Restart all Content Classification servers.
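
The edit in step 3 can also be scripted. The following Python sketch is illustrative only and is not part of the product: the function names (update_jar_references, patch_script, patch_email_filter) are hypothetical, and it assumes that the launcher scripts mention the old JAR by its exact file name, pdfbox-app-1.2.1.jar. The caller supplies the path to the EmailFilter directory. Each changed script is first copied to a .bak file so that the change can be reverted.

```python
from pathlib import Path
from typing import List
import shutil

# File names taken from the steps above.
OLD_JAR = b"pdfbox-app-1.2.1.jar"
NEW_JAR = b"pdfbox-app-1.7.1.jar"

# Launcher scripts named in step 3 (Windows and AIX/Linux/Solaris variants).
SCRIPT_NAMES = ["EFServer.bat", "EFDirect.bat", "EFServer.sh", "EFDirect.sh"]


def update_jar_references(content: bytes) -> bytes:
    """Return content with every mention of the old JAR name replaced."""
    return content.replace(OLD_JAR, NEW_JAR)


def patch_script(path: Path) -> bool:
    """Rewrite one script in place; return True if it was changed.

    Works on raw bytes so that line endings and encoding are preserved.
    A backup copy named <script>.bak is written before the change.
    """
    original = path.read_bytes()
    updated = update_jar_references(original)
    if updated == original:
        return False
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_bytes(updated)
    return True


def patch_email_filter(email_filter_dir: Path) -> List[str]:
    """Patch whichever launcher scripts exist; return the names changed."""
    changed = []
    for name in SCRIPT_NAMES:
        script = email_filter_dir / name
        if script.is_file() and patch_script(script):
            changed.append(name)
    return changed
```

To use the sketch, call patch_email_filter with the Path of the Filters/EmailFilter directory under your Classification_Home, review the list of changed files that it returns, and then continue with step 4.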

Document information


More support for:

Content Classification

Software version:

8.7, 8.8

Operating system(s):

AIX, Linux, Solaris, Windows

Reference #:

1621261

Modified date:

2013-07-25
