
Content Classification fails to extract text from Vectorworks PDFs


Technote (troubleshooting)


Problem (Abstract)

IBM Content Classification cannot extract text from PDF files that are generated by using Vectorworks.

Cause

Content Classification uses a third-party component, Apache PDFBox, to extract text from PDF files. A newer version of PDFBox is required to extract text from Vectorworks PDFs.
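
To confirm that the PDFBox version is what makes the difference for a particular file, one option is to run the standalone ExtractText tool that ships in the pdfbox-app JAR against the PDF, once with the old JAR and once with the new one. The following is a minimal sketch, not part of Content Classification: the helper names and the file names in the usage note are hypothetical, it assumes that `java` is on the PATH, and it assumes that the standalone tool behaves the same way as the filter does for the file in question.

```python
import subprocess


def extract_text_command(jar_path, pdf_path, out_path):
    """Build the command line for the ExtractText tool in a pdfbox-app JAR."""
    return ["java", "-jar", str(jar_path), "ExtractText",
            str(pdf_path), str(out_path)]


def extracts_text(jar_path, pdf_path, out_path):
    """Run ExtractText and report whether any non-whitespace text was written.

    Assumes java is on the PATH; raises CalledProcessError if the tool fails.
    """
    subprocess.run(extract_text_command(jar_path, pdf_path, out_path),
                   check=True)
    with open(out_path, encoding="utf-8", errors="replace") as handle:
        return bool(handle.read().strip())
```

For example, `extracts_text("pdfbox-app-1.7.1.jar", "drawing.pdf", "drawing.txt")` runs the tool on a hypothetical `drawing.pdf`. Repeating the call with the pdfbox-app-1.2.1.jar file shows whether the older version also extracts text from that file.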

Resolving the problem

To resolve this problem, download and install Apache PDFBox version 1.7.1:

  1. Download the pdfbox-app-1.7.1.jar file from the PDFBox download page.
    Important: Do not download the pdfbox-1.7.1.jar file. Unlike the standalone pdfbox-app JAR, it does not bundle the libraries that PDFBox depends on.
  2. Put the pdfbox-app-1.7.1.jar file into the Classification_Home\Filters\EmailFilter\Lib directory.
  3. Update references to the new JAR file.

    Windows:
    In the Classification_Home\Filters\EmailFilter directory, edit the EFServer.bat and EFDirect.bat files to refer to pdfbox-app-1.7.1.jar instead of pdfbox-app-1.2.1.jar.

    AIX, Linux, and Solaris:
    In the Classification_Home/Filters/EmailFilter directory, edit the EFServer.sh and EFDirect.sh files to refer to pdfbox-app-1.7.1.jar instead of pdfbox-app-1.2.1.jar.
  4. Restart all Content Classification servers.
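
The edit in step 3 can also be scripted. The following is a minimal sketch under stated assumptions: the function name is hypothetical, the launcher files are assumed to reference the JAR by its exact file name, and the file is edited as raw bytes so that its line endings are left untouched. A backup copy with a .bak suffix is written before any change is made.

```python
from pathlib import Path

OLD_JAR = "pdfbox-app-1.2.1.jar"
NEW_JAR = "pdfbox-app-1.7.1.jar"


def update_jar_references(script_path, old=OLD_JAR, new=NEW_JAR):
    """Replace every reference to the old JAR in one launcher script.

    Returns the number of references that were replaced. When at least
    one reference is found, the original content is first saved to
    <script name>.bak in the same directory.
    """
    path = Path(script_path)
    data = path.read_bytes()  # bytes, so CRLF or LF endings are preserved
    old_bytes = old.encode("ascii")
    new_bytes = new.encode("ascii")
    count = data.count(old_bytes)
    if count:
        # Keep a backup next to the original before rewriting it.
        path.with_name(path.name + ".bak").write_bytes(data)
        path.write_bytes(data.replace(old_bytes, new_bytes))
    return count
```

Call the function once for each launcher file, for example EFServer.sh and EFDirect.sh on AIX, Linux, and Solaris, using the full path under your own Classification_Home directory. A return value of 0 means that the file contained no reference to pdfbox-app-1.2.1.jar and was left unchanged.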

Copyright and trademark information

IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

Document information

Content Classification

Software version: 8.7, 8.8

Operating system(s): AIX, Linux, Solaris, Windows

Reference #: 1621261

Modified date: 2013-01-03
