Technote (troubleshooting)
Problem (Abstract)
IBM Content Classification cannot extract text from PDF files that are generated by using Vectorworks.
Cause
Content Classification uses a third-party component, Apache PDFBox, to extract text from PDF files. The bundled version (1.2.1) cannot extract text from Vectorworks PDFs; a newer version is required.
Resolving the problem
To resolve this problem, download and install Apache PDFBox version 1.7.1:
- Download the pdfbox-app-1.7.1.jar file from the PDFBox download page.
Important: Do not download the pdfbox-1.7.1.jar file.
- Put the pdfbox-app-1.7.1.jar file into the Classification_Home\Filters\EmailFilter\Lib directory.
- Update references to the new JAR file.
Windows:
In the Classification_Home\Filters\EmailFilter directory, edit the EFServer.bat and EFDirect.bat files to refer to pdfbox-app-1.7.1.jar instead of pdfbox-app-1.2.1.jar.
AIX, Linux, and Solaris:
In the Classification_Home/Filters/EmailFilter directory, edit the EFServer.sh and EFDirect.sh files to refer to pdfbox-app-1.7.1.jar instead of pdfbox-app-1.2.1.jar.
- Restart all Content Classification servers.
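
The step that updates references to the new JAR file can also be scripted. The following Python code is a minimal sketch, not an IBM-supplied tool: the function name update_jar_references and the .bak backup convention are assumptions made for illustration. It assumes that the launcher scripts mention the old JAR by its exact file name, and it should be run before the servers are restarted.

```python
from pathlib import Path

OLD_JAR = b"pdfbox-app-1.2.1.jar"
NEW_JAR = b"pdfbox-app-1.7.1.jar"

# Launcher scripts named in the technote; only the ones present are touched.
SCRIPT_NAMES = ("EFServer.bat", "EFDirect.bat", "EFServer.sh", "EFDirect.sh")


def update_jar_references(email_filter_dir):
    """Point the launcher scripts at the new PDFBox JAR.

    Hypothetical helper: returns the names of the files it changed.
    """
    changed = []
    for name in SCRIPT_NAMES:
        script = Path(email_filter_dir) / name
        if not script.is_file():
            continue  # .bat files exist on Windows, .sh files on AIX, Linux, Solaris
        original = script.read_bytes()
        if OLD_JAR not in original:
            continue  # already updated, or the script does not reference the old JAR
        # Keep a copy of the unmodified script next to it (assumed convention).
        script.with_name(script.name + ".bak").write_bytes(original)
        # Work on bytes so that line endings and encoding stay untouched.
        script.write_bytes(original.replace(OLD_JAR, NEW_JAR))
        changed.append(name)
    return changed
```

Call the function with the full path of the Classification_Home Filters/EmailFilter directory on your system. It returns the list of scripts that it modified; an empty list means that no script still referred to pdfbox-app-1.2.1.jar.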