Published on 28-Mar-2012
"The volume of electronic documents and email messages is growing fast. Unless we determine what must be saved as company records and automate the process, we will not succeed in managing compliance risk." - Director of IT operations, US industrial products company
A US industrial products company
Enterprise Content Management, Information Lifecycle Governance (ILG)
This publicly traded holding company has subsidiaries operating across the manufacturing, specialty retailing and mining industries.
The company needed to reduce the effort and cost of complying with legal requirements for capturing and retrieving electronic records.
The company is experimenting with the natural language processing capability in IBM Classification Module software to categorize and save employees’ electronic records.
As the solution is perfected and expanded to all subsidiaries, the company expects to significantly improve compliance with minimal additional expenditure in data storage and staff time.
Managing the growth and risk of electronic records
The company is no stranger to efficient records management. It has a long history of aggressively pursuing technology solutions for managing physical information assets. As part of its global records initiative, the company has implemented content management software from IBM in phases since 2004 to successfully manage engineering drawings and changes and assembly instructions as well as streamline International Organization for Standardization (ISO) certification, Sarbanes-Oxley Act (SOX) compliance and contract management corporatewide.
Now, the company is extending this same discipline to electronic records retention. First, it began scanning inactive paper records into electronic format and storing them in the same records management system as physical records. The objective going forward is to use the system for all current and active records, both physical and electronic, that must be accessible and maintained for a certain period.
Upping the ante
The need for effective electronic records management became urgent in 2010 when the company faced a potential legal claim. It was required to save all documents, including email messages, generated by every individual at its largest subsidiary who might be involved in the suit. Even though the case was eventually dropped, the company had to preserve everything until the decision was reached not to sue. The physical records retention system was already in place, but a solution for capturing the necessary electronic records had to be implemented quickly.
During the legal hold period, the company used IBM Content Collector for File Systems and IBM Classification Module software to make sure only relevant records were retained. Without this capability, it would have been virtually impossible to selectively back up correspondence of the small number of people involved, rather than the entire employee population. The cost of archiving all the backups would have been significant, and document retrieval would have been extremely difficult. Worse, it would not have been foolproof because an employee can send an email and then delete it before it is archived.
This experience demonstrated the need for a more strategic approach to managing electronic records and email.
Rethinking the approach to records classification
The company engaged IBM for a pilot program to categorize and save electronic records of employees leaving the company. One challenge is that each employee has his or her own way of filing content on the desktop and in email folders. This flexibility is great for individual productivity but makes a standard keyword search problematic because it would require employees to manually categorize their email messages and documents.
However, the Classification Module software offers a second search capability based on automated content classification, as demonstrated by the IBM Watson system on Jeopardy! At its core, the Watson system is built on IBM DeepQA technology, which applies natural language processing, information retrieval, knowledge representation and reasoning, and machine learning to open-domain question answering. This same technology is available in the Classification Module software.
Inspired by the Watson system performance, the company is experimenting with the technology to simplify the process of capturing unstructured employee information assets. With employee email, spreadsheets, presentations and Microsoft Word software documents, the company wants to teach the Classification Module engine to quickly and accurately determine whether the content is personal, a company record, relevant or irrelevant. Then the company wants the relevant records automatically identified by type and placed in the appropriate storage system with the required retention period and retrieval rules.
Teaching the machine
“We are currently teaching the Classification Module software to continually improve its accuracy in automatic tagging of unstructured content by feeding in already categorized documents,” says the director of IT operations.
The machine learning technology will initially retrieve metadata from the system properties of the documents. It will be designed to ask for additional properties, make guesses about the classification and assign a probability to each guess. If the probability assigned to a record is 80 percent or more, the company plans to let the system accept its own classification. If the probability is less than 80 percent, the record will be placed into a separate folder for authorized staff to review.
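The 80 percent rule described above can be illustrated with a short sketch. This is not the Classification Module API; the names Classification, route and AUTO_ACCEPT_THRESHOLD are hypothetical and chosen only for illustration. The only value taken from the text is the 80 percent cutoff.

```python
from dataclasses import dataclass

# Taken from the text: classifications at 80 percent or more are accepted.
AUTO_ACCEPT_THRESHOLD = 0.80


@dataclass
class Classification:
    """Hypothetical result of classifying one record (not an IBM type)."""
    record_id: str
    category: str
    probability: float  # classifier's probability for its guess, 0.0 to 1.0


def route(result: Classification,
          threshold: float = AUTO_ACCEPT_THRESHOLD) -> str:
    """Return 'auto_accept' or 'manual_review' for a classified record."""
    if not 0.0 <= result.probability <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0")
    # At or above the threshold, the system's own classification stands.
    if result.probability >= threshold:
        return "auto_accept"
    # Below the threshold, the record goes to authorized staff for review.
    return "manual_review"


if __name__ == "__main__":
    # Hypothetical sample records, for demonstration only.
    samples = [
        Classification("rec-001", "contract", 0.93),
        Classification("rec-002", "personal", 0.80),
        Classification("rec-003", "engineering drawing", 0.61),
    ]
    for s in samples:
        print(s.record_id, s.category, route(s))
```

Keeping the threshold as a parameter would let the company tighten or relax the cutoff as the classifier's accuracy changes, without touching the routing logic.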
The objective is to minimize disruption to employees so they can work while records are being captured. “I don’t think the process will ever be totally transparent and unobtrusive for the employee because there will still be instances where manual intervention is necessary to complete records classification,” says the director of IT operations. “But with the IBM technology, we expect that these instances will decrease over time because the more you feed into the Classification Module software, the more it learns.”
The Classification Module software will integrate with Content Collector for File Systems and IBM Records Manager software for accurate tracking and disposition and rapid access. This integration will help ensure that an electronic record will be retained even if it is deleted, because IBM Content Collector for Email software captures records from the journals. For managing the disposition of in-process electronic records, the organization is considering integrating Microsoft SharePoint with its IBM FileNet® Content Manager software. All inactive records will be stored in FileNet Content Manager and managed by IBM Enterprise Records. In addition, the company plans to deploy IBM eDiscovery software and other tools to enable records searches if necessary for legal or regulatory audit purposes.
Eliminating electronic clutter
Concurrent with adopting technology solutions for electronic records management, the company is also taking steps to decrease the volume it needs to manage. The organization instituted a program at every location to periodically designate a week for employees to go through their email and PC files and delete unneeded messages and documents. As with the similar half-day program for paper records, the company provided decision criteria and a simple flowchart for employees to follow when deciding what to retain and what to delete. As a result of the program, significant disk space is being freed up every year.
Enhancing compliance, reducing costs
As the full-blown electronic records management solution is perfected and expanded to all subsidiaries, the company expects to significantly improve compliance with minimal additional expenditure in data storage and staff time.
For more information
To learn more about the IBM Enterprise Content Management portfolio, contact your IBM sales representative or IBM Business Partner, or visit: ibm.com/software/info/itsolutions/content-management
© Copyright IBM Corporation 2012 IBM Corporation Software Group Route 100 Somers, NY 10589 Produced in the United States of America March 2012 IBM, the IBM logo, FileNet, ibm.com, Lombardi and Teamworks are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml Microsoft, Outlook and SharePoint are trademarks of Microsoft Corporation in the United States, other countries, or both. This document is current as of the initial date of publication and may be changed by IBM at any time. THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided. The client is responsible for ensuring compliance with laws and regulations applicable to it. IBM does not provide legal advice or represent or warrant that its services or products will ensure that the client is in compliance with any law or regulation.