Published on 20-Oct-2011
Validated on 07 Oct 2013
"At IBM, when we find an open source technology that has potential, we experiment with it to understand how to use it to bring the most business value to IBM. For example, IBM InfoSphere BigInsights is a new class of analytics platform based on Hadoop and innovation from IBM. It can store raw data ‘as-is’ and help clients gain rapid insight through large scale analysis." - Sara Weber, Manager, IBM’s CIO Lab Analytics team
How do IBM® employees find and connect with their colleagues?
With over 600,000 names in BluePages, IBM’s employee directory, and over 500,000 queries daily, an average search session can take up to two minutes. IBM needed a faster, more efficient application.
Using Apache open source technologies, the IBM CIO Lab Analytics team developed a new people-search application that allows flexible queries and returns as many results as possible, as fast as possible. Additional capabilities include quick browsing and photo images.
The new Faces application offers instantaneous response time, saving on average over a minute for each search session—and thousands of hours daily for IBM employees.
With an enterprise population of over 600,000 people worldwide, how do IBM® employees find and connect with their colleagues? For over a decade, IBM BluePages has been the primary source. This high-demand intranet application provides information on all IBM employees and contractors, including areas of expertise and responsibilities. And with IBM’s focus on innovation and emerging technologies, positive changes are always on the horizon.
“BluePages is one of the most used applications at IBM,” says Sara Weber, manager of IBM’s CIO Lab Analytics team. “At one time, BluePages was state-of-the-art; however, over the years it was not updated to keep up with new advances in Internet technology. With over 500,000 BluePages searches done every day, and with BluePages accessing huge volumes of data, an average search session can take up to two minutes. When multiple results are returned they do not show individual photo images, and incorrect spelling may yield no results. My team was tasked with addressing the question: ‘How can we build a better and faster people search?’”
The goals for this project, aptly named Faces, were to support flexible queries and return as many results as possible, as fast as possible. Results that more closely matched the query would appear first. Additional capabilities would permit quick browsing and photo images.
Applying emerging technologies to deliver innovation
Weber’s CIO Lab Analytics team identifies problems that IBM employees are experiencing and finds ways to apply emerging technologies to develop solutions. “We had to process tremendous amounts of data, and then store it in a way that it could be accessed quickly,” says Weber. “For this project, we selected Apache Hadoop and Apache Voldemort; both are open source technologies. My development team has extensive expertise in using Hadoop technology. The Faces application was developed by two members of our team over a five-month period.”
Apache Hadoop allows developers to create distributed applications that run on clusters of computers. Organizations can use this infrastructure to handle large data sets by dividing the data into “chunks” and coordinating the processing across the cluster. Once the data has been distributed, the chunks can be processed in parallel. Voldemort, an open source project released under the Apache License, is a distributed key-value storage system that offers fast, reliable and persistent storage and retrieval: each key returns the value stored under it. When an application only needs lookups by key, and none of the richer query capabilities of a relational database, a key-value store is typically the faster option.
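The two ideas in the paragraph above, chunked parallel processing and key-based lookup, can be illustrated with a minimal single-machine sketch. It does not use the Hadoop or Voldemort APIs: worker threads stand in for cluster nodes, a plain dictionary stands in for the key-value store, and the records, field names and function names are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Divide the data set into fixed-size chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_by_country(chunk):
    """'Map' step: process one chunk independently of the others."""
    return Counter(record["country"] for record in chunk)


def merge_counts(partial_counts):
    """'Reduce' step: combine the per-chunk results into one total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical directory records, invented for illustration only.
records = [
    {"id": "p1", "name": "Ana Silva", "country": "BR"},
    {"id": "p2", "name": "John Smith", "country": "US"},
    {"id": "p3", "name": "Mei Chen", "country": "CN"},
    {"id": "p4", "name": "Jane Smith", "country": "US"},
    {"id": "p5", "name": "Luis Costa", "country": "BR"},
]

chunks = split_into_chunks(records, chunk_size=2)

# Worker threads stand in for cluster nodes (an assumption of this sketch).
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_by_country, chunks))

country_counts = merge_counts(partial_counts)

# A plain dict stands in for a key-value store: one key returns one value.
person_store = {record["id"]: record for record in records}
```

Because each chunk is processed without reference to the others, adding workers changes only how many chunks run at once, not the code that processes them.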
“At IBM, when we find an open source technology that has potential, we experiment with it to understand how to use it to bring the most business value to IBM,” says Weber. “For example, IBM InfoSphere® BigInsights is a new class of analytics platform based on Hadoop and innovation from IBM. It can store raw data ‘as-is’ and help clients gain rapid insight through large scale analysis.”
For Faces, Hadoop preprocesses data from the IBM Enterprise Directory and Social Networks and sends this information to the Voldemort Person Store (2.2 GB). Voldemort, in turn, sends data to Hadoop processing for the Person ID fetcher, Reports Loader, Query Expander, and Location Expander. These results are saved to Voldemort’s Query Store (5.5 GB). Hadoop also receives images from BluePages that are saved in Voldemort’s image store to remain available for Hadoop’s montage generator.
“We placed all 600,000 names into memory for immediate access,” says Weber. “Preprocessing with Hadoop directly improves performance. Each time you type a letter in a name, results are immediate. We have precomputed the search process to retrieve every employee name that matches what is entered. Every time you type another letter, scoring retrieves people who are more relevant to the search criteria. The information is available and, from a performance perspective, everything is ready to go. Memory and storage are inexpensive and nightly processing takes only a few hours.”
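One way to picture the precomputed search Weber describes is a prefix index that is built ahead of time, so that each keystroke becomes a single in-memory lookup. The sketch below is an assumption about the general technique, not the Faces implementation: the names, the scoring rule (fewer untyped letters ranks higher) and the function names are hypothetical, and it handles only single-word queries.

```python
from collections import defaultdict


def build_prefix_index(names, max_results=5):
    """Precompute, for every prefix of every name token, a ranked result list."""
    candidates = defaultdict(dict)  # prefix -> {full name: best score so far}
    for name in names:
        for token in name.lower().split():
            for end in range(1, len(token) + 1):
                prefix = token[:end]
                score = len(token) - end  # 0 means the whole token was typed
                best = candidates[prefix].get(name)
                if best is None or score < best:
                    candidates[prefix][name] = score

    index = {}
    for prefix, scored in candidates.items():
        # Closest matches first; ties are broken alphabetically.
        ranked = sorted(scored, key=lambda n: (scored[n], n))
        index[prefix] = ranked[:max_results]
    return index


def search(index, typed):
    """At query time each keystroke is one dictionary lookup."""
    return index.get(typed.strip().lower(), [])


# Hypothetical names, invented for illustration only.
names = ["John Smith", "Jane Smith", "Joan Smythe", "Jo Park"]
index = build_prefix_index(names)

results_jo = search(index, "jo")    # exact token "Jo" ranks first
results_smi = search(index, "smi")
results_none = search(index, "xyz")
```

The trade-off matches the one in the quote: the index costs memory and batch processing time up front, and in exchange no ranking work is left to do while the user is typing.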
Weber adds, “We run Hadoop on ten five-year-old IBM BladeCenter® servers. These Blades are low powered, but Hadoop distributes the workload and takes advantage of the hardware to the fullest. If more computation is needed, we can add machines and improve performance without modifying the code.”
Measuring business value
According to Weber, the new Faces application enables employees to receive instantaneous search results. “Conservatively speaking, we are saving on average over a minute for each search session,” says Weber. “Searches are faster and easier. The information is timely and accurate. With over 500,000 searches daily, IBMers are saving thousands of hours each day.”
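The "thousands of hours" figure follows from the two numbers quoted above. The short calculation below uses the lower bounds given (500,000 searches, one minute saved) and assumes that each daily search corresponds to one search session, which the source does not state explicitly.

```python
# Lower-bound figures taken from the quote above.
searches_per_day = 500_000
minutes_saved_per_search = 1  # "over a minute" per session, so a floor

# Assumption: one search corresponds to one search session.
minutes_saved_per_day = searches_per_day * minutes_saved_per_search
hours_saved_per_day = minutes_saved_per_day / 60  # roughly 8,333 hours
```

At these lower bounds the saving is a little over 8,300 hours per day across the company.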
For IBM employees, the improvement is noticeable. “To gain user acceptance or change user behavior, we know any new solution we create has to be significantly faster and better,” says Weber. “As far as I know, Faces is the fastest-growing innovation ever introduced at IBM. In the first two weeks, Faces went from zero to 85,000 users with continued viral growth throughout the entire IBM organization. What used to take minutes now takes milliseconds. We provide a feedback button on all our applications so users can report errors or issues. With Faces, IBMers were using the feedback button to say, ‘Thank you for making my job so much easier.’”
Weber concludes, “We could not have developed Faces without the distributed processing capabilities Hadoop provides. The Faces application has really highlighted the power of Hadoop and has helped us address a major pain point for all IBMers.”
- IBM® BladeCenter® servers
- Apache Hadoop
- Apache Voldemort Key Value Storage System
For more information
To learn more about IBM Information Management solutions, please contact your IBM sales representative or IBM Business Partner, or visit the following website: ibm.com/software/data
To learn more about IBM InfoSphere BigInsights, visit: ibm.com/software/data/infosphere/biginsights
Products and services used
IBM products and services that were used in this case study.
InfoSphere BigInsights Basic Edition
© Copyright IBM Corporation 2011
IBM Corporation, Software Group, Route 100, Somers, NY 10589, U.S.A.
Produced in the United States of America, October 2011. All Rights Reserved.
IBM, the IBM logo, ibm.com, InfoSphere, and BladeCenter are trademarks of International Business Machines Corporation in the United States, other countries or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml
Other company, product and service names may be trademarks or service marks of others.
References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.