DigitalGlobe maps out new opportunities with IBM

Using analytics to deliver high-quality satellite imagery

Published on 16-Nov-2012

"With the IBM SPSS solution, we hope to reduce the amount of images that require manual review by 90 percent." - Chris Padwick, principal scientist, DigitalGlobe

Customer:
DigitalGlobe

Industry:
Computer Services

Deployment country:
United States

Solution:
BA - Business Analytics, BA - Predictive Analytics

Overview

DigitalGlobe is a leading global provider of high-resolution earth imagery products and services, based in Longmont, Colorado. Its customers range from urban planners to US and foreign defense and intelligence agencies, including NASA. Its imagery is also used commercially for navigation technology and web mapping applications, such as Google Maps. The company collects more than two million square kilometers of earth imagery every day from three high-resolution satellites. DigitalGlobe employs approximately 700 people and reported revenues of $322 million in 2010.

Business need:
DigitalGlobe wanted to improve its ability to deliver high-quality satellite images to its clients by finding an efficient way to automate the detection of cloud cover in its imagery. Its existing cloud detection algorithm was highly error-prone, requiring staff to manually check thousands of photos each day.

Solution:
DigitalGlobe used IBM® SPSS® Statistics and IBM SPSS Modeler to build a new machine learning system, called Advanced Cloud Cover Assessment (ACCA), which performs large-scale, automated identification of clouds in satellite imagery.

Results:
Reduced the rate of false positive classifications for non-cloudy scenes from approximately one in every five images to just one in every 1,000. Greater accuracy should enable the elimination of up to 90 percent of manual reviews, saving many hours of staff time.

Benefits:
Going forward, solutions like ACCA will enable DigitalGlobe to extract more valuable information from raw image data.

Case Study

DigitalGlobe’s success depends on its ability to provide its customers with images of the highest quality and accuracy within short timeframes. A key part of maintaining the quality of the company’s imagery production chain involves the detection of cloud cover in satellite images, as customers typically do not want the images they order to be obscured by clouds.

Chris Padwick, principal scientist at DigitalGlobe, explains: “To draw an analogy to a manufacturing process, if the image represents the part that is being manufactured, then pixels in the image that are covered by clouds represent a defect in the part. These defects must be identified, cataloged, and stored in the system on an image-by-image basis to ensure that we deliver the highest quality product to our customers.”

Meeting the challenge of accurate cloud detection
Reliably automating the detection of cloudy pixels in earth observation imagery has proven to be a significant challenge. DigitalGlobe’s existing cloud assessment algorithm is based on image thresholding, which marks individual pixels as either cloud or non-cloud depending on their brightness. While it is easy to detect clouds over a dark background such as water, it is much more difficult for the algorithm to determine the extent of cloud cover over naturally brighter areas, such as deserts or snowy landscapes, where there is less contrast.

“A simple thresholding approach works fairly well for ‘easy’ scenes, in which there is a large brightness difference between the clouds and the background pixels. However, this approach tends to grossly over-classify the amount of clouds in the image on ‘harder’ scenes, in which the background pixels are the same brightness or brighter than the cloud pixels,” notes Padwick. “For example, if the system has to process an image taken over the Sahara Desert, in which the sand is very bright, it might report 100 percent cloud cover when in reality there is only 10 percent. The high rate of false positives means that we have to manually check images for classification accuracy, and manually reclassify the image if the accuracy is poor. As we’re currently collecting approximately 2,000 images per day, the potential workload is significant.”
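The failure mode Padwick describes can be reproduced with a minimal sketch of brightness thresholding. The threshold value, the 0-255 brightness scale and the two tiny scenes below are illustrative assumptions, not DigitalGlobe's actual parameters or data.

```python
# Minimal sketch of brightness thresholding for cloud detection.
# The threshold and all pixel values are illustrative assumptions.

CLOUD_THRESHOLD = 200  # hypothetical cut-off on a 0-255 brightness scale


def cloud_fraction(image, threshold=CLOUD_THRESHOLD):
    """Return the fraction of pixels whose brightness exceeds the threshold."""
    pixels = [value for row in image for value in row]
    flagged = sum(1 for value in pixels if value > threshold)
    return flagged / len(pixels)


# "Easy" scene: three bright cloud pixels (about 230) over dark water (about 30).
ocean_scene = [
    [30, 32, 230, 228],
    [29, 31, 231, 35],
    [28, 30, 33, 34],
    [27, 29, 31, 30],
]

# "Hard" scene: the same three cloud pixels over bright sand (about 220).
desert_scene = [
    [221, 219, 230, 228],
    [222, 218, 231, 220],
    [223, 221, 219, 222],
    [220, 224, 221, 223],
]

ocean_estimate = cloud_fraction(ocean_scene)    # 3 of 16 pixels flagged
desert_estimate = cloud_fraction(desert_scene)  # all 16 pixels flagged
```

In both scenes only three pixels are cloud, yet the desert scene is reported as fully cloud-covered because every sand pixel also exceeds the brightness cut-off.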

DigitalGlobe plans to expand its image collection capacity by launching a new satellite, which will allow it to downlink an additional two to three thousand images every day. Since the manual workload required to quality-assure images – and hence the cost of production – increases linearly as the number of images grows, DigitalGlobe wanted to find a way to streamline image processing without sacrificing quality and accuracy.

Harnessing a powerful IBM solution
In its search for a solution, DigitalGlobe’s team of remote imaging scientists conducted an extensive, head-to-head evaluation of a number of statistical analysis applications, including IBM SPSS Statistics and Modeler, TIBCO Spotfire and SAS Analytics.

“Most of the software we tested simply could not handle the amount of information that we needed to process,” remarks Padwick. “SPSS was the only solution we evaluated that could manage these vast data volumes, and it outperformed the competing options by a significant margin. It also had a very intuitive user interface, which made it easy for users to build models even if they weren’t familiar with the software. Based on these two factors, we were convinced that SPSS was the best option for our needs.”

Creating a more accurate algorithm
Using IBM SPSS Statistics and IBM SPSS Modeler, DigitalGlobe developed a new machine learning system called Advanced Cloud Cover Assessment (ACCA), which is designed to minimize the amount of manual review that is required to detect cloud cover in satellite images of Earth.

The ACCA system consists of a set of sophisticated image-processing functions based on a C5.0 decision tree – an algorithm that classifies each pixel within an image as either cloudy or non-cloudy by applying rules learned from the millions of pre-classified examples in the training data set. Several million rows of training data are used for each satellite. DigitalGlobe designs the models in IBM SPSS and then exports them to the custom-built ACCA system, where they are compiled and deployed.
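To make the decision-tree idea concrete, the following is a minimal sketch of a tree learned from labeled pixels by choosing the split with the highest information gain. It is not the C5.0 algorithm used in SPSS Modeler, which adds refinements such as pruning and boosting, and it is not DigitalGlobe's model. The two features (brightness and a texture score) and all training values are hypothetical; the source does not say which pixel features ACCA uses.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature_index, threshold) with the highest information gain."""
    base = entropy(labels)
    best = None
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - weighted
            if best is None or gain > best[0]:
                best = (gain, feature, threshold)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively build a tree; leaves are class labels, nodes are dicts."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return majority
    _, feature, threshold = split
    left = [i for i, row in enumerate(rows) if row[feature] <= threshold]
    right = [i for i, row in enumerate(rows) if row[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }


def classify(tree, row):
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical pre-classified pixels: (brightness, texture score).
training_rows = [
    (30, 5), (40, 6),                              # dark, smooth water
    (229, 40), (240, 45), (233, 38), (236, 42),    # bright, rough sand
    (230, 6), (235, 4), (228, 7),                  # bright, smooth cloud
]
training_labels = ["clear"] * 6 + ["cloud"] * 3

tree = build_tree(training_rows, training_labels)
sand_pixel = classify(tree, (238, 41))
cloud_pixel = classify(tree, (232, 5))
water_pixel = classify(tree, (35, 6))
```

In this toy data set, brightness alone cannot separate sand from cloud, so the learned tree splits on the texture score first and uses brightness only to tell water from cloud. That is the kind of rule a single brightness threshold cannot express.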

The main challenge in designing a machine learning algorithm is usually the need to find a rich and accurate training data set, but this was not an issue for DigitalGlobe. Its vast imaging archive contains over 2 billion square kilometers of earth imagery that has already been manually classified for cloud cover. This provides more than enough training data for the ACCA system.

High classification accuracy
Although DigitalGlobe has not yet moved the ACCA system into production, tests have shown impressive gains in image classification accuracy over the company’s existing system, especially regarding the rate of false positives.

On a test data set of 600 images, comprising a mix of cloudy and non-cloudy scenes, ACCA achieved a true positive rate of 90 percent with an associated false positive rate of just six percent. For the non-cloudy scenes in the data set – which are the most valuable images from the client’s perspective – the ACCA system reported a false positive rate of just 0.1 percent, compared to 22.7 percent for the existing system. DigitalGlobe has also seen the false positive rate for imagery from its black-and-white satellite fall to 4.9 percent, from the previous rate of 71.1 percent.
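For readers unfamiliar with these metrics, the sketch below shows how true and false positive rates are conventionally derived from confusion-matrix counts. The counts are invented purely to illustrate the definitions; the source does not publish the underlying counts or say whether the rates were measured per pixel or per image.

```python
def classification_rates(true_pos, false_neg, false_pos, true_neg):
    """Return (true positive rate, false positive rate) from confusion counts."""
    tpr = true_pos / (true_pos + false_neg)   # share of actual cloud that was detected
    fpr = false_pos / (false_pos + true_neg)  # share of actual non-cloud flagged as cloud
    return tpr, fpr


# Invented counts, chosen only to illustrate the formulas.
tpr, fpr = classification_rates(true_pos=45, false_neg=5, false_pos=3, true_neg=47)
```

A low false positive rate matters most here because every non-cloudy scene wrongly flagged as cloudy triggers a manual review.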

“The test data that we have seen so far is highly encouraging, as one of our top priorities was to reduce the false positive rate and the subsequent amount of manual checking that we must perform,” states Padwick. “Currently we have to review images manually because our existing algorithm is just not consistent enough. With the IBM SPSS solution, we hope to reduce the amount of images that require manual review by 90 percent.

“Ideally, we would like to be able to trust that the algorithm is accurate and perform spot-checking of a small sample of the images we collect, instead of having to keep our eyes on everything. SPSS will be a huge help in cutting down on the amount of work that goes into evaluating our imagery and allowing us to produce more accurate images more quickly.”

He continues: “Another key aspect of this system is our ability to learn from our mistakes. Once we deploy the system, we still expect to find images that exhibit sub-par classification performance. On a periodic basis, we will evaluate the performance of the system and identify the poorly classified images. We plan to include these images in the training set and retrain and re-deploy the models, which should result in improved performance from the system. Thus, the system will grow ‘smarter’ over time and will reduce the number of images requiring manual review.”
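The feedback loop described above might be sketched as follows, assuming each reviewed image receives a measured accuracy score. The function names, the image identifiers and the 0.9 accuracy bar are all hypothetical, and the retraining step itself is left out.

```python
def select_poorly_classified(accuracy_by_image, min_accuracy=0.9):
    """Return the ids of images whose measured accuracy is below the bar."""
    return [
        image_id
        for image_id, accuracy in accuracy_by_image.items()
        if accuracy < min_accuracy
    ]


def extend_training_set(training_ids, accuracy_by_image, min_accuracy=0.9):
    """Add poorly classified images to the training set, skipping duplicates."""
    poor = select_poorly_classified(accuracy_by_image, min_accuracy)
    additions = [image_id for image_id in poor if image_id not in training_ids]
    return training_ids + additions


# Hypothetical periodic review: image id -> measured classification accuracy.
training_ids = ["img-001", "img-002"]
review_results = {"img-002": 0.62, "img-101": 0.98, "img-102": 0.55, "img-103": 0.91}

updated_training_ids = extend_training_set(training_ids, review_results)
# The models would then be retrained on the updated set and redeployed.
```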

Improving the quality and accuracy of its images will deliver additional benefits to DigitalGlobe’s customers. For example, the company’s imagery has been used in the wake of natural disasters, such as the 2010 earthquake in Haiti and the 2011 earthquake and tsunami in Japan. DigitalGlobe imagery was used to create maps for first responders and to provide insight for government agencies and humanitarian organizations. Improving the accuracy and accelerating the delivery of satellite imagery will further assist in the coordination of future relief efforts.

DigitalGlobe also anticipates that producing more accurate images more quickly will deliver increased value to customers who use the company’s satellite imagery for commercial purposes. “Our images are often purchased by manufacturers of in-car GPS devices, as the most cost-effective way for them to track roads and map out routes is by using satellite imagery,” notes Padwick. “One of the problems they face is change management: they need to know where new construction is happening and if their route maps need to be updated to include new roads. So we, in turn, have to be able to provide them with the most accurate, high-quality images, and the new solution will help to ensure that we do just that.”

Moving from images to information
DigitalGlobe anticipates that the new solution will form an instrumental part of the company’s strategy to expand into new areas, and turn the satellite images it takes into valuable information.

“Ultimately, we want to transition from providing pixels to providing information,” comments Padwick. “Our goal is to take the image as it comes down from the satellite, run it through a sophisticated classification engine and receive detailed information about what the image actually contains.

“With such detailed information, we could help customers to answer questions like: ‘What is the rate of urban change in Beijing over the last three years?’, ‘What species of trees comprise a particular forested region, and where are they located?’, or ‘What is the depth of water off the shoreline of Doha, Qatar?’. Increasingly intelligent applications for satellite imagery are really starting to come to the forefront of our field, and we expect technology like the IBM SPSS software to be a big part of our plan to exploit these capabilities in the future.”

About IBM Business Analytics
IBM Business Analytics software delivers data-driven insights that help organizations work smarter and outperform their peers. This comprehensive portfolio includes solutions for business intelligence, predictive analytics and decision management, performance management, and risk management.

Business Analytics solutions enable companies to identify and visualize trends and patterns in areas, such as customer analytics, that can have a profound effect on business performance. They can compare scenarios, anticipate potential threats and opportunities, better plan, budget and forecast resources, balance risks against expected returns and work to meet regulatory requirements. By making analytics widely available, organizations can align tactical and strategic decision-making to achieve business goals.

For more information
For further information please visit ibm.com/business-analytics

Request a call
To request a call or to ask a question, go to ibm.com/business-analytics/contactus. An IBM representative will respond to your inquiry within two business days.

Products and services used

IBM products and services that were used in this case study.

Software:
SPSS Statistics, SPSS Modeler

Legal Information

© Copyright IBM Corporation 2012. IBM Corporation, Software Group, Route 100, Somers, NY 10589. Produced in the United States of America. November 2012. IBM, the IBM logo, ibm.com, and SPSS are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml. This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates. The client examples cited are presented for illustrative purposes only. Actual performance results may vary depending on specific configurations and operating conditions. It is the user’s responsibility to evaluate and verify the operation of any other products or programs with IBM products and programs. THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.