Published on 30-Jun-2012
"If you have a 24-hour turnaround, you can’t meet real-time client demands. Companies have to remain agile." - Brad Terrell, Vice President and General Manager, Netezza and Big Data Platforms, IBM
A leading online advertising network
Media & Entertainment
An online advertising network was looking for a new method to determine price and volume for reservation-based ads. The company decided to leverage the price/volume curve, which is essentially a snapshot of the company’s bidding environment. Price/volume curves require processing large amounts of data. The network may leverage 35 to 280 terabytes when analyzing data for its price/volume curves. The online advertising network needed a fast way to analyze this data to help clients bid on ad impressions, accommodate changes and support future growth.
An online advertising network needed a faster, better way to determine price and volume, help clients bid on ad impressions, accommodate changes and support future growth.
The company deployed the IBM Netezza® data warehouse appliance with IBM Netezza Analytics and Boolean expression algorithms.
Price/volume curves can be created in 1-5 minutes (versus 1 hour), and clients can quickly change campaigns or start new ones. Only two people are needed to run the IBM Netezza appliance, resulting in a low total cost of ownership (TCO).
It can be astonishing when an ad for a product that you’re actually interested in appears on the website you’re browsing. But it makes sense if you understand the complex math used to match online advertisements with their relevant audience, as the R&D team of a leading online advertising network can attest.
The online advertising network helps clients bid on ad impressions across the web effectively: It reaches 86 percent of all U.S. Internet users. This leadership is based on three capabilities: targeting, effective bidding and performance. And these capabilities result from abstract science and technology. The science? Boolean logic. The underlying technology? The IBM Netezza data warehouse and analytics appliance.
The challenge: determining price and volume
This evolution started a few years ago when the online ad network asked its R&D staff to develop a method to determine price and volume for reservation-based ads. “Conceptually, it was pretty simple,” says Brad Terrell, vice president and general manager for Netezza and Big Data Platforms at IBM. “But there are many overlapping ways of targeting for a given web space. That’s where the underlying analysis gets complex.”
Examining a typical campaign to-do list reveals just how complex it is to effectively target online ads.
Step 1: Set objectives
Because different advertisers seek different results, there are three online pricing models from which to choose: CPM (cost per thousand impressions), CPC (cost per click) and CPA (cost per action).
“Some advertisers want people to go to their website, others want them to buy something or sign up for their list,” says Terrell. “The online ad network takes the risk to deliver those results.”
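To make the three pricing models concrete, here is a minimal Python sketch of what an advertiser would pay under each one. The function name, the rates and the campaign figures are all hypothetical and chosen only for illustration; they do not come from the ad network.

```python
def campaign_cost(model, impressions, clicks, actions, rate):
    """Return the advertiser's cost under one pricing model.

    rate is the price per 1,000 impressions (CPM), per click (CPC)
    or per action (CPA). All figures used below are hypothetical.
    """
    if model == "CPM":
        return impressions / 1000 * rate
    if model == "CPC":
        return clicks * rate
    if model == "CPA":
        return actions * rate
    raise ValueError(f"unknown pricing model: {model}")


# Hypothetical campaign: 200,000 impressions, 1,000 clicks, 50 actions.
cpm_cost = campaign_cost("CPM", 200_000, 1_000, 50, rate=2.50)   # 200 x 2.50
cpc_cost = campaign_cost("CPC", 200_000, 1_000, 50, rate=0.25)   # 1,000 x 0.25
cpa_cost = campaign_cost("CPA", 200_000, 1_000, 50, rate=12.00)  # 50 x 12.00
```

The same delivery can cost very different amounts depending on the model, which is why the objective has to be fixed before any bidding analysis starts.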
Step 2: Find the right market
The online advertising network consults with its client, the advertiser, to determine the optimal targeting expression: the websites, geography and demographics the advertiser wants to reach. The ad network then examines network impressions to determine which ones are eligible. Each impression record includes a price and the targeting variables listed above.
The company has developed some powerful in-house tools to facilitate this process. One is an algorithm that dynamically evaluates an arbitrary Boolean expression, matching multiple targets simultaneously. Another is a scheduler, built by the online advertising network, that determines which digital media ads to post and where.
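The network's actual algorithm is not described, but the general idea of evaluating an arbitrary Boolean targeting expression against an impression record can be sketched in a few lines of Python. The record fields, the tuple-based expression format and the example values below are assumptions made for illustration only.

```python
from dataclasses import dataclass


# Hypothetical impression record: a price plus website, geography
# and demographic variables.
@dataclass(frozen=True)
class Impression:
    site: str
    geo: str
    age_band: str
    price: float


# A targeting expression is modeled here as a nested tuple:
#   ("eq", field, value) | ("and", e1, ...) | ("or", e1, ...) | ("not", e)
def matches(expr, imp):
    """Recursively evaluate a Boolean targeting expression for one impression."""
    op = expr[0]
    if op == "eq":
        _, field, value = expr
        return getattr(imp, field) == value
    if op == "and":
        return all(matches(e, imp) for e in expr[1:])
    if op == "or":
        return any(matches(e, imp) for e in expr[1:])
    if op == "not":
        return not matches(expr[1], imp)
    raise ValueError(f"unknown operator: {op}")


# Sports site, in the US or Canada, excluding the under-18 age band.
target = (
    "and",
    ("eq", "site", "sports.example"),
    ("or", ("eq", "geo", "US"), ("eq", "geo", "CA")),
    ("not", ("eq", "age_band", "under-18")),
)

imp = Impression(site="sports.example", geo="CA", age_band="25-34", price=1.20)
eligible = matches(target, imp)
```

Because the expression is plain data rather than hard-coded logic, a new targeting combination needs no new code, only a new expression.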
Step 3: Bid
At this point, another tool of the online advertising network comes into play: a price/volume curve, which is essentially a snapshot of the network’s bidding environment. It tells the advertiser:
- How to bid when entering a market
- How to adjust bids when objectives change
- How to determine the most profitable bid
- Where to find the sweet spots
How is all this done? The ad network’s regression-style algorithm generates the curve from sample user behavior data and writes it to a CSV (comma-separated values) file, from which charts or dashboards can be created for the client. The price/volume curve ignores data that fail to meet the target criteria, and it is relatively easy to use.
In a typical CPM campaign:
- X axis = prices
- Y axis = available volume expected at that price
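As a rough illustration of these two axes, the Python sketch below builds a curve from a handful of sampled impression prices and writes it out as CSV. It uses a plain cumulative count (how many sampled impressions are priced at or below each bid) rather than the network's regression-style algorithm, which is not described here. The function names and sample prices are hypothetical.

```python
import bisect
import csv
import io


def price_volume_curve(sample_prices, bid_points):
    """For each candidate bid, count sampled impressions priced at or below it."""
    ordered = sorted(sample_prices)
    return [(bid, bisect.bisect_right(ordered, bid)) for bid in bid_points]


def curve_to_csv(curve):
    """Serialize (price, volume) pairs as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["price", "volume"])
    writer.writerows(curve)
    return buffer.getvalue()


# Hypothetical sampled CPM prices for impressions that matched the target.
sample_prices = [0.50, 0.75, 0.75, 1.00, 1.25, 2.00]
curve = price_volume_curve(sample_prices, bid_points=[0.50, 1.00, 1.50, 2.00])
csv_text = curve_to_csv(curve)
```

Reading the result left to right shows how much extra volume each increase in bid would buy, which is where the sweet spots become visible.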
[Figure: example of a price/volume curve]
[Figure: using price/volume curves to find the optimal bid for an advertisement in two different markets]
But these curves are not so easy to build without the proper underlying technology, which requires handling:
- Large amounts of data
- Complex market targeting
- Time pressures
The price/volume curve is also determined by a set of histograms—rectangular statistical sets that explore different scenarios:
- Raw histogram—a set of buckets, including price and volume; each bar of the histogram represents the number of impressions between two price points
- Frequency cap histogram—three-dimensional view showing the number of users who got impressions at a certain price
- Event rate histogram—tells the “what-ifs” of the possible conversion rate
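The raw histogram in particular is easy to sketch. The Python below counts impressions per price bucket, with prices held as integer cents so that bucket boundaries are exact. The bucket width and sample prices are hypothetical, and the frequency cap and event rate histograms are not modeled.

```python
def raw_histogram(prices_cents, bucket_width_cents):
    """Count impressions whose price falls in each [low, high) bucket."""
    counts = {}
    for price in prices_cents:
        bucket = price // bucket_width_cents
        counts[bucket] = counts.get(bucket, 0) + 1
    # Re-key by the (low, high) price bounds of each non-empty bucket.
    return {
        (b * bucket_width_cents, (b + 1) * bucket_width_cents): n
        for b, n in sorted(counts.items())
    }


# Hypothetical sampled prices, in cents.
prices_cents = [50, 75, 75, 100, 125, 200]
histogram = raw_histogram(prices_cents, bucket_width_cents=50)
```

Each key is one bar of the histogram, and each value is the number of impressions between those two price points.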
[Figure: example of a frequency cap histogram]
What happens if the advertiser isn’t satisfied with these scenarios? It has to change the targeting expression and produce more price/volume curves.
Step 4: The frequency factor
Most advertisers don’t want to hit unique viewers more than once per day, so they track users, assessing when they visited and when the next visit is expected. From this, they set a frequency cap. Frequency caps are designed to:
- Limit the number of times a unique viewer can see an advertisement
- Reduce the number of ineligible impressions on a given target
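A frequency cap can be pictured as a filter over the impression stream. The following Python sketch keeps at most a fixed number of impressions per user per day. The tuple layout, the user identifiers and the dates are hypothetical, and a production system would also have to handle user tracking and visit prediction, which are not modeled here.

```python
from collections import defaultdict


def apply_frequency_cap(impressions, cap_per_day):
    """Keep at most cap_per_day impressions per (user, day), in arrival order."""
    seen = defaultdict(int)
    kept = []
    for user_id, day, price in impressions:
        key = (user_id, day)
        if seen[key] < cap_per_day:
            seen[key] += 1
            kept.append((user_id, day, price))
    return kept


# Hypothetical impression stream: (user, day, price).
impressions = [
    ("u1", "2012-06-01", 1.00),
    ("u1", "2012-06-01", 0.80),  # second view by u1 on the same day
    ("u2", "2012-06-01", 1.20),
    ("u1", "2012-06-02", 0.90),
]
capped = apply_frequency_cap(impressions, cap_per_day=1)
```

With a cap of one per day, the second same-day impression for user u1 is dropped, so the volume available to the campaign shrinks accordingly.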
Once the cap has been determined, the advertiser reviews the resulting scenarios again. If it still isn’t satisfied, it has to change the targeting expression and produce more price/volume curves.
The sheer volume of data makes the work that the online advertising network does seem daunting. Its ad servers generate between 5 TB and 10 TB of data per day. And the company may leverage 35 to 280 TB when analyzing data for its price/volume curves.
Prior to using the IBM Netezza data warehouse appliance, the digital media company used a Hadoop cluster, starting with 30 nodes and eventually moving to 180 nodes. But it wasn’t fast enough. The turnaround to generate a new set of price/volume curves on Hadoop was one day, while customers increasingly wanted to optimize their campaigns on the fly, multiple times in a given day, which requires near-real-time access to live data.
The online advertising network examined technology from Aster Data—now owned by Teradata Corporation—and the IBM Netezza appliance to facilitate faster data loads and analysis. Other groups within the company had already been using the IBM Netezza data warehouse appliance, which demonstrated it could provide the performance and scalability the company needed without requiring much administrative support. And while Hadoop and Aster Data can handle massive data volumes, they don’t provide the flexibility and simplicity that the IBM Netezza data warehouse appliance offers. Hadoop and Aster Data cannot exclude data that isn’t needed for particular analyses; they broadcast all data across all nodes. The IBM Netezza data warehouse appliance, on the other hand, allows the company to control the distribution of data by user attribute and/or by time using its zone maps. The ability of IBM Netezza Analytics to support user defined functions (UDFs) and in-database analytics was another key selling factor.
The IBM Netezza data warehouse appliance is a purpose-built system for digital media companies that architecturally integrates database, server and storage into a single, easy to manage unit. The appliance is designed for simple, rapid analysis of data volumes scaling into the petabytes.
Faster, more accurate results
Upon purchasing the IBM Netezza data warehouse appliance, the online advertising network team began testing a prototype. The organization immediately saw performance gains. Previously, it took 60 to 90 minutes to load a batch of targeting expressions into the system; the process takes five minutes on the IBM Netezza data warehouse appliance.
“The company’s first reaction was, ‘This isn’t right,’” says Terrell. “But by the second or third time, they were saying ‘It is right!’”
The result of the appliance’s performance: the online ad network can generate more curves in less time with greater accuracy, and it can meet last-minute demands from customers.
As noted above, the online ad network uses sample data to determine price/volume curves. Initially, the curves were derived from 1-in-1,000 sampling. After refining its in-house technology, the company was able to sample larger sets, at a ratio of 1 in 100. Since deploying the IBM Netezza data warehouse appliance, it samples 1 in 10. Once the IBM Netezza platform is fully optimized, the company expects the ratio to be 1 in 1, meaning every impression is analyzed, which should bring accuracy close to 100 percent.
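The link between sampling ratio and accuracy can be illustrated with a little arithmetic. The Python sketch below assumes simple Bernoulli sampling, in which each impression is kept independently with probability 1/N. The source does not say how the network actually samples, so this is an assumption. Under it, a sampled count is scaled up by N, and the relative standard error of that estimate is sqrt((1 - p) / (p * M)) for sampling probability p and true matching volume M. The volume of 100,000 is hypothetical.

```python
import math


def scale_up(sample_count, one_in):
    """Estimate the full volume from a 1-in-`one_in` sample count."""
    return sample_count * one_in


def relative_std_error(true_volume, one_in):
    """Relative standard error of the scaled-up count under Bernoulli sampling."""
    p = 1 / one_in
    return math.sqrt((1 - p) / (p * true_volume))


# Hypothetical true volume of matching impressions.
true_volume = 100_000
errors = {n: relative_std_error(true_volume, n) for n in (1000, 100, 10, 1)}
for one_in, err in errors.items():
    print(f"1 in {one_in}: about {err:.2%} relative error")
```

Under these assumptions the sampling error falls from roughly 10 percent at 1 in 1,000 to under 1 percent at 1 in 10, and disappears entirely at 1 in 1.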
It took more than a day for the company’s legacy technology to generate a single price/volume curve. Hadoop reduced that to less than a day. The IBM Netezza data warehouse appliance does it in one to five minutes.
With IBM, the company can also create multiple curves with a single table scan, and it can perform fast matching of complex targets. This speed comes in handy, for example, when a business manager wants to make a change to a campaign hours before it is supposed to start. “If you have a 24-hour turnaround, you can’t meet real-time client demands,” says Terrell. “Companies have to remain agile.”
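The idea of producing several curves from a single scan can be sketched in plain Python: read each record once and test it against every targeting predicate, accumulating one price histogram per target. This is only an illustration of the single-pass pattern, not a description of how the appliance executes it. The record fields and target names are hypothetical.

```python
from collections import Counter


def histograms_in_one_pass(records, targets):
    """Build one price histogram per target while reading the records once."""
    histograms = {name: Counter() for name in targets}
    for rec in records:  # the single scan over the data
        for name, predicate in targets.items():
            if predicate(rec):
                histograms[name][rec["price_cents"]] += 1
    return histograms


# Hypothetical impression records and two targeting predicates.
records = [
    {"site": "sports.example", "geo": "US", "price_cents": 100},
    {"site": "news.example", "geo": "US", "price_cents": 75},
    {"site": "sports.example", "geo": "CA", "price_cents": 100},
]
targets = {
    "us_any_site": lambda r: r["geo"] == "US",
    "sports_any_geo": lambda r: r["site"] == "sports.example",
}
result = histograms_in_one_pass(records, targets)
```

The cost of reading the data is paid once regardless of how many targets are evaluated, which matters when the underlying table runs to terabytes.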
Having near real-time access to campaign data also allows business managers to negotiate better prices and start campaigns earlier, resulting in fewer missed opportunities. Clients are pleased because the results match the projections. And the company generates higher ROI with greater accuracy and precision; the company gets paid for meeting the number of clicks, conversions and impressions it has estimated but not for exceeding that number.
“The IBM Netezza appliance is critical to the company’s operations,” notes Terrell. “They found that no other system was mature enough to give them what the IBM Netezza data warehouse appliance delivers.”
Benefits beyond performance: simplicity & scale
While improving the performance and precision of the campaigns it runs, and capturing much greater volumes of data than it could before, the online ad network has only two people managing the IBM Netezza appliance. The system is simple to manage and offers low total cost of ownership (TCO).
The company is also confident in the IBM Netezza platform’s ability to accommodate future growth as volumes and analytic requirements continue to increase, especially during peak business times, such as the beginning or end of the month, when clients are most eager to tweak their campaigns for maximum results.
Indeed, the online ad network expects volume to grow several times, to greater than 25 terabytes of data, when it starts sampling on a one-to-one basis. “The company needs big data,” says Terrell. “They need speed and they need control. The IBM Netezza data warehouse appliance gives them all three.”
About IBM Netezza data warehouse appliances
IBM Netezza data warehouse appliances revolutionized data warehousing and advanced analytics by integrating database, server and storage into a single, easy-to-manage appliance that requires minimal set-up and ongoing administration while producing faster and more consistent analytic performance. The IBM Netezza data warehouse appliance family simplifies business analytics dramatically by consolidating all analytic activity in the appliance, right where the data resides, for industry-leading performance. Visit: ibm.com/software/data/netezza to see how our family of data warehouse appliances eliminates complexity at every step and helps you drive true business value for your organization. For the latest data warehouse and advanced analytics blogs, videos and more, please visit: thinking.netezza.com
About IBM Data Warehousing and Analytics Solutions
IBM provides the broadest and most comprehensive portfolio of data warehousing, information management and business analytic software, hardware and solutions to help customers maximize the value of their information assets and discover new insights to make better and faster decisions and optimize their business outcomes.
For more information
To learn more about the IBM Data Warehousing and Analytics Solutions, please contact your IBM representative or IBM Business Partner, or visit the following website: ibm.com/software/data/netezza
To increase the business value of your IBM data warehouse appliance, participate in an online community. Join the IBM Netezza community.
Additionally, IBM Global Financing can help you acquire the software capabilities that your business needs in the most cost-effective and strategic way possible. We'll partner with credit-qualified clients to customize a financing solution to suit your business and development goals, enable effective cash management, and improve your total cost of ownership. Fund your critical IT investment and propel your business forward with IBM Global Financing. For more information, visit: ibm.com/financing
Products and services used
IBM products and services that were used in this case study.
IBM Netezza Analytics, IBM Netezza Performance Server
© Copyright IBM Corporation 2012 IBM Corporation Software Group Route 100 Somers, NY 10589 Produced in the United States of America June 2012 IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at ibm.com/legal/copytrade.shtml Netezza is a registered trademark of IBM International Group B.V., an IBM Company. This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates. The performance data discussed herein is presented as derived under specific operating conditions. Actual results may vary. THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.