Blogs

Solution Architect, Advanced Analytics

In-Database Capabilities: Improve the runtime of Analytic Software

June 14, 2011

Many analysts have a strong preference for commercial analytic workbenches such as SAS or SPSS.  Both packages are widely used, respected by analysts, and each has strong advocates.  The purpose of this article is to point out that analytic users can benefit from the performance and simplicity of Netezza in-database analytics without abandoning their preferred interface.

Let’s start with SAS.  One of the most frequent complaints from IT organizations about SAS users is the propensity for users to require significant amounts of storage space for SAS data sets.  A leading credit card issuer, for example, reports that users have sixty terabytes of SAS files – and the volume is growing rapidly.

But SAS users can store data tables in the Netezza data warehouse appliance and run data preparation steps against those tables using the SAS Pass-Through Facility.  In addition to centralizing storage, reducing data movement and simplifying security, this practice significantly reduces overall runtime. 

While the Pass-Through Facility requires some SQL knowledge, users with the SAS/Access Engine for Netezza deployed can run DATA steps against SAS data sets held in Netezza;  SAS/Access converts the SAS DATA step code to SQL.  To implement this practice, users need only update library statements in SAS program code.

Another SAS feature that is not yet widely used is SAS’ support for in-database execution of Base SAS PROCs.  SAS currently enables in-database execution of FREQ, MEANS, RANK, REPORT, SORT, SUMMARY, and TABULATE in DB2, Oracle and Teradata. Analysts report that SAS expects to expand the number of databases on which this capability is supported in the next dot release.

SAS customers using SAS Enterprise Miner or SAS Model Manager can also benefit from the SAS Scoring Accelerator, which enables SAS users to define a scoring process within a SAS workflow and automate execution.  In the background, SAS handles the task of creating a scoring job from a model, running the job on Netezza and returning job statistics to the user.   SAS Scoring Accelerator is currently supported on DB2, Netezza and Teradata.

SAS users with good programming skills can make calls to Netezza in-database functions by invoking Netezza Analytics code snippets through PROC SQL.  In-database functions are far more efficient for building analytic data sets, data cleansing and enhancement.  Customers who have implemented this approach have observed remarkable improvements in overall runtime: jobs that ran in hours now run in minutes.

SPSS Modeler also offers the capability to work directly with database tables in Netezza; like SAS, it can be configured to minimize storage on the SPSS server.  Modeler also offers Pushback SQL capabilities, which enable the user to perform functions within the Netezza appliance, including table joins, aggregation, selections, sorting, field derivation, field projection and scoring.  While the in-database functional capabilities of the two packages are similar, SPSS accomplishes this entirely within the graphical environment of the Stream canvas. 

As with SAS, SPSS Modeler users can leverage Netezza in-database analytics to build, score and store predictive models, either through custom nodes or out-of-the box integration in Release 14.2.  Again, a key difference between SAS and SPSS is that while SPSS Release 14.2 surfaces Netezza in-database analytics through the graphical user environment, SAS users must have programming and SQL skills.

To summarize, leading commercial software packages like SAS and SPSS already offer the ability to manage files, perform data preparation, build models and run scoring processes entirely within the Netezza appliance.  Users of these tools can significantly improve runtime performance by leveraging these existing capabilities.

To find out more about IBM Netezza’s in-database capabilities, check out this WinterCorp report on Release 6.0