Intelligent Enterprise

Better Insight for Business Decisions

Teradata 13 Focuses on Advanced Analytic Performance | Intelligent Enterprise Blog
Data Frontiers, by Curt Monash
Curt Monash runs Monash Research, which provides strategic, analysis-based advice to users and vendors of advanced information technology. He also writes the blogs DBMS2, Text Technologies, and Strategic Messaging.

Posted by Curt Monash
Monday, August 3, 2009
3:28 PM

Last October I wrote about the Teradata 13 release of Teradata's database management software. Teradata 13, which will be used across the various Teradata product lines, has now been announced for GCA (General Customer Availability)*. So far as I can tell, there were two main points of emphasis for Teradata 13:

  • Performance (of course, performance is a point of emphasis for almost any release of any analytic DBMS product), especially but not only in the areas of aggregates, ETL (Extract/Transform/Load), and UDFs.
  • UDFs (User Defined Functions), especially but not only in the areas of data mining and geospatial analysis.

To put it even more concisely, the focus of Teradata 13 is on advanced analytic performance, although there are, of course, some enhancements in simple query performance and in analytic functionality as well.

* Teradata development chief Scott Gnau said a couple of customers have already received Teradata 13, although this was recent enough that presumably nobody has it in production. But let's not take all that too literally, since -- for example -- I heard nothing about the length or breadth of the beta cycle.

As just one example, when I asked Scott what was different between Teradata 13 as it is shipping now vs. Teradata 13 as it was foreshadowed back in October, he cited:

  • Improved performance
  • Additional "content," including:
      • Faster loading (sounds like an aspect of performance to me)
      • In-database data mining initiatives (these fit in both the "UDF" and "performance" buckets)


But the parts of Teradata 13 that Scott already discussed back in October 2008 largely boil down to performance and/or UDFs as well.

Scott also foreshadowed an area of emphasis for future Teradata releases: temporal data analysis. Teradata 13 offers a new PERIOD datatype, which Scott thinks is a "sleeper" even on its own, in terms of the value customers will find in it. And Scott made it clear that Teradata plans much more functionality for temporal data analysis in the future.

As I understand it, PERIOD works like this: Suppose you have a table that maintains, say, address or employment status. When you update it, you naturally create a Start_Date and End_Date for the validity of certain information. Teradata's PERIOD datatype automagically uses these to maintain the period during which the information was true, even when that period is wholly in the past. Thus, when you update a row with new information, you wind up with two rows: the newly changed row, and a second row with the old information and the period during which it was in effect.

Note: I have no further detail about Teradata's PERIOD datatype at this time. Even what I said includes enough guesswork that there are probably at least small errors in it.
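To make the (admittedly partly guessed) description above concrete, here is a minimal Python sketch of that versioning behavior: an update closes off the current row's validity period and appends a new current row. It is an illustration under stated assumptions, not Teradata's implementation or SQL syntax. The `Period`, `AddressRow`, `update_address`, and `address_on` names and the `END_OF_TIME` sentinel are hypothetical, and the half-open [begin, end) interval convention is likewise an assumption.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

# Hypothetical sentinel meaning "still valid"; not a Teradata constant.
END_OF_TIME = date(9999, 12, 31)


@dataclass(frozen=True)
class Period:
    """Half-open validity interval [begin, end) (an assumed convention)."""
    begin: date
    end: date

    def contains(self, d: date) -> bool:
        return self.begin <= d < self.end


@dataclass(frozen=True)
class AddressRow:
    customer_id: int
    address: str
    valid: Period


def update_address(table: List[AddressRow], customer_id: int,
                   new_address: str, as_of: date) -> List[AddressRow]:
    """Close the customer's current row at `as_of` and append a new current row."""
    result = []
    for row in table:
        if row.customer_id == customer_id and row.valid.end == END_OF_TIME:
            # Keep the old information, with its period now ending at `as_of`.
            result.append(replace(row, valid=Period(row.valid.begin, as_of)))
        else:
            result.append(row)
    result.append(AddressRow(customer_id, new_address, Period(as_of, END_OF_TIME)))
    return result


def address_on(table: List[AddressRow], customer_id: int, d: date) -> Optional[str]:
    """Return the address that was valid for the customer on date `d`, if any."""
    for row in table:
        if row.customer_id == customer_id and row.valid.contains(d):
            return row.address
    return None


history = [AddressRow(1, "12 Elm St", Period(date(2005, 1, 1), END_OF_TIME))]
history = update_address(history, 1, "7 Oak Ave", date(2009, 8, 3))
print(len(history))                               # 2
print(address_on(history, 1, date(2008, 6, 1)))   # 12 Elm St
print(address_on(history, 1, date(2009, 9, 1)))   # 7 Oak Ave
```

The point of keeping the old row rather than overwriting it is that "as of" questions about the past remain answerable with an ordinary lookup on the period.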

The Teradata 13 UDF, in-database data mining, and SAS integration stories seem to go something like this:


  • Teradata offered UDF support in C before Teradata 13. With Teradata 13 it supports Java UDFs as well.
  • Any Teradata UDF is automatically parallel, running across all nodes, etc.
  • Teradata 13 cleans up a variety of UDF issues, including:
      • Allowing the use of UDFs in certain aggregates that didn't support them before.
      • Recursion, whatever that means in this context. (Perhaps the prior point is a hint.)
      • Extended memory management, i.e. making more memory available to UDFs.
  • Teradata's work to enhance SAS integration has been focused on its general UDF framework. The memory management extensions seem to have been particularly important for running SAS. (Note: That link refers to putting SAS on a "single node" in a Teradata grid. Scott gave me the impression that no such thing was possible. So I'm a bit confused. I'm also not sure it matters much.)
  • Teradata expects this same general UDF framework to support integration with a variety of analytic technologies. But the only examples actually discussed were SAS and geospatial.
      • Actually, we didn't really discuss geospatial much either, so I'll just refer you back to my October 2008 post (already linked above) about Teradata's geospatial datatype.
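What "automatically parallel," together with UDFs in aggregates, means in practice can be illustrated with a toy Python model. Rows are spread across hypothetical units by hash; every unit runs the same aggregate code on its own rows; the partial states are merged and finalized once. This is not Teradata's UDF interface (real Teradata UDFs are written in C or Java against Teradata's own APIs, which this does not reproduce). `NUM_UNITS`, `VarState`, `accumulate`, `combine`, `finalize`, and `parallel_variance` are all hypothetical names, and sample variance stands in as an assumed example of a data-mining-style aggregate.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import List

NUM_UNITS = 4  # assumed number of parallel units in this toy model


@dataclass
class VarState:
    """Running state for sample variance (Welford's method)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0


def accumulate(s: VarState, x: float) -> None:
    """Fold one row into a unit's local state."""
    s.n += 1
    delta = x - s.mean
    s.mean += delta / s.n
    s.m2 += delta * (x - s.mean)


def combine(a: VarState, b: VarState) -> VarState:
    """Merge two partial states (Chan et al. parallel update)."""
    n = a.n + b.n
    if n == 0:
        return VarState()
    delta = b.mean - a.mean
    return VarState(n, a.mean + delta * b.n / n,
                    a.m2 + b.m2 + delta * delta * a.n * b.n / n)


def finalize(s: VarState) -> float:
    return s.m2 / (s.n - 1) if s.n > 1 else float("nan")


def run_unit(rows: List[float]) -> VarState:
    s = VarState()
    for x in rows:
        accumulate(s, x)
    return s


def parallel_variance(values: List[float]) -> float:
    # Spread rows across units by hash, as a shared-nothing system would.
    parts: List[List[float]] = [[] for _ in range(NUM_UNITS)]
    for v in values:
        parts[hash(v) % NUM_UNITS].append(v)
    # Every unit runs the same aggregate code on its own rows, concurrently.
    with ThreadPoolExecutor(max_workers=NUM_UNITS) as pool:
        partials = list(pool.map(run_unit, parts))
    # Partial states are merged, then finalized exactly once.
    total = VarState()
    for p in partials:
        total = combine(total, p)
    return finalize(total)


print(parallel_variance(list(range(1, 101))))  # about 841.67
```

Threads here only model the data flow; Python's GIL means they do not give real CPU parallelism. The design point is that an aggregate can only be parallelized this way if its state can be merged, which is why the `combine` step exists at all.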

Besides UDFs, the other performance focus in Teradata 13 seems to be aggregations and OLAP. One Teradata 13 performance boost lies in aggressive query rewriting. Business intelligence tools, written to support multiple analytic DBMSs (including non-current versions), can produce very messy SQL queries. Teradata 13 takes an optimizing compiler mindset to those, and in some cases can get significant speedups as a result. I get the impression there was work on other OLAP and aggregation speed-ups as well.
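As a flavor of what "an optimizing compiler mindset" toward tool-generated SQL can mean, here is a toy Python rewrite pass over a WHERE clause: it flattens nested ANDs, drops always-true filler such as `1 = 1`, and removes duplicate conditions. These particular rules are assumptions chosen for illustration; the post does not say which rewrites Teradata 13 performs, and `Pred`, `TAUTOLOGIES`, `simplify`, and `to_sql` are hypothetical names.

```python
from typing import List, Tuple, Union

# Toy predicate tree: a leaf is a SQL condition string,
# an inner node is ("AND", [children]).
Pred = Union[str, Tuple[str, list]]

TAUTOLOGIES = {"1 = 1", "TRUE"}  # assumed examples of tool-generated filler


def simplify(p: Pred) -> Pred:
    """Flatten nested ANDs, drop always-true conjuncts, and remove duplicates."""
    if isinstance(p, str):
        return p
    op, children = p
    if op != "AND":
        raise ValueError("this sketch only handles AND")
    flat: List[str] = []
    for child in children:
        child = simplify(child)
        if isinstance(child, tuple):
            flat.extend(child[1])  # splice a nested AND's conjuncts in place
        else:
            flat.append(child)
    kept: List[str] = []
    for c in flat:
        if c not in TAUTOLOGIES and c not in kept:
            kept.append(c)
    if not kept:
        return "TRUE"
    if len(kept) == 1:
        return kept[0]
    return ("AND", kept)


def to_sql(p: Pred) -> str:
    if isinstance(p, str):
        return p
    return "(" + " AND ".join(to_sql(c) for c in p[1]) + ")"


messy = ("AND", ["1 = 1",
                 ("AND", ["region = 'EU'", "1 = 1"]),
                 "region = 'EU'",
                 ("AND", ["year = 2009"])])
print(to_sql(messy))
# (1 = 1 AND (region = 'EU' AND 1 = 1) AND region = 'EU' AND (year = 2009))
print(to_sql(simplify(messy)))
# (region = 'EU' AND year = 2009)
```

A real optimizer works on a full algebra of joins, subqueries, and aggregates, but the principle is the same: normalize what the tool emitted before planning it.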

Also, Teradata 13 added a load-performance feature that Scott cites as useful in cases of heavy ETL (actually, it sounded more like ELT, i.e. Extract/Load/Transform) and OLAP aggregate-building. Namely, for the first time Teradata lets you turn off hash distribution. Teradata still wants you to hash-distribute whatever you're going to persist to disk. But if you're just creating a temporary table that will be dropped as soon as the load process completes, you're now allowed to skip the hash distribution step. Scott says this can lead to improvements of more than 30% in load performance.
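Why skipping hash distribution speeds loading can be seen in a toy Python model. With hashing, each incoming row is shipped to the unit that owns its key, so most rows cross the interconnect; without it, rows stay wherever they landed, at the cost of rows with the same key no longer being co-located (the property you want for persistent tables). The crc32 placement, `NUM_UNITS`, and the `load_hashed`/`load_local` names are assumptions for illustration, not Teradata's hash function or syntax, and the sketch counts row movement rather than measuring the >30% figure Scott cited.

```python
import zlib
from typing import List, Tuple

NUM_UNITS = 4  # assumed number of parallel units in this toy model
Row = Tuple[str, int]


def hash_unit(key: str) -> int:
    """Deterministic placement by crc32 (an assumption, not Teradata's hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_UNITS


def load_hashed(batches: List[List[Row]]):
    """Batch i arrives at unit i; each row is then sent to the unit owning its key."""
    units: List[List[Row]] = [[] for _ in range(NUM_UNITS)]
    moved = 0
    for src, rows in enumerate(batches):
        for key, value in rows:
            dst = hash_unit(key)
            if dst != src % NUM_UNITS:
                moved += 1  # this row must be shipped to another unit
            units[dst].append((key, value))
    return units, moved


def load_local(batches: List[List[Row]]):
    """Skip hash distribution: rows stay on whichever unit received them."""
    units: List[List[Row]] = [[] for _ in range(NUM_UNITS)]
    for src, rows in enumerate(batches):
        units[src % NUM_UNITS].extend(rows)
    return units, 0


batches = [[(f"cust{u}_{i}", i) for i in range(100)] for u in range(NUM_UNITS)]
hashed, moved_hashed = load_hashed(batches)
local, moved_local = load_local(batches)
print(moved_hashed, moved_local)  # rows shipped by each strategy; the second is always 0
```

For a staging table that is read once by the transform step and then dropped, the co-location that hashing buys may never be used, which is the trade-off the feature exploits.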


