Intelligent Enterprise | Data Frontiers by Curt Monash http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/ Copyright 2010 Tue, 09 Feb 2010 16:39:37 -0500 http://www.movabletype.org/?v=3.14 http://blogs.law.harvard.edu/tech/rss
Quick Thoughts on Sybase/Aleri Sybase announced an asset purchase that amounts to a takeover of CEP (Complex Event Processing) vendor Aleri. Perhaps not coincidentally, Sybase already had technology under the hood from Aleri predecessor/acquiree Coral8, for financial services uses (even though, of Aleri Classic and Coral8, Aleri Classic was the one more focused on financial services). Quick reactions include:

http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2010/02/quick_thoughts.html /blog/archives/2010/02/quick_thoughts.html Business Intelligence Thu, 04 Feb 2010 15:44:05 -0500
Database Snooping Threatens Liberty - And We're All Making Matters Worse Every year or two, I get back on my soapbox to say:
  • Database and analytic technology, as they evolve, will pose tremendous danger to individual liberties.
  • We in the industry who are creating this problem also have a duty to help fix it.
  • Technological solutions alone won't suffice. Legal changes are needed.
  • The core of the needed legal changes is tight restrictions on governmental use of data, because relying on restrictions about data acquisition and retention clearly won't suffice.
But this time I don't plan to be so quick to shut up.

http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2010/02/database_snoopi.html /blog/archives/2010/02/database_snoopi.html Information Management Tue, 02 Feb 2010 14:30:56 -0500
Netezza Skimmer Joins the Short List As I previously complained, last week wasn't a very convenient time for me to have briefings. So when Netezza emailed to say it would release its new entry-level Skimmer appliance this week, while I asked for and got a Friday afternoon briefing, I kept it quick and basic.

That said, highlights of my Netezza Skimmer briefing included:

http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2010/01/netezza_skimmer.html /blog/archives/2010/01/netezza_skimmer.html Information Management Wed, 27 Jan 2010 08:14:27 -0500
Two Cornerstones of Oracle's Database Hardware Strategy After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:
  • Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being "bumped off" if they don't get it right.

  • Juan believes the "bulk" of Oracle's business will move over to Exadata-like technology over the next five to ten years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise's many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.

http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2010/01/two_cornerstone.html /blog/archives/2010/01/two_cornerstone.html Information Management Fri, 22 Jan 2010 15:41:49 -0500
Oracle Lifts Cloud Over MySQL Storage Engine Vendors Earlier this month, Oracle put out a press release promising to play nicely with MySQL if its Sun takeover is approved. The parts in italics below are quotes. My comments are in plain text.

1. Continued Availability of Storage Engine APIs. Oracle shall maintain and periodically enhance MySQL's Pluggable Storage Engine Architecture to allow users the flexibility to choose from a portfolio of native and third party supplied storage engines.

MySQL's Pluggable Storage Engine Architecture shall mean MySQL's current practice of using, publicly-available, documented application programming interfaces to allow storage engine vendors to "plug" into the MySQL database server. Documentation shall be consistent with the documentation currently provided by Sun.

Well, duh.

http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/12/oracle_lifts_cl.html /blog/archives/2009/12/oracle_lifts_cl.html Information Management Tue, 29 Dec 2009 12:50:30 -0500
Reports of Perfectly-Balanced Hardware Configurations are Greatly Exaggerated Data warehouse appliance and software appliance vendors like to claim that they've worked out just the right hardware configuration(s), and that a single configuration is correct for a fairly broad range of workloads. But there are a lot of reasons to be dubious about that. Specific vendor evidence includes:

    http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/11/reports_of_perf.html /blog/archives/2009/11/reports_of_perf.html Information Management Tue, 24 Nov 2009 09:26:41 -0500
    Teradata's Hardware Strategy and Tactics In my opinion, the most important takeaways about Teradata's hardware strategy from the Teradata Partners conference last week are:
    • Teradata's future lies in solid-state memory. That's in line with what Carson Schmidt told me six months ago.
    • To Teradata's surprise, the solid-state future is imminent. Teradata is 6-9 months further along with solid-state drives (SSD) than it thought a year ago it would be at this point.
    • Short-term, Teradata is going to increase the number of appliance kinds it sells. I didn't actually get details on anything but the new SSD-based Blurr, but it seems there will be others as well.
    • Teradata's eventual future is to mix and match parts (especially different kinds of storage) in a more modular product line. Teradata Virtual Storage is of pretty limited value otherwise. I believe Teradata will go modular more emphatically than Teradata itself does, because I think doing so will meet users' needs more effectively than if Teradata relies strictly on fixed appliance configurations.

    http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/10/teradatas_hardw.html /blog/archives/2009/10/teradatas_hardw.html Information Management Tue, 27 Oct 2009 09:06:42 -0500
    This Week at the Teradata Partners User Conference Here are some highlights of what's going on, although names, dates, and details will have to await conversations and press releases this week.

    • Teradata is productizing "private cloud," under names including "Teradata Enterprise Analytics Cloud," "Teradata Agile Analytics Cloud," and "Teradata Elastic Mart Builder." I.e., Teradata hopes to leapfrog Greenplum in its "Enterprise Data Cloud" strategy. This is only fair, in that Greenplum lifted the idea from Teradata and eBay in the first place. It also provides major support for what I think is an extremely sensible trend. Give or take issues of who announces and ships what a couple months before or after a competitor, my early thinking is that the main differences between Greenplum and Teradata in this regard will be:

        http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/10/this_week_at_th.html /blog/archives/2009/10/this_week_at_th.html Information Management Tue, 20 Oct 2009 11:18:09 -0500
        Oracle Exadata 2 Capacity Pricing Revealed Analyzing Oracle Exadata pricing is always harder than one would first think. But I've finally gotten around to doing an Oracle Exadata 2 pricing spreadsheet. The main takeaways are:
        • If we believe Oracle's claims of 10X compression, Exadata 2 costs more per terabyte of user data than Netezza TwinFin -- $22-26K/TB vs. TwinFin's <$20K -- but less than the Teradata 2550.
        • These figures are highly sensitive to assumptions about Oracle's hybrid columnar compression.
        • Similarly, if Netezza or Teradata were to significantly upgrade their own compression, the price comparison would look quite different.
        • Options such as Data Mining or Oracle Spatial add 12% or so each to Exadata's total system price.
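The sensitivity to compression assumptions noted above can be sketched in a few lines. This is only an illustration, not the spreadsheet mentioned in the post: the function name `price_per_user_tb`, the system price, and the uncompressed capacity are all hypothetical, chosen to make the arithmetic easy to follow.

```python
def price_per_user_tb(system_price, uncompressed_capacity_tb, compression_ratio):
    """Price per terabyte of user data, given an assumed compression ratio."""
    if uncompressed_capacity_tb <= 0 or compression_ratio <= 0:
        raise ValueError("capacity and compression ratio must be positive")
    # User data that fits = uncompressed capacity times the assumed compression ratio.
    user_data_tb = uncompressed_capacity_tb * compression_ratio
    return system_price / user_data_tb


# Hypothetical inputs (not Oracle's figures), for illustration only.
SYSTEM_PRICE = 2_400_000       # dollars, made up
UNCOMPRESSED_CAPACITY_TB = 10  # terabytes before compression, made up

sensitivity = {
    ratio: price_per_user_tb(SYSTEM_PRICE, UNCOMPRESSED_CAPACITY_TB, ratio)
    for ratio in (2, 5, 10)
}
for ratio, price in sensitivity.items():
    print(f"{ratio}X compression: ${price:,.0f} per TB of user data")
```

Because price per terabyte of user data is inversely proportional to the compression ratio, halving the assumed ratio doubles the price per terabyte, which is why any cross-vendor comparison depends so heavily on that one assumption.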

        http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/10/oracle_exadata.html /blog/archives/2009/10/oracle_exadata.html Information Management Tue, 06 Oct 2009 10:20:35 -0500
        Thoughts on Integrating OLTP and Data Warehousing (Especially in Exadata 2) Oracle is pushing Exadata 2 as being a great system for OLTP (OnLine Transaction Processing), data warehousing or, presumably, the integration of same. This claim rests on a few premises, namely:
        • Exadata is great for data warehousing. At this time, that's a claim much better supported by marketing and theory than by practice.

        • Exadata 2 is a suitable annual improvement over last year's Exadata 1. That's quite plausible.
        • Oracle is outstanding for OLTP. That's borne out by vast amounts of experience, especially if by "outstanding" you mean "Gets the job done really, really well at a very high cost in terms of both licenses and labor."
        • The Flash memory in Exadata 2 makes Oracle even better for OLTP.* That's plausible too. Worst-case is probably that Flash support doesn't really work well in this release, but will be cleaned up soon.**
        • OLTP and data warehousing uses for Exadata don't interfere with each other. That one bears some discussion.

          http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/09/thoughts_on_int.html /blog/archives/2009/09/thoughts_on_int.html Information Management Tue, 29 Sep 2009 16:44:12 -0500
          Issues Comparing Analytic DBMS Performance The analytic DBMS/data warehouse appliance market is full of competitive performance claims. Sometimes, they're completely fabricated, with no basis in fact whatsoever. But often performance-advantage claims are based on one or more head-to-head performance comparisons. That is, System A and System B are used to run the same set of queries, and some function is applied that takes the two sets of query running times as an input, and spits out a relative performance number as an output.

          For example, Greg Rahn twittered to me that Oracle Exadata commonly outperforms existing Oracle installations by a factor of 50 or better, based on a "geometric mean". What I presume he meant by that is:

          • At any one user installation, a number of queries were compared on new system vs. old.
          • In each case, the ratio between new and old running time was taken.
          • The geometric mean of all those ratios was computed.
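The three steps above can be sketched as follows. This is a minimal illustration of the presumed calculation, not anything Oracle published: the function name and the running times are hypothetical, and the ratio is taken as old time divided by new time so that a larger number means a bigger speedup.

```python
import math


def geometric_mean_speedup(old_times, new_times):
    """Geometric mean of per-query speedups (old running time / new running time)."""
    if len(old_times) != len(new_times) or not old_times:
        raise ValueError("need two equal-length, non-empty lists of running times")
    ratios = [old / new for old, new in zip(old_times, new_times)]
    # Average the logs, then exponentiate: the n-th root of the product of the ratios.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical running times in seconds for three queries (illustrative only).
old_times = [100.0, 400.0, 900.0]
new_times = [10.0, 4.0, 9.0]

ratios = [old / new for old, new in zip(old_times, new_times)]
geo = geometric_mean_speedup(old_times, new_times)
arith = sum(ratios) / len(ratios)
print(f"per-query speedups: {ratios}")
print(f"geometric mean: {geo:.1f}X, arithmetic mean: {arith:.1f}X")
```

The geometric mean is never larger than the arithmetic mean of the same ratios, so it is less easily inflated by one or two queries with enormous speedups. Either way, a single summary number hides how the individual queries were chosen.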

          http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/09/issues_comparin.html /blog/archives/2009/09/issues_comparin.html Information Management Tue, 22 Sep 2009 14:23:04 -0500
          Thinking About Analytic Speed For a variety of reasons, I don't plan to post my complete Enzee keynote slide deck soon, if ever. But perhaps one or more of its subjects are worth spinning out in their own blog posts.
          I'm going to start with analytic speed or, equivalently, analytic latency. There is, obviously, a huge industry emphasis on speed. Indeed, there's so much emphasis that confusion often ensues. My goal in this post is not really to resolve the confusion; that would be ambitious to the max. But I'm at least trying to call attention to it, so that we can all be more careful in our discussions going forward, and perhaps contribute to a framework for those discussions as well.

          http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/09/thinking_about.html /blog/archives/2009/09/thinking_about.html Business Intelligence Mon, 14 Sep 2009 09:36:37 -0500
          Teradata's Active Enterprise Data Warehouse Story Teradata used to tell a one-size-fits-all Enterprise Data Warehouse (EDW) story. That's no longer the case. Last year, Teradata introduced a range of products. I think Teradata is serious about selling its full product range, and by now has achieved buy-in from its sales force for that strategy. I base these beliefs on data points such as:
          • Teradata says so, repeatedly and persuasively
          • At least in passing, Teradata cites non-trivial sales figures for the appliance product lines
          • Competitors are less unanimous in asserting that Teradata's lower-end products are presented on just a bait-and-switch basis

          But that raises the question: How does Teradata pitch the advantages of its top-end product line these days? At least at the corporate level, the answer seems to focus less on the "EDW" concept than it used to, and more on "Active." Teradata -- which actually has been talking about "Active Data Warehousing" for about a decade -- indeed calls its top-end 55xx series the "Teradata Active Enterprise Data Warehouse."

          http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/08/teradatas_activ.html /blog/archives/2009/08/teradatas_activ.html Information Management Mon, 24 Aug 2009 08:04:50 -0500
          Sorting out Netezza and Oracle Exadata Data Warehouse Appliance Pricing Netezza recently announced a new generation of data warehouse appliance called TwinFin. TwinFin's clearest stated list price is "a little under $20,000 per terabyte of user data," which in my opinion immediately became the new industry reference point for discussing prices in the data warehouse appliance category. Vigorous discussion ensued, especially in the comment thread to the first of the two posts linked above. Here's some followup.

          Netezza should not have claimed a "10-15X price/performance improvement," based on a 3-5X performance improvement and a 3X decrease in price/terabyte, and I should have grilled Netezza harder when it first made the claim. In fact, there is no unit of performance that you can, in a reasonable blended average, get 10-15X more of per dollar in TwinFin than you can in the predecessor NPS series.

          To look at it another way, multiplying 3-5X by 3X would only make sense if 3-5X were a measure of something like "terabytes/unit of performance." But in fact the 3-5X is a blended average of something more like "units of performance/unit of time"; i.e., you can do 3-5X more calculations or queries in a unit of time over the same database (of the same size*) on the new machine as you can on the old.
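The units argument above can be written as an identity. This is a sketch of the dimensional reasoning only, not Netezza's own calculation:

```latex
\[
\frac{\text{price}}{\text{unit of performance}}
  = \frac{\text{price}}{\text{terabyte}}
  \times \frac{\text{terabytes}}{\text{unit of performance}}
\]
```

A 3X cut in the first factor becomes a 10-15X price/performance gain only if the second factor also falls by 3-5X. The quoted 3-5X figure describes work done per unit of time over the same database, which is a different quantity, so the two numbers cannot simply be multiplied.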

          http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/08/sorting_out_net.html /blog/archives/2009/08/sorting_out_net.html Information Management Mon, 10 Aug 2009 07:32:08 -0500
          Teradata 13 Focuses on Advanced Analytic Performance Last October I wrote about the Teradata 13 release of Teradata's database management software. Teradata 13, which will be used across the various Teradata product lines, has now been announced for GCA (General Customer Availability)*. So far as I can tell, there were two main points of emphasis for Teradata 13:
          • Performance (of course, performance is a point of emphasis for almost any release of any analytic DBMS product), especially but not only in the areas of aggregates, ETL (Extract/Transform/Load), and UDFs.
          • UDFs (User Defined Functions), especially but not only in the areas of data mining and geospatial analysis.

          To put it even more concisely, the focus of Teradata 13 is on advanced analytic performance, although there of course are some enhancements in simple query performance and in analytic functionality as well.

          http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/08/teradata_13_foc.html /blog/archives/2009/08/teradata_13_foc.html Information Management Mon, 03 Aug 2009 15:28:49 -0500
          Netezza Is Changing its Hardware Architecture, Slashing Prices Netezza is about to make its biggest product announcement in years. In particular:
          • Netezza is cutting prices to under $20K/terabyte of user data, with even lower numbers promised for the near future.
          • Netezza is replacing its PowerPC chips with Intel-based IBM blades.
          • There will be substantial changes in how data flows between the various parts of a Netezza node.
          • Netezza claims this will all produce an immediate 10-15X increase in price-performance, based on a 3X cut in price/terabyte and a 3-5X improvement in mixed workload performance. (Edit: Netezza now agrees that it shouldn't have phrased things that way.)
          Allow me to explain.

          For months, it has been an increasingly open secret that Netezza was planning a major refresh of its product line. As signaled by a blog post from Netezza's product marketing VP Phil Francisco, many of the details are finally fit to post.*

          *A couple more will be revealed next week, and a longer-term roadmap will be laid out during Netezza's conference tour in September. (By the way, yours truly will be keynoting the Boston, Chicago, San Francisco, Washington, London, and Milan iterations of same. Come by and say hi!)

          http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/07/netezza_is_chan.html /blog/archives/2009/07/netezza_is_chan.html Information Management Fri, 31 Jul 2009 10:35:00 -0500
          Initial reactions to IBM acquiring SPSS IBM is acquiring SPSS. My initial thoughts (questions by Eric Lai of Computerworld) include:

          1) good buy for IBM? why or why not?

          Yes. The integration of predictive analytics with other analytic or operational technologies is still ahead of us, so there was a lot of value to be gained from SPSS beyond what it had standalone. (That said, I haven't actually looked at the numbers, so I have no comment on the price.)

          By the way, SPSS coined the phrase "predictive analytics," with the rest of the industry then coming around to use it. As with all successful marketing phrases, it's somewhat misleading, in that it's not wholly focused on prediction.

          2) how does it position IBM vs. competitors?

          http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/07/initial_reactio.html /blog/archives/2009/07/initial_reactio.html Business Intelligence Wed, 29 Jul 2009 07:50:13 -0500
          Update on Microsoft's Madison and Fast Track Data Warehouse Products I chatted with Stuart Frost of Microsoft on Tuesday. Stuart is and remains GM of Microsoft's data warehouse product unit, covering about $1 billion or so of revenue. While rumors of Stuart's departure from Microsoft are clearly exaggerated, it does seem that his role is more one of coordination than actual management.

          Microsoft Madison availability remains scheduled for H1 2010. Nothing new there. Tangible progress includes a few customer commitments of various sorts, including one outright planned purchase (due to some internal customer considerations around using up a budget). At the moment various Microsoft Madison technology "previews" are going on, which seem to amount to proofs-of-concept that:

          • Start with actual customer data (some from Microsoft, some from outside)
          • Generate larger synthesized data sets based on those (database size seems to be 10-100 TB)
          • Run in Microsoft data centers or "technology centers," rather than on customer premises.

          http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/07/update_on_micro.html /blog/archives/2009/07/update_on_micro.html Information Management Fri, 17 Jul 2009 10:43:43 -0500
          Hasso Plattner Calls for In-Memory OLTP Column Stores Former SAP CEO Hasso Plattner has written a paper called A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database, in association with a SIGMOD keynote address.* The approach Plattner advocates is an MPP in-memory column store, presumably somewhat akin to SAP's frequently renamed Business Warehouse Accelerator/Business Intelligence Accelerator/BWA/BIA/Son-of-TREX technology. There also are strong similarities to the MPP in-memory row store project H-Store/VoltDB, although I don't know whether Plattner would go so far as to adopt the H-Store view that all transactions should run in stored procedures. Unsurprisingly, SAP applications are used as the OLTP paradigm throughout.

          *Thanks to Dave Kellogg for tipping me off to Plattner's paper. I only went to two SIGMOD sessions, neither of which was Plattner's. Nobody actually mentioned Plattner's talk to me when I was down at SIGMOD.

          Perhaps the most interesting part is Plattner's claim that what's demanding about OLTP isn't database updating per se, but rather maintaining aggregates for quick-response analytics. In his main example of that point, Plattner proposes a real-life "more than 18" table schema, of which two are base tables, and (most of?) the rest are materialized views that his proposed database architecture dispenses with (because analytic performance is sufficiently good without them). Thus, Plattner's core columnar argument seemingly is...

          http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/07/hasso_plattner.html /blog/archives/2009/07/hasso_plattner.html Information Management Wed, 08 Jul 2009 09:13:16 -0500
          Google Announces Fusion Tables Google has announced an experimental cloud-based data management system called Fusion Tables. A press article and Slashdot thread ensued, based on some bizarre-sounding analyst quotes that I will not attempt to parse.

          http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/06/google_announce.html /blog/archives/2009/06/google_announce.html Information Management Mon, 15 Jun 2009 09:10:10 -0500
          Greenplum's Announcement and the Future of Data Marts Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC). Key observations around the concept -- mixing mine and Greenplum's together -- include:

          • Data marts aren't just for performance (or price/performance). They also exist to give individual analysts or small teams control of their analytic destiny.
          • Thus, it would be really cool if business users could have their own analytic "sandboxes" -- virtual or physical analytic databases that they can manipulate without breaking anything else.
          • In any case, business users want to analyze data when they want to analyze it. It is often unwise to ask business users to postpone analysis until after an enterprise data model can be extended to fully incorporate the new data they want to look at.
          • Whether or not you agree with that, it's an empirical fact that enterprises have many legacy data marts (or even, especially due to M&A, multiple legacy data warehouses). Similarly, it's an empirical fact that many business users have the clout to order up new data marts as well.
          • Consolidating data marts onto one common technological platform has important benefits.

          In essence, Greenplum is pitching this story:

            http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/06/greenplums_anno.html /blog/archives/2009/06/greenplums_anno.html Information Management Mon, 08 Jun 2009 09:20:35 -0500
            Reinventing Business Intelligence I've felt for quite a while that business intelligence tools are due for a revolution. But I've found the subject daunting to write about because -- well, because it's so multifaceted and big. So to break that logjam, here are some thoughts on the reinvention of business intelligence technology, with no pretense of being in any way comprehensive.

            Natural language and classic science fiction

            Actually, there's a pretty well-known example of BI near-perfection -- the Star Trek computers, usually voiced by the late Majel Barrett Roddenberry. They didn't have a big role in the recent movie, which was so fast-paced nobody had time to analyze very much, but were a big part of the Star Trek universe overall. Star Trek's computers integrated analytics, operations, and authentication, all with a great natural language/voice interface and visual displays. That example is at the heart of a 1998 article on natural language recognition I just re-posted.

            http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/06/reinventing_bus.html /blog/archives/2009/06/reinventing_bus.html Business Intelligence Tue, 02 Jun 2009 09:41:09 -0500
            More on MySQL Forks and Storage Engines The issue of MySQL forks and their possible effect on closed-source storage engine vendors continues to get attention. The underlying question is:

            Suppose Oracle wants to make life difficult for third-party storage engine vendors via its incipient control of MySQL? Can the storage engine vendors insulate themselves from this risk by working with a MySQL fork?

            As laid out most clearly in a comment thread to a previous post*, Mike Hogan (CEO of ScaleDB) believes closed-source storage engine vendors can use a MySQL fork without running afoul of the GPL. In a nutshell, what he proposes is an inbetween layer of software, itself open-sourced, that on one side interfaces with MySQL, and on the other side talks cleanly enough to storage engines that it doesn't infect them with the GPL.

            http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/05/more_on_mysql_f.html /blog/archives/2009/05/more_on_mysql_f.html Information Management Tue, 26 May 2009 09:30:12 -0500
            The Real Story on IBM's System S Release IBM hastily announced System S Streams this week, a product that was supposed to be called InfoSphere Streams and introduced only in 2010. Apparently, the rush is because senior management wanted to talk about it later this week, and perhaps also because it was implicitly baked into some of IBM's advertising already. Scrambling ensued. Even so, Jeff Jones and team got to me fast, and briefed me -- fairly non-technically, unfortunately, but otherwise how I like it, namely on a harmless embargo and without any NDAs.

            Microsoft also introduced CEP this week. Perhaps it is more than coincidence that IBM rushed out its own announcement of an immature CEP technology immediately after Microsoft revealed its plans. Taken together, these announcements support my theory that the small independent CEP/stream processing vendors are more or less ceding broad parts of the potential stream processing market.

            http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/05/the_real_story.html /blog/archives/2009/05/the_real_story.html Information Management Fri, 15 May 2009 09:53:21 -0500
            eBay's Enormous Data Warehouses Detailed A few weeks ago, I had the chance to visit eBay, meet briefly with Oliver Ratzesberger and his team, and then catch up later with Oliver for dinner. I've already alluded to those discussions in a couple of posts, specifically on MapReduce (which eBay doesn't like) and the astonishingly great difference between high- and low-end disk drives (to which eBay clued me in). Now I'm finally getting around to writing about the core of what we discussed, which is two of the very largest data warehouses in the world.

            http://www.intelligententerprise.com/movabletype/blog/cmonash.html/blog/archives/2009/05/ebays_enormous.html /blog/archives/2009/05/ebays_enormous.html Information Management Fri, 01 May 2009 09:38:21 -0500