Intelligent Enterprise

Better Insight for Business Decisions

The Intelligent Enterprise Blog: by Curt Monash
Data Frontiers, by Curt Monash

Curt Monash runs Monash Research, which provides strategic, analysis-based advice to users and vendors of advanced information technology. He also writes the blogs DBMS2, Text Technologies, and Strategic Messaging. Write him at contact@monash.com

Quick Thoughts on Sybase/Aleri

Sybase announced an asset purchase that amounts to a takeover of CEP (Complex Event Processing) vendor Aleri. Perhaps not coincidentally, Sybase already had technology under the hood from Aleri predecessor/acquiree Coral8, for financial services uses (notwithstanding that, of Aleri Classic and Coral8, Aleri Classic was the one more focused on financial services). Quick reactions include:

>>Continue reading "Quick Thoughts on Sybase/Aleri"


Posted Thursday, February 4, 2010
3:44 PM
>>Comments


Database Snooping Threatens Liberty - And We're All Making Matters Worse

Every year or two, I get back on my soapbox to say:

  • Database and analytic technology, as they evolve, will pose tremendous danger to individual liberties.
  • We in the industry who are creating this problem also have a duty to help fix it.
  • Technological solutions alone won't suffice. Legal changes are needed.
  • The core of the needed legal changes is tight restrictions on governmental use of data, because relying on restrictions on data acquisition and retention clearly won't suffice.
But this time I don't plan to be so quick to shut up.

>>Continue reading "Database Snooping Threatens Liberty - And We're All Making Matters Worse"


Posted Tuesday, February 2, 2010
2:30 PM
>>Comments


Netezza Skimmer Joins the Short List

As I previously complained, last week wasn't a very convenient time for me to have briefings. So when Netezza emailed to say it would release its new entry-level Skimmer appliance this week, while I asked for and got a Friday afternoon briefing, I kept it quick and basic.

That said, highlights of my Netezza Skimmer briefing included:

>>Continue reading "Netezza Skimmer Joins the Short List"


Posted Wednesday, January 27, 2010
8:14 AM
>>Comments


Two Cornerstones of Oracle's Database Hardware Strategy

After several months of careful optimization, Oracle managed to pick the most inconvenient* day possible for me to get an Exadata update from Juan Loaiza. But the call itself was long and fascinating, with the two main takeaways being:

  • Oracle thinks flash memory is the most important hardware technology of the decade, one that could lead to Oracle being "bumped off" if they don't get it right.

  • Juan believes the "bulk" of Oracle's business will move over to Exadata-like technology over the next five to ten years. Numbers-wise, this seems to be based more on Exadata being a platform for consolidating an enterprise's many Oracle databases than it is on Exadata running a few Especially Big Honking Database management tasks.

>>Continue reading "Two Cornerstones of Oracle's Database Hardware Strategy"


Posted Friday, January 22, 2010
3:41 PM
>>Comments


Oracle Lifts Cloud Over MySQL Storage Engine Vendors

Earlier this month, Oracle put out a press release promising to play nicely with MySQL if its Sun takeover is approved. The parts in italics below are quotes. My comments are in plain text.

1. Continued Availability of Storage Engine APIs. Oracle shall maintain and periodically enhance MySQL's Pluggable Storage Engine Architecture to allow users the flexibility to choose from a portfolio of native and third party supplied storage engines.

MySQL's Pluggable Storage Engine Architecture shall mean MySQL's current practice of using, publicly-available, documented application programming interfaces to allow storage engine vendors to "plug" into the MySQL database server. Documentation shall be consistent with the documentation currently provided by Sun.

Well, duh.

>>Continue reading "Oracle Lifts Cloud Over MySQL Storage Engine Vendors"


Posted Tuesday, December 29, 2009
12:50 PM
>>Comments


Reports of Perfectly-Balanced Hardware Configurations are Greatly Exaggerated

Data warehouse appliance and software appliance vendors like to claim that they've worked out just the right hardware configuration(s), and that a single configuration is correct for a fairly broad range of workloads. But there are a lot of reasons to be dubious about that. Specific vendor evidence includes:

    >>Continue reading "Reports of Perfectly-Balanced Hardware Configurations are Greatly Exaggerated"


    Posted Tuesday, November 24, 2009
    9:26 AM
    >>Comments


    Teradata's Hardware Strategy and Tactics

    In my opinion, the most important takeaways about Teradata's hardware strategy from the Teradata Partners conference last week are:

    • Teradata's future lies in solid-state memory. That's in line with what Carson Schmidt told me six months ago.
    • To Teradata's surprise, the solid-state future is imminent. Teradata is 6-9 months further along with solid-state drives (SSD) than it thought a year ago it would be at this point.
    • Short-term, Teradata is going to increase the number of appliance kinds it sells. I didn't actually get details on anything but the new SSD-based Blurr, but it seems there will be others as well.
    • Teradata's eventual future is to mix and match parts (especially different kinds of storage) in a more modular product line. Teradata Virtual Storage is of pretty limited value otherwise. I believe Teradata will go modular more emphatically than Teradata itself does, because I think doing so will meet users' needs more effectively than if Teradata relies strictly on fixed appliance configurations.

    >>Continue reading "Teradata's Hardware Strategy and Tactics"


    Posted Tuesday, October 27, 2009
    9:06 AM
    >>Comments


    This Week at the Teradata Partners User Conference

    Here are some highlights of what's going on, although names, dates, and details will have to await conversations and press releases this week.


    • Teradata is productizing "private cloud," under names including "Teradata Enterprise Analytics Cloud," "Teradata Agile Analytics Cloud," and "Teradata Elastic Mart Builder." I.e., Teradata hopes to leapfrog Greenplum in its "Enterprise Data Cloud" strategy. This is only fair, in that Greenplum lifted the idea from Teradata and eBay in the first place. It also provides major support for what I think is an extremely sensible trend. Give or take issues of who announces and ships what a couple months before or after a competitor, my early thinking is that the main differences between Greenplum and Teradata in this regard will be:

        >>Continue reading "This Week at the Teradata Partners User Conference "


        Posted Tuesday, October 20, 2009
        11:18 AM
        >>Comments


        Oracle Exadata 2 Capacity Pricing Revealed

        Analyzing Oracle Exadata pricing is always harder than one would first think. But I've finally gotten around to doing an Oracle Exadata 2 pricing spreadsheet. The main takeaways are:

        • If we believe Oracle's claims of 10X compression, Exadata 2 costs more per terabyte of user data than Netezza TwinFin -- $22-26K/TB vs. TwinFin's <$20K -- but less than the Teradata 2550.
        • These figures are highly sensitive to assumptions about Oracle's hybrid columnar compression.
        • Similarly, if Netezza or Teradata were to significantly upgrade their own compression, the price comparison would look quite different.
        • Options such as Data Mining or Oracle Spatial add 12% or so each to Exadata's total system price.
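The sensitivity to compression assumptions noted above can be sketched with a few lines of arithmetic. This is only an illustration, not the spreadsheet the post refers to: the function name, the round input figures, and the choice to treat each option as a simple additive uplift on the base system price are all assumptions made for the example.

```python
def price_per_user_tb(system_price, uncompressed_tb, compression_ratio,
                      option_uplifts=()):
    """Price per terabyte of user data.

    system_price      -- base system price in dollars (hypothetical input)
    uncompressed_tb   -- capacity before compression, in TB (hypothetical input)
    compression_ratio -- assumed compression factor, e.g. 10 for "10X"
    option_uplifts    -- fractional uplifts on the base price, e.g. (0.12,)
                         for one option adding roughly 12% (assumed additive)
    """
    if compression_ratio <= 0 or uncompressed_tb <= 0:
        raise ValueError("capacity and compression ratio must be positive")
    total_price = system_price * (1 + sum(option_uplifts))
    user_data_tb = uncompressed_tb * compression_ratio
    return total_price / user_data_tb


# Placeholder figures chosen for round numbers, not taken from any price list.
at_10x = price_per_user_tb(1_000_000, 10, 10)
at_5x = price_per_user_tb(1_000_000, 10, 5)
with_one_option = price_per_user_tb(1_000_000, 10, 10, option_uplifts=(0.12,))

print(f"10X compression: ${at_10x:,.0f}/TB")
print(f" 5X compression: ${at_5x:,.0f}/TB")
print(f"10X plus one 12% option: ${with_one_option:,.0f}/TB")
```

Under these assumptions, halving the assumed compression ratio doubles the price per terabyte of user data, which is why any cross-vendor comparison depends so heavily on whose compression claims one accepts.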

        >>Continue reading "Oracle Exadata 2 Capacity Pricing Revealed"


        Posted Tuesday, October 6, 2009
        10:20 AM
        >>Comments


        Thoughts on Integrating OLTP and Data Warehousing (Especially in Exadata 2)

        Oracle is pushing Exadata 2 as being a great system for OLTP (OnLine Transaction Processing), data warehousing or, presumably, the integration of same. This claim rests on a few premises, namely:

        • Exadata is great for data warehousing. At this time, that's a claim much better supported by marketing and theory than by practice.

        • Exadata 2 is a suitable annual improvement over last year's Exadata 1. That's quite plausible.
        • Oracle is outstanding for OLTP. That's borne out by vast amounts of experience, especially if by "outstanding" you mean "Gets the job done really, really well at a very high cost in terms of both licenses and labor."
        • The Flash memory in Exadata 2 makes Oracle even better for OLTP.* That's plausible too. Worst-case is probably that Flash support doesn't really work well in this release, but will be cleaned up soon.**
        • OLTP and data warehousing uses for Exadata don't interfere with each other. That one bears some discussion.

          >>Continue reading "Thoughts on Integrating OLTP and Data Warehousing (Especially in Exadata 2)"


          Posted Tuesday, September 29, 2009
          4:44 PM
          >>Comments


          Issues Comparing Analytic DBMS Performance

          The analytic DBMS/data warehouse appliance market is full of competitive performance claims. Sometimes, they're completely fabricated, with no basis in fact whatsoever. But often performance-advantage claims are based on one or more head-to-head performance comparisons. That is, System A and System B are used to run the same set of queries, and some function is applied that takes the two sets of query running times as an input, and spits out a relative performance number as an output.

          For example, Greg Rahn twittered to me that Oracle Exadata commonly outperforms existing Oracle installations by a factor of 50 or better, based on a "geometric mean". What I presume he meant by that is:

          • At any one user installation, a number of queries were compared on new system vs. old.
          • In each case, the ratio between new and old running time was taken.
          • The geometric mean of all those ratios was computed.
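The three steps above can be sketched as follows. This is a minimal illustration of the presumed calculation, not a description of how Oracle actually derived its figure; the query timings are invented, and the ratio is taken as old running time divided by new so that larger numbers mean a bigger speedup.

```python
import math


def geometric_mean(values):
    """Geometric mean, computed as exp of the mean of logs (positive inputs only)."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def speedup_summary(old_seconds, new_seconds):
    """Return per-query speedup ratios (old/new) and their geometric mean."""
    if len(old_seconds) != len(new_seconds):
        raise ValueError("need one old and one new timing per query")
    ratios = [old / new for old, new in zip(old_seconds, new_seconds)]
    return ratios, geometric_mean(ratios)


# Hypothetical running times, in seconds, for three queries at one installation.
old_times = [100.0, 400.0, 900.0]
new_times = [10.0, 4.0, 9.0]

ratios, gm = speedup_summary(old_times, new_times)
arithmetic = sum(ratios) / len(ratios)

print("per-query speedups:", ratios)
print(f"geometric mean:  {gm:.1f}x")
print(f"arithmetic mean: {arithmetic:.1f}x")
```

With these made-up timings the per-query speedups are 10x, 100x, and 100x. The geometric mean comes out near 46x while the arithmetic mean is 70x, so the choice of summary function alone moves the headline number considerably.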

          >>Continue reading "Issues Comparing Analytic DBMS Performance"


          Posted Tuesday, September 22, 2009
          2:23 PM
          >>Comments


          Thinking About Analytic Speed

          For a variety of reasons, I don't plan to post my complete Enzee keynote slide deck soon, if ever. But perhaps one or more of its subjects are worth spinning out in their own blog posts.
          I'm going to start with analytic speed or, equivalently, analytic latency. There is, obviously, a huge industry emphasis on speed. Indeed, there's so much emphasis that confusion often ensues. My goal in this post is not really to resolve the confusion; that would be ambitious to the max. But I'm at least trying to call attention to it, so that we can all be more careful in our discussions going forward, and perhaps contribute to a framework for those discussions as well.

          >>Continue reading "Thinking About Analytic Speed "


          Posted Monday, September 14, 2009
          9:36 AM
          >>Comments


          Teradata's Active Enterprise Data Warehouse Story

          Teradata used to tell a one-size-fits-all Enterprise Data Warehouse (EDW) story. That's no longer the case. Last year, Teradata introduced a range of products. I think Teradata is serious about selling its full product range, and by now has achieved buy-in from its sales force for that strategy. I base these beliefs on data points such as:

          • Teradata says so, repeatedly and persuasively
          • At least in passing, Teradata cites non-trivial sales figures for the appliance product lines
          • Competitors are less unanimous in asserting that Teradata's lower-end products are presented on just a bait-and-switch basis

          But that raises the question: How does Teradata pitch the advantages of its top-end product line these days? At least at the corporate level, the answer seems to focus less on the "EDW" concept than it used to, and more on "Active." Teradata -- which actually has been talking about "Active Data Warehousing" for about a decade -- indeed calls its top-end 55xx series the "Teradata Active Enterprise Data Warehouse."

          >>Continue reading "Teradata's Active Enterprise Data Warehouse Story "


          Posted Monday, August 24, 2009
          8:04 AM
          >>Comments


          Sorting out Netezza and Oracle Exadata Data Warehouse Appliance Pricing

          Netezza recently announced a new generation of data warehouse appliance called TwinFin. TwinFin's clearest stated list price is "a little under $20,000 per terabyte of user data," which in my opinion immediately became the new industry reference point for discussing prices in the data warehouse appliance category. Vigorous discussion ensued, especially in the comment thread to the first of the two posts linked above. Here's some followup.

          Netezza should not have claimed a "10-15X price/performance improvement," based on a 3-5X performance improvement and a 3X decrease in price/terabyte, and I should have grilled Netezza harder when it first made the claim. In fact, there is no unit of performance that you can, in a reasonable blended average, get 10-15X more of per dollar in TwinFin than you can in the predecessor NPS series.

          To look at it another way, multiplying 3-5X by 3X would only make sense if 3-5X were a measure of something like "terabytes/unit of performance." But in fact the 3-5X is a blended average of something more like "units of performance/unit of time"; i.e., you can do 3-5X more calculations or queries in a unit of time over the same database (of the same size*) on the new machine as you can on the old.

          >>Continue reading "Sorting out Netezza and Oracle Exadata Data Warehouse Appliance Pricing"


          Posted Monday, August 10, 2009
          7:32 AM
          >>Comments


          Teradata 13 Focuses on Advanced Analytic Performance

          Last October I wrote about the Teradata 13 release of Teradata's database management software. Teradata 13, which will be used across the various Teradata product lines, has now been announced for GCA (General Customer Availability)*. So far as I can tell, there were two main points of emphasis for Teradata 13:

          • Performance (of course, performance is a point of emphasis for almost any release of any analytic DBMS product), especially but not only in the areas of aggregates, ETL (Extract/Transform/Load), and UDFs.
          • UDFs (User Defined Functions), especially but not only in the areas of data mining and geospatial analysis.

          To put it even more concisely, the focus of Teradata 13 is on advanced analytic performance, although there of course are some enhancements in simple query performance and in analytic functionality as well.

          >>Continue reading "Teradata 13 Focuses on Advanced Analytic Performance "


          Posted Monday, August 3, 2009
          3:28 PM
          >>Comments


          Netezza Is Changing its Hardware Architecture, Slashing Prices

          Netezza is about to make its biggest product announcement in years. In particular:

          • Netezza is cutting prices to under $20K/terabyte of user data, with even lower numbers promised for the near future.
          • Netezza is replacing its PowerPC chips with Intel-based IBM blades.
          • There will be substantial changes in how data flows between the various parts of a Netezza node.
          • Netezza claims this will all produce an immediate 10-15X increase in price-performance, based on a 3X cut in price/terabyte and a 3-5X improvement in mixed workload performance. (Edit: Netezza now agrees that it shouldn't have phrased things that way.)
          Allow me to explain.

          For months, it has been an increasingly open secret that Netezza was planning a major refresh of its product line. As signaled by a blog post from Netezza's product marketing VP Phil Francisco, many of the details are finally fit to post.*

          *A couple more will be revealed next week, and a longer-term roadmap will be laid out during Netezza's conference tour in September. (By the way, yours truly will be keynoting the Boston, Chicago, San Francisco, Washington, London, and Milan iterations of same. Come by and say hi!)

          >>Continue reading "Netezza Is Changing its Hardware Architecture, Slashing Prices"


          Posted Friday, July 31, 2009
          10:35 AM
          >>Comments


          Initial reactions to IBM acquiring SPSS

          IBM is acquiring SPSS. My initial thoughts (questions by Eric Lai of Computerworld) include:

          1) good buy for IBM? why or why not?

          Yes. The integration of predictive analytics with other analytic or operational technologies is still ahead of us, so there was a lot of value to be gained from SPSS beyond what it had standalone. (That said, I haven't actually looked at the numbers, so I have no comment on the price.)

          By the way, SPSS coined the phrase "predictive analytics," with the rest of the industry then coming around to use it. As with all successful marketing phrases, it's somewhat misleading, in that it's not wholly focused on prediction.

          2) how does it position IBM vs. competitors?

          >>Continue reading "Initial reactions to IBM acquiring SPSS"


          Posted Wednesday, July 29, 2009
          7:50 AM
          >>Comments


          Update on Microsoft's Madison and Fast Track Data Warehouse Products

          I chatted with Stuart Frost of Microsoft on Tuesday. Stuart is and remains GM of Microsoft's data warehouse product unit, covering about $1 billion or so of revenue. While rumors of Stuart's departure from Microsoft are clearly exaggerated, it does seem that his role is more one of coordination than actual management.

          Microsoft Madison availability remains scheduled for H1 2010. Nothing new there. Tangible progress includes a few customer commitments of various sorts, including one outright planned purchase (due to some internal customer considerations around using up a budget). At the moment various Microsoft Madison technology "previews" are going on, which seem to amount to proofs-of-concept that:

          • Start with actual customer data (some from Microsoft, some from outside)
          • Generate larger synthesized data sets based on those (database size seems to be 10-100 TB)
          • Run in Microsoft data centers or "technology centers," rather than on customer premises.

          >>Continue reading "Update on Microsoft's Madison and Fast Track Data Warehouse Products"


          Posted Friday, July 17, 2009
          10:43 AM
          >>Comments


          Hasso Plattner Calls for In-Memory OLTP Column Stores

          Former SAP CEO Hasso Plattner has written a paper called A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database, in association with a SIGMOD keynote address.* The approach Plattner advocates is an MPP in-memory column store, presumably somewhat akin to SAP's frequently renamed Business Warehouse Accelerator/Business Intelligence Accelerator/BWA/BIA/Son-of-TREX technology. There also are strong similarities to the MPP in-memory row store project H-Store/VoltDB, although I don't know whether Plattner would go so far as to adopt the H-Store view that all transactions should run in stored procedures. Unsurprisingly, SAP applications are used as the OLTP paradigm throughout.

          *Thanks to Dave Kellogg for tipping me off to Plattner's paper. I only went to two SIGMOD sessions, neither of which was Plattner's. Nobody actually mentioned Plattner's talk to me when I was down at SIGMOD.

          Perhaps the most interesting part is Plattner's claim that what's demanding about OLTP isn't database updating per se, but rather maintaining aggregates for quick-response analytics. In his main example of that point, Plattner proposes a real-life "more than 18" table schema, of which two are base tables, and (most of?) the rest are materialized views that his proposed database architecture dispenses with (because analytic performance is sufficiently good without them). Thus, Plattner's core columnar argument seemingly is...

          >>Continue reading "Hasso Plattner Calls for In-Memory OLTP Column Stores"


          Posted Wednesday, July 8, 2009
          9:13 AM
          >>Comments


          Google Announces Fusion Tables

          Google has announced an experimental cloud-based data management system called Fusion Tables. A press article and Slashdot thread ensued, based on some bizarre-sounding analyst quotes that I will not attempt to parse.

          >>Continue reading "Google Announces Fusion Tables "


          Posted Monday, June 15, 2009
          9:10 AM
          >>Comments


          Greenplum's Announcement and the Future of Data Marts

          Greenplum is announcing today a long-term vision, under the name Enterprise Data Cloud (EDC). Key observations around the concept -- mixing mine and Greenplum's together -- include:

          • Data marts aren't just for performance (or price/performance). They also exist to give individual analysts or small teams control of their analytic destiny.
          • Thus, it would be really cool if business users could have their own analytic "sandboxes" -- virtual or physical analytic databases that they can manipulate without breaking anything else.
          • In any case, business users want to analyze data when they want to analyze it. It is often unwise to ask business users to postpone analysis until after an enterprise data model can be extended to fully incorporate the new data they want to look at.
          • Whether or not you agree with that, it's an empirical fact that enterprises have many legacy data marts (or even, especially due to M&A, multiple legacy data warehouses). Similarly, it's an empirical fact that many business users have the clout to order up new data marts as well.
          • Consolidating data marts onto one common technological platform has important benefits.

          In essence, Greenplum is pitching this story:

            >>Continue reading "Greenplum's Announcement and the Future of Data Marts"


            Posted Monday, June 8, 2009
            9:20 AM
            >>Comments


            Reinventing Business Intelligence

            I've felt for quite a while that business intelligence tools are due for a revolution. But I've found the subject daunting to write about because -- well, because it's so multifaceted and big. So to break that logjam, here are some thoughts on the reinvention of business intelligence technology, with no pretense of being in any way comprehensive.

            Natural language and classic science fiction

            Actually, there's a pretty well-known example of BI near-perfection -- the Star Trek computers, usually voiced by the late Majel Barrett Roddenberry. They didn't have a big role in the recent movie, which was so fast-paced nobody had time to analyze very much, but were a big part of the Star Trek universe overall. Star Trek's computers integrated analytics, operations, and authentication, all with a great natural language/voice interface and visual displays. That example is at the heart of a 1998 article on natural language recognition I just re-posted.

            >>Continue reading "Reinventing Business Intelligence"


            Posted Tuesday, June 2, 2009
            9:41 AM
            >>Comments


            More on MySQL Forks and Storage Engines

            The issue of MySQL forks and their possible effect on closed-source storage engine vendors continues to get attention. The underlying question is:

            Suppose Oracle wants to make life difficult for third-party storage engine vendors via its incipient control of MySQL? Can the storage engine vendors insulate themselves from this risk by working with a MySQL fork?

            As laid out most clearly in a comment thread to a previous post*, Mike Hogan (CEO of ScaleDB) believes closed-source storage engine vendors can use a MySQL fork without running afoul of the GPL. In a nutshell, what he proposes is an inbetween layer of software, itself open-sourced, that on one side interfaces with MySQL, and on the other side talks cleanly enough to storage engines that it doesn't infect them with the GPL.

            >>Continue reading "More on MySQL Forks and Storage Engines"


            Posted Tuesday, May 26, 2009
            9:30 AM
            >>Comments


            The Real Story on IBM's System S Release

            IBM hastily announced System S Streams this week, a product that was supposed to be called InfoSphere Streams and introduced only in 2010. Apparently, the rush is because senior management wanted to talk about it later this week, and perhaps also because it was implicitly baked into some of IBM's advertising already. Scrambling ensued. Even so, Jeff Jones and team got to me fast, and briefed me -- fairly non-technically, unfortunately, but otherwise how I like it, namely on a harmless embargo and without any NDAs.

            Microsoft also introduced CEP this week. Perhaps it is more than coincidence that IBM rushed out its own announcement of an immature CEP technology immediately after Microsoft revealed its plans. Taken together, these announcements support my theory that the small independent CEP/stream processing vendors are more or less ceding broad parts of the potential stream processing market.

            >>Continue reading "The Real Story on IBM's System S Release"


            Posted Friday, May 15, 2009
            9:53 AM
            >>Comments


            eBay's Enormous Data Warehouses Detailed

            A few weeks ago, I had the chance to visit eBay, meet briefly with Oliver Ratzesberger and his team, and then catch up later with Oliver for dinner. I've already alluded to those discussions in a couple of posts, specifically on MapReduce (which eBay doesn't like) and the astonishingly great difference between high- and low-end disk drives (to which eBay clued me in). Now I'm finally getting around to writing about the core of what we discussed, which is two of the very largest data warehouses in the world.

            >>Continue reading "eBay's Enormous Data Warehouses Detailed"


            Posted Friday, May 1, 2009
            9:38 AM
            >>Comments


            It's Time to Strengthen MySQL Forkers

            As my first three posts on the Oracle/Sun merger suggested, I think Oracle will do a better job with MySQL product development than Sun has. But of course that's a low hurdle. And so it leaves open the questions:

            What should and/or will be the most widely adopted code lines of MySQL (or other open source DBMS), especially for the types of users and vendors who are engaged with MySQL (as opposed to principal alternative PostgreSQL) today?

            >>Continue reading "It's Time to Strengthen MySQL Forkers"


            Posted Tuesday, April 21, 2009
            11:51 AM
            >>Comments


            First Thoughts on Oracle Acquiring Sun


            • Wow.
            • And during the week of the MySQL conference, too.
            • In the must-read slide presentation, Oracle says all the right things about being committed to all product lines and technologies. On the whole, this is believable.
            • Oracle says it's focusing Sun hardware sales on existing Oracle/Sun customers. Makes sense.
            • Oracle mentions OpenStorage prominently. Makes sense. Integrating DBMS with storage is Oracle's high-end DBMS future (e.g., Exadata).

              >>Continue reading "First Thoughts on Oracle Acquiring Sun "


              Posted Monday, April 20, 2009
              11:21 AM
              >>Comments


              Notes On Inforsense, Tableau, Jaspersoft and More

              I keep not finding the time to write as much about business intelligence as I'd like to. So I'm going to do one omnibus post here covering a lot of companies and trends, then circle back in more detail when I can. Top-level highlights include:

              • Jaspersoft has a new v3.5 product release. Highlights include multi-tenancy-for-SaaS and another in-memory OLAP option. Otherwise, things sound qualitatively much as I wrote last September.
              • Inforsense has a cool composite-analytical-applications story. More precisely, they said my phrase "analytics-oriented EAI" was an "exceptionally good" way to describe their focus. Inforsense's biggest target market seems to be health care, research and clinical alike. Financial services is next in line.
              • Tableau Software "gets it" a little bit more than other BI vendors about the need to decide for yourself how to define metrics. (Of course, it's possible that other "exploration"-oriented new-style vendors are just as clued-in, but I haven't asked in the right way.)

                >>Continue reading "Notes On Inforsense, Tableau, Jaspersoft and More"


                Posted Monday, April 6, 2009
                10:27 AM
                >>Comments


                SAS Enters Its Own Cloud

                The Register has a fairly detailed article about SAS expanding its cloud/SaaS offerings. I disagree with one part, namely:

                SAS may not have a choice but to build its own cloud. Given the sensitive nature of the data its customers analyze, moving that data out to a public cloud such as the Amazon EC2 and S3 combo is just not going to happen.

                And even if rugged security could make customers comfortable with that idea, moving large data sets into clouds (as Sun Microsystems discovered with the Sun Grid) is problematic. Even if you can parallelize the uploads of large data sets, it takes time.

                But if you run the applications locally in the SAS cloud, then doing further analysis on that data is no big deal. It's all on the same SAN anyway, locked down locally just as you would do in your own data center.

                I fail to see why SAS's campus would be better than leading hosting companies' data centers for either data privacy/security or data upload speed. Rather, I think major reasons for SAS building its own data center for cloud computing probably focus on:

                  >>Continue reading "SAS Enters Its Own Cloud "


                  Posted Wednesday, March 25, 2009
                  9:48 AM
                  >>Comments


                  Database Implications if IBM Acquires Sun

                  Reported or rumored merger discussions between IBM and Sun are generating huge amounts of discussion (some links below). Here are some quick thoughts around the subject of how the IBM/Sun deal — if it happens — might affect the database management system industry.

                  • IBM is already serious about supporting multiple database management systems. DB2 on open systems is IBM's flagship DBMS. But DB2 on mainframes and at least one flavor of Informix seem to be getting maintained and enhanced fairly seriously as well. And IBM has further DBMS products as well (e.g., DB/2 on the AS/400). There's little reason to think IBM would orphan MySQL or any other DBMS product.

                  • IBM is very open-source-friendly. For a company that grew up for decades on proprietary software — and still is a huge software products vendor — IBM is very serious about open source. If you doubt that, I have two words for you: "Linux" and "Eclipse."

                    >>Continue reading "Database Implications if IBM Acquires Sun "


                    Posted Thursday, March 19, 2009
                    10:14 AM
                    >>Comments


                    Complex Event Processing Vendors Flounder

                    Independent CEP (Complex Event Processing) vendors continue to flounder, at least outside the financial services and national intelligence markets.

                    • StreamBase once planned to conquer the world, making an impact as big as database management's. Now it has retreated into niche markets.
                    • Progress Software, a decent-sized company, put a large fraction of its energy into Apama. Little has happened outside the financial services sector.
                    • Coral8 has some great-sounding ideas. But Coral8 now has merged into Aleri, basically a financial-markets specialist.
                    • Mike Franklin says some ambitious things on behalf of Truviso, but I haven't noticed much traction there either.

                    >>Continue reading "Complex Event Processing Vendors Flounder"


                    Posted Wednesday, March 18, 2009
                    10:11 AM
                    >>Comments


                    Quick Take on Microsoft SQL Server Fast Track

                    Stuart Frost of Microsoft (née DATAllegro) checked in, with Microsoft's TDWI-timed announcements. The news part was something called "SQL Server Fast Track," which is the Microsoft SQL Server equivalent to Oracle's "recommended configurations" or IBM's "BCUs." SQL Server Fast Track is further being portrayed as an incremental step toward Madison, Microsoft's future high-end data warehousing offering.

                      >>Continue reading "Quick Take on Microsoft SQL Server Fast Track"


                      Posted Monday, February 23, 2009
                      3:31 PM
                      >>Comments


                      Analytics' Role in a Frightening Economy

                      I chatted the other day with an executive on the general business side (as opposed to the trading operation) of a household-name brokerage firm, one that's in no immediate financial peril. It seems their #1 analytic-technology priority right now is changing planning from an annual to a monthly cycle.* That's a smart idea. While it's especially important in their business, larger enterprises of all kinds should consider following suit.

                      *By the way, they seem to want to use Applix technology, now owned by IBM/Cognos, to do it, more for the planning tools than for the cool in-memory OLAP engine itself. Your mileage may vary.

                      >>Continue reading "Analytics' Role in a Frightening Economy"


                      Posted Monday, February 9, 2009
                      2:41 PM
                      >>Comments


                      Why BI is in a Funk

                      I wrote recently that BI is in a "funk." Let me now offer a few ideas as to why that is so.

                      1. At its heart, BI is an application development technology, and making money from innovating in development is hard. To quote myself:

                      Products are obsolete before they [are] mature. Products commonly do only part of what is necessary. Generally, a new tool will be developed to help with a new need... But these tools will often be weak at what came before... By the time the shiny new tools mature to do a good job at the older requirements, some other... shift comes along, with yet newer and shinier tools to handle the latest twists.


                      >>Continue reading "Why BI is in a Funk"


                      Posted Friday, January 30, 2009
                      11:42 AM
                      >>Comments


                      Don't Let Gartner's Data Warehouse Magic Quadrant Confuse You

                      Gartner's latest Magic Quadrant for data warehouse DBMSs was published late last year. Thankfully, vendors don't seem to be taking it as seriously as usual, so I didn't immediately hear about it. (I finally noticed it in a Greenplum pay-per-click ad.) Links to Gartner MQs tend to come and go, but as of now here are two working links to the 2008 Gartner Data Warehouse Database Management System MQ. My posts on the 2007 and 2006 MQs have also been updated with working links.

                      Highlights of this year's data warehouse DBMS Magic Quadrant include:

                        >>Continue reading "Don't Let Gartner's Data Warehouse Magic Quadrant Confuse You"


                        Posted Wednesday, January 21, 2009
                        10:50 AM
                        >>Comments


                        How to Buy an Analytic DBMS

                        I went to London for a couple of days recently, at the behest of Kognitio. Since I was in the neighborhood anyway, I visited their offices for a briefing. But the main driver for the trip was a seminar Thursday at which I was the featured speaker. As promised, the slides have been uploaded here.

                        The material covered on the first 13 slides should be very familiar to readers of this blog. I touched on database diversity and the disk-speed barrier, after which I zoomed through a quick survey of the data warehouse DBMS market. But then I turned to material I've been working on more recently – practical advice directly on the subject of how to buy an analytic DBMS.

                        I started by proposing a seven-part segmentation self-assessment:

                        >>Continue reading "How to Buy an Analytic DBMS"


                        Posted Monday, December 22, 2008
                        5:54 AM
                        >>Comments


                        Hot Topics in High-Performance Analytics

                        For the past few months, I've collected a lot of data points to the effect that high-performance analytics – i.e., beyond straightforward query — is becoming increasingly important. And I've written about some of them at length. For example:

                        Ack. I can't decide whether "analytics" should be a singular or plural noun. Thoughts?

                        Another area that's come up which I haven't blogged about so much is data mining in the database. Data mining accounts for a large part of data warehouse use. The traditional way to do data mining is to extract data from the database and dump it into SAS. But there are problems with this scenario, including:

                        >>Continue reading "Hot Topics in High-Performance Analytics"


                        Posted Monday, November 17, 2008
                        10:03 AM
                        >>Comments


                        Getting to Answers on Oracle's New Hardware

                        I spent about six hours at Oracle last week — talking with Andy Mendelsohn, Ray Roccaforte, Juan Loaiza, Cetin Ozbutun, et al. — and plan to write more later. For now, let me pass along a few quick comments.

                        • The key philosophical point that I had perhaps been missing is that Oracle thinks there is and should be a storage (server) tier, just as there also are database (server), application (server), and web (server) tiers.

                        • Exadata cells are designed to never talk with each other. Instead, they talk to a set of Infiniband switches, which then talk to a grid of servers on the database tier. Oracle thinks this has solved its I/O bandwidth problem for once and for all. It's hard to see why that wouldn't be the case.

                        • What Exadata does on the storage tier in query execution is throw stuff away. Mainly, this is projection and restriction/SELECT. But if a join has been resolved on a small fact table, and Oracle is now filtering a fact table to match a value or set of values, the storage tier can do that too.
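The "throw stuff away" idea in the bullets above can be sketched in a few lines. This is only an illustration of projection and restriction being pushed down to a storage tier, not Oracle's implementation; the function names, the row layout, and the two-cell setup are all assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator

Row = dict  # one table row, keyed by column name (hypothetical layout)


def storage_cell_scan(
    rows: Iterable[Row],
    columns: list,
    predicate: Callable[[Row], bool],
) -> Iterator[Row]:
    """Hypothetical storage-tier scan.

    Restriction: rows failing the predicate are dropped.
    Projection: only the requested columns are returned.
    """
    for row in rows:
        if predicate(row):
            yield {name: row[name] for name in columns}


def database_tier_total(cells: list, wanted_regions: set) -> float:
    """Hypothetical database-tier step: sum amounts over pre-filtered rows.

    The set of wanted values stands in for "a value or set of values"
    that the database tier has already worked out and hands to storage.
    """
    total = 0.0
    for cell_rows in cells:
        scan = storage_cell_scan(
            cell_rows,
            columns=["amount"],
            predicate=lambda r: r["region"] in wanted_regions,
        )
        total += sum(row["amount"] for row in scan)
    return total


# Two made-up storage cells; they never talk to each other.
CELL_A = [
    {"order_id": 1, "region": "EMEA", "amount": 100.0, "notes": "x"},
    {"order_id": 2, "region": "APAC", "amount": 50.0, "notes": "y"},
]
CELL_B = [
    {"order_id": 3, "region": "EMEA", "amount": 25.0, "notes": "z"},
    {"order_id": 4, "region": "AMER", "amount": 75.0, "notes": "w"},
]

print(database_tier_total([CELL_A, CELL_B], {"EMEA"}))
```

The point of the split is that only the surviving rows and columns cross the wire between tiers, so the database tier sees far less data than sits on disk.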

                        >>Continue reading "Getting to Answers on Oracle's New Hardware"


                        Posted Wednesday, October 22, 2008
                        12:27 PM
                        >>Comments


                        A Quick Guide to Teradata's Latest News

                        The Teradata Partners (i.e., user) conference is this week. So there have been lots of press releases, some presentations, lots of meetings, and so on. A lot of Teradata's messaging is in flux, as it moves fairly rapidly to correct what I believe have been some deficiencies in the past. One confusing result is that there was very little prebriefing about the actual announcement details, and we're all scrambling to figure out what's up.

                        Teradata does a good job of collecting its press releases at one URL. So without linking to most of them individually, let me jump in to an overview of Teradata news this week (whether or not in actual press release format):

                        >>Continue reading "A Quick Guide to Teradata's Latest News"


                        Posted Tuesday, October 14, 2008
                        11:51 AM
                        >>Comments


                        HP-Oracle Appliance Prices Estimated

                        I've been trying to figure out how much the HP-Oracle Database Machine and HP-Oracle Exadata Storage Server actually cost. My first estimate was $58-190K/TB (user data), but I've since updated my pricing spreadsheet. Specifically:

                        • The first page of these estimates has been modestly altered to reflect more chargeable software options, as per the discussion below.
                        • Accordingly, my new estimate for HP Oracle Database Machine list price is $5,546,000. Per-terabyte prices (user data) are $60K and $198K for the two configurations.
                        • There's a whole new second page, for Exadata configurations smaller than a full Oracle Database Machine. Most of the work on that was done by Bence Arató of BI Consulting (Hungary), who graciously gave me permission to post it.
                        • The lowest per-terabyte Exadata price estimates are about 20% lower than for the full Oracle Database Machine. The difference is due mainly to eliminating Real Application Clusters for a single-node SMP machine, and secondarily to rounding down slightly on server hardware capacity. But these are rough estimates, as neither Bence nor I is a hardware pricing guy.
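As a back-of-envelope check on the figures above, the list price and per-terabyte prices can be related with simple division. The capacities and the discounted price below are implied by that arithmetic, not taken from the spreadsheet itself, and because the per-terabyte figures are rounded the results are only approximate. The function names are made up for this sketch.

```python
LIST_PRICE_USD = 5_546_000          # estimated Database Machine list price
PER_TB_USD = (60_000, 198_000)      # per-terabyte (user data) estimates
SMALL_CONFIG_DISCOUNT = 0.20        # "about 20% lower" for smaller Exadata setups


def implied_user_data_tb(list_price: float, price_per_tb: float) -> float:
    """User-data capacity implied by a list price and a per-TB price."""
    return list_price / price_per_tb


def discounted_price_per_tb(price_per_tb: float, discount: float) -> float:
    """Per-TB price after applying a fractional discount."""
    return price_per_tb * (1 - discount)


for per_tb in PER_TB_USD:
    capacity = implied_user_data_tb(LIST_PRICE_USD, per_tb)
    print(f"${per_tb:,}/TB implies about {capacity:.0f} TB of user data")

lowest = discounted_price_per_tb(min(PER_TB_USD), SMALL_CONFIG_DISCOUNT)
print(f"Lowest smaller-configuration estimate: about ${lowest:,.0f}/TB")
```

On these assumptions the two configurations work out to very roughly 92 TB and 28 TB of user data, and the cheapest smaller configuration to something near $48K per terabyte.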

                        >>Continue reading "HP-Oracle Appliance Prices Estimated"


                        Posted Friday, October 3, 2008
                        1:11 PM
                        >>Comments


                        HP-Oracle Hardware Parallelization Clarified

                        Some kind Oracle development managers have reached out and helped me better understand where Oracle does or doesn't stand in query and analytic parallelization. Let's start with the part everybody pretty much knows already:

                        • There are two parts to a parallelization story — how you get data off of disk, and what you do with it once you have it.

                        • To a first approximation, the best way to get a lot of data off of disk is in parallel, specifically with different CPUs talking to different disk drives. Until last week's announcement of Exadata, Oracle was the most prominent holdout against this view. (That dubious honor now goes to Sybase.)

                        >>Continue reading "HP-Oracle Hardware Parallelization Clarified"


                        Posted Wednesday, October 1, 2008
                        11:49 AM
                        >>Comments


                        Oracle Finally Answers Data Warehouse Challengers

                        Oracle, in partnership with HP, has announced a new data warehouse appliance product line, cleverly branded "Exadata." The basic idea seems to be that database processing is split among two sets of servers:

                        • (The new stuff) A set of back-end servers — the Oracle Exadata Storage Servers — that gets data off of disk and does some preliminary query processing.
                        • (The old stuff) A conventional Oracle RAC cluster on the front-end.

                        Numbers are being thrown around suggesting that, unlike prior Oracle offerings, the Exadata-based appliance at least has scalability and price/performance worth comparing to Teradata — hey, Exa is bigger than Tera! — Netezza, et al.

                        >>Continue reading "Oracle Finally Answers Data Warehouse Challengers"


                        Posted Thursday, September 25, 2008
                        1:49 AM
                        >>Comments


                        Vertica Spells Out Compression Claims

                        Omer Trajman of column-store DBMS vendor Vertica put up a must-read blog spelling out detailed compression numbers, based on actual field experience (which I'd guess is from a combination of production systems and POCs):

                        >>Continue reading "Vertica Spells Out Compression Claims "


                        Posted Wednesday, September 24, 2008
                        12:54 PM
                        >>Comments


                        Infobright Open Source Move Packs Potential

                        Infobright announced today that it's going full-bore into open source – specifically in the MySQL ecosystem — with the licensing approach, pricing, distribution strategy, and VC money from Sun that such a move naturally entails. I think this is a great idea, for a number of reasons:

                        >>Continue reading "Infobright Open Source Move Packs Potential"


                        Posted Monday, September 15, 2008
                        11:16 AM
                        >>Comments


                        Tradeoffs In Splitting DBMS Work Among MPP Nodes

                        I talk with lots of vendors of MPP data warehouse DBMS. I've now heard enough different approaches to MPP architecture that I think it might be interesting to contrast some of the alternatives. The base-case MPP DBMS architecture is one in which there are two kinds of nodes:

                        A boss node, whose jobs include:
                        - Receiving and parsing queries
                        - Optimizing queries, determining execution plans, and sending execution plans to the nodes
                        - Receiving result sets and sending them back to the querier
                        Worker nodes, which do their part of the query execution job and eventually ship data back to the boss node
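The base-case split described above can be sketched as a toy program. This is a minimal illustration under stated assumptions, not any vendor's design: the class names, the one-line query grammar, and the use of threads to stand in for separate machines are all invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


class WorkerNode:
    """Holds one partition of the table and runs its share of a plan."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self, plan):
        # Filter locally, then ship only a partial aggregate back.
        column, minimum = plan["sum_column"], plan["min_value"]
        return sum(r[column] for r in self.rows if r[column] >= minimum)


class BossNode:
    """Receives queries, builds a plan, fans it out, merges the results."""

    def __init__(self, workers):
        self.workers = workers

    def parse(self, query_text):
        # Toy grammar (an assumption): "SUM <column> MIN <number>"
        _, column, _, minimum = query_text.split()
        return {"sum_column": column, "min_value": float(minimum)}

    def run(self, query_text):
        plan = self.parse(query_text)
        # Send the same plan to every worker; threads stand in for nodes.
        with ThreadPoolExecutor(max_workers=len(self.workers)) as pool:
            partials = list(pool.map(lambda w: w.execute(plan), self.workers))
        # Combine partial results and return them to the querier.
        return sum(partials)


workers = [
    WorkerNode([{"amount": 5.0}, {"amount": 20.0}]),
    WorkerNode([{"amount": 30.0}, {"amount": 1.0}]),
]
boss = BossNode(workers)
print(boss.run("SUM amount MIN 10"))
```

In this arrangement every result passes through the boss, which is the obvious place for a bottleneck and one reason alternative node layouts are worth comparing.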

                        >>Continue reading "Tradeoffs In Splitting DBMS Work Among MPP Nodes"


                        Posted Tuesday, September 9, 2008
                        12:16 PM
                        >>Comments


                        Why MapReduce Matters to SQL Data Warehousing

                        Greenplum and Aster Data have both just announced the integration of MapReduce into their SQL MPP data warehouse products. So why do I think this could be a big deal? The short answer is "Because MapReduce offers dramatic performance gains in analytic application areas that still need great performance speed-up." The long answer goes something like this.

                        The core ideas of MapReduce are:

                        >>Continue reading "Why MapReduce Matters to SQL Data Warehousing"


                        Posted Thursday, August 28, 2008
                        8:53 AM
                        >>Comments


                        David Raab Offers Kudos for QlikView

                        David Raab is a great fan and former reseller of QlikTech's QlikView. His recent lengthy post about the product (I hesitate to call it "detailed" only because he rightly observes that QlikTech is in fact stingy with technical detail) is positive enough to have been recommended by the company itself. Specifically, it was cited in the comment thread to my recent post on QlikTech, where David himself also addressed some of my questions.

                        But of course, no technology is perfect, not even one as great as David thinks QlikView is.

                        >>Continue reading "David Raab Offers Kudos for QlikView"


                        Posted Monday, August 25, 2008
                        8:18 AM
                        >>Comments


                        When to Use Modern DBMS Alternatives

                        If there's one central theme in my DBMS2 blog, it's that modern database management system alternatives should in many cases be used instead of the traditional market leaders. So it was only a matter of time before somebody sponsored a white paper on that subject. The paper, sponsored by EnterpriseDB (disclosure noted), is now posted along with my other recent white papers. Its conclusion — summarizing what kinds of database management system you should use in which circumstances — is reproduced below.

                        Many new applications are built on existing databases, adding new features to already-operating systems. But others are built in connection with truly new databases. And in the latter cases, it's rare that a market-leading product is the best choice. Mid-range DBMS (for OLTP) or specialty data warehousing systems (for analytics) are usually just as capable, and much more cost-effective. Exceptions arise mainly in three kinds of cases:

                        >>Continue reading "When to Use Modern DBMS Alternatives"


                        Posted Thursday, August 21, 2008
                        8:13 AM
                        >>Comments


                        Comparing Vertica, ParAccel and Exasol

                        I talked with executives at Nuremberg, Germany-based Exasol last week — at 5:00 am ET! — and of course want to blog about it. For clarity, I'd like to start by comparing/contrasting the fundamental data structures at Vertica, ParAccel, and Exasol. And it feels like that should be a separate post. So here goes.

                        >>Continue reading "Comparing Vertica, ParAccel and Exasol"


                        Posted Tuesday, August 19, 2008
                        9:00 AM
                        >>Comments


                        Patent Nonsense in the Data Warehouse DBMS Market

                        There are two recent patent lawsuits in the data warehouse DBMS market. In one, Sybase is suing Vertica. In another, an individual named Cary Jardin (techie founder of XPrime, a sort of predecessor company to ParAccel) is suing DATAllegro. Naturally, there's press coverage of the DATAllegro case, due in part to its surely non-coincidental timing right after the Microsoft acquisition was announced and in part to a vigorous PR campaign around it. And the Sybase case so excited one troll that he posted identical references to it on about 12 different threads in this blog, as well as to a variety of Vertica-related articles in the online trade press. But I think it's very unlikely that either of these cases turns out to matter much.

                        >>Continue reading "Patent Nonsense in the Data Warehouse DBMS Market"


                        Posted Friday, August 15, 2008
                        10:15 AM
                        >>Comments


                         



