Welcome Guest. | Log In| Register | Membership Benefits

Intelligent Enterprise

Better Insight for Business Decisions

Intelligent Enterprise - Better Insight for Business Decisions
search Intelligent Enterprise
Home
Digital Library
Events
RSS | Newsletters
Webcasts




May 31, 2003

Is Hand-Coded ETL the Way to Go?

Absolutely yes, or absolutely no, depending...

by Gary Nissen
edited by Ralph Kimball

Continued from Page 1

Just How Simple Is It?

I use the following guidelines on my projects to keep the process simple enough so that no one feels the need to cheat. These guidelines don't fit every time, but probably work for 80 percent of our projects.

The requirements should be about three to five pages, but definitely no more than 10. Gather requirements in four to eight hours of meetings or interviews. Ask questions such as "What are your top three administrative requirements for this ETL system?" and "What are the most difficult things you currently do, and what can you not do, but wish you could?" You should complete your requirements document in about five calendar days. Anything more might indicate that your project is too ambitious or not fully understood, or that you're trying to do too much too early.

Your design documents should be five to 20 pages and take about a week for the development team to complete. The data flows for each table in your data warehouse take up most of this document. Each target table can be described in one or two pages. Use a diagramming tool if possible.

Developer testing should include a clean install, a complete configuration, loading a representative set of data sources, and delivery of required final data structures within specified processing windows. On the second or third iteration, you should introduce into the testing any "surprises" you anticipate will crop up after deployment, such as corrupted data. Complete this process several times throughout the development phase. If possible, automate the process so you can run it more frequently.

Developers should either write the documentation or at least be very involved in it. The documentation should be brief and simple, focusing on the primary users of the system — database administrators, IT managers, and future developers. Take time to describe what each needs to know. But, again, don't go overboard.

During acceptance testing, expect the target ETL administrative users to spend significant time using the system the way they will after delivery. I always make a point of saying, "We take our responsibilities seriously and we expect the same from you."

Online Exclusive

For the relative advantages of each option, see "Tool-Based vs. Hand-Coded ETL."

If everyone does a good job, only a few problems will exist by the time you get to the acceptance testing phase. Fix the problems, let the target users retest, and deploy.

What About the Hand Coding Question?

Look back at the original question. Do you now see why it surprises me? You can't know whether it's appropriate to hand code the ETL system until you understand the project's scope. The development team makes this decision after the project scope and requirements are well understood. At this point, you may still have questions regarding the right ETL development method. But the questions will be more detailed and specific. For example:

  • We need to make an ETL system that takes data from only one packaged application and will be distributed to many customers. As part of this system, the ETL needs to have a lot of configuration options. I don't think I can address this requirement with a tool-based ETL system. Can I?
  • We're building a small data warehouse with a little database and a small budget. The ETL programming seems straightforward and we have programmers on staff who can handle it. The tools available all cost about twice our development estimates. We think we should hand code the ETL to save money. Everyone tells us we absolutely need a tool. Are we missing something big here?
  • We've designed a relatively large data warehouse. Even though the ETL processes are pretty complex, we're inclined to write these processes by hand because the tools cost so much. Do the tools really save enough time to pay for themselves? Where does the big payback come from?

So the point is that you need to understand your project and requirements before you can decide. Although hand-coded ETL isn't a highly regarded approach, don't let this industry bias affect you. Evaluate your options independently and choose the best fit for your project. For the relative advantages of each option, see the sidebar, "Tool-Based vs. Hand-Coded ETL."



Rate This Article

Comments:

Optional e-mail address:

Why the Opposition to Hand Coding?

Many people in the data warehouse industry believe that ETL processes should not be hand coded. They say that hand-coded systems are more difficult to maintain, developers never produce documentation, and everything slowly migrates to a convoluted mess. But the problem isn't that hand-coded systems deliver poor results. Poor project management and the lack of enforced discipline deliver poor results. It just so happens that when a project isn't adequately analyzed or managed, the hand-coded route is frequently taken, but the project isn't managed as a serious software development effort should be managed. The real costs and issues aren't understood, so the ETL tool cost always looks out of proportion to the perceived need.

Just because poorly run projects are hand coded, doesn't mean that tool-based ETL is always the right choice either. There are no silver bullets that will make your project successful — except hard work and a disciplined approach.


Gary Nissen [garyn@healthplustech.com] has been involved in analytics for managed healthcare since 1989 as a software developer, project manager, independent consultant, and software vendor.








IE Weekly Newsletter
Subscribe to the newsletter
    Email Address