Brace for IncomingImporting external data deserves its own full design and development life cycleBy Warren ThornthwaiteEdited by Ralph Kimball Continued from Page 1 SETUP PROCESSMany companies use two firewalls, as shown in Figure 1, to provide a double layer of defense. In this example, the FTP server resides between these firewalls in an area network engineers call the DMZ. Internet security is best left to the experts. Make sure you involve one in setting up your server. Once you get the FTP server set up, create a user account and password. Limit the user account to the directory assigned to this partner. Securely send the name and password to your data partner. At this point, your data partner can create a test file and send it over to the directory. Note that your data partner will have to create a complete, robust extract program with error and exception handling to generate the proper set of data on schedule. Their part of the process is very much like the standard extract portion of a typical data warehouse ETL process, along with the added complexity of cross-organizational data transfer. Meanwhile, you need to set up a process that will monitor the directory to see when the file shows up. You can improve the monitoring process by agreeing on a verbose notification system with your data partner. When the file arrives, your process will need to run the file through data validation before it gets loaded. This validation includes making sure the file format is correct and the contents are as expected. It is a good idea to send a checksum along with the file to verify that you received the entire data set. EXCEPTION HANDLINGYour process has to handle all kinds of potential problems. If the file doesn't arrive within a certain time period, the process should begin to send out notifications, both to you and your data partner. There should be an escalation procedure in place if the data doesn't arrive within a specified interval. Next, the process needs to deal with any content validation failures by either stopping the load or writing reject rows out to a suspend file. Finally, after the load is complete, you need to update the log files and send a success notification both internally and to your data partner. DECEPTIVE SIMPLICITYThis kind of data integration can be extremely valuable in improving your users' understanding of their business. It can also make the data warehouse the sole source of information to be found anywhere by both partners. However, do not be fooled by the innocent simplicity of the FTP "get" command. For data integration to work reliably, it needs to be a full system effort that involves multiple independent organizations, with its own infrastructure and control systems. Because the value of external information is so great, and all this processing can be a bit sticky, in the future you should keep an eye open for new companies that may provide secure, scalable platforms that let you start your data integration process without having to build everything from scratch. Warren Thornthwaite [warrent@infodynamics-LLC.com] began his data warehouse career at Metaphor in 1984. He is a coauthor of the Data Warehouse Lifecycle Toolkit (Wiley, 1998), a Kimball University instructor, and cofounder of InfoDynamics LLC.
|
Most Popular This Week
IE Weekly Newsletter
Subscribe to the newsletter
|
|
|



