Kimball University: Eight Recommendations for International Data QualityLanguage, culture, and country-by-country compliance and privacy requirements are just a few of the tough data quality problems global organizations must solve. Start by addressing data accuracy at the source and adopting an MDM strategy, then follow these six other best-practice approaches. By Ralph Kimball August 1, 2008
Thomas Friedman's wonderful book, The World is Flat, chronicles a revolution that most of us in IT are well aware of. Our enterprises collect and process data from around the world. We have hundreds or even thousands of suppliers, and we have millions of customers in almost every country. Our employees, with their attendant names and addresses, come from every conceivable culture. Our financial transactions are denominated in dozens of currencies. We need to know the exact time in remote cities. And above all, even though thanks to the Web we have a tight electronic connection to all of our computing assets, we are dealing with a profoundly distributed system. This, of course, is the point of Friedman's book. Data quality is enough of a challenge in an idealized mono-cultural environment, but it is inflamed to epic proportions in a flat world. But strangely, the issues of international data quality are not a single coherent theme in the IT world. For the most part, IT organizations are simply reacting to specific data problems in specific locations, without an overall architecture. Is an overall architecture even possible? This article examines the many challenges surrounding international data quality and concludes with eight recommendations for addressing the problem. Languages and Character Sets Beyond America and Western Europe there are hundreds of languages and writing systems that cannot be rendered using a single-byte character set such as ASCII. The Unicode standard, of course, is the internationally agreed-upon multi-byte encoding intended to handle all the writing systems on the earth. The latest release, Unicode 5.1, encodes 100,715 characters in virtually every modern language. It is important to understand that Unicode is not a font. It is a character set. The architectural challenge for the data warehouse is to ensure that there is end-to-end support for Unicode all the way from data capture, through all forms of storage, DBMSs, ETL processes, and finally the report writers and BI tools. If any one of these stages cannot support Unicode, the final result will be corrupted and unacceptable. Cultures, Names and Salutations The handling of names is a sensitive issue, and doing it incorrectly is a sign of disrespect. Consider the following examples from different cultures:
Brazil: Mauricio do Prado Filho Are you confident that you can parse these names? Where does the last name start? Is Frances male or female? Some years ago, my title was Director of Applications. I received a letter addressed to "Dir of Apps", which began with "Dear Dir." I didn't take that letter very seriously!
|
New on the BLOG
Is Gartner's Quadrant the Problem, Or Is It How It's Used?
02. 8.2010
Read more from Cindi Howson >>
Add "customer" to Jimi Hendrix' song title and you have a question central to last week's Clarabridge Customer Connections (C3) conference, Are You Customer Experienced? 02. 5.2010 Read more from Seth Grimes >> Quick Thoughts on Sybase/Aleri 02. 4.2010
Read more from Curt Monash >> Most Popular This Week
Intelligent Enterprise Newsletters
Subscribe Here:
| ||||||||||||||||||
|
|




