TABLE: Building Taxonomies: Leading Tools & Options
Every day, we're presented with new content and documents that must be stored for easy retrieval and later consumption. The amount of information available to us today is greater than it has been at any point in history, yet we're expected to be able to share this data at a moment's notice.
Enter taxonomical classification: the process of organizing information logically. You're probably using a taxonomy already — perhaps storing your work in a hierarchy of folders on your hard drive or a networked folder.
In business, content classification presents new challenges because there is so much information to classify and so many contexts in which a document may be relevant. The more diverse and voluminous the information, the more creative organizations must be in developing classification and indexing schemes.
The root of the problem is that business information exists in many forms. Along with conventional structured data from corporate databases, volumes of unstructured information reside in documents, such as Microsoft Office files, Portable Document Format (PDF) files, presentations, graphics and videos. This information may cover hundreds or thousands of subjects, have many authors, and have been created in different contexts for a variety of audiences. Finding exactly what you need in all this information can be difficult, time-consuming and expensive. Up to 35 percent of the typical workday is consumed by the search for information, according to recent studies by technology research firm IDC and search and taxonomy vendor Verity.
Classification: The Three Typical Approaches
In many organizations, documents are stored on a networked server or file system and organized in a hierarchy of folders established by the IT department. Although this structure provides a predictable way to store and find information, it doesn't help users who are seeking information contained in documents they don't know exist.
To solve this problem, many businesses deploy enterprise content management (ECM) systems in the hope that centralizing an organization's content and enforcing the assignment of metadata to that content will simplify the task of finding information. Although ECM systems do bring order to content chaos, information retrieval is more accurate and more efficient if you plan your classification and taxonomy strategy first. Without an accurate picture of how documents are organized across the entire organization (and, more important, an idea of how content interrelates and the context in which it will be searched), your ECM efforts will be far less rewarding than they could be.
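To make the idea of enforced metadata concrete, here is a minimal Python sketch. It does not represent any particular ECM product: the required fields, the in-memory repository and the function names are all assumptions chosen for illustration. It shows a check-in step that rejects content lacking metadata, and a search that relies on that metadata rather than on folder location.

```python
# Hypothetical policy: every document must carry these metadata fields.
REQUIRED_FIELDS = ("title", "author", "category")


def missing_metadata(metadata):
    """Return the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS
            if not str(metadata.get(f) or "").strip()]


def check_in(repository, name, metadata):
    """Store a document's metadata only if every required field is filled in."""
    missing = missing_metadata(metadata)
    if missing:
        raise ValueError(f"{name}: missing metadata {missing}")
    repository[name] = dict(metadata)


def search(repository, **criteria):
    """Return names of documents whose metadata matches every criterion."""
    return sorted(name for name, meta in repository.items()
                  if all(meta.get(key) == value
                         for key, value in criteria.items()))


repository = {}  # stands in for the central content store
check_in(repository, "q3_budget.xls",
         {"title": "Q3 Budget", "author": "Finance", "category": "Budgets"})
check_in(repository, "q4_budget.xls",
         {"title": "Q4 Budget", "author": "Finance", "category": "Budgets"})

print(search(repository, category="Budgets"))
```

The point of the sketch is the dependency the paragraph above describes: the search is only as good as the metadata, and the metadata is only useful if its categories come from a planned classification scheme.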
Creating a taxonomy is the process of classifying information (and the associated metadata that further describes the information) according to a logical system. The resulting structure provides a framework for information retrieval. Taxonomies may be imported into or referenced by other applications when searching, storing or retrieving information. There are several ways to create taxonomies, but most organizations build them manually, buy pre-existing taxonomies or apply automated taxonomy/classification tools to their data. Each approach has advantages and disadvantages.
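As a rough illustration of how a taxonomy serves as a framework for retrieval, the following Python sketch models one as a tree of categories. The Category class, the category names and the file names are hypothetical and are not drawn from any product discussed here. Unlike a folder hierarchy, the same document can be filed under more than one category.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a taxonomy tree (hypothetical structure for illustration)."""
    name: str
    children: dict = field(default_factory=dict)
    documents: list = field(default_factory=list)

    def add_path(self, path):
        """Create (or reuse) the chain of subcategories named in `path`."""
        node = self
        for name in path:
            node = node.children.setdefault(name, Category(name))
        return node

    def find(self, path):
        """Return every document filed at or below the category at `path`."""
        node = self
        for name in path:
            node = node.children.get(name)
            if node is None:
                return []
        return node._collect()

    def _collect(self):
        """Gather this node's documents, then those of all descendants."""
        found = list(self.documents)
        for child in self.children.values():
            found.extend(child._collect())
        return found


root = Category("root")
# A document may be relevant in more than one context, so it is filed twice.
root.add_path(["Finance", "Budgets"]).documents.append("q3_budget.xls")
root.add_path(["Projects", "Intranet"]).documents.append("q3_budget.xls")
root.add_path(["Finance", "Invoices"]).documents.append("invoice_1042.pdf")

print(root.find(["Finance"]))   # everything under Finance
print(root.find(["Projects"]))  # the budget also surfaces here
```

Searching a broad category returns everything classified beneath it, which is how a user can surface documents without knowing in advance that they exist.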