Intelligent Enterprise




December 21, 1999, Volume 2 - Number 18



Search Engines in the Age of Knowledge

Search engine technology has a potentially important role in the knowledge management process, but only if it attains more intelligence and proactivity

By Carl Frappaolo and Mark Tucker



Search engines are a relatively ancient technology. One of their first commercial applications, in the 1970s, was in an internal IBM system for searching voluminous text files created by corporate lawyers defending against the U.S. government’s antitrust suit. By the ’90s, because search engine technology had matured with a narrow focus on query term-to-text matching, it had achieved relative anonymity. Indeed, what started as a standalone technology had been subsumed by document management product vendors as a component part of their applications. After the inevitable consolidation that occurs in mature markets, the number of text search engine vendors dropped from several dozen to just a handful of well-known companies such as Fulcrum (now PCDocs/Fulcrum), Verity Inc., and Excalibur Technologies International Ltd.

By the mid-’90s, text retrieval was a technology in search of a new identity, looking for new mountains to climb. Conveniently, those mountains — the World Wide Web and knowledge management (KM) — presented themselves along two somewhat convergent paths.

Among its many influences, the Web introduced to the masses the challenge of working amidst an overwhelming mass of textual information. Whereas online text search and retrieval had once been the province of specialists — researchers, librarians, and so on — the Web made everyone a researcher, and in so doing made us all long for better tools to help us do our jobs. This requirement became even more critical in organizations where the Web’s internal cousin, the intranet, became the repository of choice for unstructured information (documents). But although the Web and the intranet represent a potentially limitless library of documents, the core technology involved has no inherent, intelligent means of effectively cataloging its contents.

Initially, the unprecedented challenge of searching and retrieving documents from a library as large as the Web focused attention on retrieval speed and breadth. Users were led to believe that the value of a query engine lay in its ability to enhance recall: to find as many documents as possible, from as many sites as possible, in the shortest period of time. These challenges, however, are formidable and blind to the goal of search precision; you can make no more sense of “3,652 documents found” than you can of the entire Web (or even an intranet). It soon became clear that the focus should be not only on processing power, but also on the ability to intelligently discern the essence of the query as a first step to ensuring an accurate result set.

As more and more information became available in electronic form, we soon realized that although “more is good,” at some point, “enough is enough.” This realization reflects the classic tension between recall and precision, the need to know that you have found all possible information sources without having to look at a great deal of unrelated and unusable ones. Today’s knowledge worker does not lack for information, but certainly lacks the time to look at it all. But the inherent flaws in traditional search engines — their lack of precision, high degree of subjectivity, and rigidity — make traditional text retrieval useless in such a high-volume environment. These search engines help us achieve the goal of finding more, but leave us with the responsibility of knowing when we have found enough.
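The tension between recall and precision can be made concrete with a little arithmetic. The following is a minimal sketch in Python, not taken from any product discussed here; the function name and the document counts are illustrative assumptions (only the "3,652 documents found" figure echoes the text).

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical scenario: the engine returns 3,652 documents. Only 40 documents
# in the whole collection are truly relevant, and 36 of them are in the result.
retrieved = range(3652)
relevant = list(range(36)) + [10_000, 10_001, 10_002, 10_003]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.3f} recall={r:.2f}")  # precision=0.010 recall=0.90
```

Under these assumed numbers the engine looks excellent on recall (90 percent) while only about one result in a hundred is worth reading, which is the situation the knowledge worker actually experiences.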

The emergence of KM, both as a business practice and as a technology, is largely driven by the desire to find that magical point on the continuum between precision and recall, the point where we have found “enough” from the many sources available to us to make business decisions. But the truth is that we do not use KM to “make” decisions for us. Rather, we use it to augment our own ability to make decisions, as a tool to bring additional relevant knowledge to bear on a particular element of the decision-making process. The subject of KM is broader than our discussion here, but keeping this broader context in mind will be helpful.


FIGURE 1 The knowledge chain.

The Knowledge Chain at Work

Innovative, responsive organizations tend to excel in four areas: external awareness, internal awareness, external responsiveness, and internal responsiveness. These areas are what form the knowledge chain (see Figure 1); they imply that the organization has the following capabilities:

External awareness: to understand and monitor the outside environment, such as competition, government regulations and direction, customer demands and opinions, and market trends.

Internal awareness: to understand its own composition, such as core competencies of the organization, past successes and failures, and strategic goals and directions, as well as the solutions behind them.

Internal responsiveness: to leverage its internal resources to affect or respond to the external environment, meet new opportunities, and overcome obstacles.

External responsiveness: to measure external factors and respond to them in a timely manner.

Based on this perspective, search engines clearly have a defined role in the KM picture: They address the problems that arise in the area of internal responsiveness, which is largely about applying the collective wisdom of the organization to new and different opportunities. We can most easily define “collective wisdom” as the sum total of the explicit (documented, captured) and tacit (experiential, uncaptured) knowledge of all the organization’s employees, its internal systems and databases, and other available information resources. During the downsizing fervor of the early ’90s, many organizations placed too much emphasis on information systems and databases, ignoring the intrinsic value of the knowledge embodied in their employee resources. Indeed, one of the clearest challenges for any KM practice is to bring to bear the right combination of all these resources, as quickly as possible, to solve the organization’s problems. Search engines can help us reach that goal.

FIGURE 2 A window onto the knowledge-scape.

Understanding Your Knowledge-Scape

Another way to look at KM is to classify knowledge itself in terms of the organization’s grasp of its knowledge problems. As Figure 2 shows, a useful metaphor for knowledge awareness is to describe it as the organization’s knowledge-scape, a four-paned window that “looks out” onto the knowledge landscape. The four panes are KK (you know what you know), KDK (you know what you don’t know), DKK (you don’t know what you know), and DKDK (you don’t know what you don’t know). Although represented here as four boxes of equal size, in reality, each organization’s knowledge-scape will have window panes of different sizes. In general, most organizations that are struggling with KM will find that the two panes at the bottom, DKK and DKDK, best describe their current state of affairs. The size of the DKK pane is roughly proportional to the amount of knowledge embedded in the organization’s processes, people, and documentation that currently is inaccessible, or at best difficult to find. The DKDK pane is largely a reflection of the organization’s overall lack of awareness of its own knowledge needs. Ultimately, your KM initiatives should be making these panes smaller. If they’re not, you’re wasting your investment in KM.

The KK and KDK panes are the ones you are trying to increase in size through better KM, and this is where search technology comes into play. Certainly, it is easy to see how a search tool can increase the size of the KK pane and why that would be a good thing to do. But what about the KDK pane — isn’t it counterintuitive to say this pane should be bigger? In a word, no. If you’re in a setting where KM is drastically needed — the fast changing, high pressure, innovation-driven work environment — you may not have all the knowledge available to make every decision. Here, it is just as critical to find out quickly and with certainty that you don’t know what you need to know, as it is to discover the knowledge that you do have that can help to make those same decisions.

Think again about the capabilities of the traditional search engine. What if you were to actually take the time to weed through those “3,652 documents found,” only to discover that you wasted your time because none of them helped answer your question? The more certain you are that you’re looking at the most relevant knowledge available, the more certain you’ll be that those cases where you do not find what you are looking for represent actual cases of “knowing that you don’t know.” Reaching this realization earlier in the decision-making process than later provides additional time for research that can fill these knowledge gaps in your organization.

What you need is a tool that can help to shorten these gaps between knowledge need recognition and true knowledge discovery. But don’t expect to find such a capability in traditional search engines. It should be clear by now that a fundamental gap exists between the capabilities of traditional search and retrieval tools and the needs posed by modern KM. Fortunately, search and retrieval has evolved from its roots as a keyword-matching tool into a more robust tool for knowledge retrieval.

Finding the Knowledge Within

Traditional search tools scored early successes in our robust, wide-open information access environment, but information access alone is not enough. We don’t want to be miners of information, but rather discoverers of knowledge. What we’re trying to do is see through the information to find the knowledge captured and implied within the documents, between the documents, and within our experiences, queries, perspectives, and so on. KM raised the ante for search engines by changing the nature of the search itself, in two important ways.

First, the emphasis shifted from content-based retrieval to context-based retrieval. The former is the familiar domain of traditional text search engines, where a query such as “Find all documents containing the phrase ‘intelligent search’” will return a candidate list made up only of documents that explicitly contain that phrase, exactly as defined in the query. The latter, context-based retrieval, has become the new high ground for search and retrieval. Here, no one-for-one matching between query terms and document content is necessary; a true context-based tool understands the context of the query itself (what the user is looking for) and the ways in which that query relates to the underlying information repositories. If you were to enter a query such as “I want to learn more about engineers and trains,” the search engine would understand that you want to learn more about the people who drive railroad trains, and therefore would not return documents on subjects such as education or electrical engineers.
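The difference between the two styles of retrieval can be sketched in a few lines of Python. This is a toy illustration only: the three documents, the crude plural-stripping normalizer, and the hand-picked railroad vocabulary are all assumptions standing in for the far richer analysis a real context-based product would perform.

```python
import re

# Hypothetical three-document collection.
DOCS = {
    "d1": "The locomotive engineer kept the freight train on schedule.",
    "d2": "Electrical engineers train for years before licensing.",
    "d3": "Railroad crews inspect the track every morning.",
}

# Hand-built stand-in for the concept a context-based tool would infer
# from "engineers and trains" (an assumption, not a real lexicon).
RAIL_CONCEPT = {"locomotive", "railroad", "freight", "rail", "crew", "track"}


def normalize(text):
    """Lowercase, split into words, and strip a trailing plural 's' (crude)."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words}


def keyword_match(query, docs):
    """Content-based retrieval: every query term must appear in the document."""
    terms = normalize(query)
    return [doc_id for doc_id, text in docs.items() if terms <= normalize(text)]


def context_match(concept, docs):
    """Context-based retrieval: rank documents by overlap with the concept."""
    scored = [(len(concept & normalize(text)), doc_id) for doc_id, text in docs.items()]
    return [doc_id for score, doc_id in sorted(scored, key=lambda s: (-s[0], s[1])) if score > 0]


print(keyword_match("engineers trains", DOCS))  # ['d1', 'd2']
print(context_match(RAIL_CONCEPT, DOCS))        # ['d3', 'd1']
```

The keyword search returns the electrical engineering document because the words match, and misses the railroad document that never mentions either query term. The concept-based ranking does the reverse, which is the behavior the article describes.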

Second, search engines (or, more precisely, the applications that use them) are increasingly recognizing that the knowledge you’re looking for is not to be found solely in a document, database, or Web site, but rather in some expert’s head. To effectively manage access to this expertise, systems are including information about the people who have the knowledge to contribute: the experts we rely on but often struggle to find. The idea of using search engines to find people may seem a little odd at first, but in essence, you’re matching descriptive information about each expert’s experience with the larger body of explicit, topical knowledge found in traditional information sources, using a single contextual analysis tool. (One example, the Autonomy Knowledge Visualizer, is shown in Figure 3.)
FIGURE 3 Autonomy Knowledge Visualizer.
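
The idea of scoring people and documents with the same tool can be sketched as follows. This is a hedged illustration, not a description of how the Autonomy product works: the Jaccard word-overlap score, the function names, and the sample document and expert profiles are all assumptions chosen for brevity.

```python
import re


def terms(text):
    """Return the set of lowercase words in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_sources(query, documents, experts):
    """Score documents and expert profiles with one function and merge them.

    Uses Jaccard overlap (shared words / all words) as a simple stand-in
    for contextual analysis.
    """
    q = terms(query)
    candidates = [("document", name, text) for name, text in documents.items()]
    candidates += [("expert", name, text) for name, text in experts.items()]
    ranked = []
    for kind, name, text in candidates:
        t = terms(text)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0
        if score > 0:
            ranked.append((round(score, 3), kind, name))
    return sorted(ranked, key=lambda row: (-row[0], row[2]))


# Hypothetical repository contents and expert profiles.
documents = {"maint-report": "corrosion found on pump valve seals"}
experts = {
    "R. Alvarez": "twenty years repairing pump and valve assemblies",
    "J. Chen": "payroll systems",
}

for row in rank_sources("pump valve corrosion", documents, experts):
    print(row)
# (0.5, 'document', 'maint-report')
# (0.25, 'expert', 'R. Alvarez')
```

Because both kinds of source pass through the same scoring function, the result is a single list that tells you what to read and who to ask.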

The result is a view into your organization’s collective wisdom — its explicit information resources and its people — selected from, and grouped according to, its relevance to a particular knowledge need. The “3,652 documents found” result shrinks to a more manageable number because contextual analysis removes from the candidate list the unrelated documents found by keyword searching. And by providing a link to experts who know something about the knowledge topic at hand, you get a more complete picture to start from; if the documents found do not answer all your questions, you already know who to ask for additional insight or clarification. In a practical sense, there is nothing magical about any of this; reading and asking is how you go about your work on a daily basis. But through the application of context-based searching, you get a clearer idea of what to read and who to ask, and, just as important, what not to read and who not to ask.

The Technology of Knowledge Search

Many traditional vendors, such as Fulcrum, Verity, and Excalibur, have made dramatic changes to their products, and new vendors such as Semio Corp., Autonomy Inc., and Inktomi Corp. have released a new breed of product to meet the demands of context-based retrieval. These systems include features such as automatic document abstracting, semantic analysis, and agent technology to propagate searches across multiple environments, combine them into a single set of findings, and then produce a new subset of information to propagate another search.

Document abstracting offers benefits on two levels. First, it addresses the fact that much of today’s information retrieval occurs in a networked environment where transmission speed and bandwidth may be an issue. A document abstract is a smaller version of the document (typically 60 percent smaller in size) that conveys the essence of the full document content. Systems that produce abstracts provide them as the first level of detail to requested documents, thereby minimizing network traffic. Based on evaluation of the abstracts, you can make informed, intelligent decisions as to which documents’ full content you want to view. Second, abstracting technology enables creation of virtual documents, a conglomeration of two or more documents structured in a manner that focuses attention on the salient points across the query set.
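
One simple way to produce such an abstract is extractive: score each sentence by how many of the document's frequent words it contains, then keep the best sentences until a size budget is reached. The sketch below assumes that approach. The commercial products may work quite differently, and the stopword list, the 40 percent budget (chosen to match the "60 percent smaller" figure above), and the sample text are illustrative.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "on", "so", "that", "can", "them", "now", "was", "to", "and"}


def split_sentences(text):
    """Split text after sentence-ending punctuation followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def content_words(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def abstract(text, keep_ratio=0.4):
    """Return an extractive abstract of roughly keep_ratio of the original size."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    budget = keep_ratio * len(text)
    chosen, used = set(), 0
    # Visit sentences from highest to lowest score; always keep at least one.
    for i in sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True):
        if not chosen or used + len(sentences[i]) <= budget:
            chosen.add(i)
            used += len(sentences[i])
    # Emit the kept sentences in their original order.
    return " ".join(sentences[i] for i in sorted(chosen))


SAMPLE = (
    "Search engines index documents so that people can find them. "
    "Early engines matched query words against document text. "
    "Knowledge workers now need engines that understand the context of a query. "
    "The weather was pleasant on the day of the meeting."
)
print(abstract(SAMPLE))
```

A virtual document, in this simplified view, would be the same procedure applied to the concatenation of several documents in a result set.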

A semantic network is a series of intersecting sets that tracks the meaning of words, their relationship to other words, and the meaning of word phrases. By default, such a network supports conceptual analysis because it understands that changing the order of words changes their meaning. (A semantic network would, for example, understand that “State,” “Art,” and “State of the Art” each have different meanings.) Semantic analysis relies on an underlying lexicon of words and phrases that has to be refined and maintained over time.
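
The "State of the Art" example can be illustrated with a very small lexicon in which a phrase is an entry in its own right and longer entries win over shorter ones. This is a minimal sketch under that assumption; the lexicon contents and the `annotate` function are hypothetical and far simpler than a production semantic network.

```python
# Hypothetical lexicon: keys are word sequences, values carry a sense
# and a few related concepts (the "network" links).
LEXICON = {
    ("state",): {"sense": "a political unit or a condition", "related": ["government", "condition"]},
    ("art",): {"sense": "creative work", "related": ["painting", "craft"]},
    ("state", "of", "the", "art"): {
        "sense": "the most advanced level of development",
        "related": ["advanced", "leading edge"],
    },
}

MAX_PHRASE = max(len(key) for key in LEXICON)


def annotate(text):
    """Tag text with lexicon senses, preferring the longest matching phrase."""
    words = text.lower().split()
    found, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(MAX_PHRASE, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in LEXICON:
                found.append((" ".join(key), LEXICON[key]["sense"]))
                i += n
                break
        else:
            i += 1  # no entry starts here; move on one word
    return found


print(annotate("state of the art search"))
# [('state of the art', 'the most advanced level of development')]
print(annotate("the art of the state"))
# [('art', 'creative work'), ('state', 'a political unit or a condition')]
```

The same four words yield different meanings depending on their order, and keeping the lexicon current is exactly the maintenance burden noted above.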

An agent is basically a query that has been submitted for continual background processing, resulting in the automatic notification of status changes relevant to your expressed query. You might, for example, be alerted to changes in a Web site (edits to existing pages or the insertion of new pages) relevant to the interest profile expressed in your query. Agents are particularly useful when the targeted information source is dynamic in nature; the agent lets you maintain constant vigilance on the site without having to reissue queries proactively from time to time.
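
In code, an agent of this kind amounts to a stored query plus a memory of what each page looked like at the last check. The sketch below is an assumption-laden illustration: the `Agent` class, its word-overlap interest test, and the page snapshots are invented for the example, and a real agent would fetch pages on a schedule rather than receive them as a dictionary.

```python
import hashlib
import re


class Agent:
    """A standing query that reports new or edited pages matching a profile."""

    def __init__(self, profile_terms):
        self.profile = {t.lower() for t in profile_terms}
        self.seen = {}  # url -> content hash recorded at the previous check

    def check(self, snapshot):
        """snapshot maps url -> current page text. Returns a list of alerts."""
        alerts = []
        for url, text in snapshot.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            previous = self.seen.get(url)
            self.seen[url] = digest
            if previous == digest:
                continue  # page unchanged since the last check
            words = set(re.findall(r"[a-z]+", text.lower()))
            if self.profile & words:
                alerts.append((url, "new" if previous is None else "changed"))
        return alerts


# Hypothetical site snapshots taken at three points in time.
agent = Agent(["pricing", "competitor"])
site = {"/news": "Competitor announces new pricing", "/jobs": "We are hiring"}
print(agent.check(site))  # [('/news', 'new')]
print(agent.check(site))  # []
site["/news"] = "Competitor cuts pricing again"
print(agent.check(site))  # [('/news', 'changed')]
```

The user states the interest once; every later notification is the agent's work.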

Getting access to the right knowledge sources is essential in all the approaches. The diversity of these sources is impressive: the Web, intranets, document management repositories, email message repositories, groupware databases (Lotus Notes and Microsoft Exchange, for example), LAN file servers, and even structured databases. We are even seeing the emergence of a new class of topical or vertical industry repository that aggregates information according to subject area. These information aggregators use advanced categorization technology such as the Inktomi Directory Engine to assemble related content by subject, which they then provide access to on a fee or subscription basis.

The number of available resources continues to grow, and with that growth comes the need to ensure that, no matter how many places we have to look, we can use a single search engine to do our searching. The ability to run a single query against multiple sources and aggregate the search results into a single candidate list has quickly become a core requirement for any search engine.
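
Running one query against several sources raises two practical problems: each source scores results on its own scale, and the same document may turn up in more than one place. The following sketch shows one plausible way to handle both, by normalizing each source's scores against its top hit and keeping the best score per document. The function name, the source names, and the stub search functions are assumptions for illustration.

```python
def federated_search(query, sources):
    """Run one query against every source and merge into one ranked list.

    sources maps a source name to a function returning (doc_id, raw_score)
    pairs. Returns (doc_id, normalized_score, source) tuples, best first.
    """
    merged = {}
    for name, search in sources.items():
        hits = search(query)
        if not hits:
            continue
        top = max(score for _, score in hits) or 1.0  # avoid dividing by zero
        for doc_id, score in hits:
            normalized = score / top
            # Keep only the best-scoring copy of a duplicated document.
            if doc_id not in merged or normalized > merged[doc_id][0]:
                merged[doc_id] = (normalized, name)
    results = [(doc_id, score, name) for doc_id, (score, name) in merged.items()]
    return sorted(results, key=lambda r: (-r[1], r[0]))


# Hypothetical stub sources with different scoring scales.
def search_intranet(query):
    return [("memo-7", 12.0), ("faq-2", 6.0)]


def search_groupware(query):
    return [("memo-7", 0.4), ("note-9", 0.8)]


sources = {"intranet": search_intranet, "groupware": search_groupware}
for row in federated_search("budget forecast", sources):
    print(row)
# ('memo-7', 1.0, 'intranet')
# ('note-9', 1.0, 'groupware')
# ('faq-2', 0.5, 'intranet')
```

Normalizing against the top hit is only one of several possible choices; it treats every source's best result as equally good, which may not hold when sources differ widely in quality.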

The Future of Search Engines

Traditional search and retrieval technology, which was based on content-based (keyword) retrieval, is useful when applied to a single information repository by people who understand the content and know how to use the tool. But such a narrow application of search and retrieval hardly exists; today’s knowledge seekers are mostly casual users, not professional researchers or librarians, and although their job may be to apply relevant knowledge in useful ways, they are certainly not paid to find it. And yet increasingly, this is exactly what we force these people to do: Search a vastly more complicated array of ever-increasing information resources, using tools that are inadequate to the task at hand. Traditional search and retrieval doesn’t cut it in this environment, because no one has the time to manually work around the limitations it imposes.

Without the ability to keep pace with the production of information through superior access methodologies, we are not amassing information to bring about control — we are actually losing control and not benefiting from the embedded knowledge contained within this information. The need to improve KM has changed the face of search engines forever, through the addition of intelligence that allows the engine to do more than just respond to queries. Tools are beginning to take on the behavior of an active knowledge partner, monitoring the work environment, and automatically and proactively categorizing content and suggesting new resources to users before they know these resources are there.

This definition of the search engine as an active knowledge partner, and not simply a tool, more accurately reflects the challenges faced as search engines evolve within the framework of KM. This evolution is helping us realize a vision in which KM exists in the background — not as something we consciously think about, but rather as a natural, ongoing part of the work environment. If this idea sounds far-fetched, keep in mind that just five years ago there were only about 50 sites on the Web, and no way to find any of them.



Mark Tucker (met@delphigroup.com) is a senior analyst with The Delphi Group in Boston specializing in document and knowledge management applications. You can reach him by phone at 617-247-1511.

Carl Frappaolo (cf@delphigroup.com) is a cofounder of The Delphi Group. He has designed document- and knowledge-based applications for many Fortune 500 companies and major government agencies. You can also reach him at 617-247-1511.


RESOURCES

Autonomy: www.autonomy.com

Excalibur: www.excalib.com

Inktomi: www.inktomi.com

PCDocs/Fulcrum: www.pcdocs.com

Semio: www.semio.com

Verity: www.verity.com


 




