A Structural Look at Search

As you examine the many possible characteristics of searches, it should become clear that these characteristics describe different aspects of search. If you look at search as a transaction, in which a user creates what is essentially a question, asks that question of a source of information, and then gets back an answer, you'll see that search involves layers of communication.

Comparing characteristics or functions across layers is usually not very useful. For example, the statement "this search is stemmed" refers to an entirely different layer of search than the statement "this search has multiple input fields"; both statements are important descriptions, but the characteristics are unrelated. It's like the difference between saying that you're looking for a red book and saying that you're looking for a book about dogs.

When evaluating the effectiveness, usability, or performance of a search system, avoid comparing the characteristics or functions from different layers. Focus on characteristics on the same layer, and focus on how the various elements of that search system meet the needs of the particular users.

This essay presents a structural view of search, setting the stage for a discussion of the various types of search. I don't argue that this is the only way to view search structure, only that it is a useful one, especially for examining the quality and usability of a search system.

The Components or Levels of a Search

When you type a couple of words into a form and click the search button, several different things happen. Every search task (that is, a user employing a search mechanism to perform a query or lookup) consists of several distinct stages, or perhaps "layers" is the better word.

The following diagram shows a simplified view of the steps in creating a search query and receiving results back.

Diagram 1: Layers in a Typical Search

Layer 1: The Search Interface

The first stage in any search is the user interface. Most searches on the web are mediated by forms built on top of an application's or operating system's command-prompt input; in other words, a search form provides a

This layer in the search transaction should guide the user through the creation and submission of an appropriate query. The interface should explain how to use the form, and set expectations about what will happen to the query after the form is submitted. A search interface should address the following kinds of concerns (some of these points will overlap):

How do I use the form?
The search form should explain which elements are required, which functions are modal (for example, whether a checkbox such as "make this search case sensitive" changes how the query is treated), and whether the query requires particular formatting ("type in the product number without spaces or dashes"). Instructions may range from complete help text to simple input field labels.
What kinds of things can I enter?
The search form should explain what kinds of input make for effective searches. For example, if the engine is optimized for a certain kind of vocabulary -- say, medical jargon -- the interface should make that clear. Or if numbers must be spelled out rather than entered as numerals, that should be explained. Can I enter partial words if I don't know the whole word, or its correct spelling?
What am I searching against?
What kinds of information am I querying? If I don't know the name of a product, but do know what it's used for -- "I don't remember what it's called, but it's that little ball thing that goes on the end of a rapier so you can't accidentally stab somebody" -- can I search on that?
What additional query logic can I specify?
The search form should explain what kinds of logic, if any, are supported; for example, Boolean operators that specify the relationship between search terms.
What's the scope of the search?
What collection of information will the search query against? If you are searching a sporting goods site, will your search look at only roller blades or only bicycles, or both? Is there a date range I can specify? A language?
How will my query be modified?
Will the search engine treat my query a certain way? For example, if I type in several keywords, will they be _AND_ed or _OR_ed?

Yahoo's advanced search options page is a good example of how an interface can provide a range of options for creating the query. While this is an interface for a search against document collections, and not a product catalogue, it does demonstrate how an interface can guide the user. This interface (as of June 26, 1999) includes the following cues and instructions:

  • Directly beneath the input box are two radio button options for specifying the scope of the search: Yahoo! or Usenet.
  • In a separate grouping are four radio buttons for "Select a search method": Intelligent default, An exact phrase match, Matches on all words (AND), and Matches on any word (OR).
  • In a separate grouping is a set of options for "Select a search area": Yahoo Categories and Web Sites. This seems to be yet another way of specifying scope, though I'm not sure how these options differ from those in my first bullet point.
  • On a separate line is an option to select the age of the items, which is another way of specifying the scope of the search.
  • The form also has an option to specify the number of results per page to be displayed.
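
As a rough illustration of the interface layer's job, here is a minimal Python sketch that validates raw form fields like those listed above and turns them into a structured request for the next layer. The class, field names, and allowed values (`SearchRequest`, `build_request`, `scope`, `method`, `max_age_days`) are hypothetical; they are not Yahoo's actual parameters.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SearchMethod(Enum):
    # Hypothetical values mirroring the "Select a search method" radio buttons.
    INTELLIGENT_DEFAULT = "intelligent"
    EXACT_PHRASE = "phrase"
    ALL_WORDS = "and"
    ANY_WORD = "or"


@dataclass(frozen=True)
class SearchRequest:
    """What the interface layer hands to the next layer (hypothetical fields)."""
    terms: str
    scope: str = "yahoo"                        # e.g. "yahoo" or "usenet"
    method: SearchMethod = SearchMethod.INTELLIGENT_DEFAULT
    max_age_days: Optional[int] = None          # None means "any age"
    results_per_page: int = 20


ALLOWED_SCOPES = {"yahoo", "usenet"}


def build_request(form: dict) -> SearchRequest:
    """Validate raw form fields and turn them into a SearchRequest."""
    terms = form.get("terms", "").strip()
    if not terms:
        raise ValueError("Please enter at least one search term.")
    scope = form.get("scope", "yahoo")
    if scope not in ALLOWED_SCOPES:
        raise ValueError(f"Unknown scope: {scope!r}")
    # Enum lookup by value raises ValueError for an unknown method.
    method = SearchMethod(form.get("method", "intelligent"))
    age = form.get("max_age_days")
    return SearchRequest(
        terms=terms,
        scope=scope,
        method=method,
        max_age_days=int(age) if age not in (None, "") else None,
        results_per_page=int(form.get("results_per_page", 20)),
    )
```

Rejecting an empty query or an unknown scope here, rather than deeper in the system, is one way the interface can "set expectations" before anything is submitted.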

Layer 2: The Query Preprocessor

The simple truth about web searches is that the interface is at least one step removed from the query, because the user is rarely allowed direct access to the database or collection; users aren't allowed to log in to the operating system or application that hosts the information. The search form passes the user's query from the form to the search system, but beyond this necessary mediation between the user and the collection, the search system may apply additional logic to the user's query.

Before running the query, the search system may tweak it to improve its chances of success, to broaden its scope, or to translate it into a format better suited to the collection or its architecture. I call the mechanism through which this modification happens the query preprocessor.

For example, the query preprocessor may stem the individual words of the query to increase the number of potential "hits". Fuzzy-matching algorithms may be applied to handle typos. Or Boolean logic may be applied to the words so that they are _AND_ed or _OR_ed.
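
The three kinds of modification just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `naive_stem` is a crude plural stripper standing in for a real stemmer, the vocabulary is a hypothetical list of index terms, and typo handling uses the standard library's `difflib.get_close_matches` rather than any particular engine's algorithm.

```python
import difflib
import re

# Hypothetical vocabulary of (singular) terms known to the index.
VOCABULARY = ["bicycle", "fencing", "glove", "helmet", "rapier", "tip"]


def naive_stem(word: str) -> str:
    """Crude plural stripping; a stand-in for a real stemmer such as Porter's."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def correct_typo(word: str) -> str:
    """Swap in the closest vocabulary term when one is similar enough."""
    matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else word


def preprocess(raw_query: str, operator: str = "AND") -> tuple:
    """Tokenize, stem, fuzzy-correct, then attach the Boolean operator."""
    words = re.findall(r"[a-z0-9]+", raw_query.lower())
    terms = [correct_typo(naive_stem(w)) for w in words]
    return (operator, terms)
```

For instance, `preprocess("Fencing helmts")` yields `("AND", ["fencing", "helmet"])`: the plural is stripped and the misspelling is pulled toward a known term before anything reaches the index.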

Layer 3: The Query Execution Against the Collection or Index

The third layer involves the core work of the search: looking up information and returning raw results. The final query runs against the collection, or more likely, against an index built from the collection. The structure, architecture, and organization of the collection, whether documents or database, can all affect how the search query is performed; likewise, the logic and strategy used to build an index also affect the lookup.
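
To make the point about index-building strategy concrete, here is a small Python sketch of a simple inverted index built twice from the same three documents: once storing tokens exactly as written, and once with a crude normalizer (`fold`, a hypothetical helper that lowercases and drops one trailing "s"). The documents and helper names are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Set


def fold(token: str) -> str:
    """Lowercase and drop a single trailing "s" (a deliberately crude normalizer)."""
    token = token.lower()
    return token[:-1] if token.endswith("s") else token


def build_index(docs: Dict[int, str], normalize: Callable[[str], str]) -> Dict[str, Set[int]]:
    """Inverted index: normalized token -> ids of the documents that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


def lookup(index: Dict[str, Set[int]], term: str, normalize: Callable[[str], str]) -> Set[int]:
    """Query terms must be normalized the same way the index was built."""
    return index.get(normalize(term), set())


docs = {1: "Mountain Bicycle", 2: "bicycle helmet", 3: "Bicycles for kids"}
exact_index = build_index(docs, normalize=lambda t: t)
folded_index = build_index(docs, normalize=fold)

exact_hits = lookup(exact_index, "bicycle", lambda t: t)   # {2}
folded_hits = lookup(folded_index, "bicycle", fold)        # {1, 2, 3}
```

The query and the collection are identical in both cases; only the decisions made when the index was built change what the lookup can find.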

An example of how database structure directly affects search:

With databases, adding new information requires inserting new rows to hold it. Many databases maintain some kind of sort order for their contents; a simplified example might be an alphabetical sort by brand name. If I use a database product that inserts new information at the bottom, that is, appends new rows to the end of a table, then each addition can leave the table out of order, and it must be re-sorted to put my new information where it belongs. This affects search because, until the table is re-sorted, a query that processes rows sequentially won't encounter my new information until the end of its run. Some search systems also limit the number of results that can be returned, which means that until the re-sort, my new information may not be "seen" by the query at all.
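
The scenario above can be simulated with a short Python sketch: a table kept in brand-name order, a sequential scan that stops at a result limit, and a new row appended at the bottom. The schema, brand names, and limit of three are hypothetical, chosen only to reproduce the effect described.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (brand, category): a hypothetical two-column table


def scan(table: List[Row], category: str, limit: int) -> List[str]:
    """Walk the rows top to bottom, stopping once `limit` matches are found."""
    results = []
    for brand, cat in table:
        if cat == category:
            results.append(brand)
            if len(results) == limit:
                break
    return results


table = [
    ("Bianchi", "bicycle"),
    ("Cannondale", "bicycle"),
    ("Rollerblade", "skates"),
    ("Trek", "bicycle"),
]
table.append(("Acme", "bicycle"))   # new row appended at the bottom, out of order

before = scan(table, "bicycle", limit=3)   # ['Bianchi', 'Cannondale', 'Trek']
table.sort()                               # re-sort by brand name
after = scan(table, "bicycle", limit=3)    # ['Acme', 'Bianchi', 'Cannondale']
```

Before the re-sort, the result limit is reached before the scan ever gets to the new "Acme" row; afterward, that row is the first match.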

An example of how index design directly affects search:

Layer 4: Results Processor

The query returns raw results that usually need to be massaged to optimize their value for the user and/or the business. This layer of search is not always distinguishable from the query itself, but treating this as a distinct layer helps separate issues of query performance and accuracy from issues of results sorting, formatting, and relevance. This layer describes any logic that may be applied to raw results before passing those results to the user.

Some examples of how raw results may be tweaked:

  • results may be ordered according to specific rules and logic particular to the site
  • page and results item formatting may be specified according to the types of results found
  • relevancy may be computed according to schemes particular to the site
  • matching strings may be highlighted in the results
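
A results processor that applies two of these tweaks (a site-specific relevancy rule and highlighting of matching strings) might look roughly like the following Python sketch. The scoring rule, which simply counts term occurrences, and the `<b>` markup are assumptions for illustration; real systems use their own relevancy schemes.

```python
import re
from typing import Dict, List


def score(text: str, terms: List[str]) -> int:
    """Hypothetical relevancy rule: count case-insensitive occurrences of each term."""
    lowered = text.lower()
    return sum(lowered.count(term.lower()) for term in terms)


def highlight(text: str, terms: List[str]) -> str:
    """Wrap every matching string in <b>...</b> for display."""
    if not terms:
        return text
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", text)


def process_results(raw: List[str], terms: List[str]) -> List[Dict[str, object]]:
    """Order by score (ties broken alphabetically), then format each item."""
    ranked = sorted(raw, key=lambda text: (-score(text, terms), text))
    return [{"score": score(t, terms), "snippet": highlight(t, terms)} for t in ranked]
```

Keeping this logic separate from the query itself means the ranking or formatting rules can change without touching how the lookup is performed.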

Layer 5: Results Page

A search results page is every bit the interface that the search form is, and often provides mechanisms for re-sorting results and refining the search.

Search results are presented to the user according to a set of rules particular to that system. They are usually sorted by some criterion: items may be sorted in alphabetical or numerical order, ordered by a logical or contextual grouping scheme, ranked according to some scale of importance or priority, or even organized according to some user-specified parameter(s).

Some examples of the kinds of features and information that may be found on search results pages:

  • multiple ways to re-sort the search results
  • ways to specify the kind of information shown for results items
  • relevancy-flagged strings -- for example, if you searched for "quality", a results page that flags for relevancy might have every instance of that word in the results bolded
  • a search form so you can re-search from the results page
  • a search form that allows you to refine the search
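
Two of these results-page features, re-sorting and refining, can be sketched in Python as operations on the result set already on the page. The record fields (`name`, `price`, `relevance`) and the choice to refine by filtering the current results, rather than re-running the query, are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical result records as they might reach the results page.
Result = Dict[str, object]

SORT_KEYS = {"relevance", "name", "price"}


def resort(results: List[Result], key: str) -> List[Result]:
    """Re-sort on a user-chosen key; relevance is shown highest first."""
    if key not in SORT_KEYS:
        raise ValueError(f"Cannot sort by {key!r}")
    return sorted(results, key=lambda r: r[key], reverse=(key == "relevance"))


def refine(results: List[Result], extra_term: str) -> List[Result]:
    """Narrow the current result set instead of running a fresh search."""
    term = extra_term.lower()
    return [r for r in results if term in str(r["name"]).lower()]


results = [
    {"name": "Road bicycle", "price": 900, "relevance": 0.7},
    {"name": "Kids bicycle", "price": 150, "relevance": 0.9},
    {"name": "Bicycle helmet", "price": 60, "relevance": 0.4},
]
by_price = [r["name"] for r in resort(results, "price")]
# ['Bicycle helmet', 'Kids bicycle', 'Road bicycle']
```

Restricting sort keys to a known set mirrors what a results page typically offers: a fixed menu of re-sort options rather than arbitrary fields.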