Indexing Engines for Web sites.
[This is a very old document. For a more recent review, see
Comparing Open Source Indexers. Also see the JavaWorld
article on Lucene.]
There are several tools
available that make indexing a site relatively easy. It
is no longer sensible to write your own search gateway;
it makes a lot more sense to use somebody else's.
Probably the most widely
used indexing software is Excite for Web
Servers. It produces high quality results;
the search interface can be customised; it is very easy
to set up; and it is free.
Other freely available indexing software
includes Harvest and its derivatives, and
SWISH.
A partial list of
indexing software follows.
Excite for Web Servers (EWS)
EWS is an application webmasters and web server
administrators can download and install on their web
servers. EWS provides intelligent, concept-based
searching of the HTML and ASCII documents which are
locally stored on their web server. You must be root
(superuser) on your system to run Excite for Web Servers.
The Harvest Information Discovery and Access System
An integrated set of tools to gather, extract,
organise, search, cache, and replicate relevant
information across the Internet. With modest effort users
can tailor Harvest to digest information in many
different formats from many different machines, and offer
custom search services on the web. Netscape's Catalog
Server is based on the Harvest design.
Glimpse
A very powerful indexing and query system that
allows you to search through all your files very quickly.
It can be used by individuals for their personal file
systems as well as by organizations for large data
collections. Glimpse is the default search engine in
Harvest.
WebGlimpse
WebGlimpse adds search capabilities to your WWW
site automatically and easily. It attaches a small search
box to the bottom of every HTML page, and allows the
search to cover the neighborhood of that page or the
whole site. With WebGlimpse there is no need to construct
separate search pages, and no need to interrupt users
while they browse. All pages remain unchanged except
for the extra search capabilities. It is even possible
for the search to efficiently cover remote pages linked
from your pages. (WebGlimpse will collect such remote
pages to your disk and index them.) Installation,
customization (e.g., deciding which pages to collect and
which ones to index), and maintenance are easy.
SWISH
SWISH stands for Simple Web Indexing System for
Humans. With it, you can index directories of files and
search the generated indexes. SWISH was created to meet
the needs of the growing number of Web administrators on
the Internet: many current indexing systems are not well
documented, are hard to install and use, and are too
complex for their own good. SWISH is written in C.
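
The idea of indexing a directory of files and then searching the generated index can be illustrated with a minimal sketch. This is not SWISH's code or its index format. The function names `build_index` and `search` are hypothetical, the index is kept in memory instead of being written to disk, and the tokeniser is deliberately naive (HTML tag names are indexed as ordinary words).

```python
import os
import re
from collections import defaultdict

# Assumption: a "word" is any run of lowercase letters or digits.
TOKEN = re.compile(r"[a-z0-9]+")


def build_index(root):
    """Map each word to the set of file paths under root that contain it."""
    index = defaultdict(set)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Ignore undecodable bytes so binary files do not stop the walk.
            with open(path, encoding="utf-8", errors="ignore") as fh:
                for word in TOKEN.findall(fh.read().lower()):
                    index[word].add(path)
    return index


def search(index, query):
    """Return the sorted paths that contain every word in the query."""
    words = TOKEN.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

Calling `build_index` once on the document root and then `search(index, "harvest glimpse")` returns the files that mention both words. A real indexer would also store the index on disk, strip markup, and rank the results.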
Hyperlative Ltd. ©1997