TODO
----

General development directions

* Support for more transport protocols.
* More APIs, e.g. a Java class with libdpsearch support.
* Support for huge databases with hundreds or thousands of millions of
  documents.
* Make it more manageable, i.e. administration tools, etc.

Search quality and results presentation
---------------------------------------

* Click rank.
* Administrator-defined dynamic site priority:
  - approved sites, which should be displayed at the top of the results;
  - disapproved sites (e.g. for abuse), which should not be displayed.
* Take word context into account: , , and so on.
* Optional automatic URL limit by the SERVER_NAME variable.
* "Exclude" limits, for example "search through everything except the
  given site": ue=http://esite/
* Rank URLs with long pathnames lower than direct hits on, say, a domain
  name with no directory path.

Indexing related stuff
----------------------

* Detect clones on site level. Currently this is implemented on page
  level only. The idea is to detect that the site being indexed is a
  mirror of another site after indexing only several pages, without
  having to index all of them.
* SPAM clearance.
* Fix the indexer becoming slow when ServerTable is big. This is caused
  by a full sequential scan. Make an in-memory cache for part of
  ServerTable.
* Fix "posgreSQL.org" and "posgresql.org" being considered different
  sites.
* FTP digest ls-lR.gz support. For example, ftp://ftp.chg.ru/ls-lR.gz
* Make it possible for external parsers to return converted content
  together with headers like Content-Type, Title and so on.

Charset related stuff
---------------------

* Remove the "ForceIISCharset1251 yes/no" command. Replace it with an
  enhanced "CharsetByServer [...]" command.
* Support for stateful character sets: UTF-7, Asian ISO-2022-XX and
  others. They will not be used as a LocalCharset because they take too
  much space; however, the indexer should be able to index them, and the
  search frontend should be able to use them as a BrowserCharset.

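A possible starting point for the host name case item under "Indexing
related stuff" (a minimal sketch only, not the indexer's actual code):
host names are case-insensitive, so lowercasing them before they are
stored or compared would make "posgreSQL.org" and "posgresql.org"
resolve to one site. The names host_to_lower and same_site are
hypothetical and are not part of libdpsearch.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helpers, not part of libdpsearch. */
void host_to_lower(char *host);
int same_site(const char *a, const char *b);

/* Lowercase a host name in place so that names differing only in
 * letter case end up byte-for-byte identical. */
void host_to_lower(char *host)
{
    assert(host != NULL);
    for (; *host != '\0'; host++) {
        /* Cast through unsigned char: passing a negative char value
         * to tolower() is undefined behaviour. */
        *host = (char) tolower((unsigned char) *host);
    }
}

/* Return 1 if two host names are equal when letter case is ignored,
 * 0 otherwise. Neither argument is modified. */
int same_site(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == '\0' && *b == '\0';
}
```

Under these assumptions, host_to_lower would be applied once, when a
host name is first parsed from a URL, so that every later lookup can use
plain strcmp(); same_site is the read-only variant for places where the
stored string must not be changed. Only the host part should be
lowercased, since URL paths may be case-sensitive.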
Misc
----

* Smart cleaning of the search results cache after reindexing.
* Make it possible to set table names in indexer.conf and search.htm
* Learn about Dublin Core, a simple set of standard metadata for web
  pages. http://www.searchtools.com/related/metadata.html#dc
* Add curl library support.
* Optimization for clusterisation.

Portability and code quality
----------------------------

Remove warnings on various platforms. Currently it builds without
warnings on Linux and FreeBSD with these CFLAGS:

  -Wall -Wconversion -Wshadow -Wpointer-arith -Wcast-qual -Wcast-align
  -Wwrite-strings -Waggregate-return -Wstrict-prototypes
  -Wmissing-prototypes -Wmissing-declarations -Wredundant-decls
  -Wnested-externs -Wlong-long -Winline

However, compilers on some other platforms do produce warnings, for
example about mixed signed/unsigned chars with the NetBSD Alpha
compiler. Please report such warnings, and suggestions for fixing them,
to maxime@sochi.net.ru