routines

SYNOPSIS

       #include <dbz.h>

       bool dbzinit(const char *base)

       bool dbzclose(void)

       bool dbzfresh(const char *base, long size)

       bool dbzagain(const char *base, const char *oldbase)

       bool dbzexists(const HASH key)

       off_t dbzfetch(const HASH key)
       bool dbzfetch(const HASH key, void *ivalue)

       bool dbzstore(const HASH key, off_t offset)
       bool dbzstore(const HASH key, void *ivalue)

       bool dbzsync(void)

       long dbzsize(long nentries)

       void dbzgetoptions(dbzoptions *opt)

       void dbzsetoptions(const dbzoptions opt)

DESCRIPTION

       These functions provide an indexing system for rapid random access to a
       text file (the base file).

       Dbz stores offsets into the base text file for rapid retrieval.  All
       retrievals are keyed on a hash value that is generated by the
       HashMessageID() function.
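
       As an illustration of what a stored offset is for, the sketch below
       seeks to an offset in the base file and reads the line that starts
       there.  The helper read_line_at() is hypothetical and not part of
       dbz; it assumes the caller has already obtained the offset (for
       example from dbzfetch) and has the base file open.

```c
#define _POSIX_C_SOURCE 200809L /* for fseeko() */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not part of dbz): read the single line of the base
 * file that starts at `offset`.  Returns false if the seek or read fails. */
bool
read_line_at(FILE *base, off_t offset, char *buf, size_t buflen)
{
    assert(base != NULL && buf != NULL && buflen > 0);

    if (fseeko(base, offset, SEEK_SET) != 0)
        return false;
    if (fgets(buf, (int) buflen, base) == NULL)
        return false;

    /* Strip the trailing newline, if the line had one. */
    buf[strcspn(buf, "\n")] = '\0';
    return true;
}
```

       Lines longer than the buffer are truncated by fgets; a real caller
       would need to decide how to handle that case.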

       Dbzinit opens a database, an index into the base file base, consisting
       of files base.dir, base.index, and base.hash, which must already
       exist.  (If the database is new, they should be zero-length files.)
       Subsequent accesses go to that database until dbzclose is called to
       close the database.

       Dbzfetch searches the database for the specified key.  If dbz was
       configured with --enable-tagged-hash, it returns the corresponding
       value, if any.  Otherwise, it returns true and sets the content of
       ivalue.  Dbzstore stores the key-value pair in the database if dbz
       was configured with --enable-tagged-hash; otherwise, it stores the
       content of ivalue.  Dbzstore will fail unless the database files are
       writable.  Dbzexists will verify whether or not the given
       the number of key-value pairs exceeds  about  80%  of  size.   (Nothing
       awful  will  happen  if  the  database  grows  beyond 100% of size, but
       accesses will slow down quite a bit and the .index and .hash files will
       grow somewhat.)

       Dbz  stores up to DBZ_INTERNAL_HASH_SIZE bytes of the message-id's hash
       in the .hash file to confirm a hit.  This eliminates the need  to  read
       the  base file to handle collisions.  This replaces the tagmask feature
       in previous dbz releases.

       A size of ``0'' given to dbzfresh is synonymous with the local default;
       the normal default is suitable for tables of 5,000,000 key-value pairs.
       Calling dbzinit(name) with the empty  name  is  equivalent  to  calling
       dbzfresh(name, 0).

       When databases are regenerated periodically, as in news, it is simplest
       to pick the parameters for a new database based on the old one.  This
       also permits some memory of past sizes of the old database, so that a
       new database size can be chosen to cover expected fluctuations.
       Dbzagain is a variant of dbzinit for creating a new database as a new
       generation of an old database.  The database files for oldbase must
       exist.  Dbzagain is equivalent to calling dbzfresh with a size equal to
       the result of applying dbzsize to the largest number of entries in the
       oldbase database and its previous 10 generations.
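
       The selection step described above can be sketched as follows.  The
       helper largest_generation() is hypothetical, and so is the idea that
       the per-generation entry counts are available as a plain array; how
       dbz records them is not described here.  The returned value is what
       would be passed to dbzsize, and that result in turn to dbzfresh.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of dbz): return the largest entry count
 * among the current generation and its recorded predecessors.  Returns 0
 * when no counts are supplied. */
long
largest_generation(const long *used, size_t n)
{
    long max = 0;

    assert(used != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (used[i] > max)
            max = used[i];
    }
    return max;
}
```

       Taking the maximum rather than the latest count is what lets the new
       database absorb fluctuations seen in earlier generations.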

       When many accesses are being done by the same program, dbz is massively
       faster if its first hash table is in  memory.   If  the  ``pag_incore''
       flag is set to INCORE_MEM, an attempt is made to read the table in when
       the database is opened, and dbzclose writes it out to disk again (if it
       was read successfully and has been modified).  Dbzsetoptions can be
       used to set the pag_incore and exists_incore flags to new values, each
       of which should be ``INCORE_NO'', ``INCORE_MEM'', or ``INCORE_MMAP'',
       for the .hash and .index files separately; this does not affect the
       status of a database that has already been opened.  The default is
       ``INCORE_NO'' for the .index file and ``INCORE_MMAP'' for the .hash
       file.  The attempt to read the table in may fail due to memory
       shortage; in this case dbz fails with an error.  Stores to an in-memory
       database are not (in general) written out to the file until dbzclose or
       dbzsync, so if robustness in the presence of crashes or concurrent
       accesses is crucial, in-memory databases should probably be avoided or
       the writethrough option should be set to ``true''.

       If the nonblock option is ``true'', then writes to the .hash and .index
       files  will  be done using non-blocking I/O.  This can be significantly
       faster if your platform supports non-blocking I/O with files.
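
       The robustness advice above can be sketched as a small helper that
       adjusts an options structure.  So that the sketch compiles on its
       own, it declares stand-ins for the dbzoptions type and the INCORE_
       constants; the field and constant names come from this page, but the
       exact layout is an assumption, and with the real library one would
       include <dbz.h> and drop the stand-ins.  The helper
       prefer_robustness() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in declarations (assumed layout); <dbz.h> supplies the real ones. */
typedef enum { INCORE_NO, INCORE_MEM, INCORE_MMAP } dbz_incore_val;
typedef struct {
    bool writethrough;
    dbz_incore_val pag_incore;
    dbz_incore_val exists_incore;
    bool nonblock;
} dbzoptions;

/* Hypothetical helper: apply both remedies the text suggests, keeping the
 * tables out of memory and enabling writethrough.  The nonblock setting
 * is left as the caller had it. */
void
prefer_robustness(dbzoptions *opt)
{
    assert(opt != NULL);
    opt->pag_incore = INCORE_NO;
    opt->exists_incore = INCORE_NO;
    opt->writethrough = true;
}
```

       With the real library, the expected pattern would be to call
       dbzgetoptions(&opt), adjust opt, and pass it to dbzsetoptions(opt)
       before dbzinit, since changing options does not affect a database
       that is already open.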

       Dbzsync causes all buffers etc. to be flushed out to the files.  It  is
       typically  used  as a precaution against crashes or concurrent accesses
       when a dbz-using process will be running for a  long  time.   It  is  a
       somewhat expensive operation, especially for an in-memory database.

       Concurrent  reading  of  databases  is  fairly  safe,  but  there is no
       indicates that the database did not appear to be in dbz format.

       If DBZTEST is defined at compile-time then a main() function will be
       included.  This will do performance and integrity tests.

HISTORY

       The original dbz was written by Jon Zeeff
       (zeeff@b-tech.ann-arbor.mi.us).  Later contributions by David Butler
       and Mark Moraes.  Extensive reworking, including this documentation, by
       Henry Spencer (henry@zoo.toronto.edu) as part of the C News project.
       MD5 code borrowed from RSA.  Extensive reworking to remove backwards
       compatibility and to add hashes into dbz files by Clayton O'Neill
       (coneill@oneill.net).

BUGS

       Unlike dbm, dbz will refuse to dbzstore with a key already in the
       database.  The user is responsible for avoiding this.

       The RFC822 case mapper implements only a  first  approximation  to  the
       hideously-complex RFC822 case rules.

       Dbz no longer tries to be call-compatible with dbm in any way.



                                  6 Sep 1997                            DBZ(3)
