===========================================================================

	      The Hoard Multiprocessor Memory Allocator
			<http://www.hoard.org>

			   by Emery Berger
		<http://www.cs.utexas.edu/users/emery>

  Copyright (c) 1998, 1999, 2000, The University of Texas at Austin.

---------------------------------------------------------------------------
emery@cs.utexas.edu                | <http://www.cs.utexas.edu/users/emery>
Department of Computer Sciences    |             <http://www.cs.utexas.edu>
University of Texas at Austin      |                <http://www.utexas.edu>
===========================================================================


What's Hoard?
-------------

Hoard is a fast, scalable, and memory-efficient allocator for
shared-memory multiprocessors.

Why Hoard?
----------

Multithreaded programs that perform dynamic memory allocation often do
not scale, because a single shared heap becomes a bottleneck. When
multiple threads simultaneously allocate or deallocate memory from the
heap, they are serialized while waiting for the heap lock. Programs
that make intensive use of the heap can actually slow down as the
number of processors increases. (Note: if you make heavy use of the
STL, you are also making heavy use of the heap, whether or not you
realize it.)

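Hoard is a fast allocator that solves this problem, so that
allocation-intensive programs can scale as processors are added. In
addition, it provably bounds memory consumption (see the technical
report listed under "More information").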
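To make the heap-lock bottleneck concrete, here is a simplified C++
sketch of a heap guarded by a single lock. SerialHeap, heap_worker,
run_serial_heap and the 8-byte object size are hypothetical names and
values for illustration only; this is not Hoard's code, nor the code
of any particular system allocator. Because every allocate() and
deallocate() call, from every thread, must acquire the same mutex,
those calls execute one at a time no matter how many processors are
available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical single-lock heap, for illustration only (not Hoard's design).
class SerialHeap {
 public:
  void* allocate(std::size_t size) {
    std::lock_guard<std::mutex> guard(heap_lock_);  // every thread waits here
    ++live_objects_;
    return std::malloc(size);
  }
  void deallocate(void* ptr) {
    std::lock_guard<std::mutex> guard(heap_lock_);  // ...and here
    --live_objects_;
    std::free(ptr);
  }
  long live_objects() {
    std::lock_guard<std::mutex> guard(heap_lock_);
    return live_objects_;
  }

 private:
  std::mutex heap_lock_;   // the single lock that serializes all callers
  long live_objects_ = 0;  // allocations not yet freed
};

// Each worker repeatedly allocates and frees one 8-byte object.
static void heap_worker(SerialHeap& heap, int iterations) {
  for (int i = 0; i < iterations; ++i) {
    void* p = heap.allocate(8);
    assert(p != nullptr);
    heap.deallocate(p);
  }
}

// Runs nthreads workers against one shared heap and returns the number
// of objects still live afterwards (0 when every allocation was freed).
long run_serial_heap(int nthreads, int iterations) {
  SerialHeap heap;
  std::vector<std::thread> threads;
  for (int t = 0; t < nthreads; ++t)
    threads.emplace_back(heap_worker, std::ref(heap), iterations);
  for (std::thread& th : threads) th.join();
  return heap.live_objects();
}
```

In a design like this, adding threads tends to add contention on
heap_lock_ rather than throughput, which is the behavior described
above.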


How do I use it?
----------------

Using Hoard is easy. It is written to work on any variant of UNIX that
supports pthreads, and should compile out of the box using make.  (See
INSTALL for more details. Also, if you're using Windows or the BeOS,
please read the appropriate NOTES file.)

You can build Hoard in one of two ways (see INSTALL). Below, I assume
you used the configure script.

To link Hoard with the program foo (after doing "make install"):

	Linux:
	  g++ foo.o -L/usr/local/lib -lhoard -lpthread -o foo

	Solaris:
	  g++ foo.o -L/usr/local/lib -lhoard -lthread -lrt -o foo

You *must* add "-lpthread" or "-lthread" to your list of libraries
(except if you're using the sproc library on the SGI). Don't forget to
add /usr/local/lib to your LD_LIBRARY_PATH environment variable, so
that the dynamic linker can find the Hoard library at run time.

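In UNIX, you might be able to use Hoard without relinking your
application, just by setting the environment variable LD_PRELOAD. In
csh or tcsh:

	setenv LD_PRELOAD "/lib/libpthread.so.0 /usr/local/lib/libhoard.so"

In sh or bash:

	export LD_PRELOAD="/lib/libpthread.so.0 /usr/local/lib/libhoard.so"

This won't work for applications compiled with the "-static" option,
because LD_PRELOAD is honored only by the dynamic linker, which
statically linked programs do not use.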


Did it work?
------------

When you compile Hoard ("make"), you'll get six test programs: three
benchmarks, each built twice, once with the stock allocator and once
linked with Hoard (the -hoard version). testmymalloc(-hoard) measures
raw, uniprocessor speed. threadtest(-hoard) lets you observe
scalability with multiple threads. cache-scratch(-hoard) tests the
cache locality of your allocator (see cache-scratch.cpp for more
details).

** NOTE: if you use the configure script, the *-hoard programs are
** dynamically linked with the Hoard library. Static linking (using
** "make -f Makefile.orig") improves performance, at the cost of a
** larger executable.

For instance,

	threadtest 2 1 800000

creates two threads that each allocate and free 400,000 objects (8
bytes each). Compare this to

	threadtest-hoard 2 1 800000

(the same program as above, but linked with Hoard).
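As a rough illustration of the kind of workload this test exercises,
here is a C++ sketch under stated assumptions. alloc_free_worker and
run_threadtest are hypothetical names, and the parameter order
(threads, iterations, total objects) is an assumption chosen to mirror
the example above; it is not a copy of the real threadtest program or
its command line. With run_threadtest(2, 1, 800000), each of the two
threads allocates and then frees 400,000 eight-byte objects.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical threadtest-style worker (not the real threadtest source):
// each iteration allocates `count` objects of `objsize` bytes, then frees them.
static void alloc_free_worker(int iterations, std::size_t count,
                              std::size_t objsize) {
  std::vector<void*> objs(count);
  for (int it = 0; it < iterations; ++it) {
    for (std::size_t i = 0; i < count; ++i) {
      objs[i] = std::malloc(objsize);
      assert(objs[i] != nullptr);
    }
    for (std::size_t i = 0; i < count; ++i) std::free(objs[i]);
  }
}

// Splits nobjects evenly across nthreads (an assumption) and returns the
// elapsed wall-clock time in seconds.
double run_threadtest(int nthreads, int iterations, std::size_t nobjects,
                      std::size_t objsize = 8) {
  assert(nthreads > 0);
  const std::size_t per_thread =
      nobjects / static_cast<std::size_t>(nthreads);
  const auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> threads;
  for (int t = 0; t < nthreads; ++t)
    threads.emplace_back(alloc_free_worker, iterations, per_thread, objsize);
  for (std::thread& th : threads) th.join();
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Because the sketch only calls std::malloc and std::free, the same
object file (with a small main() of your own that calls
run_threadtest) should be linkable both without and with -lhoard,
giving a rough analogue of the threadtest versus threadtest-hoard
comparison.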

Likewise, try

	testmymalloc 100000 1
and
	testmymalloc-hoard 100000 1

to compare Hoard's uniprocessor performance with the stock allocator.

For cache-scratch, try the following (on a P-processor machine):

   cache-scratch 1 1000 1 1000000
   cache-scratch P 1000 1 1000000

   cache-scratch-hoard 1 1000 1 1000000
   cache-scratch-hoard P 1000 1 1000000

Ideally, each P-thread run finishes P times faster than the
corresponding 1-thread run (a P-fold speedup).
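cache-scratch.cpp is the authoritative description of this benchmark.
As a rough, hypothetical illustration of the kind of cache-locality
problem it probes, here is a C++ sketch; scratch_worker,
run_cache_scratch and speedup are made-up names, and splitting the
iterations evenly across threads is an assumption. The main thread
allocates one small object per worker; each worker frees the object it
was handed, then repeatedly allocates its own object, writes to it many
times, and frees it. If the allocator places objects written by
different threads on the same cache line (false sharing), those writes
contend for the line and the P-thread run falls short of the ideal.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical worker in the spirit of cache-scratch (not its actual code).
static void scratch_worker(char* given, int iterations, std::size_t objsize,
                           int repetitions) {
  delete[] given;  // free an object that the main thread allocated
  for (int i = 0; i < iterations; ++i) {
    char* obj = new char[objsize];
    volatile char* v = obj;  // volatile stores cannot be optimized away
    for (int r = 0; r < repetitions; ++r)
      for (std::size_t k = 0; k < objsize; ++k)
        v[k] = static_cast<char>(k + r);
    delete[] obj;
  }
}

// Runs nthreads workers, splitting `iterations` evenly among them (an
// assumption), and returns the elapsed wall-clock time in seconds.
double run_cache_scratch(int nthreads, int iterations, std::size_t objsize,
                         int repetitions) {
  assert(nthreads > 0);
  // Each worker's initial object is allocated here, by the main thread.
  std::vector<char*> given(nthreads);
  for (char*& p : given) p = new char[objsize];

  const auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> threads;
  for (int t = 0; t < nthreads; ++t)
    threads.emplace_back(scratch_worker, given[t], iterations / nthreads,
                         objsize, repetitions);
  for (std::thread& th : threads) th.join();
  const std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}

// Speedup of a P-thread run over a 1-thread run; the ideal value is P.
double speedup(double t1, double tp) { return t1 / tp; }
```

Timing run_cache_scratch(1, ...) and run_cache_scratch(P, ...) with
otherwise identical arguments and computing speedup(t1, tP) gives the
number to compare against the ideal value P.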

Hoard has been successfully built on a 2-processor x86 running Windows
NT SP4 with and without Cygwin, a 4-processor x86 box running Linux
(Red Hat 6.0, kernel version 2.2.5-22 SMP), a 14-processor SPARC
running Solaris 7, a 56-processor SGI Origin 2000 (ccNUMA
architecture), and a 4-processor IBM F50 (PowerPC-based) running AIX.


More information
----------------

For more information on Hoard, along with some nice performance graphs, see

	Hoard: A Fast, Scalable and Memory-Efficient Allocator
	       for Shared-Memory Multiprocessors
	September 1999
	University of Texas Dept. of Computer Sciences
	UTCS-TR99-22.

	(Included in this distribution in docs/UTCS-TR99-22.ps.gz)

The latest version of Hoard will always be available from the Hoard web page:

	<http://www.hoard.org>


Feedback
--------

Please send any bug reports and information about new platforms Hoard
has been built on to emery@cs.utexas.edu.


Mailing lists
-------------

There are two mailing lists for Hoard: hoard-announce, a low-volume
list for announcements of new Hoard releases, and hoard, a list for
Hoard-related discussion.

To subscribe, go to the Hoard home page (www.hoard.org) and enter your
e-mail address in the appropriate box.


Acknowledgements
----------------

In addition to those thanked in the paper, I'd like to thank Ganesan
Rajagopal for submitting the autoconf and automake scripts, John
Hickin and Paul Larson for improving the NT port, and Trey Boudreau
for the BeOS port. Thanks also to Kevin Mills, Robert Fleischman,
Martin Bachtold, and John Hickin.


--
Emery Berger                           | Parallel Programming
emery@cs.utexas.edu                    | & Multiprogramming MP Groups 
<http://www.cs.utexas.edu/users/emery> | University of Texas at Austin

