User-defined malloc replacement

Users can replace the memory subsystem (the malloc, calloc, realloc, free, mallopt, and mallinfo subroutines) with one of their own design.

Note: Replacement memory subsystems written in C++ are not supported because the C++ library libC.a uses the libc.a memory subsystem.

The existing memory subsystem works for both threaded and non-threaded applications. The user-defined memory subsystem must be threadsafe so that it works in both threaded and non-threaded processes. Because there are no checks to verify that it is, if a non-threadsafe memory module is loaded in a threaded application, memory and data may be corrupted.

The 32-bit and 64-bit objects of the user-defined memory subsystem must be placed in an archive, with the 32-bit shared object named mem32.o and the 64-bit shared object named mem64.o.

The user-shared objects must export the following symbols:
  • __malloc__
  • __free__
  • __realloc__
  • __calloc__
  • __mallinfo__
  • __mallopt__
  • __malloc_init__
  • __malloc_prefork_lock__
  • __malloc_postfork_unlock__
The user-shared objects can optionally export the following symbols:
  • __malloc_start__
  • __posix_memalign__

Execution does not stop if these optional symbols do not exist.

The functions are defined as follows:
void *__malloc__(size_t)
This function is the user equivalent of the malloc subroutine.
void __free__(void *)
This function is the user equivalent of the free subroutine.
void *__realloc__(void *, size_t)
This function is the user equivalent of the realloc subroutine.
void *__calloc__(size_t, size_t)
This function is the user equivalent of the calloc subroutine.
int __mallopt__(int, int)
This function is the user equivalent of the mallopt subroutine.
struct mallinfo __mallinfo__()
This function is the user equivalent of the mallinfo subroutine.
void __malloc_start__()
This function will be called once before any other user-defined malloc entry point is called.
void __posix_memalign__()
This function is the user equivalent of the posix_memalign subroutine. If this symbol does not exist, the execution will not stop, but a call made to the posix_memalign subroutine will cause unexpected results.
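The core allocation entry points above can be sketched as follows. This is only a minimal illustration, not a usable memory subsystem: it assumes a hypothetical fixed-size arena (ARENA_SIZE), a hypothetical block_header layout, and a bump allocator that never reuses freed memory. The __mallopt__, __mallinfo__, and thread-related entry points are omitted for brevity, and no locking is shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed arena; a real subsystem would obtain memory from the OS. */
#define ARENA_SIZE (64 * 1024)
#define ALIGNMENT  16

static _Alignas(ALIGNMENT) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hypothetical header stored in front of every block. */
typedef union {
    size_t size;                   /* usable bytes that follow the header */
    unsigned char pad[ALIGNMENT];  /* keeps the payload aligned */
} block_header;

/* Round n up to the next multiple of ALIGNMENT. */
static size_t round_up(size_t n) {
    return (n + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}

void *__malloc__(size_t size) {
    if (size > ARENA_SIZE)
        return NULL;               /* also keeps round_up() from overflowing */
    size_t need = sizeof(block_header) + round_up(size ? size : 1);
    if (need > ARENA_SIZE - arena_used)
        return NULL;               /* arena exhausted */
    block_header *h = (block_header *)(arena + arena_used);
    h->size = need - sizeof(block_header);
    arena_used += need;
    return h + 1;                  /* payload starts right after the header */
}

void __free__(void *ptr) {
    (void)ptr;                     /* this bump allocator never reuses memory */
}

void *__calloc__(size_t nmemb, size_t size) {
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;               /* nmemb * size would overflow */
    size_t total = nmemb * size;
    void *p = __malloc__(total);
    if (p != NULL)
        memset(p, 0, total);
    return p;
}

void *__realloc__(void *ptr, size_t size) {
    if (ptr == NULL)
        return __malloc__(size);
    block_header *h = (block_header *)ptr - 1;
    if (size <= h->size)
        return ptr;                /* existing block is already large enough */
    void *p = __malloc__(size);
    if (p != NULL)
        memcpy(p, ptr, h->size);   /* copy the old contents to the new block */
    return p;
}
```

A production implementation would additionally need a real free list, the remaining required symbols, and the locking described for the thread-related functions.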
The following functions are used by the thread subsystem to manage the user-defined memory subsystem in a multithreaded environment. They are called only if the application, the user-defined module, or both are bound with libpthreads.a. Even if the user-defined subsystem is not threadsafe and not bound with libpthreads.a, these functions must be defined and exported. Otherwise, the object will not be loaded.
void __malloc_init__(void)
Called by the pthread initialization routine. This function is used to initialize the threaded-user memory subsystem. In most cases, this includes creating and initializing some form of locking data. Even if the user-defined memory subsystem module is bound with libpthreads.a, the user-defined memory subsystem must work before __malloc_init__() is called.
void __malloc_prefork_lock__(void)
Called by pthreads when the fork subroutine is called. This function is used to ensure that the memory subsystem is in a known state before the fork() and stays that way until the fork() has returned. In most cases, this includes acquiring the memory subsystem locks.
void __malloc_postfork_unlock__(void)
Called by pthreads when the fork subroutine is called. This function is used to make the memory subsystem available in the parent and child after a fork. This should undo the work done by __malloc_prefork_lock__. In most cases, this includes releasing the memory subsystem locks.
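The three thread-related functions might be sketched as follows. The names heap_lock, heap_lock_ready, and heap_lock_held are hypothetical and exist only for this illustration; the sketch assumes a single mutex protects all allocator state and that the fork handlers should do nothing if __malloc_init__() has not yet run.

```c
#include <pthread.h>

/* Hypothetical lock protecting the allocator's internal state. */
static pthread_mutex_t heap_lock;
static int heap_lock_ready = 0;  /* set once __malloc_init__() has run */
static int heap_lock_held = 0;   /* set while the lock is held across fork() */

/* Called by the pthread initialization routine: create the locking data. */
void __malloc_init__(void) {
    pthread_mutex_init(&heap_lock, NULL);
    heap_lock_ready = 1;
}

/* Called before fork(): take the lock so the heap is in a known state. */
void __malloc_prefork_lock__(void) {
    if (heap_lock_ready) {
        pthread_mutex_lock(&heap_lock);
        heap_lock_held = 1;
    }
}

/* Called after fork(): undo the work of __malloc_prefork_lock__(). */
void __malloc_postfork_unlock__(void) {
    if (heap_lock_held) {
        heap_lock_held = 0;
        pthread_mutex_unlock(&heap_lock);
    }
}
```

A subsystem that is not threadsafe could, under the same assumptions, export empty versions of these functions so that the object still loads.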
All of the functions must be exported from a shared module. Separate modules must exist for 32- and 64-bit implementations placed in an archive. For example:
  • mem.exp module:
    __malloc__
    __free__
    __realloc__
    __calloc__
    __mallopt__
    __mallinfo__
    __malloc_init__
    __malloc_prefork_lock__
    __malloc_postfork_unlock__
    __malloc_start__
  • mem_functions32.o module:

    Contains all of the required 32-bit functions

  • mem_functions64.o module:

    Contains all of the required 64-bit functions

The following examples are for creating the shared objects. The -lpthreads parameter is needed only if the object uses pthread functions.
  • Creating 32-bit shared object:
    ld -b32 -m -o mem32.o mem_functions32.o \
    -bE:mem.exp \
    -bM:SRE -lpthreads -lc
  • Creating 64-bit shared object:
    ld -b64 -m -o mem64.o mem_functions64.o \
    -bE:mem.exp \
    -bM:SRE -lpthreads -lc
  • Creating the archive (the shared object names must be mem32.o for the 32-bit object and mem64.o for the 64-bit object):
     ar -X32_64 -r archive_name mem32.o mem64.o

Enabling the user-defined memory subsystem

The user-defined memory subsystem can be enabled by using one of the following:
  • The MALLOCTYPE environment variable
  • The _malloc_user_defined_name global variable in the user's application

To use the MALLOCTYPE environment variable, specify the archive containing the user-defined memory subsystem by setting MALLOCTYPE to user:archive_name, where archive_name is in the application's libpath or in a path specified in the LIBPATH environment variable.
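As an illustration of the value format, the following sketch builds the user:archive_name string for a program that wants to prepare the environment of a child process. The helper name format_malloctype is hypothetical. It also rejects names that contain a slash, on the basis of the note in this section that archive_name cannot contain path information.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes "user:<archive_name>" into buf.
 * Returns 0 on success, or -1 if the name is empty, contains path
 * information, or does not fit in the buffer.
 */
int format_malloctype(char *buf, size_t buflen, const char *archive_name) {
    if (archive_name == NULL || archive_name[0] == '\0')
        return -1;
    if (strchr(archive_name, '/') != NULL)
        return -1;                 /* archive_name cannot contain a path */
    int n = snprintf(buf, buflen, "user:%s", archive_name);
    if (n < 0 || (size_t)n >= buflen)
        return -1;                 /* formatting error or truncated output */
    return 0;
}
```

The resulting string could then be assigned to MALLOCTYPE, for example with setenv() before calling exec, or exported from the shell.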

To use the _malloc_user_defined_name global variable, the user's application must declare the global variable as:
char *_malloc_user_defined_name = "archive_name";

where archive_name must be in the application's libpath or a path specified in the LIBPATH environment variable.

Note:
  1. When a setuid application is run, the LIBPATH environment variable is ignored, so the archive must be in the application's libpath.
  2. archive_name cannot contain path information.
  3. When both the MALLOCTYPE environment variable and the _malloc_user_defined_name global variable are used to specify the archive_name, the archive specified by MALLOCTYPE will override the one specified by _malloc_user_defined_name.
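The precedence rule in the last note can be modeled with a short sketch. This is not the loader's actual logic: effective_archive is a hypothetical helper, mymalloc.a is a placeholder archive name, and MALLOCTYPE values that do not begin with user: are not modeled (the sketch simply falls through to the global variable for them, which is an assumption).

```c
#include <stddef.h>
#include <string.h>

/* Application-side declaration; "mymalloc.a" is a placeholder archive name. */
char *_malloc_user_defined_name = "mymalloc.a";

/*
 * Hypothetical model of the documented precedence: an archive named through
 * MALLOCTYPE=user:archive_name overrides the one named by the global variable.
 */
const char *effective_archive(const char *malloctype, const char *global_name) {
    static const char prefix[] = "user:";
    size_t prefix_len = sizeof prefix - 1;

    if (malloctype != NULL && strncmp(malloctype, prefix, prefix_len) == 0)
        return malloctype + prefix_len;  /* environment variable wins */
    return global_name;                  /* otherwise use the global variable */
}
```

In an application, the first argument would typically be the result of getenv("MALLOCTYPE") and the second would be _malloc_user_defined_name.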

32-bit and 64-bit considerations

If the archive does not contain both the 32-bit and 64-bit shared objects and the user-defined memory subsystem was enabled using the MALLOCTYPE environment variable, there will be problems executing 64-bit processes from 32-bit applications and 32-bit processes from 64-bit applications. When a new process is created using the exec subroutine, the process inherits the environment of the calling application. This means that the MALLOCTYPE environment variable will be inherited and the new process will attempt to load the user-defined memory subsystem. If the archive member does not exist for this type of program, the load will fail and the new process will exit.

Thread considerations

All of the provided memory functions must work in both threaded and non-threaded environments. Even if the module is linked with libpthreads.a, at least __malloc__() must work before __malloc_init__() is called and pthreads is initialized, because the pthread initialization calls malloc() before it calls __malloc_init__(). The __malloc__() function should therefore be able to run to completion without any dependency on __malloc_init__() (that is, __malloc__() should initially assume that __malloc_init__() has not yet run). After __malloc_init__() has completed, __malloc__() can rely on any work done by __malloc_init__().
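One way to meet this requirement is to have __malloc__() check whether initialization has happened and take the lock only afterward. The sketch below assumes a hypothetical init_done flag, a hypothetical heap_lock mutex, and a trivial fixed pool (pool, POOL_SIZE, carve) that stands in for the real allocation logic.

```c
#include <pthread.h>
#include <stddef.h>

#define POOL_SIZE 4096              /* hypothetical pool size */

static _Alignas(16) unsigned char pool[POOL_SIZE];
static size_t pool_used = 0;
static pthread_mutex_t heap_lock;   /* valid only after __malloc_init__() */
static volatile int init_done = 0;  /* 0 until __malloc_init__() has run */

void __malloc_init__(void) {
    pthread_mutex_init(&heap_lock, NULL);
    init_done = 1;
}

/* Stand-in for the real allocation logic: carve bytes from the pool. */
static void *carve(size_t size) {
    if (size == 0)
        size = 1;
    if (size > POOL_SIZE)
        return NULL;
    size_t need = (size + 15) & ~(size_t)15;   /* keep 16-byte alignment */
    if (need > POOL_SIZE - pool_used)
        return NULL;
    void *p = pool + pool_used;
    pool_used += need;
    return p;
}

void *__malloc__(size_t size) {
    /* Before __malloc_init__() runs, no lock exists, so none is taken. */
    int use_lock = init_done;
    if (use_lock)
        pthread_mutex_lock(&heap_lock);
    void *p = carve(size);
    if (use_lock)
        pthread_mutex_unlock(&heap_lock);
    return p;
}
```

The flag is sampled once per call so that the lock and unlock decisions always match, even if __malloc_init__() completes while the call is in progress.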

The following variables are provided to prevent unneeded thread-related routines from being called:
  • The __multi_threaded variable is zero until a thread is created. At that point it becomes non-zero, and it is not reset to zero for the life of the process.
  • The __n_pthreads variable is -1 until pthreads has been initialized, at which point it is set to 1. From then on, it is a count of the number of active threads.

Example:

If __malloc__() uses pthread_mutex_lock(), the code might look similar to the following:

if (__multi_threaded)
    pthread_mutex_lock(mutexptr);

/* ..... work ....... */

if (__multi_threaded)
    pthread_mutex_unlock(mutexptr);

In this example, __malloc__() is prevented from executing pthread functions before pthreads is fully initialized. Single-threaded applications are also accelerated because locking is not done until a second thread is started.

Limitations

Memory subsystems written in C++ are not supported due to initialization and the dependencies of libC.a and the libc.a memory subsystem.

Error messages are not translated because the setlocale subroutine uses malloc() to initialize the locales. If malloc() fails, the setlocale subroutine cannot finish and the application remains in the POSIX locale. Therefore, only the default English messages are displayed.

Existing statically built programs cannot use the user-defined memory subsystem without recompiling.

Error reporting

The first time the malloc subroutine is called, the 32-bit or 64-bit object in the archive specified by the MALLOCTYPE environment variable is loaded. If the load fails, a message is displayed and the application exits. If the load succeeds, an attempt is made to verify that all of the required symbols are present. If any symbols are missing, the application is terminated and the list of missing symbols is displayed.