Wednesday, January 3, 2007

Memory Overcommit and the OOM Killer

Linux has a feature called memory overcommit. Put simply, it means kernel allocates memory even if it doesn't have enough. This happens when a new process is created using fork(). This effectively copies the parent's address space, and so requires twice the parent process' memory once the new process (child) is created. The memory overcommit feature means that fork() always returns a success. Even if there is not enough memory to create a new child process!
The idea behind a memory overcommit feature of Linux is that the child process rarely uses all the memory allocated to it. fork() is followed by exec() which overlays the child address space with some exectutable. Once the exec() is done, the child process exits and the parent process (which goes into wait() after creation of child) resumes.
Failing to allocate enough memory when it is needed by the child results in another process being invoked. This process is called Out Of Memory (OOM) killer. The job of this process is to select a process to kill so that the memory requirements after fork() can be satisfied. Not a very desirable feature, but it is necessary to keep memory overcommit feature of Linux. This made OOM killer infamous. How to select a process to kill is tricky. It might happen that some important processes (e.g. a database) gets killed by OOM killer. Analogies like this show how serious the situation is when killer is invoked.
It seems that during 2.4, OOM killer's favourite process to kill was the Netscape browser. The browser would crash all of a sudden and you'd have no idea why.
The memory overcommit along with OOM is not an example of a good design feature, but has even made its way into AIX. With 2.6 the memory overcommit feature can be suppressed using some variables, but by default the feature is present.
Fortunately, it doesn't exist in Solaris. Solaris never used memory overcommit. First it was vfork() instead of fork() to prevent the failure of process creation. In Solaris 10, posix_spawn() is used instead of vfork() since vfork() is not MT-safe.


Curtis said...

The exec() doesn't terminate the child process; it keeps the same PID.

I know that's not the point of your post though, which is the memory situation. Linux gives you a broken hack, Solaris provides new system interfaces...

Anonymous said...

Yes, linux had a broken hack (and still has by default) and shows poor design taste.

But Solaris isn't inventing system interfaces. The name should give the game away. See

UX-admin said...

I believe that the point is, among other things, that Solaris uses POSIX implementation, whereas Linux internals are just another in a pile of hacks.

Just makes me appreciate my Solaris systems even more.