Memory usage during sorting

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Memory usage during sorting
Date: 2012-01-16 00:59:26
Message-ID: CAMkU=1zBo3jQmjNOQrXdBYm3yZXpM3e2+_ATydEjbkFY4uto0Q@mail.gmail.com
Lists: pgsql-hackers

In tuplesort.c, tuplesort initially reads tuples into memory until availMem
is exhausted.

It then switches to the tape-sort algorithm and allocates buffer
space for each tape it will use. This substantially overruns
allowedMem and drives availMem negative.
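
Roughly, the bookkeeping looks like this (a simplified sketch from my
reading of tuplesort.c, not the actual code -- USEMEM/FREEMEM/LACKMEM,
GetMemoryChunkSpace, and TAPE_BUFFER_OVERHEAD are the real names as far
as I know, but the control flow is abbreviated):

    /* availMem bookkeeping used throughout tuplesort.c */
    #define USEMEM(state,amt)   ((state)->availMem -= (amt))
    #define FREEMEM(state,amt)  ((state)->availMem += (amt))
    #define LACKMEM(state)      ((state)->availMem < 0)

    /* while reading: absorb tuples until memory runs out */
    USEMEM(state, GetMemoryChunkSpace(tuple));
    if (LACKMEM(state))
    {
        inittapes(state);            /* switch to tape sort */
        dumptuples(state, false);    /* start working off the deficit */
    }

    /* inittapes(): charge buffer space for all the tapes up front, even
     * though the tuples already read still hold nearly all of allowedMem.
     * This is what drives availMem negative. */
    USEMEM(state, maxTapes * TAPE_BUFFER_OVERHEAD);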

It works off this deficit by writing tuples to tape and pfree-ing
their spots in the heap; the pfreed space ends up more or less
randomly scattered. Once the deficit is worked off, it alternates
between freeing more space in the heap and adding things to the heap
(in a nearly strict one-out, one-in alternation if the tuples are of
constant size).
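
The steady state after the switch looks roughly like this (again
paraphrased, not the real code):

    /* dumptuples(): while we lack memory, write out the tuple at the top
     * of the heap -- the WRITETUP() routine pfrees it and FREEMEMs its
     * space -- then sift the heap to fill the hole. */
    while (LACKMEM(state) && state->memtupcount > 1)
    {
        WRITETUP(state, state->tp_tapenum[state->destTape],
                 &state->memtuples[0]);
        tuplesort_heap_siftup(state, true);
    }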

The space freed up by that initial round of pfrees, done while working
off the space deficit from inittapes, is never re-used. It also cannot
be paged out by the VM system, because it is scattered among actively
used memory.

I don't think small chunks can be reused from one memory context to
another, but I haven't checked. Even if they can be, during a big
sort like an index build the backend doing the build may have no
other contexts that need the space.

So, having overrun workMem and stomped all over it to ensure no one
else can re-use it, we then scrupulously refuse to benefit from that
overrun amount ourselves.

The attached patch allows it to reuse that memory. On my meager
system it reduced the time to build an index on an integer column of
a skinny, totally randomly ordered 200-million-row table by about 3%,
from a baseline of 25 minutes.

I think it would be better to pre-deduct from availMem, before we even
start reading tuples, the tape buffer overhead we will need if we
decide to switch to a tape sort (and then add it back if we do indeed
make that switch). That way we wouldn't overrun the memory in the
first place. However, that would cause apparent regressions in which
sorts that previously fit into maintenance_work_mem no longer do.
Boosting maintenance_work_mem to the level that was actually being
used previously would fix those regressions, but pointing out that the
previous behavior was not optimal doesn't change the fact that people
are used to it and perhaps have tuned for it. So the attached patch
seems more backwards-friendly.
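
To be concrete, the pre-deduction idea would amount to something like
this (a hypothetical sketch, not what the attached patch does;
reservedTapeSpace is a made-up field, and I'm paraphrasing the existing
calls from memory):

    /* at tuplesort_begin time: reserve the worst-case tape buffer
     * overhead before reading any tuples (reservedTapeSpace is
     * hypothetical) */
    state->reservedTapeSpace =
        (tuplesort_merge_order(state->allowedMem) + 1) * TAPE_BUFFER_OVERHEAD;
    USEMEM(state, state->reservedTapeSpace);

    /* in inittapes(), if we do switch to tape sort: hand back the
     * reservation and charge the real amount instead, so availMem
     * never goes negative */
    FREEMEM(state, state->reservedTapeSpace);
    USEMEM(state, maxTapes * TAPE_BUFFER_OVERHEAD);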

Cheers,

Jeff

Attachment Content-Type Size
sort_mem_usage_v1.patch application/octet-stream 574 bytes
