Re: 9.5: Better memory accounting, towards memory-bounded HashAgg

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Date: 2014-08-06 15:43:58
Message-ID: CA+Tgmobnu7XEn1gRdXnFo37P79bF=qLt46=37ajP3Cro9dBRaA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Aug 2, 2014 at 4:40 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> Attached is a patch that explicitly tracks allocated memory (the blocks,
> not the chunks) for each memory context, as well as its children.
>
> This is a prerequisite for memory-bounded HashAgg, which I intend to
> submit for the next CF. Hashjoin tracks the tuple sizes that it adds to
> the hash table, which is a good estimate for Hashjoin. But I don't think
> it's as easy for Hashagg, for which we need to track transition values,
> etc. (also, for HashAgg, I expect that the overhead will be more
> significant than for Hashjoin). If we track the space used by the memory
> contexts directly, it's easier and more accurate.
>
> I did some simple pgbench select-only tests, and I didn't see any TPS
> difference.
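For anyone following along, the idea as described above is roughly the following, in toy standalone form. This is only an illustrative sketch, not Jeff's patch: every name below is invented, and the real bookkeeping may well be organized differently (for instance, parent totals could be updated lazily rather than eagerly).

    /*
     * Toy sketch of block-level accounting: each context remembers how many
     * bytes of malloc'd blocks it owns, and every block-level change is also
     * propagated up the parent chain, so a context's total covers its
     * children as well.  Names are invented for illustration.
     */
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct DemoContext
    {
        struct DemoContext *parent;     /* NULL for the top-level context */
        int64_t     self_allocated;     /* bytes of blocks owned directly */
        int64_t     total_allocated;    /* this context plus all children */
    } DemoContext;

    /* Charge (or refund, with a negative delta) one whole block. */
    static void
    account_block(DemoContext *context, int64_t delta)
    {
        context->self_allocated += delta;
        for (; context != NULL; context = context->parent)
            context->total_allocated += delta;
    }

    /*
     * The hooks sit where the allocator obtains or releases whole blocks
     * from malloc, not in the per-chunk palloc path; accounting blocks
     * rather than chunks is what keeps the overhead small.
     */
    static void *
    demo_block_alloc(DemoContext *context, size_t blksize)
    {
        void       *block = malloc(blksize);

        if (block != NULL)
            account_block(context, (int64_t) blksize);
        return block;
    }

    static void
    demo_block_free(DemoContext *context, void *block, size_t blksize)
    {
        free(block);
        account_block(context, -(int64_t) blksize);
    }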

I was curious whether a performance difference would show up when
sorting, so I tried it out. I set up a test with pgbench -i 300. I
then repeatedly restarted the database, and after each restart, did
this:

time psql -c 'set trace_sort=on; reindex index pgbench_accounts_pkey;'

I alternated runs between master and master with this patch, and got
the following results:

master:
LOG: internal sort ended, 1723933 KB used: CPU 2.58s/11.54u sec elapsed 16.88 sec
LOG: internal sort ended, 1723933 KB used: CPU 2.50s/12.37u sec elapsed 17.60 sec
LOG: internal sort ended, 1723933 KB used: CPU 2.14s/11.28u sec elapsed 16.11 sec

memory-accounting:
LOG: internal sort ended, 1723933 KB used: CPU 2.57s/11.97u sec elapsed 17.39 sec
LOG: internal sort ended, 1723933 KB used: CPU 2.30s/12.57u sec elapsed 17.68 sec
LOG: internal sort ended, 1723933 KB used: CPU 2.54s/11.99u sec elapsed 17.25 sec

Comparing the median times, that's about a 3% regression. For this
particular case, we might be able to recapture that by replacing the
bespoke memory-tracking logic in tuplesort.c with this new facility.
I'm not sure what other cases we might also want to test; I think
workloads that run entirely on the server side are likely to show
problems more clearly than pgbench does. Maybe a PL/pgSQL loop that
does something allocation-intensive on each iteration, such as
parsing a big JSON document.
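To make the tuplesort.c point concrete: today the sort code charges each
chunk it allocates against its own counter, whereas with context-level
accounting it could simply consult the context's running total. A toy
sketch of the contrast, with invented names rather than the actual
tuplesort.c code:

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct DemoSortState
    {
        int64_t     mem_allowed;    /* work_mem-style budget, in bytes */
        int64_t     mem_used;       /* caller-maintained running total */
    } DemoSortState;

    /* Style 1: bespoke caller-side accounting, charged chunk by chunk. */
    static bool
    over_budget_caller_side(DemoSortState *state, int64_t chunk_size)
    {
        state->mem_used += chunk_size;
        return state->mem_used > state->mem_allowed;
    }

    /*
     * Style 2: let the memory context do the accounting at the block level
     * and just compare its reported total against the budget.
     */
    static bool
    over_budget_context_side(const DemoSortState *state, int64_t context_total)
    {
        return context_total > state->mem_allowed;
    }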

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
