Re: Question about lazy_space_alloc() / linux over-commit

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question about lazy_space_alloc() / linux over-commit
Date: 2015-03-09 22:12:22
Message-ID: 54FE1AC6.8080502@BlueTreble.com
Lists: pgsql-hackers

On 3/9/15 12:28 PM, Alvaro Herrera wrote:
> Robert Haas wrote:
>> On Sat, Mar 7, 2015 at 5:49 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>> On 2015-03-05 15:28:12 -0600, Jim Nasby wrote:
>>>> I was thinking the simpler route of just repalloc'ing... the memcpy would
>>>> suck, but much less so than the extra index pass. 64M gets us 11M tuples,
>>>> which probably isn't very common.
>>>
>>> That has the chance of considerably increasing the peak memory usage
>>> though, as you obviously need both the old and new allocation during the
>>> repalloc().
>>>
>>> And in contrast to the unused memory at the tail of the array, which
>>> will usually not be actually allocated by the OS at all, this is memory
>>> that's actually read/written respectively.
>>
>> Yeah, I'm not sure why everybody wants to repalloc() that instead of
>> making several separate allocations as needed. That would avoid
>> increasing peak memory usage, and would avoid any risk of needing to
>> copy the whole array. Also, you could grow in smaller chunks, like
>> 64MB at a time instead of 1GB or more at a time. Doubling an
>> allocation that's already 1GB or more gets big in a hurry.
>
> Yeah, a chunk list rather than a single chunk seemed a good idea to me
> too.

That will be significantly more code than a simple repalloc, but as long
as people are OK with that I can go that route.
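
Roughly what I'm picturing, purely as a sketch (invented names, assuming the
usual backend palloc/ItemPointer machinery; this is not existing vacuumlazy.c
code):

#include "postgres.h"
#include "storage/itemptr.h"

/* One variable-size chunk of dead tuple TIDs (sketch only) */
typedef struct DeadTupleChunk
{
    struct DeadTupleChunk *next;
    long        max_tuples;     /* capacity of itemptrs[] */
    long        num_tuples;     /* used entries in itemptrs[] */
    ItemPointerData itemptrs[FLEXIBLE_ARRAY_MEMBER];
} DeadTupleChunk;

typedef struct DeadTupleList
{
    DeadTupleChunk *head;
    DeadTupleChunk *tail;
    long        total_tuples;   /* sum of num_tuples over all chunks */
} DeadTupleList;

/* Allocate a chunk big enough for ntuples TIDs and append it to the list. */
static DeadTupleChunk *
dead_tuple_add_chunk(DeadTupleList *list, long ntuples)
{
    DeadTupleChunk *chunk;

    chunk = palloc(offsetof(DeadTupleChunk, itemptrs) +
                   ntuples * sizeof(ItemPointerData));
    chunk->next = NULL;
    chunk->max_tuples = ntuples;
    chunk->num_tuples = 0;

    if (list->tail)
        list->tail->next = chunk;
    else
        list->head = chunk;
    list->tail = chunk;

    return chunk;
}

/* Record one dead TID; returns false when the current chunk is full. */
static bool
dead_tuple_record(DeadTupleList *list, ItemPointer itemptr)
{
    DeadTupleChunk *tail = list->tail;

    if (tail == NULL || tail->num_tuples >= tail->max_tuples)
        return false;           /* caller adds a new chunk and retries */

    tail->itemptrs[tail->num_tuples++] = *itemptr;
    list->total_tuples++;
    return true;
}

Most of the extra code would be on the reader side: lazy_tid_reaped()'s
single bsearch over one sorted array would have to become a search across
chunks (each chunk stays sorted because we scan the heap in block order),
plus the bookkeeping in lazy_vacuum_heap().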

> Also, I think the idea of starting with an allocation assuming some
> small number of dead tuples per page made sense -- and by the time that
> space has run out, you have a better estimate of actual density of dead
> tuples, so you can do the second allocation based on that new estimate
> (but perhaps clamp it at say 1 GB, just in case you just scanned a
> portion of the table with an unusually high percentage of dead tuples.)

I like the idea of using a fresh estimate of dead tuple density when we need
more space. We would also clamp this at maintenance_work_mem, not a
fixed 1GB.
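
The sizing side could look something like this (same caveats as the sketch
above; maxtuples_allowed would be roughly what lazy_space_alloc() computes
today, i.e. (vac_work_mem * 1024L) / sizeof(ItemPointerData)):

/*
 * Size the next chunk from the dead-tuple density observed so far, clamped
 * to the remaining maintenance_work_mem budget (sketch only).
 */
static long
next_chunk_tuples(BlockNumber scanned_pages, long dead_tuples_so_far,
                  BlockNumber pages_remaining,
                  long maxtuples_allowed, long tuples_allocated_so_far)
{
    long        remaining = maxtuples_allowed - tuples_allocated_so_far;
    double      density;
    long        want;

    if (remaining <= 0)
        return 0;               /* budget exhausted: do an index pass */

    /* dead tuples per scanned page so far, applied to the pages left */
    density = (double) dead_tuples_so_far / Max(scanned_pages, 1);
    want = (long) (density * pages_remaining) + MaxHeapTuplesPerPage;

    return Min(want, remaining);
}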

Speaking of which... people have mentioned allowing > 1GB of dead
tuples, which means allowing maintenance_work_mem > MAX_KILOBYTES. The
comment for that says:

/* upper limit for GUC variables measured in kilobytes of memory */
/* note that various places assume the byte size fits in a "long" variable */
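
For reference, the definition right below that comment in
src/include/utils/guc.h is (roughly, from memory):

#if SIZEOF_SIZE_T > 4 && SIZEOF_LONG > 4
#define MAX_KILOBYTES   INT_MAX
#else
#define MAX_KILOBYTES   (INT_MAX / 1024)
#endif

/*
 * And this is the kind of "byte size in a long" arithmetic the comment is
 * warning about.  Where long is only 32 bits (notably 64-bit Windows),
 * anything above INT_MAX / 1024 kB overflows here, hence the smaller limit
 * on those platforms.
 */
long        vac_work_mem_bytes = maintenance_work_mem * 1024L;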

So I'm not sure how well that will work. I think that needs to be a
separate patch.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
