Relation extension scalability

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Relation extension scalability
Date:	2015-03-29 18:56:19
Message-ID:	20150329185619.GA29062@alap3.anarazel.de
Lists:	pgsql-hackers

Hello,

Currently, larger shared_buffers settings don't combine well with
relations being extended frequently, especially if many/most pages have
a high usagecount and/or are dirty and the system is IO constrained.

As a quick recap, relation extension basically works like:
1) We lock the relation for extension
2) ReadBuffer*(P_NEW) is called to extend the relation
3) smgrnblocks() is used to find the new target block
4) We search for a victim buffer (via BufferAlloc()) to put the new
block into
5) If dirty, the victim buffer is cleaned
6) The relation is extended using smgrextend()
7) The page is initialized

The problems come from 4) and 5) potentially each taking a fair
while. If the working set mostly fits into shared_buffers, 4) can
require iterating over all shared buffers several times to find a
victim buffer. If the IO subsystem is busy and/or we've hit the kernel's
dirty limits, 5) can take a couple of seconds.
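
To illustrate why 4) can need several passes, here is a toy model of a
clock-sweep victim search. It is a sketch, not the bufmgr code: ToyBuffer,
toy_clock_sweep() and TOY_MAX_USAGE_COUNT are made-up names, and the cap
of 5 is chosen to match BM_MAX_USAGE_COUNT.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_USAGE_COUNT 5	/* assumption: same cap as BM_MAX_USAGE_COUNT */

/* Hypothetical toy buffer descriptor; not PostgreSQL's BufferDesc. */
typedef struct ToyBuffer
{
	int			usage_count;	/* 0 .. TOY_MAX_USAGE_COUNT */
	int			pinned;			/* nonzero: may not be evicted */
} ToyBuffer;

/*
 * Advance the clock hand until an unpinned buffer with usage_count 0 is
 * found, decrementing the usage count of every unpinned buffer passed.
 * Returns the victim's index, or -1 if all buffers are pinned.  *steps is
 * set to the number of buffers inspected.
 */
int
toy_clock_sweep(ToyBuffer *bufs, size_t nbufs, size_t *hand, size_t *steps)
{
	size_t		i;
	size_t		unpinned = 0;

	assert(nbufs > 0);
	*steps = 0;

	/* Bail out early if there is nothing we could ever evict. */
	for (i = 0; i < nbufs; i++)
		if (!bufs[i].pinned)
			unpinned++;
	if (unpinned == 0)
		return -1;

	for (;;)
	{
		ToyBuffer  *buf = &bufs[*hand];
		int			idx = (int) *hand;

		*hand = (*hand + 1) % nbufs;
		(*steps)++;

		if (buf->pinned)
			continue;
		if (buf->usage_count == 0)
			return idx;			/* found a victim */
		buf->usage_count--;		/* give it another chance next pass */
	}
}
```

With 8 unpinned buffers that all start at usage count 5, the sweep has to
go around five full times before any count reaches zero, so it inspects
5 * 8 + 1 = 41 buffers before returning buffer 0. In the current scheme
all of that happens while the extension lock is held.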

I've prototyped solving this for heap relations by moving the
smgrnblocks() + smgrextend() calls to RelationGetBufferForTuple(). With
some care (including a retry loop) it's possible to do only those two
under the extension lock. That indeed fixes problems in some of my tests.
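
As a rough illustration of the shape of such a retry loop (this is not
the attached patch, and the details are guesses): the toy model below
holds the lock only around the "find size" and "extend" steps, then
checks whether somebody else initialized the new page first and retries
if so. ToyRel, toy_extend_for_tuple() and the "stolen" counter that
simulates concurrent interference are all made up for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BLOCKS 16

/* Hypothetical toy relation: smgr state plus the extension lock. */
typedef struct ToyRel
{
	unsigned	nblocks;		/* what smgrnblocks() would report */
	bool		initialized[TOY_MAX_BLOCKS];	/* page set up by someone? */
	bool		ext_lock_held;
	unsigned	stolen;			/* simulated: how many upcoming new pages
								 * another process initializes before us */
} ToyRel;

static void
toy_lock_ext(ToyRel *rel)
{
	assert(!rel->ext_lock_held);
	rel->ext_lock_held = true;
}

static void
toy_unlock_ext(ToyRel *rel)
{
	assert(rel->ext_lock_held);
	rel->ext_lock_held = false;
}

/*
 * Extend the toy relation and return a block number whose page we got to
 * initialize ourselves, or -1 if the toy relation is full.  *retries counts
 * how often a freshly added page had already been taken by someone else.
 */
int
toy_extend_for_tuple(ToyRel *rel, unsigned *retries)
{
	*retries = 0;

	for (;;)
	{
		unsigned	blkno;

		/* Steps 4) and 5) (victim search, cleaning) would go here, unlocked. */

		if (rel->nblocks >= TOY_MAX_BLOCKS)
			return -1;

		toy_lock_ext(rel);
		blkno = rel->nblocks;		/* stands in for smgrnblocks() */
		rel->nblocks = blkno + 1;	/* stands in for smgrextend() */
		toy_unlock_ext(rel);

		/* Simulated race: another process initializes the page first. */
		if (rel->stolen > 0)
		{
			rel->stolen--;
			rel->initialized[blkno] = true;
		}

		/* "Lock" the page; if it is still new, it is ours. */
		if (!rel->initialized[blkno])
		{
			rel->initialized[blkno] = true;
			return (int) blkno;
		}
		(*retries)++;			/* lost the race, extend again */
	}
}
```

Starting from nblocks = 3 with two simulated interferences, the function
returns block 5 after two retries and leaves the relation at 6 blocks.
The point of the structure is that the slow parts never run while the
extension lock is held.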

I'm not sure whether the above is the best solution, however. For one, I
think it's not necessarily a good idea to open-code this in hio.c - I've
not observed it, but this can probably happen for btrees and such as
well. For another, this is still an exclusive lock while we're doing IO:
smgrextend() wants a page to write out, so we have to be careful not to
overwrite things.

There are two things that seem to make sense to me:

First, decouple relation extension from ReadBuffer*, i.e. remove P_NEW
and introduce a bufmgr function specifically for extension.

Secondly, I think we could maybe remove the requirement of needing an
extension lock altogether. It's primarily required because we're
worried that somebody else can come along, read the page, and initialize
it before us. ISTM that could be resolved by *not* writing any data via
smgrextend()/mdextend(). If we instead only do the write once we've read
in & locked the page exclusively, there's no need for the extension
lock. We probably still should write out the new page to the OS
immediately once we've initialized it, to avoid creating sparse files.

The other reason we need the extension lock is code like
lazy_scan_heap() and btvacuumscan(), which tries to avoid initializing
pages that are about to be initialized by the extending backend. I think
we should just remove that code and deal with the problem by retrying in
the extending backend; that's why I think moving extension to a
different file might be helpful.

I've attached my POC for heap extension, but it's really just a POC at
this point.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-WIP-Saner-heap-extension.patch	text/x-patch	5.1 KB

Responses

Re: Relation extension scalability at 2015-03-29 19:21:44 from Tom Lane
Re: Relation extension scalability at 2015-03-30 00:02:06 from Robert Haas
Re: Relation extension scalability at 2015-03-30 04:03:57 from Amit Kapila
Re: Relation extension scalability at 2015-04-17 18:19:19 from Qingqing Zhou
Re: Relation extension scalability at 2015-07-19 13:58:41 from Andres Freund
