Re: mdnblocks() sabotages error checking in _mdfd_getseg()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: mdnblocks() sabotages error checking in _mdfd_getseg()
Date: 2015-12-10 18:26:17
Message-ID: CA+TgmoZY0U+XCMzs+iBw8PnrNi7E4+uD4Fnxbr9YmFk+P-KFYA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 10, 2015 at 1:22 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 10 December 2015 at 16:47, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> On Thu, Dec 10, 2015 at 11:36 AM, Andres Freund <andres(at)anarazel(dot)de>
>> wrote:
>> >> In fact, having no way to get the relation length other than scanning
>> >> 1000 files doesn't seem like an especially good choice even if we used
>> >> a better data structure. Putting a header page in the heap would make
>> >> getting the length of a relation O(1) instead of O(segments), and for
>> >> a bonus, we'd be able to reliably detect it if a relation file
>> >> disappeared out from under us. That's a difficult project and
>> >> definitely not my top priority, but this code is old and crufty all
>> >> the same.)
>> >
>> > The md layer doesn't really know whether it's dealing with a heap, or
>> > with an index, or ... So handling this via a metapage doesn't seem
>> > particularly straightforward.
>>
>> It's not straightforward, but I don't think that's the reason. What
>> we could do is look at the call sites that use
>> RelationGetNumberOfBlocks() and change some of them to get the
>> information some other way instead. I believe get_relation_info() and
>> initscan() are the primary culprits, accounting for some enormous
>> percentage of the system calls we do on a read-only pgbench workload.
>> Those functions certainly know enough to consult a metapage if we had
>> such a thing.
>
> It looks pretty straightforward to me...
>
> The number of relations with >1 file is likely to be fairly small, so we can
> just have an in-memory array to record that. 8 bytes per relation >1 GB
> isn't going to take much shmem, but we can extend using dynshmem as needed.
> We can seq scan the array at relcache build time and invalidate relcache
> when we extend. WAL log any extension to a new segment and write the table
> to disk at checkpoint.

Invalidating the relcache when we extend would be extremely expensive,
but we could probably come up with some variant of this that would
work. I'm not very excited about this design, though; I think
actually putting a metapage on each relation would be better.
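
To make the O(segments) versus O(1) point concrete, here is a toy model in
C. It is only a sketch and not PostgreSQL code: ToyRelation, meta_nblocks,
and the toy_* functions are all hypothetical names invented for
illustration, and a "probe" stands in for one lseek() on a segment file.
The segment size assumes the default build (1 GB segments of 8 kB blocks).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Blocks per segment file, assuming 1 GB segments of 8 kB blocks. */
#define TOY_RELSEG_SIZE 131072u

/* Toy relation: seg_blocks[i] is the length of segment file i, in blocks.
 * A length of 0 models a segment file that is empty or has gone missing. */
typedef struct ToyRelation
{
    const uint32_t *seg_blocks;
    size_t          nsegs;
    uint64_t        meta_nblocks;   /* hypothetical metapage field */
} ToyRelation;

/* O(segments): probe each segment in order and stop at the first one that
 * is not full. *probes counts the simulated lseek() calls. */
uint64_t
toy_nblocks_by_scan(const ToyRelation *rel, unsigned *probes)
{
    uint64_t total = 0;

    assert(rel != NULL && probes != NULL);
    *probes = 0;
    for (size_t i = 0; i < rel->nsegs; i++)
    {
        (*probes)++;
        total += rel->seg_blocks[i];
        if (rel->seg_blocks[i] < TOY_RELSEG_SIZE)
            break;              /* short segment: assume it is the last */
    }
    return total;
}

/* O(1): trust the length recorded in the (hypothetical) metapage. */
uint64_t
toy_nblocks_from_metapage(const ToyRelation *rel, unsigned *probes)
{
    assert(rel != NULL && probes != NULL);
    *probes = 1;                /* a single page read, whatever the size */
    return rel->meta_nblocks;
}

/* A scan that stops early cannot tell "relation ends here" from "a middle
 * segment vanished"; comparing against the metapage can. */
int
toy_segment_missing(const ToyRelation *rel)
{
    unsigned probes;

    return toy_nblocks_by_scan(rel, &probes) != rel->meta_nblocks;
}
```

With 1000 full segments the scan needs on the order of 1000 probes per
call, while the metapage lookup stays at one. In this model, a segment that
disappears in the middle makes the scan silently report a shorter relation,
and only the comparison against the stored length exposes it.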

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
