Re: Partitioned tables and relfilenode

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned tables and relfilenode
Date: 2017-03-23 01:02:39
Message-ID: d937752c-bf74-5560-6b7c-a315efc1da77@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Tue, Mar 21, 2017 at 10:37 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Mar 21, 2017 at 9:49 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Tue, Mar 21, 2017 at 5:05 AM, Amit Langote
>>> <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> Attached updated patches.
>>>
>>> Committed 0001 after removing a comma.
>>
>> Regarding 0002,

Thanks for the review.

>> I notice this surprising behavior:
>>
>> rhaas=# create table foo (a int, b text) partition by list (a);
>> CREATE TABLE
>> rhaas=# select relfilenode from pg_class where oid = 'foo'::regclass;
>> relfilenode
>> -------------
>> 16385
>> (1 row)
>>
>> Why do we end up with a relfilenode if there's no storage? I would
>> have expected to see 0 there.

Hmm, I see that happening even with views (and other relkinds that do not
have storage):

create table foo (a int);
create view foov as select * from foo;
select relfilenode from pg_class where relname = 'foov';
-[ RECORD 1 ]------
relfilenode | 24606

create type typ as (a int);
select relfilenode from pg_class where relname = 'typ';
-[ RECORD 1 ]------
relfilenode | 24610

After staring at the relevant code, fixing things so that relfilenode is 0
for relations without storage seems to be slightly involved. In the header
comment of relmapper.c, there is a line:

Mapped catalogs have zero in their pg_class.relfilenode entries.

which is kind of significant for the relcache initialization code.
RelationInitPhysicalAddr() called when building a relcache entry
initializes the rd_rel.relfilenode from pg_class.relfilenode if it's valid
and from relation mapper if invalid (i.e. 0). So when relcache entries
for the mapped catalogs are being built, pg_class.relfilenode being
InvalidOid is a signal to get the same from the relmapper. Since the same
code path is exercised in other cases, all relations supposedly without
storage would wrongly be looked up in the relmapper. It may be possible
to fix that, but will require non-trivial rearrangement of relevant code.

>> Other than that, there's not much to see here. I'm a little worried
>> that you might have missed some case that can result in an access to
>> the file, but I can't find one. Stuff I tried:
>>
>> VACUUM
>> VACUUM FULL
>> CLUSTER
>> ANALYZE
>> CREATE INDEX
>> ALTER TABLE .. ALTER COLUMN .. TYPE
>> TRUNCATE
>>
>> It would be good to go through and make sure all of those - and any
>> others you can think of - are represented in the regression tests.

We now have tests that cover commands that previously would try to access
file for partitioned tables (most of those your listed above). The commit
yesterday to eliminate scan nodes is also covered by tests.

Speaking of - do you think the following error message is reasonable when
a a partitioned table is passed to pg_prewarm():

select pg_prewarm('p');
ERROR: fork "main" does not exist for this relation

It's the same if a view name is passed.

>> Also, a documentation update is probably in order to explain that
>> we're not going to accept heap_reloptions on the parent because the
>> parent has no storage; such options must be set on the child in order
>> to have effect.

Hmm, actually I am no longer sure if we should completely get rid of the
reloptions. Some options like fillfactor don't make sense, but we might
want to use the values for autovacuum_* options when we will improve
autovacuum's handling of partitioned tables. Maybe even parallel_workers
could be useful to specify for parent tables. So, I took out reloptions.c
changes from the patch for now and documented that specifying the storage
options for partitioned parents is ineffective currently. Does that make
sense?

Thanks,
Amit

Attachment Content-Type Size
0001-Do-not-allocate-storage-for-partitioned-tables.patch text/x-diff 2.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2017-03-23 01:02:45 Re: Measuring replay lag
Previous Message Robert Haas 2017-03-23 00:25:41 Re: Parallel Append implementation