Re: some namespace.c refactoring

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: some namespace.c refactoring
Date: 2023-02-23 11:07:58
Message-ID: 064767e5-c6af-a7e5-36a5-14013e6795f3@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20.02.23 15:03, Peter Eisentraut wrote:
> On 15.02.23 19:04, Alvaro Herrera wrote:
>> That said, I think most of this code is invoked for DDL, where
>> performance is not so critical; probably just fixing
>> get_object_property_data to not be so naïve would suffice.
>
> Ok, I'll look into that.

I did a variety of performance testing on this now.

I wrote a C function that calls the "is visible" functions in a tight loop:

Datum
pg_test_visible(PG_FUNCTION_ARGS)
{
int32 count = PG_GETARG_INT32(0);
Oid relid = PG_GETARG_OID(1);
Oid typid = PG_GETARG_OID(2);

for (int i = 0; i < count; i++)
{
RelationIsVisible(relid);
TypeIsVisible(typid);
//ObjectIsVisible(RelationRelationId, relid);
//ObjectIsVisible(TypeRelationId, typid);
}

PG_RETURN_VOID();
}

(It's calling two different ones to defeat the caching in
get_object_property_data().)

Here are some run times:

unpatched:

select pg_test_visible(100_000_000, 'pg_class', 'int4');
Time: 4536.747 ms (00:04.537)

select pg_test_visible(100_000_000, 'tenk1', 'widget');
Time: 10828.802 ms (00:10.829)

(Note that the "is visible" functions special case system catalogs.)

patched:

select pg_test_visible(100_000_000, 'pg_class', 'int4');
Time: 11409.948 ms (00:11.410)

select pg_test_visible(100_000_000, 'tenk1', 'widget');
Time: 18649.496 ms (00:18.649)

So, it's slower, but it's not clear whether it matters in practice,
considering this test.

I also wondered if this is visible through a normal external function
call, so I tried

do $$ begin perform pg_get_statisticsobjdef(28999) from
generate_series(1, 1_000_000); end $$;

(where that is the OID of the first object from select * from
pg_statistic_ext; in the regression database).

unpatched:

Time: 6952.259 ms (00:06.952)

patched (first patch only):

Time: 6993.655 ms (00:06.994)

patched (both patches):

Time: 7114.290 ms (00:07.114)

So there is some visible impact, but again, the test isn't realistic.

Then I tried a few ways to make get_object_property_data() faster. I
tried building a class_id+index cache that is qsort'ed (once) and then
bsearch'ed, that helped only minimally, not enough to make up the
difference. I also tried just searching the class_id+index cache
linearly, hoping maybe that if the cache is smaller it would be more
efficient to access, but that actually made things (minimally) worse,
probably because of the indirection. So it might be hard to get much
more out of this. I also thought about PerfectHash, but I didn't code
that up yet.

Another way would be to not use get_object_property_data() at all but
write a "common" function that we pass in all it needs hardcodedly, like

bool
RelationIsVisible(Oid relid)
{
return IsVisible_common(RELOID,
Anum_pg_class_relname
Anum_pg_class_relnamespace);
}

This would still save a lot of duplicate code.

But again, I don't think the micro-performance really matters here.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-02-23 11:20:23 Re: Proposal: %T Prompt parameter for psql for current time (like Oracle has)
Previous Message marekmosiewicz 2023-02-23 11:04:05 Disable vacuuming to provide data history