Re: Experimenting with hash tables inside pg_dump

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Experimenting with hash tables inside pg_dump
Date: 2021-10-22 14:53:31
Message-ID: 2709766.1634914411@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2021-10-21 22:13:22 -0400, Tom Lane wrote:
>> I've thought about doing something like
>> SELECT unsafe-functions FROM pg_class WHERE oid IN (someoid, someoid, ...)
>> but in cases with tens of thousands of tables, it seems unlikely that
>> that's going to behave all that nicely.

> That's kinda what I'm doing in the quick hack. But instead of using IN(...) I
> made it unnest('{oid, oid, ...}'), that scales much better.

I'm skeptical of that, mainly because it doesn't work in old servers,
and I really don't want to maintain two fundamentally different
versions of getTableAttrs(). I don't think you actually need the
multi-array form of unnest() here --- we know the TableInfo array
is in OID order --- but even the single-array form only works
back to 8.4.

However ... looking through getTableAttrs' main query, it seems
like the only thing there that's potentially unsafe is the
"format_type(t.oid, a.atttypmod)" call. I wonder if it could be
sane to convert it into a single query that just scans all of
pg_attribute, and then deal with creating the formatted type names
separately, perhaps with an improved version of getFormattedTypeName
that could cache the results for non-default typmods. The main
knock on this approach is the temptation for somebody to stick some
unsafe function into the query in future. We could stick a big fat
warning comment into the code, but lately I despair of people reading
comments.

> To see where it's worth putting in time it'd be useful if getSchemaData() in
> verbose mode printed timing information...

I've been running test cases with log_min_duration_statement = 0,
which serves well enough.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-10-22 15:21:59 Re: Experimenting with hash tables inside pg_dump
Previous Message Magnus Hagander 2021-10-22 14:42:01 Re: parallelizing the archiver