From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: optimized DROP of multiple tables within a transaction
Date: 2012-12-08 14:49:17
Message-ID: 50C3536D.7080600@fuzzy.cz
Lists: pgsql-hackers

On 8.12.2012 15:26, Andres Freund wrote:
> On 2012-12-06 23:38:59 +0100, Tomas Vondra wrote:
>> I've re-run the tests with the current patch on my home workstation, and
>> the results are these (again 10k tables, dropped either one-by-one or in
>> batches of 100).
>>
>> 1) unpatched
>>
>> dropping one-by-one: 15.5 seconds
>> dropping in batches of 100: 12.3 sec
>>
>> 2) patched (v3.1)
>>
>> dropping one-by-one: 32.8 seconds
>> dropping in batches of 100: 3.0 sec
>>
>> The problem here is that when dropping the tables one-by-one, the
>> bsearch overhead is tremendous and significantly increases the runtime.
>> I've done a simple check (if dropping a single table, use the original
>> simple comparison) and I got this:
>>
>> 3) patched (v3.2)
>>
>> dropping one-by-one: 16.0 seconds
>> dropping in batches of 100: 3.3 sec
>>
>> i.e. the best of both. But it seems like an unnecessary complexity to me
>> - if you need to drop a lot of tables you'll probably do that in a
>> transaction anyway.
>>
>
> Imo that's still a pretty bad performance difference. And your
> single-table optimization will probably fall short as soon as the table
> has indexes and/or a toast table...

Why? I haven't checked the code, but either those objects are dropped
one-by-one (which seems unlikely) or they're added to the pending list,
in which case the new code kicks in and actually makes it faster.

Yes, the original code might be faster even for 2 or 3 objects, but I
can't think of a simple way to pick that crossover point, given that it
really depends on CPU/RAM speed and shared_buffers size.
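
To illustrate what I mean, here's a standalone sketch of the v3.2
"best of both" lookup (mocked RelFileNode struct and made-up function
names, not the actual patch):

#include <stdlib.h>

/* Mocked stand-in for PostgreSQL's RelFileNode: three OID fields. */
typedef struct RelFileNode
{
    unsigned    spcNode;        /* tablespace */
    unsigned    dbNode;         /* database */
    unsigned    relNode;        /* relation */
} RelFileNode;

/* Same ordering as the patch: relNode first, then dbNode, then spcNode. */
static int
rnode_comparator(const void *p1, const void *p2)
{
    const RelFileNode *n1 = (const RelFileNode *) p1;
    const RelFileNode *n2 = (const RelFileNode *) p2;

    if (n1->relNode != n2->relNode)
        return (n1->relNode < n2->relNode) ? -1 : 1;
    if (n1->dbNode != n2->dbNode)
        return (n1->dbNode < n2->dbNode) ? -1 : 1;
    if (n1->spcNode != n2->spcNode)
        return (n1->spcNode < n2->spcNode) ? -1 : 1;
    return 0;
}

/*
 * Is "target" one of the nrnodes relations being dropped?  With a
 * single relation, a direct comparison avoids the bsearch() call
 * overhead (the original code path); with a batch, the caller is
 * expected to have qsort()ed the array once, and bsearch() does the
 * per-buffer lookup.
 */
static int
rnode_is_dropped(const RelFileNode *target,
                 const RelFileNode *rnodes, int nrnodes)
{
    if (nrnodes == 1)
        return rnode_comparator(target, &rnodes[0]) == 0;

    return bsearch(target, rnodes, nrnodes, sizeof(RelFileNode),
                   rnode_comparator) != NULL;
}

int
main(void)
{
    RelFileNode drops[] = {{1663, 16384, 16402}, {1663, 16384, 16390}};
    RelFileNode probe = {1663, 16384, 16402};

    qsort(drops, 2, sizeof(RelFileNode), rnode_comparator);
    return rnode_is_dropped(&probe, drops, 2) ? 0 : 1;
}

That way a single DROP pays one plain comparison per buffer, while a
batch pays one qsort() up front and O(log n) per buffer.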

>> +
>> +/*
>> + * Used to sort relfilenode array (ordered by [relnode, dbnode, spcnode]), so
>> + * that it's suitable for bsearch.
>> + */
>> +static int
>> +rnode_comparator(const void *p1, const void *p2)
>> +{
>> +    RelFileNodeBackend n1 = *(RelFileNodeBackend *) p1;
>> +    RelFileNodeBackend n2 = *(RelFileNodeBackend *) p2;
>> +
>> +    if (n1.node.relNode < n2.node.relNode)
>> +        return -1;
>> +    else if (n1.node.relNode > n2.node.relNode)
>> +        return 1;
>> +
>> +    if (n1.node.dbNode < n2.node.dbNode)
>> +        return -1;
>> +    else if (n1.node.dbNode > n2.node.dbNode)
>> +        return 1;
>> +
>> +    if (n1.node.spcNode < n2.node.spcNode)
>> +        return -1;
>> +    else if (n1.node.spcNode > n2.node.spcNode)
>> +        return 1;
>> +    else
>> +        return 0;
>> +}
>
> ISTM that this whole comparator could be replaced by memcmp(). That
> could quite possibly lower the overhead of the bsearch() in simple
> cases. We already rely on the fact that RelFileNode's have no padding
> atm (cf. buf_internals.h), so a memcmp() seems fine to me.

Good point, I'll try that. The original code used a macro that simply
compared the fields and I copied that logic, but if we can get rid of
that code entirely, all the better.
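
For the archives, the comparator would then collapse to something like
this (a sketch, assuming the usual includes; it compares only the
RelFileNode part and ignores the backend field):

/*
 * Sketch of the memcmp() variant.  This relies on RelFileNode being
 * three Oids with no padding (cf. buf_internals.h) and on it being
 * the first member of RelFileNodeBackend; the backend field is
 * ignored.  The resulting order is byte-wise rather than [relnode,
 * dbnode, spcnode], which is still fine for bsearch() as long as
 * qsort() used the same function.
 */
static int
rnode_comparator(const void *p1, const void *p2)
{
    return memcmp(p1, p2, sizeof(RelFileNode));
}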

Thanks for the review!

Tomas
