Re: Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: ResourceOwner optimization for tables with many partitions
Date: 2015-12-24 08:19:25
Message-ID: 20151224111925.56e267e9@fujitsu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I believe I fixed all flaws mentioned so far (see attachment).

Also I did a new benchmark to make sure that new patch makes PostgreSQL
work faster and doesn't cause performance degradation in some cases.

"usual pgbench" row corresponds to `pgbench -j 8 -c 8 -T 30 pgbench`
performed on a 4-core PC.

"N partitions" rows correspond to a benchmark described in a first
message of this thread performed on the same PC. N is and argument
given to gen.pl script i.e. number of partitions in generated
partitioned table.

Here are results (3 tests, TPS excluding connections establishing):

Test | Before | After | Delta avg
----------------|-----------|-----------|-------------
| 295.7 | 295.0 |
usual pgbench | 303.1 | 299.6 | ~ 0%
| 297.7 | 302.7 |
----------------|-----------|-----------|-------------
| 28022.3 | 27956.1 |
10 partitions | 27550.1 | 28916.9 | ~ 0%
| 28617.0 | 28022.9 |
----------------|-----------|-----------|-------------
| 3021.4 | 3184.0 |
100 partitions | 2949.1 | 3120.1 | 3% more TPS
| 2870.6 | 2825.2 |
----------------|-----------|-----------|-------------
| 106.7 | 158.6 |
1000 partitions | 105.2 | 168.4 | 53% more TPS
| 105.9 | 162.0 |

On Fri, 18 Dec 2015 13:39:01 -0500
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I don't know, I'm still not very comfortable with this. And Tom
> > didn't like dictating that hash_any() must be no-fail, though I'm
> > not sure why.
>
> What I definitely didn't like was assuming at a distance that it would
> be no-fail. If we're to depend on that, the patch had better attach
> a comment saying so to the header comments of the function(s) it's
> assuming that about. Otherwise, somebody could hack up hashfunc.c
> in a way that breaks the assumption, without any clue that some code
> in a very-far-away module is critically reliant on it.
>
> > Let's wait to see what others think.
>
> A few observations:
>
> * This bit is too cute by half, if not three-quarters:
>
> + uint32 itemsizelg:2; /* sizeof one
> item log 2 */
> + uint32 capacity:30; /* capacity of
> array */
>
> Is there a good reason to assume that the only things we'll ever store
> in these arrays are of size no more than 8 bytes? Are we so desperate
> to save space that we cannot spare two separate words for itemsize and
> capacity? (ISTM it's a good bet that the extra code for accessing
> these bitfields occupies more space than would be saved, considering
> how few ResourceOwners typically exist at one time.) Let's just make
> it a couple of ints and be done. Actually, maybe nitems and capacity
> should be size_t, just in case.
>
> * An alternative design would be to forget itemsizelg altogether and
> insist that everything stored in the resource arrays be a Datum,
> which could then be coerced to/from some form of integer or some form
> of pointer as appropriate. That would waste some space in the int
> case, but it would considerably simplify both the ResourceArray code
> and the APIs to it, which might be worth the price of assuming we'll
> never store anything bigger than 8 bytes. It also would make this
> look more like some existing APIs such as the on_exit callbacks.
>
> * A lot of the code churn comes from the insistence on defining
> callbacks, which I'm dubious that we need. We could instead have a
> function that is "get any convenient one of the array elements" and
> revise the loops in ResourceOwnerReleaseInternal to be like
>
> while ((item = getconvenientitem(resourcearray)))
> {
> drop item in exactly the same way as before
> }
>
> I find that preferable to the proposed ResourceArrayRemoveAll
>
> + while (resarr->nitems > 0)
> + {
> + releasecb(resarr->itemsarr, isCommit);
> + }
>
> which certainly looks like it's an infinite loop; it's assuming (again
> with no documentation) that the callback function will cause the array
> to get smaller somehow. With the existing coding, it's much more
> clear why we think the loops will terminate.
>
> * The reason that ResourceOwnerReleaseInternal was not horribly
> inefficient was that its notion of "any convenient one" of the items
> to be deleted next was in fact the one that the corresponding Forget
> function would examine first, thus avoiding an O(N^2) cost to
> re-identify the item to be dropped. I think we should make an effort
> to be more explicit about that connection in any rewrite. In
> particular, it looks to me like when a hash array is in use, things
> will get slower not faster because we'll be adding a hash lookup step
> to each forget operation. Maybe we should consider adjusting the
> APIs so that that can be avoided. Or possibly we could have internal
> state in the ResourceArrays that says "we expect this item to be
> dropped in a moment, check that before going to the trouble of a hash
> lookup".
>
> * Actually, I'm not convinced that the proposed reimplementation of
> ResourceArrayRemove isn't horribly slow much of the time. It sure
> looks like it could degrade to a linear search very easily.
>
> * I still say that the assumption embodied as
> RESOURCE_ARRAY_ZERO_ELEMENT (ie that no valid entry is all-zero-bits)
> is pretty unacceptable. It might work for pointers, but I don't like
> it for resources represented by integer indexes.
>
> regards, tom lane
>
>

Attachment Content-Type Size
resource-owner-optimization-v4-step1.patch text/x-patch 32.4 KB
resource-owner-optimization-v4-step2.patch text/x-patch 7.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2015-12-24 08:57:19 Re: Remove Windows crash dump support?
Previous Message Michael Paquier 2015-12-24 07:45:08 Typo in pg_rewind.sgml