Re: Patch: ResourceOwner optimization for tables with many partitions

From: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch: ResourceOwner optimization for tables with many partitions
Date: 2016-01-25 16:18:27
Message-ID: 20160125191827.615fe277@fujitsu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, Tom

> Also, there are defined ways to convert between Datum format and
> other formats (DatumGetPointer() etc). This code isn't using 'em.

Fixed. But I was not sure how to deal with File and Buffer types since
they are ints (see fd.h and buf.h) and there is no DatumGetInt macro,
only DatumGetInt32/Int64. I don't know if there is a good reason for
this. So for now I just added these definitions right into resowner.c:

```
/* Convert File to Datum */
#define FileGetDatum(file) ((Datum)(file))

/* Convert Datum to File */
#define DatumGetFile(datum)((File)(datum))

/* Convert Buffer to Datum */
#define BufferGetDatum(buffer)((Datum)(buffer))

/* Convert Datum to Buffer */
#define DatumGetBuffer(datum)((Buffer)(datum))
```

... to make all code look similar and all intentions --- clear.

I have a feeling you could suggest a better solution :)

> This is certainly not doing what I had in mind for communication
> between ResourceOwnerGetAny and ResourceOwnerRelease, because the
> latter pays zero attention to "lastidx", but instead does a blind
> hash search regardless.
>
> In general, I'm suspicious of the impact of this patch on "normal"
> cases with not-very-large resource arrays. It might be hard to
> measure that because in such cases resowner.c is not a bottleneck;
> but that doesn't mean I want to give up performance in cases that
> perform well today. You could probably set up a test harness that
> exercises ResourceOwnerAdd/Release/etc in isolation and get good
> timings for them that way, to confirm or disprove whether there's
> a performance loss for small arrays.
>
> I still suspect it would be advantageous to have two different
> operating modes depending on the size of an individual resource
> array, and not bother with hashing until you got to more than a
> couple dozen entries. Given that you're rehashing anyway when you
> enlarge the array, this wouldn't cost anything except a few more
> lines of code, ISTM --- and you already have a net code savings
> because of the refactoring.

To be honest I don't have much faith in such micro benchmarks. They
don't consider how code will be executed under real load. Therefore any
results of such a benchmark wouldn't give us a clear answer which
solution is preferable. Besides different users can execute the same
code in different fashion.

I compared two implementations - "always use hashing" (step2a.path) and
"use hashing only for large arrays" (step2b.path). Both patches give
the same performance according to benchmark I described in a first
message of this thread.

Since none of these patches is better in respect of problem I'm trying
to solve I suggest to use step2b.path. It seems to be a good idea not to
use hashing for searching in small arrays. Besides in this case
behaviour of PostgreSQL will be changed less after applying a patch.
Users will probably appreciate this.

Best regards,
Aleksander

Attachment Content-Type Size
resource-owner-optimization-v5-step1.patch text/x-patch 30.2 KB
resource-owner-optimization-v5-step2a.patch text/x-patch 7.3 KB
resource-owner-optimization-v5-step2b.patch text/x-patch 7.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-01-25 16:29:31 Re: [patch] Proposal for \crosstabview in psql
Previous Message Shulgin, Oleksandr 2016-01-25 16:11:17 Re: More stable query plans via more predictable column statistics