Re: [POC] hash partitioning

From: amul sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Thom Brown <thom(at)linux(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>, David Steele <david(at)pgmasters(dot)net>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [POC] hash partitioning
Date: 2017-11-01 10:16:27
Message-ID: CAAJ_b96fhpJAP=ALbETmeLk1Uni_GFZD938zgenhF49qgDTjaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 31, 2017 at 10:17 AM, amul sul <sulamul(at)gmail(dot)com> wrote:
> On Tue, Oct 31, 2017 at 9:54 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Oct 30, 2017 at 5:52 PM, amul sul <sulamul(at)gmail(dot)com> wrote:
>>> Actually, int4[] is also inappropriate type as we have started using a 64bit
>>> hash function. We need something int8[] which is not available, so that I
>>> have used ANYARRAYOID in the attached patch(0004).
>>
>> I don't know why you think int8[] is not available.
>>
>> rhaas=# select 'int8[]'::regtype;
>> regtype
>> ----------
>> bigint[]
>> (1 row)
>>
>
> I missed _int8, was searching for INT8ARRAYOID in pg_type.h, my bad.
>

Fixed in the 0003 patch.

>>>>[....]
>>> Something similar I've tried in my initial patch version[1], but I have missed
>>> user specified opclass handling for each partitioning column. Do you want me
>>> to handle opclass using RelabelType node? I am afraid that, that would make
>>> the \d+ output more horrible than the current one if non-default opclass used.
>>
>> Maybe we should just pass the OID of the partition (or both the
>> partition and the parent, so we can get the lock ordering right?)
>> instead.
>>
> Okay, will try this.
>

In 0005, I rewrote satisfies_hash_partition, to accept parent id, modulus and
remainder as before, and the column values directly. This function opens parent
relation to get its PartitionKey which has extended hash function information in
a partsupfunc array, using this it will calculates a hash for the partition key.
Also, it will copy this partsupfunc array into function memory context so that
we don't need to open parent relation again and again in the subsequent function
call to get extended hash functions information (e.g. bulk insert).

In \d+ partition constraint will be :
satisfies_hash_partition('16384'::oid, 2, 0, a, b)
where 16384 is parent relid, 2 is modulus, 0 is remainder and 'a' &
'b' are partition
column.

In the earlier version partition constraint was (i.e. without 0005 patch):
satisfies_hash_partition(2, 0,
hashint4extended(a,'8816678312871386365'::bigint),
hashtextextended(b, '8816678312871386365'::bigint))

I did small performance test using a copy command to load 100,000,000 records
and a separate insert command for each record to load 2,00,000 records and
result are as follow:

+---------+-----------------+--------------------+
| Command | With 0005 patch | Without 0005 patch |
+---------+-----------------+--------------------+
| COPY | 63.719 seconds | 64.925 seconds |
+---------+-----------------+--------------------+
| INSERT | 179.21 seconds | 174.89 seconds |
+---------+-----------------+--------------------+

Although partition constraints become more simple, there isn't any performance
gain with 0005 patch. Also I am little skeptic about logic in 0005 where we
copied extended hash function info from the partition key, what if parent is
changed while we are using it? Do we need to keep lock on parent until commit in
satisfies_hash_partition?

Regards,
Amul

Attachment Content-Type Size
0003-hash-partitioning_another_design-v27.patch application/octet-stream 89.3 KB
0004-Enable-partition-wise-join-support-v4.patch application/octet-stream 10.8 KB
0005-satisfies_hash_partition-signature-change-WIP.patch application/octet-stream 7.8 KB
0001-Add-PG_GETARG_UINT64-macro.patch application/octet-stream 957 bytes
0002-partition_bounds_copy-code-refactoring-v1.patch application/octet-stream 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2017-11-01 10:23:47 Re: Dynamic result sets from procedures
Previous Message Michael Paquier 2017-11-01 09:04:43 Commit fest 2017-11