| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | Amit Langote <amitlangote09(at)gmail(dot)com> | 
| Cc: | Imai Yoshikazu <yoshikazu_i443(at)live(dot)jp>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, "jesper(dot)pedersen(at)redhat(dot)com" <jesper(dot)pedersen(at)redhat(dot)com>, "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: speeding up planning with partitions | 
| Date: | 2019-03-31 17:04:27 | 
| Message-ID: | 3529.1554051867@sss.pgh.pa.us | 
| Lists: | pgsql-hackers | 
Amit Langote <amitlangote09(at)gmail(dot)com> writes:
> On Sun, Mar 31, 2019 at 11:45 AM Imai Yoshikazu <yoshikazu_i443(at)live(dot)jp> wrote:
>> Certainly, using a bitmapset contributes to the performance when scanning
>> one partition (or a few partitions) out of a large number of partitions.
> Thanks Imai-san for testing.
I tried to replicate these numbers with the code as-committed, and
could not.  What I get, using the same table-creation code as you
posted and a pgbench script file like
\set param random(1, :N)
select * from rt where a = :param;
is scaling like this:

| N | tps (range) | tps (hash) |
|---|---|---|
| 2 | 10520.519932 | 10415.230400 |
| 8 | 10443.361457 | 10480.987665 |
| 32 | 10341.196768 | 10462.551167 |
| 128 | 10370.953849 | 10383.885128 |
| 512 | 10207.578413 | 10214.049394 |
| 1024 | 10042.794340 | 10121.683993 |
| 4096 | 8937.561825 | 9214.993778 |
| 8192 | 8247.614040 | 8486.728918 |
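For context, here is a minimal sketch of the kind of table-creation script assumed above. The actual script is in the earlier message and isn't reproduced here; the table name "rt" and column "a" come from the pgbench script, while the extra column "b", the one-value-per-partition bounds, the DO-loop form, and N = 1024 are purely illustrative:

```sql
-- Illustrative sketch only: the real script is in the earlier message.
-- Range-partitioned setup matching the "rt" table and column "a" used in
-- the pgbench script, one partition per key value; N = 1024 here.
CREATE TABLE rt (a int, b int) PARTITION BY RANGE (a);

DO $$
BEGIN
  FOR i IN 1..1024 LOOP
    EXECUTE format(
      'CREATE TABLE rt_%s PARTITION OF rt FOR VALUES FROM (%s) TO (%s)',
      i, i, i + 1);
  END LOOP;
END$$;

-- For the "tps (hash)" column, the parent would instead be declared
--   PARTITION BY HASH (a)
-- with children created using
--   FOR VALUES WITH (MODULUS 1024, REMAINDER <0..1023>)
```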
If I use "-M prepared" the numbers go up a bit for lower N, but
drop at high N:

| N | tps (range) | tps (hash) |
|---|---|---|
| 2 | 11449.920527 | 11462.253871 |
| 8 | 11530.513146 | 11470.812476 |
| 32 | 11372.412999 | 11450.213753 |
| 128 | 11289.351596 | 11322.698856 |
| 512 | 11095.428451 | 11200.683771 |
| 1024 | 10757.646108 | 10805.052480 |
| 4096 | 8689.165875 | 8930.690887 |
| 8192 | 7301.609147 | 7502.806455 |
Digging into that, it seems like the degradation with -M prepared is
mostly in LockReleaseAll's hash_seq_search over the locallock hash table.
What I think must be happening is that with -M prepared, at some point the
plancache decides to try a generic plan, which causes opening/locking all
the partitions, resulting in permanent bloat in the locallock hash table.
We immediately go back to using custom plans, but hash_seq_search has
more buckets to look through for the remainder of the process' lifetime.
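Tangentially, the same plan-cache behaviour can be provoked at the SQL level without pgbench. A minimal sketch, assuming the default plan_cache_mode, where the plan cache normally builds a generic plan for cost comparison after five custom-planned executions:

```sql
-- Rough SQL-level illustration of the effect described above.
PREPARE q(int) AS SELECT * FROM rt WHERE a = $1;

-- The first five executions get custom plans and lock only one partition each.
EXECUTE q(1); EXECUTE q(2); EXECUTE q(3); EXECUTE q(4); EXECUTE q(5);

BEGIN;
-- Around the sixth execution the plan cache builds a generic plan to compare
-- costs, which opens and locks every partition; the locallock hash table
-- stays enlarged afterwards even though we revert to custom plans.
EXECUTE q(6);
SELECT count(*) AS relation_locks
FROM pg_locks
WHERE pid = pg_backend_pid() AND locktype = 'relation';
COMMIT;
```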
I do see some cycles getting spent in apply_scanjoin_target_to_paths
that look to be due to scanning over the long part_rels array,
which your proposal would ameliorate.  But (a) that's pretty small
compared to other effects, and (b) IMO, apply_scanjoin_target_to_paths
is a remarkable display of brute force inefficiency to begin with.
I think we should see if we can't nuke that function altogether in
favor of generating the paths with the right target the first time.
BTW, the real elephant in the room is the O(N^2) cost of creating
these tables in the first place.  The runtime for the table-creation
scripts looks like

| N | range | hash |
|---|---|---|
| 2 | 0m0.011s | 0m0.011s |
| 8 | 0m0.015s | 0m0.014s |
| 32 | 0m0.032s | 0m0.030s |
| 128 | 0m0.132s | 0m0.099s |
| 512 | 0m0.969s | 0m0.524s |
| 1024 | 0m3.306s | 0m1.442s |
| 4096 | 0m46.058s | 0m15.522s |
| 8192 | 3m11.995s | 0m58.720s |
This seems to be down to the expense of doing RelationBuildPartitionDesc
to rebuild the parent's relcache entry for each child CREATE TABLE.
Not sure we can avoid that, but maybe we should consider adopting a
cheaper-to-read representation of partition descriptors.  The fact that
range-style entries seem to be 3X more expensive to load than hash-style
entries is strange.
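A crude way to compare what RelationBuildPartitionDesc has to deserialize for the two layouts is to look at the size of the stored partition bounds. This is only a proxy for load cost, not a measurement of it; the catalog columns used below (pg_class.relpartbound, relispartition, pg_inherits) are standard:

```sql
-- Crude proxy: average serialized size of the stored partition bounds,
-- per partitioned parent (relpartbound is the node tree that
-- RelationBuildPartitionDesc reads back for every child).
SELECT i.inhparent::regclass AS parent,
       count(*) AS partitions,
       avg(pg_column_size(c.relpartbound))::int AS avg_bound_bytes
FROM pg_class c
JOIN pg_inherits i ON i.inhrelid = c.oid
WHERE c.relispartition
GROUP BY 1
ORDER BY 1;
```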
regards, tom lane