From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | CLUSTER, reform_and_rewrite_tuple(), and parallelism |
Date: | 2016-08-17 23:12:09 |
Message-ID: | CAM3SWZTCU6DCgvMFzA1+=Os7NViiDM65Jkc36RCJqvp0ZEBAFw@mail.gmail.com |
Lists: | pgsql-hackers |
During preliminary analysis of what it would take to produce a
parallel CLUSTER patch that is analogous to what I came up with for
CREATE INDEX, which in general seems quite possible, I identified
reform_and_rewrite_tuple() as a major bottleneck for the current
CLUSTER implementation.
Excluding the cost of the subsequent REINDEX of the clustered-on
index, reform_and_rewrite_tuple() appears to account for roughly 25% to
35% of both the cache misses and the instructions executed in my test
case (this used a tuplesort, not an indexscan on the old heap
relation, of course). Merging itself was far less expensive (with my
optimization of how the heap is maintained during merging, and 16
tapes/runs), so it would be reasonable not to parallelize that part,
just as it was for parallel CREATE INDEX. I don't think that it's
reasonable to do nothing about this reform_and_rewrite_tuple()
bottleneck, though.
Does anyone have any ideas on how to:
1). Directly address the reform_and_rewrite_tuple() bottleneck.
and/or:
2). Push down some or all of the reform_and_rewrite_tuple() work so
that it happens before tuples are passed to the tuplesort.
"2" would probably make it straightforward to have
reform_and_rewrite_tuple() work occur in parallel workers instead,
which buys us a lot.
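
To make the shape of "2" concrete, here is a toy model in Python. It is
only a sketch and not PostgreSQL code: reform_tuple(), worker_build_run()
and parallel_cluster() are hypothetical names, "reforming" is reduced to
dropping columns, and the thread pool only models how the work is divided
(it does not demonstrate any speedup). Each worker reforms its tuples
before sorting them into a run, so the serial merge in the leader only
sees tuples that are already in their final form.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge


def reform_tuple(old_tuple, dropped):
    """Hypothetical stand-in for the per-tuple reform work.

    Here it only strips the columns whose positions are in `dropped`.
    """
    return tuple(v for i, v in enumerate(old_tuple) if i not in dropped)


def worker_build_run(chunk, dropped, key_col):
    """Work done by one (modelled) parallel worker.

    The tuples are reformed first and sorted afterwards, so the per-tuple
    cost is paid in the worker, before anything reaches the sort.
    `key_col` indexes into the reformed tuple.
    """
    reformed = [reform_tuple(t, dropped) for t in chunk]
    reformed.sort(key=lambda t: t[key_col])
    return reformed


def parallel_cluster(old_heap, dropped, key_col, nworkers=4):
    """Split the old heap among workers, then merge their runs serially."""
    chunks = [old_heap[i::nworkers] for i in range(nworkers)]
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        runs = list(
            pool.map(lambda c: worker_build_run(c, dropped, key_col), chunks)
        )
    # Leader: the merge stays serial and has no reform work left to do.
    return list(merge(*runs, key=lambda t: t[key_col]))


if __name__ == "__main__":
    heap = [(3, "x", "c"), (1, "x", "a"), (2, "x", "b")]
    for row in parallel_cluster(heap, dropped={1}, key_col=0, nworkers=2):
        print(row)
```

The point of the ordering inside worker_build_run() is that whatever is
expensive about reforming a tuple scales with the number of workers,
while the leader's merge stays as cheap as it is today.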
--
Peter Geoghegan