From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|

To: | Peter Geoghegan <pg(at)bowt(dot)ie> |

Cc: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |

Subject: | Re: Parallel tuplesort (for parallel B-Tree index creation) |

Date: | 2017-02-03 13:04:55 |

Message-ID: | CAEepm=1_LTOeGGJr2BquEwUJmBnT6PtEQx5GmD=i42GvVS3bcQ@mail.gmail.com |

Views: | Raw Message | Whole Thread | Download mbox |

Thread: | |

Lists: | pgsql-hackers |

On Wed, Feb 1, 2017 at 8:46 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:

> On Tue, Jan 31, 2017 at 11:23 PM, Thomas Munro

> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:

>> So I'm in favour of this patch, which is relatively simple and give us

>> faster index builds soon. Eventually we might also be able to have

>> approach 1. From what I gather, it's entirely possible that we might

>> still need 2 to fall back on in some cases.

>

> Right. And it can form the basis of an implementation of 1, which in

> any case seems to be much more compelling for parallel query, when a

> great deal more can be pushed down, and we are not particularly likely

> to be I/O bound (usually not much writing to the heap, or WAL

> logging).

I ran some tests today. First I created test tables representing the

permutations of these choices:

Table structure:

int = Integer key only

intwide = Integer key + wide row

text = Text key only (using dictionary words)

textwide = Text key + wide row

Uniqueness:

u = each value unique

d = 10 duplicates of each value

Heap physical order:

rand = Random

asc = Ascending order (already sorted)

desc = Descending order (sorted backwards)

I used 10 million rows for this test run, so that gave me 24 tables of

the following sizes as reported in "\d+":

int tables = 346MB each

intwide tables = 1817MB each

text tables = 441MB each

textwide tables = 1953MB each

It'd be interesting to test larger tables of course but I had a lot of

permutations to get through.

For each of those tables I ran tests corresponding to the permutations

of these three variables:

Index type:

uniq = CREATE UNIQUE INDEX ("u" tables only, ie no duplicates)

nonu = CREATE INDEX ("u" and "d" tables)

Maintenance memory: 1M, 64MB, 256MB, 512MB

Workers: from 0 up to 8

Environment: EDB test machine "cthulhu", Intel(R) Xeon(R) CPU E7-

8830 @ 2.13GHz, 8 socket, 8 cores (16 threads) per socket, CentOS

7.2, Linux kernel 3.10.0-229.7.2.el7.x86_64, 512GB RAM, pgdata on SSD.

Database initialised with en_US.utf-8 collation, all defaults except

max_wal_size increased to 4GB (otherwise warnings about too frequent

checkpoints) and max_parallel_workers_maintenance = 8. Testing done

with warm OS cache.

I applied your v2 patch on top of

7ac4a389a7dbddaa8b19deb228f0a988e79c5795^ to avoid a conflict. It

still had a couple of harmless conflicts that I was able to deal with

(not code, just some header stuff moving around).

See full results from all permutations attached, but I wanted to

highlight the measurements from 'textwide', 'u', 'nonu' which show

interesting 'asc' numbers (data already sorted). The 'mem' column is

maintenance_work_mem in megabytes. The 'w = 0' column shows the time

in seconds for parallel_workers = 0. The other 'w = N' columns show

times with higher parallel_workers settings, represented as speed-up

relative to the 'w = 0' time.

1. 'asc' = pre-sorted data (w = 0 shows time in seconds, other columns

show speed-up relative to that time):

mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8

-----+--------+-------+-------+-------+-------+-------+-------+-------+-------

1 | 119.97 | 4.61x | 4.83x | 5.32x | 5.61x | 5.88x | 6.10x | 6.18x | 6.09x

64 | 19.42 | 1.18x | 1.10x | 1.23x | 1.23x | 1.16x | 1.19x | 1.20x | 1.21x

256 | 18.35 | 1.02x | 0.92x | 0.98x | 1.02x | 1.06x | 1.07x | 1.08x | 1.10x

512 | 17.75 | 1.01x | 0.89x | 0.95x | 0.99x | 1.02x | 1.05x | 1.06x | 1.07x

2. 'rand' = randomised data:

mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8

-----+--------+-------+-------+-------+-------+-------+-------+-------+-------

1 | 130.25 | 1.82x | 2.19x | 2.52x | 2.58x | 2.72x | 2.72x | 2.83x | 2.89x

64 | 117.36 | 1.80x | 2.20x | 2.43x | 2.47x | 2.55x | 2.51x | 2.59x | 2.69x

256 | 124.68 | 1.87x | 2.20x | 2.49x | 2.52x | 2.64x | 2.70x | 2.72x | 2.75x

512 | 115.77 | 1.51x | 1.72x | 2.14x | 2.08x | 2.19x | 2.31x | 2.44x | 2.48x

3. 'desc' = reverse-sorted data:

mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8

-----+--------+-------+-------+-------+-------+-------+-------+-------+-------

1 | 115.19 | 1.88x | 2.39x | 2.78x | 3.50x | 3.62x | 4.20x | 4.19x | 4.39x

64 | 112.17 | 1.85x | 2.25x | 2.99x | 3.63x | 3.65x | 4.01x | 4.31x | 4.62x

256 | 119.55 | 1.76x | 2.21x | 2.85x | 3.43x | 3.37x | 3.77x | 4.24x | 4.28x

512 | 119.50 | 1.85x | 2.19x | 2.87x | 3.26x | 3.28x | 3.74x | 4.24x | 3.93x

The 'asc' effects are much less pronounced when the key is an int.

Here is the equivalent data for 'intwide', 'u', 'nonu':

1. 'asc'

mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8

-----+-------+-------+-------+-------+-------+-------+-------+-------+-------

1 | 12.19 | 1.55x | 1.93x | 2.21x | 2.44x | 2.64x | 2.76x | 2.91x | 2.83x

64 | 7.35 | 1.29x | 1.53x | 1.69x | 1.86x | 1.98x | 2.04x | 2.07x | 2.09x

256 | 7.34 | 1.26x | 1.47x | 1.64x | 1.79x | 1.92x | 1.96x | 1.98x | 2.02x

512 | 7.24 | 1.24x | 1.46x | 1.65x | 1.80x | 1.91x | 1.97x | 2.00x | 1.92x

2. 'rand'

mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8

-----+-------+-------+-------+-------+-------+-------+-------+-------+-------

1 | 15.16 | 1.56x | 2.01x | 2.32x | 2.57x | 2.73x | 2.87x | 2.95x | 2.91x

64 | 12.97 | 1.55x | 1.97x | 2.25x | 2.44x | 2.58x | 2.70x | 2.74x | 2.71x

256 | 13.14 | 1.47x | 1.86x | 2.12x | 2.31x | 2.50x | 2.62x | 2.58x | 2.69x

512 | 13.61 | 1.48x | 1.91x | 2.22x | 2.37x | 2.55x | 2.65x | 2.73x | 2.73x

3. 'desc'

mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8

-----+-------+-------+-------+-------+-------+-------+-------+-------+-------

1 | 13.45 | 1.51x | 1.94x | 2.31x | 2.56x | 2.75x | 2.95x | 3.05x | 3.00x

64 | 10.27 | 1.42x | 1.82x | 2.05x | 2.30x | 2.46x | 2.59x | 2.64x | 2.65x

256 | 10.52 | 1.39x | 1.70x | 2.02x | 2.24x | 2.34x | 2.39x | 2.48x | 2.56x

512 | 10.62 | 1.43x | 1.82x | 2.06x | 2.32x | 2.51x | 2.61x | 2.68x | 2.69x

Full result summary and scripts used for testing attached.

--

Thomas Munro

http://www.enterprisedb.com

Attachment | Content-Type | Size |
---|---|---|

speedup.txt | text/plain | 15.6 KB |

make_test_data.sh | application/x-sh | 2.6 KB |

run_tests.sh | application/x-sh | 1.2 KB |

- Re: Parallel tuplesort (for parallel B-Tree index creation) at 2017-02-01 07:46:43 from Peter Geoghegan

- Re: Parallel tuplesort (for parallel B-Tree index creation) at 2017-02-03 19:12:42 from Peter Geoghegan
- Re: Parallel tuplesort (for parallel B-Tree index creation) at 2017-02-03 22:58:11 from Peter Geoghegan

From | Date | Subject | |
---|---|---|---|

Next Message | Fabien COELHO | 2017-02-03 13:24:33 | Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless) |

Previous Message | Heikki Linnakangas | 2017-02-03 12:52:52 | Re: Password identifiers, protocol aging and SCRAM protocol |