Increasing default value for effective_io_concurrency?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Increasing default value for effective_io_concurrency?
Date: 2019-06-29 20:15:19
Message-ID: 20190629201519.fzbeyd3jrf4ngmqy@development
Lists: pgsql-hackers

Hi,

I think we should consider changing the effective_io_concurrency default
value, i.e. the GUC that determines how many pages we try to prefetch in
a couple of places (the most important being Bitmap Heap Scan).

The default has been 1 since forever, but in my experience that's hardly
ever the right value, no matter what storage system you use. I've always
ended up with values that are either 0 (i.e. prefetching disabled) or
significantly higher (at least 8 or 16). In fact, e_i_c=1 can easily be
detrimental, depending on the workload and storage system.

That's an issue, because people often don't know how to tune this
setting, and I quite often see systems running with the default value.

So I propose increasing the default to a value between 4 and 16.

I'm hardly the first person to notice this, as illustrated for example
by this [1] post by Merlin Moncure on pgsql-hackers from 2017, which
measured this behavior on Intel S3500 SSD:

effective_io_concurrency   1: 46.3 sec, ~170 MB/s peak via iostat
effective_io_concurrency   2: 49.3 sec, ~158 MB/s peak via iostat
effective_io_concurrency   4: 29.1 sec, ~291 MB/s peak via iostat
effective_io_concurrency   8: 23.2 sec, ~385 MB/s peak via iostat
effective_io_concurrency  16: 22.1 sec, ~409 MB/s peak via iostat
effective_io_concurrency  32: 20.7 sec, ~447 MB/s peak via iostat
effective_io_concurrency  64: 20.0 sec, ~468 MB/s peak via iostat
effective_io_concurrency 128: 19.3 sec, ~488 MB/s peak via iostat
effective_io_concurrency 256: 19.2 sec, ~494 MB/s peak via iostat

That's just one anecdotal example, of course, so I've decided to do a
couple of tests on different storage systems. Attached are a couple of
scripts I used to generate synthetic data sets with data laid out in
different patterns (random vs. regular), and to run queries scanning
various fractions of the table (1%, 5%, ...) with plans using bitmap
heap scans.

I've done that on three different storage systems:

1) SATA RAID (3 x 7.2k drives in RAID0)
2) SSD RAID (6 x SATA SSD in RAID0)
3) NVMe drive

Attached is a spreadsheet with a summary of results for the tested
cases. In general, the data support what I already wrote above - the
current default is pretty bad.

In some cases it helps a bit, but a slightly higher value (4 or 8)
performs significantly better. Consider for example this "sequential"
data set from the 6xSSD RAID system (columns show e_i_c values, pct is
the fraction of pages matching the query):

pct        0       1       4      16      64     128
----------------------------------------------------
  1    25990   18624    3269    2219    2189    2171
  5    88116   60242   14002    8663    8560    8726
 10   120556   99364   29856   17117   16590   17383
 25   101080  184327   79212   47884   46846   46855
 50   130709  309857  163614  103001   94267   94809
 75   126516  435653  248281  156586  139500  140087

Compared to the e_i_c=0 case, it looks like this:

pct      1     4    16    64   128
----------------------------------
  1    72%   13%    9%    8%    8%
  5    68%   16%   10%   10%   10%
 10    82%   25%   14%   14%   14%
 25   182%   78%   47%   46%   46%
 50   237%  125%   79%   72%   73%
 75   344%  196%  124%  110%  111%

So for 1% of the table, e_i_c=1 is ~30% faster than no prefetching, but
with e_i_c=4 (or more) it's ~10x faster. This is a fairly common
pattern, not just on this storage system.

e_i_c=1 can also perform pretty poorly, especially when the query
matches a large fraction of the table - in this example it's 2-3x
slower than no prefetching, while higher e_i_c values limit the damage
quite a bit.

That's not entirely terrible, because in most cases such queries would
use a seqscan (the benchmark forces them to use a bitmap heap scan),
but we can't ignore it either - planner underestimates may result in a
bitmap heap scan being chosen anyway.

Furthermore, there are cases with much worse behavior. For example, one
of the tests on SATA RAID behaves like this:

pct      1     4    16    64   128
----------------------------------
  1   147%  101%   61%   52%   55%
  5   180%  106%   71%   71%   70%
 10   208%  106%   73%   80%   79%
 25   225%  118%   84%   96%   86%
 50   234%  123%   91%  102%   95%
 75   241%  127%   94%  103%   98%

Pretty much all cases are significantly slower with e_i_c=1.

Of course, there are other things to consider. For example, these tests
were done in isolation, while on actual systems there will be other
queries running concurrently (and those may also generate I/O).

regards

[1] https://www.postgresql.org/message-id/flat/55AA2469.20306%40dalibo.com#dda46134fb309ae09233b1547411c029

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
prefetch.tgz application/gzip 1.3 KB
prefetch-results.ods application/vnd.oasis.opendocument.spreadsheet 36.6 KB
