Quick Links

Re: Allow a per-tablespace effective_io_concurrency setting

From:	Julien Rouhaud <julien(dot)rouhaud(at)dalibo(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject:	Re: Allow a per-tablespace effective_io_concurrency setting
Date:	2015-09-02 19:12:41
Message-ID:	55E74A29.5030804@dalibo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-performance

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

On 02/09/2015 18:06, Tomas Vondra wrote:
> Hi
>
> On 09/02/2015 03:53 PM, Andres Freund wrote:
>>
>> Hi,
>>
>> On 2015-07-18 12:17:39 +0200, Julien Rouhaud wrote:
>>> I didn't know that the thread must exists on -hackers to be
>>> able to add a commitfest entry, so I transfer the thread here.
>>
>> Please, in the future, also update the title of the thread to
>> something fitting.
>>

Sorry for that.

>>> @@ -539,6 +541,9 @@ ExecInitBitmapHeapScan(BitmapHeapScan
>>> *node, EState *estate, int eflags) { BitmapHeapScanState
>>> *scanstate; Relation currentRelation; +#ifdef USE_PREFETCH +
>>> int new_io_concurrency; +#endif
>>>
>>> /* check for unsupported flags */ Assert(!(eflags &
>>> (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK))); @@ -598,6 +603,25 @@
>>> ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate,
>>> int eflags) */ currentRelation = ExecOpenScanRelation(estate,
>>> node->scan.scanrelid, eflags);
>>>
>>> +#ifdef USE_PREFETCH + /* check if the
>>> effective_io_concurrency has been overloaded for the + *
>>> tablespace storing the relation and compute the
>>> target_prefetch_pages, + * or just get the current
>>> target_prefetch_pages + */ + new_io_concurrency =
>>> get_tablespace_io_concurrency( +
>>> currentRelation->rd_rel->reltablespace); + + +
>>> scanstate->target_prefetch_pages = target_prefetch_pages; + +
>>> if (new_io_concurrency != effective_io_concurrency) + { +
>>> double prefetch_pages; + if
>>> (compute_io_concurrency(new_io_concurrency, &prefetch_pages)) +
>>> scanstate->target_prefetch_pages = rint(prefetch_pages); +
>>> } +#endif
>>
>> Maybe it's just me - but imo there should be as few USE_PREFETCH
>> dependant places in the code as possible. It'll just be 0 when
>> not supported, that's fine?
>
> It's not just you. Dealing with code with plenty of ifdefs is
> annoying - it compiles just fine most of the time, until you
> compile it with some rare configuration. Then it either starts
> producing strange warnings or the compilation fails entirely.
>
> It might make a tiny difference on builds without prefetching
> support because of code size, but realistically I think it's
> noise.
>
>> Especially changing the size of externally visible structs
>> depending ona configure detected ifdef seems wrong to me.
>
> +100 to that
>

I totally agree. I'll remove the ifdefs.

>> Nitpick: Wrong comment style (/* stands on its own line).

I did run pgindent before submitting patch, but apparently I picked
the wrong one. Already fixed in local branch.

>>> + /*---------- + * The user-visible GUC parameter is the
>>> number of drives (spindles), + * which we need to translate
>>> to a number-of-pages-to-prefetch target. + * The target
>>> value is stashed in *extra and then assigned to the actual +
>>> * variable by assign_effective_io_concurrency. + * + *
>>> The expected number of prefetch pages needed to keep N drives
>>> busy is: + * + * drives | I/O requests + *
>>> -------+---------------- + * 1 | 1 + *
>>> 2 | 2/1 + 2/2 = 3 + * 3 | 3/1 + 3/2 + 3/3 = 5
>>> 1/2 + * 4 | 4/1 + 4/2 + 4/3 + 4/4 = 8 1/3 + *
>>> n | n * H(n)
>>
>> I know you just moved this code. But: I don't buy this formula.
>> Like at all. Doesn't queuing and reordering entirely invalidate
>> the logic here?
>
> Well, even the comment right next after the formula says that:
>
> * Experimental results show that both of these formulas aren't *
> aggressiveenough, but we don't really have any better proposals.
>
> That's the reason why users generally either use 0 or some rather
> high value (16 or 32 are the most common values see). The problem
> is that we don't really care about the number of spindles (and not
> just because SSDs don't have them at all), but about the target
> queue length per device. Spinning rust uses TCQ/NCQ to optimize the
> head movement, SSDs are parallel by nature (stacking multiple chips
> with separate channels).
>
> I doubt we can really improve the formula, except maybe for saying
> "we want 16 requests per device" and multiplying the number by
> that. We don't really have the necessary introspection to determine
> better values (and it's not really possible, because the devices
> may be hidden behind a RAID controller (or a SAN). So we can't
> really do much.
>
> Maybe the best thing we can do is just completely abandon the
> "number of spindles" idea, and just say "number of I/O requests to
> prefetch". Possibly with an explanation of how to estimate it
> (devices * queue length).
>
>> I think that'd be a lot better.

+1 for that too.

If everone's ok with this change, I can submit a patch for that too.
Should I split that into two patches, and/or start a new thread?

- --
Julien Rouhaud
http://dalibo.com - http://dalibo.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)

iQEcBAEBAgAGBQJV50opAAoJELGaJ8vfEpOqve4H/0ZJCoFb0wHtArkGye6ietks
9uahdJy5ixO4J+AZsf2mVxV/DZK7dhK8rWIXt6yS3kfYfPDB79cRFWU5EgjEGAHB
qcB7wXCa5HibqLySgQct3zhVDj3CN3ucKT3LVp9OC9mrH2mnGtAp7PYkjd/HqBwd
AzW05pu21Ivi/z2gUBOdxNEEDxCLu8T1OtYq3WY9l7Yc4HfLG3huYLQD2LoRFRFn
lWwhQifML6uKzz7w3MfZrK4i2ffGGv/r1afHcpZvN3UsB5te1fSzr8KcUeJL7+Zg
xJTKwppiEMHpxokn5lw4LzYkjYD7W1fvwLnJhzRrs7dXGPl6rZtLmasyCld4FVk=
=r2jE
-----END PGP SIGNATURE-----

In response to

Re: Allow a per-tablespace effective_io_concurrency setting at 2015-09-02 16:06:54 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2015-09-02 19:17:15	Re: Fwd: Core dump with nested CREATE TEMP TABLE
Previous Message	Josh Berkus	2015-09-02 19:03:36	Re: Horizontal scalability/sharding

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Jeff Janes	2015-09-02 19:29:07	Re: query with pg_trgm sometimes very slow
Previous Message	Greg Stark	2015-09-02 18:49:13	Re: Allow a per-tablespace effective_io_concurrency setting