Re: [HACKERS] maintenance_work_mem and CREATE INDEX time

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] maintenance_work_mem and CREATE INDEX time
Date: 2013-07-30 00:29:22
Message-ID: CA+HiwqHM4G8meKARFYMATnXKPKz1DVR856Ot7y8kzeZZrsuLcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Hi Jeff,

On Tue, Jul 30, 2013 at 3:25 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Tue, Jul 23, 2013 at 10:56 PM, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>> On Wed, Jul 24, 2013 at 3:20 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>>>
>>> The heap structure used in external sorts is cache-unfriendly. The
>>> bigger the heap used, the more this unfriendliness becomes apparent.
>>> And the bigger maintenance_work_mem, the bigger the heap used.
>>>
>>> The bigger heap also means you have fewer "runs" to merge in the
>>> external sort. However, as long as the number of runs still fits in
>>> the same number of merge passes, this is generally not a meaningful
>>> difference.
>>
>> Does fewer runs mean more time (in whichever phase of external sort)?
>
> That's complicated. In general fewer runs are faster, as the heap
> used at that stage is smaller. But this difference is small. If you
> can get the number of runs down to a level that needs fewer passes
> over the data, that will make things faster. But this is rare. If
> the sort isn't already being done in a single pass, then your sort
> must be huge or your working memory setting is pathologically tiny.
>
> There is a rough conservation of total heap layers between the two
> phases: the initial tuple heap, and the merge stage heap-of-tapes.
> Say for example that by increasing work_mem, you can increase the
> initial heap from 25 layers to 27 layers, while decreasing the merge
> phase heap from 5 layers to 3 layers. The total number of comparisons
> for the entire sort will be about the same, but the comparisons across
> the 27 layer heap are much more likely to need to go to main RAM,
> rather than come from L3 cache (or whatever the cache level is).
>

If I my assumption that fewer runs mean longer runs is plausible, may
it be correct to think that performsort step (performsort_done -
performsort_starting) time increases when such longer runs are created
due to larger workMem?

--
Amit Langote

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Sergey Konoplev 2013-07-30 03:54:25 Re: Fastest Index/Algorithm to find similar sentences
Previous Message Klaus Ita 2013-07-29 22:37:27 Fwd: corrupted files

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2013-07-30 00:56:27 Re: Bison 3.0 updates
Previous Message Stephen Frost 2013-07-30 00:24:17 Re: [COMMITTERS] pgsql: WITH CHECK OPTION support for auto-updatable VIEWs