Re: Speed up Clog Access by increasing CLOG buffers

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2015-10-05 01:04:17
Message-ID: CAMkU=1yLzEBi3w-zsAMzyYvDs-FM1p_AiUu9=0d67u0fULWgqw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 11, 2015 at 8:01 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:

> On Fri, Sep 11, 2015 at 9:21 PM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> >
> > On Fri, Sep 11, 2015 at 10:31 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> > > > Could you perhaps try to create a testcase where xids are accessed
> > > > that are so far apart on average that they're unlikely to be in
> > > > memory? And then test that across a number of client counts?
> > > >
> > >
> > > Now about the test: create a table with a large number of rows (say
> > > 11617457; I tried to create a larger one, but it was taking too much
> > > time (more than a day)) and have each row written by a different
> > > transaction id. Now each transaction should update rows that are at
> > > least 1048576 apart (the number of transactions whose status can be
> > > held in 32 CLOG buffers), so that ideally each update tries to access
> > > a CLOG page that is not in memory; however, as the value to update is
> > > selected randomly, in practice only about every 100th access ends up
> > > being a disk access.
> >
> > What about just running a regular pgbench test, but hacking the
> > XID-assignment code so that we increment the XID counter by 100 each
> > time instead of 1?
> >
>
> If I am not wrong, we need a difference of 1048576 transactions between
> records to make each CLOG access a disk access, so if we increment the XID
> counter by 100, then probably only every 10000th (or some multiple-of-10000th)
> transaction would go for disk access.
>
> The number 1048576 is derived from the calculation below:
> #define CLOG_XACTS_PER_BYTE 4
> #define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
>
> Transaction difference required for each transaction to go for disk access:
> CLOG_XACTS_PER_PAGE * num_clog_buffers.
>

That guarantees that every xid occupies its own chunk of 32 contiguous clog
pages.

But clog pages are not pulled in and out in 32-page chunks; they are pulled in
one page at a time. So you would only need a difference of 32,768 to get every
real transaction to live on its own clog page, which means every lookup of a
different real transaction would have to do a page replacement. (I think your
references to disk access here are misleading. Isn't the issue here the
contention on the lock that controls the page replacement, not the actual IO?)
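
For reference, here is where the two numbers come from (a quick arithmetic
check, assuming the default 8 kB BLCKSZ and the hard-coded 32 clog buffers):

#include <stdio.h>

/* Same constants as src/backend/access/transam/clog.c, assuming BLCKSZ = 8192 */
#define BLCKSZ              8192
#define CLOG_XACTS_PER_BYTE 4                          /* 2 status bits per xact */
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)
#define NUM_CLOG_BUFFERS    32

int
main(void)
{
    /* one clog page covers 8192 * 4 = 32768 xids: spacing xids this far
     * apart already puts each one on a different page */
    printf("xids per clog page: %d\n", CLOG_XACTS_PER_PAGE);

    /* all 32 buffers together cover 32768 * 32 = 1048576 xids: spacing
     * xids this far apart guarantees the page is not resident at all */
    printf("xids in 32 buffers: %d\n", CLOG_XACTS_PER_PAGE * NUM_CLOG_BUFFERS);
    return 0;
}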

I've attached a patch that allows you to set the GUC "JJ_xid", which makes it
burn the given number of xids every time a new one is asked for. (The patch
introduces lots of other stuff as well, but I didn't feel like ripping the
irrelevant parts out; if you don't change any of the other GUCs it introduces
from their defaults, they shouldn't cause you trouble.) I think there are
other tools around that do the same thing, but this is the one I know about.
It is easy to drive the system into wraparound shutdown with this, so lowering
autovacuum_vacuum_cost_delay is a good idea.

Actually I haven't attached it, because then the commitfest app would list it
as the patch needing review; instead I've put it here:
https://drive.google.com/file/d/0Bzqrh1SO9FcERV9EUThtT3pacmM/view?usp=sharing
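
For anyone who just wants the gist without downloading it, the core idea is
roughly the following (a hypothetical sketch only, not the actual patch: the
placement inside GetNewTransactionId(), the variable name jj_xid, and the
omission of wraparound checks are all assumptions here):

/*
 * Hypothetical sketch, NOT the actual patch from the link above.  The idea:
 * inside GetNewTransactionId() (src/backend/access/transam/varsup.c), while
 * XidGenLock is still held, consume jj_xid additional XIDs after handing out
 * the real one, so that consecutive real transactions land on distant clog
 * pages.  Wraparound checks are omitted; "jj_xid" is an assumed int variable
 * backing the "JJ_xid" GUC mentioned above.
 */
for (int i = 0; i < jj_xid; i++)
{
    /* keep clog and subtrans zeroed/extended for the XIDs we skip over */
    ExtendCLOG(ShmemVariableCache->nextXid);
    ExtendSUBTRANS(ShmemVariableCache->nextXid);
    TransactionIdAdvance(ShmemVariableCache->nextXid);
}

With jj_xid = 32768 every real transaction lands on its own clog page; with
99 you get the same effect as Robert's suggestion of incrementing the XID
counter by 100.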

> I think reducing to every 100th access for transaction status being a disk
> access is sufficient to prove that there is no regression with the patch for
> the scenario asked about by Andres, or do you think it is not?
>
> Now another possibility here could be to try commenting out the fsync in the
> CLOG path to see how much it impacts the performance of this test, and then
> of the pgbench test. I am not sure there will be any impact, because even if
> every 100th transaction goes for disk access, that is still less than the
> WAL fsync which we have to perform for each transaction.
>

You mentioned that your clog is not on SSD, but surely at this scale of
hardware, the HDD the clog is on has a BBU in front of it, no?

But I thought Andres' concern was not about fsync, but about the fact that
the SLRU does linear scans (repeatedly) of the buffers while holding the
control lock? At some point, scanning more and more buffers under the lock
is going to cause more contention than scanning fewer buffers and just
evicting a page will.
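
To make that concrete, the buffer-selection path I'm referring to looks
roughly like this (a heavily simplified sketch of SlruSelectLRUPage() in
src/backend/access/transam/slru.c; the real function also handles pages with
I/O in progress, recently-used tie-breaking, and re-taking the lock, which are
omitted here):

/*
 * Simplified sketch of the linear scan discussed above.  The caller holds the
 * SLRU control lock (CLogControlLock for the clog) for the whole loop, so the
 * more buffers there are, the longer the scan runs under the lock.
 */
static int
SlruSelectLRUPage_sketch(SlruShared shared, int pageno)
{
    int     best_slot = 0;
    int     best_delta = -1;
    int     slotno;

    for (slotno = 0; slotno < shared->num_slots; slotno++)
    {
        int     delta;

        /* if the wanted page is already resident, just use that slot */
        if (shared->page_status[slotno] != SLRU_PAGE_EMPTY &&
            shared->page_number[slotno] == pageno)
            return slotno;

        /* otherwise remember the least recently used slot as the victim */
        delta = shared->cur_lru_count - shared->page_lru_count[slotno];
        if (delta > best_delta)
        {
            best_delta = delta;
            best_slot = slotno;
        }
    }

    return best_slot;   /* caller evicts this slot and reads in the new page */
}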

Cheers,

Jeff
