Re: WAL logging problem in 9.4.3?

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: daniel(at)yesql(dot)se
Cc: michael(dot)paquier(at)gmail(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)2ndquadrant(dot)com, david(at)pgmasters(dot)net, hlinnaka(at)iki(dot)fi, simon(at)2ndquadrant(dot)com, andres(at)anarazel(dot)de, masao(dot)fujii(at)gmail(dot)com, kleptog(at)svana(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WAL logging problem in 9.4.3?
Date: 2017-09-12 04:14:41
Message-ID: 20170912.131441.20602611.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hello,

At Fri, 08 Sep 2017 16:30:01 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170908(dot)163001(dot)53230385(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > >> 2017-04-13 12:11:27.065 JST [85441] t/102_vacuumdb_stages.pl
> > >> STATEMENT: ANALYZE;
> > >> 2017-04-13 12:12:25.766 JST [85492] LOG: BufferNeedsWAL: pendingSyncs
> > >> = 0x0, no_pending_sync = 0
> > >>
> > >> - lsn = XLogInsert(RM_SMGR_ID,
> > >> - XLOG_SMGR_TRUNCATE | XLR_SPECIAL_REL_UPDATE);
> > >> + rel->no_pending_sync= false;
> > >> + rel->pending_sync = pending;
> > >> + }
> > >>
> > >> It seems to me that those flags and the pending_sync data should be
> > >> kept in the context of backend process and not be part of the Relation
> > >> data...
> > >
> > > I understand that the context of "backend process" means
> > > storage.c local. I don't mind the context on which the data is,
> > > but I found only there that can get rid of frequent hash
> > > searching. For pending deletions, just appending to a list is
> > > enough and costs almost nothing, on the other hand pendig syncs
> > > are required to be referenced, sometimes very frequently.
> > >
> > >> +void
> > >> +RecordPendingSync(Relation rel)
> > >> I don't think that I agree that this should be part of relcache.c. The
> > >> syncs are tracked should be tracked out of the relation context.
> > >
> > > Yeah.. It's in storage.c in the latest patch. (Sorry for the
> > > duplicate name). I think it is a kind of bond between smgr and
> > > relation.
> > >
> > >> Seeing how invasive this change is, I would also advocate for this
> > >> patch as only being a HEAD-only change, not many people are
> > >> complaining about this optimization of TRUNCATE missing when wal_level
> > >> = minimal, and this needs a very careful review.
> > >
> > > Agreed.
> > >
> > >> Should I code something? Or Horiguchi-san, would you take care of it?
> > >> The previous crash I saw has been taken care of, but it's been really
> > >> some time since I looked at this patch...
> > >
> > > My point is hash-search on every tuple insertion should be evaded
> > > even if it happens rearely. Once it was a bit apart from your
> > > original patch, but in the latest patch the significant part
> > > (pending-sync hash) is revived from the original one.
> >
> > This patch has followed along since CF 2016-03, do we think we can reach a
> > conclusion in this CF? It was marked as "Waiting on Author”, based on
> > developments since in this thread, I’ve changed it back to “Needs Review”
> > again.
>
> I manged to reload its context into my head. It doesn't apply on
> the current master and needs some amendment. I'm going to work on
> this.

Rebased and slightly modified.

Michael's latest patch on which this patch is piggybacking seems
works perfectly. The motive of my addition is avoiding frequent
(I think specifically per tuple modification) hash accessing
occurs while pending-syncs exist. The hash contains at least 6 or
more entries.

The attached patch emits more log messages that will be removed
in the final shape to see how much the addition reduces the hash
access. As a basis of determining the worthiness of the
additional mechanism, I'll show an example of a set of queries
below.

In the log messages, "r" is relation oid, "b" is buffer number,
"hash" is the pointer to the backend-global hash table for
pending syncs. "ent" is the entry in the hash belongs to the
relation, "neg" is a flag indicates that the existing pending
sync hash doesn't have an entry for the relation.

=# set log_min_message to debug2;
=# begin;
=# create table test1(a text primary key);
> DEBUG: BufferNeedsWAL(r 2608, b 55): hash = (nil), ent=(nil), neg = 0
# relid=2608 buf=55, hash has not been created

=# insert into test1 values ('inserted row');
> DEBUG: BufferNeedsWAL(r 24807, b 0): hash = (nil), ent=(nil), neg = 0
# relid=24807, fist buffer, hash has not bee created

=# copy test1 from '/<somewhere>/copy_data.txt';
> DEBUG: BufferNeedsWAL(r 24807, b 0): hash = 0x171de00, ent=0x171f390, neg = 0
# hash created, pending sync entry linked, no longer needs hash acess
# (repeats for the number of buffers)
COPY 200

=# create table test3(a text primary key);
> DEBUG: BufferNeedsWAL(r 2608, b 55): hash = 0x171de00, ent=(nil), neg = 1
# no pending sync entry for this relation, no longer needs hash access.

=# insert into test3 (select a from generate_series(0, 99) a);
> DEBUG: BufferNeedsWAL(r 24816, b 0): hash = 0x171de00, ent=(nil), neg = 0
> DEBUG: BufferNeedsWAL: accessing hash : not found
> DEBUG: BufferNeedsWAL(r 24816, b 0): hash = 0x171de00, ent=(nil), neg = 1
# This table no longer needs hash access, (repeats for the number of tuples)

=# truncate test3;
=# insert into test3 (select a from generate_series(0, 99) a);
> DEBUG: BufferNeedsWAL(r 24816, b 0): hash = 0x171de00, ent=(nil), neg = 0
> DEBUG: BufferNeedsWAL: accessing hash : found
> DEBUG: BufferNeedsWAL(r 24816, b 0): hash = 0x171de00, ent=0x171f340, neg = 0
# This table has pending sync but no longer needs hash access,
# (repeats for the number of tuples)

The hash is required in the case of relcache invalidation. When
ent=(nil) and neg = 0 but hash != (nil), it tries hash search and
restores the previous state.

This mechanism avoids most of the hash accesses by replacing into
just following a pointer. On the other hand, the hash access
occurs only after relation truncate in the current
transaction. In other words, this won't be in effect unless any
of table truncation, copy, create as, alter table or refresing
matview occurs.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
fix-wal-level-minimal-michael-horiguchi-3.patch text/x-patch 33.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2017-09-12 04:40:00 Re: [Proposal] Allow users to specify multiple tables in VACUUM commands
Previous Message Thomas Munro 2017-09-12 03:58:52 Re: More flexible LDAP auth search filters?