Re: Improving connection scalability: GetSnapshotData()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Bruce Momjian <bruce(at)momjian(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Improving connection scalability: GetSnapshotData()
Date: 2020-08-16 18:30:24
Message-ID: 1006917.1597602624@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> 690 successful runs later, it didn't trigger for me :(. Seems pretty
> clear that there's some variable other than pure chance at work;
> otherwise that number of runs should have hit the issue, given the
> ratio of buildfarm (bf) hits to bf runs.

It seems entirely likely that there's a timing component in this, for
instance autovacuum coming along at just the right time. It's not too
surprising that some machines would be more prone to show that than
others. (Note peripatus is FreeBSD, which we've already learned has
significantly different kernel scheduler behavior than Linux.)

> My current plan is to push a bit of additional instrumentation to
> help narrow down the issue.

+1
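
(For concreteness, the kind of extra instrumentation in question might
look something like the hypothetical sketch below. This is not the
actual patch, just an illustration: log the computed snapshot bounds so
that a failing buildfarm run leaves enough state in the server log to
correlate with concurrent activity such as autovacuum.

    #include "postgres.h"
    #include "utils/snapshot.h"

    /*
     * Hypothetical debugging aid: report the bounds of a freshly
     * computed snapshot.  Called from GetSnapshotData() just before
     * returning, this would let us match a suspect snapshot against
     * the log's record of what else was running at the time.
     */
    static void
    log_snapshot_bounds(Snapshot snapshot)
    {
        elog(LOG, "snapshot bounds: xmin %u, xmax %u, takenDuringRecovery %d",
             snapshot->xmin, snapshot->xmax,
             (int) snapshot->takenDuringRecovery);
    }

The fields referenced are the real ones in SnapshotData from
utils/snapshot.h; the helper itself is made up for illustration.)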

regards, tom lane
