Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-12-09 10:00:37
Message-ID: CAA4eK1KjLP7UXY4yo3Eg5S1SnH8UAK57TV7auPu3-H9_FXqFzg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 9, 2020 at 2:56 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
>
> Further testing showed it was a file location problem, not a deletion problem.
> The worker tried to open
> base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset/16393-510.changes.0, but these
> were the files actually existing:
>
> [nm(at)power-aix 0:2 2020-12-08T13:56:35 64gcc 0]$ ls -la $(find src/test/subscription/tmp_check -name '*sharedfileset*')
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.0.sharedfileset:
> total 408
> drwx------ 2 nm usr 256 Dec 08 03:20 .
> drwx------ 4 nm usr 256 Dec 08 03:20 ..
> -rw------- 1 nm usr 207806 Dec 08 03:20 16393-510.changes.0
>
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset:
> total 0
> drwx------ 2 nm usr 256 Dec 08 03:20 .
> drwx------ 4 nm usr 256 Dec 08 03:20 ..
> -rw------- 1 nm usr 0 Dec 08 03:20 16393-511.changes.0
>
> > > I ran "make check" in a loop with only this file. I repeated it 5000
> > > times with no failure. Should we try running it in a loop on the same
> > > machine where it failed once?
> >
> > Yes, that might help. Noah, would it be possible for you to try that
>
> The problem is xidhash using strcmp() to compare keys; it needs memcmp(). For
> this to matter, xidhash must contain more than one element. Existing tests
> rarely exercise the multi-element scenario. Under heavy load, on this system,
> the test publisher can have two active transactions at once, in which case it
> does exercise multi-element xidhash. (The publisher is sensitive to timing,
> but the subscriber is not; once WAL contains interleaved records of two XIDs,
> the subscriber fails every time.) This would be much harder to reproduce on a
> little-endian system, where strcmp(&xid, &xid_plus_one)!=0. On big-endian,
> every small XID has zero in the first octet; they all look like empty strings.
>

Your analysis is correct.
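
To make the byte-level argument concrete, here is a minimal standalone sketch. It is not the code from the patch: the helper names (xid_to_be, xid_to_le, keys_equal_strcmp, keys_equal_memcmp) and XID_KEY_LEN are hypothetical, and the XIDs are serialized by hand so that both byte orders can be compared on any host. It assumes a 4-byte XID key.

```c
#include <stdint.h>
#include <string.h>

#define XID_KEY_LEN 4   /* assumed key size: one 32-bit XID */

/* Hypothetical helper: store xid most-significant byte first (big-endian).
 * out[4] is a NUL guard so strcmp() never reads past the buffer. */
static void xid_to_be(uint32_t xid, unsigned char out[XID_KEY_LEN + 1])
{
    out[0] = (unsigned char) (xid >> 24);
    out[1] = (unsigned char) (xid >> 16);
    out[2] = (unsigned char) (xid >> 8);
    out[3] = (unsigned char) xid;
    out[4] = 0;
}

/* Hypothetical helper: store xid least-significant byte first (little-endian). */
static void xid_to_le(uint32_t xid, unsigned char out[XID_KEY_LEN + 1])
{
    out[0] = (unsigned char) xid;
    out[1] = (unsigned char) (xid >> 8);
    out[2] = (unsigned char) (xid >> 16);
    out[3] = (unsigned char) (xid >> 24);
    out[4] = 0;
}

/* String comparison: stops at the first zero byte in either key. */
static int keys_equal_strcmp(const unsigned char *a, const unsigned char *b)
{
    return strcmp((const char *) a, (const char *) b) == 0;
}

/* Binary comparison: always examines all XID_KEY_LEN bytes. */
static int keys_equal_memcmp(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, XID_KEY_LEN) == 0;
}
```

With XIDs 510 and 511 (the ones in the file names above), the big-endian keys are 00 00 01 FE and 00 00 01 FF. keys_equal_strcmp() sees two empty strings and reports them equal, while keys_equal_memcmp() tells them apart. The little-endian keys FE 01 00 00 and FF 01 00 00 differ in the first byte, so both comparisons distinguish them.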

> The attached patch has the one-line fix and some test suite changes that make
> this reproduce frequently on any big-endian system. I'm currently planning to
> drop the test suite changes from the commit, but I could keep them if folks
> like them. (They'd need more comments and timeout handling.)
>

I think it is better to keep this test, since it can reliably exercise
multiple streams on the subscriber.

Thanks for working on this.

--
With Regards,
Amit Kapila.
