From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
Subject: | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions |
Date: | 2020-12-09 10:00:37 |
Message-ID: | CAA4eK1KjLP7UXY4yo3Eg5S1SnH8UAK57TV7auPu3-H9_FXqFzg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Dec 9, 2020 at 2:56 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
>
> Further testing showed it was a file location problem, not a deletion problem.
> The worker tried to open
> base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset/16393-510.changes.0, but these
> were the files actually existing:
>
> [nm(at)power-aix 0:2 2020-12-08T13:56:35 64gcc 0]$ ls -la $(find src/test/subscription/tmp_check -name '*sharedfileset*')
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.0.sharedfileset:
> total 408
> drwx------ 2 nm usr 256 Dec 08 03:20 .
> drwx------ 4 nm usr 256 Dec 08 03:20 ..
> -rw------- 1 nm usr 207806 Dec 08 03:20 16393-510.changes.0
>
> src/test/subscription/tmp_check/t_015_stream_subscriber_data/pgdata/base/pgsql_tmp/pgsql_tmp9896408.1.sharedfileset:
> total 0
> drwx------ 2 nm usr 256 Dec 08 03:20 .
> drwx------ 4 nm usr 256 Dec 08 03:20 ..
> -rw------- 1 nm usr 0 Dec 08 03:20 16393-511.changes.0
>
> > > I have executed "make check" in the loop with only this file. I have
> > > repeated it 5000 times but no failure, I am wondering shall we try to
> > > execute in the same machine in a loop where it failed once?
> >
> > Yes, that might help. Noah, would it be possible for you to try that
>
> The problem is xidhash using strcmp() to compare keys; it needs memcmp(). For
> this to matter, xidhash must contain more than one element. Existing tests
> rarely exercise the multi-element scenario. Under heavy load, on this system,
> the test publisher can have two active transactions at once, in which case it
> does exercise multi-element xidhash. (The publisher is sensitive to timing,
> but the subscriber is not; once WAL contains interleaved records of two XIDs,
> the subscriber fails every time.) This would be much harder to reproduce on a
> little-endian system, where strcmp(&xid, &xid_plus_one)!=0. On big-endian,
> every small XID has zero in the first octet; they all look like empty strings.
>
Your analysis is correct.
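
To illustrate the failure mode described above, here is a minimal standalone sketch. It is not the actual PostgreSQL code: the helper names (xid_to_be, xid_to_le, keys_equal_strcmp, keys_equal_memcmp, check_xid_keys) are hypothetical, the byte layouts are built by hand so the result does not depend on the host's endianness, and a trailing NUL is added to each buffer only so that strcmp() never reads past it. The XIDs 510 and 511 are taken from the file names in the listing above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XID_KEY_SIZE 4

/* Hypothetical helper: lay out a 32-bit XID as a big-endian key. */
static void
xid_to_be(uint32_t xid, unsigned char out[XID_KEY_SIZE + 1])
{
    out[0] = (unsigned char) (xid >> 24);
    out[1] = (unsigned char) (xid >> 16);
    out[2] = (unsigned char) (xid >> 8);
    out[3] = (unsigned char) xid;
    out[4] = 0;                 /* guard byte for strcmp() only */
}

/* Hypothetical helper: lay out a 32-bit XID as a little-endian key. */
static void
xid_to_le(uint32_t xid, unsigned char out[XID_KEY_SIZE + 1])
{
    out[0] = (unsigned char) xid;
    out[1] = (unsigned char) (xid >> 8);
    out[2] = (unsigned char) (xid >> 16);
    out[3] = (unsigned char) (xid >> 24);
    out[4] = 0;                 /* guard byte for strcmp() only */
}

/* String comparison: stops at the first zero byte in either key. */
static int
keys_equal_strcmp(const unsigned char *a, const unsigned char *b)
{
    return strcmp((const char *) a, (const char *) b) == 0;
}

/* Binary comparison: always looks at all XID_KEY_SIZE bytes. */
static int
keys_equal_memcmp(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, XID_KEY_SIZE) == 0;
}

static void
check_xid_keys(void)
{
    unsigned char a[XID_KEY_SIZE + 1];
    unsigned char b[XID_KEY_SIZE + 1];

    /* Big-endian: both keys start with 0x00, so strcmp() sees "" == "". */
    xid_to_be(510, a);
    xid_to_be(511, b);
    assert(keys_equal_strcmp(a, b));    /* wrongly treated as the same key */
    assert(!keys_equal_memcmp(a, b));   /* correctly distinguished */

    /* Little-endian: the first bytes (0xFE vs 0xFF) already differ. */
    xid_to_le(510, a);
    xid_to_le(511, b);
    assert(!keys_equal_strcmp(a, b));
    assert(!keys_equal_memcmp(a, b));
}
```

Calling check_xid_keys() passes all four assertions. With a string comparison, a hash lookup for one small XID can match the entry of another on a big-endian layout, which would be consistent with the worker opening the wrong fileset directory in the listing above.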
> The attached patch has the one-line fix and some test suite changes that make
> this reproduce frequently on any big-endian system. I'm currently planning to
> drop the test suite changes from the commit, but I could keep them if folks
> like them. (They'd need more comments and timeout handling.)
>
I think it is better to keep this test, since it would always exercise
multiple streams on the subscriber.
Thanks for working on this.
--
With Regards,
Amit Kapila.