Re: BUG #15967: Sequence generation using NEXTVAL() fails on 64bit systems

From: Christoph Ziegenberg <ziegenberg(at)web(dot)de>
To: Christoph Ziegenberg <ziegenberg(at)web(dot)de>,Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>,pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #15967: Sequence generation using NEXTVAL() fails on 64bit systems
Date: 2019-08-21 13:27:15
Message-ID: D5DC05CF-A23A-4740-B01E-2CE6C2F38F18@web.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Good news: We finally found the error.

It was a trigger on one table that updated the sequence using setval()... Of course this cannot work and I don't know why this was added (perhaps for a migration script?).

This use of setval() via the trigger was missing in the Postgres statement log (perhaps because it was a PERFORM statement?), so we didn't find it earlier.

Sorry and thanks for your help to all.

Christoph

Am 21. August 2019 07:11:16 MESZ schrieb Christoph Ziegenberg <ziegenberg(at)web(dot)de>:
>Good idea with the replacement of the nextval function. This confirms,
>that the problem is not an application or PHP problem.
>
>Here an extract from the logs, showing the error:
>
>---
>
>2019-08-21 06:57:38.812 CEST [5824] WARNUNG:  Got value 2799 from
>public.e2_ws_log_login_seq
>2019-08-21 06:57:38.812 CEST [5824] ZUSAMMENHANG:  PL/pgSQL-Funktion
>log_nextval(text) Zeile 6 bei RAISE
>2019-08-21 06:57:38.872 CEST [16236] WARNUNG:  Got value 2800 from
>public.e2_ws_log_login_seq
>2019-08-21 06:57:38.872 CEST [16236] ZUSAMMENHANG:  PL/pgSQL-Funktion
>log_nextval(text) Zeile 6 bei RAISE
>2019-08-21 06:57:38.873 CEST [1436] WARNUNG:  Got value 2801 from
>public.e2_ws_log_login_seq
>2019-08-21 06:57:38.873 CEST [1436] ZUSAMMENHANG:  PL/pgSQL-Funktion
>log_nextval(text) Zeile 6 bei RAISE
>2019-08-21 06:57:38.910 CEST [12480] WARNUNG:  Got value 2801 from
>public.e2_ws_log_login_seq
>2019-08-21 06:57:38.910 CEST [12480] ZUSAMMENHANG:  PL/pgSQL-Funktion
>log_nextval(text) Zeile 6 bei RAISE
>2019-08-21 06:57:38.915 CEST [12480] FEHLER:  doppelter Schlüsselwert
>verletzt Unique-Constraint »p_e2_ws_log_login«
>2019-08-21 06:57:38.915 CEST [12480] DETAIL:  Schlüssel
>»(log_login_id)=(2801)« existiert bereits.
>2019-08-21 06:57:38.915 CEST [12480] ANWEISUNG:  INSERT INTO
>e2_ws_log_login (log_login_id, client_id, log_action, log_datetime,
>log_ip_address, log_data) VALUES ($1, $2, $3, $4, $5, $6)
>2019-08-21 06:57:39.031 CEST [10672] WARNUNG:  Got value 2802 from
>public.e2_ws_log_login_seq
>2019-08-21 06:57:39.031 CEST [10672] ZUSAMMENHANG:  PL/pgSQL-Funktion
>log_nextval(text) Zeile 6 bei RAISE
>
>---
>
>BTW: I had to change the function a little bit:
>
>CREATE OR REPLACE FUNCTION log_nextval(_Sequence TEXT) RETURNS BIGINT
>AS
>$$
>DECLARE
>    s BIGINT;
>BEGIN
>  s := nextval(_Sequence);
>  RAISE WARNING 'Got value % from %', s, _Sequence;
>  RETURN s;
>END;
>$$ LANGUAGE 'plpgsql';
>
>
>Christoph
>
>
>Am 20. August 2019 18:06:42 MESZ schrieb Merlin Moncure
><mmoncure(at)gmail(dot)com>:
>>On Tue, Aug 20, 2019 at 9:56 AM Christoph Ziegenberg
>><ziegenberg(at)web(dot)de> wrote:
>>> New info:
>>> We could also reproduce it with Postgres 10.10 32bit on Windows
>>Server 2016 and 9.6.8 64bit on Ubuntu (18.x). So also the 32bit
>version
>>is affected, as well as 9.x versions.
>>
>>Ok, that makes sense; if we don't have arch specific reproduction I'd
>>say that lowers the likelihood of having a serious problem within
>>postgres itself (just a hunch).
>>
>>> Current test state:
>>> We (2 of my colleagues and me) are currently trying to create an
>>independent test, but at the moment it only occurrs within the
>specific
>>application.
>>
>>This of course is key. Most people reading this list are going to be
>>suspicious of your application rather than postgres internals causing
>>the issue. Having said that, let's see if we can isolate the problem.
>>
>>> Of course we search for an error in the application first, checked
>>the connection settings, tracked all statements... there is nothing
>>special, only a handful of SELECTS and UPDATES, then the sequence
>>generation and the (sometimes failing) INSERT, and another INSERT (no
>>sequence generate here) and UPDATE following. In all cases we tested
>>with the Postgres default configuration.
>>
>>> We could see that wrapping the selection of NEXTVAL() in a
>>transaction (which shouldn't have any effect?) reduced the problem in
>>one case, but it was still possible to reproduce the error with an
>>increased number of requests.
>>
>>Being in a transaction makes shouldn't make any difference with
>>regards to behavior.
>>
>>> At the moment we don't have a real idea how to proceed. Next step
>>planned is to simulate all of the aforementioned requests done by the
>>application, because they seem to influence the behavior.
>>
>>Isolating in a separate test would be proof. Sans that, we need to be
>>suspicious of your test environment. Hm, how about this: perhaps we
>>can tease the problem out with logging. One possible trick is to do
>>this:
>>CREATE OR REPLACE FUNCTION log_nextval(_Sequence TEXT, s OUT) RETURNS
>>BIGINT AS
>>$$
>>BEGIN
>> s := nextval(_Sequence);
>> RAISE WARNING 'Got value % from %', s, _Sequence;
>> RETURN s;
>>END;
>>$$ LANGUAGE PLGSQL;
>>
>>Then, as an experiment, you can move all your nextval() generation to
>>this function (note: there will be significant performance hit) and
>>attempt to reproduce the problem in your application. If when you
>>did, we could then interrogate the database log to see if the same
>>value >1 times, this would be a smoking gun. If you did not see the
>>value returned >1 times, maybe turn on full statement logging (also a
>>big performance hit) and see where your application might be jacking
>>things up.
>>
>>> Background:
>>> The application is written in PHP (different versions used in the
>>tests, 7.2 - 7.4 I guess), runs on Apache/IIS (FastCGI) and uses the
>>PDO extension for the communication with Postgres.
>>
>>merlin

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Merlin Moncure 2019-08-21 13:39:12 Re: BUG #15967: Sequence generation using NEXTVAL() fails on 64bit systems
Previous Message Keith Fiske 2019-08-21 13:24:38 Re: BUG #15954: Unable to alter partitioned table to set logged