Re: pg_advisory_locks in a multithreaded application context

From: Vincent Ficet <jean-vincent(dot)ficet(at)bull(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: "pgsql-novice(at)postgresql(dot)org" <pgsql-novice(at)postgresql(dot)org>
Subject: Re: pg_advisory_locks in a multithreaded application context
Date: 2011-07-05 07:13:18
Message-ID: 4E12B98E.6000504@bull.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Hello Merlin,

Merlin Moncure wrote:
> On Thu, Jun 30, 2011 at 8:49 AM, Vincent Ficet
> <jean-vincent(dot)ficet(at)bull(dot)net> wrote:
>
>> Merlin Moncure wrote:
>>
>>> On Thu, Jun 30, 2011 at 3:30 AM, Vincent Ficet
>>> <jean-vincent(dot)ficet(at)bull(dot)net> wrote:
>>>
>>>
>>>> Merlin Moncure wrote:
>>>>
>>>>
>>>>> On Wed, Jun 29, 2011 at 10:11 AM, Vincent Ficet
>>>>> <jean-vincent(dot)ficet(at)bull(dot)net> wrote:
>>>>>
>>>>>
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm having some trouble trying to use postgresql locks. After trying
>>>>>> several options and reading the postgresql online documentation, I still
>>>>>> can't figure out what I'm doing wrong. Here's the use case:
>>>>>>
>>>>>> A multithreaded application collecting adapter firmwares on a network
>>>>>> loads data into the following table:
>>>>>>
>>>>>> CREATE TABLE firmware (
>>>>>> id SERIAL NOT NULL,
>>>>>> type CHARACTER VARYING(32),
>>>>>> version CHARACTER VARYING(30),
>>>>>> build_id INTEGER,
>>>>>> date CHARACTER VARYING(25),
>>>>>> ps_id CHARACTER VARYING(25)
>>>>>> );
>>>>>>
>>>>>> Typically, there are a few hundred adapters, but only 5 firmwares (many
>>>>>> adapters should have the same firmware if the sysadmins did their jobs
>>>>>> properly ;-) ).
>>>>>>
>>>>>> Only a single entry is required per firmware (many separate adapters can
>>>>>> share the same firmware by pointing to the appropriate firmware id field).
>>>>>> To make sure that only one entry is created per firmware, I use the
>>>>>> following trigger:
>>>>>>
>>>>>> CREATE TRIGGER firmware_pre_insert_trigger
>>>>>> BEFORE INSERT ON firmware
>>>>>> FOR EACH ROW
>>>>>> EXECUTE PROCEDURE firmware_pre_insert_trigger_cb();
>>>>>>
>>>>>> CREATE FUNCTION firmware_pre_insert_trigger_cb() RETURNS TRIGGER
>>>>>> AS $_$
>>>>>> DECLARE
>>>>>> fw_id INT;
>>>>>> BEGIN
>>>>>>
>>>>>> SELECT fw.id FROM firmware fw INTO fw_id
>>>>>> WHERE (fw.type = new.type AND
>>>>>> fw.version = new.version AND
>>>>>> fw.build_id = new.build_id AND
>>>>>> fw.date = new.date AND
>>>>>> fw.ps_id = new.ps_id);
>>>>>>
>>>>>> IF fw_id IS NULL THEN
>>>>>> -- create the non-existing firmware
>>>>>> RETURN new;
>>>>>> ELSE
>>>>>> -- skip firmware which already exists
>>>>>> RETURN NULL;
>>>>>> END IF;
>>>>>>
>>>>>> END;
>>>>>> $_$
>>>>>> LANGUAGE PLPGSQL;
>>>>>>
>>>>>> When a thread wishes to add a firmware after discovering one adapter, it
>>>>>> executes the following code:
>>>>>>
>>>>>> PERFORM pg_advisory_lock(1);
>>>>>>
>>>>>> INSERT INTO firmware (type, version, build_id, date, ps_id)
>>>>>> VALUES (chip_type, firm_version, firm_build_id, firm_date, firm_ps_id);
>>>>>>
>>>>>> PERFORM pg_advisory_unlock(1);
>>>>>>
>>>>>>
>>>>>>
>>>>> Advisory lock is not going to work here. You are releasing the lock
>>>>> before the transaction resolves and that leaves a window for second
>>>>> transaction to do the 'select' and not see the data because it hasn't
>>>>> committed yet.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Unfortunately, I still get duplicated entries using advisory locks, and
>>>>>> they don't seem to lock anything at all...
>>>>>>
>>>>>> On the other hand, If I use builtin locks as follows without the
>>>>>> trigger, I get deadlocks in the server logs:
>>>>>>
>>>>>> LOCK TABLE firmware IN SHARE MODE;
>>>>>>
>>>>>>
>>>>>>
>>>>> well a sharelock certainly isn't going to work. share blocks row
>>>>> exclusive (see http://www.postgresql.org/docs/current/static/explicit-locking.html#LOCKING-TABLES),
>>>>> so two transactions can simultaneously get a share lock and wait for
>>>>> each other to to resolve to get the exclusive lock on a row.
>>>>> 'EXCLUSIVE' would be better (although that would effectively serialize
>>>>> the transactions).
>>>>>
>>>>>
>>>>>
>>>>>
>>>> Thanks for the tip.
>>>>
>>>> It now works fine using a SHARE UPDATE EXCLUSIVE lock in the PRE INSERT
>>>> trigger. This does not conflict with the ROW EXCLUSIVE lock which is
>>>> implicitely taken by the INSERT statement in add_firmware() function.
>>>>
>>>> BTW I think there might be a thread safety issue in postgres, as I often
>>>> get a segfault when deadlocks occur (prior to applying the fix I just
>>>> described). For example, the following deadlock situation:
>>>>
>>>> DETAIL: Process 7643 waits for RowExclusiveLock on relation 21060
>>>> of database 20535; blocked by process 7593.
>>>> Process 7593 waits for RowExclusiveLock on relation 21060 of
>>>> database 20535; blocked by process 7643.
>>>> Process 7643: SELECT
>>>> add_firmware('0x08003800013731aa','hca','512.1792.0',0,'0920-10-06','BL_0010030001000');
>>>> Process 7593: SELECT
>>>> add_firmware('0x08003800013734b0','hca','512.1792.0',0,'0920-10-06','BL_0010030001000');
>>>> HINT: See server log for query details.
>>>> QUERY: INSERT INTO firmware (type, version, build_id, date, ps_id)
>>>> VALUES (chip_type, firm_version, firm_build_id,
>>>> firm_date, firm_ps_id)
>>>> CONTEXT: PL/pgSQL function "add_firmware" line 31 at SQL statement
>>>>
>>>> Triggers:
>>>>
>>>> #0 0x0000003c46725742 in __strncpy_ssse3 () from /lib64/libc.so.6
>>>> #1 0x00007ffff5f1617f in pqParseInput3 (conn=0x7ffff0000da0) at
>>>> fe-protocol3.c:209
>>>> #2 0x00007ffff5f0cae7 in parseInput (conn=0x7ffff0000da0) at fe-exec.c:1493
>>>> #3 0x00007ffff5f0cc01 in PQgetResult (conn=0x7ffff0000da0) at
>>>> fe-exec.c:1568
>>>> #4 0x00007ffff5f0d26b in PQexecFinish (conn=0x7ffff0000da0) at
>>>> fe-exec.c:1807
>>>> #5 0x00007ffff5f0cee4 in PQexec (conn=0x7ffff0000da0,
>>>> query=0x7ffff612d340 "SELECT 1") at fe-exec.c:1648
>>>> #6 0x00007ffff612c344 in dbd_ping () from
>>>> /home/vficet/X86_64/usr/lib/dbd/libdbdpgsql.so
>>>>
>>>> Stack traces often occur in different flavours, but can always be
>>>> correlated with messages such as:
>>>>
>>>> DBI error -9: unexpected field count in "D" message
>>>> DBI error -9: message contents do not agree with length in message type
>>>> "T" server sent data ("D" message) without prior row description ("T"
>>>> message)
>>>>
>>>> I'll try to set up a minimalist reproducer and see what can be done for
>>>> that.
>>>> Note that this happens with both postgresql 8.4.7 and 9.0.4.
>>>>
>>>>
>>> is the segfault happening in the client?
>>>
>> Yes, it's on the client side.
>>
>>> it looks like you have a
>>> thread safety issue either in dbd or in your code...
>>>
>>>
>> OK, I'll try to set up a minimalist reproducer. Is there a postgresql
>> non regression test to check for thread safety on the client side ? What
>> do the postgresql developpers use for that ?
>>
>
>
> well, you mainly need to make sure libpq is built for thread safety
> (ldd libpq and check for libpthread) and that the code using libpq is
> not accessing the same connection from > 1 thread at one time -- it's
> up to you to synchronize access to a connection among threads.
>
Yes you were right, this bug was caused by several threads accessing the
same connection in some error handling parts of our code, hence the
segfault.

Thanks for your help in this matter,

Cheers,

Vincent
> merlin
>
>

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Vincent Ficet 2011-07-05 11:01:26 Re: Bit Varying vs New Column
Previous Message Jean-Yves F. Barbier 2011-07-04 19:41:25 Re: Bit Varying vs New Column