Re: pg_advisory_lock problem

From: Rémi Cura <remi(dot)cura(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, John R Pierce <pierce(at)hogranch(dot)com>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_advisory_lock problem
Date: 2014-08-20 08:29:32
Message-ID: CAJvUf_vr909Urm-LwhQyRY7VJX7jU=r98C87dvTJGQHnq8jtNA@mail.gmail.com
Lists: pgsql-general

Hey,
just a quick follow-up for archive:

problem solved with pg_try_advisory_lock().
Using LOCK TABLE in ROW EXCLUSIVE mode doesn't work (same duplicate key value error);
using ACCESS EXCLUSIVE mode works (no error) but stops the parallelism (one
thread has to wait for another before going on).
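For the archive, the working pattern looks roughly like this (a sketch only; table, column, and function names are invented):

```sql
-- Take the try-lock in the row-selecting CTE; rows whose lock is already
-- held by another backend are silently skipped instead of blocked on.
WITH candidates AS (
    SELECT id, payload
    FROM   incoming                          -- hypothetical source table
    WHERE  pg_try_advisory_lock(1, id)       -- true only if we got the lock
)
SELECT my_upsert_function(id, payload)       -- upsert only the rows we locked
FROM   candidates;

-- Advisory locks are session-level: release them explicitly afterwards,
-- e.g. with pg_advisory_unlock(1, id) per row, or pg_advisory_unlock_all().
```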

The optimal solution would be to slice the big CTE into several successive
temp table creations, using a procedural language that allows control of
transactions.
This would allow controlling visibility and using the powerful MVCC features.
(I'm unwilling to do that.)

Cheers everybody and many thanks for the help,
Rémi-C

2014-08-12 14:45 GMT+02:00 Merlin Moncure <mmoncure(at)gmail(dot)com>:

> On Tue, Aug 12, 2014 at 5:07 AM, Rémi Cura <remi(dot)cura(at)gmail(dot)com> wrote:
> >
> >
> >
> > 2014-08-11 22:48 GMT+02:00 Kevin Grittner <kgrittn(at)ymail(dot)com>:
> >
> >> Rémi Cura <remi(dot)cura(at)gmail(dot)com> wrote:
> >>
> >> > as you (both?) suggested it works using advisory lock used at the
> >> > beginning and end of the transaction. This way there is no upsert
> >> > at all if the element is locked? (I used a general advisory lock but
> >> > in the same way as a transactional lock.)
> >>
> >> This is too vague to comment on.
> >>
> >> > The issue in this case is simple : I have to use about 100k
> >> > advisory locks, which is a big memory requirement for my hardware
> >> > :-(
> >>
> >> ... and that doesn't seem to make any sense. Either you are not
> >> understanding advisory locks or you are doing something very, very
> >> unusual.
> >>
> >> > Merlin I'm afraid I don't understand what is "vanilla" LOCK
> >> > TABLE.
> >>
> >> See the LOCK TABLE command.
> >>
> >> http://www.postgresql.org/docs/current/interactive/sql-lock.html
> >>
> >>
> http://www.postgresql.org/docs/current/interactive/explicit-locking.html
> >>
> >> > I can't really use a lock table because each query upserts
> >> > sequentially into 3 tables, doing lots of computing in between.
> >>
> >> Now *that* I understand. :-) It's not an unusual requirement,
> >> but can be a challenge when using snapshot isolation (where writes
> >> don't block reads and reads don't block anything). There are two
> >> main approaches -- introduce blocking to serialize some of the
> >> operations, or use the SERIALIZABLE transaction isolation level to
> >> ensure that the behavior of all concurrent transactions is
> >> consistent with the behavior you would see if they were run one at
> >> a time. The latter approach doesn't introduce any new blocking,
> >> but it can cause transactions to get an ERROR with a SQLSTATE of
> >> 40001 at just about any point, so you need to be prepared to
> >> recognize that and retry those transactions from the beginning (not
> >> just the last statement of the transaction), ignoring any data read
> >> during the failed attempt.
> >>
> >> You may want to read the entire chapter on concurrency control:
> >>
> >> http://www.postgresql.org/docs/current/interactive/mvcc.html
> >>
> >> If you are considering using SERIALIZABLE transactions, you should
> >> probably review the examples in the Wiki, to get an idea of how it
> >> behaves in various cases:
> >>
> >> http://wiki.postgresql.org/wiki/SSI
> >>
> >> > I use parallel query to compute faster (load dividing). I guess
> >> > it would be very slow with about 8 parallel queries with locks.
> >>
> >> Well, if you introduce blocking you reduce your parallelism, but if
> >> you use serializable transactions and there are actually a lot of
> >> conflicts you can see poor performance because of the errors
> >> rolling back transactions and the need to retry them from the
> >> start. The techniques used to implement serializable transactions
> >> in PostgreSQL are basically a refinement of the Optimistic
> >> Concurrency Control (OCC) techniques, but generally with far fewer
> >> retries needed -- the point being that it optimistically assumes
> >> that there will not be a conflict so that concurrency is better,
> >> but has to cancel things if that optimism proves to be unfounded.
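The retry pattern described above looks roughly like this from the client's point of view (a sketch; the table and column names are invented):

```sql
-- Each attempt runs the whole transaction from the top.
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 1;    -- data read here must be
UPDATE accounts SET balance = balance - 10    -- discarded if we retry
 WHERE id = 1;
COMMIT;

-- If COMMIT (or any earlier statement) fails with SQLSTATE 40001
-- (serialization_failure), the client must ROLLBACK and re-run the
-- entire BEGIN ... COMMIT block, not just the statement that failed.
```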
> >>
> >> To make related changes to multiple tables and maintain coherent
> >> data, you will probably need to do one or the other.
> >>
> >> --
> >> Kevin Grittner
> >> EDB: http://www.enterprisedb.com
> >> The Enterprise PostgreSQL Company
> >
> >
> >
> > Hey, thanks for your detailed answer.
> >
> > The particularity here is that I use a big query with CTEs instead of a more
> > procedural way.
> > I do sophisticated geometric computing using PostGIS. I guess it is a hack
> > of both Postgres and PostGIS.
> >
> > Let me explain better the pg_advisory lock uses I have tried.
> >
> > First, classic use of pg_advisory, not working:
> > CTE_1 (find what rows will be upserted in table_1)
> > CTE_2 (find what rows will be upserted in table_2)
> > CTE_3 (find what rows will be upserted in table_3)
> > CTE_4 (compute the result to be upserted into table_1)
> > CTE_5 (upsert into table_1 using custom upsert_function)
> > CTE_6 (compute the result to be upserted into table_2)
> > CTE_7 (upsert into table_2 using custom upsert_function)
> > CTE_8 (compute the result to be upserted into table_3)
> > CTE_9 (upsert into table_3 using custom upsert_function)
> > CTE_10 (end of query)
> > each of the upserting functions is plpgsql and does something like:
> > pg_advisory_lock(table_number, id of row to be upserted)
> > with updating AS (update table)
> > insert into table if not updated
> > pg_advisory_unlock(table_number,id of row to be upserted)
> >
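The pseudocode above corresponds to something like the following (a sketch with invented table and column names; this is the variant that fails under concurrency, and note there was no built-in `INSERT ... ON CONFLICT` in the 9.x releases current at the time):

```sql
-- Hypothetical per-row upsert, serialized by an advisory lock on
-- (table_number, row id). The lock works, but the UPDATE still cannot
-- see rows inserted by other, not-yet-visible transactions.
CREATE OR REPLACE FUNCTION upsert_table_1(p_id int, p_val text)
RETURNS void AS $$
BEGIN
    PERFORM pg_advisory_lock(1, p_id);            -- (table_number, row id)
    UPDATE table_1 SET val = p_val WHERE id = p_id;
    IF NOT FOUND THEN                             -- no visible row: insert
        INSERT INTO table_1 (id, val) VALUES (p_id, p_val);
    END IF;
    PERFORM pg_advisory_unlock(1, p_id);
END;
$$ LANGUAGE plpgsql;
```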
> > According to what the list said, it doesn't work because of visibility
> > issues: the locking works, so we know the processes will upsert the same
> > thing sequentially. However, it still fails because each process has no
> > visibility of the inserts done by the others. So when the second process
> > upserts the same thing, it tries to insert, and we get 2 inserts for the
> > same row.
> >
> > Second, non-classic use of pg_advisory, working but too taxing on
> > resources:
> > CTE_1 (find what rows will be upserted in table_1,
> > pg_try_advisory_lock(1,id1) )
> > CTE_2 (find what rows will be upserted in table_2,
> > pg_try_advisory_lock(2,id2) )
> > CTE_3 (find what rows will be upserted in table_3,
> > pg_try_advisory_lock(3,id3) )
> > CTE_4 (compute the result to be upserted into table_1)
> > CTE_5 (upsert into table_1 using custom upsert_function, only if
> > pg_try_advisory_lock was true )
> > CTE_6 (compute the result to be upserted into table_2)
> > CTE_7 (upsert into table_2 using custom upsert_function, only if
> > pg_try_advisory_lock was true)
> > CTE_8 (compute the result to be upserted into table_3)
> > CTE_9 (upsert into table_3 using custom upsert_function, only if
> > pg_try_advisory_lock was true)
> > CTE_10 (release locks for (1,id1), (2,id2), (3,id3))
> > CTE_11 (end of query)
> >
> > This works because we don't even try to upsert a row that is already being
> > taken care of by another process, so there is no visibility issue.
> > The problem is with resources: I have to hold about 100k locks for several
> > minutes. It greatly increases memory use.
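Held advisory locks occupy slots in the server's shared lock table, whose size is fixed at startup at roughly max_locks_per_transaction * (max_connections + max_prepared_transactions) entries. Holding on the order of 100k locks therefore means raising the setting, e.g.:

```
# postgresql.conf (change requires a server restart)
# Sized so the shared lock table can hold on the order of 100k locks;
# the exact value depends on max_connections.
max_locks_per_transaction = 1024    # default is 64
```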
> >
> > The difference between the 2 is essentially: in the first case we lock
> > each individual upsert statement; in the second case we lock at the query
> > level.
> >
> > On the bright side, I don't need to wait for the insert into table_1 to
> > compute table_2, and the same for table_2 and table_3.
> > However, I can't control when the CTE upserting into each table is executed
> > (as far as I know, there is no guarantee of execution order when using
> > multiple CTEs).
> >
> > Now about locking tables: I guess it wouldn't work, for the same reasons,
> > at the individual upsert statement level.
> > If I use it at the whole-query level, other parallel processes will be
> > waiting for the whole query to finish before being allowed to go on?
> > Moreover, whatever LOCK I use will be within the same big transaction, so
> > the only solution may be:
> >
> > lock table_1,table_2,table_3;
> > execute big query;
> > unlock table_1, table_2, table_3;
> >
> > I didn't find any "vanilla" lock. Is "vanilla" some kind of Postgres
> > slang?
>
> 'vanilla' is common English slang for 'regular'. Basically I was
> referring to the built-in LOCK TABLE statement. LOCK has no
> corresponding 'UNLOCK'; you lock a table or tables, and the locks are
> released for you at the end of the transaction. What you are trying to
> do, but isn't really possible, is to 'temporarily' lock an object
> inside a transaction. Transactions group related work and must be
> handled as one unit.
>
> The default lock of LOCK TABLE is very heavy but you can pass an
> optional lockmode if you want to allow readers. Definitely read this
> chapter: http://www.postgresql.org/docs/9.3/static/explicit-locking.html.
> The point I was making is that using advisory locks to block access to
> a table usually isn't a good idea; just use the built-in locks.
>
> merlin
>
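The table-lock approach Merlin describes would look roughly like this (a sketch; SHARE ROW EXCLUSIVE is one mode that blocks concurrent writers while still allowing plain reads, and there is no UNLOCK statement):

```sql
BEGIN;
-- All three locks are acquired here and held until COMMIT or ROLLBACK;
-- they cannot be released earlier within the transaction.
LOCK TABLE table_1, table_2, table_3 IN SHARE ROW EXCLUSIVE MODE;

-- ... the big upserting query goes here ...

COMMIT;   -- locks are released automatically at end of transaction
```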
