Re: Performance improvements of INSERTs to a partitioned table

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: "Kato, Sho" <kato-sho(at)jp(dot)fujitsu(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Performance improvements of INSERTs to a partitioned table
Date: 2018-11-09 02:19:38
Message-ID: CAKJS1f-7oqb9OEo6z6aQCYqYNUX36f2=XESGv1GmHZGWi1+grA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7 November 2018 at 21:31, Kato, Sho <kato-sho(at)jp(dot)fujitsu(dot)com> wrote:
> AFAIK, When CREATE INDEX on a partition and INSERT to a parent table are
> executed at the same time, this patch causes deadlock.
>
> * partitions information
>
> Partition key: RANGE (a)
> Partitions: a_1 FOR VALUES FROM (1) TO (100),
> a_2 FOR VALUES FROM (100) TO (200)
>
> T1: create index a_1_ix on a_1(a);
> T2: insert into a values(101),(1); locking a_2 and waiting releasing a_1’s
> lock
> T1: create index a_2_ix on a_2(a); ERROR: deadlock detected
>
> I think this situation does not mean unsafe because similar situation will
> occurs at no partitioned tables and DBMS could not prevent this situation.
>
> But, I'm not sure this behavior is correct.

That's one case that will deadlock with the 0002 patch in [1]. Another
command that takes a conflicting lock on the partition only is
TRUNCATE. Many other commands that normally take an AEL can't be run
on a partition, for example, ALTER TABLE ADD COLUMN.

The same would have deadlocked on master if you'd created the indexes
in the reverse order, but I guess the point is that today there's some
defined order that you can create the indexes in that won't cause a
deadlock, but with [1] there is no defined order since the tables are
locked in order that tuples are routed to them.

Robert pointed out something interesting in [2] about UPDATEs of a
partition key perhaps routing a tuple to a partition that's been
excluded by constraint exclusion, so the lock is not taken initially,
but would be taken later in ExecSetupPartitionTupleRouting(). I ran
through that scenario today and discovered it can't happen as excluded
partitions are still in the rangetable, so are still locked either in
the planner or by AcquireExecutorLocks() and the order those are
locked in is well defined from the planner. However, I believe that's
something Amit plans to change in [3], where he proposes to only lock
partitions that survive partition pruning (among many other things).

There are probably a few things we could do here:

a. Have all DDL on partitions obtain the same lock level on all
ancestor partitioned tables taking a lock on the root first and
working down to the leaf.
b. Just lock what we touch and increase the likelihood of deadlocks occurring.
c. Reject [1] and [3] because we don't want to increase the chances of
deadlocks.

I'm highly against doing 'c' since both [1] and [3] are very useful
patches which scale partitioning to work well with very large numbers
of partitions. At the moment we just can bearly do half a dozen
partitions before the performance starts going south. 'a' is not that
ideal a solution either as it means that anyone doing CREATE INDEX or
TRUNCATE on a partition would conflict with queries that are querying
an unrelated part of the partitioned table. While I don't really like
'b' either, I wonder how many people are going to hit deadlocks here.
To hit that, I believe that you'd have to be doing the TRUNCATE or
CREATE INDEX inside a transaction and to hit it where you couldn't hit
it before you'd have had to carefully written your CREATE INDEX /
TRUNCATE statements to ensure they're executed in partition order. I'm
unsure how likely that is, but I certainly can't rule out that doing
'b' won't upset anyone.

Perhaps a GUC is needed to choose between 'a' and 'b'?

I'm adding Amit to this email because [3] is his and Robert because
he's recently been talking about partition locking too.

[1] https://www.postgresql.org/message-id/CAKJS1f-vYAHqqaU878Q4cdZYHwPcv1J_C-mG%3DCs2UwhsD6cqwg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CA%2BTgmoZGJsy-nRFnzurhZQJtHdDh5fzJKvbvhS0byN6_46pB9Q%40mail.gmail.com
[3] https://commitfest.postgresql.org/20/1778/

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-11-09 02:47:31 Removal of unnecessary CommandCounterIncrement() when doing ON COMMIT DELETE ROWS
Previous Message Thomas Munro 2018-11-09 02:04:39 Re: Copy data to DSA area