Re: Adding support for Default partition in partitioning

From: Rahila Syed <rahilasyed90(at)gmail(dot)com>
To: Keith Fiske <keith(at)omniti(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Adding support for Default partition in partitioning
Date: 2017-04-06 11:30:24
Lists: pgsql-hackers


Thanks a lot for testing and reporting this. Please find attached an
updated patch with the fix. The patch also contains a fix for the
operator used when building the default partition's constraint
expression; that issue was reported off-list by Amit Langote.

Thank you,
Rahila Syed

On Thu, Apr 6, 2017 at 12:21 AM, Keith Fiske <keith(at)omniti(dot)com> wrote:

> On Wed, Apr 5, 2017 at 11:19 AM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
>> On Wed, Apr 5, 2017 at 5:57 AM, Rahila Syed <rahilasyed90(at)gmail(dot)com>
>> wrote:
>> >>Could you briefly elaborate why you think the lack of global index
>> >>support would be a problem in this regard?
>> > I think the following can happen if we allow rows satisfying a newly
>> > added partition to remain in the default partition until a background
>> > process moves them. Consider a scenario where the partition key is a
>> > primary key and the data in the default partition has not yet been
>> > moved into the newly added partition. If new data with the same key is
>> > now inserted into the new partition, that key is duplicated. A scan of
>> > the partitioned table for that key will then fetch both rows, since
>> > both the default and the new partition are scanned (the rows not having
>> > been moved). Unless we have global indexes for partitioned tables,
>> > there is a chance of data duplication between the default partition and
>> > a child table added after it.
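
The hazard described above, which motivates disallowing a new partition
while conflicting rows remain in the default, can be sketched roughly as
follows. Table and key names are invented for this sketch, and the DEFAULT
spelling follows the syntax as eventually committed, which may differ from
the patch under discussion:

```sql
-- Sketch only: in PostgreSQL 10, uniqueness can only be enforced
-- per partition, not across the whole partitioned table.
CREATE TABLE t (id int NOT NULL) PARTITION BY LIST (id);
CREATE TABLE t_default PARTITION OF t DEFAULT;
INSERT INTO t VALUES (1);                        -- lands in t_default

-- Suppose a new partition for id = 1 were allowed without first
-- moving the existing row out of the default:
CREATE TABLE t_1 PARTITION OF t FOR VALUES IN (1);
INSERT INTO t VALUES (1);                        -- lands in t_1

SELECT * FROM t WHERE id = 1;                    -- both copies are visible
```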
>> Yes, I think it would be completely crazy to try to migrate the data
>> in the background:
>> - The migration might never complete because of a UNIQUE or CHECK
>> constraint on the partition to which rows are being migrated.
>> - Even if the migration eventually succeeded, such a design abandons
>> all hope of making INSERT .. ON CONFLICT DO NOTHING work sensibly
>> while the migration is in progress, unless the new partition has no
>> UNIQUE constraints.
>> - Partition-wise join and partition-wise aggregate would need to have
>> special case handling for the case of an unfinished migration, as
>> would any user code that accesses partitions directly.
>> - More generally, I think users expect that when a DDL command
>> finishes execution, it's done all of the work that there is to do (or
>> at the very least, that any remaining work has no user-visible
>> consequences, which would not be the case here).
>> IMV, the question of whether we have efficient ways to move data
>> around between partitions is somewhat separate from the question of
>> whether partitions can be defined in a certain way in the first place.
>> The problems that Keith refers to upthread already exist for
>> subpartitioning; you've got to detach the old partition, create a new
>> one, and then reinsert the data. And for partitioning an
>> unpartitioned table: create a replacement table, insert all the data,
>> substitute it for the original table. The fact that we have these
>> limitations is not good, but we're not going to rip out partitioning
>> entirely because we don't have clever ways of migrating the data in
>> those cases, and the proposed behavior here is not any worse.
>> Also, waiting for those problems to get fixed might be waiting for
>> Godot. I'm not really all that sanguine about our chances of coming
>> up with a really nice way of handling these cases. In a design
>> based on table inheritance, you can leave it murky where certain data
>> is supposed to end up and migrate it on-line and you might get away
>> with that, but a major point of having declarative partitioning at all
>> is to remove that sort of murkiness. It's probably not that hard to
>> come up with a command that locks the parent and moves data around via
>> full table scans, but I'm not sure how far that really gets us; you
>> could do the same thing easily enough with a sequence of commands
>> generated via a script. And being able to do this in a general way
>> without a full table lock looks pretty hard - it doesn't seem
>> fundamentally different from trying to perform a table-rewriting
>> operation like CLUSTER without a full table lock, which we also don't
>> support. The executor is not built to cope with any aspect of the
>> table definition shifting under it, and that includes the set of child
>> tables which are partitions of the table mentioned in the query. Maybe
>> the executor can be taught to survive such definitional changes at
>> least in limited cases, but that's a much different project than
>> allowing default partitions.
>> --
>> Robert Haas
>> EnterpriseDB:
>> The Enterprise PostgreSQL Company
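
The manual sequence Robert describes for splitting an existing partition
can be sketched as follows. Table names and bounds are invented for the
sketch; the DETACH PARTITION and PARTITION OF DDL shown exists in
PostgreSQL 10:

```sql
-- Sketch: replace one yearly partition with two half-year partitions.
-- The whole sequence holds strong locks on the parent until COMMIT.
BEGIN;
ALTER TABLE measurements DETACH PARTITION measurements_2017;
CREATE TABLE measurements_2017h1 PARTITION OF measurements
    FOR VALUES FROM ('2017-01-01') TO ('2017-07-01');
CREATE TABLE measurements_2017h2 PARTITION OF measurements
    FOR VALUES FROM ('2017-07-01') TO ('2018-01-01');
INSERT INTO measurements SELECT * FROM measurements_2017;  -- re-route rows
DROP TABLE measurements_2017;
COMMIT;
```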
> Confirmed that the v5 patch works with the examples given in the original
> post, but it segfaulted when I tried the examples from my blog post (taken
> from the documentation at the time I wrote it):
> postgresql-10-built-in-partitioning/
> keith(at)keith=# drop table cities;
> Time: 6.055 ms
> keith(at)keith=# CREATE TABLE cities (
> city_id bigserial not null,
> name text not null,
> population int
> ) PARTITION BY LIST (initcap(name));
> Time: 7.130 ms
> keith(at)keith=# CREATE TABLE cities_west
> PARTITION OF cities (
> CONSTRAINT city_id_nonzero CHECK (city_id != 0)
> ) FOR VALUES IN ('Los Angeles', 'San Francisco');
> Time: 6.690 ms
> keith(at)keith=# CREATE TABLE cities_default
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: WARNING:
> terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the
> current transaction and exit, because another server process exited
> abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> Failed.
> Time: 387.887 ms
> After reading the responses, I think I'd be fine with Rahila's approach of
> disallowing the new child until the conflicting data is removed from the
> default, if that would allow this to be included in v10. As was mentioned,
> there just doesn't seem to be a way to handle the data conflicts cleanly
> at this time, and I think the value of a default that catches accidental
> data, rather than returning an error, is worth it. It also gives a
> slightly easier migration path than having to migrate to a completely new
> table. Any chance this could be adapted for range partitioning as well?
> I'd be happy to create some pgtap tests with pg_partman for this, to make
> sure it works.
> The only issue I see with this, and I'm not sure it really is an issue, is
> what happens to the default's constraint clause when thousands of
> partitions start getting added. From what I gather, the default's
> constraint is built as the cumulative negation of all the other children's
> constraints. I don't understand the code well enough to see what it's
> actually doing, but if there are no gaps, is the method smart enough to
> aggregate the child constraints into a simpler constraint that is just
> outside the current min/max boundaries? If so, serial/time range
> partitioning should typically work out fine, since gaps are rare. This
> seems more of an issue for list partitioning, where each child is a
> distinct value or range of values that are completely arbitrary. Won't the
> check and re-evaluation of the default's constraint just get worse and
> worse as more children are added? Is there really even a need for the
> default to carry an opposite constraint like this? I'm not sure how the
> planner handles partitioning now, but wouldn't it be better to first check
> all non-default children for a match, the same as it does without a
> default, and only then route to the default if one is declared? The
> default would then accept any data, so I don't see the need for the
> constraint unless the current implementation requires it. If that's the
> case, could that be changed?
> Keith
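
For reference, the implied constraint on a list default partition is,
roughly, the negation of the union of all sibling partitions' value lists.
A sketch of the implied expression using the cities example above (this is
an illustration of the derived constraint, not syntax the patch adds):

```sql
-- After: CREATE TABLE cities_west ...
--            FOR VALUES IN ('Los Angeles', 'San Francisco');
-- the default partition behaves as if it carried roughly:
--   CHECK (initcap(name) IS NULL
--          OR initcap(name) NOT IN ('Los Angeles', 'San Francisco'))
-- and the expression grows, with existing default rows re-checked,
-- each time a sibling partition is added.
```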

Attachment Content-Type Size
default_partition_v6.patch application/x-download 20.2 KB
