RE: Skip partition tuple routing with constant partition key

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: RE: Skip partition tuple routing with constant partition key
Date: 2022-07-28 08:40:47
Message-ID: OS0PR01MB5716317E6AA7C278B9077AA994969@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, July 28, 2022 10:59 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On Thu, 28 Jul 2022 at 00:50, Amit Langote <amitlangote09(at)gmail(dot)com>
> wrote:
> > So, in a way the caching scheme works for LIST partitioning only if
> > the same value appears consecutively in the input set, whereas it does
> > not for *a set of* values belonging to the same partition appearing
> > consecutively. Maybe that's a reasonable restriction for now.
>
> I'm not really seeing another cheap enough way of doing that. Any LIST
> partition could allow any number of values. We've only space to record
> 1 of those values by way of recording which element in the PartitionBound that
> it was located.
>
> > if (part_index < 0)
> > - part_index = boundinfo->default_index;
> > + {
> > + /*
> > + * Since we don't do caching for the default partition or failed
> > + * lookups, we'll just wipe the cache fields back to their initial
> > + * values. The count becomes 0 rather than 1 as 1 means it's the
> > + * first time we've found a partition we're recording for the cache.
> > + */
> > + partdesc->last_found_datum_index = -1;
> > + partdesc->last_found_part_index = -1;
> > + partdesc->last_found_count = 0;
> > +
> > + return boundinfo->default_index;
> > + }
> >
> > I wonder why not to leave the cache untouched in this case? It's
> > possible that erratic rows only rarely occur in the input sets.
>
> I looked into that and I ended up just removing the code to reset the cache. It
> now works similarly to a LIST partitioned table's NULL partition.
>
> > Should the comment update above get_partition_for_tuple() mention
> > something like the cached path is basically O(1) and the non-cache
> > path O (log N) as I can see in comments in some other modules, like
> > pairingheap.c?
>
> I adjusted for the other things you mentioned but I didn't add the big O stuff. I
> thought the comment was clear enough.
>
> I'd quite like to push this patch early next week, so if anyone else is following
> along that might have any objections, could they do so before then?

Thanks for the patch. The patch looks good to me.

Only a minor nitpick:

+ /*
+ * For LIST partitioning, this is the number of times in a row that the
+ * the datum we're looking

It seems a duplicate 'the' word in this comment.
"the the datum".

Best regards,
Hou Zhijie

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-07-28 08:57:23 Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Previous Message 王海洋 2022-07-28 08:10:44 [PATCH] BUG FIX: redo will abort, due to inconsistent page found in BRIN_REGULAR_PAGE