Re: Partial hash index is not used for implied qual.

From: Sergei Glukhov <s(dot)glukhov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Partial hash index is not used for implied qual.
Date: 2025-11-25 11:50:59
Message-ID: cb0b9982-ca1a-4f03-9d7f-f9ca678be35c@postgrespro.ru
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 11/25/25 6:01 AM, Tom Lane wrote:
> I wrote:
>> Wouldn't it be better to handle it more like the is_target_rel logic
>> a few lines further up?
> Actually, after thinking a bit longer, it'd be better to do something
> like the attached so that we don't keep redundant quals unless they'd
> *all* be excluded.
>
> There's definitely something fishy about the costing though.
> I experimented with this variant of Sergei's example:
>
> regression=# CREATE TABLE hash_partial(x) AS SELECT x % 100 as y from generate_series(1, 1000) as x;
> SELECT 1000
> regression=# ANALYZE hash_partial;
> ANALYZE
> regression=# CREATE INDEX partial_idx ON hash_partial USING hash(x) WHERE x = 1;
> CREATE INDEX
> regression=# set enable_seqscan TO 0; -- else we'll go for a seqscan
> SET
> regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
> QUERY PLAN
> ----------------------------------------------------------------------------
> Bitmap Heap Scan on hash_partial (cost=24.08..32.56 rows=10 width=4)
> Recheck Cond: (x = 1)
> -> Bitmap Index Scan on partial_idx (cost=0.00..24.07 rows=10 width=0)
> Index Cond: (x = 1)
> (4 rows)
>
> regression=# drop index partial_idx;
> DROP INDEX
> regression=# CREATE INDEX ON hash_partial USING hash(x);
> CREATE INDEX
> regression=# EXPLAIN SELECT x FROM hash_partial WHERE x = 1;
> QUERY PLAN
> ----------------------------------------------------------------------------------
> Bitmap Heap Scan on hash_partial (cost=4.08..12.56 rows=10 width=4)
> Recheck Cond: (x = 1)
> -> Bitmap Index Scan on hash_partial_x_idx (cost=0.00..4.08 rows=10 width=0)
> Index Cond: (x = 1)
> (4 rows)
>
> Why are we thinking that a non-partial index would be substantially
> cheaper to scan? That seems surely wrong, and it runs counter to my
> intuition about why this fix is incomplete. (I expected an unfair
> bias towards the partial index, not against it.)
>
> regards, tom lane
>

Thanks for the fix. It seems there is another case for investigation:

DROP TABLE hash_partial;
CREATE TABLE hash_partial(x, y) AS
SELECT x, x + x as y from generate_series(1, 1000) as x;
ANALYZE hash_partial;
CREATE INDEX partial_idx  ON hash_partial USING hash(x) WHERE x = 1;
SET enable_seqscan TO 0;
EXPLAIN SELECT x FROM hash_partial WHERE x = 1 and y < 0;
--------------------------------------------------------------------------------
Seq Scan on hash_partial  (cost=0.00..23.00 rows=1 width=4)
   Disabled: true
   Filter: ((y < 0) AND (x = 1))
(3 rows)

 Regarding strangeness of the cost,
 cost is depends on numIndexPages and
 in genericcostestimate() we calulate numIndexPages:

 numIndexPages = ceil(numIndexTuples * index->pages / index->tuples);

 For non-partial index index->pages = 6 and index->tuples = 1000
 and for partial index index->pages = 6 and index->tuples = 10.
 Number of pages depends on index relation size and
 initial size is 6 * BLCKSZ for both, partial and non-partial hash indexes
 Initial size of the hash index relation, in turn,
 depends on total number of tuples in the table.

 Regards,
 Sergei Glukhov

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Xuneng Zhou 2025-11-25 11:51:19 Re: Implement waiting for wal lsn replay: reloaded
Previous Message Zhijie Hou (Fujitsu) 2025-11-25 11:48:48 RE: How can end users know the cause of LR slot sync delays?