Re: hashjoins, index loops to retrieve pk/ux constrains in pg12

From: Arturas Mazeika <mazeika(at)gmail(dot)com>
To: Michael Lewis <mlewis(at)entrata(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: hashjoins, index loops to retrieve pk/ux constrains in pg12
Date: 2021-09-27 19:56:12
Message-ID: CAAUL=cFEKJ0KJ35vZRj8ROZKJDDduK_D6haokiKfKBmWAz8DUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I
Hi Michael,

Thanks a lot for having a look at the query once again in more detail. In
short, you are right, I fired the liquibase scripts and observed the exact
query that was hanging in pg_stats_activity. The query was:

SELECT
FK.TABLE_NAME as "TABLE_NAME"
, CU.COLUMN_NAME as "COLUMN_NAME"
, PK.TABLE_NAME as "REFERENCED_TABLE_NAME"
, PT.COLUMN_NAME as "REFERENCED_COLUMN_NAME"
, C.CONSTRAINT_NAME as "CONSTRAINT_NAME"
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS C
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS FK ON
C.CONSTRAINT_NAME = FK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.TABLE_CONSTRAINTS PK ON
C.UNIQUE_CONSTRAINT_NAME = PK.CONSTRAINT_NAME
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE CU ON C.CONSTRAINT_NAME
= CU.CONSTRAINT_NAME
INNER JOIN (
SELECT
i1.TABLE_NAME
, i2.COLUMN_NAME
FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS i1
INNER JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE i2 ON
i1.CONSTRAINT_NAME = i2.CONSTRAINT_NAME
WHERE i1.CONSTRAINT_TYPE = 'PRIMARY KEY'
) PT ON PT.TABLE_NAME = PK.TABLE_NAME WHERE
lower(FK.TABLE_NAME)='secrole_condcollection'

I rerun this query twice. Once with set enable_hashjoin = false; and set
enable_hashjoin = true; . I observed that the join order was very, very
similar between the hash and index plans. I reran the above two queries
with random_page_cost to 2, 1.5, or 1.0 and observed no difference
whatsoever, the planner was always choosing the hashjoins over sort/index
nested loops. the seq_page_cost is set to default value 1. The tables
behind the views do not have more than 10K rows, and do not exceed 400KB of
space. The work_mem parameter is set to 256MB, effective cache is 9GB, the
machine has something around 32-64GB of RAM, SSD as the primary drive, 140
default connections. The query planner, of course thinks that the overall
nested loop including hashes is better:

cost=2174.36..13670.47 (hash)

vs

cost=1736.10..18890.44 (index/sort join)

but I think there's a problem there, cause I don't think that one can reuse
the pre-computed hashes over and over again, while sort/index joins end up
hitting the same buffers, or am I wrong?

More details about the query plans as well as the complete set of settings
can be found in the original email at
https://www.postgresql.org/message-id/CAAUL%3DcFcvUo%3D7b4T-K5PqiqrF6etp59qcgv77DyK2Swa4VhYuQ%40mail.gmail.com

If you could have another look into what's going on, I'd appreciate it a
lot. in postgres 9.6 our setup goes through the liquibase scripts in 5
minutes, and pg12 with hash joins may take up to 1.5 hours.

Cheers,
Arturas

On Mon, Sep 27, 2021 at 4:12 PM Michael Lewis <mlewis(at)entrata(dot)com> wrote:

> I'm unclear what you changed to get the planner to choose one vs the
> other. Did you disable hashjoins? Without the full plan to review, it is
> tough to agre with any conclusion that these particular nodes are
> troublesome. It might be that this was the right choice for that part of
> that plan, but improper estimates at a earlier step were problematic.
>
> What configs have you changed such as work_mem, random_page_cost, and
> such? If random_page_cost & seq_page_cost are still default values,
> then the planner will tend to do more seq scans I believe, and hash them to
> join with large sets of data, rather than do nested loop index scans. I
> think that's how that works. With the lack of flexibility to change the
> query, you might be able to set a few configs for the user that runs these
> schema checks. If you can find changes that make an overall improvement.
>
>
> *Michael Lewis | Database Engineer*
> *Entrata*
>
>>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kim Johan Andersson 2021-09-27 20:02:49 Partial index on enum type is not being used, type issue?
Previous Message ldh@laurent-hasson.com 2021-09-27 16:05:26 RE: Big Performance drop of Exceptions in UDFs between V11.2 and 13.4