From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Silvio Moioli <moio(at)suse(dot)de> |
Cc: | Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Increasing work_mem slows down query, why? |
Date: | 2020-03-30 06:56:59 |
Message-ID: | CAFj8pRDmvK8hZXLHs9hEEyEjVOWYw0=fUSirBKu5OV5qmiu7jg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
po 30. 3. 2020 v 8:47 odesílatel Silvio Moioli <moio(at)suse(dot)de> napsal:
> Dear list,
>
> here is a pretty contrived case where increasing work_mem produces a worse
> plan, with much worse overall query time. I wonder why that is the case.
>
>
> Problem: INSERTing a thousand new rows in a table which can easily have
> one million rows. PK is "id", which comes from a table, and we have two
> columns (called "name" and "version") which do not admit duplicates.
>
> Schema here:
> https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/common/tables/rhnPackageCapability.sql
> Indices here:
> https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/postgres/tables/rhnPackageCapability_index.sql
>
>
> We want one command that returns IDs given (name, version) couples. If
> they are already in the table, they should be SELECTed, if they are not,
> they should be INSERTed.
>
> Version is NULLable and NULL should be treated as a value.
>
> We use:
>
> WITH wanted_capability(ordering, name, version) AS (
> VALUES (1, 'first_name', '1.0.0'), (2, 'first_name', '1.0.1'), (1,
> 'second_name', '1.0.0'), ...998 more...
> )
> missing_capability AS (
> SELECT wanted_capability.*
> FROM wanted_capability LEFT JOIN rhnPackageCapability
> ON wanted_capability.name = rhnPackageCapability.name
> AND wanted_capability.version IS NOT DISTINCT FROM
> rhnPackageCapability.version
> WHERE rhnPackageCapability.id IS NULL
> ),
> inserted_capability AS (
> INSERT INTO rhnPackageCapability(id, name, version)
> SELECT nextval('rhn_pkg_capability_id_seq'), name, version FROM
> missing_capability ON CONFLICT DO NOTHING
> RETURNING id, name, version
> )
> SELECT wanted_capability.ordering, inserted_capability.id
> FROM wanted_capability JOIN inserted_capability
> ON wanted_capability.name = inserted_capability.name
> AND wanted_capability.version IS NOT DISTINCT FROM
> inserted_capability.version
> UNION (
> SELECT wanted_capability.ordering, rhnPackageCapability.id
> FROM wanted_capability JOIN rhnPackageCapability
> ON wanted_capability.name = rhnPackageCapability.name
> AND wanted_capability.version IS NOT DISTINCT FROM
> rhnPackageCapability.version
> )
> ORDER BY ordering
> ;
>
>
> Behavior at work_mem = 5 MB is pretty good, query finishes in 200ms. Plan:
> https://explain.dalibo.com/plan/4u
>
> Behavior at work_mem = 80 MB seems not equally good, query takes more than
> 13s. Two expensive SORTs and MERGE JOINs are done instead of HASH JOINs.
> Plan: thttps://explain.dalibo.com/plan/ORd
please, can you send explain in text form?
Probably, there is a problem in wrong estimation. What can be expected
because CTE is optimization fence in this version
Regards
Pavel
>
> Adding one more INDEX on rhnCapability.name fixes the issue.
>
> My question is: why are SORTs chosen if more work_mem is available, and
> why can't the planner predict query will be slower that way?
>
> All of the above is reproducible on openSUSE Leap and PostgreSQL 10.12.
>
> Ideas welcome, and thanks in advance!
>
> Regards,
> --
> Silvio Moioli
> SUSE Manager Development Team
>
>
>
From | Date | Subject | |
---|---|---|---|
Next Message | Silvio Moioli | 2020-03-30 08:12:42 | Re: Increasing work_mem slows down query, why? |
Previous Message | Silvio Moioli | 2020-03-30 06:47:05 | Increasing work_mem slows down query, why? |