Re: Increasing work_mem slows down query, why?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Silvio Moioli <moio(at)suse(dot)de>
Cc: Pgsql Performance <pgsql-performance(at)lists(dot)postgresql(dot)org>
Subject: Re: Increasing work_mem slows down query, why?
Date: 2020-03-30 06:56:59
Message-ID: CAFj8pRDmvK8hZXLHs9hEEyEjVOWYw0=fUSirBKu5OV5qmiu7jg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

po 30. 3. 2020 v 8:47 odesílatel Silvio Moioli <moio(at)suse(dot)de> napsal:

> Dear list,
>
> here is a pretty contrived case where increasing work_mem produces a worse
> plan, with much worse overall query time. I wonder why that is the case.
>
>
> Problem: INSERTing a thousand new rows in a table which can easily have
> one million rows. PK is "id", which comes from a table, and we have two
> columns (called "name" and "version") which do not admit duplicates.
>
> Schema here:
> https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/common/tables/rhnPackageCapability.sql
> Indices here:
> https://github.com/uyuni-project/uyuni/blob/Uyuni-2020.03/schema/spacewalk/postgres/tables/rhnPackageCapability_index.sql
>
>
> We want one command that returns IDs given (name, version) couples. If
> they are already in the table, they should be SELECTed, if they are not,
> they should be INSERTed.
>
> Version is NULLable and NULL should be treated as a value.
>
> We use:
>
> WITH wanted_capability(ordering, name, version) AS (
> VALUES (1, 'first_name', '1.0.0'), (2, 'first_name', '1.0.1'), (1,
> 'second_name', '1.0.0'), ...998 more...
> )
> missing_capability AS (
> SELECT wanted_capability.*
> FROM wanted_capability LEFT JOIN rhnPackageCapability
> ON wanted_capability.name = rhnPackageCapability.name
> AND wanted_capability.version IS NOT DISTINCT FROM
> rhnPackageCapability.version
> WHERE rhnPackageCapability.id IS NULL
> ),
> inserted_capability AS (
> INSERT INTO rhnPackageCapability(id, name, version)
> SELECT nextval('rhn_pkg_capability_id_seq'), name, version FROM
> missing_capability ON CONFLICT DO NOTHING
> RETURNING id, name, version
> )
> SELECT wanted_capability.ordering, inserted_capability.id
> FROM wanted_capability JOIN inserted_capability
> ON wanted_capability.name = inserted_capability.name
> AND wanted_capability.version IS NOT DISTINCT FROM
> inserted_capability.version
> UNION (
> SELECT wanted_capability.ordering, rhnPackageCapability.id
> FROM wanted_capability JOIN rhnPackageCapability
> ON wanted_capability.name = rhnPackageCapability.name
> AND wanted_capability.version IS NOT DISTINCT FROM
> rhnPackageCapability.version
> )
> ORDER BY ordering
> ;
>
>
> Behavior at work_mem = 5 MB is pretty good, query finishes in 200ms. Plan:
> https://explain.dalibo.com/plan/4u
>
> Behavior at work_mem = 80 MB seems not equally good, query takes more than
> 13s. Two expensive SORTs and MERGE JOINs are done instead of HASH JOINs.
> Plan: thttps://explain.dalibo.com/plan/ORd

please, can you send explain in text form?

Probably, there is a problem in wrong estimation. What can be expected
because CTE is optimization fence in this version

Regards

Pavel

>
> Adding one more INDEX on rhnCapability.name fixes the issue.
>
> My question is: why are SORTs chosen if more work_mem is available, and
> why can't the planner predict query will be slower that way?
>
> All of the above is reproducible on openSUSE Leap and PostgreSQL 10.12.
>
> Ideas welcome, and thanks in advance!
>
> Regards,
> --
> Silvio Moioli
> SUSE Manager Development Team
>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Silvio Moioli 2020-03-30 08:12:42 Re: Increasing work_mem slows down query, why?
Previous Message Silvio Moioli 2020-03-30 06:47:05 Increasing work_mem slows down query, why?