Multi-column join + aggregate subquery resulting in infinite run time

From: Steve Estes <denzera(at)gmail(dot)com>
To: pgsql-novice(at)lists(dot)postgresql(dot)org
Subject: Multi-column join + aggregate subquery resulting in infinite run time
Date: 2020-07-06 14:59:51
Message-ID: CAJjrZPCQcL1u0GaUXhexwwAt8NWUxnKsr+vjKYuY2H45jv_y4g@mail.gmail.com
Lists: pgsql-novice

Hey gang,

I thought I knew what I was doing with queries, but this has more-or-less
ruined my 4th of July holiday so I think it's high time to ask for help. I
imagine it's something trivial and obvious to you.

I've got 3 tables in question here:

*prod_account_details* => 538k records
  line_number PK,
  acct_id integer,  -- nearly unique, but not quite; acct_id PLUS address is unique
  address varchar(100),
  entity_name varchar(100),
  (... a few other fields ...)
  btree index on acct_id, hash index on address, also btree on (acct_id, address)

*prod_customer_profiles* => 532k records, *nearly* 1:1 with account details
  acct_id integer,  -- corresponds to account_details.acct_id
  match_input_address varchar(100),  -- corresponds to account_details.address
  (... lots of other fields ...)
  btree index on acct_id, hash index on match_input_address, also btree on (acct_id, match_input_address)

*prod_dunning_letters* => 518k records, 1:M with account_details (covering ~181k acct_ids)
  line_number PK,
  acct_id integer,  -- FK to account_details, not formally via constraint but semantically
  (... a few other fields ...)
  btree index on acct_id

--- Queries ---

So firstly, what works:

(1)

SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
    AND ad.address = acx.match_input_address;

...returns the 532k matched records in 3.0 seconds.

Now if we *INNER* join to the dunning_letters table via a grouped subquery:

(2)

SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match,
       dl.num_records AS dl_count
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
    AND ad.address = acx.match_input_address
INNER JOIN (SELECT count(*) AS num_records, acct_id
            FROM prod_dunning_letters
            GROUP BY acct_id) dl ON ad.acct_id = dl.acct_id;

... it returns fine too, 161k records in 1.3 seconds.

But now if I do something as simple as changing that second join (to the
subquery) to a LEFT JOIN, it blows up. That is, I want all valid accounts,
with the count of records from the dunning_letters table appended wherever
any exist:

(3)

SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match,
       dl.num_records AS dl_count
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
    AND ad.address = acx.match_input_address
LEFT JOIN (SELECT count(*) AS num_records, acct_id
           FROM prod_dunning_letters
           GROUP BY acct_id) dl ON ad.acct_id = dl.acct_id;

What happens is, this runs for 20+ seconds, and then starts returning ~100
records every 10 seconds, which at the volume I'm talking about (>500k)
would take forever to return. And next up I need to bolt on the counts for
some other, bigger tables too, so we're quickly approaching
heat-death-of-the-universe amounts of runtime here.
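
For what it's worth, the shape I was planning to use for those additional
per-table counts is a correlated LATERAL subquery rather than joining a
pre-grouped derived table. An untested sketch, using the same tables and
columns as above (note it yields 0 rather than NULL when an account has no
letters):

EXPLAIN
SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match,
       dl.num_records AS dl_count
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
    AND ad.address = acx.match_input_address
LEFT JOIN LATERAL (
    SELECT count(*) AS num_records
    FROM prod_dunning_letters pdl
    WHERE pdl.acct_id = ad.acct_id
) dl ON true;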

Now what's interesting is, if I relax the join on the first two tables,
i.e. drop the address part of the join and only do the acct_id one,

(4)

SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match,
       dl.num_records AS dl_count
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
    -- AND ad.address = acx.match_input_address
LEFT JOIN (SELECT count(*) AS num_records, acct_id
           FROM prod_dunning_letters
           GROUP BY acct_id) dl ON ad.acct_id = dl.acct_id;

...it starts returning immediately and spools 695k records in 4.7 seconds.
The problem is that we need the address part of the join in there, because
there are some duplicate acct_ids on both sides so we get a bunch of
cartesian-join junk.

--- EXPLAIN query plans ---

So here's the Explain for query (1), with just the two tables:

Gather  (cost=102995.20..123944.32 rows=1 width=41)
  Workers Planned: 2
  ->  Parallel Hash Join  (cost=101995.20..122944.22 rows=1 width=41)
        Hash Cond: ((ad.acct_id = acx.acct_id) AND (ad.address = (acx.match_input_address)::text))
        ->  Parallel Seq Scan on prod_account_details ad  (cost=0.00..14226.43 rows=224343 width=39)
        ->  Parallel Hash  (cost=97096.88..97096.88 rows=224288 width=26)
              ->  Parallel Seq Scan on prod_customer_profiles acx  (cost=0.00..97096.88 rows=224288 width=26)

Now, when we add on the third table, its strategy for that join changes.
Here's the Explain for (2), with the inner join that works:

Gather  (cost=22500.40..83636.43 rows=1 width=49)
  Workers Planned: 2
  ->  Nested Loop  (cost=21500.40..82636.33 rows=1 width=49)
        ->  Hash Join  (cost=21500.40..40427.74 rows=56285 width=54)
              Hash Cond: (ad.acct_id = prod_dunning_letters.acct_id)
              ->  Parallel Seq Scan on prod_account_details ad  (cost=0.00..14226.43 rows=224343 width=39)
              ->  Hash  (cost=19345.40..19345.40 rows=123920 width=15)
                    ->  GroupAggregate  (cost=0.42..18106.20 rows=123920 width=15)
                          Group Key: prod_dunning_letters.acct_id
                          ->  Index Only Scan using prod_dunning_letters_acct_id_idx on prod_dunning_letters  (cost=0.42..14269.36 rows=519529 width=7)
        ->  Index Scan using prod_customer_profiles_match_input_address_idx on prod_customer_profiles acx  (cost=0.00..0.74 rows=1 width=26)
              Index Cond: ((match_input_address)::text = ad.address)
              Filter: (ad.acct_id = acct_id)

But then change it to a LEFT JOIN from query (3), and it changes to:

Nested Loop Left Join  (cost=102995.63..144838.72 rows=1 width=49)
  Join Filter: (ad.acct_id = prod_dunning_letters.acct_id)
  ->  Gather  (cost=102995.20..123944.32 rows=1 width=41)
        Workers Planned: 2
        ->  Parallel Hash Join  (cost=101995.20..122944.22 rows=1 width=41)
              Hash Cond: ((ad.acct_id = acx.acct_id) AND (ad.address = (acx.match_input_address)::text))
              ->  Parallel Seq Scan on prod_account_details ad  (cost=0.00..14226.43 rows=224343 width=39)
              ->  Parallel Hash  (cost=97096.88..97096.88 rows=224288 width=26)
                    ->  Parallel Seq Scan on prod_customer_profiles acx  (cost=0.00..97096.88 rows=224288 width=26)
  ->  GroupAggregate  (cost=0.42..18106.20 rows=123920 width=15)
        Group Key: prod_dunning_letters.acct_id
        ->  Index Only Scan using prod_dunning_letters_acct_id_idx on prod_dunning_letters  (cost=0.42..14269.36 rows=519529 width=7)

Just reading those plans, I can't readily tell why #2 finishes fast while #3
runs without end. And if the difference were just the use of a LEFT JOIN,
then why does #4 also finish easily, using only the single-column join
between the first two tables?
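
For reference, the plans above are from plain EXPLAIN, so they show
estimates only. If actual row counts would help, I can rerun (3) with
timing and buffer data and post the output, though given how slowly it
streams, that may take a while to complete:

EXPLAIN (ANALYZE, BUFFERS)
SELECT ad.acct_id, ad.entity_name, ad.address, acx.flag_infobase_match,
       dl.num_records AS dl_count
FROM prod_account_details ad
INNER JOIN prod_customer_profiles acx ON ad.acct_id = acx.acct_id
    AND ad.address = acx.match_input_address
LEFT JOIN (SELECT count(*) AS num_records, acct_id
           FROM prod_dunning_letters
           GROUP BY acct_id) dl ON ad.acct_id = dl.acct_id;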

This has befuddled the rest of my experimentation with it, so I'd really
appreciate some insight.
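
One thing I haven't tried yet, in case the planner is misestimating the
combined selectivity of the two-column join condition: extended statistics
on the correlated column pairs (the statistics-object names below are just
placeholders I made up):

CREATE STATISTICS ad_acct_addr_stats (dependencies)
    ON acct_id, address FROM prod_account_details;
CREATE STATISTICS acx_acct_addr_stats (dependencies)
    ON acct_id, match_input_address FROM prod_customer_profiles;
ANALYZE prod_account_details;
ANALYZE prod_customer_profiles;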

-Steve
