Re: BUG #18909: Query creates millions of temporary files and stalls

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sergey Koposov <Sergey(dot)Koposov(at)ed(dot)ac(dot)uk>
Cc: "lepihov(at)gmail(dot)com" <lepihov(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #18909: Query creates millions of temporary files and stalls
Date: 2025-05-03 22:47:33
Message-ID: 2221240.1746312453@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Sergey Koposov <Sergey(dot)Koposov(at)ed(dot)ac(dot)uk> writes:
> wsdb=> set cursor_tuple_fraction TO 1;
> SET
> wsdb=> EXPLAIN DECLARE c CURSOR FOR select phot_g_mean_mag,phot_bp_mean_mag,phot_rp_mean_mag,
> g.source_id,pmra,pmdec,parallax,parallax_error,d.ra,d.dec,d.ebv,d.gmeanpsfmag,rmeanpsfmag,imeanpsfmag,gkronmag,rkronmag from disk_sample1 as d left join
> gaia_edr3_aux.panstarrs1bestneighbour as gaps1 on (gaps1.original_ext_source_id=d.objid) left join gaia_edr3.gaia_source as g on ( g.source_id = gaps1.source_id
> ) ;
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------
> Hash Right Join (cost=112529534.06..422118114.74 rows=65470868 width=100)
> Hash Cond: (g.source_id = gaps1.source_id)
> -> Seq Scan on gaia_source g (cost=0.00..124645680.12 rows=1811786112 width=48)
> -> Hash (cost=111007847.21..111007847.21 rows=65470868 width=60)
> -> Hash Right Join (cost=3231089.53..111007847.21 rows=65470868 width=60)
> Hash Cond: (gaps1.original_ext_source_id = d.objid)
> -> Seq Scan on panstarrs1bestneighbour gaps1 (cost=0.00..17443258.96 rows=951450496 width=16)
> -> Hash (cost=1709402.68..1709402.68 rows=65470868 width=60)
> -> Seq Scan on disk_sample1 d (cost=0.00..1709402.68 rows=65470868 width=60)

Oh, *that* is interesting: now we have two hash joins, and neither one
has panstarrs1bestneighbour as the table to hash, so it's not too
clear which one is going crazy. But you showed the stats for
disk_sample1.objid, and that looked pretty well distributed, so I'm
going to guess that that hash is fine. That leaves the other join
on panstarrs1bestneighbour.source_id as the one under suspicion.
Can we see the stats for that column?

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Sergey Koposov 2025-05-03 23:05:31 Re: BUG #18909: Query creates millions of temporary files and stalls
Previous Message Sergey Koposov 2025-05-03 22:30:19 Re: BUG #18909: Query creates millions of temporary files and stalls