Anti join miscalculates row number?

From: "Jens Reufsteck" <jens(dot)reufsteck(at)staufenbiel(dot)de>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Anti join miscalculates row number?
Date: 2011-10-26 10:23:54
Message-ID: 000d01cc93c9$5d3a8e90$17afabb0$@reufsteck@staufenbiel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I’ve got a lengthy query, that doesn't finish in reasonable time (i.e.
10min+). I suspect, that the query optimizer miscalculates the number of
rows for part of the query.

The suspicious subquery:

SELECT
sv1.sid as sid
FROM stud_vera sv1
LEFT JOIN stud_vera AS sv2
ON (
sv1.sid=sv2.sid
AND sv2.veraid IN ( 109 )
)
WHERE sv1.veraid IN ( 3 )
AND sv2.veraid IS NULL

The whole query:

SELECT count(DISTINCT sid) AS Anzahl FROM (SELECT sid
FROM stud
WHERE (
status IN (1,2)
AND length(vname) > 1
AND length(nname) > 1
AND length(email) > 1

)) AS stud INNER JOIN (SELECT DISTINCT
sid,
cast(created AS date) AS tag,
cast(veradate AS DATE) -
cast(stud_vera.created AS DATE) AS tage,
cast(
floor(
(cast(veradate AS date) -
cast(stud_vera.created AS date))/7
)
AS integer) AS woche,
cast(extract(week from stud_vera.created) AS
integer) AS kalenderwoche,
to_char(stud_vera.created, 'YYYY/MM') AS
monat,
to_char(stud_vera.abgemeldet, 'YYYY/MM') AS
abmeldemonat,
CASE
WHEN newsletterid &1 = 1 THEN 'Flag
1'
WHEN newsletterid &2 = 2 THEN 'Flag
2'
WHEN newsletterid &4 = 2 THEN 'Flag
3'
WHEN newsletterid &8 = 8 THEN 'Flag
4'
WHEN newsletterid &16 = 16 THEN
'Flag 5'
WHEN newsletterid &32 = 32 THEN
'Flag 6'
WHEN newsletterid &64 = 64 THEN
'Flag 7'
WHEN newsletterid &128 = 128 THEN
'Flag 8'
END AS newsletterid
FROM stud_vera
JOIN vera USING (veraid)
WHERE

stud_vera.status > 0
AND abgemeldet is null

AND veraid IN (

3

)) AS vera USING (sid) INNER JOIN (SELECT
sid,
age(date_trunc('MONTH', now()), date_trunc('MONTH',
bis)) || '' AS months
FROM
study
WHERE status = 1

AND

age(date_trunc('MONTH', now()),
date_trunc('MONTH', bis)) < interval '60 months'

AND

age(date_trunc('MONTH', bis),
date_trunc('MONTH', now())) <= interval '-24 months') AS examen USING (sid)
INNER JOIN (SELECT
sv1.sid as sid
FROM stud_vera sv1
LEFT JOIN stud_vera AS sv2
ON (
sv1.sid=sv2.sid
AND sv2.veraid IN ( 109 )
)
WHERE sv1.veraid IN ( 3 )
AND sv2.veraid IS NULL) AS veraAusschluss USING
(sid)

And the explain analyze for the sub query: http://explain.depesz.com/s/8d2
And the explain for the whole query: http://explain.depesz.com/s/GGf
(explain analyze doesn't finish in reasonable time)

What strucks me, is that in the sub query row numbers for sv1 and sv2 are
calculated quite accurat. But the resulting 4 rows after the final join is
far from reality. Shouldn't this be as minimum the number of rows for sv1
minus number of rows for sv2?

If the optimizer knew, that the number is much bigger, it probably wouldn't
choose the nested loop in the next step, which I suspect is the reason for
the performance issues.

We're using postgres 9.0.4. It might be interesting, that the same query
runs smoothly on our test system with postgres 8.3.7.

The tables:

Tabelle »public.stud_vera«
Spalte | Typ |
Attribute
-------------------+-----------------------------+--------------------------
------------------------------------
svid | integer | not null Vorgabewert
nextval('stud_vera_svid_seq'::regclass)
sid | integer |
veraid | integer |
modified | timestamp without time zone | not null Vorgabewert
now()
created | timestamp without time zone | not null Vorgabewert
now()
verastep1 | timestamp without time zone |
kontoinhaber | character varying(64) |
kontonum | character varying(32) |
blz | character varying(32) |
bank | character varying(64) |
betrag | numeric(5,2) |
verastep2 | timestamp without time zone |
deferred | smallint |
verastep3 | timestamp without time zone |
status | smallint | not null
label | character varying(128) |
kanalid | integer |
deferredtxt | character varying(256) |
comment | character varying(64) |
label2 | character varying(128) |
dstid | integer |
abgemeldet | date |
bsid | integer | Vorgabewert 1
newsletterid | integer |
abmeldenewsletter | integer | not null Vorgabewert 0
kanalcomment | character varying(128) |
Indexe:
"stud_vera_pkey" PRIMARY KEY, btree (svid)
"stud_vera_sid_veraid_idx" UNIQUE, btree (sid, veraid)
"stud_vera_sid_idx" btree (sid)
"stud_vera_veraid_idx" btree (veraid)
Fremdschlüssel-Constraints:
"$1" FOREIGN KEY (sid) REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE
CASCADE
"$2" FOREIGN KEY (veraid) REFERENCES vera(veraid) ON UPDATE CASCADE ON
DELETE SET NULL
"stud_vera_dstid" FOREIGN KEY (dstid) REFERENCES datenschutztext(dstid)
ON UPDATE CASCADE ON DELETE SET NULL
Fremdschlüsselverweise von:
TABLE "eingang" CONSTRAINT "eingang_svid_fkey" FOREIGN KEY (svid)
REFERENCES stud_vera(svid) ON UPDATE CASCADE ON DELETE CASCADE

Tabelle »public.stud«
Spalte | Typ |
Attribute
---------------+-----------------------------+------------------------------
--------------------------
sid | integer | not null Vorgabewert
nextval('stud_sid_seq'::regclass)
login | character varying(64) | not null
passwd | character varying(32) |
modified | timestamp without time zone | not null Vorgabewert now()
created | timestamp without time zone | not null Vorgabewert now()
lastlogin | timestamp without time zone |
mow | smallint |
titel | character varying(32) |
vname | character varying(32) |
nname | character varying(32) |
birth | date |
einstieg | date |
blacksheep | integer |
studstatusid | integer |
status | smallint | not null
studmodified | timestamp without time zone |
adminmodified | timestamp without time zone |
comment | character varying(128) |
dstid | integer |
linkid | integer |
beesiteuserid | integer |
ypdate | date |
email | character varying(64) |
flag | smallint |
Indexe:
"stud_pkey" PRIMARY KEY, btree (sid)
"stud_login_idx" UNIQUE, btree (login)
"stud_login_lower" btree (lower(login::text))
"stud_nname_idx" btree (lower(nname::text))
"stud_sid_status_idx" btree (sid, status)
"stud_vname_idx" btree (lower(vname::text))
Check-Constraints:
"birth" CHECK (birth >= '1900-01-01'::date AND birth <=
'1999-12-31'::date)
Fremdschlüssel-Constraints:
"stud_dstid_fkey" FOREIGN KEY (dstid) REFERENCES datenschutztext(dstid)
ON UPDATE CASCADE ON DELETE SET NULL
Fremdschlüsselverweise von:
TABLE "stud_vera" CONSTRAINT "$1" FOREIGN KEY (sid) REFERENCES stud(sid)
ON UPDATE CASCADE ON DELETE CASCADE
TABLE "stud_wdwolle" CONSTRAINT "$1" FOREIGN KEY (sid) REFERENCES
stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "stud_staats" CONSTRAINT "$1" FOREIGN KEY (sid) REFERENCES
stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "locking" CONSTRAINT "$1" FOREIGN KEY (sid) REFERENCES stud(sid)
ON UPDATE CASCADE ON DELETE CASCADE
TABLE "stud_ad" CONSTRAINT "$1" FOREIGN KEY (sid) REFERENCES stud(sid)
ON UPDATE CASCADE ON DELETE CASCADE
TABLE "gutschein" CONSTRAINT "$2" FOREIGN KEY (sid) REFERENCES stud(sid)
ON UPDATE CASCADE ON DELETE CASCADE
TABLE "kontakt" CONSTRAINT "kontakt_sid_fkey" FOREIGN KEY (sid)
REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "stud_ae" CONSTRAINT "stud_ae_sid_fkey" FOREIGN KEY (sid)
REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "stud_berufsfeld" CONSTRAINT "stud_berufsfeld_fk_sid" FOREIGN KEY
(sid) REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "stud_einstiegsbereich" CONSTRAINT
"stud_einstiegsbereich_fkey_sid" FOREIGN KEY (sid) REFERENCES stud(sid) ON
UPDATE CASCADE ON DELETE CASCADE
TABLE "stud_vakanzen" CONSTRAINT "stud_vakanzen_sid_fkey" FOREIGN KEY
(sid) REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "stud_vposition" CONSTRAINT "stud_vposition_sid_fkey" FOREIGN KEY
(sid) REFERENCES stud(sid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "study" CONSTRAINT "study_sid_fkey" FOREIGN KEY (sid) REFERENCES
stud(sid) ON UPDATE CASCADE ON DELETE CASCADE

Tabelle »public.vera«
Spalte | Typ |
Attribute
--------------+-----------------------------+-------------------------------
----------------------------
veraid | integer | not null Vorgabewert
nextval('vera_veraid_seq'::regclass)
vera | character varying(64) |
verakurz | character varying(32) |
vera_e | character varying(8) |
vera_e2 | character varying(8) |
veratyp | smallint |
veradate | date |
veradauer | integer |
veraort | character varying(32) |
veraland | character varying(32) |
veracomment | character varying(255) |
active | smallint |
status | smallint |
landid | integer |
spontandate | date |
spontandate2 | date |
dstid | integer |
xmlconf | character varying(128) |
verakurz2 | character varying(32) |
closingdate | timestamp without time zone |
url | character varying(128) |
urltext | character varying(32) | Vorgabewert 'Zum
Event'::character varying
etflag | integer |
Indexe:
"vera_pkey" PRIMARY KEY, btree (veraid)
"vera_verakurz_unique" UNIQUE, btree (verakurz)
Fremdschlüssel-Constraints:
"vera_dstid" FOREIGN KEY (dstid) REFERENCES datenschutztext(dstid) ON
UPDATE CASCADE ON DELETE SET NULL
Fremdschlüsselverweise von:
TABLE "vera_reihe" CONSTRAINT "$1" FOREIGN KEY (veraid) REFERENCES
vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "gutschein" CONSTRAINT "$1" FOREIGN KEY (veraid) REFERENCES
vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "prod_vera" CONSTRAINT "$2" FOREIGN KEY (veraid) REFERENCES
vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "stud_vera" CONSTRAINT "$2" FOREIGN KEY (veraid) REFERENCES
vera(veraid) ON UPDATE CASCADE ON DELETE SET NULL
TABLE "auswahlevent" CONSTRAINT "auswahlevent_veraid_fkey" FOREIGN KEY
(veraid) REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "inside" CONSTRAINT "fk_inside_veraid" FOREIGN KEY (veraid)
REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "fprofil" CONSTRAINT "fprofil_veraid_fkey" FOREIGN KEY (veraid)
REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "mailversand" CONSTRAINT "mailversand_veraid_fkey" FOREIGN KEY
(veraid) REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "mailvorlage" CONSTRAINT "mailvorlage_veraid_fkey" FOREIGN KEY
(veraid) REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "raum" CONSTRAINT "raum_veraid_fkey" FOREIGN KEY (veraid)
REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "schiene" CONSTRAINT "schiene_veraid_fkey" FOREIGN KEY (veraid)
REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "vakanzen" CONSTRAINT "vakanzen_veraid_fkey" FOREIGN KEY (veraid)
REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE
TABLE "vposition" CONSTRAINT "vposition_veraid_fkey" FOREIGN KEY
(veraid) REFERENCES vera(veraid) ON UPDATE CASCADE ON DELETE CASCADE

Tabelle »public.study«
Spalte | Typ |
Attribute
---------------------+--------------------------------+---------------------
----------------------------------------
studyid | integer | not null Vorgabewert
nextval('study_studyid_seq'::regclass)
sid | integer | not null
modified | timestamp(0) without time zone | not null Vorgabewert
now()
created | timestamp(0) without time zone | not null Vorgabewert
now()
abschlusstypid | integer |
uniid | integer |
von | date |
bis | date |
unisonstige | character varying(128) |
unilandid | integer |
ausrichtungsonstige | character varying(64) |
vertiefungsonstige | character varying(64) |
qnoteid | integer |
status | smallint | not null Vorgabewert
1
Indexe:
"study_pkey" PRIMARY KEY, btree (studyid)
"study_sid_idx" btree (sid)
Fremdschlüssel-Constraints:
"study_sid_fkey" FOREIGN KEY (sid) REFERENCES stud(sid) ON UPDATE
CASCADE ON DELETE CASCADE
Fremdschlüsselverweise von:
TABLE "study_ausrichtung" CONSTRAINT "study_ausrichtung_studyid_fkey"
FOREIGN KEY (studyid) REFERENCES study(studyid) ON UPDATE CASCADE ON DELETE
CASCADE
TABLE "study_vertiefung" CONSTRAINT "study_vertiefung_fkey1" FOREIGN KEY
(studyid) REFERENCES study(studyid) ON UPDATE CASCADE ON DELETE CASCADE

Many thanks

--
Jens Reufsteck

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2011-10-26 12:23:18 Re: CTE vs Subquery
Previous Message Linos 2011-10-26 09:00:48 Re: CTE vs Subquery