Strange query behaviour

From: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Strange query behaviour
Date: 2019-01-22 17:41:04
Message-ID: CAMsGm5fzAJZYQi83uu75Fy8MhckwhCHgAQcRSUHGXbmC3H441g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm finding a massive difference in query execution time between two
queries that should be identical:

Very slow:
select ... from x natural left join y where y is null

Fast:
select ... from x natural left join y where y.primary_key_column is null

A fact that I suspect is important is that y has a column whose contents is
PDFs with a total size of 35608033659. However, I can query that size
using a query that looks like this:

select sum (length (pdf_field_1) + length (pdf_field_2)) from y

This runs very fast (2.8ms for 2324 rows).

So it is as if checking the whole tuple for NULL requires reading the PDF
bytea columns, but checking just the primary key for NULL or even reading
the lengths of the PDFs does not.

For the moment I'm going to fix it by just using "y.primary_key_column IS
NULL" instead of "y IS NULL" where I want to check whether I have a row
from y corresponding to a given row in x. But this seems like strange
behaviour. I can think of a couple of potential enhancements that this
suggests:

1) when checking an entire row for null, start with a primary key field or
other NOT NULL field. In the common case of checking what happened with a
left join, this is all that needs to be done - either there is a row, in
which case the field cannot be NULL, or there is no row and all the other
fields must also be NULL.

2) when checking a field for NULL, is it really necessary to load the field
contents? It feels like whether or not a value is NULL should be possible
to determine without de-toasting (if I have the right terminology).

Any ideas anybody might have would be much appreciated.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2019-01-22 17:50:19 Re: Thread-unsafe coding in ecpg
Previous Message Tomas Vondra 2019-01-22 17:35:21 Re: COPY FROM WHEN condition