From: Les <nagylzs(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Slow query, possibly not using index
Date: 2023-08-27 17:39:40
Message-ID: CAKXe9UDcDfQxCSJYXh9cNMUdfYhx=R7axicGbcm7s9peXOp-7g@mail.gmail.com
Lists: pgsql-performance
> > If I try to select a single unused block this way:
> > explain analyze select id from media.block b where nrefs =0 limit 1
> > then it runs for more than 10 minutes (I'm not sure how long, I cancelled
> > the query after 10 minutes).
>
> Are you sure it isn't blocked on a lock?
>
Yes, I'm sure. I created a separate database instance from a ZFS snapshot
and ran the query against that database. It was the only client.
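For reference, a check like the following against the standard catalog views would show any session waiting on a lock and which backend is blocking it (none showed up in my case):

```sql
-- Sessions currently waiting on a lock, and the PIDs blocking them.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```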
> Another theory is that the index contains many thousands of references
> to now-dead rows, and the query is vainly searching for a live entry.
> Given that EXPLAIN thinks there are only about 2300 live entries,
> and yet you say the index is 400MB, this seems pretty plausible.
>
Nobody has ever deleted anything from this table. Since it was created, it
has been an insert-only table.
> Have you disabled autovacuum, or something like that? (REINDEX
> could help here, at least till the index gets bloated again.)
>
I did not disable autovacuum.
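To confirm autovacuum is actually reaching the table, the statistics view can be queried (assuming the table is media.block, as in the EXPLAIN above):

```sql
-- When was the table last vacuumed, and how often?
SELECT relname, last_vacuum, last_autovacuum,
       vacuum_count, autovacuum_count
FROM pg_stat_user_tables
WHERE schemaname = 'media' AND relname = 'block';
```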
>
> You might think that even so, it shouldn't take that long ... but
> indexes on UUID columns are a well known performance antipattern.
> The index entry order is likely to have precisely zip to do with
> the table's physical order, resulting in exceedingly-random access
> to the table, which'll be horribly expensive when the table is so
> much bigger than RAM. Can you replace the UUID column with a simple
> serial (identity) column?
>
I'm aware of the problems with random UUID values. I was using this
function to create ulids from the beginning:
CREATE OR REPLACE FUNCTION public.gen_ulid()
RETURNS uuid
LANGUAGE sql
AS $function$
SELECT (lpad(to_hex(floor(extract(epoch FROM clock_timestamp()) * 1000)::bigint), 12, '0')
        || encode(gen_random_bytes(10), 'hex'))::uuid;
$function$;
If I order rows by their id values, I can see that their creation times are
strictly ascending. I did not write this function; it was taken from this
website:
https://blog.daveallie.com/ulid-primary-keys
Their benchmark section shows that these ULID values are slower to generate
(at least with this implementation) but much faster to insert than random
UUIDs.
I might be able to replace these with int8 values, I need to check.
>
> > I believe it is not actually using the index, because reading a single
> > (random?) entry from an index should not run for >10 minutes.
>
> You should believe what EXPLAIN tells you about the plan shape.
> (Its rowcount estimates are only estimates, though.)
>
All 40M rows in this table are live. I'm 100% sure about this, because
nobody has ever deleted rows from it.
I can try to VACUUM this table, but I'm limited on resources; I think it
would take days. Maybe I can dump the whole database and restore it on
another machine. Would that eliminate dead rows? (Is there a way to check
the number of dead rows?)
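(A rough per-table estimate of dead rows is available from the statistics collector; the counts are approximate, not exact:)

```sql
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE schemaname = 'media' AND relname = 'block';
```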
Regards,
Laszlo