Re: [SQL] OFFSET impact on Performance???

From: PFC <lists(at)boutiquenumerique(dot)com>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: alex(at)neteconomist(dot)com, "Greg Stark" <gsstark(at)mit(dot)edu>, "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>, "Andrei Bintintan" <klodoma(at)ar-sd(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-27 13:11:13
Message-ID: opsk9iwzf2th1vuj@musicbox
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> for example,
> http://www.sai.msu.su/~megera/postgres/gist/code/7.3/README.intarray
> see OPERATIONS and EXAMPLE USAGE:
>

Thanks, I already know this documentation and have used intarray before
(I find it absolutely fabulous in the right application, it has a great
potential for getting out of tight situations which would involve huge
unmanageable pivot or attributes tables). Its only drawback is that the
gist index creation time is slow and sometimes just... takes forever until
the disk is full.
However, it seems that integer && integer[] does not exist :

> SELECT * FROM table WHERE id && int[]

explain analyze select * from temp t where id &&
( '{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[] );
ERREUR: L'operateur n'existe pas : integer && integer[]
ASTUCE : Aucun operateur correspond au nom donne et aux types d'arguments.
Vous devez ajouter des conversions explicites de type.

I have already used this type of intarray indexes, but you have to create
a special gist index with the gist__int_ops on the column, and the column
has to be an array. In my case the column is just a SERIAL PRIMARY KEY,
and should stay this way, and I don't want to create a functional index in
array[id] just for this feature ; so I guess I can't use the && operator.
Am I mistaken ? My index is the standard btree here.

It would be nice if the =ANY() could use the index just like IN does ;
besides at planning time the length of the array is known which makes it
behave quite just like IN().

So I'll use either an EXECUTE'd plpgsql-generated query (IN (....)) ,
which I don't like because it's a kludge ; or this other solution which I
find more elegant :

CREATE OR REPLACE FUNCTION tools.array_srf( INTEGER[] )
RETURNS SETOF INTEGER RETURNS NULL ON NULL INPUT
LANGUAGE plpgsql AS
$$
DECLARE
_data ALIAS FOR $1;
_i INTEGER;
BEGIN
FOR _i IN 1..icount(_data) LOOP
RETURN NEXT _data[_i];
END LOOP;
RETURN;
END;
$$;

-----------------------------------------------------------------------------------
explain analyze select * from temp t where id
=ANY( '{1,2,3,4,5,6,7,8,9,10,11,12}' );
Seq Scan on "temp" t (cost=0.00..5165.52 rows=65536 width=8) (actual
time=0.030..173.319 rows=12 loops=1)
Filter: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12}'::integer[]))
Total runtime: 173.391 ms

-----------------------------------------------------------------------------------
explain analyze select * from temp t where id
IN( 1,2,3,4,5,6,7,8,9,10,11,12 );
Index Scan using temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey,
temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey, temp_pkey,
temp_pkey on "temp" t (cost=0.00..36.49 rows=12 width=8) (actual
time=0.046..0.137 rows=12 loops=1)
Index Cond: ((id = 1) OR (id = 2) OR (id = 3) OR (id = 4) OR (id = 5)
OR (id = 6) OR (id = 7) OR (id = 8) OR (id = 9) OR (id = 10) OR (id = 11)
OR (id = 12))
Total runtime: 0.292 ms

-----------------------------------------------------------------------------------
explain analyze select * from temp t where id in (select * from
tools.array_srf('{1,2,3,4,5,6,7,8,9,10,11,12}'));
Nested Loop (cost=15.00..620.20 rows=200 width=8) (actual
time=0.211..0.368 rows=12 loops=1)
-> HashAggregate (cost=15.00..15.00 rows=200 width=4) (actual
time=0.160..0.173 rows=12 loops=1)
-> Function Scan on array_srf (cost=0.00..12.50 rows=1000
width=4) (actual time=0.127..0.139 rows=12 loops=1)
-> Index Scan using temp_pkey on "temp" t (cost=0.00..3.01 rows=1
width=8) (actual time=0.010..0.012 rows=1 loops=12)
Index Cond: (t.id = "outer".array_srf)
Total runtime: 0.494 ms

-----------------------------------------------------------------------------------
explain analyze select * from temp t, (select * from
tools.array_srf('{1,2,3,4,5,6,7,8,9,10,11,12}')) foo where foo.array_srf =
t.id;

Merge Join (cost=62.33..2824.80 rows=1000 width=12) (actual
time=0.215..0.286 rows=12 loops=1)
Merge Cond: ("outer".id = "inner".array_srf)
-> Index Scan using temp_pkey on "temp" t (cost=0.00..2419.79
rows=131072 width=8) (actual time=0.032..0.056 rows=13 loops=1)
-> Sort (cost=62.33..64.83 rows=1000 width=4) (actual
time=0.169..0.173 rows=12 loops=1)
Sort Key: array_srf.array_srf
-> Function Scan on array_srf (cost=0.00..12.50 rows=1000
width=4) (actual time=0.127..0.139 rows=12 loops=1)
Total runtime: 0.391 ms

-----------------------------------------------------------------------------------
Note that the meaning is different ; the IN removes duplicates in the
array but the join does not.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Oleg Bartunov 2005-01-27 13:44:04 Re: [SQL] OFFSET impact on Performance???
Previous Message Greg Stark 2005-01-27 12:57:15 Re: [SQL] OFFSET impact on Performance???