Re: GIN index always doing Re-check condition, postgres 9.1

From: Andrey Osenenko <andrey(dot)osenenko(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: GIN index always doing Re-check condition, postgres 9.1
Date: 2015-11-02 08:19:22
Message-ID: CAEB3-FhUJQDrv9oSH=f7ygXPLL7zPMK+TQiaKj0iM1zbfqdPOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thank you.

That's really sad news. This means that even though there is an index that
lets you find rows you want almost immediately, to retrieve primary keys,
you still have to do a lot of disk io.

I created a new table that contains only primary key and tsvector value,
and (at least that's how I'm interpreting it) since there is less data to
read per row, it returns same results about 2 times as quickly (I restarted
computer to make sure nothing is in memory).

Original table:
Bitmap Heap Scan on fulldata (cost=266.79..39162.57 rows=23069 width=8)
(actual time=113.472..18368.769 rows=23132 loops=1)
Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@
'''москв'''::tsquery)
Buffers: shared hit=1 read=13114
-> Bitmap Index Scan on fulldata_plaintext_idx (cost=0.00..261.02
rows=23069 width=0) (actual time=90.859..90.859 rows=23132 loops=1)
Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@
'''москв'''::tsquery)
Buffers: shared hit=1 read=23
Total runtime: 18425.265 ms

Table with only key and vector:
Bitmap Heap Scan on fts (cost=273.67..27903.61 rows=23441 width=8) (actual
time=219.896..10095.159 rows=23132 loops=1)
Recheck Cond: (vector @@ '''москв'''::tsquery)
Buffers: shared hit=1 read=10877
-> Bitmap Index Scan on fts_vector_idx (cost=0.00..267.81 rows=23441
width=0) (actual time=204.631..204.631 rows=23132 loops=1)
Index Cond: (vector @@ '''москв'''::tsquery)
Buffers: shared hit=1 read=23
Total runtime: 10103.858 ms

It also looks like if there was a way to create a table with just primary
key and add an index to it that indexes data from another table, it would
work much, much faster since there would be very little to read from disk
after index lookup. But looks like there isn't.

So am I correct in assumption that as the amount of rows grows, query times
for rows that are not in memory (and considering how many of them there
are, most won't be) will grow linearly?

On Mon, Nov 2, 2015 at 11:14 AM, Andrey Osenenko <andrey(dot)osenenko(at)gmail(dot)com>
wrote:

> Thank you.
>
> That's really sad news. This means that even though there is an index that
> lets you find rows you want almost immediately, to retrieve primary keys,
> you still have to do a lot of disk io.
>
> I created a new table that contains only primary key and tsvector value,
> and (at least that's how I'm interpreting it) since there is less data to
> read per row, it returns same results about 2 times as quickly (I restarted
> computer to make sure nothing is in memory).
>
> Original table:
> Bitmap Heap Scan on fulldata (cost=266.79..39162.57 rows=23069 width=8)
> (actual time=113.472..18368.769 rows=23132 loops=1)
> Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@
> '''москв'''::tsquery)
> Buffers: shared hit=1 read=13114
> -> Bitmap Index Scan on fulldata_plaintext_idx (cost=0.00..261.02
> rows=23069 width=0) (actual time=90.859..90.859 rows=23132 loops=1)
> Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@
> '''москв'''::tsquery)
> Buffers: shared hit=1 read=23
> Total runtime: 18425.265 ms
>
> Table with only key and vector:
> Bitmap Heap Scan on fts (cost=273.67..27903.61 rows=23441 width=8)
> (actual time=219.896..10095.159 rows=23132 loops=1)
> Recheck Cond: (vector @@ '''москв'''::tsquery)
> Buffers: shared hit=1 read=10877
> -> Bitmap Index Scan on fts_vector_idx (cost=0.00..267.81 rows=23441
> width=0) (actual time=204.631..204.631 rows=23132 loops=1)
> Index Cond: (vector @@ '''москв'''::tsquery)
> Buffers: shared hit=1 read=23
> Total runtime: 10103.858 ms
>
> It also looks like if there was a way to create a table with just primary
> key and add an index to it that indexes data from another table, it would
> work much, much faster since there would be very little to read from disk
> after index lookup. But looks like there isn't.
>
> So am I correct in assumption that as the amount of rows grows, query
> times for rows that are not in memory (and considering how many of them
> there are, most won't be) will grow linearly?
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Guido Niewerth 2015-11-02 10:15:35 Slow query in trigger function
Previous Message Jeff Janes 2015-11-02 07:22:47 Re: GIN index always doing Re-check condition, postgres 9.1