Re: Index with all necessary columns - Postgres vs MSSQL

From: Gudmundur Johannesson <gudmundur(dot)johannesson(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Index with all necessary columns - Postgres vs MSSQL
Date: 2012-02-01 18:50:09
Message-ID: CAHNvtn_T2A46Yi6jVHt9SvGjTpZ6F6_bWvuduWkoohAnD1EMPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

Here are the answers to your questions:
1) I change the select statement so I am refering to 1 day at a time. In
that case the response time is similar. Basically, the data is not in
cache when I do that and the response time is about 23 seconds.

2) The list of IDs is provided by the middle layer and represents a logical
group.
btw: There are about 360 devices there. The distribution of dtStamp is
approx 200.000.000 rows / 360 devices / (4 months) which gives approx 4600
dtStamp values per device per day.

3) The query takes 23 sec vs 1 sec or lower in mssql.

We never update/delete and therefore the data is alway correct in the index
(never dirty). Therefore, Postgres could have used the data in it.

I started to add columns into indexes in Oracle for approx 15 years ago and
it was a brilliant discovery. This looks like a show stopper for me but I
will

Thanks,
- Gummi

On Wed, Feb 1, 2012 at 5:52 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Wed, Feb 1, 2012 at 11:10 AM, Gudmundur Johannesson
> <gudmundur(dot)johannesson(at)gmail(dot)com> wrote:
> > Hi,
> >
> > I have a table in Postgres like:
> > CREATE TABLE test
> > (
> > id integer,
> > dtstamp timestamp without time zone,
> > rating real
> > )
> > CREATE INDEX test_all
> > ON test
> > USING btree
> > (id , dtstamp , rating);
> >
> > My db has around 200M rows and I have reduced my test select statement
> down
> > to:
> > SELECT count(1) FROM test
> > WHERE id in (58,83,88,98,124,141,170,195,202,252,265,293,305,331,348)
> > AND dtstamp between cast('2011-10-19 08:00:00' as timestamp) and
> > cast('2011-10-19 16:00:00' as timestamp)
> >
> > In Postgres this takes about 23 sec.
> > In MSSQL this takes about 1 sec.
> >
> > MSSQL only accesses the index and does not access the table it self (uses
> > only index scan)
> >
> > Postgres has the following plan:
> > "Aggregate (cost=130926.24..130926.25 rows=1 width=0)"
> > " -> Bitmap Heap Scan on test (cost=1298.97..130832.92 rows=37330
> > width=0)"
> > " Recheck Cond: ((id = ANY
> > ('{58,83,88,98,124,141,170,195,202,252,265,293,305,331,348}'::integer[]))
> > AND (dtstamp >= '2011-10-19 08:00:00'::timestamp without time zone) AND
> > (dtstamp <= '2011-10-19 16:00:00'::timestamp without time zone))"
> > " -> Bitmap Index Scan on test_all (cost=0.00..1289.64
> rows=37330
> > width=0)"
> > " Index Cond: ((id = ANY
> > ('{58,83,88,98,124,141,170,195,202,252,265,293,305,331,348}'::integer[]))
> > AND (dtstamp >= '2011-10-19 08:00:00'::timestamp without time zone) AND
> > (dtstamp <= '2011-10-19 16:00:00'::timestamp without time zone))"
> >
> > The results are disappointing since I want to switch to Postgres but I
> have
> > not been able to force Postgres to only use the index :-(
> >
> > Any hints that may lead me back on track?
>
> *) are the times in postgres stable across calls?
> *) where is the 'id list' coming from?
> *) how long does this query take?
>
> SELECT count(1) FROM test WHERE id = 202 AND AND dtstamp between
> '2011-10-19 08:00:00'::timestamp and '2011-10-19
> 16:00:00'::timestamp; ?
>
> The feature you're looking for in postgres is called 'index only
> scans' and an 9.2 will contain an implementation of that feature (see:
> http://rhaas.blogspot.com/2011/10/index-only-scans-weve-got-em.html).
>
> merlin
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2012-02-01 18:53:06 Re: From Simple to Complex
Previous Message Alessandro Gagliardi 2012-02-01 18:48:33 Re: From Simple to Complex