Re: Scaling 10 million records in PostgreSQL table

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Navaneethan R <nava(at)gridlex(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Scaling 10 million records in PostgreSQL table
Date: 2012-10-09 21:49:04
Message-ID: CAAZKuFaiCn6qy7810aZ5r8Dp8sojquZGw=pG5yffruMewwDV2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, Oct 8, 2012 at 1:25 PM, Navaneethan R <nava(at)gridlex(dot)com> wrote:
> On Tuesday, October 9, 2012 1:40:08 AM UTC+5:30, Steve Crawford wrote:
>> On 10/08/2012 08:26 AM, Navaneethan R wrote:
>>
>> > Hi all,
>>
>> >
>>
>> > I have 10 million records in my postgres table.I am running the database in amazon ec2 medium instance. I need to access the last week data from the table.
>>
>> > It takes huge time to process the simple query.So, i throws time out exception error.
>>
>> >
>>
>> > query is :
>>
>> > select count(*) from dealer_vehicle_details where modified_on between '2012-10-01' and '2012-10-08' and dealer_id=270001;
>>
>> >
>>
>> > After a lot of time it responds 1184 as count
>>
>> >
>>
>> > what are the ways i have to follow to increase the performance of this query?
>>
>> >
>>
>> > The insertion also going parallel since the daily realtime updation.
>>
>> >
>>
>> > what could be the reason exactly for this lacking performace?
>>
>> >
>>
>> >
>>
>> What version of PostgreSQL? You can use "select version();" and note
>>
>> that 9.2 has index-only scans which can result in a substantial
>>
>> performance boost for queries of this type.
>>
>>
>>
>> What is the structure of your table? You can use "\d+
>>
>> dealer_vehicle_details" in psql.
>>
>>
>>
>> Have you tuned PostgreSQL in any way? If so, what?
>>
>>
>>
>> Cheers,
>>
>> Steve
>>
>>
>>
>>
>>
>> --
>>
>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>>
>> To make changes to your subscription:
>>
>> http://www.postgresql.org/mailpref/pgsql-performance
>
>
> version():
>
> PostgreSQL 8.4.8 on i686-pc-linux-gnu, compiled by GCC gcc-4.5.real (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 32-bit
>
> Desc:
> Table "public.dealer_vehicle_details"
> Column | Type | Modifiers | Storage | Description
> ----------------+--------------------------+-------------------------------------------------------------------------+---------+-------------
> id | integer | not null default nextval('dealer_vehicle_details_new_id_seq'::regclass) | plain |
> vin_id | integer | not null | plain |
> vin_details_id | integer | | plain |
> price | integer | | plain |
> mileage | double precision | | plain |
> dealer_id | integer | not null | plain |
> created_on | timestamp with time zone | not null | plain |
> modified_on | timestamp with time zone | not null | plain |
> Indexes:
> "dealer_vehicle_details_pkey" PRIMARY KEY, btree (id)
> "idx_dealer_sites_id" UNIQUE, btree (id) WHERE dealer_id = 270001
> "idx_dealer_sites_id_526889" UNIQUE, btree (id) WHERE dealer_id = 526889
> "idx_dealer_sites_id_9765" UNIQUE, btree (id, vin_id) WHERE dealer_id = 9765
> "idx_dealer_sites_id_9765_all" UNIQUE, btree (id, vin_id, price, mileage, modified_on, created_on, vin_details_id) WHERE dealer_id = 9765
> "mileage_idx" btree (mileage)
> "price_idx" btree (price)
> "vehiclecre_idx" btree (created_on)
> "vehicleid_idx" btree (id)
> "vehiclemod_idx" btree (modified_on)
> "vin_details_id_idx" btree (vin_details_id)
> "vin_id_idx" btree (vin_id)
> Foreign-key constraints:
> "dealer_vehicle_master_dealer_id_fkey" FOREIGN KEY (dealer_id) REFERENCES dealer_dealer_master(id) DEFERRABLE INITIALLY DEFERRED
> "dealer_vehicle_master_vehicle_id_fkey" FOREIGN KEY (vin_id) REFERENCES dealer_vehicle(id) DEFERRABLE INITIALLY DEFERRED
> "dealer_vehicle_master_vin_details_id_fkey" FOREIGN KEY (vin_details_id) REFERENCES vin_lookup_table(id) DEFERRABLE INITIALLY DEFERRED
> Has OIDs: no
>
>
> After created the index for WHERE clause "WHERE dealer_id = 270001"..It is performing better.I have more dealer ids Should I do it for each dealer_id?

You seem to have created a partial index. Normally, that's not what
you want. You just want an index on the field "dealer_id", without
the conditional index. Conditional indexes are useful when you have a
lot of queries with the same WHERE clause entry, such as "WHERE
deleted_at IS NULL" or whatnot where most of the table has been
soft-deleted.

Here's a recent blog post discussing the topic that doesn't presume a
lot of familiarity with database performance, geared towards
application developers writing OLTP applications, which this seems
like one of:

http://www.craigkerstiens.com/2012/10/01/understanding-postgres-performance/

--
fdr

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2012-10-09 22:21:31 Why am I getting great/terrible estimates with these CTE queries?
Previous Message Samuel Gendler 2012-10-09 21:47:07 Re: Scaling 10 million records in PostgreSQL table