Re: From Simple to Complex

From: Alessandro Gagliardi <alessandro(at)path(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: From Simple to Complex
Date: 2012-01-31 22:53:46
Message-ID: CAAB3BBJwKV2UpzHFfXXMxLAfOx4QiCpJ3r1QE0BpQeLSeZ7PjQ@mail.gmail.com
Lists: pgsql-performance

I changed the query a bit, so that the results would not change over
the course of the day:

SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments
JOIN emotions USING (moment_id)
WHERE moments.inserted BETWEEN 'yesterday' AND 'today' AND
moments.tableoid = pg_class.oid
GROUP BY relname, emotion ORDER BY relname, emotion;
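
(For readability, that is equivalent to writing all of the joins
explicitly; the planner should produce the same plan either way:

SELECT relname, emotion, COUNT(feedback_id)
  FROM moments
  JOIN emotions USING (moment_id)
  JOIN pg_class ON moments.tableoid = pg_class.oid
 WHERE moments.inserted BETWEEN 'yesterday' AND 'today'
 GROUP BY relname, emotion
 ORDER BY relname, emotion;
)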

Adding the indices means that I am now doing index scans instead of
seq scans, but that doesn't seem to have helped with speed. Here are
the new EXPLAIN ANALYZE results:

"Sort (cost=174432.85..174433.58 rows=1460 width=94) (actual
time=73440.079..73440.088 rows=25 loops=1)"
" Sort Key: pg_class.relname, emotions.emotion"
" Sort Method: quicksort Memory: 20kB"
" -> HashAggregate (cost=174412.39..174417.50 rows=1460 width=94)
(actual time=73437.905..73437.940 rows=25 loops=1)"
" -> Merge Join (cost=27888.98..172032.86 rows=1586355
width=94) (actual time=65563.027..72763.848 rows=245917 loops=1)"
" Merge Cond: (emotions.moment_id = public.moments.moment_id)"
" -> Index Scan using emotions_moment_id_idx on emotions
(cost=0.00..135759.78 rows=4077358 width=55) (actual
time=1.283..43894.799 rows=3841095 loops=1)"
" -> Sort (cost=27888.98..28083.07 rows=388184
width=89) (actual time=16556.348..17384.537 rows=521025 loops=1)"
" Sort Key: public.moments.moment_id"
" Sort Method: quicksort Memory: 60865kB"
" -> Hash Join (cost=9.90..20681.81 rows=388184
width=89) (actual time=2.612..4309.131 rows=396594 loops=1)"
" Hash Cond: (public.moments.tableoid = pg_class.oid)"
" -> Append (cost=0.00..19216.22
rows=388184 width=29) (actual time=2.066..2851.885 rows=396594
loops=1)"
" -> Seq Scan on moments
(cost=0.00..0.00 rows=1 width=104) (actual time=0.002..0.002 rows=0
loops=1)"
" Filter: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
thoughts_inserted_idx on thoughts moments (cost=0.00..6146.96
rows=136903 width=29) (actual time=2.063..606.584 rows=130884
loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
photos_inserted_idx on photos moments (cost=0.00..4975.46 rows=109900
width=29) (actual time=1.542..836.063 rows=128286 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
music_inserted_idx on music moments (cost=0.00..3102.69 rows=40775
width=29) (actual time=0.756..308.031 rows=41176 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
people_inserted_idx on people moments (cost=0.00..4.07 rows=1
width=29) (actual time=0.015..0.015 rows=0 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
places_inserted_idx on places moments (cost=0.00..4125.65 rows=96348
width=29) (actual time=0.066..263.853 rows=92756 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Bitmap Heap Scan on videos
moments (cost=29.56..835.20 rows=3660 width=29) (actual
time=3.122..87.889 rows=3492 loops=1)"
" Recheck Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Bitmap Index Scan on
videos_inserted_idx (cost=0.00..29.37 rows=3660 width=0) (actual
time=0.696..0.696 rows=3492 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Seq Scan on facebook_people
moments (cost=0.00..1.04 rows=1 width=104) (actual time=0.040..0.040
rows=0 loops=1)"
" Filter: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
address_people_inserted_idx on address_people moments
(cost=0.00..4.06 rows=1 width=29) (actual time=0.017..0.017 rows=0
loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
path_people_inserted_idx on path_people moments (cost=0.00..17.03
rows=593 width=29) (actual time=1.758..1.758 rows=0 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Index Scan using
sleep_inserted_idx on sleep moments (cost=0.00..4.06 rows=1 width=29)
(actual time=0.012..0.012 rows=0 loops=1)"
" Index Cond: ((inserted >=
'2012-01-30 00:00:00'::timestamp without time zone) AND (inserted <=
'2012-01-31 00:00:00'::timestamp without time zone))"
" -> Hash (cost=8.88..8.88 rows=292
width=68) (actual time=0.520..0.520 rows=334 loops=1)"
" Buckets: 1024 Batches: 1 Memory Usage: 29kB"
" -> Seq Scan on pg_class
(cost=0.00..8.88 rows=292 width=68) (actual time=0.007..0.257 rows=334
loops=1)"
"Total runtime: 73511.072 ms"

Please let me know if there is any way to make this more efficient.

Thank you,
-Alessandro

On Tue, Jan 31, 2012 at 2:10 PM, Alessandro Gagliardi
<alessandro(at)path(dot)com> wrote:
>
> Looks like I missed a key sentence in http://www.postgresql.org/docs/9.0/static/ddl-inherit.html which states: "A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints only apply to single tables, not to their inheritance children."
> I should have realized that, since I exploited that "limitation" in three of my tables. I'm gradually adding those indices now (the sort of thing I'm adding is sketched below); I'll report on what kind of difference they make...
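>
> Along these lines (a sketch; the names match the indices that show up in the new plan above):
>
> CREATE INDEX thoughts_inserted_idx ON thoughts USING btree (inserted);
> CREATE INDEX photos_inserted_idx ON photos USING btree (inserted);
> -- ...and likewise for music, people, places, videos, address_people, path_people, and sleep
> CREATE INDEX emotions_moment_id_idx ON emotions USING btree (moment_id);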
>
> On Tue, Jan 31, 2012 at 1:22 PM, Alessandro Gagliardi <alessandro(at)path(dot)com> wrote:
>>
>> My slow query today is somewhat more complex than yesterday's, but I'm
>> hopeful it can be improved. Here's the query:
>>
>> SELECT relname, emotion, COUNT(feedback_id) FROM pg_class, moments
>> JOIN emotions USING (moment_id)
>> WHERE moments.inserted > 'today' AND moments.tableoid = pg_class.oid
>> GROUP BY relname, emotion ORDER BY relname, emotion;
>>
>> As you'll see below, moments is inherited by a number of other tables
>> and the purpose of relname is to see which one. Meanwhile, emotions
>> inherits feedback.
>>
>> Here's the Full Table and Index Schema:
>>
>> CREATE TABLE moments
>> (
>>   moment_id character(24) NOT NULL DEFAULT to_char(now(), 'JHH24MISSUS'::text),
>>   block_id character(24) NOT NULL,
>>   inserted timestamp without time zone NOT NULL DEFAULT now(),
>>   CONSTRAINT moments_pkey PRIMARY KEY (moment_id )
>> )
>> WITH (
>>   OIDS=FALSE
>> );
>>
>> CREATE INDEX moments_block_id_idx
>>   ON moments
>>   USING btree
>>   (block_id );
>>
>> CREATE INDEX moments_inserted_idx
>>   ON moments
>>   USING btree
>>   (inserted );
>>
>> CREATE TABLE feedback
>> (
>>  feedback_id character(24) NOT NULL,
>>  user_id character(24) NOT NULL,
>>  moment_id character(24) NOT NULL,
>>  created timestamp without time zone,
>>  inserted timestamp without time zone NOT NULL DEFAULT now(),
>>  lnglat point,
>>  CONSTRAINT feedback_pkey PRIMARY KEY (feedback_id )
>> )
>> WITH (
>>  OIDS=FALSE
>> );
>>
>> CREATE INDEX feedback_lnglat_idx
>>  ON feedback
>>  USING gist
>>  (lnglat );
>>
>> CREATE INDEX feedback_moment_id_idx
>>  ON feedback
>>  USING btree
>>  (moment_id );
>>
>> CREATE TABLE emotions
>> (
>> -- Inherited from table feedback:  feedback_id character(24) NOT NULL,
>> -- Inherited from table feedback:  user_id character(24) NOT NULL,
>> -- Inherited from table feedback:  moment_id character(24) NOT NULL,
>> -- Inherited from table feedback:  created timestamp without time zone,
>> -- Inherited from table feedback:  inserted timestamp without time zone NOT NULL DEFAULT now(),
>>  emotion character varying NOT NULL,
>> -- Inherited from table feedback:  lnglat point,
>>  CONSTRAINT emotions_pkey PRIMARY KEY (feedback_id )
>> )
>> INHERITS (feedback)
>> WITH (
>>  OIDS=FALSE
>> );
>>
>> CREATE INDEX emotions_emotion_idx
>>  ON emotions
>>  USING btree
>>  (emotion );
>>
>> Here's the results from EXPLAIN ANALYZE:
>>
>> "Sort  (cost=309717.70..309718.43 rows=1460 width=94) (actual
>> time=60462.534..60462.544 rows=25 loops=1)"
>> "  Sort Key: pg_class.relname, emotions.emotion"
>> "  Sort Method:  quicksort  Memory: 20kB"
>> "  ->  HashAggregate  (cost=309697.24..309702.35 rows=1460 width=94)
>> (actual time=60462.457..60462.476 rows=25 loops=1)"
>> "        ->  Hash Join  (cost=133154.62..308963.70 rows=489024
>> width=94) (actual time=26910.488..60031.589 rows=194642 loops=1)"
>> "              Hash Cond: (public.moments.tableoid = pg_class.oid)"
>> "              ->  Hash Join  (cost=133144.72..307119.96 rows=489024
>> width=34) (actual time=26909.984..59434.137 rows=194642 loops=1)"
>> "                    Hash Cond: (public.moments.moment_id = emotions.moment_id)"
>> "                    ->  Append  (cost=0.00..114981.64 rows=119665
>> width=29) (actual time=883.153..21696.939 rows=357565 loops=1)"
>> "                          ->  Seq Scan on moments  (cost=0.00..0.00
>> rows=1 width=104) (actual time=0.000..0.000 rows=0 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on thoughts moments
>> (cost=0.00..38856.88 rows=44388 width=29) (actual
>> time=883.150..9040.959 rows=115436 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on photos moments
>> (cost=0.00..29635.78 rows=194 width=29) (actual
>> time=5329.700..5827.447 rows=116420 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on music moments
>> (cost=0.00..9371.88 rows=19070 width=29) (actual time=354.147..383.266
>> rows=37248 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on people moments
>> (cost=0.00..5945.26 rows=27 width=29) (actual time=185.393..185.393
>> rows=0 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on places moments
>> (cost=0.00..24551.03 rows=54961 width=29) (actual
>> time=5224.044..5324.517 rows=85564 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on videos moments
>> (cost=0.00..981.31 rows=734 width=29) (actual time=21.075..28.735
>> rows=2897 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on facebook_people moments
>> (cost=0.00..10.84 rows=80 width=104) (actual time=0.001..0.001 rows=0
>> loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on address_people moments
>> (cost=0.00..10.84 rows=80 width=104) (actual time=0.005..0.005 rows=0
>> loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on path_people moments
>> (cost=0.00..5606.79 rows=30 width=29) (actual time=211.166..211.166
>> rows=0 loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                          ->  Seq Scan on sleep moments
>> (cost=0.00..11.05 rows=100 width=104) (actual time=0.002..0.002 rows=0
>> loops=1)"
>> "                                Filter: (inserted > '2012-01-31
>> 00:00:00'::timestamp without time zone)"
>> "                    ->  Hash  (cost=79292.49..79292.49 rows=4059496
>> width=55) (actual time=25757.998..25757.998 rows=4058642 loops=1)"
>> "                          Buckets: 262144  Batches: 4  Memory Usage: 75211kB"
>> "                          ->  Seq Scan on emotions
>> (cost=0.00..79292.49 rows=4059496 width=55) (actual
>> time=0.012..15969.981 rows=4058642 loops=1)"
>> "              ->  Hash  (cost=8.88..8.88 rows=292 width=68) (actual
>> time=0.487..0.487 rows=319 loops=1)"
>> "                    Buckets: 1024  Batches: 1  Memory Usage: 28kB"
>> "                    ->  Seq Scan on pg_class  (cost=0.00..8.88
>> rows=292 width=68) (actual time=0.013..0.234 rows=319 loops=1)"
>> "Total runtime: 60601.612 ms"
>>
>> Postgres version: still 9.0.5
>>
>> History: N/A (This is the first time I've run this query.)
>>
>> Hardware: 1.7 GB Cache and other things you'd expect from a Ronin
>> instance of a Heroku Postgres database.
>>
>> Maintenance Setup: Whatever Heroku does. As before, vacuum should not
>> be relevant, as there are no deletes or even updates (just inserts and
>> selects).
>>
>> WAL Configuration: I still don't know. Heroku hosts the database on
>> Amazon's servers, so maybe that answers the question?
>>
>> GUC Settings: As per yesterday's discussion, I reduced
>> random_page_cost to 2. Other than that, it's all default.
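>>
>> (For the record, I believe I set that with something like the
>> following, where "mydb" is a placeholder for the actual Heroku
>> database name; new connections then pick up the setting:
>>
>> ALTER DATABASE mydb SET random_page_cost = 2;
>> )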
>>
>> Bonus question: If that was too simple, here's something even more
>> complex I'd like to do: I have another table that inherits feedback
>> called "comments". Ideally, rather than an "emotion" column coming
>> out, I would like to have a "feedback_type" column that would be
>> either the value in the emotion column of the emotions table, or
>> "comment" if it's from the comments table. I'm thinking I'm going to
>> have to simply put that together on the client, but if I can do that
>> in a single query (that doesn't take an hour to run) that would be
>> super cool. But that's definitely secondary compared to getting the
>> above query to run faster.
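>>
>> Here's a sketch of the kind of thing I have in mind (untested,
>> written against the same tables as above):
>>
>> SELECT relname, emotion AS feedback_type, COUNT(feedback_id)
>>   FROM moments
>>   JOIN emotions USING (moment_id)
>>   JOIN pg_class ON moments.tableoid = pg_class.oid
>>  WHERE moments.inserted > 'today'
>>  GROUP BY relname, emotion
>> UNION ALL
>> SELECT relname, 'comment', COUNT(feedback_id)
>>   FROM moments
>>   JOIN comments USING (moment_id)
>>   JOIN pg_class ON moments.tableoid = pg_class.oid
>>  WHERE moments.inserted > 'today'
>>  GROUP BY relname
>>  ORDER BY relname, feedback_type;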
>>
>> Thank you very much for any help!
>> -Alessandro Gagliardi
>
>
