
REVIEW: pg_stat_statements with query tree based normalization

From: Kääriäinen Anssi <anssi(dot)kaariainen(at)thl(dot)fi>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: "peter(at)2ndquadrant(dot)com" <peter(at)2ndquadrant(dot)com>
Subject: REVIEW: pg_stat_statements with query tree based normalization
Date: 2012-01-21 00:24:01
Message-ID: BC19EF15D84DC143A22D6A8F2590F0A78A9846AA11@EXMAIL.stakes.fi
Lists: pgsql-hackers
This is a short review of pg_stat_statements based on quick testing of the feature.

1. Installation: after building PostgreSQL and the contrib modules, and adding
pg_stat_statements to shared_preload_libraries, I got this error:
FATAL:  could not create shared memory segment: Invalid argument
DETAIL:  Failed system call was shmget(key=5431001, size=34627584, 03600)

So, I needed to raise my SHM limits. I guess there isn't anything surprising or erroneous about
this, but I figured it is worth mentioning.
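For reference, on Linux the limits can be raised with sysctl. The values below are only an example sketch (large enough for the shmget call above), not a recommendation; adjust them to your machine:

```
# /etc/sysctl.conf -- example values, not a recommendation
kernel.shmmax = 68719476736    # max size of one shared memory segment, in bytes
kernel.shmall = 4294967296     # total shared memory allowed, in pages
```

Then run `sysctl -p` (or reboot) to apply.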

2. Usability:
  - If you have two similarly named tables in different schemas, for example public.tbl and
some_schema.tbl, the tables will get separate entries in pg_stat_statements. However, the
table names are not schema-qualified, so it is impossible to tell which entry belongs to which table:

# select query, calls from pg_stat_statements where query like 'select%test%';
        query        | calls 
---------------------+-------
 select * from test; |     4
 select * from test; |     2

# select * from tmp.test;
# select query, calls from pg_stat_statements where query like 'select%test%';
        query        | calls 
---------------------+-------
 select * from test; |     5
 select * from test; |     2

# select * from test;
# select query, calls from pg_stat_statements where query like 'select%test%';
        query        | calls 
---------------------+-------
 select * from test; |     5
 select * from test; |     3

    - It would be nice from the user's perspective to always transform "where id in (list of
values)" to "where id in (?)", regardless of the length of the list. Currently "where id in (1, 2)"
is grouped into a different entry than "where id in (1, 2, 3)".
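The suggested behavior could look roughly like the sketch below. This is only an illustration of the idea with a naive regex, not how the patch's query-tree normalization works (a real implementation would operate on the parse tree, and this regex would also mangle IN (subquery) forms):

```python
import re

# Naive sketch: collapse any parenthesized IN list to "in (?)" so that
# lists of different lengths hash to the same pg_stat_statements entry.
# [^()]* deliberately refuses nested parens, but this would still
# (incorrectly) collapse simple "in (select ...)" subqueries.
IN_LIST = re.compile(r"\bin\s*\(\s*[^()]*\s*\)", re.IGNORECASE)

def collapse_in_lists(query: str) -> str:
    return IN_LIST.sub("in (?)", query)

print(collapse_in_lists("select * from t where id in (1, 2)"))
print(collapse_in_lists("select * from t where id in (1, 2, 3)"))
# both print: select * from t where id in (?)
```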

3. I ran Django's test suite a few times to see if there would be any unexpected
behavior. Some results (note that I haven't tried to reproduce this error on master
without the patch):

test_django_testdb_default=# SELECT "aggregation_publisher"."id", "aggregation_publisher"."name", "aggregation_publisher"."num_awards", MIN("aggregation_book"."pubdate") AS "earliest_book" FROM "aggregation_publisher" LEFT OUTER JOIN "aggregation_book" ON ("aggregation_publisher"."id" = "aggregation_book"."publisher_id") GROUP BY "aggregation_publisher"."id", "aggregation_publisher"."name", "aggregation_publisher"."num_awards" HAVING NOT (MIN("aggregation_book"."pubdate") IS NULL) ORDER BY "earliest_book" ASC;

ERROR:  unrecognized node type for havingclause node: 315
test_django_testdb_default=# \d aggregation_publisher
                                   Table "public.aggregation_publisher"
   Column   |          Type          |                             Modifiers                              
------------+------------------------+--------------------------------------------------------------------
 id         | integer                | not null default nextval('aggregation_publisher_id_seq'::regclass)
 name       | character varying(255) | not null
 num_awards | integer                | not null
Indexes:
    "aggregation_publisher_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "aggregation_book" CONSTRAINT "aggregation_book_publisher_id_fkey" FOREIGN KEY (publisher_id) REFERENCES aggregation_publisher(id) DEFERRABLE INITIALLY DEFERRED


The total_time reported for insert statements seems suspiciously low. Maybe PostgreSQL is just faster than I thought :)

query      | INSERT INTO "django_content_type" ("id", "name", "app_label", "model") VALUES (?, ?, ?, ?)
calls      | 5490
total_time | 0.823119000000003

Multi-value inserts do not seem to be normalized:
query      | INSERT INTO "custom_pk_business_employees" ("business_id", "employee_id") VALUES ('Sears', 456), ('Sears', 123)
calls      | 1256
total_time | 0.619693
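Collapsing the row list could be handled the same way as the IN-list case. Again, this is only a hypothetical regex sketch of the desired normalization, not the patch's mechanism:

```python
import re

# Naive sketch: collapse a multi-row VALUES list to a single "VALUES (?)"
# row, so INSERTs differing only in literals or row count share one entry.
# [^()]* keeps each row's match simple and refuses nested parens.
VALUES_ROWS = re.compile(r"\bvalues\s*\([^()]*\)(\s*,\s*\([^()]*\))*",
                         re.IGNORECASE)

def collapse_values(query: str) -> str:
    return VALUES_ROWS.sub("VALUES (?)", query)

q = ('INSERT INTO "custom_pk_business_employees" '
     '("business_id", "employee_id") VALUES (\'Sears\', 456), (\'Sears\', 123)')
print(collapse_values(q))
# INSERT INTO "custom_pk_business_employees" ("business_id", "employee_id") VALUES (?)
```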

I did not see any noticeable difference in runtimes with pg_stat_statements installed or uninstalled (as an extension).
Not tested on master without the patch at all.

Overall the feature seems to be really useful.

 - Anssi
