Re: OT: Data structure design question: How do they count

From: Richard Huxton <dev(at)archonet(dot)com>
To: Brendan Duddridge <brendan(at)clickspace(dot)com>
Cc: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: OT: Data structure design question: How do they count
Date: 2006-04-10 09:23:48
Message-ID: 443A2424.4020702@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Brendan Duddridge wrote:
>
> Now, initially I thought they would just pre-compute these counts, but
> the problem is, when you click on any of the above attribute values,
> they reduce the remaining possible set of matching products (and set of
> possible remaining attributes and attribute values) by the amount
> displayed next to the attribute value selected. You can click on any
> combination of attribute values to filter down the remaining set of
> matching products, so there's a large combination of paths you can take
> to arrive at a set of products you might be interested in.
>
> Do you think they are pre-computed? Or do you think they might use a
> query similar to the following?:

Pre-computed almost certainly, but at what level of granularity? And
with application-level caching?

> select pav.attribute_value_id, count(p.product_id)
> from product_attribute_value pav,
> attribute a,
> product p
> where a.attribute_id in (some set of attribute ids) and
> pav.product_id = p.product_id and
> pav.attribute_id = a.attribute_id and p.product_id in
> (select product_id
> from category_product
> where category_id = some category id) and
> p.is_active = 'true'
> group by pav.attribute_value_id;
>
> It would seem to me that although the above query suggests a normalized
> database structure, that joining with 3 tables plus a 4th table in the
> sub-query with an IN qualifier and grouping to get the product counts
> would take a VERY long time, especially on a possible result set of
> 1,260,658 products.

Hmm - I'm not sure I'd say this was necessarily normalised. In the
example you gave there were three definite types of attribute:
1. Price range (< 20, 20-50, ...)
2. Product type (lighting, rugs, ...)
3. Store (art.com, homeannex, ...)
Your example discards this type information.

I'm also not sure it lets store A sell widgets for 19.99 and B for 25.99

So - let's look at how we might break this down into simple relations:
product_types (product_id, prod_type, prod_subtype)
product_availability (product_id, store_id, price_range)
and so on for each set of parameters.

Then, if PG isn't calculating fast enough I'd be tempted to throw in a
summary table:
product_counts(store_id, price_range, prod_type, prod_subtype, ...,
num_products)
Then total over this for the top-level queries.

I'd also cache common top-level queries at the applicaton level anyway.

--
Richard Huxton
Archonet Ltd

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Simon Riggs 2006-04-10 09:26:49 Re: pg 8.1.3, AIX, huge box, painfully slow.
Previous Message Richard Huxton 2006-04-10 09:06:41 Re: pg 8.1.3, AIX, huge box, painfully slow.