Performant queries on table with many boolean columns

From: Rob Imig <rimig88(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performant queries on table with many boolean columns
Date: 2016-04-20 18:41:54
Message-ID: CANcrS5pR1P1Tj=e-RQQ=FF3WPAy_fyruS0YJer-+iJHxR1JAiA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hey all,

New to the lists so please let me know if this isn't the right place for
this question.

I am trying to understand how to structure a table to allow for optimal
performance on retrieval. The data will not change frequently so you can
basically think of it as static and only concerned about optimizing reads
from basic SELECT...WHERE queries.

The data:

- ~20 million records
- Each record has 1 id and ~100 boolean properties
- Each boolean property has ~85% of the records as true

The retrieval will always be something like "SELECT id FROM <table> WHERE
<conditions>.

<conditions> will be some arbitrary set of the ~100 boolean columns and you
want the ids that match all of the conditions (true for each boolean
column). Example:
WHERE prop1 AND prop18 AND prop24

The obvious thing seems to make a table with ~100 columns, with 1 column
for each boolean property. Though, what type of indexing strategy would one
use on that table? Doesn't make sense to do BTREE. Is there a better way to
structure it?

Any and all advice/tips/questions appreciated!

Thanks,
Rob

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Teodor Sigaev 2016-04-20 18:54:54 Re: Performant queries on table with many boolean columns
Previous Message Tory M Blue 2016-04-20 04:01:27 Re: Clarification on using pg_upgrade