From: | Marco Colli <collimarco91(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Michael Lewis <mlewis(at)entrata(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: Slow "not in array" operation |
Date: | 2019-11-13 11:56:22 |
Message-ID: | CAFvCgN6cqaF28M3RWFKQ__pBn-igMFmZ6-gSwjnNGGeCJHWLeA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
> the answer is that is because it is a GIN index. Make the same index only
as btree, and you should get good performance as it can filter the tags
within a given project without visiting the table.
Currently I have this GIN index:
"index_subscriptions_on_project_id_and_tags" gin (project_id, tags)
WHERE trashed_at IS NULL
It uses the btree_gin extension and works perfectly for tag search, except
for the "NOT" operator. I don't understand why it doesn't use the GIN index
also for the "NOT" operator.
The problem is that I cannot create the same index with BTree, because PG
doesn't support BTree on array :(
On Wed, Nov 13, 2019 at 12:30 PM Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Wed, Nov 13, 2019 at 4:20 AM Marco Colli <collimarco91(at)gmail(dot)com>
> wrote:
>
>> Replying to the previous questions:
>> - work_mem = 64MB (there are hundreds of connections)
>> - the project 123 has more than 7M records, and those that don't have the
>> tag 'en' are 4.8M
>>
>>
>>> What was the plan for the one that took 500ms?
>>
>>
>> This is the query / plan without the filter on tags:
>>
>> SELECT COUNT(*) FROM "subscriptions" WHERE "subscriptions"."project_id" =
>> 123 AND "subscriptions"."trashed_at" IS NULL;
>>
>> QUERY PLAN
>>
>>
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Finalize Aggregate (cost=291342.67..291342.68 rows=1 width=8) (actual
>> time=354.556..354.556 rows=1 loops=1)
>> -> Gather (cost=291342.05..291342.66 rows=6 width=8) (actual
>> time=354.495..374.305 rows=7 loops=1)
>> Workers Planned: 6
>> Workers Launched: 6
>> -> Partial Aggregate (cost=290342.05..290342.06 rows=1
>> width=8) (actual time=349.799..349.799 rows=1 loops=7)
>> -> Parallel Index Only Scan using
>> index_subscriptions_on_project_id_and_uid on subscriptions
>> (cost=0.56..287610.27 rows=1092713 width=0) (actual time=0.083..273.018
>> rows=1030593 loops=7)
>> Index Cond: (project_id = 123)
>> Heap Fetches: 280849
>> Planning Time: 0.753 ms
>> Execution Time: 374.483 ms
>> (10 rows)
>>
>
> My previous comment about the bitmap index scan taking half the time was a
> slip of the eye, I was comparing *cost* of the bitmap index scan to the
> *time* of the overall plan. But then the question is, why isn't it doing
> an index-only scan on "index_subscriptions_on_project_id_and_tags"? And
> the answer is that is because it is a GIN index. Make the same index only
> as btree, and you should get good performance as it can filter the tags
> within a given project without visiting the table.
>
> Cheers,
>
> Jeff
>
>>
From | Date | Subject | |
---|---|---|---|
Next Message | Jeff Janes | 2019-11-13 12:18:21 | Re: Slow "not in array" operation |
Previous Message | Jeff Janes | 2019-11-13 11:30:10 | Re: Slow "not in array" operation |