From: | Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> |
---|---|
To: | Amir Rohan <amir(dot)rohan(at)mail(dot)com> |
Cc: | Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL www <pgsql-www(at)postgresql(dot)org>, magnus(at)hagander(dot)net, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Subject: | Re: No easy way to join discussion in existing thread when not subscribed |
Date: | 2015-09-30 06:53:28 |
Message-ID: | 560B86E8.4020600@kaltenbrunner.cc |
Lists: | pgsql-www |
On 09/30/2015 03:27 AM, Amir Rohan wrote:
> On 09/29/2015 10:51 PM, Stefan Kaltenbrunner wrote:
>> On 09/29/2015 09:34 PM, Amir Rohan wrote:
>>
>> for most accesses to the archives the string for the basic auth reply
>> quotes the "archives" and "password" strings with ' - see
>
> Fixed.
I think you missed at least one spot in the code you added and also at
least one occurrence in existing code.
>
>> we have a number of current issues where data in the archives gets
>> mangled/corrupted that we are looking into. We are currently working on some
>> infrastructure to "test" parsing fixes across all the messages in the
>> archives to get a better understanding of what kind of effect a change has.
>> For this specific message I'm curious how you found it, though.
>>
>
> I made a prototype before looking at the repo, using
> Python's 'mailbox' parser module, and some asserts failed
> when some messages parsed out as lacking a Message-ID. I had
> also read the mbox spec in order to write the patch, and
> put the two together.
ah - nice effort!
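
The kind of check described above can be sketched with Python's standard `mailbox` module. This is a minimal illustration, not the prototype from the thread: the function name `messages_missing_id` and the file path argument are assumptions made for the example.

```python
import mailbox


def messages_missing_id(mbox_path):
    """Return the keys of messages in an mbox file that have no Message-ID.

    Hypothetical helper for illustration; the actual prototype is not
    shown in this thread.
    """
    # create=False: fail instead of silently creating a missing file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # mbox keys are integers assigned in file order, starting at 0.
        return [key for key, msg in box.iteritems()
                if not msg.get("Message-ID")]
    finally:
        box.close()
```

Running this over a monthly mbox and inspecting the reported keys is one way to spot messages that were split or mangled during parsing.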
>
>>>> <...>
>>>> Have you done any (approximate) measurements on what the additional
>>>> in-memory overhead in both pg (to build the response) and in django is
>>>> compared to the resulting mbox?
>>>>
>>>>> Amir Wrote:
>>>>> <some napkins and mitigations>
>> My concern mostly stems from operational
>> experience (on the sysadmin team) that some operations on the archives
>> are currently fairly CPU and memory intensive, causing issues
>> with availability, and we do not want to add more vectors that can
>> cause that.
>>
>
> You're right to be concerned, I raised the issue myself to begin with.
> We can solve any particular problem, but how to optimize depends too
> much on particulars I don't have.
>
> If you have both a CPU and a memory shortage, we could trade storage for them.
> You already serve monthly mboxes; having per-thread mboxes which are
> updated in batch (say hourly) could be manageable, and that code
> is practically written already. Serving static files is as cheap as it gets
> on both CPU and memory.
yeah that is what I was thinking - though I don't think we want hourly.
We went a long way to actually get the current system to be "almost
instant" in terms of having the archives in sync with the lists (at least
for the basic stuff). What I was thinking is doing the mbox creation
during the import - we already serialize the process (on the MTA/LDA
side) there so that only one message is imported at a time, so there is
way less risk of overwhelming the box.
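
A rough sketch of what building the mbox during import could look like, assuming one mbox file per thread on disk. The function name, the `thread-<id>.mbox` layout and the `thread_id` argument are all hypothetical; the real import code is not shown in this message. Because the import is already serialized, the sketch takes no lock of its own.

```python
import mailbox
import os


def append_to_thread_mbox(mbox_dir, thread_id, raw_message):
    """Append one raw message (bytes) to the per-thread mbox file.

    Assumes the caller is the serialized import step, so only one
    writer touches the file at a time.
    """
    path = os.path.join(mbox_dir, "thread-%s.mbox" % thread_id)
    box = mailbox.mbox(path)  # creates the file if it does not exist
    try:
        # add() writes the "From " separator line and escapes body lines
        # that start with "From ".
        box.add(raw_message)
        box.flush()
    finally:
        box.close()
    return path
```

With files maintained this way, a thread download becomes a static file served by the web server, with no database query or Django work per request.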
>
> But for now, see attached patch, which adds a tweakable for setting a
> cap on the max size of the response. It still gets everything
> from the database at once, so it may not be of much help except
> perhaps as a metric for you to easily monitor.
>
> There's also an EJECT button that turns all thread mbox requests into
> 403, so you can just throw this in production and flip the switch
> if a problem appears. Also fixes the quoting in the message.
thanks for the updated patch - will take a look and see whether I can
find out what the worst case is in the archives later today.
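
For reference, the two safeguards described in the quoted text (a size cap and a switch that turns thread mbox requests into 403) can be sketched in framework-free Python as below. The function and parameter names are hypothetical and do not come from the patch; the 503 status for an over-cap response is an assumption, since the message does not say how the patch reacts when the cap is exceeded.

```python
def thread_mbox_response(raw_messages, *, enabled, max_bytes):
    """Return (status, body) for a thread mbox request.

    raw_messages: list of bytes, already loaded in full, mirroring the
    patch's behaviour of fetching everything from the database at once.
    """
    if not enabled:
        # The "EJECT button": refuse every thread mbox request.
        return 403, b""
    total = sum(len(m) for m in raw_messages)
    if total > max_bytes:
        # Assumed behaviour: refuse rather than send an oversized body.
        return 503, b""
    return 200, b"".join(raw_messages)
```

Since the messages are already in memory when the check runs, the cap limits what is sent, not what is loaded, which matches the caveat in the quoted text.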
Stefan