
optimizing publishing
Hi!
I'm trying to optimize publishing for speed now. Using
bric_queued leaves Apache free for the UI, which is good. That leaves
two main processes doing the publish work: perl (bric_queued) and
postgres. Interestingly, the two together grab 100% of one CPU,
while leaving the other idle. There are two CPUs in my box.
System time is very low: 3.3%.
Seems quite optimal. Are there any better practices?

Regards, Zdravko
Re: optimizing publishing
On Jan 26, 2012, at 6:40 AM, Zdravko Balorda wrote:

> I'm trying to optimize publishing for speed now. Using
> bric_queued leaves Apache free for the UI, which is good. That leaves
> two main processes doing the publish work: perl (bric_queued) and
> postgres. Interestingly, the two together grab 100% of one CPU,
> while leaving the other idle. There are two CPUs in my box.
> System time is very low: 3.3%.
> Seems quite optimal. Are there any better practices?

Not really. The best way to minimize publish time is to keep your templates simple, and don’t over-call publish_another().
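
For example, when many stories share a cover, a small guard in the template keeps publish_another() from being handed the same document once per story. A minimal sketch for a Mason template where $burner is in scope; the %seen package variable is a hypothetical guard, not a Bricolage feature:

    our %seen;   # hypothetical per-process cache of cover IDs already queued
    my $cover = $related_cover;   # however your template locates the cover
    $burner->publish_another($cover)
        if $cover && !$seen{ $cover->get_id }++;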

Best,

David
Re: optimizing publishing
I started a thread on this topic a while ago relating specifically to the API:

http://www.gossamer-threads.com/lists/bricolage/users/38869

-Matt

On Jan 26, 2012, at 12:03 PM, David E. Wheeler wrote:

> On Jan 26, 2012, at 6:40 AM, Zdravko Balorda wrote:
>
>> I'm trying to optimize publishing for speed now. Using
>> bric_queued leaves Apache free for the UI, which is good. That leaves
>> two main processes doing the publish work: perl (bric_queued) and
>> postgres. Interestingly, the two together grab 100% of one CPU,
>> while leaving the other idle. There are two CPUs in my box.
>> System time is very low: 3.3%.
>> Seems quite optimal. Are there any better practices?
>
> Not really. The best way to minimize publish time is to keep your templates simple, and don’t over-call publish_another().
>
> Best,
>
> David
>
Re: optimizing publishing
Hi, Matthew!
I remember, yes. Here, I am chasing the speed of a whole republish. When you
have 25,000 stories and each takes 2-3 seconds, it takes a while. :)
But not much can be done, except perhaps in the templates. I use publish_another
in every story, unfortunately. Funny thing, though: perl takes 70% of the CPU time.
What is it doing? :)
Zdravko

Matthew Rolf wrote:
> I started a thread on this topic a while ago relating specifically to the API:
>
> http://www.gossamer-threads.com/lists/bricolage/users/38869
>
> -Matt
>
> On Jan 26, 2012, at 12:03 PM, David E. Wheeler wrote:
>
>> On Jan 26, 2012, at 6:40 AM, Zdravko Balorda wrote:
>>
>>> I'm trying to optimize publishing for speed now. Using
>>> bric_queued leaves Apache free for the UI, which is good. That leaves
>>> two main processes doing the publish work: perl (bric_queued) and
>>> postgres. Interestingly, the two together grab 100% of one CPU,
>>> while leaving the other idle. There are two CPUs in my box.
>>> System time is very low: 3.3%.
>>> Seems quite optimal. Are there any better practices?
>> Not really. The best way to minimize publish time is to keep your templates simple, and don’t over-call publish_another().
>>
>> Best,
>>
>> David
>>
>
>


--
Zdravko Balorda
Med.Over.Net
Jurčkova 229, Ljubljana

Tel.: +386 (0)1 520 50 50

Visit the Med.Over.Net health advice system
Re: optimizing publishing
Hi Zdravko,

If your templates call publish_another() in every story, that's probably
causing a good portion of the slowness you're experiencing.

What happens if you temporarily disable that?
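
Even something as blunt as this in the story template would do for the measurement (SKIP_PUBLISH_ANOTHER is a made-up environment flag, not a Bricolage setting):

    # Skip related publishing entirely while timing a republish.
    $burner->publish_another($related)
        unless $ENV{SKIP_PUBLISH_ANOTHER};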

All the best,

Bret

On Mon, 2012-01-30 at 08:15 +0100, Zdravko Balorda wrote:
> Hi, Matthew!
> I remember, yes. Here, I am chasing the speed of a whole republish. When you
> have 25,000 stories and each takes 2-3 seconds, it takes a while. :)
> But not much can be done, except perhaps in the templates. I use publish_another
> in every story, unfortunately. Funny thing, though: perl takes 70% of the CPU time.
> What is it doing? :)
> Zdravko
>
> Matthew Rolf wrote:
> > I started a thread on this topic a while ago relating specifically to the API:
> >
> > http://www.gossamer-threads.com/lists/bricolage/users/38869
> >
> > -Matt
> >
> > On Jan 26, 2012, at 12:03 PM, David E. Wheeler wrote:
> >
> >> On Jan 26, 2012, at 6:40 AM, Zdravko Balorda wrote:
> >>
> >>> I'm trying to optimize publishing for speed now. Using
> >>> bric_queued leaves Apache free for the UI, which is good. That leaves
> >>> two main processes doing the publish work: perl (bric_queued) and
> >>> postgres. Interestingly, the two together grab 100% of one CPU,
> >>> while leaving the other idle. There are two CPUs in my box.
> >>> System time is very low: 3.3%.
> >>> Seems quite optimal. Are there any better practices?
> >> Not really. The best way to minimize publish time is to keep your templates simple, and don’t over-call publish_another().
> >>
> >> Best,
> >>
> >> David
> >>
> >
> >
>
>

--
Bret Dawson
Producer
Pectopah Productions Inc.
(416) 895-7635
bret@pectopah.com
www.pectopah.com
Re: optimizing publishing
Hi, Bret!

Sure, every story calls publish_another, maybe even more than once.
I do a massive republish via bric_soap, setting the jobs to the lowest priority.
That way users can jump ahead when publishing their stories via the UI at a
higher priority. Works well.
Now I wonder: could a template figure out its own job's priority, and based
on that skip publish_another during a republish, while still publishing
cover pages when run at a higher priority?
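
Something like this in the story template, perhaps; a sketch only, since get_job() on the burner and get_priority() on the job are assumptions -- check what your Bric::Util::Burner version actually exposes:

    # Publish cover pages only when this job runs above the lowest
    # priority (5); a low-priority mass republish skips them.
    my $job = $burner->get_job;             # accessor assumed
    if (!$job || $job->get_priority < 5) {
        $burner->publish_another($cover);   # $cover: the related cover story
    }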

Regards, Zdravko

Bret Dawson wrote:
> Hi Zdravko,
>
> If your templates call publish_another() in every story, that's probably
> causing a good portion of the slowness you're experiencing.
>
> What happens if you temporarily disable that?
>
> All the best,
>
> Bret
>
> On Mon, 2012-01-30 at 08:15 +0100, Zdravko Balorda wrote:
>> Hi, Matthew!
>> I remember, yes. Here, I am chasing the speed of a whole republish. When you
>> have 25,000 stories and each takes 2-3 seconds, it takes a while. :)
>> But not much can be done, except perhaps in the templates. I use publish_another
>> in every story, unfortunately. Funny thing, though: perl takes 70% of the CPU time.
>> What is it doing? :)
>> Zdravko
>>
>> Matthew Rolf wrote:
>>> I started a thread on this topic a while ago relating specifically to the API:
>>>
>>> http://www.gossamer-threads.com/lists/bricolage/users/38869
>>>
>>> -Matt
>>>
>>> On Jan 26, 2012, at 12:03 PM, David E. Wheeler wrote:
>>>
>>>> On Jan 26, 2012, at 6:40 AM, Zdravko Balorda wrote:
>>>>
>>>>> I'm trying to optimize publishing for speed now. Using
>>>>> bric_queued leaves Apache free for the UI, which is good. That leaves
>>>>> two main processes doing the publish work: perl (bric_queued) and
>>>>> postgres. Interestingly, the two together grab 100% of one CPU,
>>>>> while leaving the other idle. There are two CPUs in my box.
>>>>> System time is very low: 3.3%.
>>>>> Seems quite optimal. Are there any better practices?
>>>> Not really. The best way to minimize publish time is to keep your templates simple, and don’t over-call publish_another().
>>>>
>>>> Best,
>>>>
>>>> David
>>>>
>>>
>>
>


--
Zdravko Balorda
Med.Over.Net
Jurčkova 229, Ljubljana

Tel.: +386 (0)1 520 50 50

Visit the Med.Over.Net health advice system
Re: optimizing publishing
Hi!

Let me get back to this issue. I wonder how far-fetched
the idea is of having several bric_queued daemons running
in parallel?

Best regards, Zdravko.
Re: optimizing publishing
Hi,

The main problem with doing that is that you get distribution issues. If
you could make sure that the same queued that does the burn for a given asset
also does the distribution, then you'll be fine.

I think it would probably work if you have a solid distributed file system
as well, because then it wouldn't matter which queued burned the asset; the
files would be there for any of them to distribute.

-mark

____________________________________
From: Zdravko Balorda
Sent: Tue, Feb 14, 2012 at 09:03:36AM +0100
To: users@lists.bricolagecms.org
Subject: Re: optimizing publishing

>
> Hi!
>
> Let me get back to this issue. I wonder how far-fetched
> the idea is of having several bric_queued daemons running
> in parallel?
>
> Best regards, Zdravko.

--
..........................................................................
: Mark Jaroski
: Room 9016
: World Health Organization
: +41 22 791 16 65
:
..........................................................................
There are two kinds of cryptography in this world: cryptography that will
stop your kid sister from reading your files, and cryptography that will
stop major governments from reading your files
-- Bruce Schneier
..........................................................................
Re: optimizing publishing
Hi, Mark!

I'd leave only one distributing queued. And keep running several
publishing daemons. This would do for me, though. Publishing long
cover pages forces other users to wait until it's done. It makes users
think that something is wrong with the system.
Zdravko

JAROSKI, Mark Andrew wrote:
> Hi,
>
> The main problem with doing that is that you get distribution issues. If
> you could make sure that the same queued that does the burn for a given asset
> also does the distribution, then you'll be fine.
>
> I think it would probably work if you have a solid distributed file system
> as well, because then it wouldn't matter which queued burned the asset; the
> files would be there for any of them to distribute.
>
> -mark
>
> ____________________________________
> From: Zdravko Balorda
> Sent: Tue, Feb 14, 2012 at 09:03:36AM +0100
> To: users@lists.bricolagecms.org
> Subject: Re: optimizing publishing
>
>> Hi!
>>
>> Let me get back to this issue. I wonder how far-fetched
>> the idea is of having several bric_queued daemons running
>> in parallel?
>>
>> Best regards, Zdravko.
>


--
Zdravko Balorda
Med.Over.Net
Jurčkova 229, Ljubljana

Tel.: +386 (0)1 520 50 50

Visit the Med.Over.Net health advice system
Re: optimizing publishing
On Feb 14, 2012, at 12:39 AM, Zdravko Balorda wrote:

> I'd leave only one distributing queued. And keep running several
> publishing daemons. This would do for me, though. Publishing long
> cover pages forces other users to wait until it's done. It makes users
> think that something is wrong with the system.

+1 Offhand I don’t see why this could not be done.

David
Re: optimizing publishing
David E. Wheeler wrote:
> On Feb 14, 2012, at 12:39 AM, Zdravko Balorda wrote:
>
>> I'd leave only one distributing queued. And keep running several
>> publishing daemons. This would do for me, though. Publishing long
>> cover pages forces other users to wait until it's done. It makes users
>> think that something is wrong with the system.
>
> +1 Offhand I don’t see why this could not be done.
>
> David

I've given it some thought. $job->execute_me also does the job locking.
The locking should be separated from the execution, so that unlocked
jobs can be collected by a parent process and then handed to a child for execution.
Instead of $job->execute_me one would call $job->set_executing; $job->do_it.
But these are internal functions. What would you suggest?
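
In outline, the split might look like this (the list() criteria, the executing accessor, and the hand-off to the child are illustrative assumptions, not the actual Bric::Util::Job API):

    use Bric::Util::Job;

    # Parent: lock each pending job in its own short transaction,
    # then hand the job ID to a child process for execution.
    for my $job (Bric::Util::Job->list({ comp_time => undef })) {
        next if $job->get_executing;     # already claimed (accessor assumed)
        $job->set_executing;             # the proposed lock-only step
        send_to_child($job->get_id);     # hypothetical IPC hand-off
    }

    # Child: execute a job the parent has already locked.
    sub run_job {
        my ($id) = @_;
        my $job = Bric::Util::Job->lookup({ id => $id });
        $job->do_it;                     # the proposed execute-only step
    }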

Best regards, Zdravko
Re: optimizing publishing
____________________________________
From: Zdravko Balorda
Sent: Tue, Feb 14, 2012 at 09:39:41AM +0100
To: users@lists.bricolagecms.org
Subject: Re: optimizing publishing

> Hi, Mark!
>
> I'd leave only one distributing queued. And keep running several
> publishing daemons. This would do for me, though. Publishing long
> cover pages forces other users to wait until it's done. It makes users
> think that something is wrong with the system.
> Zdravko

Ah, I see. I had the notion that you'd be running the daemons on different
machines. If it's the same file-system it will just work.

--
..........................................................................
: Mark Jaroski
: Room 9016
: World Health Organization
: +41 22 791 16 65
:
..........................................................................
There are two kinds of cryptography in this world: cryptography that will
stop your kid sister from reading your files, and cryptography that will
stop major governments from reading your files
-- Bruce Schneier
..........................................................................
Re: optimizing publishing
Zdravko Balorda wrote...

>> I'd leave only one distributing queued. And keep running several
>> publishing daemons. This would do for me, though. Publishing long
>> cover pages forces other users to wait until it's done. It makes users
>> think that something is wrong with the system.
>> Zdravko

We run /etc/bric_queued.sh 5 times when the system starts, and then use cron to execute /home/bric/bin/cron_clean_bric_queued every 30 minutes. This works well for us in keeping our job queue running smoothly. We publish quite a few stories and autopublish many feeds and distributed modules every 5-15 minutes, and Bric seems to keep up just fine. However, we have noticed that there is a tipping point when using publish_another, when the templates are too complex, or when the story->list for feeds has a limit that is too high. Tweaking these things keeps us running smoothly.

I am curious, however, about the comment from Zdravko above: is it possible to separate the bric_queued daemons into publishing daemons and distributing daemons? If so, I did not know this and I don't know how. I'd like to see more discussion about not only how to do this, but also what the effects are and what the recommended ratio is.

Thanks,
Michael Fletcher
Re: optimizing publishing
Fletcher, Michael wrote:
> Zdravko Balorda wrote...
>
>>> I'd leave only one distributing queued. And keep running several
>>> publishing daemons. This would do for me, though. Publishing long
>>> cover pages forces other users to wait until it's done. It makes users
>>> think that something is wrong with the system.
>>> Zdravko
>
> We run /etc/bric_queued.sh 5 times when the system starts, and then use cron

Hi, Michael!

Are you saying that you have 5 bric_queued daemons?

Zdravko
RE: optimizing publishing
Zdravko Balorda wrote...
> Fletcher, Michael wrote:
>> Zdravko Balorda wrote...
>>
>>>> I'd leave only one distributing queued. And keep running several
>>>> publishing daemons. This would do for me, though. Publishing long
>>>> cover pages forces other users to wait until it's done. It makes
>>>> users think that something is wrong with the system.
>>>> Zdravko
>>
>> We run /etc/bric_queued.sh 5 times when the system starts, and then
>> use cron
>
> Hi, Michael!
>
> Are you saying that you have 5 bric_queued daemons?
>
> Zdravko

Yes, but maybe 10! When our system starts, we execute the bric_queued.sh
command 5 times, but in actuality it appears to start two processes for
each one. I see a total of 10 running when I use ps. However, it seems
like there is a parent and a child process for each one, so I'm not really
sure whether it is 10 or 5. Here is the output of my ps...

ps -ef | grep bric_q
nobody 7371 1 7 Feb15 00:36:09 /usr/bin/perl -w /usr/local/bricolage/bin/bric_queued --username x --password x
nobody 7372 7371 0 Feb15 00:00:26 /usr/bin/perl -w /usr/local/bricolage/bin/bric_queued --username x --password x
nobody 9016 1 4 Feb15 00:19:22 /usr/bin/perl -w /usr/local/bricolage/bin/bric_queued --username x --password x
nobody 9017 9016 0 Feb15 00:00:19 /usr/bin/perl -w /usr/local/bricolage/bin/bric_queued --username x --password x
nobody 9045 1 4 Feb15 00:20:05 /usr/bin/perl -w /usr/local/bricolage/bin/bric_queued --username x --password x
nobody 9046 9045 0 Feb15 00:00:19 /usr/bin/perl -w /usr/local/bricolage/bin/bric_queued --username x --password x
nobody 10030 1 3 00:45 00:11:59 /usr/bin/perl -w /usr/local/bricolage/bin/bric_queued --username x --password x
nobody 10031 10030 0 00:45 00:00:13 /usr/bin/perl -w /usr/local/bricolage/bin/bric_queued --username x --password x
nobody 17125 1 9 05:45 00:04:52 /usr/bin/perl -w /usr/local/bricolage/bin/bric_queued --username x --password x
nobody 17126 17125 0 05:45 00:00:04 /usr/bin/perl -w /usr/local/bricolage/bin/bric_queued --username x --password x

Michael Fletcher
Re: optimizing publishing
Fletcher, Michael wrote:
>> Are you saying that you have 5 bric_queued daemons?
>>
>> Zdravko
>
> Yes, but maybe 10! When our system starts, we execute the bric_queued.sh
> command 5 times, but in actuality it appears to start two processes for
> each one. I see a total of 10 running when I use ps. However, it seems
> like there is a parent and a child process for each one, so I'm not really

Ten daemons doesn't sound right: the daemons step over each other, stories get
published more than once, conflicts are detected, and so on.
This is why I'd like to truly parallelize bric_queued, but I'm not sure whether
I'm breaking any other concepts with my approach, since $job->execute_me is an atomic
function. Hopefully, Mark Andrew will throw some light on this issue. Parallelizing is
expected to boost publishing performance by at least the number of CPU cores, and,
even more importantly, it would improve the responsiveness of the system.

Best regards, Zdravko
Re: optimizing publishing
David E. Wheeler wrote:
> On Feb 14, 2012, at 12:39 AM, Zdravko Balorda wrote:
>
>> I'd leave only one distributing queued. And keep running several
>> publishing daemons. This would do for me, though. Publishing long
>> cover pages forces other users to wait until it's done. It makes users
>> think that something is wrong with the system.
>
> +1 Offhand I don’t see why this could not be done.
>
> David

Well, this doesn't seem to make much of an impression, so I won't
touch anything. :)
It just so happens that I have opened a big site here, 25,000 stories
which need a lot of republishing, and I get quite a few complaints
from people trying to publish as they usually do.

Best regards, Zdravko
Re: optimizing publishing
On Feb 21, 2012, at 5:52 AM, Zdravko Balorda wrote:

>> +1 Offhand I don’t see why this could not be done.
>> David
>
> Well, this doesn't seem to make much of an impression, so I won't
> touch anything. :)
> It just so happens that I have opened a big site here, 25,000 stories
> which need a lot of republishing, and I get quite a few complaints
> from people trying to publish as they usually do.

I don’t understand. Are you saying you’re not going to try to improve the parallelization of publishing?

David
Re: optimizing publishing
David E. Wheeler wrote:

>> It just so happens that I have opened a big site here, 25,000 stories
>> which need a lot of republishing, and I get quite a few complaints
>> from people trying to publish as they usually do.
>
> I don’t understand. Are you saying you’re not going to try to improve the parallelization of publishing?
>
> David

I would. It's just that I think I need to break Job::execute_me, an atomic function, into
two separate operations: set_executing and do_it. These two are private functions, too.
I hate to break other people's concepts without asking first. So please let me know
if this is OK, and perhaps suggest the best way to do it.

Best regards, Zdravko
Re: optimizing publishing
On Feb 21, 2012, at 10:30 PM, Zdravko Balorda wrote:

>> I don’t understand. Are you saying you’re not going to try to improve the parallelization of publishing?
>> David
>
> I would. It's just that I think I need to break Job::execute_me, an atomic function, into
> two separate operations: set_executing and do_it. These two are private functions, too.
> I hate to break other people's concepts without asking first. So please let me know
> if this is OK, and perhaps suggest the best way to do it.

Please, have at it.

In fact, better still would be to eliminate the need for separate transactions and, instead of updating a row in a table, use advisory locks:

http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS

Er, except I guess that wouldn’t work on MySQL. Bah!
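
Something like this minimal DBI sketch, say (PostgreSQL only; the DSN, credentials, and job ID are placeholders, and it assumes DBD::Pg's default 1/0 boolean mapping):

    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=bric', 'bric', 'secret',
                           { RaiseError => 1 });
    my $job_id = 42;    # the job this daemon wants to claim

    # Only one session can hold the advisory lock for a given key, so
    # whichever daemon gets it does the work; the others just move on.
    my ($locked) = $dbh->selectrow_array(
        'SELECT pg_try_advisory_lock(?)', undef, $job_id);

    if ($locked) {
        # ... burn and distribute the job here ...
        $dbh->selectrow_array(
            'SELECT pg_advisory_unlock(?)', undef, $job_id);
    }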

David
Re: optimizing publishing
David E. Wheeler wrote:
> On Feb 21, 2012, at 10:30 PM, Zdravko Balorda wrote:
>
>>> I don’t understand. Are you saying you’re not going to try to improve the parallelization of publishing?
>>> David
>> I would. It's just that I think I need to break Job::execute_me, an atomic function, into
>> two separate operations: set_executing and do_it. These two are private functions, too.
>> I hate to break other people's concepts without asking first. So please let me know
>> if this is OK, and perhaps suggest the best way to do it.
>
> Please, have at it.
>
> In fact, better still would be to eliminate the need for separate transactions and, instead of updating a row in a table, use advisory locks:
>
> http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS
>
> Er, except I guess that wouldn’t work on MySQL. Bah!
>
> David
>

OK, I'll let you know about the progress.
Zdravko
Re: optimizing publishing
Hi,
There appears to be one more issue regarding parallel publishing:
publish_another() builds its own story queue in memory. Parallel daemons
would therefore need some form of IPC, either directly or via the database.
I went for the database approach:
publish_another now creates a new job for a document if such a job
doesn't already exist. Otherwise it merely pushes the job's scheduled
time forward to the current time, so the story will be published last.
There is no need for flush_another_queue() anymore.
Also, publish_another now inherits the job priority from the calling job,
instead of setting it to the document priority.
And bulk publish via the UI should also schedule jobs at the lowest priority.
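
In rough pseudo-Perl, the create-or-bump logic is this (a sketch of the approach, not the actual patch: the Bric::Util::Job::Pub criteria and parameter names are assumptions, and strfdate() is assumed to default to the current time):

    use Bric::Util::Job::Pub;
    use Bric::Util::Time qw(strfdate);

    # Is there already a pending publish job for this story?
    my ($pending) = Bric::Util::Job::Pub->list({
        story_id  => $story->get_id,   # criteria names assumed
        comp_time => undef,            # not yet completed
    });

    if ($pending) {
        # Bump the existing job to "now" so the story publishes last.
        $pending->set_sched_time(strfdate());
        $pending->save;
    } else {
        # No pending job yet: create one, inheriting the calling
        # job's priority rather than the document's.
        Bric::Util::Job::Pub->new({
            sched_time => strfdate(),        # parameter names assumed
            user_id    => $user_id,
            story_id   => $story->get_id,
            priority   => $job->get_priority,
        })->save;
    }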

Please, let's hear some criticism.

Best regards!
Zdravko
Re: optimizing publishing
On Feb 28, 2012, at 4:59 AM, Zdravko Balorda wrote:

> There appears to be one more issue regarding parallel publishing:
> publish_another() builds its own story queue in memory. Parallel daemons
> would therefore need some form of IPC, either directly or via the database.
> I went for the database approach:
> publish_another now creates a new job for a document if such a job
> doesn't already exist. Otherwise it merely pushes the job's scheduled
> time forward to the current time, so the story will be published last.
> There is no need for flush_another_queue() anymore.

Well, it should go out as quickly as possible. IIRC the reason for the in-memory queue was so that it would iterate over the jobs and publish them immediately. But maybe not?

At any rate, your proposal seems quite sane to me.

> Also, publish_another now inherits the job priority from the calling job,
> instead of setting it to the document priority.

Yeah, it should probably be supported as a parameter, but should still default to the priority of the current job (or the highest priority job that calls publish_another() to schedule it).

> And bulk publish via the UI should also schedule jobs at the lowest priority.

Why lowest priority?

Thanks,

David
Re: optimizing publishing
David E. Wheeler wrote:
>
>> Also, publish_another now inherits the job priority from the calling job,
>> instead of setting it to the document priority.
>
> Yeah, it should probably be supported as a parameter, but should still default to the
> priority of the current job (or the highest priority job that calls publish_another()
> to schedule it).

It is a parameter to $burner->publish, but not to publish_another(). Maybe that's even
better, because the default priority is handled properly. For publish_another() it's
probably best to always go with the calling job's priority. If not defined otherwise,
it will be the same as the publishing story's priority anyway.
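
That is, roughly (the third argument is the proposal under discussion, not the shipped signature):

    # With the change above: inherits the calling job's priority.
    $burner->publish_another($story, $pub_time);

    # Hypothetical extension: an explicit priority override.
    $burner->publish_another($story, $pub_time, $priority);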

>
>> And bulk publish via the UI should also schedule jobs at the lowest priority.
>
> Why lowest priority?

I consider Bulk Publish to be another form of mass publishing. If forced to a low priority,
it won't interfere with other people's work.

And one more thing about publish_another: a new job should also be created if an existing
job has a different priority than the calling job; it probably comes from another queue or
from a UI publish.

Occasionally I get an error: "Cannot change scheduled time on executing job." This seems to
be the price of parallel publishing. It should be extremely rare, though.


Best regards, Zdravko
