Mailing List Archive

Bricolage and large site
Hi,
I'd like to hear comments on using Bricolage for a large site
with hundreds (or more) of stories and media. How does the user interface cope with
associating an image with a story when there are so many images? Are there any
complaints from users, and things like that? Any directions about how
to approach such a beast are welcome.

Thank you, Zdravko.
Re: Bricolage and large site [ In reply to ]
Bricolage is managing approximately:

11,473 pages of main site content.
1,990 pages of blog content.
18,636 jpegs.
18,716 gifs.
51 RSS feeds for the main site.
1,086 RSS feeds for the blog.

For the user interface it makes very little difference. The only thing
that is noticeably different is doing a 'bulk_publish'. :-)

-a

2009/11/30 Zdravko Balorda <zdravko.balorda@siix.com>:
>
> Hi,
> I'd like to hear comments on using Bricolage for a large site
> with hundreds (or more) of stories and media. How does the user interface cope with
> associating an image with a story when there are so many images? Are there any
> complaints from users, and things like that? Any directions about how
> to approach such a beast are welcome.
>
> Thank you, Zdravko.
>
>
Re: Bricolage and large site [ In reply to ]
I think Bricolage works especially well for managing a large site. You need to take care when setting up desks, workflows, and categories, but if you're able to divide up the content in a way that is digestible by users, it should work well.

At Denison we had around the same amount of content as Adam just mentioned, and users would search for media via title or category - sometimes both. It worked out.

-Matt

Re: Bricolage and large site [ In reply to ]
> At Denison we had around the same amount of content as Adam just mentioned, and users
> would search for media via title or category - sometimes both. It worked out.

Well, Adam came with some impressive numbers! I knew that Bricolage is good, but this!

Matt, using the title for media puzzles me a bit. Usually I have no clue what to
name a media asset. Would using the story title as the media title be good advice for users?

Regards, Zdravko
Re: Bricolage and large site [ In reply to ]
On 2-Dec-09, at 5:08 AM, Zdravko Balorda wrote:

>> At Denison we had around the same amount of content as Adam just
>> mentioned, and users would search for media via title or category -
>> sometimes both. It worked out.
>
> Well, Adam came with some impressive numbers! I knew that Bricolage
> is good, but this!

There's also a note from Bret on the lists about Sportsnet reaching
100,000 stories:
http://www.gossamer-threads.com/lists/bricolage/users/34396#34396

--
Phillip Smith // Simplifier of Technology // COMMUNITY BANDWIDTH
www.communitybandwidth.ca // www.phillipadsmith.com
Re: Bricolage and large site [ In reply to ]
Hi everybody,

Yeah, Phillip's right. Sportsnet reached about 230,000 stories before
the recent redesign, and was still humming along smoothly.

(Interestingly, the FULL_SEARCH directive was turned on the whole time,
and it was always quick to search for stories.)

With the redesign, we only transferred Sportsnet's own content and
didn't migrate wire stories. So they're back down to about 130,000
again. There are also more than 30,000 media assets in the system,
down from 160,000 in the pre-redesign system.

While we're dropping numbers, IFEX has something like 70,000 stories as
well. (Not much media in that one, though.)

So I'd say this without hesitating: Bricolage is excellent for large
sites. Better than any other CMS I've encountered.


Cheers,

Bret

On Wed, 2009-12-02 at 10:58 -0300, Phillip Smith wrote:
> There's also a note from Bret on the lists about Sportsnet reaching
> 100,000 stories:
> http://www.gossamer-threads.com/lists/bricolage/users/34396#34396

--
Bret Dawson
Producer
Pectopah Productions Inc.
(416) 895-7635
bret@pectopah.com
www.pectopah.com
Re: Bricolage and large site [ In reply to ]
On Dec 2, 2009, at 3:08 AM, Zdravko Balorda wrote:

> Well, Adam came with some impressive numbers! I knew that Bricolage is good, but this!
>
> Matt, using the title for media puzzles me a bit. Usually I have no clue what to
> name a media asset. Would using the story title as the media title be good advice for users?


What I've found works best is to have an effective naming convention before the media get uploaded to Bricolage, then search based on that (or site, or category, or whatever). I would agree that adding in titles has not been all that useful in my experience. Others may have a different opinion.

-Matt
Re: Bricolage and large site [ In reply to ]
On Dec 2, 2009, at 6:37 AM, Bret Dawson wrote:

> (Interestingly, the FULL_SEARCH directive was turned on the whole time,
> and it was always quick to search for stories.)

How much RAM you got?

> With the redesign, we only transferred Sportsnet's own content and
> didn't migrate wire stories. So they're back down to about 130,000
> again. There are also more than 30,000 media assets in the system,
> down from 160,000 in the pre-redesign system.

You deleted stories and media from Bricolage??

Best,

David
Re: Bricolage and large site [ In reply to ]
On Dec 2, 2009, at 9:07 AM, Matthew Rolf wrote:

> What I've found works best is to have an effective naming convention before the media get uploaded to Bricolage, then search based on that (or site, or category, or whatever). I would agree that adding in titles has not been all that useful in my experience. Others may have a different opinion.

I usually just use the file name.

Best,

David
Re: Bricolage and large site [ In reply to ]
On Wed, 2009-12-02 at 09:28 -0800, David E. Wheeler wrote:
> On Dec 2, 2009, at 6:37 AM, Bret Dawson wrote:
>
> > (Interestingly, the FULL_SEARCH directive was turned on the whole time,
> > and it was always quick to search for stories.)
>
> How much RAM you got?

It's 8GB. One of Alex's workhorse machines.

> > With the redesign, we only transferred Sportsnet's own content and
> > didn't migrate wire stories. So they're back down to about 130,000
> > again. There are also more than 30,000 media assets in the system,
> > down from 160,000 in the pre-redesign system.
>
> You deleted stories and media from Bricolage??

Not exactly. Just didn't bother SOAPing many of the old ones over. They
were totally stale and full of links to off-site news sources, which
themselves had mostly expired.


Cheers,

Bret
Re: Bricolage and large site [ In reply to ]
On Dec 2, 2009, at 10:37 AM, Bret Dawson wrote:

>> How much RAM you got?
>
> It's 8GB. One of Alex's workhorse machines.

Nice.

>> You deleted stories and media from Bricolage??
>
> Not exactly. Just didn't bother SOAPing many of the old ones over. They
> were totally stale and full of links to off-site news sources, which
> themselves had mostly expired.

Oh, you migrated to a new Bricolage install? That was…brave.

Best,

David
Re: Bricolage and large site [ In reply to ]
On Dec 2, 2009, at 2:32 PM, David E. Wheeler wrote:

> Oh, you migrated to a new Bricolage install? That was…brave.

We considered this at Denison (they may still do it) but I always regarded it as a "last resort".

-Matt
Re: Bricolage and large site [ In reply to ]
Went back over this thread and started thinking: is there a theoretical limit to how large a Bricolage installation could grow and still run well? Let's assume some kind of optimal modern hardware setup and straightforward template construction: the DB running on its own multi-core box with 16GB RAM, and the same for Bricolage itself.

In my mind there are two relevant metrics: number of stories/versions, and number of concurrent users. Regarding the first, would an install be able to reach 1 million stories? 2 million? More? Would the bottleneck be software (mod_perl, queries) or hardware (processor, I/O)? Given that PostgreSQL can handle 32TB tables, I doubt sheer data volume would be an issue for a very long time in Bric years. But would it slow down considerably as the DB grew?

Regarding the second, at Denison the interface itself was never a problem, even though the DB and Bric were on the same box and shared 4GB RAM. Bric could take 10 concurrent users hitting it hard (the most we were ever able to conclusively test), and the box never even blinked. The biggest bottleneck was ssh bulk publishing and the times when we borked our templates (with the former never impacting the interface since we used bric_queued). My sense is that Bricolage could easily handle lots more users at the same time.

Anyway, just curious. I'll be interested to hear what others have to say.

-Matt
Re: Bricolage and large site [ In reply to ]
On Jan 20, 2010, at 10:56 AM, Matt Rolf wrote:

> In my mind there are two relevant metrics: number of stories/versions, and number of concurrent users. Regarding the first, would an install be able to reach 1 million stories? 2 million? More? Would the bottleneck be software (mod_perl, queries) or hardware (processor, I/O)? Given that PostgreSQL can handle 32TB tables, I doubt sheer data volume would be an issue for a very long time in Bric years. But would it slow down considerably as the DB grew?

No, though it depends on the database tuning. Recent stories and versions should be cached for fast access, but in general, the larger the database, the more RAM you want.

Where it would pose a problem is if you need to republish the entire site. Then all the stories would be loaded, best done with --chunks. That will take a while with 2m stories. You'll want to avoid that. For that kind of scale, I'd recommend that the vast majority of published stories be as simple as possible, and included in the overall style of the site via SSI (PHP, Mason, whatever), so that the format of published stories isn't dependent on the design of the site.
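
To illustrate the idea of keeping published stories independent of the site design, here is a minimal Python sketch. It is not how Bricolage or SSI works internally; `SITE_LAYOUT` and `render_page` are hypothetical names, and the wrapper stands in for whatever include mechanism (SSI, PHP, Mason) the front end uses. The point is only that the story is published once as a plain fragment, and the layout is applied when the page is served.

```python
from string import Template

# Hypothetical site wrapper, applied when a page is served.
SITE_LAYOUT = Template(
    "<html><body>\n<div id='header'>$site_name</div>\n$story\n</body></html>"
)


def render_page(story_fragment: str, site_name: str) -> str:
    """Wrap a published story fragment in the current site layout."""
    return SITE_LAYOUT.substitute(site_name=site_name, story=story_fragment)


# What would be published once: a plain fragment with no site chrome.
fragment = "<h1>Match report</h1>\n<p>Story body.</p>"

print(render_page(fragment, "Example Site"))
```

With this split, a redesign changes only the wrapper, so the stored fragments would not need to be republished.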

Failing that, I recommend using a content delivery server to serve the content from its own database. That has its own issues, though.

> Regarding the second, at Denison the interface itself was never a problem, even though the DB and Bric were on the same box and shared 4GB RAM. Bric could take 10 concurrent users hitting it hard (the most we were ever able to conclusively test), and the box never even blinked. The biggest bottleneck was ssh bulk publishing and the times when we borked our templates (with the former never impacting the interface since we used bric_queued). My sense is that Bricolage could easily handle lots more users at the same time.

I believe WHO handles 100 concurrent users fairly often. Users don't often impact performance much, frankly (unless they run a very expensive search that returns all 2m stories or something). It's publishing that tends to be expensive, IME.

Best,

David
Re: Bricolage and large site [ In reply to ]
On Jan 20, 2010, at 2:23 PM, David E. Wheeler wrote:

> No, though it depends on the database tuning.

I assume you mean standard postgresql configuration as opposed to anything Bricolage oriented?

> Where it would pose a problem is if you need to republish the entire site. Then all the stories would be loaded, best done with --chunks. That will take a while with 2m stories. You'll want to avoid that.

It would take about an hour and a half for Denison's server to burn through the whole site, and longer still to publish it all. I'm not going to pretend that we did things the most efficient way possible, but even if you assume various performance tweaking in the templates or hardware, you're still talking a while for that many stories.

We did chunk everything at "1". I'm guessing that helps the memory issue, but adds some kind of overhead.

> For that kind of scale, I'd recommend that the vast majority of published stories be as simple as possible, and included in the overall style of the site via SSI (PHP, Mason, whatever), so that the format of published stories isn't dependent on the design of the site.

Yes, that makes a lot of sense.

> Failing that, I recommend using a content delivery server to serve the content from its own database. That has its own issues, though.

Wow, are you saying a custom dynamic front end that would tap the Bric DB, and thus get around the whole burn/move process?

> Users don't often impact performance much, frankly (unless they run a very expensive search that returns all 2m stories or something). It's publishing that tends to be expensive, IME.

This jibes very much with my own experience.

In other words, if you're going to go large on a conventional install, do it in a way that limits site-wide bulk publishes and make sure the templates are clean. But beyond that, it sounds like you're saying the DB could handle a lot more content in everyday use.

Does WHO have a lot of stories (in other words, more than what Bret mentioned) or just users, versions, and elements?

-Matt
Re: Bricolage and large site [ In reply to ]
On Jan 20, 2010, at 12:17 PM, Matt Rolf wrote:

> I assume you mean standard postgresql configuration as opposed to anything Bricolage oriented?

Correct, though there still might be the odd slow query to diagnose.

> It would take about an hour and a half for Denison's server to burn through the whole site, and longer still to publish it all. I'm not going to pretend that we did things the most efficient way possible, but even if you assume various performance tweaking in the templates or hardware, you're still talking a while for that many stories.

Absolutely, no way around it.

> We did chunk everything at "1". I'm guessing that helps the memory issue, but adds some kind of overhead.

Yes, but that actually makes things slower. I would do a minimum of 10, and probably 50 (unless you have templates that load a huge amount of stuff from the database).
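
As a rough illustration of the chunk-size trade-off, here is a small Python sketch. `chunked` is a hypothetical helper, not part of Bricolage, and the story IDs are made up. It only shows how the number of batches (and so the per-batch overhead) falls as the chunk size grows.

```python
from typing import Iterator, List, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Hypothetical story IDs; a real list would come from a search.
story_ids = list(range(1, 1001))

for size in (1, 10, 50):
    batches = sum(1 for _ in chunked(story_ids, size))
    print(f"chunk size {size:>2}: {batches} batches")
```

For 1,000 stories this gives 1,000 batches at size 1, 100 at size 10, and 20 at size 50.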

>> Failing that, I recommend using a content delivery server to serve the content from its own database. That has its own issues, though.
>
> Wow, are you saying a custom dynamic front end that would tap the Bric DB, and thus get around the whole burn/move process?

No. You have Bricolage publish to your dynamic front-end, either by inserting and updating the stories via an API, or by dropping the files in an appropriate location (as is done with the Drupal Bricolage plugin).

> In other words, if you're going to go large on a conventional install, do it in a way that limits site-wide bulk publishes and make sure the templates are clean. But beyond that, it sounds like you're saying the DB could handle a lot more content in everyday use.

No question.

> Does WHO have a lot of stories (in other words, more than what Bret mentioned) or just users, versions, and elements?

WHO'ers?

Best,

David
Re: Bricolage and large site [ In reply to ]
On Jan 20, 2010, at 5:07 PM, David E. Wheeler wrote:

> Yes, but that actually makes things slower. I would do a minimum of 10, and probably 50 (unless you have templates that load a huge amount of stuff from the database).

We chunked on 1 to make sure that as much as possible got published. When SOAP would come across a story or piece of media that was checked out, on a desk, or otherwise corrupted, it would cause the rest of the chunk to fail. We would then have no way of knowing what we'd need to go republish, and that was a pain anyway. 1 at least got as much as possible out there.

Maybe this isn't an issue anymore with some of the patches that have come through, but it was a big concern for us.
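
One way to get larger chunks without losing track of failures could look like the Python sketch below. It is only an illustration under assumptions: `publish_in_chunks` and `fake_publish` are hypothetical, the real work would be done by whatever SOAP client is in use, and it assumes that republishing a story that already went out is harmless.

```python
from typing import Callable, List, Sequence, Tuple


def publish_in_chunks(
    ids: Sequence[int],
    publish: Callable[[List[int]], None],
    size: int = 50,
) -> Tuple[List[int], List[int]]:
    """Publish in batches; retry a failed batch one ID at a time.

    Returns (published, failed) so the failures can be fixed and rerun.
    """
    published: List[int] = []
    failed: List[int] = []
    for start in range(0, len(ids), size):
        batch = list(ids[start:start + size])
        try:
            publish(batch)
            published.extend(batch)
        except Exception:
            # Part of the batch may already have gone out; republishing
            # those IDs one by one is assumed to be harmless.
            for story_id in batch:
                try:
                    publish([story_id])
                    published.append(story_id)
                except Exception:
                    failed.append(story_id)
    return published, failed


def fake_publish(batch: List[int]) -> None:
    """Stand-in for a real publish call; IDs 7 and 42 are 'checked out'."""
    if any(story_id in (7, 42) for story_id in batch):
        raise RuntimeError("asset is checked out")


done, broken = publish_in_chunks(list(range(1, 101)), fake_publish, size=10)
print(len(done), broken)
```

Only the batches that contain a problem asset fall back to one-at-a-time publishing, and the returned `failed` list says exactly what needs to be republished.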

> No. You have Bricolage publish to your dynamic front-end, either by inserting and updating the stories via an API, or by dropping the files in an appropriate location (as is done with the Drupal Bricolage plugin).

Ok, duh.

-Matt
Re: Bricolage and large site [ In reply to ]
On Jan 21, 2010, at 7:58 AM, Matt Rolf wrote:

> We chunked on 1 to make sure that as much as possible got published. When SOAP would come across a story or piece of media that was checked out, on a desk, or otherwise corrupted, it would cause the rest of the chunk to fail. We would then have no way of knowing what we'd need to go republish, and that was a pain anyway. 1 at least got as much as possible out there.

Understood. One thing I've done in the past is to republish an entire site a bunch of times until I got all of those issues eliminated. Well, not all, but as many as showed up when there were only a few hundred stories to publish, rather than when there are tens of thousands later.

Best,

David
Re: Bricolage and large site [ In reply to ]
On Jan 20, 2010, at 2:23 PM, David E. Wheeler wrote:

> It's publishing that tends to be expensive, IME.

I've been ruminating some more on this topic. With regard to the publish process, it's expensive not only in terms of database activity, but also in terms of executing the template transforms (publishing) and then moving the finished product to some other server for final display (distribution).

To me, the database activity is the least of these issues, particularly if postgresql is on a large, multi-core box with oodles of ram, a competent OS, and a nice disk array. Tune it up, let it fly.

Unfortunately, the other two parts fall to bric_queued, which 1) has to be run on the same box as Bricolage (although I suppose you could put it on another box and then network share the needed Bric libraries) and 2) is confined to two processes, one for publishing and one for distribution for the *entire* bricolage install. In an app that can see both postgres and apache spawn as many processes as are needed and ram allows to keep up with increased use, it seems like this is a pretty glaring scalability weak point.

I was stunned to read in the bric_queued script that "distribution is normally an order of magnitude faster than publishing". Now, I can believe that in regards to file system moves, but FTP and SFTP? Using Net::SSH2, Bricolage could transfer about a file a second, but for 10,000 files, that's a mighty long time to wait. I'm assuming FTP connections have similar latency.
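
To put a number on "a mighty long time", here is a quick Python calculation. It assumes strictly serial transfers at a fixed rate; `transfer_time` is a hypothetical helper, and the 0.1 s figure is just a made-up faster rate for comparison.

```python
def transfer_time(files: int, seconds_per_file: float) -> str:
    """Format the total serial transfer time as hours, minutes, seconds."""
    total = round(files * seconds_per_file)
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {seconds:02d}s"


print(transfer_time(10_000, 1.0))   # 2h 46m 40s
print(transfer_time(10_000, 0.1))   # 0h 16m 40s (hypothetical faster rate)
```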

Regardless of which is faster, bric_queued is probably going to be a choke point for large installs long before db activity. Thousands of stories or media flying out at once are simply going to overwhelm it.

Is there any way bric_queued could be used or easily altered to improve scalability? Running the script more than once to spawn multiple pub and dist thread pairs doesn't seem to help - one pair will grab the jobs queue and the others will just sit there. It seems to me that the possibility of it running on its own box and using multiple processes to deal with a large file load would be a good start, but maybe I'm thinking about the problem wrong.

-Matt
Re: Bricolage and large site [ In reply to ]
On Jan 26, 2010, at 12:53 PM, Matt Rolf wrote:

> To me, the database activity is the least of these issues, particularly if postgresql is on a large, multi-core box with oodles of ram, a competent OS, and a nice disk array. Tune it up, let it fly.

Well, yes, except that the slowness of publishing is not due to the speed of the database, but to the amount of data loaded from the database. So if you republish your home page, and the template loads the 25 most recent stories from the database, and then loads any of the elements from those 25 stories (such as a teaser field), then that's a lot of data that gets queried.

> Unfortunately, the other two parts fall to bric_queued, which 1) has to be run on the same box as Bricolage (although I suppose you could put it on another box and then network share the needed Bric libraries)

It's not the libraries, which can be copied, but the templates, which cannot.

> and 2) is confined to two processes, one for publishing and one for distribution for the *entire* bricolage install. In an app that can see both postgres and apache spawn as many processes as are needed and ram allows to keep up with increased use, it seems like this is a pretty glaring scalability weak point.

Agreed. I've seen bric_queued choke on a lot of stories before.

> I was stunned to read in the bric_queued script that "distribution is normally an order of magnitude faster than publishing". Now, I can believe that in regards to file system moves, but FTP and SFTP? Using Net::SSH2, Bricolage could transfer about a file a second, but for 10,000 files, that's a mighty long time to wait. I'm assuming FTP connections have similar latency.

It should not take a second to do the transfer. That's way too slow.

> Regardless of which is faster, bric_queued is probably going to be a choke point for large installs long before db activity. Thousands of stories or media flying out at once are simply going to overwhelm it.

Anyone from WHO want to chime in on this?

> Is there any way bric_queued could be used or easily altered to improve scalability? Running the script more than once to spawn multiple pub and dist thread pairs doesn't seem to help - one pair will grab the jobs queue and the others will just sit there. It seems to me that the possibility of it running on its own box and using multiple processes to deal with a large file load would be a good start, but maybe I'm thinking about the problem wrong.

It could absolutely be made to scale better. It could spawn separate jobs for each story it publishes, for example, up to n at a time. Other stuff too, I'm sure. Are you offering to take on such a task?
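
The "separate jobs per story, up to n at a time" idea could be sketched like this in Python. This is not bric_queued code: `publish_concurrently` and `fake_publish_story` are hypothetical, and threads stand in for whatever mechanism (forked processes, for instance) a real implementation would use.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def publish_concurrently(
    story_ids: Iterable[int],
    publish_story: Callable[[int], str],
    max_workers: int = 4,
) -> Dict[int, str]:
    """Run one publish job per story, at most `max_workers` at a time."""
    results: Dict[int, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(publish_story, sid): sid for sid in story_ids}
        for future in as_completed(futures):
            sid = futures[future]
            try:
                results[sid] = future.result()
            except Exception as exc:
                # Record the failure instead of stopping the other jobs.
                results[sid] = f"failed: {exc}"
    return results


def fake_publish_story(story_id: int) -> str:
    """Stand-in for burning and distributing a single story."""
    if story_id % 10 == 0:
        raise RuntimeError("template error")
    return "ok"


outcome = publish_concurrently(range(1, 21), fake_publish_story, max_workers=4)
print(sum(1 for status in outcome.values() if status == "ok"), "published")
```

Capping the pool at `max_workers` keeps a burst of thousands of jobs from swamping the box, and one failing story does not block the rest of the queue.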

Best,

David
Re: Bricolage and large site [ In reply to ]
Hi,

On Wed Jan 20 02:07:48, David E. Wheeler wrote:
> On Jan 20, 2010, at 12:17 PM, Matt Rolf wrote:
> > Does WHO have a lot of stories (in other words, more than what Bret
> > mentioned) or just users, versions, and elements?
>
> WHO'ers?

Can't speak to the publish experience (other than it works), but for
numbers, there are about 80k stories with 400k revisions.

Cheers,

Alex
--
Alex Krohn <alex@gossamer-threads.com>
Re: Bricolage and large site [ In reply to ]
Hi all,

Publishing works pretty quickly, but we've got kind-of a special
architecture. We're running with a cluster.

-mark

____________________________________
From: Alex Krohn
Sent: Wed, Jan 27, 2010 at 09:41:20AM +0100
To: users@lists.bricolage.cc
Subject: Re: Bricolage and large site

> Can't speak to the publish experience (other than it works), but for
> numbers, there are about 80k stories with 400k revisions.

--
.................................................................
: Mark Jaroski
: Room 9016
: World Health Organization
: +41 22 791 16 65
:
.................................................................
Always remember: unencrypted email is not private. Be careful.
.................................................................
Re: Bricolage and large site [ In reply to ]
On Jan 28, 2010, at 5:45 AM, Mark Jaroski wrote:

> Hi all,
>
> Publishing works pretty quickly, but we've got kind-of a special
> architecture. We're running with a cluster.

I totally understand if you don't want to go into more detail, but I'm curious to know more about the setup. Is it Failover? Load-Balancing? Both? Does that mean you've got more than 1 Bric instance running and/or talking to just one db?

-Matt
Re: Bricolage and large site [ In reply to ]
On Jan 26, 2010, at 6:20 PM, David E. Wheeler wrote:

> It could absolutely be made to scale better. It could spawn separate jobs for each story it publishes, for example, up to n at a time. Other stuff too, I'm sure. Are you offering to take on such a task?

That might be something I could take a look at in the near future.

-Matt
Re: Bricolage and large site [ In reply to ]
Matt Rolf wrote:
> I totally understand if you don't want to go into more detail, but I'm
> curious to know more about the setup. Is it Failover? Load-Balancing?
> Both? Does that mean you've got more than 1 Bric instance running and/or
> talking to just one db?

Both. We have two db servers with Slony replication. They're configured in
a master-slave setup, with manual failover.

The application servers running Apache/Bricolage are load-balanced for the
most part, with persistent connections provided by a cookie, which means we
don't have to worry about replicating session data.

I added a call to rsync in the media upload function which replicates the
appropriate sub-directory of media. bric_queued runs on only one of the
app servers because we don't yet have a way of attaching a distribution job
to a particular server. If the server it runs on goes down it switches to
running on the other.

The whole thing is governed by Monit and Heartbeat (Linux-HA). We have on
average about 3 fail-over events on the application servers per year, and
one on the db servers.

If you like I'll dig up some diagrams.

all the best,

-mark

--
--
=================================================================
-- mark at geekhive dot net --