Mailing List Archive: Paging Ash Berlin

Paging Ash Berlin

dbix-class at trout

Oct 8, 2008, 5:53 PM

Post #1 of 10 (5600 views)

On Wed, Oct 08, 2008 at 08:25:59PM +0200, Bruno Czekay wrote:
> Hi Matt
>
> Message written on 2008-10-05 at 12:42 by Matt S Trout:
>
> >That's why MooseX::JobQueue was written - but ash didn't have time to
> >clean it up and release it before he left Shadowcat to join a startup.
> >
> >Volunteers to help finish it up would be -very- welcome.
>
> Lately I started some deeper digging into Moose, so... if you don't
> have any better volunteer around, maybe I could try to do something
> useful?

That would be awesome. Well volunteered :D

Ash, please assist this man :)

--
Matt S Trout Need help with your Catalyst or DBIx::Class project?
Technical Director http://www.shadowcat.co.uk/catalyst/
Shadowcat Systems Ltd. Want a managed development or deployment platform?
http://chainsawblues.vox.com/ http://www.shadowcat.co.uk/servers/

_______________________________________________
Catalyst-dev mailing list
Catalyst-dev@lists.scsys.co.uk
http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst-dev

Re: Paging Ash Berlin

ash_cpan at firemirror

Oct 8, 2008, 6:15 PM

Post #2 of 10 (5480 views)

On 9 Oct 2008, at 01:53, Matt S Trout wrote:

> On Wed, Oct 08, 2008 at 08:25:59PM +0200, Bruno Czekay wrote:
>> Hi Matt
>>
>> Message written on 2008-10-05 at 12:42 by Matt S Trout:
>>
>>> That's why MooseX::JobQueue was written - but ash didn't have time to
>>> clean it up and release it before he left Shadowcat to join a startup.
>>>
>>> Volunteers to help finish it up would be -very- welcome.
>>
>> Lately I started some deeper digging into Moose, so... if you don't
>> have any better volunteer around, maybe I could try to do something
>> useful?
>
> That would be awesome. Well volunteered :D
>
> Ash, please assist this man :)

Whut? I'm awake? Who's president?

The code as it stands is in http://code2.0beta.co.uk/moose/svn/MooseX-JobQueue/

Things that need doing

1) Renaming to App::JobQueue (since it's not really a Moose eXtension)
2) Reading over the docs to check they still make sense
3) Seeing if it's at all useful to what you want it to do :)
4) I've got a couple of small patches sitting somewhere to apply.
5) Possibly move to a different SVN space. Of minor importance tho.
6) ?
7) profit!

-ash

Re: Paging Ash Berlin

jjn1056 at yahoo

Oct 9, 2008, 6:09 AM

Post #3 of 10 (5479 views)

--- On Wed, 10/8/08, Ash Berlin <ash_cpan@firemirror.com> wrote:

> From: Ash Berlin <ash_cpan@firemirror.com>
> Subject: Re: [Catalyst-dev] Paging Ash Berlin
> To: "Development of the elegant MVC web framework" <catalyst-dev@lists.scsys.co.uk>
> Date: Wednesday, October 8, 2008, 9:15 PM
> On 9 Oct 2008, at 01:53, Matt S Trout wrote:
>
> > On Wed, Oct 08, 2008 at 08:25:59PM +0200, Bruno Czekay wrote:
> >> Hi Matt
> >>
> >> Message written on 2008-10-05 at 12:42 by Matt S Trout:
> >>
> >>> That's why MooseX::JobQueue was written - but ash didn't have time to
> >>> clean it up and release it before he left Shadowcat to join a startup.
> >>>
> >>> Volunteers to help finish it up would be -very- welcome.
> >>
> >> Lately I started some deeper digging into Moose, so... if you don't
> >> have any better volunteer around, maybe I could try to do something
> >> useful?
> >
> > That would be awesome. Well volunteered :D
> >
> > Ash, please assist this man :)
>
> Whut? I'm awake? Who's president?
>
> The code as it stands is in http://code2.0beta.co.uk/moose/svn/MooseX-JobQueue/
>
> Things that need doing
>
> 1) Renaming to App::JobQueue (since it's not really a Moose eXtension)
> 2) Reading over the docs to check they still make sense
> 3) Seeing if it's at all useful to what you want it to do :)
> 4) I've got a couple of small patches sitting somewhere to apply.
> 5) Possibly move to a different SVN space. Of minor importance tho.
> 6) ?
> 7) profit!
>
> -ash

Count me in on this as well. We are using a version of this at $work and probably have some useful feedback. I'm not sure what the current state is, but stuff we'd like to have (i.e. things I can work on) would be better prioritization of jobs and possibly detangling this from DBIC so that DBIC would be one of many possible storage engines.

Other, lower priority stuff would include some sort of admin daemon to help collect reporting and to upgrade or query stats.
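
To make the two main wishes above concrete, here is a minimal sketch in Python (not Perl, and not taken from the MooseX::JobQueue code) of a storage interface with job priorities. The names JobStore and MemoryJobStore are hypothetical, invented for illustration; a DBIC-backed store would simply be another subclass.

```python
import heapq
import itertools
from abc import ABC, abstractmethod


class JobStore(ABC):
    """Hypothetical storage interface; a database-backed store would subclass this."""

    @abstractmethod
    def enqueue(self, payload, priority=0):
        """Add a job; a lower priority number means more urgent."""

    @abstractmethod
    def dequeue(self):
        """Return the most urgent job payload, or None if the store is empty."""


class MemoryJobStore(JobStore):
    """In-memory engine, used here only to show the interface working."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()

    def enqueue(self, payload, priority=0):
        # The running counter keeps first-in, first-out order within one priority
        # and means the payloads themselves are never compared.
        heapq.heappush(self._heap, (priority, next(self._seq), payload))

    def dequeue(self):
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

The queue code would talk only to JobStore, so swapping the engine would not touch the worker logic.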

Sincerely,
John Napiorkowski

Re: Paging Ash Berlin

jshirley at gmail

Oct 10, 2008, 12:46 PM

Post #4 of 10 (5471 views)

On Thu, Oct 9, 2008 at 6:09 AM, John Napiorkowski <jjn1056@yahoo.com> wrote:
>
>
>
> --- On Wed, 10/8/08, Ash Berlin <ash_cpan@firemirror.com> wrote:
>
>> From: Ash Berlin <ash_cpan@firemirror.com>
>> Subject: Re: [Catalyst-dev] Paging Ash Berlin
>> To: "Development of the elegant MVC web framework" <catalyst-dev@lists.scsys.co.uk>
>> Date: Wednesday, October 8, 2008, 9:15 PM
>> On 9 Oct 2008, at 01:53, Matt S Trout wrote:
>>
>> > On Wed, Oct 08, 2008 at 08:25:59PM +0200, Bruno Czekay wrote:
>> >> Hi Matt
>> >>
>> >> Message written on 2008-10-05 at 12:42 by Matt S Trout:
>> >>
>> >>> That's why MooseX::JobQueue was written - but ash didn't have time to
>> >>> clean it up and release it before he left Shadowcat to join a startup.
>> >>>
>> >>> Volunteers to help finish it up would be -very- welcome.
>> >>
>> >> Lately I started some deeper digging into Moose, so... if you don't
>> >> have any better volunteer around, maybe I could try to do something
>> >> useful?
>> >
>> > That would be awesome. Well volunteered :D
>> >
>> > Ash, please assist this man :)
>>
>> Whut? I'm awake? Who's president?
>>
>> The code as it stands is in http://code2.0beta.co.uk/moose/svn/MooseX-JobQueue/
>>
>> Things that need doing
>>
>> 1) Renaming to App::JobQueue (since it's not really a Moose eXtension)
>> 2) Reading over the docs to check they still make sense
>> 3) Seeing if it's at all useful to what you want it to do :)
>> 4) I've got a couple of small patches sitting somewhere to apply.
>> 5) Possibly move to a different SVN space. Of minor importance tho.
>> 6) ?
>> 7) profit!
>>
>> -ash
>
> Count me in on this as well. We are using a version of this at $work and probably have some useful feedback. I'm not sure what the current state is, but stuff we'd like to have (i.e. things I can work on) would be better prioritization of jobs and possibly detangling this from DBIC so that DBIC would be one of many possible storage engines.
>
> Other, lower priority stuff would include some sort of admin daemon to help collect reporting and to upgrade or query stats.
>
> Sincerely,
> John Napiorkowski

I really really really don't want to start a (very much so offtopic)
flamewar, but I would like to get a discussion going about this versus
TheSchwartz. It seems roughly similar (at least in function).

Here are the features that TheSchwartz has that I didn't see in
MooseX::JobQueue (and yes, please name it something other than
MooseX::JobQueue)

The following are handled because of Data::ObjectDriver, but want to
list them as features anyway:
1. Partitioning of jobs in the database
2. Built-in replication handling

General stuff:
3. Doesn't rely on POE; just has its own loop. I can see the
benefits to using POE but it seems like unnecessary overhead for a
jobqueue that has workers that should simply do work and nothing else.
It seems the scripts to control workers would "have more" in them
(not sure if this is a bad thing, just want to start a discussion on
pros/cons)
4. Why does the job get a ResultSet? This seems like a very odd and
strict tie into DBIC that doesn't seem to make much sense, tbh. As
John said, tying it to DBIC limits its applications. Could the job
just not get a HashRef inflated struct and iterate over it without
objects? Performance hits should be limited as much as possible, IMO.

-J

Re: Paging Ash Berlin

ash_cpan at firemirror

Oct 10, 2008, 2:22 PM

Post #5 of 10 (5480 views)

On 10 Oct 2008, at 20:46, J. Shirley wrote:

>
>
> I really really really don't want to start a (very much so offtopic)
> flamewar, but I would like to get a discussion going about this versus
> TheSchwartz. It seems roughly similar (at least in function).

TBH one of the reasons I avoided TheSchwartz was that I couldn't work
out what was going on. I did feel kinda iffy about wheel re-invention
here, but there was something about TheSchwartz when I looked at it that
didn't sit well with me. Can't remember what it was anymore.

>
>
> Here are the features that TheSchwartz has that I didn't see in
> MooseX::JobQueue (and yes, please name it something other than
> MooseX::JobQueue)
>
> The following are handled because of Data::ObjectDriver, but want to
> list them as features anyway:
> 1. Partitioning of jobs in the database
> 2. Built-in replication handling

Not really sure what these two things are? Shouldn't replication be
done at a DB level? Partitioning - as in having jobs live in two
different tables/DBs? If so then App::JobQueue (let's call it that for
lack of a better alternative) does that.

>
>
> General stuff:
> 3. Doesn't rely on POE; just has its own loop. I can see the
> benefits to using POE but it seems like unnecessary overhead for a
> jobqueue that has workers that should simply do work and nothing else.
> It seems the scripts to control workers would "have more" in them
> (not sure if this is a bad thing, just want to start a discussion on
> pros/cons)

Code reuse? An excuse to learn POE? Doesn't need to use it at all,
just does.

It could quite easily be made to not fork processes for certain
classes of small jobs thanks to POE, which is a nice benefit I think. /me
did not just think of that on the spot >_>

>
> 4. Why does the job get a ResultSet? This seems like a very odd and
> strict tie into DBIC that doesn't seem to make much sense, tbh. As
> John said, tying it to DBIC limits its applications. Could the job
> just not get a HashRef inflated struct and iterate over it without
> objects? Performance hits should be limited as much as possible, IMO.

Yes, being tied into DBIC is def something that needs to be addressed
at one point or another. For now it's not a huge show stopper tho.

But yes, overall I am aware that this is Yet Another Job Queue.


Re: Paging Ash Berlin

jshirley at gmail

Oct 10, 2008, 3:17 PM

Post #6 of 10 (5608 views)

On Fri, Oct 10, 2008 at 2:22 PM, Ash Berlin <ash_cpan@firemirror.com> wrote:
>
> On 10 Oct 2008, at 20:46, J. Shirley wrote:
>
>>
>>
>> I really really really don't want to start a (very much so offtopic)
>> flamewar, but I would like to get a discussion going about this versus
>> TheSchwartz. It seems roughly similar (at least in function).
>
> TBH one of the reasons I avoided TheSchwartz was that I couldn't work out
> what was going on. I did feel kinda iffy about wheel re-invention here, but
> there was something about TheSchwartz when i looked at that didn't sit well
> with me. Can't remember what it was anymore.
>
>>
>>
>> Here are the features that TheSchwartz has that I didn't see in
>> MooseX::JobQueue (and yes, please name it something other than
>> MooseX::JobQueue)
>>
>> The following are handled because of Data::ObjectDriver, but want to
>> list them as features anyway:
>> 1. Partitioning of jobs in the database
>> 2. Built-in replication handling
>
> Not really sure what these two things are? Shouldn't replication be done at
> a DB level? Partitioning - as having jobs live in two different tables/DBs?
> If so then App::JobQueue (lets call it that for lack of a better
> alternative) does that.
>

Well, I mean horizontal partitioning. So, automatic partitioning
based on some algorithm (like "if job->id % 2 => use this cluster").

I didn't realize it did that... couldn't find that bit.
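
For what it's worth, the modulo rule above can be sketched in a few lines. This is Python for illustration only; the cluster names are made up and none of it is how TheSchwartz or Data::ObjectDriver actually implement partitioning.

```python
# Hypothetical cluster names, purely for illustration.
CLUSTERS = ["cluster_a", "cluster_b"]


def cluster_for_job(job_id, clusters=CLUSTERS):
    """Pick a cluster by taking the job id modulo the number of clusters."""
    return clusters[job_id % len(clusters)]
```

With two clusters, even job ids land on the first and odd ids on the second. Adding a cluster changes where most existing ids map, which is one reason real systems often use a lookup table instead.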

As far as replication goes, DBIC handles some replication schemes but
there isn't the same support that D::OD has. I'm not championing
D::OD at all here, I prefer DBIC for all things; however D::OD has a
lot of code to support multiplexing and caching that DBIC hasn't
culled yet.

So, while replication happens at the database layer, the interactions
there require client side behaviors, such as reading from slaves,
writing to masters, etc. DBIC already has basic slave/master support
but without support for slave read-delay (which is unfortunately
application specific in most cases) App::JobQueue won't have that...

Which means worse replication support than TheSchwartz.

>>
>>
>> General stuff:
>> 3. Doesn't rely on POE; just has its own loop. I can see the
>> benefits to using POE but it seems like unnecessary overhead for a
>> jobqueue that has workers that should simply do work and nothing else.
>> It seems the scripts to control workers would "have more" in them
>> (not sure if this is a bad thing, just want to start a discussion on
>> pros/cons)
>
> Code reuse? An excuse to learn POE? Doesn't need to use it at all, just
> does.
>

Ok, good enough :)

> It could be quite easily made to not work processes to run certain classes
> of small jobs due to POE, which is a nice benefit i think. /me did not just
> think of that on the spot >_>
>
>>
>> 4. Why does the job get a ResultSet? This seems like a very odd and
>> strict tie into DBIC that doesn't seem to make much sense, tbh. As
>> John said, tying it to DBIC limits its applications. Could the job
>> just not get a HashRef inflated struct and iterate over it without
>> objects? Performance hits should be limited as much as possible, IMO.
>
>
> Yes, being tied into DBIC is def something that needs to be addressed at one
> point or another. For now its not a huge show stopper tho.
>
> But yes, overall I am aware that this is Yet Another Job Queue.
>

Well, don't get me wrong... I -want- to use it, but I have working
TheSchwartz code right now so just wondering if I want to switch over.
If it weren't tied to DBIC but had a compat layer to use D::OD I
could, in theory, test them side by side.

Thanks for the response... other folks using it care to chime in?
Particularly on the POE bits... I'm starting to think that the POE
bits may be nice for an admin interface to push/pull status...

-J


Re: Paging Ash Berlin

jgoulah at gmail

Oct 11, 2008, 12:31 PM

Post #7 of 10 (5484 views)

On Fri, Oct 10, 2008 at 6:17 PM, J. Shirley <jshirley@gmail.com> wrote:
> On Fri, Oct 10, 2008 at 2:22 PM, Ash Berlin <ash_cpan@firemirror.com> wrote:
>>
>> On 10 Oct 2008, at 20:46, J. Shirley wrote:
>>
>>>
>>>
>>> I really really really don't want to start a (very much so offtopic)
>>> flamewar, but I would like to get a discussion going about this versus
>>> TheSchwartz. It seems roughly similar (at least in function).
>>
>> TBH one of the reasons I avoided TheSchwartz was that I couldn't work out
>> what was going on. I did feel kinda iffy about wheel re-invention here, but
>> there was something about TheSchwartz when i looked at that didn't sit well
>> with me. Can't remember what it was anymore.
>>
>>>
>>>
>>> Here are the features that TheSchwartz has that I didn't see in
>>> MooseX::JobQueue (and yes, please name it something other than
>>> MooseX::JobQueue)
>>>
>>> The following are handled because of Data::ObjectDriver, but want to
>>> list them as features anyway:
>>> 1. Partitioning of jobs in the database
>>> 2. Built-in replication handling
>>
>> Not really sure what these two things are? Shouldn't replication be done at
>> a DB level? Partitioning - as having jobs live in two different tables/DBs?
>> If so then App::JobQueue (lets call it that for lack of a better
>> alternative) does that.
>>
>
> Well, I mean horizontal partitioning. So, automatic partitioning
> based on some algorithm (like "if job->id % 2 => use this cluster").
>
> I didn't realize it did that... couldn't find that bit.
>
> As far as replication goes, DBIC handles some replication schemes but
> there isn't the same support that D::OD has. I'm not championing
> D::OD at all here, I prefer DBIC for all things; however D::OD has a
> lot of code to support multiplexing and caching that DBIC hasn't
> culled yet.
>
> So, while replication happens at the database layer, the interactions
> there require client side behaviors. Such as reading from slaves,
> write to masters, etc. DBIC already has basic slave/master support
> but without support for slave read-delay (which is unfortunately
> application specific in most cases) App::JobQueue won't have that...
>
> Which means worse replication support than TheSchwartz.

This is correct. When it was put in production on several servers
under a replicated DBIC, things went a bit haywire with the job locking
(I believe when slaves got delayed), and we had to point all queries at
the master. Otherwise it does scale beautifully to multiple machines.
I wonder what the best solution is here.

John


Re: Paging Ash Berlin

jshirley at gmail

Oct 11, 2008, 1:03 PM

Post #8 of 10 (5459 views)

On Sat, Oct 11, 2008 at 12:31 PM, John Goulah <jgoulah@gmail.com> wrote:
> On Fri, Oct 10, 2008 at 6:17 PM, J. Shirley <jshirley@gmail.com> wrote:
>> On Fri, Oct 10, 2008 at 2:22 PM, Ash Berlin <ash_cpan@firemirror.com> wrote:
>>>
>>> On 10 Oct 2008, at 20:46, J. Shirley wrote:
>>>
>>>>
>>>>
>>>> I really really really don't want to start a (very much so offtopic)
>>>> flamewar, but I would like to get a discussion going about this versus
>>>> TheSchwartz. It seems roughly similar (at least in function).
>>>
>>> TBH one of the reasons I avoided TheSchwartz was that I couldn't work out
>>> what was going on. I did feel kinda iffy about wheel re-invention here, but
>>> there was something about TheSchwartz when i looked at that didn't sit well
>>> with me. Can't remember what it was anymore.
>>>
>>>>
>>>>
>>>> Here are the features that TheSchwartz has that I didn't see in
>>>> MooseX::JobQueue (and yes, please name it something other than
>>>> MooseX::JobQueue)
>>>>
>>>> The following are handled because of Data::ObjectDriver, but want to
>>>> list them as features anyway:
>>>> 1. Partitioning of jobs in the database
>>>> 2. Built-in replication handling
>>>
>>> Not really sure what these two things are? Shouldn't replication be done at
>>> a DB level? Partitioning - as having jobs live in two different tables/DBs?
>>> If so then App::JobQueue (lets call it that for lack of a better
>>> alternative) does that.
>>>
>>
>> Well, I mean horizontal partitioning. So, automatic partitioning
>> based on some algorithm (like "if job->id % 2 => use this cluster").
>>
>> I didn't realize it did that... couldn't find that bit.
>>
>> As far as replication goes, DBIC handles some replication schemes but
>> there isn't the same support that D::OD has. I'm not championing
>> D::OD at all here, I prefer DBIC for all things; however D::OD has a
>> lot of code to support multiplexing and caching that DBIC hasn't
>> culled yet.
>>
>> So, while replication happens at the database layer, the interactions
>> there require client side behaviors. Such as reading from slaves,
>> write to masters, etc. DBIC already has basic slave/master support
>> but without support for slave read-delay (which is unfortunately
>> application specific in most cases) App::JobQueue won't have that...
>>
>> Which means worse replication support than TheSchwartz.
>
>
> This is correct. When it was put in production on several servers
> under a replicated DBIC things went a bit haywire with the job locking
> I believe when slaves got delayed and we had to point all queries at
> the master. Otherwise it does scale beautifully to multiple machines.
> I wonder what the best solution is here.
>
> John
>

I've spent a great deal of time thinking about it in the past and the
best solution I ever came up with was wrapping it in transactions when
you do a write and need to read the up-to-date information (meaning
that in a transaction, the read source is always the write source,
period.)
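
As a rough sketch of that rule (Python for illustration; the Router class and the connection names are hypothetical, and this is not DBIC's replication API): reads go to a replica unless a transaction is open, in which case they are pinned to the write source.

```python
from contextlib import contextmanager


class Router:
    """Toy connection router; plain strings stand in for real database handles."""

    def __init__(self, master, replicas):
        self.master = master
        self.replicas = list(replicas)
        self._depth = 0  # how many transactions are currently open (supports nesting)
        self._next = 0   # round-robin cursor over the replicas

    @contextmanager
    def transaction(self):
        # Inside this block every read is served by the write source.
        self._depth += 1
        try:
            yield self
        finally:
            self._depth -= 1

    def write_source(self):
        return self.master

    def read_source(self):
        if self._depth > 0 or not self.replicas:
            return self.master
        source = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return source
```

Outside a transaction, `Router("master", ["r1", "r2"]).read_source()` rotates through the replicas; inside `with router.transaction():` it always returns the master, so a read that follows a write never sees stale data.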

It does restrict some flexibility in the application, but I believe
that it is worth it for a few reasons. Mostly, it keeps the
application structure sane (and also thins controllers naturally).
You can put an intermediate "caching" layer (or, rather, data access)
that gets updated in a single API, so you have better testability. It
ends up being slightly more code, which is slightly slower, but it
scales near-linearly that way.

In the context of a job queue, the slave needs to access the most
up-to-date information on the job status (to make sure that there
isn't competition) so there will always be a read on the master to
determine the job state. After that, to query any other information
you could query a slave and disregard any read-delay, since in theory
once the job is assigned to a worker, it shouldn't be written to
except by that worker (or the master that marks the worker as
stalled/dead).
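
A minimal sketch of that claim step, assuming a hypothetical `jobs` table with `status` and `worker` columns (Python with SQLite standing in for the master database; none of this is App::JobQueue's or TheSchwartz's real schema). The UPDATE re-checks the status, so only one worker can win a given row.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with two pending jobs (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL, worker TEXT)"
    )
    conn.executemany("INSERT INTO jobs (status) VALUES (?)", [("pending",), ("pending",)])
    conn.commit()
    return conn


def claim_job(conn, worker):
    """Try to claim the oldest pending job; return its id, or None if none was claimed."""
    row = conn.execute(
        "SELECT id FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # The status check in the WHERE clause makes the claim safe against a
    # competing worker that selected the same row a moment earlier.
    cur = conn.execute(
        "UPDATE jobs SET status = 'running', worker = ? "
        "WHERE id = ? AND status = 'pending'",
        (worker, row[0]),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None
```

Both statements have to run against the master; once the claim succeeds, the rest of the job's data could come from a slave as described above.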

One other problem I ran into with TheSchwartz is that job execution
would occasionally hang, triggering jobs that stack on top of each
other. So, sending a signal to notify the working child that its
execution time is up would be very nice. That way it can back
out/stop working, and exit gracefully rather than have two competing
workers on the same resource (just thinking of parallelization cases
for master/slave scaling)
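
A simplified sketch of the idea, in Python and Unix-only: here the worker arms its own timer and receives SIGALRM when its time is up, whereas a supervising parent could deliver a signal to the child instead. The names JobTimeout and run_with_deadline are hypothetical, and this must run in the main thread.

```python
import signal


class JobTimeout(Exception):
    """Raised inside the worker when its execution time is up."""


def _on_alarm(signum, frame):
    raise JobTimeout()


def run_with_deadline(job, seconds):
    """Run job(); return ('done', result) or ('timed_out', None)."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return ("done", job())
    except JobTimeout:
        # This is the place to release locks and mark the job as failed.
        return ("timed_out", None)
    finally:
        # Disarm the timer and restore whatever handler was installed before.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)
```

The point is that the worker gets a chance to clean up and exit on its own terms, rather than being left running while a second worker is started on the same job.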

-J


Re: Paging Ash Berlin

orasnita at gmail

Oct 12, 2008, 7:16 AM

Post #9 of 10 (5463 views)

From: "J. Shirley" <jshirley@gmail.com>
> I really really really don't want to start a (very much so offtopic)
> flamewar, but I would like to get a discussion going about this versus
> TheSchwartz. It seems roughly similar (at least in function).
>
> Here are the features that TheSchwartz has that I didn't see in
> MooseX::JobQueue (and yes, please name it something other than
> MooseX::JobQueue)

Is MooseX::JobQueue very different from MooseX::TheSchwartz?

I've seen that I can install MooseX::TheSchwartz even under Windows so it
could be a good replacement for TheSchwartz.

Thanks.

Octavian


Re: Paging Ash Berlin

jshirley at gmail

Oct 12, 2008, 7:28 AM

Post #10 of 10 (5444 views)

On Sun, Oct 12, 2008 at 7:16 AM, Octavian Rasnita <orasnita@gmail.com> wrote:
> From: "J. Shirley" <jshirley@gmail.com>
>>
>> I really really really don't want to start a (very much so offtopic)
>> flamewar, but I would like to get a discussion going about this versus
>> TheSchwartz. It seems roughly similar (at least in function).
>>
>> Here are the features that TheSchwartz has that I didn't see in
>> MooseX::JobQueue (and yes, please name it something other than
>> MooseX::JobQueue)
>
> Is MooseX::JobQueue very different than MooseX::TheSchwartz?
>
> I've seen that I can install MooseX::TheSchwartz even under Windows so it
> could be a good replacement for TheSchwartz.
>

It is API compatible but does not use Data::ObjectDriver.
