Mailing List Archive

Re: [patch 05/11] syslets: core code [ In reply to ]
* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, 15 Feb 2007, Linus Torvalds wrote:
> >
> > So I think that a good implementation just does everything up-front,
> > and doesn't _need_ a user buffer that is live over longer periods,
> > except for the actual results. Exactly because the whole
> > alloc/teardown is nasty.
>
> Btw, this doesn't necessarily mean "not supporting multiple atoms at
> all".
>
> I think the batching of async things is potentially a great idea. I
> think it's quite workable for "open+fstat" kind of things, and I agree
> that it can solve other things too (the
> "socket+bind+connect+sendmsg+rcv" kind of complex setup things).
>
> But I suspect that if we just said:
> - we limit these atom sequences to just linear sequences of max "n" ops
> - we read them all in in a single go at startup
>
> we actually avoid several nasty issues. Not just the memory allocation
> issue in user space (now it's perfectly ok to build up a sequence of
> ops in temporary memory and throw it away once it's been submitted),
> but also issues like the 32-bit vs 64-bit compatibility stuff (the
> compat handlers would just convert it when they do the initial
> copying, and then the actual run-time wouldn't care about user-level
> pointers having different sizes etc).
>
> Would it make the interface less cool? Yeah. Would it limit it to just
> a few linked system calls (to avoid memory allocation issues in the
> kernel)? Yes again. But it would simplify a lot of the interface
> issues.
>
> It would _also_ allow the "sys_aio_read()" function to build up its
> *own* set of atoms in kernel space to actually do the read, and there
> would be no impact of the actual run-time wanting to read stuff from
> user space. Again - it's actually the same issue as with the compat
> system call: by making the interfaces do things up-front rather than
> dynamically, it becomes more static, but also easier to do interface
> translations. You can translate into any arbitrary internal format
> _once_, and be done with it.
>
> I dunno.
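
A minimal sketch of the "read them all in a single go" model described in the quoted text, assuming a hypothetical cap of MAX_ATOMS ops per sequence and hypothetical struct layouts (compat_atom, katom, kseq, submit_compat_seq are illustrative names, not the syslet patch's types). The 32-bit layout is translated once at submission, so nothing later needs to look at the caller's buffer:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ATOMS 8	/* hypothetical cap "n" on ops per sequence */
#define MAX_ARGS  6

/* Hypothetical 32-bit caller layout: every argument is a 32-bit value. */
struct compat_atom {
	uint32_t nr;
	uint32_t args[MAX_ARGS];
};

/* Hypothetical internal format, independent of the caller's word size. */
struct katom {
	unsigned long nr;
	unsigned long args[MAX_ARGS];
};

/* Fixed-size sequence: no allocation is needed to hold it. */
struct kseq {
	size_t count;
	struct katom atoms[MAX_ATOMS];
};

/*
 * Translate the whole linear sequence once, at submission time.
 * After this returns, the caller's buffer is not referenced again
 * and can be thrown away. Returns 0 or a negative errno value.
 */
int submit_compat_seq(const struct compat_atom *user, size_t n,
		      struct kseq *out)
{
	if (user == NULL || out == NULL)
		return -EFAULT;
	if (n == 0 || n > MAX_ATOMS)
		return -EINVAL;

	for (size_t i = 0; i < n; i++) {
		out->atoms[i].nr = user[i].nr;
		for (size_t a = 0; a < MAX_ARGS; a++)
			out->atoms[i].args[a] = user[i].args[a]; /* widen */
	}
	out->count = n;
	assert(out->count <= MAX_ATOMS);
	return 0;
}
```

A native-width caller would go through a second, equally small translation function into the same kseq format; the run-time side would only ever see struct kseq.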

[ hm. I again wrote a pretty long email for you to read. Darn! ]

regarding the API - i share most of your concerns, and it's all a
function of how widely we want to push this into user-space.

My initial thought was for syslets to be used by glibc as small, secure
kernel-side 'syscall plugins' mainly - so that it can do things like
'POSIX AIO signal notifications' (which are madness in terms of
performance, but which applications rely on) /without/ having to burden
the kernel-side AIO with such requirements: glibc just adds an enclosing
sys_kill() to the syslet and it will do the proper signal notification,
asynchronously. (and of course syslets can be used for the Tux type of
performance sillinesses as well ;-)
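
As a rough illustration of the "enclosing sys_kill()" idea, here is the synchronous user-space equivalent of such a two-step sequence: the I/O call followed by a kill() that delivers the completion signal. This is only a sketch; read_then_notify, install_completion_handler and the completions counter are hypothetical names, and a real syslet would chain the two system calls as atoms executed asynchronously rather than call them inline:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Bumped by the handler each time a completion signal arrives. */
volatile sig_atomic_t completions;

static void on_completion(int signo)
{
	(void)signo;
	completions++;
}

/* Install the completion handler; returns 0 on success, -1 on error. */
int install_completion_handler(int signo)
{
	struct sigaction sa;

	sa.sa_handler = on_completion;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	return sigaction(signo, &sa, NULL);
}

/*
 * Step 1: the I/O itself. Step 2: the enclosing kill() that performs
 * the POSIX-AIO-style signal notification. Returns the read() result,
 * or -1 if the notification could not be sent.
 */
ssize_t read_then_notify(int fd, void *buf, size_t len,
			 pid_t pid, int signo)
{
	ssize_t n = read(fd, buf, len);

	if (kill(pid, signo) == -1)
		return -1;
	return n;
}
```

The point of the design is that the notification policy lives entirely in the sequence built by glibc; the I/O path itself does not need to know about signals.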

So a sane user API (all used at the glibc level, not at application
level) would use simple syslets, while more broken ones would have to
use longer ones - but nobody would have the burden of having to
synchronize back to the issuer context. Natural selection will gravitate
application use towards the APIs with the shorter syslets. (at least so
i hope)

In this model syslets arent really user-programmable entities but rather
small plugins available to glibc to build up more complex, more
innovative (or just more broken) APIs than what the kernel wants to
provide - without putting any true new ABI dependency on the kernel,
other than the already existing syscall ABIs.

But if we'd like glibc to provide this to applications in some sort of
standardized /programmable/ manner, with a wide range of atom selections
(not directly coded syscall numbers, but rather as function pointers to
actual glibc functions, which glibc could translate to syscall numbers,
argument encodings, etc.), then i agree that doing the compat things and
making it 32/64-bit agnostic (and much more) is pretty much a must. If
90% of this current job is finished then sorting those out will at least
be another 90% of the work ;-)

and actually this latter model scares me, and i think that model scared
the hell out of you as well.

But i really have no strong opinion about which one we want yet, without
having walked the path. Somewhere inside me i'd of course like syslets
to become a widely available interface - but my fear is that it might
just not be 'human' enough to make sense - and we'd just not want to tie
us down with an ABI that's not used. I dont want this to become another
sys_sendfile - much talked about and _almost_ useful but in practice
seldom used due to its programmability and utility limitations.

OTOH, the syslet concept right now already looks very ubiquitous, and
the main problem with AIO use in applications wasnt just even its broken
API or its broken performance, but the fundamental lack of all Linux IO
disciplines supporting AIO, and the lack of significantly parallel
hardware. We have kaio that is centered around block drivers - then we
have epoll that works best with networking, and inotify that deals with
some (but not all) VFS events - but none of them supports every IO and
event discipline well, at once. My feeling is that /this/ is the main
fundamental problem with AIO in general, not just its programmability
limitations.

Right now i'm concentrating on trying to build up something on the
scheduling side that shows the issues in practice, shows the limitations
and shows the possibilities. For example the easy ability to turn a
cachemiss thread back into a user thread (and then back into a cachemiss
thread) was a true surprise to me which increased utility quite a bit. I
couldnt have designed it into the concept because it just didnt occur to
me in the early stages. The notification ring related limitations you
noticed is another important thing to fix - and these issues go to the
core scheduling model of the concept and affect everything.

Thirdly, while Tux does not matter much to us, at least to me it is
pretty clear what it takes to get performance up to the levels of Tux -
and i dont see any big fundamental compromise possible on that front.
Syslets are partly Tux repackaged into something generic - they are
probably a bit slower than straight kernel code Tux, but not by much and
it's also not behaving fundamentally differently. And if we dont offer
at least something close to those possibilities then people will
re-start trying to add those special-purpose state machine APIs again,
and the whole "we need /true/ async IO" game starts again.

So if we accept "make parallelism easier to program" and "get somewhat
close to Tux's performance and scalability" as a premise (which you
might not agree with in that form), then i dont think there's much
choice we have: either we use kernel threads, synchronous system calls
and the scheduler intelligently (and the scheduling/threading bits of
syslets are pretty much the most intelligent kernel thread based
approach i can imagine at the moment =B-) or we use a special-purpose
KAIO state machine subsystem, avoiding most of the existing synchronous
infrastructure, painfully coding it into every IO discipline - and this
will certainly haunt us until the end of times.

So that's why i'm not /that/ much worried about the final form of the
API at the moment - even though i agree that it is /the/ most important
decision factor in the end: i see various unavoidable externalities
forcing us very much, and in the end we either like the result and make
it available to programmers, or we dont, and limit it to system-glue
glibc use - or we throw it away altogether. I'm curious about the end
result even if it gets limited or gets thrown away (joining 4:4 on the
way to the bit bucket ;) and while i'm cautiously optimistic that
something useful can come out of this, i cannot know it for sure at the
moment.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [patch 05/11] syslets: core code [ In reply to ]
On Fri, Feb 16, 2007 at 01:28:06PM +0100, Ingo Molnar (mingo@elte.hu) wrote:
> OTOH, the syslet concept right now already looks very ubiquitous, and
> the main problem with AIO use in applications wasnt just even its broken
> API or its broken performance, but the fundamental lack of all Linux IO
> disciplines supporting AIO, and the lack of significantly parallel
> hardware. We have kaio that is centered around block drivers - then we
> have epoll that works best with networking, and inotify that deals with
> some (but not all) VFS events - but none of them supports every IO and
> event discipline well, at once. My feeling is that /this/ is the main
> fundamental problem with AIO in general, not just its programmability
> limitations.

That is quite disappointing to hear when the weekly-released kevent could
already solve that problem more than a year ago - it was designed
specifically to support every possible notification type, and it does
support file descriptor ones, VFS (dropped in current releases to reduce
size) and tons of others, including POSIX timers, signals, its own
high-performance AIO (which was created as a somewhat complex state
machine over the internals of the page population code) and essentially
everything one can ever imagine, with quite a bit of code needed for a
new type.

I was asked to add waiting for a futex through the kevent queue - that is
quite a simple task, but given the complete lack of feedback, and the
project being ignored even by people who asked about its features, it
looks like there is no need for that at all.

--
Evgeniy Polyakov
Re: [patch 05/11] syslets: core code [ In reply to ]
On Fri, 16 Feb 2007, Evgeniy Polyakov wrote:
>
> Interfaces can be created and destroyed - they do not affect overall
> system design in anyway (well, if they do, something is broken).

I'm sorry, but you've obviously never maintained any piece of software
that actually has users.

As long as you think that interfaces can change, this discussion is
pointless.

So go away, ponder things.

Linus
Re: [patch 05/11] syslets: core code [ In reply to ]
On Fri, Feb 16, 2007 at 07:54:22AM -0800, Linus Torvalds (torvalds@linux-foundation.org) wrote:
> > Interfaces can be created and destroyed - they do not affect overall
> > system design in anyway (well, if they do, something is broken).
>
> I'm sorry, but you've obviously never maintained any piece of software
> that actually has users.

Strong. But speaking for others usually tends to show one's own problems.

> As long as you think that interfaces can change, this discussion is
> pointless.

That is too cool a phrase to hear - if you do me a favour and reread what
was written, you will (hopefully) notice that there was not a word about
interfaces being changed after they are put into the wild - the talk was
only about the time when a system is designed and implemented, and there
is time to discuss its rough edges - if its design is good, then the
interface can be changed in a moment without any problem - that is what
we see with syslets right now - they are designed and implemented (the
former was done several years ago), and it is time to shape the edges -
like changing the userspace API - it is easy, but you do not (want/like
to) see that.

> So go away, ponder things.

But my words above are too lame for an Olympus dweller who hears only
himself. Definitely.

> Linus

--
Evgeniy Polyakov
Re: [patch 05/11] syslets: core code [ In reply to ]
On 2/16/07, Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> if its design is good, then
> interface can be changed in a moment without any problem

This isn't always the case. Sometimes the interface puts requirements
(contract-like) upon the implementation. Case in point in the kernel,
dnotify versus inotify. dnotify is a steaming pile of worthlessness,
because its userspace interface is so bad (meaning inefficient) as to
be nearly unusable.

inotify has a different interface, one that supplies details about
events rather than mere notice that an event occurred, and therefore
has different requirements in implementation. dnotify probably was a
good design, but for a worthless interface.

The interface isn't always important, but it's certainly something
that has to be understood before putting the finishing touches on the
behind-the-scenes implementation.

Ray
Re: [patch 05/11] syslets: core code [ In reply to ]
On Fri, Feb 16, 2007 at 08:53:30AM -0800, Ray Lee (madrabbit@gmail.com) wrote:
> On 2/16/07, Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
> >if its design is good, then
> >interface can be changed in a moment without any problem
>
> This isn't always the case. Sometimes the interface puts requirements
> (contract-like) upon the implementation. Case in point in the kernel,
> dnotify versus inotify. dnotify is a steaming pile of worthlessness,
> because its userspace interface is so bad (meaning inefficient) as to
> be nearly unusable.
>
> inotify has a different interface, one that supplies details about
> events rather than mere notice that an event occurred, and therefore
> has different requirements in implementation. dnotify probably was a
> good design, but for a worthless interface.
>
> The interface isn't always important, but it's certainly something
> that has to be understood before putting the finishing touches on the
> behind-the-scenes implementation.

Absolutely.
And if the overall system design is good, there is no problem to change
(well, for those who fail to read to the end and understand my english
replace 'to change' with 'to create and commit') the interface to the
state where it will satisfy all (the majority of) users.

When a system is designed from the interface down, it ends up with one
thread per IO and huge limitations on how the system is going to be
used at all.

> Ray

--
Evgeniy Polyakov
Re: [patch 05/11] syslets: core code [ In reply to ]
On Fri, Feb 16, 2007 at 07:58:54PM +0300, Evgeniy Polyakov wrote:
| Absolutely.
| And if overall system design is good, there is no problem to change
| (well, for those who fail to read to the end and understand my english
| replace 'to change' with 'to create and commit') interface to the state
| where it will satisfy all (majority of) users.
|
| Situations when system is designed from interface down to system ends up
| with one thread per IO and huge limitations on how system is going to be
| used at all.
|
| --
| Evgeniy Polyakov

I'm sorry for meddling in the conversation, but I think Linus
misunderstood you. If I'm right, you propose to "create and commit" _new_
interfaces only? I mean, _changing_ interfaces exported to user space is
very painful... for further support. Don't swear at me if I wrote
something stupid ;)

--

Cyrill

Re: [patch 05/11] syslets: core code [ In reply to ]
Evgeniy Polyakov wrote:
> On Fri, Feb 16, 2007 at 08:53:30AM -0800, Ray Lee (madrabbit@gmail.com) wrote:
>> On 2/16/07, Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:
>>> if its design is good, then
>>> interface can be changed in a moment without any problem
>> This isn't always the case. Sometimes the interface puts requirements
>> (contract-like) upon the implementation. Case in point in the kernel,
>> dnotify versus inotify. dnotify is a steaming pile of worthlessness,
>> because its userspace interface is so bad (meaning inefficient) as to
>> be nearly unusable.
>>
>> inotify has a different interface, one that supplies details about
>> events rather than mere notice that an event occurred, and therefore
>> has different requirements in implementation. dnotify probably was a
>> good design, but for a worthless interface.
>>
>> The interface isn't always important, but it's certainly something
>> that has to be understood before putting the finishing touches on the
>> behind-the-scenes implementation.
>
> Absolutely.
> And if overall system design is good,

dnotify was a good system design for a stupid (or misunderstood) problem.

> there is no problem to change
> (well, for those who fail to read to the end and understand my english
> replace 'to change' with 'to create and commit') interface to the state
> where it will satisfy all (majority of) users.

You might be right, but the point I (and others) are trying to make is
that there are some cases where you *really* need to understand the
users of the interface first. You might have everything else right
(userspace wants to know when filesystem changes occur, great), but if
you don't know what form those notifications have to look like, you'll
end up doing a lot of wasted work on a worthless piece of code that no
one will ever use.

Sometimes the interface really is the most important thing. Just like a
contract between people.

(This is probably why, by the way, most people are staying silent on
your excellent kevent work. The kernel side is, in some ways, the easy
part. It's getting an interface that will handle all users [ users ==
producers and consumers of kevents ], that is the hard bit.)

Or, let me put it yet another way: How do you prove to the rest of us
that you, or Ingo, or whomever, are not building another dnotify? (Maybe
you're smart enough in this problem space that you know you're not --
that's actually the most likely possibility. But you still have to prove
it to the rest of us. Sucks, I know.)

> Situations when system is designed from interface down to system ends up
> with one thread per IO and huge limitations on how system is going to be
> used at all.

The other side is you start from the goal in mind and get Ingo's state
machines with loops and conditionals and marmalade in syslets which
appear a bit baroque and overkill for the majority of us userspace folk.

(No offense intended to Ingo, he's obviously quite a bit more conversant
on the needs of high speed interfaces than I am. However, I suspect I
have a bit more clarity on what us normal folk would actually use, and
kernel driven FSMs ain't it. Userspace often makes a lot of contextual
decisions that I would absolutely *hate* to write and debug as a state
machine that gets handed off to the kernel. I'll happily take a 10% hit
in efficiency that Moore's law will get me back in a few months, instead
of spending a bunch of time debugging difficult heisenbugs due to the
syslet FSM reading a userspace variable at a slightly different time
once in a blue moon. OTOH, I'm also not Oracle, so what do I know?)

The truth of this lies somewhere in the middle. It isn't kernel driven,
or userspace interface driven, but a tradeoff between the two.

So:

> Userspace_API_is_the_ever_possible_last_thing_to_ever_think_about.
> Period

Please listen to those of us who are saying that this might not be the
case. Maybe we're idiots, but then again maybe we're not, okay?
Sometimes the API really *DOES* change the underlying implementation.

Ray
Re: [patch 05/11] syslets: core code [ In reply to ]
On Fri, Feb 16, 2007 at 11:20:36PM +0300, Cyrill V. Gorcunov (gorcunov@gmail.com) wrote:
> On Fri, Feb 16, 2007 at 07:58:54PM +0300, Evgeniy Polyakov wrote:
> | Absolutely.
> | And if overall system design is good, there is no problem to change
> | (well, for those who fail to read to the end and understand my english
> | replace 'to change' with 'to create and commit') interface to the state
> | where it will satisfy all (majority of) users.
> |
> | Situations when system is designed from interface down to system ends up
> | with one thread per IO and huge limitations on how system is going to be
> | used at all.
> |
> | --
> | Evgeniy Polyakov
>
> I'm sorry for meddling in conversation but I think Linus misunderstood
> you. If I'm right you propose to "create and commit" _new_ interfaces
> only? I mean _changing_ of interfaces exported to user space is
> very painful... for further support. Don't swear at me if I wrote
> something stupid ;)

Yes, I only proposed to change what Ingo has right now - it is usable,
but it does suck; since the overall syslet design is indeed good, it
does not suffer from possible interface changes - so I said that it can
be trivially changed, in the sense that until it is committed anything
can be done to extend it.

> --
>
> Cyrill

--
Evgeniy Polyakov
Re: [patch 05/11] syslets: core code [ In reply to ]
On Fri, Feb 16, 2007 at 08:54:11PM -0800, Ray Lee (ray-lk@madrabbit.org) wrote:
> (This is probably why, by the way, most people are staying silent on
> your excellent kevent work. The kernel side is, in some ways, the easy
> part. It's getting an interface that will handle all users [ users ==
> producers and consumers of kevents ], that is the hard bit.)

The kevent interface was completely changed 4 (!) times over the last
year at kernel developers' request, without any damage to its kernel part.

> Or, let me put it yet another way: How do you prove to the rest of us
> that you, or Ingo, or whomever, are not building another dnotify? (Maybe
> you're smart enough in this problem space that you know you're not --
> that's actually the most likely possibility. But you still have to prove
> it to the rest of us. Sucks, I know.)

I only want to say that when a system is designed correctly there is no
problem to change its interface (yes, I again said 'to change' just
because I hope everyone understands that I'm talking about the time when
the system is not yet committed to the tree).

Btw, dnotify had problems in its design that were highlighted at the
start of inotify - mainly that watchers were not attached to the inode.

Right now is the time to ask users what interface they expect from
AIO - so I asked Linus and proposed three different ones, two of them
designed in a way that the user would not even know that some
allocation/freeing was done - and as a result I got a 'you suck' response,
exactly the same as was returned on the first syslet release - just
_only_ fscking _just_ because it had an ugly interface.

> > Situations when system is designed from interface down to system ends up
> > with one thread per IO and huge limitations on how system is going to be
> > used at all.
>
> The other side is you start from the goal in mind and get Ingo's state
> machines with loops and conditionals and marmalade in syslets which
> appear a bit baroque and overkill for the majority of us userspace folk.

Well, I designed kevent AIO in a similar way, but it has an even more
complex one, which is built on top of the internal page population
functions.

It is a bit complex, but it works fast. And it would work with any type
of AIO (if I were not lazy and implemented the bindings).

The interface of syslets is not perfect, but it can be changed (I said it
again? I think we all understand what I mean by that already) trivially
right now (before it is included) - it is not right to throw a thing
away just because it has a bad interface which can be extended in a moment.

> (No offense intended to Ingo, he's obviously quite a bit more conversant
> on the needs of high speed interfaces than I am. However, I suspect I
> have a bit more clarity on what us normal folk would actually use, and
> kernel driven FSMs ain't it. Userspace often makes a lot of contextual
> decisions that I would absolutely *hate* to write and debug as a state
> machine that gets handed off to the kernel. I'll happily take a 10% hit
> in efficiency that Moore's law will get me back in a few months, instead
> of spending a bunch of time debugging difficult heisenbugs due to the
> syslet FSM reading a userspace variable at a slightly different time
> once in a blue moon. OTOH, I'm also not Oracle, so what do I know?)
>
> The truth of this lies somewhere in the middle. It isn't kernel driven,
> or userspace interface driven, but a tradeoff between the two.
>
> So:
>
> > Userspace_API_is_the_ever_possible_last_thing_to_ever_think_about.
> > Period
>
> Please listen to those of us who are saying that this might not be the
> case. Maybe we're idiots, but then again maybe we're not, okay?
> Sometimes the API really *DOES* change the underlying implementation.

It is exactly the time to say what interface would be good.
The system is almost ready - it is time to make it look cool for users.

> Ray

--
Evgeniy Polyakov
Re: [patch 05/11] syslets: core code [ In reply to ]
Evgeniy Polyakov wrote:
> Ray Lee (ray-lk@madrabbit.org) wrote:
> > The truth of this lies somewhere in the middle. It isn't kernel driven,
> > or userspace interface driven, but a tradeoff between the two.
> >
> > So:
> > > Userspace_API_is_the_ever_possible_last_thing_to_ever_think_about.
> > > Period
> >
> > Please listen to those of us who are saying that this might not be the
> > case. Maybe we're idiots, but then again maybe we're not, okay?
> > Sometimes the API really *DOES* change the underlying implementation.
>
> It is exactly the time to say what interface would be good.
> The system is almost ready - it is time to make it look cool for users.

IMHO, what is needed is an event registration switch-board that handles
notifications from the kernel and the user side respectively.


Thanks!

--
Al

Re: [patch 05/11] syslets: core code [ In reply to ]
On Sat, Feb 17, 2007 at 01:02:00PM +0300, Evgeniy Polyakov wrote:
[... snipped ...]

| Yes, I only proposed to change what Ingo has right now - although it is
| usable, but it does suck, but since overall syslet design is indeed good
| it does not suffer from possible interface changes - so I said that it
| can be trivially changed in that regard that until it is committed
| anything can be done to extend it.
|
| --
| Evgeniy Polyakov
|

I think you are right, Evgeniy! In times of research, _changing_ a lot
of things is almost a law. Syslets are in the testing area, so why should
we limit ourselves in the search for the best? If something in syslets
sucks, let's change it as early as possible. Of course I mean no more
interface changing after some _commit_ point (and that should be Linus'
decision).

--

Cyrill

Re: [patch 05/11] syslets: core code [ In reply to ]
Hi!

> > The upcall will setup a frame, execute the clet (where jump/conditions and
> > userspace variable changes happen in machine code - gcc is pretty good in
> > taking care of that for us) on its return, come back through a
> > sys_async_return, and go back to userspace.
>
> So, for example, this is the setup code for the current API (and that's a
> really simple one - imagine going wacko with loops and userspace variable
> changes):
>
>
> static struct req *alloc_req(void)
> {
> 	/*
> 	 * Constants can be picked up by syslets via static variables:
> 	 */
> 	static long O_RDONLY_var = O_RDONLY;
> 	static long FILE_BUF_SIZE_var = FILE_BUF_SIZE;
>
> 	struct req *req;
>
> 	if (freelist) {
> 		req = freelist;
> 		freelist = freelist->next_free;
> 		req->next_free = NULL;
> 		return req;
> 	}
>
> 	req = calloc(1, sizeof(struct req));
>
> 	/*
> 	 * This is the first atom in the syslet, it opens the file:
> 	 *
> 	 *  req->fd = open(req->filename, O_RDONLY);
> 	 *
> 	 * It is linked to the next read() atom.
> 	 */
> 	req->filename_p = req->filename;
> 	init_atom(req, &req->open_file, __NR_sys_open,
> 		  &req->filename_p, &O_RDONLY_var, NULL, NULL, NULL, NULL,
> 		  &req->fd, SYSLET_STOP_ON_NEGATIVE, &req->read_file);
>
> 	/*
> 	 * This second read() atom is linked back to itself, it skips to
> 	 * the next one on stop:
> 	 */
> 	req->file_buf_ptr = req->file_buf;
> 	init_atom(req, &req->read_file, __NR_sys_read,
> 		  &req->fd, &req->file_buf_ptr, &FILE_BUF_SIZE_var,
> 		  NULL, NULL, NULL, NULL,
> 		  SYSLET_STOP_ON_NON_POSITIVE | SYSLET_SKIP_TO_NEXT_ON_STOP,
> 		  &req->read_file);
>
> 	/*
> 	 * This close() atom has NULL as next, this finishes the syslet:
> 	 */
> 	init_atom(req, &req->close_file, __NR_sys_close,
> 		  &req->fd, NULL, NULL, NULL, NULL, NULL, NULL, 0, NULL);
>
> 	return req;
> }
>
>
> Here's how your clet would look like:
>
> static long main_sync_loop(ctx *c)
> {
> 	int fd;
> 	char file_buf[FILE_BUF_SIZE+1];
>
> 	if ((fd = open(c->filename, O_RDONLY)) == -1)
> 		return -1;
> 	while (read(fd, file_buf, FILE_BUF_SIZE) > 0)
> 		;
> 	close(fd);
> 	return 0;
> }
>
>
> Kinda easier to code isn't it? And the cost of the upcall to schedule the
> clet is widely amortized by the multiple syscalls you're going to do inside
> your clet.
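
To make the atom linkage in the quoted setup code concrete, here is a small user-space emulation of how such a chain could be walked. It is a sketch under stated assumptions, not the kernel's interpreter: struct atom, run_atoms and build_chain are hypothetical, a function pointer stands in for the syscall number plus argument pointers, and "skip to next on stop" is assumed to mean "continue with the atom that follows in memory":

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define STOP_ON_NEGATIVE	0x1u
#define STOP_ON_NON_POSITIVE	0x2u
#define SKIP_TO_NEXT_ON_STOP	0x4u

struct req {
	const char *filename;
	int fd;
	char file_buf[64];
	long total;			/* bytes read so far */
};

struct atom {
	long (*op)(struct req *);	/* stands in for syscall nr + args */
	unsigned flags;
	struct atom *next;		/* NULL finishes the chain */
};

static long op_open(struct req *r)
{
	r->fd = open(r->filename, O_RDONLY);
	return r->fd;
}

static long op_read(struct req *r)
{
	long n = read(r->fd, r->file_buf, sizeof r->file_buf);

	if (n > 0)
		r->total += n;
	return n;
}

static long op_close(struct req *r)
{
	return close(r->fd);
}

/* open -> read (looping on itself) -> close, as in the quoted example. */
void build_chain(struct atom chain[3])
{
	chain[0] = (struct atom){ op_open, STOP_ON_NEGATIVE, &chain[1] };
	chain[1] = (struct atom){ op_read,
		STOP_ON_NON_POSITIVE | SKIP_TO_NEXT_ON_STOP, &chain[1] };
	chain[2] = (struct atom){ op_close, 0, NULL };
}

/*
 * Walk the chain and return the result of the last atom executed.
 * Atoms must sit in one array, and the last one must not set
 * SKIP_TO_NEXT_ON_STOP, so that "atom + 1" always stays in bounds.
 */
long run_atoms(struct atom *atom, struct req *r)
{
	long ret = 0;

	while (atom != NULL) {
		ret = atom->op(r);
		int stop = ((atom->flags & STOP_ON_NEGATIVE) && ret < 0) ||
			   ((atom->flags & STOP_ON_NON_POSITIVE) && ret <= 0);

		if (!stop)
			atom = atom->next;	/* follow the link */
		else if (atom->flags & SKIP_TO_NEXT_ON_STOP)
			atom++;			/* linearly next atom */
		else
			break;			/* chain is finished */
	}
	return ret;
}
```

Calling build_chain() on a three-element array and then run_atoms() with a struct req whose filename is set reads the whole file and closes it; a failing open() stops the chain immediately and its negative result is returned.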

I do not get it. What if the clet includes

int *a = 0; *a = 1; /* enjoy your oops, stupid kernel? */

I.e. how do you make sure the kernel is protected from malicious clets?

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: [patch 05/11] syslets: core code [ In reply to ]
On Sun, 18 Feb 2007, Pavel Machek wrote:

> > > The upcall will setup a frame, execute the clet (where jump/conditions and
> > > userspace variable changes happen in machine code - gcc is pretty good in
> > > taking care of that for us) on its return, come back through a
> > > sys_async_return, and go back to userspace.
> >
> > So, for example, this is the setup code for the current API (and that's a
> > really simple one - imagine going wacko with loops and userspace variable
> > changes):
> >
> >
> > static struct req *alloc_req(void)
> > {
> > 	/*
> > 	 * Constants can be picked up by syslets via static variables:
> > 	 */
> > 	static long O_RDONLY_var = O_RDONLY;
> > 	static long FILE_BUF_SIZE_var = FILE_BUF_SIZE;
> >
> > 	struct req *req;
> >
> > 	if (freelist) {
> > 		req = freelist;
> > 		freelist = freelist->next_free;
> > 		req->next_free = NULL;
> > 		return req;
> > 	}
> >
> > 	req = calloc(1, sizeof(struct req));
> >
> > 	/*
> > 	 * This is the first atom in the syslet, it opens the file:
> > 	 *
> > 	 *  req->fd = open(req->filename, O_RDONLY);
> > 	 *
> > 	 * It is linked to the next read() atom.
> > 	 */
> > 	req->filename_p = req->filename;
> > 	init_atom(req, &req->open_file, __NR_sys_open,
> > 		  &req->filename_p, &O_RDONLY_var, NULL, NULL, NULL, NULL,
> > 		  &req->fd, SYSLET_STOP_ON_NEGATIVE, &req->read_file);
> >
> > 	/*
> > 	 * This second read() atom is linked back to itself, it skips to
> > 	 * the next one on stop:
> > 	 */
> > 	req->file_buf_ptr = req->file_buf;
> > 	init_atom(req, &req->read_file, __NR_sys_read,
> > 		  &req->fd, &req->file_buf_ptr, &FILE_BUF_SIZE_var,
> > 		  NULL, NULL, NULL, NULL,
> > 		  SYSLET_STOP_ON_NON_POSITIVE | SYSLET_SKIP_TO_NEXT_ON_STOP,
> > 		  &req->read_file);
> >
> > 	/*
> > 	 * This close() atom has NULL as next, this finishes the syslet:
> > 	 */
> > 	init_atom(req, &req->close_file, __NR_sys_close,
> > 		  &req->fd, NULL, NULL, NULL, NULL, NULL, NULL, 0, NULL);
> >
> > 	return req;
> > }
> >
> >
> > Here's what your clet would look like:
> >
> > static long main_sync_loop(ctx *c)
> > {
> > int fd;
> > char file_buf[FILE_BUF_SIZE+1];
> >
> > if ((fd = open(c->filename, O_RDONLY)) == -1)
> > return -1;
> > while (read(fd, file_buf, FILE_BUF_SIZE) > 0)
> > ;
> > close(fd);
> > return 0;
> > }
> >
> >
> > Kinda easier to code, isn't it? And the cost of the upcall to schedule the
> > clet is largely amortized by the multiple syscalls you're going to do inside
> > your clet.
>
> I do not get it. What if clet includes
>
> int *a = 0; *a = 1; /* enjoy your oops, stupid kernel? */
>
> I.e. how do you make sure kernel is protected from malicious clets?

Clets would execute in userspace, like signal handlers, but under the
special schedule() handler. That way, chaining happens by means of
natural C code, and access to userspace variables happens by means of
natural C code too (not with special syscalls to manipulate userspace
memory). I'm not a big fan of chains of syscalls for the reasons I
already explained, but at least clets (or whatever name) have a way lower
cost for the programmer (easier to code than atom chains), and for the
kernel (no need of all that atom handling stuff, no need of limited
cond/jump interpreters in the kernel, and no need of nightmare compat
code).
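
To make the protection argument concrete, here is a minimal sketch (not code from the patch set) of what happens when a clet-like function dereferences a NULL pointer while running in user mode. The names bad_clet(), good_clet() and run_clet_in_child() are hypothetical, and fork() is used only as a stand-in for "some user-mode execution context"; the proposal itself does not say clets run in a child process. The point is only that the MMU fault is delivered to the process as SIGSEGV and the kernel is untouched.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read through a volatile so the compiler cannot prove the pointer is NULL. */
static int *volatile null_ptr = NULL;

/* Hypothetical malicious/buggy clet: ordinary user-mode code that faults. */
static long bad_clet(void)
{
	int *a = null_ptr;

	*a = 1;			/* page fault in user mode, not in the kernel */
	return 0;
}

/* Hypothetical well-behaved clet. */
static long good_clet(void)
{
	return 0;
}

/*
 * Run a clet in a child process (illustrative stand-in for a user-mode
 * upcall). Returns the number of the signal that killed it, 0 if it
 * exited normally, or -1 if fork()/waitpid() failed.
 */
static int run_clet_in_child(long (*clet)(void))
{
	int status;
	pid_t pid;

	assert(clet != NULL);

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		clet();
		_exit(0);
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling run_clet_in_child(bad_clet) reports that the child was killed by a signal, while run_clet_in_child(good_clet) reports a normal exit; in both cases the caller keeps running.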



- Davide


Re: [patch 05/11] syslets: core code [ In reply to ]
On 2/18/07, Davide Libenzi <davidel@xmailserver.org> wrote:
> Clets would execute in userspace, like signal handlers,

or like "event handlers" in cooperative multitasking environments
without the Unix baggage

> but under the special schedule() handler.

or, better yet, as the next tasklet in the chain after the softirq
dispatcher, since I/Os almost always unblock as a result of something
that happens in an ISR or softirq

> That way, chaining happens by means of
> natural C code, and access to userspace variables happens by means of
> natural C code too (not with special syscalls to manipulate userspace
> memory).

yep. That way you can exploit this nice hardware block called an MMU.

> I'm not a big fan of chains of syscalls for the reasons I
> already explained,

to a kernel programmer, all userspace programs are chains of syscalls. :-)

> but at least clets (or whatever name) have a way lower
> cost for the programmer (easier to code than atom chains),

except you still have the 80% of the code that is half-assed exception
handling using overloaded semantics on function return values and a
thread-local errno, which is totally unsafe with fibrils, syslets,
clets, and giblets, since none of them promise to run continuations in
the same thread context as the submission. Of course you aren't going
to use errno as such, but that means that async-ifying code isn't
s/syscall/aio_syscall/, it's a complete rewrite. If you're going to
design a new AIO interface, please model it after the only standard
that has ever made deeply pipelined, massively parallel execution
programmer-friendly -- IEEE 754.
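
One possible reading of the errno complaint, sketched in C: if a continuation may run in a different thread context than the submission, the error has to travel with the result instead of living in thread-local errno. The struct io_result type and the res_open()/res_read()/res_close() helpers below are hypothetical names invented for this illustration, not part of any proposed interface. The "skip the operation and pass the error through" behaviour is only a loose analogy to how a NaN propagates through IEEE 754 arithmetic.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical result type: the error code rides along with the value. */
struct io_result {
	long value;	/* return value of the call, valid when err == 0 */
	int err;	/* 0 on success, otherwise the errno of the failure */
};

/* Capture errno right after the call, in the thread that made it. */
static struct io_result capture(long ret)
{
	struct io_result r = { ret, ret < 0 ? errno : 0 };

	return r;
}

static struct io_result res_open(const char *path, int flags)
{
	assert(path != NULL);
	return capture(open(path, flags));
}

/* A failed input is passed through untouched, loosely like a NaN. */
static struct io_result res_read(struct io_result fd, void *buf, size_t len)
{
	if (fd.err)
		return fd;
	return capture(read((int)fd.value, buf, len));
}

static struct io_result res_close(struct io_result fd)
{
	if (fd.err)
		return fd;
	return capture(close((int)fd.value));
}
```

With this shape, an open+read+close sequence needs a single error check at the end, and that check does not depend on which thread ends up running it.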

> and for the kernel (no need of all that atom handling stuff,

you still need this, but it has to be centered on a data structure
that makes request throttling, dynamic reprioritization, and bulk
cancellation practical

> no need of limited cond/jump interpreters in the kernel,

you still need this, for efficient handling of speculative execution,
pipeline stalls, and exception propagation, but it's invisible to the
interface and you don't have to invent it up front

> and no need of nightmare compat code).

compat code, yes. nightmare, no. Just like kernel FP emulation on any
processor other than an x86. Unimplemented instruction traps. x86 is
so utterly the wrong architecture on which to prototype this it isn't
even funny.

Cheers,
- Michael