Mailing List Archive

Plugins vs. Engines, Dispatchers...
Hi!

First of all, I agree with those who think that Unicode support should
be part of Catalyst core, but this message is about my impressions of
how Catalyst could better support plugins.

The whole thing started with the problem that C::P::Unicode::Encoding
doesn't decode action arguments (or action names; really, the entire URI
before the query parameters). I have made a test case for the argument
part of the issue (committed[1] as rev #13007).

I've been digging into the code to find out how we could solve that
problem, and it seems pretty hard...

The problem is that to touch the arguments we may need to hook methods
in the dispatcher or in the engine. Because plugins are loaded before
both the engine and the dispatcher, a plugin would have to hook
setup_engine / setup_dispatcher and, from there, hook the engine /
dispatcher once they are loaded. This should work, but it is really
complicated.

One solution may be to load plugins after the two, but that *may* break
existing plugins if they do something similar to what is described
above. (And I don't even know whether there is a good reason for the
current loading order.)

Another, and maybe better, solution is to keep a list of roles that
should be applied to the engine / dispatcher as soon as they are loaded.
Plugins would simply populate the lists at their own load time, and
setup_engine / setup_dispatcher would take care of the rest! This may be
far more elegant than having every plugin implement its own role loading
for the engine / dispatcher classes!
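
The role-list idea above can be sketched roughly as follows. This is an
illustration in Python, not Catalyst code: Application, Engine,
register_engine_role and DecodeUtf8Role are hypothetical names, and
mixin classes stand in for Moose roles.

```python
from urllib.parse import unquote_to_bytes


class Engine:
    """Stand-in for a framework engine (hypothetical, not Catalyst's class)."""

    def unescape_uri(self, raw):
        # The base engine only undoes percent-escaping and returns raw bytes.
        return unquote_to_bytes(raw)


class Application:
    """Hypothetical application that defers role application until setup."""

    def __init__(self):
        self.engine_roles = []  # populated by plugins while they load
        self.engine = None

    def register_engine_role(self, role):
        # A plugin calls this at load time, before any engine exists.
        self.engine_roles.append(role)

    def setup_engine(self, engine_class=Engine):
        # Put the registered roles in front of the engine class so that
        # their methods wrap the engine's own methods.
        bases = tuple(reversed(self.engine_roles)) + (engine_class,)
        composed = type("ComposedEngine", bases, {})
        self.engine = composed()
        return self.engine


class DecodeUtf8Role:
    """Example role: decode the unescaped bytes as UTF-8."""

    def unescape_uri(self, raw):
        return super().unescape_uri(raw).decode("utf-8")


app = Application()
app.register_engine_role(DecodeUtf8Role)  # what a plugin would do on load
engine = app.setup_engine()
print(engine.unescape_uri("/caf%C3%A9"))
```

The point of the design is that the plugin never needs to know when the
engine gets created; it only appends to a list, and setup applies it.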

Cheers!
u-foka

P.S.: Please forgive me if I have misunderstood something; even though
I have found some time to read the code, Catalyst is relatively new to
me.

[1] http://dev.catalystframework.org/svnweb/Catalyst/revision?rev=13007
Re: Plugins vs. Engines, Dispatchers... [ In reply to ]
Hi,

2010/3/6 Eisenberger Tamás <tamas@eisenberger.hu>:
> The whole thing started with the problem that C::P::Unicode::Encoding
> doesn't decode action arguments (or action names; really, the entire URI
> before the query parameters).

URLs with UTF-8 characters? Is there an RFC or a draft that allows for that?

Just curious. I was under the impression that URLs were still
US-ASCII only, but I bet I missed an RFC somewhere.

Bye,
--
Pedro Melo
http://www.simplicidade.org/
xmpp:melo@simplicidade.org
mailto:melo@simplicidade.org

_______________________________________________
Catalyst-dev mailing list
Catalyst-dev@lists.scsys.co.uk
http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst-dev
Re: Plugins vs. Engines, Dispatchers... [ In reply to ]
Hi!

I don't know the exact RFC that describes this, but URI escaping seems
to handle the problem well. Think about query parameters: they are part
of the same URI! And after I pass the arguments through utf8::decode the
characters appear correctly. The only problem is that Unicode::Encoding
takes care of the query parameters only, not the entire URI...

This is why I'm looking at whether engine->unescape_uri can be hooked to
decode the characters.
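
A minimal sketch of the decoding step described above, written in Python
for illustration only (decode_path_args is a hypothetical helper, not
part of Catalyst): unescape each path segment to bytes first, then
decode those bytes as UTF-8.

```python
from urllib.parse import unquote_to_bytes


def decode_path_args(raw_path, encoding="utf-8"):
    """Split a raw request path and decode every segment on its own."""
    segments = [s for s in raw_path.split("/") if s]
    # Split before unescaping so an escaped slash (%2F) stays in its segment.
    return [unquote_to_bytes(s).decode(encoding) for s in segments]


args = decode_path_args("/user/%C3%A1rv%C3%ADz/t%C5%B1r%C5%91")
print(args)
```

A strict decode raises UnicodeDecodeError on bytes that are not valid
UTF-8, so the caller still has to decide how to answer such requests.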

Re: Plugins vs. Engines, Dispatchers... [ In reply to ]
Hi,

2010/3/6 Eisenberger Tamás <tamas@eisenberger.hu> top posted:
> I don't know the exact RFC that describes this, but URI escaping seems
> to handle the problem well. Think about query parameters: they are part
> of the same URI! And after I pass the arguments through utf8::decode the
> characters appear correctly. The only problem is that Unicode::Encoding
> takes care of the query parameters only, not the entire URI...

My question about the RFC was meant to figure out how you can be sure
that the user-agent used UTF-8 encoding.

How can you know whether I sent you ááá encoded as UTF-8 or as
ISO-Latin-1?

Without a clear indication from the user-agent, how can you know?
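
To make the ambiguity concrete, here is a small Python sketch (for
illustration; guess_decode is a hypothetical helper). The same three
characters produce different escaped bytes depending on the encoding,
and the server can only guess which one was used, for example by trying
a strict UTF-8 decode first.

```python
from urllib.parse import quote, unquote_to_bytes

text = "ááá"
as_utf8 = quote(text.encode("utf-8"))      # two bytes per character
as_latin1 = quote(text.encode("latin-1"))  # one byte per character


def guess_decode(escaped):
    """Heuristic only: try strict UTF-8, fall back to ISO-8859-1."""
    raw = unquote_to_bytes(escaped)
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"


print(as_utf8, guess_decode(as_utf8))
print(as_latin1, guess_decode(as_latin1))
```

The guess can be wrong: some ISO-8859-1 byte sequences are also valid
UTF-8, so this is a rule of thumb and not a proof of what the client
actually sent.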

Bye,
--
Pedro Melo
http://www.simplicidade.org/
xmpp:melo@simplicidade.org
mailto:melo@simplicidade.org

Re: Plugins vs. Engines, Dispatchers... [ In reply to ]
Hi!

It may be RFC 3986[1], according to [2]:
"The generic URI syntax mandates that new URI schemes that provide for
the representation of character data in a URI must, in effect, represent
characters from the unreserved set without translation, and should
convert all other characters to bytes according to UTF-8, and then
percent-encode those values. This requirement was introduced in January
2005 with the publication of RFC 3986."

[1] http://www.ietf.org/rfc/rfc3986.txt
[2] http://en.wikipedia.org/wiki/Percent-encoding
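
The rule quoted above (leave unreserved characters alone, convert
everything else to UTF-8 bytes, then percent-encode) looks like this in
a short Python sketch. The sample strings are made up for illustration.

```python
from urllib.parse import quote, unquote

# Unreserved characters (letters, digits, "-", ".", "_", "~") pass through.
unreserved = quote("a-b_c.d~e", safe="")

# Everything else is encoded as UTF-8 and each byte is percent-encoded.
# safe="" also escapes "/" so it cannot be mistaken for a path separator.
segment = "árvíz/tűrő"
escaped = quote(segment, safe="")

print(unreserved)
print(escaped)
print(unquote(escaped) == segment)
```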

Re: Plugins vs. Engines, Dispatchers... [ In reply to ]
Pedro Melo wrote:
> Hi,
>
> 2010/3/6 Eisenberger Tamás <tamas@eisenberger.hu> top posted:
>> I don't know the exact RFC that describes this, but URI escaping seems
>> to handle the problem well. Think about query parameters: they are part
>> of the same URI! And after I pass the arguments through utf8::decode the
>> characters appear correctly. The only problem is that Unicode::Encoding
>> takes care of the query parameters only, not the entire URI...
>
> My question about the RFC was meant to figure out how you can be sure
> that the user-agent used UTF-8 encoding.
>
> How can you know whether I sent you ááá encoded as UTF-8 or as
> ISO-Latin-1?
>
> Without a clear indication from the user-agent, how can you know?
>
> Bye,

I seem to recall that you can state explicitly, in your HTTP headers or your
HTML forms, which text encoding you want data submitted in. At least, I know
(just checked) that an HTML "form" tag can have "enctype" and "accept-charset"
attributes, so you can at least tell clients what you expect them to send
you. Then, if a submission looks like valid data of the kind you said you
expect, you should be able to interpret it as such, and if that doesn't work
then the client has a problem. -- Darren Duncan


Re: Plugins vs. Engines, Dispatchers... [ In reply to ]
On 6 Mar 2010, at 19:43, Eisenberger Tamás wrote:

> First of all, I agree with those who think that Unicode support
> should be part of Catalyst core, but this message is about my
> impressions of how Catalyst could better support plugins.

Good stuff, so do I :)

The reason it isn't there yet is that people haven't yet worked on it.

Please feel free to branch and start, people will be happy to be
pestered with questions/help you out. :_)

> The whole thing started with the problem that
> C::P::Unicode::Encoding doesn't decode action arguments (or action
> names; really, the entire URI before the query parameters). I have
> made a test case for the argument part of the issue (committed[1] as
> rev #13007).
>
> I've been digging into the code to find out how we could solve that
> problem, and it seems pretty hard...
>
> The problem is that to touch the arguments we may need to hook
> methods in the dispatcher or in the engine. Because plugins are
> loaded before both the engine and the dispatcher, a plugin would
> have to hook setup_engine / setup_dispatcher and, from there, hook
> the engine / dispatcher once they are loaded. This should work, but
> it is really complicated.

Yup. And when you look into Catalyst::Plugin::Unicode::Encoding, what
it's already doing messes with the internal state of several things.
Not very pretty.

This, and the invasiveness, is what convinced me that the only way to do
it correctly is to move Unicode handling into core.

> One solution may be to load plugins after the two, but that *may*
> break existing plugins if they do something similar to what is
> described above. (And I don't even know whether there is a good
> reason for the current loading order.)
>
> Another, and maybe better, solution is to keep a list of roles that
> should be applied to the engine / dispatcher as soon as they are
> loaded. Plugins would simply populate the lists at their own load
> time, and setup_engine / setup_dispatcher would take care of the
> rest! This may be far more elegant than having every plugin
> implement its own role loading for the engine / dispatcher classes!

I see what you're getting at here.

The current best solution is to forcibly use CatalystX::RoleApplicator
on the application after setup_finalize, which isn't amazing, but
nobody has really tried to propose a better API.

> P.S.: Please forgive me if I have misunderstood something; even
> though I have found some time to read the code, Catalyst is
> relatively new to me.

No, I think you're spot on with your conclusions here.

> [1] http://dev.catalystframework.org/svnweb/Catalyst/revision?rev=13007

Thanks for the reminder (again), I'm going to have a look into this now.

Cheers
t0m


Re: Plugins vs. Engines, Dispatchers... [ In reply to ]
* Darren Duncan <darren@darrenduncan.net> [2010-03-06 23:10]:
> I seem to recall that you can state explicitly, in your HTTP
> headers or your HTML forms, which text encoding you want data
> submitted in. At least, I know (just checked) that an HTML "form"
> tag can have "enctype" and "accept-charset" attributes, so you can
> at least tell clients what you expect them to send you.

That’s only in theory unfortunately. In practice browsers will
typically send data in the same encoding that they received the
page in, but even that isn’t entirely certain. (I don’t recall
the specifics, only that that rule of thumb works in most cases
and that the other cases are a big mess.) The only 100% reliable
solution is <http://search.cpan.org/perldoc?Encode::HEBCI>.
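
For readers unfamiliar with that kind of approach, the general idea of
inferring the encoding from a known hidden form value can be sketched as
below. This is a simplified Python illustration under stated
assumptions, not the Encode::HEBCI API: SENTINEL, CANDIDATES and
infer_encoding are hypothetical names, and the candidate list is
arbitrary.

```python
from urllib.parse import unquote_to_bytes

# Known characters the page puts into a hidden form field (assumption).
SENTINEL = "°÷€"
# Encodings we are prepared to recognise (arbitrary selection).
CANDIDATES = ("utf-8", "cp1252", "iso-8859-15", "mac_roman")

# Map the byte fingerprint of the sentinel in each encoding to its name.
FINGERPRINTS = {}
for name in CANDIDATES:
    try:
        FINGERPRINTS.setdefault(SENTINEL.encode(name), name)
    except UnicodeEncodeError:
        pass  # this encoding cannot represent the sentinel at all


def infer_encoding(escaped_sentinel):
    """Return the encoding whose fingerprint matches, or None if unknown."""
    return FINGERPRINTS.get(unquote_to_bytes(escaped_sentinel))


print(infer_encoding("%C2%B0%C3%B7%E2%82%AC"))
print(infer_encoding("%B0%F7%80"))
```

Once the encoding of the sentinel is known, the other fields of the same
submission can be decoded with it.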

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>
