Mailing List Archive

UTF8 and content length
Hi!

If the content type is 'application/json' or 'application/json;
charset=utf-8', Catalyst sets the content length in characters, NOT IN
BYTES, and I'm getting:

{"id":1, "msg":"В Питере

If the content type is 'text/html', Catalyst sets the content length in
bytes (properly) and everything works fine.

Is there any way to configure this behaviour, other than setting the
content length manually every time?


my $json_text = '{"id":1, "msg":"В Питере пить"}';

$c->response->content_type('application/json');
$c->response->content_length(bytes::length $json_text);
$c->response->body($json_text);

Thanks in advance
Re: UTF8 and content length
* Kroshka Yenot <trashbox@cary.lv> [2016-07-15 13:12]:
> Hi!
>
> If the content type is 'application/json' or 'application/json;
> charset=utf-8', Catalyst sets the content length in characters, NOT IN
> BYTES, and I'm getting:
>
> {"id":1, "msg":"В Питере
>
> If the content type is 'text/html', Catalyst sets the content length in
> bytes (properly) and everything works fine.

I am guessing you have an encoding configured in Catalyst? If so, it
encodes text/html bodies etc. automatically for you, so the body comes
out as bytes, its length is then correct, and everything works.
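
To make that guess concrete, here is a minimal plain-Perl sketch (hypothetical, not Catalyst's actual code) of an encoding step that only touches text content types. The helper name `sketch_encode_body` and its `^text/` rule are assumptions for illustration only. Applied to the same string, it yields 42 for text/html (UTF-8 bytes) and 31 for application/json (characters), the same two numbers that appear in the headers quoted later in this thread.

```perl
use strict;
use warnings;
use utf8;                       # decode the literal below as UTF-8 source
use Encode qw(encode_utf8);

# Hypothetical helper, NOT Catalyst's code: encode the body only when the
# content type looks like text, and pass everything else through untouched.
sub sketch_encode_body {
    my ($content_type, $body) = @_;
    return $content_type =~ m{^text/} ? encode_utf8($body) : $body;
}

my $json_text = '{"id":1, "msg":"В Питере пить"}';
my %length_for;
for my $type ('text/html', 'application/json') {
    my $out = sketch_encode_body($type, $json_text);
    $length_for{$type} = length $out;   # what an auto-length step would count
    print "$type: $length_for{$type}\n";
}
# text/html: 42         (UTF-8 bytes)
# application/json: 31  (characters, so a byte-counting client stops early)
```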

> Is there any way to configure this behaviour, other than setting the
> content length manually every time?
>
>
> my $json_text = '{"id":1, "msg":"В Питере пить"}';
>
> $c->response->content_type('application/json');
> $c->response->content_length(bytes::length $json_text);
> $c->response->body($json_text);
>
> Thanks in advance

(Side note: if that code works, you must have `use utf8` in effect.
Next time you ask about such a problem, please mention this and any
other relevant parts of your configuration/setup. They are crucial.)

Here you are using bytes::length, which is broken by design and is
always the wrong thing to use (except maybe when debugging perl itself
or writing XS code), after putting a character string in the body,
and then relying on the fact that perl falls back to converting
character strings to UTF-8 on output because it can’t do anything else.
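
Why bytes::length is unreliable can be shown with core Perl alone: it reports the size of perl's internal buffer, which depends on how perl happens to store a string rather than on the string's value. A minimal sketch using the `bytes` pragma (loaded without enabling it), `Encode`, and the built-in `utf8::upgrade`:

```perl
use strict;
use warnings;
use bytes ();                   # load bytes::length without enabling the pragma
use Encode qw(encode_utf8);

# The same four-character string "café", stored two different ways inside perl.
my $native   = "caf\x{e9}";     # internal Latin-1 representation
my $upgraded = "caf\x{e9}";
utf8::upgrade($upgraded);       # same characters, internal UTF-8 representation

my @char_len    = (length $native, length $upgraded);
my @bytes_len   = (bytes::length($native), bytes::length($upgraded));
my @encoded_len = (length(encode_utf8($native)), length(encode_utf8($upgraded)));

print "same string: ", ($native eq $upgraded ? "yes" : "no"), "\n";  # yes
print "length:        @char_len\n";      # 4 4
print "bytes::length: @bytes_len\n";     # 4 5  (depends on internals)
print "encode_utf8:   @encoded_len\n";   # 5 5  (the UTF-8 byte count)
```

Two equal strings give two different bytes::length results, while encoding explicitly always gives the same answer.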

This ends up working, but it’s a terrible way to achieve what you need.
It relies on multiple broken things and workarounds cancelling each
other in just the right way to get the correct answer. The clean way to
do this is to simply encode the data before you put it in the body:

use utf8;
use Encode ();
my $json_text = '{"id":1, "msg":"В Питере пить"}';

$c->response->content_type('application/json; charset=utf-8');
$c->response->body(Encode::encode_utf8 $json_text);

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

_______________________________________________
List: Catalyst@lists.scsys.co.uk
Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
Dev site: http://dev.catalyst.perl.org/
Re: UTF8 and content length
>> The clean way to do this is to simply encode the data before you
>> put it in the body:

I forgot or, more likely, didn't realise that I need to encode to UTF-8 a
string which is already UTF-8 in the source. I still need to think this
tricky rocket science over, but your solution works.
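
The confusing part is that `use utf8` only tells perl the source file is UTF-8: string literals are decoded into characters when the file is compiled, so at runtime the variable holds characters, not the UTF-8 bytes that sit in the file. A small check with core `Encode`, spelling out the UTF-8 bytes of 'пить' by hand (the variable names are illustrative):

```perl
use strict;
use warnings;
use utf8;                       # literals in this file are decoded to characters
use Encode qw(decode_utf8 encode_utf8);

# 'пить' exactly as it sits in a UTF-8 source file: 8 bytes
my $file_bytes = "\xd0\xbf\xd0\xb8\xd1\x82\xd1\x8c";
# The same word as a literal under `use utf8`: 4 characters
my $literal    = 'пить';

print length($file_bytes), " bytes vs ", length($literal), " characters\n";  # 8 bytes vs 4 characters
print "decoding the bytes gives the literal\n"     if decode_utf8($file_bytes) eq $literal;
print "encoding the literal gives the bytes back\n" if encode_utf8($literal) eq $file_bytes;
```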

σας ευχαριστώ (thank you)




On 15.07.2016 15:12, Aristotle Pagaltzis wrote:
Re: UTF8 and content length
Looks like a bug to me. Although I'm not personally keen on the automatic length setting in Catalyst, it should be corrected. I'm happy to get a patch, or at the very least give me a failing test case (check out https://github.com/perl-catalyst/catalyst-runtime/blob/master/t/utf_incoming.t and see if you can help me figure it out). -jnap
(I created an issue for this: https://github.com/perl-catalyst/catalyst-runtime/issues/143)




On Friday, July 15, 2016 6:07 AM, Kroshka Yenot <trashbox@cary.lv> wrote:


Re: UTF8 and content length
>>> Looks like a bug to me

tl;dr I'm not sure it's a Catalyst bug or problem. It may be MY
configuration problem, or a standards violation.


Here are the results of my investigation.


I created a test app to reproduce this situation:

# catalyst.pl test

# test/script/test_create.pl view HTML TT

# [editor] test/lib/test/Controller/Root.pm

sub index :Path :Args(0)
{
    my ( $self, $c ) = @_;

    my $json_text = '{"id":1, "msg":"В Питере пить"}';
    $c->response->content_type('application/json');
    $c->response->body($json_text);
}


and found the following:


wget -S -O - http://domain.tld:3000
--2016-07-20 13:56:18-- http://domain.tld:3000/
Resolving cary.lv (cary.lv)... aaa.bbb.ccc.ddd
Connecting to domain.tld (domain.tld)|aaa.bbb.ccc.ddd|:3000... connected.
HTTP request sent, awaiting response...
HTTP/1.0 200 OK
Date: Wed, 20 Jul 2016 10:56:18 GMT
Server: HTTP::Server::PSGI
Content-Type: application/json
X-Catalyst: 5.90106
Content-Length: 42
Length: 42 [application/json]
Saving to: 'STDOUT'


Content-Length is set properly; I see the same in Firefox Dev Tools.

But the log (of the built-in test server) says:

[debug] Response Code: 200; Content-Type: application/json;
Content-Length: unknown


Exactly the same code, but with the app running as a FastCGI daemon and
Apache/2.4.23 (FreeBSD) serving the HTTP requests:

# wget -S -O - http://domain.tld/
--2016-07-20 15:02:28-- http://domain.tld/
Resolving domain.tld (domain.tld)... aaa.bbb.ccc.ddd
Connecting to domain.tld (domain.tld)|aaa.bbb.ccc.ddd|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Wed, 20 Jul 2016 12:02:28 GMT
Server: Apache
Set-Cookie: lang=ru; path=/; expires=Thu, 20-Jul-2017 12:02:28 GMT
Set-Cookie: sid=3b2b88c4106b5e06c0c24a5c3a513ccbcb939299;
domain=domain.tld; path=/; expires=Wed, 20-Jul-2016 12:52:28 GMT; HttpOnly
X-Catalyst: 5.90106
Content-Length: 31
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: application/json
Length: 31 [application/json]


Here the content length is in characters, not in bytes.
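
The two numbers in these logs can be reproduced directly. A short sketch with core `Encode`, assuming the body is the `$json_text` from the controller above with `use utf8` in effect: 31 is its length in characters, 42 its length in UTF-8 bytes, and a client that trusts a Content-Length of 31 reads exactly the truncated text from the first message.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode_utf8 decode_utf8);

my $json_text = '{"id":1, "msg":"В Питере пить"}';
my $bytes     = encode_utf8($json_text);   # the UTF-8 bytes actually sent

my $chars  = length $json_text;  # 31: 20 ASCII characters + 11 Cyrillic letters
my $octets = length $bytes;      # 42: each Cyrillic letter takes 2 bytes in UTF-8

# A client honouring "Content-Length: 31" stops after 31 of the 42 bytes.
my $seen = decode_utf8(substr($bytes, 0, $chars));

binmode STDOUT, ':encoding(UTF-8)';
print "$chars characters, $octets bytes\n";   # 31 characters, 42 bytes
print "$seen\n";                               # {"id":1, "msg":"В Питере
```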

Aristotle Pagaltzis's solution,

$c->response->body(Encode::encode_utf8 $json_text);

gives the proper content length in this situation.

I also get the proper content length if I change the content type to
'text/html'.


Finally, I discovered Catalyst::View::JSON, which not only solved this
problem for me but also gave me a much more comfortable way to work with
JSON:

$c->stash->{msg} = "В Питере пить";
$c->stash->{id} = 1;
$c->forward('View::JSON');

Works like a charm


I'd also like to take this opportunity to thank you for this lovely
framework!

I'll be happy to provide any additional information if you still think
there is something that should be fixed.







On 19.07.2016 19:10, John Napiorkowski wrote:
Re: UTF8 and content length
So it looks to me like the code that sets a content length when the view doesn't set one is not dealing with Unicode correctly. I have another Unicode issue I need to look at soonish, so I'll try to see if we can get a test case for this. -jnap
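
One hypothetical way such an automatic length step could avoid reporting a wrong value (an illustration only, not Catalyst's code and not necessarily the fix for issue 143): a body containing characters above U+00FF has not been encoded yet, so instead of counting its characters the helper refuses it. The name `content_length_for` is made up for this sketch.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode_utf8);

# Hypothetical helper, not part of Catalyst: compute a Content-Length for a
# body, refusing bodies that still contain wide (> U+00FF) characters.
sub content_length_for {
    my ($body) = @_;
    # Such characters cannot be written as single bytes, so the body was
    # never encoded and a character count would not match the bytes sent.
    die "body contains wide characters; encode it first\n"
        if $body =~ /[^\x00-\xFF]/;
    return length $body;    # every character is now exactly one byte
}

my $json_text = '{"id":1, "msg":"В Питере пить"}';

my $unencoded = eval { content_length_for($json_text) };
print(defined $unencoded ? "length $unencoded\n" : "error: $@");
# error: body contains wide characters; encode it first

my $encoded = content_length_for(encode_utf8($json_text));
print "length $encoded\n";   # length 42
```

Strings whose characters all fit in one byte are written byte-for-byte on a raw handle, so their plain length stays accurate; only wide characters trigger perl's UTF-8 fallback on output.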

On Wednesday, July 20, 2016 8:18 AM, Kroshka Yenot <trashbox@cary.lv> wrote:


Re: UTF8 and content length
John Napiorkowski <jjn1056@yahoo.com> writes:

> So it looks to me like the code that sets a content length when the
> view doesn't set one is not dealing with Unicode correctly. I have
> another Unicode issue I need to look at soonish, so I'll try to see if
> we can get a test case for this. -jnap
>
> sub index :Path :Args(0)
> {
>     my ( $self, $c ) = @_;
>
>     my $json_text = '{"id":1, "msg":"В Питере пить"}';
>     $c->response->content_type('application/json');
>     $c->response->body($json_text);
> }
>

Catalyst does not encode bodies with the content type "application/json",
because most serializers prefer to output bytes rather than characters
(on the grounds, right or wrong, that JSON is data, not text). Here you
are storing a decoded string, declaring the content type, and serving
the body, which of course is not going to work.

https://metacpan.org/pod/Catalyst#ENCODING

It says: "If you are producing JSON response in an unconventional manner
(such as via a template or manual strings) you should perform the UTF8
encoding manually as well such as to conform to the JSON specification."

And setting the json manually *is* an unconventional manner.
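
For comparison, the conventional route is to let a serializer build the body. A sketch with core JSON::PP, used here only as an example (Catalyst::View::JSON may use a different JSON backend): with `utf8` enabled, `encode` returns UTF-8 bytes, so the body can be sent as-is and its `length` is the byte count. `canonical` only sorts the keys so the output is predictable.

```perl
use strict;
use warnings;
use utf8;
use JSON::PP;

my $data = { id => 1, msg => "В Питере пить" };

# utf8: produce encoded UTF-8 bytes; canonical: sort keys for stable output
my $body = JSON::PP->new->utf8->canonical->encode($data);

# 41 bytes: the serializer emits no spaces, unlike the hand-written string
print length($body), " bytes\n";
```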

I hope this helps.

Best wishes


--
Marco
