Mailing List Archive

[OT] Saving an image as black and white
I've got a bunch of scans, let's assume they're text documents. And
they're rather big ... I want to email them.

How on earth do I convert them to TRUE b&w documents? At the moment they
are jpegs that weigh in at 3MB, and I guess they're using about 5 bytes
to store all the colour, luminance, whatever, per pixel. But actually,
there's only ONE BIT of information there - whether that pixel is black
or white.

I'm using imagemagick, but so far all my attempts to strip out the
surplus information have resulted in INcreasing the file size ???

So basically, how do I save an image as "one bit per pixel" like you'd
think you'd send to a B&W printer?

Even at 300dpi, I make that 300*300/8 ~= 10KB/in^2 or 800KB of
uncompressed info for a page of A4, not 3MB.

Cheers,
Wol
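
The back-of-envelope figure above can be checked with a few lines of shell arithmetic. This is only a sketch: the function name bilevel_bytes is made up for illustration, A4 is taken as 210 x 297 mm, and each pixel row is assumed to be padded to a whole byte.

```shell
#!/bin/sh
# Hypothetical helper: bytes needed for an uncompressed 1-bit-per-pixel page.
# Usage: bilevel_bytes WIDTH_MM HEIGHT_MM DPI
bilevel_bytes() {
    # mm -> pixels: px = mm / 25.4 * dpi, rounded to nearest with integer math
    w_px=$(( ($1 * $3 * 10 + 127) / 254 ))
    h_px=$(( ($2 * $3 * 10 + 127) / 254 ))
    # Assumption: every row is padded up to a whole number of bytes
    row_bytes=$(( (w_px + 7) / 8 ))
    echo $(( row_bytes * h_px ))
}

bilevel_bytes 210 297 300   # A4 at 300 dpi
```

Under those assumptions A4 at 300dpi comes out at 1087480 bytes, roughly 1.1MB: a little above the 800KB estimate, but still well under 3MB.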
Re: [OT] Saving an image as black and white
On March 1, 2021 12:50:35 PM GMT+01:00, Wols Lists <antlists@youngman.org.uk> wrote:
>I've got a bunch of scans, let's assume they're text documents. And
>they're rather big ... I want to email them.
>
>How on earth do I convert them to TRUE b&w documents? At the moment they
>are jpegs that weigh in at 3MB, and I guess they're using about 5 bytes
>to store all the colour, luminance, whatever, per pixel. But actually,
>there's only ONE BIT of information there - whether that pixel is black
>or white.
>
>I'm using imagemagick, but so far all my attempts to strip out the
>surplus information have resulted in INcreasing the file size ???
>
>So basically, how do I save an image as "one bit per pixel" like you'd
>think you'd send to a B&W printer?
>
>Even at 300dpi, I make that 300*300/8 ~= 10KB/in^2 or 800KB of
>uncompressed info for a page of A4, not 3MB.
>
>Cheers,
>Wol
>

Have you tried optical character recognition software like Tesseract[1]?

1. https://github.com/tesseract-ocr/tesseract



--
Hund
Re: [OT] Saving an image as black and white
On 2021-03-01, Wols Lists wrote:

> I've got a bunch of scans, let's assume they're text documents. And
> they're rather big ... I want to email them.
>
> How on earth do I convert them to TRUE b&w documents? At the moment they
> are jpegs that weigh in at 3MB, and I guess they're using about 5 bytes
> to store all the colour, luminance, whatever, per pixel. But actually,
> there's only ONE BIT of information there - whether that pixel is black
> or white.
>
> I'm using imagemagick, but so far all my attempts to strip out the
> surplus information have resulted in INcreasing the file size ???
>
> So basically, how do I save an image as "one bit per pixel" like you'd
> think you'd send to a B&W printer?
>
> Even at 300dpi, I make that 300*300/8 ~= 10KB/in^2 or 800KB of
> uncompressed info for a page of A4, not 3MB.
>
> Cheers,
> Wol

Somebody else might have a better suggestion, or perhaps a better
understanding of the JPEG format and of what needs to be tuned, but, for
example:

convert origin.jpg -threshold 70% -monochrome result.jpg

(And adjust the "-threshold percent" if needed. It might be that you
don't need thresholding at all, but if you do, it apparently must go
before "-monochrome".)

(Depending on the receiving end, you could also explore other
formats. Here, if the scanned document can be stored in monochrome, I
usually use djvu.)

--
Nuno Silva
Re: [OT] Saving an image as black and white
On 2021-03-01, Wols Lists wrote:

> On 01/03/21 12:11, (Nuno Silva) wrote:
>> On 2021-03-01, Wols Lists wrote:
>>
>>> I've got a bunch of scans, let's assume they're text documents. And
>>> they're rather big ... I want to email them.
>>>
>>> How on earth do I convert them to TRUE b&w documents? At the moment they
>>> are jpegs that weigh in at 3MB, and I guess they're using about 5 bytes
>>> to store all the colour, luminance, whatever, per pixel. But actually,
>>> there's only ONE BIT of information there - whether that pixel is black
>>> or white.
>>>
>>> I'm using imagemagick, but so far all my attempts to strip out the
>>> surplus information have resulted in INcreasing the file size ???
>>>
>>> So basically, how do I save an image as "one bit per pixel" like you'd
>>> think you'd send to a B&W printer?
>>>
>>> Even at 300dpi, I make that 300*300/8 ~= 10KB/in^2 or 800KB of
>>> uncompressed info for a page of A4, not 3MB.
>>>
>>> Cheers,
>>> Wol
>>
>> Somebody else might have a better suggestion, or perhaps a better
>> understanding of the JPEG format and of what needs to be tuned, but, for
>> example:
>>
>> convert origin.jpg -threshold 70% -monochrome result.jpg
>>
>> (And adjust the "-threshold percent" if needed. It might be that you
>> don't need thresholding at all, but if you do, it apparently must go
>> before "-monochrome".)
>>
>> (Depending on the receiving end, you could also explore other
>> formats. Here, if the scanned document can be stored in monochrome, I
>> usually use djvu.)
>>
> Thanks but no, I've already tried that. It makes matters worse!
>
> I've messed about with the scanner, so it is now creating 800KB images,
> but I don't want to rescan everything I've done.
>
> The problem is that it is clearly saving the images as greyscale, not as
> black&white. And when I search for help, what I want is swamped by all
> the false positives for greyscale.
>
> Oh - and for Nuno - sorry tesseract is no use, they are NOT text. That's
> why I used the word "assume" - to make it clear that I want a
> 1-bit/pixel palette, not a 5-byte/pixel greyscale.
>
> Cheers,
> Wol

Sorry, my bad - I was checking the file sizes, but I didn't notice the
larger one was the new, "monochrome" version. More coffee needed, it
seems.

--
Nuno Silva
Re: Re: [OT] Saving an image as black and white
On 01/03/21 12:11, (Nuno Silva) wrote:
> On 2021-03-01, Wols Lists wrote:
>
>> I've got a bunch of scans, let's assume they're text documents. And
>> they're rather big ... I want to email them.
>>
>> How on earth do I convert them to TRUE b&w documents? At the moment they
>> are jpegs that weigh in at 3MB, and I guess they're using about 5 bytes
>> to store all the colour, luminance, whatever, per pixel. But actually,
>> there's only ONE BIT of information there - whether that pixel is black
>> or white.
>>
>> I'm using imagemagick, but so far all my attempts to strip out the
>> surplus information have resulted in INcreasing the file size ???
>>
>> So basically, how do I save an image as "one bit per pixel" like you'd
>> think you'd send to a B&W printer?
>>
>> Even at 300dpi, I make that 300*300/8 ~= 10KB/in^2 or 800KB of
>> uncompressed info for a page of A4, not 3MB.
>>
>> Cheers,
>> Wol
>
> Somebody else might have a better suggestion, or perhaps a better
> understanding of the JPEG format and of what needs to be tuned, but, for
> example:
>
> convert origin.jpg -threshold 70% -monochrome result.jpg
>
> (And adjust the "-threshold percent" if needed. It might be that you
> don't need thresholding at all, but if you do, it apparently must go
> before "-monochrome".)
>
> (Depending on the receiving end, you could also explore other
> formats. Here, if the scanned document can be stored in monochrome, I
> usually use djvu.)
>
Thanks but no, I've already tried that. It makes matters worse!

I've messed about with the scanner, so it is now creating 800KB images,
but I don't want to rescan everything I've done.

The problem is that it is clearly saving the images as greyscale, not as
black&white. And when I search for help, what I want is swamped by all
the false positives for greyscale.

Oh - and for Nuno - sorry tesseract is no use, they are NOT text. That's
why I used the word "assume" - to make it clear that I want a
1-bit/pixel palette, not a 5-byte/pixel greyscale.

Cheers,
Wol
Re: Re: [OT] Saving an image as black and white
Save/convert to PDF, then use gs from Ghostscript to convert them (I use
ebook for the target), which gives a 10-20x reduction in size with only a
small reduction in quality - perfect for emailing.

I don't have the actual command string to hand, but I originally found the
suggestion via Google.
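
For reference, a commonly quoted Ghostscript invocation for this looks like the sketch below. It is an assumption that this matches the command meant here; the function name shrink_pdf and the file names are placeholders, while the gs options themselves are standard pdfwrite switches.

```shell
#!/bin/sh
# Shrink a PDF with Ghostscript's "ebook" preset.
# Assumption: this is the commonly quoted invocation, not necessarily the
# exact command the poster used.
shrink_pdf() {
    # $1 = input PDF, $2 = output PDF
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
       -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$2" "$1"
}
```

Usage would be something like: shrink_pdf scans.pdf scans-small.pdf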

BillK


On 1/3/21 9:17 pm, Wols Lists wrote:
> On 01/03/21 12:11, (Nuno Silva) wrote:
>> On 2021-03-01, Wols Lists wrote:
>>
>>> I've got a bunch of scans, let's assume they're text documents. And
>>> they're rather big ... I want to email them.
>>>
>>> How on earth do I convert them to TRUE b&w documents? At the moment they
>>> are jpegs that weigh in at 3MB, and I guess they're using about 5 bytes
>>> to store all the colour, luminance, whatever, per pixel. But actually,
>>> there's only ONE BIT of information there - whether that pixel is black
>>> or white.
>>>
>>> I'm using imagemagick, but so far all my attempts to strip out the
>>> surplus information have resulted in INcreasing the file size ???
>>>
>>> So basically, how do I save an image as "one bit per pixel" like you'd
>>> think you'd send to a B&W printer?
>>>
>>> Even at 300dpi, I make that 300*300/8 ~= 10KB/in^2 or 800KB of
>>> uncompressed info for a page of A4, not 3MB.
>>>
>>> Cheers,
>>> Wol
>> Somebody else might have a better suggestion, or perhaps a better
>> understanding of the JPEG format and of what needs to be tuned, but, for
>> example:
>>
>> convert origin.jpg -threshold 70% -monochrome result.jpg
>>
>> (And adjust the "-threshold percent" if needed. It might be that you
>> don't need thresholding at all, but if you do, it apparently must go
>> before "-monochrome".)
>>
>> (Depending on the receiving end, you could also explore other
>> formats. Here, if the scanned document can be stored in monochrome, I
>> usually use djvu.)
>>
> Thanks but no, I've already tried that. It makes matters worse!
>
> I've messed about with the scanner, so it is now creating 800KB images,
> but I don't want to rescan everything I've done.
>
> The problem is that it is clearly saving the images as greyscale, not as
> black&white. And when I search for help, what I want is swamped by all
> the false positives for greyscale.
>
> Oh - and for Nuno - sorry tesseract is no use, they are NOT text. That's
> why I used the word "assume" - to make it clear that I want a
> 1-bit/pixel palette, not a 5-byte/pixel greyscale.
>
> Cheers,
> Wol
>
Re: [OT] Saving an image as black and white
On Mon, 1 Mar 2021 11:50:35 +0000, Wols Lists wrote:

> I've got a bunch of scans, let's assume they're text documents. And
> they're rather big ... I want to email them.
>
> How on earth do I convert them to TRUE b&w documents? At the moment they
> are jpegs that weigh in at 3MB, and I guess they're using about 5 bytes
> to store all the colour, luminance, whatever, per pixel. But actually,
> there's only ONE BIT of information there - whether that pixel is black
> or white.
>
> I'm using imagemagick, but so far all my attempts to strip out the
> surplus information have resulted in INcreasing the file size ???
>
> So basically, how do I save an image as "one bit per pixel" like you'd
> think you'd send to a B&W printer?

$ convert input.jpg -threshold 50% output.png

should do it, you may need to play with the threshold setting. The file
command reports the output file as being "1-bit grayscale".

You can also use -monochrome but that will produce a dithered image,
that's probably not what you want judging by your description.
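
Since the question is about a bunch of scans, the same command can be wrapped in a loop. A minimal sketch, assuming the scans are named *.jpg in the current directory; the function name threshold_to_png is made up for illustration, and 50% is only a starting point.

```shell
#!/bin/sh
# Convert every JPEG in the current directory to a thresholded PNG.
# Assumption: scans are named *.jpg; 50% is only a starting threshold.
threshold_to_png() {
    # $1 = input file, $2 = threshold percentage (default 50)
    out="${1%.*}.png"
    convert "$1" -threshold "${2:-50}%" "$out" && echo "$out"
}

for f in *.jpg; do
    [ -e "$f" ] || continue     # skip if the glob matched nothing
    threshold_to_png "$f" 50
done
```

Each original is left untouched; the PNG is written next to it with the same base name.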


--
Neil Bothwick

If we aren't supposed to eat animals, why are they made of meat?
Re: [OT] Saving an image as black and white
On Mon, Mar 1, 2021 at 8:48 AM Neil Bothwick <neil@digimed.co.uk> wrote:
>
> should do it, you may need to play with the threshold setting. The file
> command reports the output file as being "1-bit grayscale".
>
> You can also use -monochrome but that will produce a dithered image,
> that's probably not what you want judging by your description.

Keep in mind that your starting image might not be 1-bit. You might
be scanning in greyscale, which is probably 8-bit.

Nothing wrong with converting to 1-bit, but in that case you would be
throwing away detail. If you plan to do any processing of the file
you might want to do that before throwing out the detail. You also
may or may not want the threshold to be 50%.
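
One way to act on both points is to do any processing while the grey levels are still there, threshold last, and write one candidate per threshold to compare by eye. A rough sketch: the function name threshold_candidates is hypothetical, and -resize 50% merely stands in for whatever processing is wanted first.

```shell
#!/bin/sh
# Write one candidate PNG per threshold so they can be compared by eye.
# Assumptions: input is a greyscale or colour scan; halving the pixel size
# (-resize 50%) stands in for whatever processing you want to do first.
threshold_candidates() {
    # $1 = input file; remaining arguments = threshold percentages to try
    in="$1"; shift
    for t in "$@"; do
        # Resize while the image still has its grey levels, threshold last
        convert "$in" -resize 50% -threshold "${t}%" "${in%.*}-t${t}.png"
    done
}
```

For example, threshold_candidates page.jpg 40 50 60 70 would write page-t40.png through page-t70.png.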

Also, as some are starting to hit on, jpeg may or may not be an ideal
format depending on what you're scanning. It was designed for
photographs, and it doesn't really cope well with sharp edges unless
you use very high quality levels. I don't want to offer too much
advice beyond that as I don't really deal with document scanning at
any kind of scale where I get concerned with this stuff - defaults are
almost always fine for me. I'm sure the right format and process
would depend a bit on what you intend to do with the files.

--
Rich
Re: [OT] Saving an image as black and white
On 01/03/21 13:48, Neil Bothwick wrote:
> On Mon, 1 Mar 2021 11:50:35 +0000, Wols Lists wrote:
>
>> I've got a bunch of scans, let's assume they're text documents. And
>> they're rather big ... I want to email them.
>>
>> How on earth do I convert them to TRUE b&w documents? At the moment they
>> are jpegs that weigh in at 3MB, and I guess they're using about 5 bytes
>> to store all the colour, luminance, whatever, per pixel. But actually,
>> there's only ONE BIT of information there - whether that pixel is black
>> or white.
>>
>> I'm using imagemagick, but so far all my attempts to strip out the
>> surplus information have resulted in INcreasing the file size ???
>>
>> So basically, how do I save an image as "one bit per pixel" like you'd
>> think you'd send to a B&W printer?
>
> $ convert input.jpg -threshold 50% output.png
>
> should do it, you may need to play with the threshold setting. The file
> command reports the output file as being "1-bit grayscale".
>
> You can also use -monochrome but that will produce a dithered image,
> that's probably not what you want judging by your description.
>
>
FINALLY!

Thanks, that worked! Okay, I also adjusted the dpi because the original
scan was 600 and I've reduced it to 300, but this has reduced the file
size from 3MB to 180KB.

Dunno why, but everything I was trying was INcreasing the file size :-(

And the png does make a massive difference - the same command with jpg
output is 1.7MB - so why is my scanner chucking out 800KB jpegs if I set
it correctly?

Cheers,
Wol
Re: [OT] Saving an image as black and white
On Mon, Mar 1, 2021 at 10:54 AM Wols Lists <antlists@youngman.org.uk> wrote:
>
> And the png does make a massive difference - the same command with jpg
> output is 1.7MB - so why is my scanner chucking out 800KB jpegs if I set
> it correctly?

jpeg quality is adjustable. You can output a jpeg file of almost any size.

Software less geared towards image editing may not actually let you
set the quality level, but the software IS using one. So, two
programs could output the same file at different sizes.

The smaller you make the file, the lower the quality. This does have
diminishing returns - as you approach maximum quality you increase the
size greatly with very little difference in visual quality.

Of course, if you try to convert that 1.7MB jpeg into a 30kb jpeg,
you'll probably notice the difference. This is why this is a meme:
http://needsmorejpeg.com/
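
The size/quality trade-off is easy to see by re-encoding one image at a few quality levels. A small sketch using ImageMagick's -quality option; the function name jpeg_sizes and the sample quality values are arbitrary choices for illustration.

```shell
#!/bin/sh
# Re-encode one image at several JPEG quality levels and print the sizes.
# Assumption: the quality values passed in are arbitrary sample points.
jpeg_sizes() {
    # $1 = input file; remaining arguments = quality levels (1-100)
    in="$1"; shift
    for q in "$@"; do
        out="${in%.*}-q${q}.jpg"
        convert "$in" -quality "$q" "$out"
        printf '%s\t%s bytes\n' "$out" "$(wc -c < "$out" | tr -d ' ')"
    done
}
```

For example, jpeg_sizes page.png 30 60 90 writes three JPEGs and lists their sizes side by side.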

--
Rich
Re: [OT] Saving an image as black and white
Am Mon, Mar 01, 2021 at 03:54:12PM +0000 schrieb Wols Lists:

> >> So basically, how do I save an image as "one bit per pixel" like you'd
> >> think you'd send to a B&W printer?
> >
> > $ convert input.jpg -threshold 50% output.png
> >
> > should do it, you may need to play with the threshold setting. The file
> > command reports the output file as being "1-bit grayscale".
> >
> > You can also use -monochrome but that will produce a dithered image,
> > that's probably not what you want judging by your description.
> >
> >
> FINALLY!
>
> Thanks, that worked! Okay, I also adjusted the dpi because the original
> scan was 600 and I've reduced it to 300, but this has reduced the file
> size from 3MB to 180KB.

Also note: DPI is just a factor that is stored in the image’s metadata. What
determines the actual file size is the number of pixels. DPI is used to “convert”
between the physical size of a hypothetical print (i.e. a sheet of paper) and
the number of pixels required for a certain density (and thus, quality).
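
The difference can be shown by changing only the stored density and comparing that with an actual reduction in pixels. A sketch under assumptions: the function name dpi_demo and the output file names are placeholders, and -resize 50% is used as the pixel-changing step.

```shell
#!/bin/sh
# Compare a metadata-only DPI change with an actual reduction in pixels.
# Assumption: the output file names are placeholders.
dpi_demo() {
    # $1 = input image
    # Only rewrites the stored density; pixel dimensions stay the same
    convert "$1" -units PixelsPerInch -density 300 relabelled.png
    # Halves width and height, so a quarter of the pixels remain
    convert "$1" -resize 50% -units PixelsPerInch -density 300 halved.png
    identify -format '%f: %w x %h pixels\n' "$1" relabelled.png halved.png
}
```

Only the second output should have fewer pixels, and only that one should be noticeably smaller on disk.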

As far as I know, jpeg does not have a 1-bit mode. It can store a single
greyscale channel instead of three colour channels, but each pixel is still
an 8-bit sample run through lossy compression tuned for photographs, so hard
black/white edges compress poorly. That’s why png is the much better option in this case.
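
To see how a given file is actually stored, ImageMagick's identify can print the colourspace, image type, bit depth and size. A minimal sketch; the function name describe_image is made up, and the exact values reported may differ between ImageMagick versions.

```shell
#!/bin/sh
# Print how each file is stored: colourspace, type, bit depth, file size.
# Assumption: reported values can vary between ImageMagick versions.
describe_image() {
    # "$@" = one or more image files
    identify -format '%f: %[colorspace] %[type] %z-bit %b\n' "$@"
}
```

Running describe_image on the jpg and png outputs of the same conversion puts the two formats side by side.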

--
Gruß | Greetings | Qapla’
Please do not share anything from, with or about me on any social network.

UNIX is not user-unfriendly.
It just expects the user to be a little more computer-friendly.