Mailing List Archive

"For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?)
As the line in that favorite song goes "Paranoia strikes deep"...

<NOTE>
I am NOT trying to start ANY political discussion here. I hope no one will
go too far down that path, at least here on this list. There are better
places to do that.

I am also NOT suggesting anything like what I ask next has happened, either
here or elsewhere. It's just a question.

Thanks in advance.
</NOTE>

I'm currently reading a new book by Glenn Greenwald called "No Place To
Hide" which is about Greenwald's introduction to Edward Snowden and the
release of all of the confidential NSA documents Snowden acquired. This got
me wondering about Gentoo, or even just Linux in general. If the underlying
issue in all of that Snowden stuff is that the NSA has the ability to
intercept and hack into whatever they please, then how do I know that the
source code I build on my Gentoo machines hasn't been modified by someone
to provide access to my machine, networks, etc.?

Essentially, what is the security model for all this source code and how do
I verify that it hasn't been tampered with in some manner?

1) That the code I build is exactly as written and accepted by the OS
community?

2) That the compilers and interpreters don't do anything except build the
code?

There's certainly lots of other issues about security, like protecting
passwords, protecting physical access to the network and machines, root
kits and the like, etc., but assuming none of that is in question (I don't
have any reason to think the NSA has been in my home!) ;-) I'm looking for
info on how the code is protected from the time it's signed off until it's
built and running here.

If someone knows of a good web site to read on this subject let me know.
I've gone through my Linux life more or less like most everyone went
through life 20 years ago, but paranoia strikes deep.

Thanks in advance,
Mark
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?)
Mark Knecht posted on Mon, 04 Aug 2014 15:04:12 -0700 as excerpted:

> As the line in that favorite song goes "Paranoia strikes deep"...

FWIW, while my list sig is the proprietary-master quote from Richard
Stallman below, since the (anti-)patriot bill was passed in reaction
to 9-11, my private email sig is a famous quote from Benjamin Franklin:

"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety."

So "I'm with ya..."

> <NOTE>
> I am NOT trying to start ANY political discussion here. I hope no one
> will go too far down that path, at least here on this list. There are
> better places to do that.
>
> I am also NOT suggesting anything like what I ask next has happened,
> either here or elsewhere. It's just a question.
>
> Thanks in advance.
> </NOTE>
>
> I'm currently reading a new book by Glenn Greenwald called "No Place To
> Hide" which is about Greenwald's introduction to Edward Snowden and the
> release of all of the confidential NSA documents Snowden acquired. This
> got me wondering about Gentoo, or even just Linux in general. If the
> underlying issue in all of that Snowden stuff is that the NSA has the
> ability to intercept and hack into whatever they please, then how do I
> know that the source code I build on my Gentoo machines hasn't been
> modified by someone to provide access to my machine, networks, etc.?

These are good questions to ask, and to have some idea of the answers to,
as well.

Big picture, at some level, you pretty much have to accept that you
/don't/ know. However, there's /some/ level of security... tho honestly
a bit less on Gentoo than on some other distros (see below). Subverting
the tree widely would still not be /entirely/ easy (targeting an
individual downloader is another question), but it could be done.

> Essentially, what is the security model for all this source code and how
> do I verify that it hasn't been tampered with in some manner?
>
> 1) That the code I build is exactly as written and accepted by the OS
> community?

At a basic level, ebuild and source integrity is what the ebuild and
sources digests are all about. They protect well against accidental
corruption; against deliberate tampering the protection is weaker, and
someone with enough resources who wanted to badly enough could subvert
it. The idea is that the gentoo package maintainer creates hash digests
of multiple types for both the ebuild and the sources. Should the copy
that a gentoo user gets not match the copy the maintainer created, the
package manager (PM, normally portage), if configured to do so (mainly
FEATURES=strict; also see stricter and assume-digests, plus the
webrsync-gpg feature mentioned below), will error out and refuse to
emerge that package.
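As an illustration only (this is a rough sketch of the idea, not portage's actual code), the digest check amounts to something like this:

```python
# Sketch of a Manifest-style digest check: the maintainer records several
# hashes per file; if any recorded hash disagrees with the fetched data,
# the package manager refuses to proceed. Illustrative names throughout.
import hashlib

def make_digests(data):
    """Maintainer side: record multiple hash types for one file."""
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("sha256", "sha512")}

def verify(data, recorded):
    """User side: True only if every recorded digest matches the data."""
    return all(hashlib.new(name, data).hexdigest() == h
               for name, h in recorded.items())

sources = b"distfile contents as fetched from a mirror"
manifest_entry = make_digests(sources)                # created by maintainer
ok = verify(sources, manifest_entry)                  # clean fetch -> True
tampered_ok = verify(sources + b"!", manifest_entry)  # tampered -> False
```

With FEATURES=strict, a False result here is roughly where portage would error out instead of emerging.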

But there are serious limits to that protection. Here's a few points to
consider:

1) While the ebuilds and sources are digested, those digests do *NOT*
extend to the rest of the tree, the various files in the profile
directory, the various eclasses, etc. So in theory at least, someone
could mess with say the package.mask file in profiles, or one of the
eclasses, and could potentially get away with it. But see point #3 as
there's a (partial) workaround for the paranoid.

2) Meanwhile, a bare hash digest verifies that nothing changed in
transit, but not /who/ created the digest in the first place; unlike gpg
signing, it primarily protects against accidental damage, not so much
deliberate compromise. So there's some risk that one or more gentoo
rsync mirrors could be compromised, or be run by a bad actor in the
first place. Should that occur, the bad actor could replace BOTH the
digested ebuild and/or sources AND the digest files, updating the latter
to match his compromised version instead of the version originally
digested by the gentoo maintainer. Similarly, someone such as the NSA
could at least in theory do the same thing in transit, targeting a
specific user's downloads while leaving everyone else's downloads from
the same mirror alone, so only the target got the compromised version.
While there's a reasonable chance someone would catch a bad mirror, if a
single downloader is specifically targeted, there's little chance they'd
detect the problem unless they're specifically validating against other
mirrors as well, and/or comparing digests (over a secure channel)
against those someone else downloaded. So even digest-protected files
aren't immune to compromise.
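A toy sketch of why this matters. All names here are hypothetical, and HMAC is used only as a simple stand-in for a real gpg signature (something a mirror can't regenerate without the maintainer's key):

```python
# A mirror can serve a tampered file AND a freshly recomputed digest, and
# the digest check still passes: hashes prove transit integrity, not origin.
# A keyed signature the mirror can't forge does catch the swap.
import hashlib, hmac

original = b"inherit eutils\nSRC_URI=...\n"
tampered = b"inherit eutils\nSRC_URI=...evil...\n"

# Mirror-style attack: swap the file, regenerate the digest to match.
served_file = tampered
served_digest = hashlib.sha512(tampered).hexdigest()
assert hashlib.sha512(served_file).hexdigest() == served_digest  # passes!

# Stand-in for a signature: keyed with something only the maintainer holds.
maintainer_key = b"held only by the gentoo maintainer"  # hypothetical
real_sig = hmac.new(maintainer_key, original, hashlib.sha256).hexdigest()
forged_check = hmac.new(maintainer_key, served_file,
                        hashlib.sha256).hexdigest()
assert forged_check != real_sig  # the tampering is detected
```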

But as I said above, there's a (partial) workaround. See point #3.

3) While #1 applies to the tree in general when it is rsynced, gentoo
does have a somewhat higher security sync method for the paranoid and to
support users behind firewalls which don't pass rsync. Instead of
running emerge sync, this method uses the emerge-webrsync tool, which
downloads the entire main gentoo tree as a gpg-signed tarball. If you
have FEATURES=webrsync-gpg set (see the make.conf manpage, FEATURES,
webrsync-gpg), portage will verify the gpg signature on this tarball.
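For reference, the make.conf settings involved look roughly like this (FEATURES and PORTAGE_GPG_DIR per the make.conf manpage; the keyring path is only an example, it must point at a gpg home containing the trusted Gentoo release key):

```
# /etc/portage/make.conf (sketch)
FEATURES="webrsync-gpg"
# gpg home dir holding the trusted Gentoo release key (example path)
PORTAGE_GPG_DIR="/etc/portage/gpg"
```

Then sync with emerge-webrsync instead of emerge sync.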

The two caveats here are (1) that the webrsync tarball is generated only
once per day, while the main tree is synced every few minutes, so the
rsynced tree is going to be more current, and (2) that each snapshot is
the entire tree, not just the changes, so for those updating daily or
close to it, fetching the full tarball every day instead of just the
changes will be more network traffic. Tho I think the tarball is
compressed (I've never tried this method personally so can't say for
sure) while the rsync tree isn't, so if you're updating monthly, I'd
guess it's less traffic to get the tarball.

The tarball is gpg-signed, which is more secure than simple hash
digests, but the signature covers the entire thing, not individual
files, so the per-file digests still offer finer granularity.
Additionally, the tarball signing is automated, so while a valid
signature pretty well ensures that the tarball did indeed come from
gentoo, should someone compromise gentoo infrastructure security and
somehow get a bad file in place, the daily snapshot process would
blindly package up and sign the bad file along with all the rest.

So sync-method bottom line, if you're paranoid or simply want additional
gpg-signed security, use emerge-webrsync along with FEATURES=webrsync-gpg,
instead of normal rsync-based emerge sync. That pretty well ensures that
you're getting exactly the gentoo tree tarball gentoo built and signed,
which is certainly far more secure than normal rsync syncing, but because
the tarballing and signing is automated and covers the entire tree,
there's still the possibility that one or more files in that tarball are
compromised and that it hasn't been detected yet.

Meanwhile, I mentioned above that gentoo isn't as secure in this regard
as a number of other Linux distros. This is DEFINITELY the case for
normal rsync syncers, but even for webrsync-gpg syncers it remains the
case to some extent. Unfortunately, in practice it seems that isn't
likely to change in the near-term, and possibly not in the medium or
longer term either, unless some big gentoo compromise is detected and
makes the news. THEN we're likely to see changes.

Alternatively, when that big pie-in-the-sky main gentoo tree switch from
cvs (yes, still) to git eventually happens, the switch to full-signing
will be quite a bit easier, tho there will still be policies to enforce,
etc. But they've been talking about the switch to git for years, as
well, and... incrementally... drawing closer, including the fact that
major portions of gentoo are actually developed in git-based overlays
these days. But will the main tree ever actually switch to git? Who
knows? As of now it's still pie-in-the-sky, with no nailed down plans.
Perhaps at some point somebody and some gentoo council together will
decide it's time and move whatever mountains or molehills remain to get
it done; at this point I think that's mostly what it'll take. But unless
that somebody steps up and makes that push come hell or high water then,
assuming gentoo's still around by then, come 2025 we could still be
talking about doing it... someday...

Back to secure-by-policy gpg-signing...

The problem is that while we've known what must be done, and what other
distros have already done, for years, and while gentoo has made some
progress down the security road, in the absence of that ACTIVE KNOWN
COMPROMISE RIGHT NOW immediate threat, other things simply continue to be
higher priority, while REAL gentoo security continues to be back-burnered.

Basically, what must be done, all the way thru to policy enforcement and
refusing gentoo developer commits that don't match policy, is this:
every gentoo dev has a registered gpg key (AFAIK that much is already
the case), every commit they make is SIGNED by that personal developer
key, and gentoo infra verifies those signatures, rejecting any commit
that doesn't verify.
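The enforcement idea can be sketched like so. This is illustrative only: real infra would verify OpenPGP signatures against the developer keyring, while HMAC merely stands in for a signature scheme here, and the key names are hypothetical:

```python
# Infra-side policy sketch: accept a commit only if it carries a valid
# signature from a registered developer key; reject everything else.
import hashlib, hmac

registered_dev_keys = {"larry": b"larry-signing-key"}  # hypothetical keyring

def sign(dev, commit_data):
    """Developer side: sign the commit with the dev's personal key."""
    return hmac.new(registered_dev_keys[dev], commit_data,
                    hashlib.sha256).hexdigest()

def accept_commit(dev, commit_data, signature):
    """Infra side: verify the signature; unregistered or forged -> reject."""
    if dev not in registered_dev_keys:
        return False
    expected = hmac.new(registered_dev_keys[dev], commit_data,
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

commit = b"ebuild: bump foo to 1.2"
good = accept_commit("larry", commit, sign("larry", commit))  # accepted
bad = accept_commit("larry", commit, "forged-signature")      # rejected
```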

FWIW, there's GLEPs detailing most of this. They've just never been
fully implemented, tho incrementally, bits and pieces have been, over
time.

As I said, other distros have done this, generally when they HAD to, when
they had that compromise hitting the news. Tho I think a few distros
have implemented such a signed-no-exceptions policy when some OTHER
distro got hit. Gentoo hasn't had that happen yet, and while the
infrastructure is generally there to sign at least individual package
commits, and some devs actually do so (you can see the signed digests for
some packages, for instance), that hasn't been enforced tree-wide, and in
fact, there's a few relatively minor but still important policy questions
to resolve first, before such enforcement is actually activated.


Here's one such signing-policy question to consider. Currently, package
maintainer devs make changes to their ebuilds, and later, after a period
of testing, arch-devs keyword a particular ebuild stable for their arch.
Occasionally arch-devs may add a bit of conditional code that applies to
their arch only, as well.

Now consider this. Suppose a compromised package is detected after the
package has been keyworded stable. The last several signed commits to
that package were keywording only, while the commit introducing the
compromise was sometime earlier.

Question: Are those arch-devs that signed their keywording-only commits
responsible too, because they signed off on the package? That would mean
they now have to inspect every package they keyword, checking for
compromises that might not be entirely obvious to them. Or are they only
responsible for the keywording changes they actually committed, with no
obligation to inspect the rest of the ebuild they're now signing?

OK, so we say that they're only responsible for the keywording. Simple
enough. But what about this? Suppose they add an arch-conditional that
combined with earlier code in the package results in a compromise. But
the conditional code they added looks straightforward enough on its own,
and really does solve a problem on that arch, and without that code, the
original code looks innocently functional as well. But together, anyone
installing that package on that arch is now open to the world. Both devs
signed, the code of both devs is legit and looks innocent enough on its
own, but taken together, they result in a bad situation. Now it's not so
clear that an arch-dev shouldn't have to inspect and sign for the results
of the package after his commit, is it? Yet enforcing that as policy
will seriously slow-down arch stable keywording, and some archs can't
keep up as it is, so such a policy will be an effective death sentence
for them as a gentoo-stable supported arch.

Certainly there are answers to that sort of question, and various distros
have faced and come up with their own policy answers, often because in
the face of a REAL DISTRO COMPROMISE making the news, they've had no
other choice. To some extent, gentoo is lucky in that it hasn't been
faced with making those hard choices yet. But the fact is, all gentoo
users remain less safe than we could be, because those hard choices
haven't been made and enforced... because we've not been forced to do so.


Meanwhile, even were we to have done so, there's still the possibility
that upstream development might be compromised. Every year or two, some
upstream project or another makes news due to some compromise or
another. Sometimes vulnerable versions have been distributed for awhile,
and various distros have picked them up. In an upstream-compromise
situation like that, there's little a distro can do, short of going slow
enough that its packages are all effectively outdated. That, as it
happens, is a relatively effective counter to this sort of issue: if a
several-years-old version changes, it'll be detected right away, and
(one hopes) most compromises to a project server will be detected within
months at the longest, so anything a year or more old should be
relatively safe from this sort of issue, simply by virtue of its age.

Obviously the people and enterprise distros willing to run years outdated
code do have that advantage, and that's a risk that people wishing to run
reasonably current code simply have to take as a result of that choice,
regardless of the distro they chose to get that current code from.


But even if you choose to run an old distro that has and enforces a full
signing policy, so every commit can be accounted for and you aren't
likely to be hit by current upstream compromises, and even if none of
the developers at either the distro or upstream levels deliberately
breaks the trust and goes bad, there's still the issue below...

> 2) That the compilers and interpreters don't do anything except build
> the code?

There's a paper, very famous in security circles, that effectively
proves that unless you can absolutely trust every single layer in the
build chain, including the hardware layer (which means its sources), the
compiler and tools used to build your operational tools, the compiler
and tools used to build them, and... all the way back... you simply
cannot absolutely trust the results, period.

I never kept the link, but it seems the title actually stuck in memory
well enough for me to google it: "Reflections on Trusting Trust"
=:^) Here's the google link:

https://www.google.com/search?q=%22reflections+on+trusting+trust%22


That means that in order to absolutely trust the gcc (for example) on
our own systems, even if we can read and understand every line of gcc
source, we must absolutely trust the tools on the original installation
media and in the stage tarballs that we used to build our system. Which
means we must not only have the code to them and trust their builders,
but we must have the code and trust the builders of the tools they used,
and the builders and tools of those tools, and...
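Thompson's attack from that paper can be miniaturized in a few lines. This toy (all names hypothetical, a "compiler" here is just a function from source text to a runnable function) shows how a backdoor can survive recompilation from fully audited, pristine compiler source:

```python
# Toy version of the "Reflections on Trusting Trust" attack. The trojaned
# compiler backdoors the login program, and re-inserts its own trojan when
# it recognizes it is compiling the (clean!) compiler source, so auditing
# the source finds nothing.
CLEAN_COMPILER_SRC = "compile(source) -> program"  # stand-in for real source
LOGIN_SRC = "check_password(user, pw)"             # stand-in for login source

def trojaned_compile(source):
    if source == LOGIN_SRC:
        # Backdoored login: the real password works, and so does a magic one.
        return lambda user, pw: pw == "real-password" or pw == "magic-backdoor"
    if source == CLEAN_COMPILER_SRC:
        # Self-propagation: emit another trojaned compiler, even though the
        # *source* being compiled is completely clean.
        return trojaned_compile
    return lambda *a: None  # everything else compiles "normally"

# Rebuild the compiler from pristine, fully audited source...
new_compiler = trojaned_compile(CLEAN_COMPILER_SRC)
# ...and the binary it produces is still backdoored:
login = new_compiler(LOGIN_SRC)
assert login("root", "magic-backdoor") is True
```

No amount of staring at CLEAN_COMPILER_SRC reveals the problem, which is exactly the paper's point.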

Meanwhile, the same rule effectively applies to the hardware as well.
And while Richard Stallman may run a computer that is totally open source
hardware and firmware (down to the BIOS or equivalent), for which he has
all the schematics, etc., most of us run at least some semi-proprietary
hardware of /some/ sort. Which means even if we /could/ fully understand
the sources ourselves, without them and without that full understanding,
at that level, we simply have to trust... someone... basically, the
people who design and manufacture that hardware.

Thus, in practice, (nearly) everyone ends up drawing the line
/somewhere/. The Stallmans of the world draw it pretty strictly,
refusing to run anything which at minimum has replaceable firmware which
doesn't itself have sources available. (As Stallman defines it, if the
firmware is effectively burned in such that the manufacturer themselves
can't update it, then that's good enough for the line he draws. Tho that
leads to absurdities such as an OpenMOKO phone that at extra expense has
the firmware burned onto a separate chip such that it can't be replaced
by anyone, in order to be able to use hardware that would otherwise be
running firmware that the supplier refuses to open-source -- because the
extra expense to do it that way means the manufacturer can't replace the
firmware either, so it's on the OK side of Stallman's line.)

Meanwhile, I personally draw the line at what runs at the OS level on my
computer. That means I won't run proprietary graphics drivers or flash,
but I will and do load source-less firmware onto the Radeon-based
graphics hardware I run, in order to use the freedomware kernel drivers
for that hardware rather than the proprietary fglrx drivers.

Other people are fine running flash and/or proprietary graphics drivers,
but won't run a mostly-proprietary full OS such as MS Windows or Apple
OSX.

Still others prefer to run open source where it fits their needs, but
won't go out of their way to do so if proprietary works better for them,
and still others simply don't care either way, running whatever works
best regardless of the freedom or lack thereof of its sources.

Anyway, when it comes to hardware and compiler, in practice the best you
can do is run a FLOSS compiler such as gcc, while trusting the tools you
used to build the first ancestor, basically, the gcc and tools in the
stage tarballs, as well as whatever you booted (probably either a gentoo-
installer or another distro) in ordered to chroot into that unpacked
stage and build from there. Beyond that, well... good luck, but you're
still going to end up drawing the line /somewhere/.

> There's certainly lots of other issues about security, like protecting
> passwords, protecting physical access to the network and machines, root
> kits and the like, etc., but assuming none of that is in question (I
> don't have any reason to think the NSA has been in my home!) ;-) I'm
> looking for info on how the code is protected from the time it's signed
> off until it's built and running here.
>
> If someone knows of a good web site to read on this subject let me know.
> I've gone through my Linux life more or less like most everyone went
> through life 20 years ago, but paranoia strikes deep.

Indeed. Hope the above was helpful. I think it's a pretty accurate
picture from at least my own perspective, as someone who cares enough
about it to at least spend a not insignificant amount of time keeping up
on the current situation in this area, both for linux in general, and for
gentoo in particular.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?)
On Mon, Aug 4, 2014 at 6:04 PM, Mark Knecht <markknecht@gmail.com> wrote:
>
> Essentially, what is the security model for all this source code and how do
> I verify that it hasn't been tampered with in some manner?

Duncan already gave a fairly comprehensive response. I believe the
intent is to refactor and generally improve things when we move to
git. Even today there aren't a lot of avenues for slipping code in
without compromising a gentoo server or manipulating your rsync data
transfer (if it isn't secured).

But...

> There's certainly lots of other issues about security, like protecting
> passwords, protecting physical access to the network and machines, root kits
> and the like, etc., but assuming none of that is in question (I don't have
> any reason to think the NSA has been in my home!) ;-) I'm looking for info
> on how the code is protected from the time it's signed off until it's built
> and running here.

You may very well be underestimating the NSA here. It has already
come out that they hack into people's systems just to get their ssh
keys to hack into other people's systems, even if the admins they're
targeting aren't of any interest otherwise. That is, you don't have
to be a suspected terrorist/etc to be on their list.

I run a relay-only tor node (which doesn't seem to keep everybody and
their uncle from blocking me as if I were an exit node). I'd be
surprised if the NSA hasn't rooted my server just so that they can
monitor my tor traffic - if they did this to all the tor relays they
could monitor the entire network, so I would think that this would be
a priority for them.

To root your system the NSA doesn't have to compromise some Gentoo
server, or even tamper with your rsync feed. The simplest solution
would be to just target a zero-day vulnerability in some software
you're running. They might use a zero-day in some daemon that runs as
root, maybe a zero-day in the kernel network stack, or a zero-day in
your browser (those certainly exist) combined with a priv escalation
attack. If they're just after your ssh keys they don't even need priv
escalation. Those attacks don't require targeting Gentoo in
particular.

If your goal is to be safe from "the NSA" then I think you need to
fundamentally rethink your approach to security. I'd recommend
signing and verifying all code that runs (think iOS). I doubt that any
linux distro is going to suit your needs unless you just use it as a
starting point for a fork.

However, I do think that Gentoo can do a better job of securing code
than it does today, and that is a worthwhile goal. I doubt it would
stop the NSA, but we certainly can do something about lesser threats
that don't:
1. Have a 12-figure budget.
2. Have complete immunity from prosecution.
3. Have an army of the best cryptographers in the world, etc.
4. Have privileged access to the routers virtually all of your
traffic travels over.
5. Have the ability to obtain things like trusted SSL certs at will
(though I don't think anybody has caught them doing this one).

In the early post-Snowden days I was more paranoid, but these days
I've basically given up worrying about the NSA. After the ssh key
revelations I just assume they have root on my box - I just wish
they'd be nice enough to close up any other vulnerabilities they find
so that others don't get root, and maybe let me access whatever
backups they've made if for some reason I lose access to my own
backups. I still try to keep things as secure as I can to keep
everybody else out, but hiding from the NSA is a tall order.

Oh yeah, if they have compromised my box you can assume they have my
Gentoo ssh key and password and gpg key if they actually want them...
:)

Rich
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?)
Hi Rich,
Thanks for the response. I'll likely respond over the next few hours &
days in dribs and drabs...


On Tue, Aug 5, 2014 at 4:36 AM, Rich Freeman <rich0@gentoo.org> wrote:
>
> On Mon, Aug 4, 2014 at 6:04 PM, Mark Knecht <markknecht@gmail.com> wrote:
> >
> > Essentially, what is the security model for all this source code and
> > how do I verify that it hasn't been tampered with in some manner?
>
> Duncan already gave a fairly comprehensive response. I believe the
> intent is to refactor and generally improve things when we move to
> git. Even today there aren't a lot of avenues for slipping code in
> without compromising a gentoo server or manipulating your rsync data
> transfer (if it isn't secured).
>
> But...
>
> > There's certainly lots of other issues about security, like protecting
> > passwords, protecting physical access to the network and machines,
> > root kits and the like, etc., but assuming none of that is in question
> > (I don't have any reason to think the NSA has been in my home!) ;-)
> > I'm looking for info on how the code is protected from the time it's
> > signed off until it's built and running here.
>
> You may very well be underestimating the NSA here. It has already
> come out that they hack into people's systems just to get their ssh
> keys to hack into other people's systems, even if the admins they're
> targeting aren't of any interest otherwise. That is, you don't have
> to be a suspected terrorist/etc to be on their list.
>

Yeah, I've read that. It's my basic POV at this time that if the NSA
(or any other organization) wants something I have then they have
it already. However a good portion of my original thoughts are
along the line of your zero-day point below.

> I run a relay-only tor node (which doesn't seem to keep everybody and
> their uncle from blocking me as if I'm an exit node it seems). I'd be
> surprised if the NSA hasn't rooted my server just so that they can
> monitor my tor traffic - if they did this to all the tor relays they
> could monitor the entire network, so I would think that this would be
> a priority for them.

The book I referenced made it clear that the NSA has a whole specific
program & toolset to target tor, so I suspect you're correct, or even
underestimating yourself. That said, running tor is legal, so more
power to you. I ran it a little to play with and found all the 2-level
security stuff at GMail and YahooMail too much trouble to deal with.

>
> To root your system the NSA doesn't have to compromise some Gentoo
> server, or even tamper with your rsync feed. The simplest solution
> would be to just target a zero-day vulnerability in some software
> you're running. They might use a zero-day in some daemon that runs as
> root, maybe a zero-day in the kernel network stack, or a zero-day in
> your browser (those certainly exist) combined with a priv escalation
> attack. If they're just after your ssh keys they don't even need priv
> escalation. Those attacks don't require targeting Gentoo in
> particular.
>

Yep, and it's the sort of thing I was thinking about when I wrote this
yesterday:

I'm sitting here writing R code. I do it in R-Studio. How do I
know that every bit of code I run in that tool isn't being sent out to some
server? Most likely no one has done an audit of that GUI so I'm trusting
that the company isn't nefarious in nature.

I use Chrome. How do I know Chrome isn't scanning my local drives
and sending stuff somewhere? I don't.

In the limit, how would I even know if the Linux kernel was doing this? I
got source through emerge, built code using gcc, installed it by hand,
but I don't know what's really there and never will. I suspect the kernel
is likely one of the safer things on my box.

In the news yesterday was a story about some pedophile sending child
porn using GMail and then getting arrested because Google scans
'certain' attachments for known hashes. Well, that's the public story
(so far), but it seems to me Google isn't creating those hashes itself
so much as getting them from the FBI. Either way, the point is it's all
being watched.

I think one way you might not be as John le Carre-oriented as me is
this: if I were the NSA and wanted inside of Linux (or M$FT or Apple) in
general, then I would simply pay people to be inside of those entities
and to do my bidding. Basic spycraft. Those folks would already be in
the kernel development area, or in KDE, or in the facilities that host
the code, or wherever, making whatever changes they want. They would
have already hacked how iOS does signing, or M$FT does updates, etc.

When it comes to security, choose whatever type you want, but how do I
as a user know that my sha-1 or pgp or whatever is what the developers
thought they were making publicly available? I don't, and probably
never will.

> If your goal is to be safe from "the NSA"

It's not. Nor do I think I'll ever know if I am, so I have to assume
I'm not. Life in the modern era...

<SNIP>
>
> In the early post-Snowden days I was more paranoid, but these days
> I've basically given up worrying about the NSA.

Similar for me, although reading this book, or watching the 2-episode
Frontline story, or (fill in whatever) raises the question, but more in a
general sense. I'm far less worried about the NSA and more worried
about things like general hackers after financial info or people looking
for code I'm writing.

Thanks for all the info, and thanks also to Duncan, to whom I'll write
more when I've checked out all the technical stuff he posted.

Cheers,
Mark
Re: Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?)
On Mon, Aug 4, 2014 at 10:52 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>
> Mark Knecht posted on Mon, 04 Aug 2014 15:04:12 -0700 as excerpted:
>
> > As the line in that favorite song goes "Paranoia strikes deep"...
>
> FWIW,

I __LOVE__ the idea that my favorite old song has ended up being
a contraction everyone uses...

> while my lists sig is the proprietary-master quote from Richard
> Stallman below, since the (anti-)patriot bill was passed in the reaction
> to 9-11, my private email sig is a famous quote from Benjamin Franklin:
>
> "They that can give up essential liberty to obtain a little
> temporary safety, deserve neither liberty nor safety."
>
> So "I'm with ya..."

Good to know. (Not that I didn't already!)

<SNIP>
> These are good questions to ask, and to have some idea of the answers to,
> as well.
>
> Big picture, at some level, you pretty much have to accept that you
> /don't/ know.

OK.

<SNIP>
> I never kept the link, but it seems the title actually stuck in memory
> well enough for me to google it: "Reflections on Trusting Trust"
> =:^) Here's the google link:
>
> https://www.google.com/search?q=%22reflections+on+trusting+trust%22
>

This is a great paper and the Moral section is dead on right. The line:

"No amount of source-level verification or scrutiny will protect you
from using untrusted code."

is spot on and just about impossible for folks like me. (And I'm _way_
beyond the average computer user, as is anyone reading this list.)

I'll respond/etc. to other parts of your post later but want to
give a quick thanks right now.

Cheers,
Mark
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
On Mon, 4 Aug 2014 15:04:12 -0700
Mark Knecht <markknecht@gmail.com> wrote:

>
> then how do I know that the
> source code I build on my Gentoo machines hasn't been modified by someone
> to provide access to my machine, networks, etc.?
>

There are two approaches to system development that tend to mitigate
security concerns:

1) Highly distributed development

2) Simplicity of design

If the component pieces of a system are independently developed
by widely scattered and unrelated development teams then there
is much less chance for any integrated security attacks.

Also, if the overall system remains simple and each component is
narrowly focused, then the result is better transparency for the user,
which ensures less opportunity for attack.

Linux _used_ to adhere to these two principles, but currently it
is more and more moving toward monolithic development and much
reduced simplicity. I refer especially to the Freedesktop
project, which is slowly becoming the centralized headquarters
for everything graphical. I also mention systemd, with its plethora
of system daemons that obscure all system transparency.

From the beginning, Linux, due to its faithfulness to the above
two principles, allowed the user to fully control and easily understand
the operation of his system. This situation is now being threatened
with freedesktop, systemd, etc., and security attacks can only become
more feasible.

We, as a community of Linux users, have to adamantly oppose these
monolithic projects that attempt to destroy choice and transform
Linux into another Microsoft Windows.
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
On Tue, Aug 5, 2014 at 3:16 PM, Frank Peters <frank.peters@comcast.net> wrote:
> Linux _used_ to adhere to these two principles, but currently it
> is more and more moving toward monolithic development and much
> reduced simplicity. I refer especially to the Freedesktop
> project, which is slowly becoming the centralized headquarters
> for everything graphical. I also mention systemd, with its plethora
> of system daemons that obscure all system transparency.

Everybody loves to argue about which design is "simpler," the "unix way," etc.

The fact is that while systemd does bundle a fairly end-to-end
solution, many of its components are modular. I can run systemd
without running networkd, or resolved, etc. The modular components
have interfaces, though they aren't really intended to work with
anything other than systemd.

Honestly, I think the main difference is that it doesn't do things
the traditional way. Nothing prevents you from talking to daemons via
DBus, or from inspecting their traffic.

Also, a set of modular components engineered to work together is less
likely to have integration-related bugs than a bunch of components
designed to operate on their own.

systemd also enables some security-oriented hardening, like private
tmp dirs, read-only filesystem views, reduced capabilities, etc.
That isn't to say that you can't do this with traditional service
scripts, but there are more barriers to doing it.
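For a flavor of what those options look like in practice, here is a
minimal sketch of a hardening drop-in. The directive names are real
systemd unit options, but the service name, values, and /tmp path are
made up for illustration:

```shell
# Write an illustrative hardening drop-in; on a real system this would
# live at /etc/systemd/system/mydaemon.service.d/hardening.conf
# (mydaemon is a made-up service name).
mkdir -p /tmp/mydaemon.service.d
cat > /tmp/mydaemon.service.d/hardening.conf <<'EOF'
[Service]
PrivateTmp=yes
ProtectSystem=full
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
NoNewPrivileges=yes
EOF
cat /tmp/mydaemon.service.d/hardening.conf
```

Each line there takes away something an attacker who compromises the
daemon would otherwise get for free, without touching the daemon's code.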

Ultimately it is a lot more functional than a traditional init, so I
do agree that the attack surface is larger. Still, most of the stuff
that is incorporated into systemd is going to be running in some
process on a typical server - much of it as root.

The use of DBus also means that you can use policies to control who
can do what more granularly. If you want a user to be able to shut
down the system, I imagine that is just a DBus message to systemd and
you could probably give an otherwise-nonprivileged user the ability to
send that message without having to create suid helpers with their own
private rules. The ability to further secure message-passing in this
way is one of the reasons for kdbus, and Linus endorses that (but not
some of the practices of its maintainers).

I do suggest that you try using systemd in a VM just to see what it is
about. If nothing else you might appreciate some of the things it
attempts to solve just so that you can come up with better ways of
solving them. :)

Rich
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
On Tue, 5 Aug 2014 10:50:35 -0700
Mark Knecht <markknecht@gmail.com> wrote:

>
> I use Chrome. How do I know Chrome isn't scanning my local drives
> and sending stuff somewhere? I don't.
>

It wouldn't have to scan your local drives. It would only have
to scan the very few directories named "MY DOCUMENTS" and
"MY VIDEOS" and "MY EMAIL" which have conveniently been established
by the omnipotent and omniscient desktop environment. Within
these universal and standardized storage areas can be found
everything that snooping software would need to find.

I am only being partly facetious. This does represent the trend.
We have standardized locations that are shared across many different
programs. But the programs aren't really different because they
are produced by the same desktop conglomerate or because they
must employ the toolkits and widgets of said conglomerate.

The job of the NSA is getting easier. Those terrorist documents
will no longer be buried within terabytes of disjoint hard drive
space. They will all be nicely tucked into an "ALL DOCUMENTS ARE HERE"
standardized directory that nobody had better modify because
the entire system will crash.
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
Frank Peters posted on Tue, 05 Aug 2014 16:36:57 -0400 as excerpted:

> It wouldn't have to scan your local drives. It would only have to scan
> the very few directories named "MY DOCUMENTS" and "MY VIDEOS" and "MY
> EMAIL" which have conveniently been established by the omnipotent and
> omniscient desktop environment. Within these universal and standardized
> storage areas can be found everything that snooping software would need
> to find.

Hmm... Some people (me) don't use those standardized locations. I have
a dedicated media partition -- large, still on spinning rust when most of
the system in terms of filenames (but not size) is on SSD, and where it's
mounted isn't standard and is unlikely to /be/ standard, simply because I
have my own rather nonconformist ideas of where I want stuff located and
how it should be organized.

OTOH, consider ~/.thumbnails/. Somebody already mentioned that google
case and the hashes they apparently scan for. ~/.thumbnails will
normally have thumbnails for anything in the system visited by normal
graphics programs, including both still images and video, and I think pdf
too unless that's always generated dynamically as is the case with txt
files, via various video-thumbnail addons. Those thumbnails are all
going to be standardized to one of a few standard sizes, and can either
be used effectively as (large) hashes directly, or smaller hashes of them
could be generated...
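To illustrate how cheap that fingerprinting would be, a sketch
(everything here is synthetic; a real scan would target ~/.thumbnails/
or ~/.cache/thumbnails/ and never needs to read the original media):

```shell
# Build a fake thumbnail cache and fingerprint every thumbnail in it.
mkdir -p /tmp/thumbs/normal
printf 'fake png bytes' > /tmp/thumbs/normal/deadbeef.png
find /tmp/thumbs -name '*.png' -exec sha256sum {} + > /tmp/thumb-fingerprints
cat /tmp/thumb-fingerprints
```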

Tho some image programs (gwenview) have an option to wipe the thumbnails
dir when they're shut down, but given the time creating those thumbnails
on any reasonably large collection takes, most people aren't going to
want to enable wiping...

Meanwhile, one of the things that has come out is that the NSA
effectively already considers anyone running a Linux desktop a radical,
likely on their watch-list already, just as is anyone running TOR, or
even simply visiting the TOR site or an article linking to them.

I guess I must be on their list several times over, what with the sigs I
use, etc, the security/privacy-related articles I read, the OS I run, and
the various lists I participate on...

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
Hi Duncan

On Mon, Aug 4, 2014 at 10:52 PM, Duncan <1i5t5.duncan@cox.net> wrote:
<SNIP>
>
> 3) While #1 applies to the tree in general when it is rsynced, gentoo
> does have a somewhat higher security sync method for the paranoid and to
> support users behind firewalls which don't pass rsync. Instead of
> running emerge sync, this method uses the emerge-webrsync tool, which
> downloads the entire main gentoo tree as a gpg-signed tarball. If you
> have FEATURES=webrsync-gpg set (see the make.conf manpage, FEATURES,
> webrsync-gpg), portage will verify the gpg signature on this tarball.
>

I'm finally able to investigate this today. I'm not finding very
detailed instructions anywhere; it's more like notes people would use if
they've done this before and understand all the issues. Being that
it's my first excursion down this road, I have much to learn.

OK, I've modified make.conf as such:

FEATURES="buildpkg strict webrsync-gpg"
PORTAGE_GPG_DIR="/etc/portage/gpg"

and created /etc/portage/gpg:


c2RAID6 portage # ls -al
total 72
drwxr-xr-x 13 root root 4096 Aug 6 14:25 .
drwxr-xr-x 87 root root 4096 Aug 6 09:10 ..
drwxr-xr-x 2 root root 4096 Apr 27 10:26 bin
-rw-r--r-- 1 root root 22 Jan 1 2014 categories
drwxr-xr-x 2 root root 4096 Jul 6 09:42 env
drwx------ 2 root root 4096 Aug 6 14:03 gpg
-rw-r--r-- 1 root root 1573 Aug 6 14:03 make.conf
lrwxrwxrwx 1 root root 63 Mar 5 2013 make.profile ->
../../usr/portage/profiles/default/linux/amd64/13.0/desktop/kde
[the rest deleted...]


eix-sync seems to be working but it may (or may not) be caught in some
loop where it just keeps looking for older data. I let it go until it
got back into July and then did a Ctrl-C:

c2RAID6 portage # eix-sync -wa
* Running emerge-webrsync
Fetching most recent snapshot ...
Trying to retrieve 20140805 snapshot from http://gentoo.osuosl.org ...
Fetching file portage-20140805.tar.xz.md5sum ...
Fetching file portage-20140805.tar.xz.gpgsig ...
Fetching file portage-20140805.tar.xz ...
Checking digest ...
Checking signature ...
gpg: Signature made Tue Aug 5 17:55:23 2014 PDT using RSA key ID C9189250
gpg: Can't check signature: No public key
Fetching file portage-20140805.tar.bz2.md5sum ...
Fetching file portage-20140805.tar.bz2.gpgsig ...
Fetching file portage-20140805.tar.bz2 ...
Checking digest ...
Checking signature ...
gpg: Signature made Tue Aug 5 17:55:22 2014 PDT using RSA key ID C9189250
gpg: Can't check signature: No public key
Fetching file portage-20140805.tar.gz.md5sum ...
20140805 snapshot was not found
Trying to retrieve 20140804 snapshot from http://gentoo.osuosl.org ...
Fetching file portage-20140804.tar.xz.md5sum ...
Fetching file portage-20140804.tar.xz.gpgsig ...
Fetching file portage-20140804.tar.xz ...
Checking digest ...
Checking signature ...
gpg: Signature made Mon Aug 4 17:55:27 2014 PDT using RSA key ID C9189250
gpg: Can't check signature: No public key


QUESTIONS:

1) Is the 'No public key' message talking about me, or something at
the source? I haven't got any keys, so maybe I need to generate one?

2) Once I do get this working correctly it would make sense to me that
I need to delete all existing distfiles to ensure that anything on my
system actually came from this tarball. Is that correct?


<SNIP>
> So sync-method bottom line, if you're paranoid or simply want additional
> gpg-signed security, use emerge-webrsync along with FEATURES=webrsync-gpg,
> instead of normal rsync-based emerge sync. That pretty well ensures that
> you're getting exactly the gentoo tree tarball gentoo built and signed,
> which is certainly far more secure than normal rsync syncing, but because
> the tarballing and signing is automated and covers the entire tree,
> there's still the possibility that one or more files in that tarball are
> compromised and that it hasn't been detected yet.

Or, as we both have alluded to, the bad guy is intercepting the
transmission and giving me a different tarball...

For now, it's more than enough to take a baby first step.

Thanks for all your sharing of info!

Cheers,
Mark
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
Mark Knecht posted on Wed, 06 Aug 2014 14:33:28 -0700 as excerpted:

> OK, I've modified make.conf as such:
>
> FEATURES="buildpkg strict webrsync-gpg"
> PORTAGE_GPG_DIR="/etc/portage/gpg"
>
> and created /etc/portage/gpg:

> drwxr-xr-x 2 root root 4096 Jul 6 09:42

> eix-sync seems to be working but it may (or may not) be caught in some
> loop where it just keeps looking for older data. I let it go until it
> got back into July and then did a Ctrl-C:
>
> c2RAID6 portage # eix-sync -wa
> * Running emerge-webrsync
> Fetching most recent snapshot ...
> Trying to retrieve 20140805 snapshot from http://gentoo.osuosl.org ...
> Fetching file portage-20140805.tar.xz.md5sum ...
> Fetching file portage-20140805.tar.xz.gpgsig ...
> Fetching file portage-20140805.tar.xz ...
> Checking digest ...
> Checking signature ...
> gpg: Signature made Tue Aug 5 17:55:23 2014 PDT using RSA key ID
> C9189250
> gpg: Can't check signature: No public key
> Fetching file [repeat in a loop with older dates]

> QUESTIONS:
>
> 1) Is the 'No public key' message talking about me, or something at the
> source? I haven't got any keys so maybe i need to generate one?

It's saying you need to (separately) download the *GENTOO* key using some
other method, and put it in the appropriate place so it can verify the
signatures it's finding.

Note that while I've not used webrsync, for some years (until I switched
from signed kernel tarball to git-cloned kernel) I ran a script that I
wrote up myself, that downloaded the kernel tarball as well as its gpg-
signatures, and gpg-verified the signature on the tarball before
unpacking it and going ahead with reconfiguration and build.

So I have a reasonable grasp of the general concepts -- good enough I
could script it -- but I don't know the webrsync specifics.

But that's definitely a missing separately downloaded public key, so it
can't verify the signatures on the tarballs it's downloading, and is thus
rejecting them.

Of course in this case such a rejection is a good thing, since if it was
acting as if nothing was wrong and simply trusting the tarball even when
it couldn't verify the signature, it would be broken in security terms
anyway! =:^)

So that's what you'll need to do, presumably based on instructions you
find for getting that key and putting it in the right spot so webrsync
can access it. But unfortunately since I've not used it myself, I can't
supply you those instructions.

Or wait! Actually I can, as google says that's actually part of the
gentoo handbook! =:^) (Watch the link-wrap and reassemble as necessary,
I'm lazy today. The arch doesn't matter for this bit so x86/amd64, it's
all the same.)

https://www.gentoo.org/doc/en/handbook/handbook-x86.xml?
part=2&chap=3#webrsync-gpg

Based on the above, it seems you've created the gpg directory and set
appropriate permissions, but either you haven't downloaded the keys as
described in the link above, or perhaps you're missing the PORTAGE_GPG_DIR
setting.

> 2) Once I do get this working correctly it would make sense to me that I
> need to delete all existing distfiles to ensure that anything on my
> system actually came from this tarball. Is that correct?

Not unless you're extremely paranoid, tho it wouldn't hurt anything, just
mean you blew away your cache and have more downloading to do until you
have it again.

Once you're verifying the tarball, part of the tarball's signed and
verified contents is going to be the distfile digests. Once they're
coming from the tarball, you have a reasonable confidence that they
haven't been tampered with, and given the multi-algorithm hashing, even
if one algorithm was hacked and the file could be faked based on only it,
the other hashes should catch the problem.
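A quick sketch of why the multi-algorithm scheme helps (the file
contents and the exact algorithm set here are illustrative, not
Portage's actual Manifest format): a distfile has to match every
recorded digest at once, so breaking one hash algorithm alone gets an
attacker nothing.

```shell
# "Release" time: record several independent digests of the distfile.
echo 'pretend distfile contents' > /tmp/distfile.tar
for tool in md5sum sha256sum sha512sum; do
    "$tool" /tmp/distfile.tar
done > /tmp/Manifest

# Verification time: recompute all of them; any single mismatch fails.
for tool in md5sum sha256sum sha512sum; do
    "$tool" /tmp/distfile.tar
done | diff - /tmp/Manifest && echo "all digests match"
```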

Of course, once you're doing this, it's even MORE important not to simply
redigest the occasional source or ebuild that comes up with an error due
to a bad digest. For people not verifying things to do that is one
thing, but once you're verifying, simply doing a redigest yourself on
anything that comes up bad directly bypasses all that extra safety you're
trying to give yourself. So if a distfile tarball comes up bad, go ahead
and delete it and try again, but if it comes up bad repeatedly, don't
just redigest it, either check for a bad-digest bug on that package and
file one if necessary, or simply wait a day or two and try again, as the
problem is generally caught and fixed by then.

(I've seen rumors of people on the forums, etc, suggesting a redigest at
any failure, and it always scares me. Those digests are there for a
reason, and if they're failing, you better make **** sure you know it's a
legitimate bug (say kde making a last minute change to the tarballs after
release to the distros for testing, but before public release, as they do
occasionally) before redigesting. And even then, if you can just wait a
couple days it should normally work itself out, without you having to
worry about it or take that additional risk.)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
Hello all,

I've been very interested in this topic myself, so I'll pile on my
question after answering one of Mark's

<Snip>
On 05/08/2014 1:50 PM, Mark Knecht wrote:
> I'm sitting here writing R code. I do it in R-Studio. How do I
> know that every bit of code I run in that tool isn't being sent out to
> some
> server? Most likely no one has done an audit of that GUI so I'm trusting
> that the company isn't nefarious in nature.
>
> I use Chrome. How do I know Chrome isn't scanning my local drives
> and sending stuff somewhere? I don't.
>
> In the limit, how would I even know if the Linux kernel was doing this? I
> got source through emerge, built code using gcc, installed it by hand,
> but I don't know what's really there and never will. I suspect the kernel
> is likely one of the safer things on my box.
>

The answer to most things security related seems to be independent
verification. If you're going to be the person to do that verification,
because you don't trust others to do it or can't find proof that it's
been done, then there are two factors at play: time and money.

Where you're only running your own traffic through your system (unlike
Duncan's TOR example) this is relatively easy and cheap to accomplish.
For ~$100 you can buy a consumer grade switch with a configurable
mirroring port which will effectively passively sniff all the traffic
going through the switch. You then connect this mirrored port to a spare
junker computer running, optimally, a different Linux distro such as
Security Onion, or anything else with tcpdump taking full packet
captures that you can run analytics on. I do the same for my home
network to detect compromised hosts and to see if I'm under attack for
any reason. Things I find useful for getting a finger on the pulse are:

- DNS Query monitoring to see who my home network is reaching out to
- GeoIPLookup mappings against bandwidth usage to see if lots of data
is being slurped out of my environment
- BroIDS, Snorby and Squert (security onion suite of tools) for at a
glance view of things going wrong and the ability to dig into events quickly
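As a concrete, hedged example of the DNS-query-monitoring item above
(the log lines are synthetic, in roughly dnsmasq's format; a real setup
would read the resolver's actual log or a capture from the mirror port):

```shell
# Synthetic resolver log standing in for /var/log/dnsmasq.log.
cat > /tmp/dns.sample <<'EOF'
Aug  7 10:00:01 dnsmasq[314]: query[A] example.com from 192.168.1.5
Aug  7 10:00:02 dnsmasq[314]: query[A] example.com from 192.168.1.5
Aug  7 10:00:03 dnsmasq[314]: query[A] tracker.evil.test from 192.168.1.9
EOF
# Count how often each domain is looked up, busiest first; unfamiliar
# names near the top are what you go investigate.
awk '/query\[A\]/ {print $6}' /tmp/dns.sample | sort | uniq -c | sort -rn
```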

My question is what kind of independent validation, or even peer review,
is done over the core of Gentoo? Now that new users are being pushed to
use the Stage3 tarball and genkernel, is seems to me that much of the
core of the Gentoo system is a "just trust me" package. What I love
about the Stage 1 approach is you get all the benefits of compiling the
system as you go, essentially from scratch and customized for your
system, and all the benefits of the scrutiny Duncan mentioned applying
to ebuilds is applied. There is much more control in the hands of the
person using Stage 1, and it's a smaller footprint for someone to
independently validate malicious code didn't get introduced into it.
Should someone have been manipulated to put something malicious into the
stage3 tarball it could much more easily give a permanent foothold over
your system to a malicious 3rd party (think rootkit) then stage 1 would
allow.

Thanks to anyone who can provide light on the topic,
Max
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
With you having to compile thousands of packages if you build from stage 1, I
doubt that you will be able to verify every single thing you compile and
detect if something is actually doing sneaky stuff AND still have the time
to enjoy your system. Also, even if you build from stage 1 and manage to
verify all the source code, you still need to download a precompiled
compiler, which could inject malicious code into the programs it compiles,
and which can also inject itself if you try to compile another compiler
from source. If there is a single piece of software worth a gold mine to
inject with malware to gain illicit access to all Linux systems, it would
be gcc. Once you infect a compiler, you're invincible.

Also, did you apply the same level of scrutiny to your hardware?

For the truly paranoid, I recommend unplugging.
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
Lie Ryan posted on Fri, 08 Aug 2014 02:06:14 +1000 as excerpted:

> With you having to compile thousands of stuffs if you build from stage
> 1, I doubt that you will be able to verify every single thing you
> compile and detect if something is actually doing sneaky stuff AND still
> have the time to enjoy your system. Also, even if you build from stage 1
> and manage to verify all the source code, you still need to download a
> precompiled compiler which could possibly inject the malicious code into
> the programs it compiles, and which can also inject itself if you try to
> compile another compiler from source. If there is a single software that
> is worth a gold mine to inject with malware to gain illicit access to
> all Linux system, then it would be gcc. Once you infect a compiler,
> you're invincible.

Actually, that brings up a good question. The art of compiling is
certainly somewhat magic to me tho I guess I somewhat understand the
concept in a vague, handwavy way, but...

From my understanding, that's one reason why the gcc build is multi-stage
and uses simpler (and thus easier to audit) tools such as lex and bison
in its bootstrapping process. I'm not actually sure whether gcc actually
requires a previous gcc (or other full compiler) to build or not, but I
do know it goes to quite some lengths to bootstrap in multiple stages,
building things up from the simple to the complex as it goes and testing
each stage in the process so that if something goes wrong, there's some
idea /where/ it went wrong.
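A toy model of that staged self-check (illustrative only, not the real
build commands: sha256sum stands in for a deterministic compiler, and
the actual bootstrap compares stage2/stage3 object files rather than
hashes of source):

```shell
# Pretend "compiling" is just hashing. Stage2 is built by the stage1
# compiler; stage3 is built by stage2. If both produce bit-identical
# output, the toolchain has reproduced itself.
printf 'int main(void){return 0;}\n' > /tmp/cc1.src
stage2=$(sha256sum /tmp/cc1.src | cut -d' ' -f1)
stage3=$(sha256sum /tmp/cc1.src | cut -d' ' -f1)
[ "$stage2" = "$stage3" ] && echo "bootstrap comparison passed"
```

A mismatch at this step is how the real bootstrap catches a stage1
compiler that miscompiles, whether through bugs or tampering.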

Clearly one major reason for that is proving functionality at each step
such that if the process goes wrong, there's some place to start as to
why and how, but it certainly doesn't hurt in helping to prove or at
least somewhat establish the basic security situation either, tho as
we've already established, it's basically impossible to prove both the
hardware and the software back thru all the multiple generations.

Of course the simpler tools, lex, bison, etc, must have been built from
something, but because they /are/ simpler, they're also easier to audit
and prove basic functionality, including disassembly and analysis of
individual machine instructions for a fuller audit.

So anyway, to the gcc experts that know, and to non-gcc CS folks who have
actually built their own simple compilers and can at least address the
concept, is a previous gcc or other full compiler actually required to
build a new gcc, or does it sufficiently bootstrap itself from the more
basic tools such that unlike most code, it doesn't actually need a full
compiler to build and reasonably optimize at all? That's a question I've
had brewing in the back of my mind for some time, and this seemed the
perfect opportunity to ask it. =:^)

Meanwhile, I suppose it must be possible at least at some level, else how
would new hardware archs come to be supported? Gotta start /somewhere/
on the toolchain, and "simpler" stuff like lex and bison can, I believe,
run on a previous arch, generating the basic executable building blocks
that ultimately become the first executable code actually run by the new
target arch.

And of course gcc has long been one of the most widely arch-supporting
compilers, precisely because it /is/ open source and /is/ designed to be
bootstrapped in stages like that. I guess clang/llvm is giving gcc some
competition in that area now, in part because it's more modern and
modular and in part because unlike gcc it /can/ legally be taken private
and supplied to others without offering sources and some companies are
evil that way, but gcc's the one with the long history in that area, and
given that history I'd guess it'll be some time before clang/llvm catches
up, even if it's getting most of the new platforms right now, which I've
no idea whether it's the case or not.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
This is a bit long but it's mostly just stuff copied from my terminal
for completeness.
-MWK

On Wed, Aug 6, 2014 at 5:58 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> Mark Knecht posted on Wed, 06 Aug 2014 14:33:28 -0700 as excerpted:
>
>> OK, I've modified make.conf as such:
>>
>> FEATURES="buildpkg strict webrsync-gpg"
>> PORTAGE_GPG_DIR="/etc/portage/gpg"
>>
>> and created /etc/portage/gpg:
>
>> drwxr-xr-x 2 root root 4096 Jul 6 09:42
>
<SNIP>
>
> Or wait! Actually I can, as google says that's actually part of the
> gentoo handbook! =:^) (Watch the link-wrap and reassemble as necessary,
> I'm lazy today. The arch doesn't matter for this bit so x86/amd64, it's
> all the same.)
>
> https://www.gentoo.org/doc/en/handbook/handbook-x86.xml?
> part=2&chap=3#webrsync-gpg
>

Great link! Thanks. So I think the important stuff is here; the first
two lines I managed on my own, but the gpg part is what's new to me:

[QUOTE]
# mkdir -p /etc/portage/gpg
# chmod 0700 /etc/portage/gpg
(... Substitute the keys with those mentioned on the release
engineering site ...)
# gpg --homedir /etc/portage/gpg --keyserver subkeys.pgp.net
--recv-keys 0xDB6B8C1F96D8BF6D
# gpg --homedir /etc/portage/gpg --edit-key 0xDB6B8C1F96D8BF6D trust
[/QUOTE]

From the comment about the Release Engineering site, I think that's here:

https://www.gentoo.org/proj/en/releng/

And the keys match, which is good.

Anyway, running the first command is fine. The second command wants me to
make a choice. For now I chose to 'ultimately trust'. (Aren't I gullible!?!)

[COPY]
c2RAID6 ~ # gpg --homedir /etc/portage/gpg --edit-key 0xDB6B8C1F96D8BF6D trust
gpg (GnuPG) 2.0.25; Copyright (C) 2013 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.


pub 4096R/96D8BF6D created: 2011-11-25 expires: 2015-11-24 usage: C
trust: unknown validity: unknown
sub 4096R/C9189250 created: 2011-11-25 expires: 2015-11-24 usage: S
[ unknown] (1). Gentoo Portage Snapshot Signing Key (Automated Signing Key)

pub 4096R/96D8BF6D created: 2011-11-25 expires: 2015-11-24 usage: C
trust: unknown validity: unknown
sub 4096R/C9189250 created: 2011-11-25 expires: 2015-11-24 usage:
S
[ unknown] (1). Gentoo Portage Snapshot Signing Key (Automated Signing
Key)

Please decide how far you trust this user to correctly verify other
users' keys
(by looking at passports, checking fingerprints from different
sources, etc.)

1 = I don't know or won't say
2 = I do NOT trust
3 = I trust marginally
4 = I trust fully
5 = I trust ultimately
m = back to the main menu

Your decision? 5
Do you really want to set this key to ultimate trust? (y/N) y

pub 4096R/96D8BF6D created: 2011-11-25 expires: 2015-11-24 usage:
C
trust: ultimate validity: unknown
sub 4096R/C9189250 created: 2011-11-25 expires: 2015-11-24 usage:
S
[ unknown] (1). Gentoo Portage Snapshot Signing Key (Automated Signing
Key)
Please note that the shown key validity is not necessarily correct
unless you restart the program.

gpg> list

pub 4096R/96D8BF6D created: 2011-11-25 expires: 2015-11-24 usage: C
trust: ultimate validity: unknown
sub 4096R/C9189250 created: 2011-11-25 expires: 2015-11-24 usage: S
[ unknown] (1)* Gentoo Portage Snapshot Signing Key (Automated Signing Key)

gpg> check
uid Gentoo Portage Snapshot Signing Key (Automated Signing Key)
sig!3 96D8BF6D 2011-11-25 [self-signature]
6 signatures not checked due to missing keys

gpg> quit
c2RAID6 ~ #


[/COPY]



I'm not sure how, short of a reboot, to 'restart the program', nor what the line

6 signatures not checked due to missing keys

really means. That said, it appears to be working better than yesterday:




c2RAID6 ~ # eix-sync -w
* Running emerge-webrsync
Fetching most recent snapshot ...
Trying to retrieve 20140806 snapshot from http://gentoo.osuosl.org ...
Fetching file portage-20140806.tar.xz.md5sum ...
Fetching file portage-20140806.tar.xz.gpgsig ...
Fetching file portage-20140806.tar.xz ...
Checking digest ...
Checking signature ...
gpg: Signature made Wed Aug 6 17:55:26 2014 PDT using RSA key ID C9189250
gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0 valid: 1 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 1u
gpg: next trustdb check due at 2015-11-24
gpg: Good signature from "Gentoo Portage Snapshot Signing Key
(Automated Signing Key)" [ultimate]
Getting snapshot timestamp ...
Syncing local tree ...

Number of files: 178933
Number of files transferred: 6846
Total file size: 327.27M bytes
Total transferred file size: 19.96M bytes
Literal data: 19.96M bytes
Matched data: 0 bytes
File list size: 4.32M
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 12.38M
Total bytes received: 156.23K

sent 12.38M bytes received 156.23K bytes 166.03K bytes/sec
total size is 327.27M speedup is 26.11
Cleaning up ...
* Copying old database to /var/cache/eix/previous.eix
* Running eix-update
Reading Portage settings ..
<SNIP>
[474] "zx2c4" layman/zx2c4 (cache: eix*
/tmp/eix-remote.MbcFER9d/zx2c4.eix [*/zx2c4])
Reading Packages .. Finished
Applying masks ..
Calculating hash tables ..
Writing database file /var/cache/eix/remote.eix ..
Database contains 31587 packages in 234 categories.
* Calling eix-diff
Diffing databases (17596 -> 17598 packages)
[>] == games-util/umodpack (0.5_beta16-r1 -> 0.5_beta16-r2):
portable and useful [un]packer for Unreal Tournament's Umod files
[U] == media-libs/libbluray (0.5.0-r1{tbz2}@06/19/14;
(~)0.5.0-r1{tbz2} -> (~)0.6.1): Blu-ray playback libraries
[>] == net-misc/chrony (1.30^t -> 1.30-r1^t): NTP client and server programs
[U] == sys-devel/gnuconfig (20131128{tbz2}@02/18/14; 20131128{tbz2}
-> 20140212): Updated config.sub and config.guess file from GNU
[U] == virtual/libgudev (215(0/0){tbz2}@08/05/14; 215(0/0){tbz2} ->
215-r1(0/0)): Virtual for libgudev providers
[U] == virtual/libudev (215(0/1){tbz2}@08/05/14; 215(0/1){tbz2} ->
215-r1(0/1)): Virtual for libudev providers
[D] == www-client/google-chrome-beta
(37.0.2062.58_p1{tbz2}@08/05/14; (~)37.0.2062.58_p1^msd{tbz2} ->
~37.0.2062.68_p1^msd): The web browser from Google
[U] == www-client/google-chrome-unstable
(38.0.2107.3_p1{tbz2}@08/06/14; (~)38.0.2107.3_p1^msd{tbz2} ->
(~)38.0.2114.2_p1^msd): The web browser from Google
[N] >> dev-ruby/prawn-table (~0.1.0): Provides support for tables in Prawn
[N] >> sys-apps/cv (~0.4.1): Coreutils Viewer: show progress for cp,
rm, dd, and so forth
* Time statistics:
136 seconds for syncing
43 seconds for eix-update
2 seconds for eix-diff
197 seconds total
c2RAID6 ~ #




So that's all looking pretty good, as a first step. If it's a matter
of 3 1/2 minutes instead of 1-2 minutes then I can live with that
part. However, that's just (I think) the portage tree and not signed
source code, correct?

Now, is the idea that I have a validated portage snapshot at this
point and still have to actually get the code using the regular emerge,
which will do the checking because I have:

FEATURES="buildpkg strict webrsync-gpg"

I don't see any evidence that emerge checked what it downloaded, but
maybe those checks are only done when I really build the code?




c2RAID6 ~ # emerge -fDuN @world
Calculating dependencies... done!

>>> Fetching (1 of 5) sys-devel/gnuconfig-20140212
>>> Downloading 'http://gentoo.osuosl.org/distfiles/gnuconfig-20140212.tar.bz2'
--2014-08-07 11:12:11--
http://gentoo.osuosl.org/distfiles/gnuconfig-20140212.tar.bz2
Resolving gentoo.osuosl.org... 140.211.166.134
Connecting to gentoo.osuosl.org|140.211.166.134|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 44808 (44K) [application/x-bzip2]
Saving to: '/usr/portage/distfiles/gnuconfig-20140212.tar.bz2'

100%[================================================================>]
44,808 113KB/s in 0.4s

2014-08-07 11:12:13 (113 KB/s) -
'/usr/portage/distfiles/gnuconfig-20140212.tar.bz2' saved
[44808/44808]

* gnuconfig-20140212.tar.bz2 SHA256 SHA512 WHIRLPOOL size ;-) ...
[ ok ]

>>> Fetching (2 of 5) media-libs/libbluray-0.6.1
>>> Downloading 'http://gentoo.osuosl.org/distfiles/libbluray-0.6.1.tar.bz2'
--2014-08-07 11:12:13--
http://gentoo.osuosl.org/distfiles/libbluray-0.6.1.tar.bz2
Resolving gentoo.osuosl.org... 140.211.166.134
Connecting to gentoo.osuosl.org|140.211.166.134|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 586646 (573K) [application/x-bzip2]
Saving to: '/usr/portage/distfiles/libbluray-0.6.1.tar.bz2'

100%[================================================================>]
586,646 716KB/s in 0.8s

2014-08-07 11:12:15 (716 KB/s) -
'/usr/portage/distfiles/libbluray-0.6.1.tar.bz2' saved [586646/586646]

* libbluray-0.6.1.tar.bz2 SHA256 SHA512 WHIRLPOOL size ;-) ...
[ ok ]

>>> Fetching (3 of 5) virtual/libudev-215-r1

>>> Fetching (4 of 5) virtual/libgudev-215-r1

>>> Fetching (5 of 5) www-client/google-chrome-unstable-38.0.2114.2_p1
>>> Downloading 'http://dl.google.com/linux/chrome/deb/pool/main/g/google-chrome-unstable/google-chrome-unstable_38.0.2114.2-1_amd64.deb'
--2014-08-07 11:12:16--
http://dl.google.com/linux/chrome/deb/pool/main/g/google-chrome-unstable/google-chrome-unstable_38.0.2114.2-1_amd64.deb
Resolving dl.google.com... 74.125.239.2, 74.125.239.6, 74.125.239.4, ...
Connecting to dl.google.com|74.125.239.2|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 47472462 (45M) [application/x-debian-package]
Saving to: '/usr/portage/distfiles/google-chrome-unstable_38.0.2114.2-1_amd64.deb'

100%[================================================================>]
47,472,462 6.81MB/s in 7.1s

2014-08-07 11:12:23 (6.37 MB/s) -
'/usr/portage/distfiles/google-chrome-unstable_38.0.2114.2-1_amd64.deb'
saved [47472462/47472462]

* google-chrome-unstable_38.0.2114.2-1_amd64.deb SHA256 SHA512
WHIRLPOOL size ;-) ... [ ok ]
c2RAID6 ~ #


Cheers,
Mark
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
On Thu, Aug 7, 2014 at 9:06 AM, Lie Ryan <lie.1296@gmail.com> wrote:
<SNIP>
>
> Also, did you apply the same level of scrutiny to your hardware?
>

That's the basis of the now well known NSA hack on Cisco routers.
Intercept the box, modify the hardware, send the box onto some foreign
land and the router lets them in. No hacking required.

> For the truly paranoid, I recommend unplugging.
>

In the aforementioned book that's pretty much exactly what Snowden
required of the reporter & documentary film maker he started out
disclosing the info to. They had to buy new laptops and never attach
them to the net. He apparently used PGP encryption to chat & transfer
files over normal nets but (as I understand it) the encrypted files
are never opened on anything other than your off-the-net machine.

Of course, according to Snowden the NSA can enable the microphone on
my cell phone and listen to me talking in the house. He required
batteries be removed or cell phones be placed in a freezer.

I recently saw a similar story about new TVs having built in cameras
(for game interfaces I suppose) which could be enabled over the net to
watch what's going on in my living room. If the TV has power applied,
even if I'm not using it, what do I know really about what it's doing?

All of that argues for Max's suggestion about sniffing the network
full time, assuming I can rely on the sniffer not being hacked... ;-)
Re: Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
On Thu, Aug 7, 2014 at 10:20 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> Lie Ryan posted on Fri, 08 Aug 2014 02:06:14 +1000 as excerpted:
>
>> With you having to compile thousands of stuffs if you build from stage
>> 1, I doubt that you will be able to verify every single thing you
>> compile and detect if something is actually doing sneaky stuff AND still
>> have the time to enjoy your system. Also, even if you build from stage 1
>> and manage to verify all the source code, you still need to download a
>> precompiled compiler which could possibly inject the malicious code into
>> the programs it compiles, and which can also inject itself if you try to
>> compile another compiler from source. If there is a single software that
>> is worth a gold mine to inject with malware to gain illicit access to
>> all Linux system, then it would be gcc. Once you infect a compiler,
>> you're invincible.
>
> Actually, that brings up a good question. The art of compiling is
> certainly somewhat magic to me tho I guess I somewhat understand the
> concept in a vague, handwavy way, but...

<SNIP>
>
> So anyway, to the gcc experts that know, and to non-gcc CS folks who have
> actually built their own simple compilers and can at least address the
> concept, is a previous gcc or other full compiler actually required to
> build a new gcc, or does it sufficiently bootstrap itself from the more
> basic tools such that unlike most code, it doesn't actually need a full
> compiler to build and reasonably optimize at all? That's a question I've
> had brewing in the back of my mind for some time, and this seemed the
> perfect opportunity to ask it. =:^)
>

And beyond Duncan's question (good question!) if I try to rebuild gcc
as though it were an empty box, using my current machine, I see this sort
of thing, where gcc is about the 350th of 385 packages getting built. It
seems to me that _any_ package that has programs running at the same or
higher privilege level as emerge could be hacked and control what's
actually placed on the machine.

It's an endless problem if you cannot trust anything, and for most people,
and certainly for me, it's unverifiable the way the tools work today.

c2RAID6 ~ # emerge -pve gcc

These are the packages that would be merged, in order:

Calculating dependencies... done!
[ebuild R ] app-arch/xz-utils-5.0.5-r1 USE="nls threads
-static-libs" ABI_X86="(64) (-32) (-x32)" 1,276 kB
[ebuild R ] virtual/libintl-0-r1 ABI_X86="(64) -32 (-x32)" 0 kB
[ebuild R ] app-arch/bzip2-1.0.6-r6 USE="-static -static-libs"
ABI_X86="(64) (-32) (-x32)" 0 kB
[ebuild R ] dev-libs/expat-2.1.0-r3 USE="unicode -examples
-static-libs" ABI_X86="(64) (-32) (-x32)" 550 kB
[ebuild R ] virtual/libiconv-0-r1 ABI_X86="(64) (-32) (-x32)" 0 kB
[ebuild R ] dev-lang/python-exec-2.0.1-r1:2
PYTHON_TARGETS="(jython2_5) (jython2_7) (pypy) (python2_7) (python3_2)
(python3_3) (-python3_4)" 0 kB
[ebuild R ] sys-devel/gnuconfig-20140212 0 kB
[ebuild R ] media-libs/libogg-1.3.1 USE="-static-libs"
ABI_X86="(64) (-32) (-x32)" 0 kB
[ebuild R ] app-misc/mime-types-9 16 kB
[ebuild R ] sys-apps/baselayout-2.2 USE="-build" 40 kB
[ebuild R ] sys-devel/gcc-config-1.7.3 15 kB

<SNIP, SNIP, SNIP>

[ebuild R ] media-libs/phonon-4.6.0-r1 USE="gstreamer (-aqua)
-debug -pulseaudio -vlc (-zeitgeist)" 275 kB
[ebuild R ] sys-libs/glibc-2.19-r1:2.2 USE="(multilib) -debug
-gd (-hardened) -nscd -profile (-selinux) -suid -systemtap -vanilla" 0
kB
[ebuild R ] sys-devel/gcc-4.7.3-r1:4.7 USE="cxx fortran
(multilib) nls nptl openmp (-altivec) -awt -doc (-fixed-point) -gcj
-go -graphite (-hardened) (-libssp) -mudflap (-multislot) -nopie
-nossp -objc -objc++ -objc-gc -regression-test -vanilla" 81,022 kB
[ebuild R ] sys-libs/pam-1.1.8-r2 USE="berkdb cracklib nls
-audit -debug -nis (-selinux) {-test} -vim-syntax" ABI_X86="(64) (-32)
(-x32)" 0 kB
[ebuild R ] dev-db/mysql-5.1.70 USE="community perl ssl
-big-tables -cluster -debug -embedded -extraengine -latin1
-max-idx-128 -minimal -pbxt -profiling (-selinux) -static {-test}
-xtradb" 24,865 kB
[ebuild R ] sys-devel/llvm-3.3-r3:0/3.3 USE="libffi
static-analyzer xml -clang -debug -doc -gold -multitarget -ocaml
-python {-test} -udis86" ABI_X86="(64) (-32) (-x32)"
PYTHON_TARGETS="python2_7 (-pypy) (-pypy2_0%) (-python2_6%)"
VIDEO_CARDS="-radeon" 0 kB
[ebuild R ] media-libs/mesa-10.0.4 USE="classic egl gallium llvm
nptl vdpau xvmc -bindist -debug -gbm -gles1 -gles2 -llvm-shared-libs
-opencl -openvg -osmesa -pax_kernel -pic -r600-llvm-compiler
(-selinux) -wayland -xa" ABI_X86="(64) (-32) (-x32)"
VIDEO_CARDS="(-freedreno) -i915 -i965 -ilo -intel -nouveau -r100 -r200
-r300 -r600 -radeon -radeonsi -vmware" 0 kB
[ebuild R ] x11-libs/cairo-1.12.16 USE="X glib opengl svg xcb
(-aqua) -debug -directfb -doc (-drm) (-gallium) (-gles2)
-legacy-drivers -openvg (-qt4) -static-libs -valgrind -xlib-xcb" 0 kB
[ebuild R ] app-text/poppler-0.24.5:0/44 USE="cairo cxx
introspection jpeg jpeg2k lcms png qt4 tiff utils -cjk -curl -debug
-doc" 0 kB
[ebuild R ] media-libs/harfbuzz-0.9.28:0/0.9.18 USE="cairo glib
graphite introspection truetype -icu -static-libs {-test}"
ABI_X86="(64) (-32) (-x32)" 0 kB
[ebuild R ] x11-libs/pango-1.36.5 USE="X introspection -debug"
ABI_X86="(64) (-32) (-x32)" 0 kB
[ebuild R ] x11-libs/gtk+-2.24.24:2 USE="introspection xinerama
(-aqua) -cups -debug -examples {-test} -vim-syntax" ABI_X86="(64)
(-32) (-x32)" 0 kB
[ebuild R ] x11-libs/gtk+-3.12.2:3 USE="X introspection xinerama
(-aqua) -cloudprint -colord -cups -debug -examples {-test} -vim-syntax
-wayland" 0 kB
[ebuild R ] dev-db/libiodbc-3.52.7 USE="gtk" 1,015 kB
[ebuild R ] app-crypt/pinentry-0.8.2 USE="gtk ncurses qt4 -caps
-static" 419 kB
[ebuild R ] dev-java/icedtea-bin-6.1.13.3-r3:6 USE="X alsa -cjk
-cups -doc -examples -nsplugin (-selinux) -source -webstart" 0 kB
[ebuild R ] dev-libs/soprano-2.9.4 USE="dbus raptor redland
virtuoso -debug -doc {-test}" 1,913 kB
[ebuild R ] app-crypt/gnupg-2.0.25 USE="bzip2 ldap nls readline
usb -adns -doc -mta (-selinux) -smartcard -static" 0 kB
[ebuild R ] gnome-extra/polkit-gnome-0.105 304 kB
[ebuild R ] kde-base/kdelibs-4.12.5-r2:4/4.12 USE="acl alsa
bzip2 fam handbook jpeg2k mmx nls opengl (policykit) semantic-desktop
spell sse sse2 ssl udev udisks upower -3dnow (-altivec) (-aqua) -debug
-doc -kerberos -lzma -openexr {-test} -zeroconf" 0 kB
[ebuild R ] sys-auth/polkit-kde-agent-0.99.0-r1:4 USE="(-aqua)
-debug" LINGUAS="-ca -ca@valencia -cs -da -de -en_GB -eo -es -et -fi
-fr -ga -gl -hr -hu -is -it -ja -km -lt -mai -ms -nb -nds -nl -pa -pt
-pt_BR -ro -ru -sk -sr -sr@ijekavian -sr@ijekavianlatin -sr@latin -sv
-th -tr -uk -zh_TW" 34 kB
[ebuild R ] kde-base/nepomuk-core-4.12.5:4/4.12 USE="exif pdf
(-aqua) -debug -epub -ffmpeg -taglib" 0 kB
[ebuild R ] kde-base/katepart-4.12.5:4/4.12 USE="handbook
(-aqua) -debug" 0 kB
[ebuild R ] kde-base/kdesu-4.12.5:4/4.12 USE="handbook (-aqua)
-debug" 0 kB
[ebuild R ] net-libs/libproxy-0.4.11-r2 USE="kde -gnome -mono
-networkmanager -perl -python -spidermonkey {-test} -webkit"
ABI_X86="(64) (-32) (-x32)" PYTHON_TARGETS="python2_7" 0 kB
[ebuild R ] kde-base/nepomuk-widgets-4.12.5:4/4.12 USE="(-aqua)
-debug" 0 kB
[ebuild R ] kde-base/khelpcenter-4.12.5:4/4.12 USE="(-aqua) -debug" 0 kB
[ebuild R ] net-libs/glib-networking-2.40.1-r1 USE="gnome
libproxy ssl -smartcard {-test}" ABI_X86="(64) (-32) (-x32)" 0 kB
[ebuild R ] net-libs/libsoup-2.46.0-r1:2.4 USE="introspection
ssl -debug -samba {-test}" ABI_X86="(64) (-32) (-x32)" 0 kB
[ebuild R ] media-plugins/gst-plugins-soup-0.10.31-r1:0.10
ABI_X86="(64) (-32) (-x32)" 0 kB
[ebuild R ] media-libs/phonon-gstreamer-4.6.3 USE="alsa network
-debug" 71 kB

Total: 385 packages (385 reinstalls), Size of downloads: 355,030 kB
c2RAID6 ~ #
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
Mark Knecht posted on Thu, 07 Aug 2014 11:16:23 -0700 as excerpted:

> From the comment about the Release Engineering site, I think that's
> here:
>
> https://www.gentoo.org/proj/en/releng/
>
> And the keys match with is good.
>
> Anyway, running the first command is fine. The second command wants me
> to make a choice. For now I chose to 'ultimately trust'. (Aren't I
> gullible!?!)

[...]

> Please decide how far you trust this user to correctly verify other
> users' keys (by looking at passports, checking fingerprints from
> different sources, etc.)
>
> 1 = I don't know or won't say
> 2 = I do NOT trust
> 3 = I trust marginally
> 4 = I trust fully
> 5 = I trust ultimately
> m = back to the main menu
>
> Your decision? 5
> Do you really want to set this key to ultimate trust? (y/N) y


GPG is built on a "web of trust" idea. Basically, the idea is that if
you know and trust someone (well, in this case their key), then they can
vouch for someone else that you don't know.

At various community conferences and the like, there's often a
"key signing party". Attending devs and others active in the community
check passports and etc, in theory validating the identity of the other
person, then sign their key, saying they've checked and the person using
this key is really who they say they are.

What you're doing here is giving gpg some idea of how much trust you want
to put in the key, not just to verify that whatever that person sends you
did indeed get signed with their key, but more importantly, to what
extent you trust that key to vouch for OTHER keys it has signed that you
don't know about yet.

If an otherwise unknown-trust key is signed by an ultimate-trust key,
it'll automatically be considered valid, tho it won't itself be trusted
to sign /other/ keys until you specifically assign a level of trust to
it, too.

OTOH, it'd probably take a fully-trusted key plus perhaps a marginally
trusted key, to validate an otherwise unknown key signed by both but not
signed by an ultimately-trusted key.

And it'd take more (at least three, maybe five or something, I get the
idea but have forgotten the specifics) marginal trust key signatures to
verify an otherwise unknown key in the absence of a stronger-trust key
signature of it as well.

Don't know or won't say I think means it doesn't count either way, and do
NOT trust probably counts as a negative vote, thus requiring more votes
from at least marginal-trust signatures to validate than it would
otherwise. I'm sure the details are in the gpg docs if you're interested
in reading up...
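That counting rule can be sketched in a few lines. This is a hypothetical
simplification of gpg's classic trust model, using its documented default
thresholds (--completes-needed 1, --marginals-needed 3, which match the
"3 marginal(s) needed, 1 complete(s) needed" line gpg printed earlier) --
not gpg's actual algorithm, which also walks the signature chain to a
maximum depth:

```python
# Toy model of gpg's classic web-of-trust validity calculation.
# Defaults: 1 fully-trusted signer OR 3 marginally-trusted signers
# are enough to consider a key valid. Illustration only.

FULL_NEEDED = 1      # signatures from fully/ultimately trusted keys
MARGINAL_NEEDED = 3  # signatures from marginally trusted keys

def key_validity(signer_trust_levels):
    """signer_trust_levels: trust levels ('ultimate', 'full',
    'marginal', 'unknown', 'never') of the keys that signed the
    key being evaluated. Returns 'valid' or 'unknown'."""
    if 'never' in signer_trust_levels:
        # gpg's real handling of 'never' is more involved; here
        # we simply refuse to validate.
        return 'unknown'
    fulls = sum(1 for t in signer_trust_levels if t in ('ultimate', 'full'))
    marginals = sum(1 for t in signer_trust_levels if t == 'marginal')
    if fulls >= FULL_NEEDED or marginals >= MARGINAL_NEEDED:
        return 'valid'
    return 'unknown'
```

So a single ultimate-trust signature validates a key, while one or two
marginal signatures do not.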

Meanwhile, the key in question here is the gentoo snapshot-validation
key, which should only be used to sign the tree tarballs, not a personal
key, and gentoo should use a different key to, for instance, sign
personal gentoo dev keys, so you're not likely to see it used to sign
other keys and the above web-of-trust stuff doesn't matter so much in
this case.

OTOH... (more below)

> Please note that the shown key validity is not necessarily correct
> unless you restart the program.

> gpg> check
> uid Gentoo Portage Snapshot Signing Key (Automated Signing Key)
> sig!3 96D8BF6D 2011-11-25 [self-signature]
> 6 signatures not checked due to missing keys

> I'm not sure how to short of a reboot 'restart the program'

That's simply saying that you're in gpg interactive mode, and any edits
you make in that gpg session won't necessarily show up or become
effective until you quit gpg and start a new session.

For example, I believe if you change the level of trust of some key, then
in the same gpg interactive session check the validity of another key
that the first one signed, the edit to the trust level of the first key
won't necessarily be reflected in the validity assigned to the second key
signed by the first. If you quit gpg it'll write the change you made,
and restarting gpg should then give you an accurate assessment of the
second key, reflecting the now saved change you made to the trust level
of the first key that signed the second.

> nor what the line [means:]
>
> 6 signatures not checked due to missing keys

That simply indicates that the gentoo key is signed by a bunch of (six)
others, probably gentoo infra, perhaps the foundation, etc, that if you
had a larger web of trust already built, would vouch for the validity of
the portage snapshotting key. Since you don't have that web of trust
built yet, you gotta do without, but you gotta start somewhere...

... Which is the "more below" I referred to above. The snapshot-
validation key shouldn't be used to sign other keys, because that's not
its purpose. Restricting a key to a single purpose helps a bit to keep
it from leaking, but more importantly, restricts the damage should it
indeed leak. If the snapshotting key gets stolen, it means snapshots
signed by it can be no longer trusted, but since it's not used to sign
other keys, at least the thief can't use the stolen key to vouch for
other keys, because the key isn't used for that.

At least... he couldn't, had you not set the key to ultimate trust,
which says you trust it, by itself, to vouch for other keys, without
anything else vouching for them as well.

So I'd definitely recommend editing that key again, and reducing the
trust level. I /think/ you can actually set it to do NOT trust for the
purpose of signing other keys, since that is indeed the case, without
affecting its usage for signing the portage tree snapshots. However, I'm
not positive of that. I'd test that to be sure, and simply set it back
to "don't want to say" or to "marginally", if that turns out to be
required to validate the snapshot with it. (Tho I don't believe it
should, because that would break the whole way the web of trust is
supposed to work and the concept of using a key for only one thing, not
letting you simply accept the signature for content signing, without also
requiring you to accept it as trustworthy for signing other keys.)

I believe I have a different aspect of your post to reply to as well, but
that can be a different reply...

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
Mark Knecht posted on Thu, 07 Aug 2014 11:16:23 -0700 as excerpted:

> So that's all looking pretty good, as a first step. If it's a matter of
> 3 1/2 minutes instead of 1-2 minutes then I can live with that part.
> However that's just (I think) the portage tree and not signed source
> code, correct?

[I just posted a reply to the gpg specific stuff.]

Technically correct, but not really so in implementation. See below...

> Now, is the idea that I have a validated portage snapshot at this point
> and stiff have to actually get the code using the regular emerge which
> will do the checking because I have:
>
> FEATURES="buildpkg strict webrsync-gpg"

No... It doesn't work that way.

> I don't see any evidence that emerge checked what it downloaded, but
> maybe those checks are only done when I really build the code?

Here's what happens.

FEATURES=webrsync-gpg simply tells the webrsync stuff to gpg-verify the
snapshot-tarball that webrsync downloads. Without that, it'd still
download it the same, but it wouldn't verify the signature. This allows
people who use the webrsync only because they're behind a firewall that
wouldn't allow normal rsync, but who don't care about the gpg signing
security stuff, to use the same tool as the people who actually use
webrsync for the security aspect, regardless of whether they could use
normal rsync or not.

So that gets you a signed and verified tree. Correct so far.

But as part of that tree, there are digest files for each package that
verify the integrity of the ebuild as well as of the sources tarballs
(distfiles).

Now it's important to grasp the difference between gpg signing and simple
hash digests, here.

Anybody with the appropriate tools (md5sum, for example, does md5 hashes,
but there's sha and other hashes as well, and the portage tree uses
several hash algorithms in case one is broken) can take a hash of a file,
and provided it's exactly the same bit-for-bit file they should get
exactly the same hash.

In fact, that's how portage checks the hashes of both the ebuild files
and the distfiles it uses, regardless of this webrsync-gpg stuff. The
tree ships the hash values that the gentoo package maintainer took of the
files in its digest files, and portage takes its own hash of the files
and compares it to the hash value stored in the digest files. If they
match, portage is happy. If they don't, depending on how strict you have
portage set to be (FEATURES=strict), it will either warn about (without
strict) or entirely refuse to merge that package (with strict), until
either the digest is updated, or a new file matching the old digest is
downloaded.
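In code, that check is essentially the following. This is a minimal sketch
of the idea, not portage's actual implementation; real Manifest entries
also record file sizes, and use WHIRLPOOL, which Python's stdlib doesn't
provide, so only SHA256/SHA512 appear here:

```python
# Minimal sketch of a portage-style digest check: hash the
# downloaded file with each recorded algorithm and compare against
# the values shipped in the tree's digest/Manifest data.
import hashlib

def verify_digests(path, recorded):
    """recorded: dict like {'SHA256': '...', 'SHA512': '...'}
    mapping hash algorithm names to expected hex digests."""
    with open(path, 'rb') as f:
        data = f.read()
    for algo, expected in recorded.items():
        actual = hashlib.new(algo.lower(), data).hexdigest()
        if actual != expected:
            return False  # with FEATURES=strict: refuse to merge
    return True
```

If every algorithm's hash matches, the file is bit-for-bit what the
digest describes; any single mismatch fails the check.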

So far so good, but while the hashes protect against accidental damage as
the file was being downloaded, because anyone can take a hash of the
file, without something stronger, if say one of the mirror operators was
a bad guy, they could replace the files with hacked files and as long as
they replaced the digest files with the new ones they created for the
hacked files at the same time, portage wouldn't know.

So while hashes/digests alone protect quite well from accidental damage,
they can't protect, by themselves, from deliberate replacement of those
files with malware infested copies.

Which is where the gpg signed tree snapshots come in. But before we can
understand how they help, we need to understand how gpg signing differs
from simple hashes.

PGP, gpg, and various other public/private-pair key signing (and
encryption) take advantage of a particular mathematical relationship
property between the public and private keys. I'm not a cryptographer
nor a mathematician, so I'm content to leave it at that rather handwavy
assertion and not get into the details, but enough people I trust say the
same thing about the details, and enough of our modern Internet banking
and the like, depends upon the same idea, that I'm relatively confident
in the general principle, at least.

It works like this. People keep the private key from the pair private --
if it gets out, they've lost the secret. But people publish the public
half of the key. The relationship of the keys is such that people can't
figure out the private key from the public key, but if you have the
private key, you can sign stuff with it, and people with the public key
can verify the signature and thus trust that it really was the person
with that key that signed the content. Similarly, people can use the
public key to encrypt something, and only the person with the private key
will be able to decrypt it -- having the public key doesn't help.

Actually, as I understand it, signing is simply a combination of hashing
and encryption, such that a hash of the content to be signed is taken,
and then that hash is encrypted with the private key. Now anyone with
the public key can "decrypt" the hash and verify the content with it,
thereby verifying that the private key used to sign the content by
encrypting the hash was the one used. If some other key had been used,
attempting to decrypt the hash with an unmatched public key would simply
produce gibberish, and the supposedly "decrypted" hash wouldn't be the
hash produced when checking the content, thereby failing to verify that
the signed content actually came from the person that it was claimed to
have come from.
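That "encrypt the hash with the private key" idea can be shown numerically
with textbook RSA. This uses tiny, trivially breakable primes purely to
illustrate the math; real signatures use huge keys plus padding schemes
(and gpg adds more structure on top), so treat this as a toy only:

```python
# Textbook-RSA toy: "signing" raises the message hash to the PRIVATE
# exponent d mod n; anyone with the PUBLIC exponent e can undo that
# and compare against their own hash of the content.
# Tiny demo primes -- illustration only, not secure.
import hashlib

p, q = 61, 53
n = p * q   # 3233, the public modulus
e = 17      # public exponent
d = 2753    # private exponent: e*d == 1 (mod lcm(p-1, q-1))

def toy_hash(data):
    # Reduce a real SHA256 hash mod n so it fits the tiny key.
    return int.from_bytes(hashlib.sha256(data).digest(), 'big') % n

def sign(data):
    return pow(toy_hash(data), d, n)         # needs the PRIVATE key

def verify(data, sig):
    return pow(sig, e, n) == toy_hash(data)  # needs only the PUBLIC key
```

Verification recovers the hash the signer encrypted and compares it to a
freshly computed hash of the content; a forged or altered signature
"decrypts" to gibberish and the comparison fails.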


OK, we've now established that hashes simply verify that the content
didn't get modified in transit, but they do NOT by themselves verify who
SENT that content, so indeed, a man-in-the-middle could have replaced
BOTH the content and the hash, and someone relying on just hashes
couldn't tell the difference.

And we've also established that a signature verifies that the content
actually came from the person who had the private key matching the public
key used to verify it, by mechanism of encrypting the hash of that
content with the private key, so only by "decrypting" it with the
matching public key, does the hash of the content match the one taken at
the other end and encrypted with the private key.

*NOW* we're equipped to see how the portage tree snapshot signing method
actually allows us to verify distfiles as well. Because the tree
includes digests that we can now verify came from our trusted source,
gentoo, NOW those digests can be used to verify the distfiles, because
the digests were part of the signed tree and nobody could tamper with
that signed tree including those digests without detection.

If our nefarious gentoo mirror operator tried to switch out the source
tarballs AND the digests, he could do so for normal rsync users, and for
webrsync users not doing gpg verification, without detection. But should
he try that with someone that's using webrsync-gpg, he has no way to sign
the tampered with tarball with the correct private key since he doesn't
have it, and those using webrsync with FEATURES=webrsync-gpg would detect
the tampered tarball as portage (via webrsync, via eix in your case)
would reject that tarball as unverified.

So the hash-digest method used to protect ordinary rsync users (and
webrsync users without webrsync-gpg turned on) from ACCIDENTAL damage,
now protects webrsync-gpg users from DELIBERATE man-in-the-middle attacks
as well, not because the digests themselves are different, but because we
can now trust and verify that they came from a legitimate source.

Tho it should be noted that "legitimate source" is defined as anyone
having access to that private signing key. So should someone break in
to the snapshotting server and steal the private key doing the signing,
they now become a "legitimate source" as far as webrsync-gpg is concerned.


So where does that leave us in practice?

Basically here:

You're now verifying that the snapshot tarballs are coming from a source
with the private signing key, and we're assuming that gentoo security
hasn't been broken and thus that only gentoo's snapshot signing servers
(and their admins, of course) have access to the private signing key,
which in turn means we're assuming the machine with that signing key must
be gentoo, and thus that the snapshotted tarballs are legit.

But it's actually webrsync in combination with FEATURES=webrsync-gpg
that's doing that verification.

Once the verified tarball is actually unpacked on our system, portage
operates just as it normally does, simply verifying the usual hash digests
against the ebuilds and the distfiles /exactly/ as it normally would.

Repeating in different words to hopefully ensure it's understood:

It's *ONLY* the fact that we have actually gpg-verified that snapshot
tarball and thus the digests within it, that gives us any more security
than an ordinary rsync user. After that's downloaded, verified and
unpacked, portage operates exactly as it normally does.


Meanwhile, part of that normal operation includes FEATURES=strict, if
you've set it, which causes portage to refuse to merge the package if
those digests don't match. But that part of things is just normal
portage operation. Rsync users get it too -- they just don't have the
additional assurance that those digest files actually came from gentoo
(or at least from someone with gentoo's private signing key), that
webrsync with FEATURES=webrsync-gpg provides.
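For reference, the setup discussed in this thread boils down to a couple of make.conf lines. This is a sketch, not authoritative docs (check `man make.conf` for your portage version); PORTAGE_GPG_DIR names the gpg keyring directory that emerge-webrsync checks snapshot signatures against:

```shell
# /etc/portage/make.conf (sketch)
FEATURES="buildpkg strict webrsync-gpg"
# keyring that emerge-webrsync verifies snapshot signatures against:
PORTAGE_GPG_DIR="/etc/portage/gpg"
```

With that in place, you sync via emerge-webrsync rather than a plain emerge --sync, and an unverifiable snapshot gets rejected instead of extracted.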


(Meanwhile, one further personal note FWIW. You may think that all these
long explanations take quite some time to type up, and you'd be correct.
But don't make the mistake of thinking that I don't get a benefit from it
myself. My dad was a teacher, and one of the things he used to say that
I've found to be truer than true, is that the best way to /learn/
something is to try to teach it to someone. That's exactly what I'm
doing, and all the unexpected questions and corner cases that I'd have
never thought about on my own, that people bring up and force me to think
about in order to answer them, help me improve my own previously more
handwavy and fuzzy "general concept" understanding as well. I'm much
more confident in my own understanding of the general public/private key
concepts, how gpg actually uses them and how its web-of-trust works, and
more specifically, how portage can use that via webrsync-gpg to actually
improve the gentooer's own security, than I ever was before.

And it has been quite some time since I worked with gpg and saw it in
interactive mode like that, too, and it turns out that in the intervening
years, I've actually understood quite a bit more about how it all works
than I did back then, thus my ability to dig that all up and present it
here, while back a few years ago, I was just as clueless about how all
that web-of-trust stuff worked, and make exactly the same mistake of
"ultimately trusting" the distro's package-signing key, for exactly the
same reasons. Turns out I absorbed rather more from all those security
and encryption articles I've read over the years than I realized, but it
actually took my replies right here in this thread to lay it all out
logically so I too realized how much more I understand what's going on
now, than I did back then.)

So... Thanks for the thread! =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
Hi Duncan,
Responding to one thing here, the rest in-line:

[QUOTE]
(Meanwhile, one further personal note FWIW. You may think that all these
long explanations take quite some time to type up, and you'd be correct.
But don't make the mistake of thinking that I don't get a benefit from it
myself. My dad was a teacher, and one of the things he used to say that
I've found to be truer than true, is that the best way to /learn/
something is to try to teach it to someone.
[/QUOTE]

I couldn't agree more and appreciate your efforts. And even if I might
already understand some of what you document I'm sure there are
others that come later looking for answers who get lots from these
conversations, solve problems and we never hear about it. Anyway,
a big thanks.

On Thu, Aug 7, 2014 at 2:18 PM, Duncan <1i5t5.duncan@cox.net> wrote:
> Mark Knecht posted on Thu, 07 Aug 2014 11:16:23 -0700 as excerpted:
>
>> So that's all looking pretty good, as a first step. If it's a matter of
>> 3 1/2 minutes instead of 1-2 minutes then I can live with that part.
>> However that's just (I think) the portage tree and not signed source
>> code, correct?
>
> [I just posted a reply to the gpg specific stuff.]
>
> Technically correct, but not really so in implementation. See below...
>
>> Now, is the idea that I have a validated portage snapshot at this point
>> and still have to actually get the code using the regular emerge which
>> will do the checking because I have:
>>
>> FEATURES="buildpkg strict webrsync-gpg"
>
> No... It doesn't work that way.
>
>> I don't see any evidence that emerge checked what it downloaded, but
>> maybe those checks are only done when I really build the code?
>
> Here's what happens.
>
> FEATURES=webrsync-gpg simply tells the webrsync stuff to gpg-verify the
> snapshot-tarball that webrsync downloads. Without that, it'd still
> download it the same, but it wouldn't verify the signature. This allows
> people who use the webrsync only because they're behind a firewall that
> wouldn't allow normal rsync, but who don't care about the gpg signing
> security stuff, to use the same tool as the people who actually use
> webrsync for the security aspect, regardless of whether they could use
> normal rsync or not.
>

And to clarify, I believe this step is responsible for putting into place on
a Gentoo machine much of what's in /usr/portage, most specifically in the
app categorization directories. In the old days the Gentoo Install Guide
used to have us download the portage snapshots from a location such as

http://distfiles.gentoo.org/snapshots/

That's now been replaced by a call to emerge-webrsync so newbies
might not have that view. Additionally, even if we're downloading the
snapshot tarball it appears, at least on my system, it's deleted after
it's expanded. Or at least it's not showing up in a locate command.


> So that gets you a signed and verified tree. Correct so far.
>
> But as part of that tree, there are digest files for each package that
> verify the integrity of the ebuild as well as of the sources tarballs
> (distfiles).
>

Yep.

> Now it's important to grasp the difference between gpg signing and simple
> hash digests, here.
>
> Anybody with the appropriate tools (md5sum, for example, does md5 hashes,
> but there's sha and other hashes as well, and the portage tree uses
> several hash algorithms in case one is broken) can take a hash of a file,
> and provided it's exactly the same bit-for-bit file they should get
> exactly the same hash.
>
> In fact, that's how portage checks the hashes of both the ebuild files
> and the distfiles it uses, regardless of this webrsync-gpg stuff. The
> tree ships the hash values that the gentoo package maintainer took of the
> files in its digest files, and portage takes its own hash of the files
> and compares it to the hash value stored in the digest files. If they
> match, portage is happy. If they don't, depending on how strict you have
> portage set to be (FEATURES=strict), it will either warn about (without
> strict) or entirely refuse to merge that package (with strict), until
> either the digest is updated, or a new file matching the old digest is
> downloaded.
>
> So far so good, but while the hashes protect against accidental damage as
> the file was being downloaded, because anyone can take a hash of the
> file, without something stronger, if say one of the mirror operators was
> a bad guy, they could replace the files with hacked files and as long as
> they replaced the digest files with the new ones they created for the
> hacked files at the same time, portage wouldn't know.
>
> So while hashes/digests alone protect quite well from accidental damage,
> they can't protect, by themselves, from deliberate replacement of those
> files with malware infested copies.
>
> Which is where the gpg signed tree snapshots come in. But before we can
> understand how they help, we need to understand how gpg signing differs
> from simple hashes.
>

Some years ago (1997/98) I purchased one of Bruce Schneier's books - looking
at Amazon I recollect "Applied Cryptography: Protocols, Algorithms, and
Source Code in C" - so I've been through a lot of this in the area of
semiconductor
design. (5C Encryption model for 'protecting' movie content. What a joke...)

> PGP, gpg, and various other public/private-pair key signing (and
> encryption) take advantage of a particular mathematical relationship
> property between the public and private keys. I'm not a cryptographer
> nor a mathematician, so I'm content to leave it at that rather handwavy
> assertion and not get into the details, but enough people I trust say the
> same thing about the details, and enough of our modern Internet banking
> and the like, depends upon the same idea, that I'm relatively confident
> in the general principle, at least.
>
> It works like this. People keep the private key from the pair private --
> if it gets out, they've lost the secret. But people publish the public
> half of the key. The relationship of the keys is such that people can't
> figure out the private key from the public key, but if you have the
> private key, you can sign stuff with it, and people with the public key
> can verify the signature and thus trust that it really was the person
> with that key that signed the content. Similarly, people can use the
> public key to encrypt something, and only the person with the private key
> will be able to decrypt it -- having the public key doesn't help.
>
> Actually, as I understand it signing is simply a combination of hashing
> and encryption, such that a hash of the content to be signed is taken,
> and then that hash is encrypted with the private key. Now anyone with
> the public key can "decrypt" the hash and verify the content with it,
> thereby verifying that the private key used to sign the content by
> encrypting the hash was the one used. If some other key had been used,
> attempting to decrypt the hash with an unmatched public key would simply
> produce gibberish, and the supposedly "decrypted" hash wouldn't be the
> hash produced when checking the content, thereby failing to verify that
> the signed content actually came from the person that it was claimed to
> have come from.
>

If I recall correctly the flow looks like:

File -> (Sender Private/Receiver Public) -> Encrypted File

Encrypted File -> (Sender Public/Receiver Private) -> File

and this should be safe, albeit Rich's comment early on was

"3. Have an army of the best cryptographers in the world, etc."

coupled with lots of compute power leaves me with little doubt it's
not a 100% thing...
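That flow can be sketched numerically with two toy RSA key pairs. The numbers are tiny and insecure, chosen only so the per-byte arithmetic works out (the receiver's modulus is larger than the sender's so each signed block fits); purely illustrative, not how real gpg packets are laid out:

```python
# Toy RSA key pairs (tiny and insecure -- illustration only).
# Sender:   n = 61*53 = 3233, public e = 17, private d = 2753
# Receiver: n = 61*59 = 3599, public e = 7,  private d = 2983
SN, SE, SD = 3233, 17, 2753
RN, RE, RD = 3599, 7, 2983

plaintext = b"hi"

# File -> (Sender Private / Receiver Public) -> Encrypted File
signed    = [pow(b, SD, SN) for b in plaintext]  # sign with sender's PRIVATE key
encrypted = [pow(s, RE, RN) for s in signed]     # encrypt with receiver's PUBLIC key

# Encrypted File -> (Receiver Private / Sender Public) -> File
decrypted = [pow(c, RD, RN) for c in encrypted]      # decrypt: receiver's PRIVATE key
recovered = bytes(pow(s, SE, SN) for s in decrypted)  # verify: sender's PUBLIC key

assert recovered == plaintext
```

Only the receiver can decrypt (confidentiality), and only the sender could have produced blocks that the sender's public key unwraps back to the plaintext (authenticity).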

>
> OK, we've now established that hashes simply verify that the content
> didn't get modified in transit, but they do NOT by themselves verify who
> SENT that content, so indeed, a man-in-the-middle could have replaced
> BOTH the content and the hash, and someone relying on just hashes
> couldn't tell the difference.
>
> And we've also established that a signature verifies that the content
> actually came from the person who holds the private key matching the
> public key used to verify it. The mechanism is to encrypt the hash of
> that content with the private key, so that only by "decrypting" it with
> the matching public key does the result match the hash of the content
> taken at the receiving end.
>
> *NOW* we're equipped to see how the portage tree snapshot signing method
> actually allows us to verify distfiles as well. Because the tree
> includes digests that we can now verify came from our trusted source,
> gentoo, NOW those digests can be used to verify the distfiles, because
> the digests were part of the signed tree and nobody could tamper with
> that signed tree including those digests without detection.
>

Correct. Hashes for all that stuff are in the Manifest files and I don't
ever create my own Manifests.

> If our nefarious gentoo mirror operator tried to switch out the source
> tarballs AND the digests, he could do so for normal rsync users, and for
> webrsync users not doing gpg verification, without detection. But should
> he try that with someone that's using webrsync-gpg, he has no way to sign
> the tampered-with tarball with the correct private key since he doesn't
> have it, and those using webrsync with FEATURES=webrsync-gpg would detect
> the tampered tarball as portage (via webrsync, via eix in your case)
> would reject that tarball as unverified.
>

Well, maybe yes, maybe no as per the comment above, but agreed in general.

> So the hash-digest method used to protect ordinary rsync users (and
> webrsync users without webrsync-gpg turned on) from ACCIDENTAL damage,
> now protects webrsync-gpg users from DELIBERATE man-in-the-middle attacks
> as well, not because the digests themselves are different, but because we
> can now trust and verify that they came from a legitimate source.
>
> Tho it should be noted that "legitimate source" is defined as anyone
> having access to that private signing key. So should someone break in
> to the snapshotting server and steal the private key doing the signing,
> they now become a "legitimate source" as far as webrsync-gpg is concerned.
>

Yep.

>
> So where does that leave us in practice?
>
> Basically here:
>
> You're now verifying that the snapshot tarballs are coming from a source
> with the private signing key, and we're assuming that gentoo security
> hasn't been broken and thus that only gentoo's snapshot signing servers
> (and their admins, of course) have access to the private signing key,
> which in turn means we're assuming the machine with that signing key must
> be gentoo, and thus that the snapshotted tarballs are legit.
>
> But it's actually webrsync in combination with FEATURES=webrsync-gpg
> that's doing that verification.
>
> Once the verified tarball is actually unpacked on our system, portage
> operates just as it normally does, simply verifying the usual hash digests
> against the ebuilds and the distfiles /exactly/ as it normally would.
>

Understood.

> Repeating in different words to hopefully ensure it's understood:
>
> It's *ONLY* the fact that we have actually gpg-verified that snapshot
> tarball and thus the digests within it, that gives us any more security
> than an ordinary rsync user. After that's downloaded, verified and
> unpacked, portage operates exactly as it normally does.
>
>
> Meanwhile, part of that normal operation includes FEATURES=strict, if
> you've set it, which causes portage to refuse to merge the package if
> those digests don't match. But that part of things is just normal
> portage operation. Rsync users get it too -- they just don't have the
> additional assurance that those digest files actually came from gentoo
> (or at least from someone with gentoo's private signing key), that
> webrsync with FEATURES=webrsync-gpg provides.
>

Yep, I set that first before I got the gpg stuff working. I'll leave it
in place for now.

>
> (Meanwhile, one further personal note FWIW. You may think that all these
> long explanations take quite some time to type up, and you'd be correct.
> But don't make the mistake of thinking that I don't get a benefit from it
> myself. My dad was a teacher, and one of the things he used to say that
> I've found to be truer than true, is that the best way to /learn/
> something is to try to teach it to someone. That's exactly what I'm
> doing, and all the unexpected questions and corner cases that I'd have
> never thought about on my own, that people bring up and force me to think
> about in order to answer them, help me improve my own previously more
> handwavy and fuzzy "general concept" understanding as well. I'm much
> more confident in my own understanding of the general public/private key
> concepts, how gpg actually uses them and how its web-of-trust works, and
> more specifically, how portage can use that via webrsync-gpg to actually
> improve the gentooer's own security, than I ever was before.
>
> And it has been quite some time since I worked with gpg and saw it in
> interactive mode like that, too, and it turns out that in the intervening
> years, I've actually understood quite a bit more about how it all works
> than I did back then, thus my ability to dig that all up and present it
> here, while back a few years ago, I was just as clueless about how all
> that web-of-trust stuff worked, and make exactly the same mistake of
> "ultimately trusting" the distro's package-signing key, for exactly the
> same reasons. Turns out I absorbed rather more from all those security
> and encryption articles I've read over the years than I realized, but it
> actually took my replies right here in this thread to lay it all out
> logically so I too realized how much more I understand what's going on
> now, than I did back then.)
>
> So... Thanks for the thread! =:^)
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master." Richard Stallman
>
>
Re: "For What It's Worth" (or How do I know my Gentoo source code hasn't been messed with?) [ In reply to ]
Mark Knecht posted on Fri, 08 Aug 2014 11:34:54 -0700 as excerpted:

> On Thu, Aug 7, 2014 at 2:18 PM, Duncan <1i5t5.duncan@cox.net> wrote:
>> Mark Knecht posted on Thu, 07 Aug 2014 11:16:23 -0700 as excerpted:
>>
>>> I don't see any evidence that emerge checked what it downloaded, but
>>> maybe those checks are only done when I really build the code?
>>
>> Here's what happens.
>>
>> FEATURES=webrsync-gpg simply tells the webrsync stuff to gpg-verify the
>> snapshot-tarball that webrsync downloads. Without that, it'd still
>> download it the same, but it wouldn't verify the signature. This
>> allows people who use the webrsync only because they're behind a
>> firewall that wouldn't allow normal rsync, but who don't care about the
>> gpg signing security stuff, to use the same tool as the people who
>> actually use webrsync for the security aspect, regardless of whether
>> they could use normal rsync or not.
>>
> And to clarify, I believe this step is responsible for putting into
> place on a Gentoo machine much of what's in /usr/portage, most
> specifically in the app categorization directories.

Yes. It's basically the entire $PORTDIR tree (/usr/portage/ by default),
the app categories and ebuilds plus digest files and patches, eclasses,
metadata, the profiles, the whole thing.

That's what emerge sync would normally update (via rsync), and
emerge-webrsync replaces the normal emerge sync with a tarball download,
signature verify if FEATURES=webrsync-gpg, and tarball extraction to
$PORTDIR (while normally /usr/portage/, my $PORTDIR is set to put it
elsewhere).

The only bits of $PORTDIR that wouldn't be included would be $DISTDIR
(/usr/portage/distfiles/ by default, but again I have mine set to
something else), as source files are downloaded and hash-verified
against the hash-digest stored in the digest files (which are part of the
signed tarball), and $PKGDIR (/usr/portage/packages/ by default, but
again, I've set the variable to put them elsewhere), since that's binpkgs
that portage creates if you have FEATURES=buildpkg or FEATURES=buildsyspkg
set.
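To tie the names above together, these are the make.conf variables in play, shown here with the defaults described (a sketch; Duncan points all three elsewhere on his own box, and `man make.conf` is the authority):

```shell
# Defaults for the directories discussed above (sketch)
PORTDIR="/usr/portage"            # the synced tree: ebuilds, digests, eclasses, profiles
DISTDIR="/usr/portage/distfiles"  # downloaded source tarballs, hash-verified against the tree
PKGDIR="/usr/portage/packages"    # binpkgs created with FEATURES=buildpkg / buildsyspkg
```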

Additionally, anything else that you configure to be placed in $PORTDIR
won't be in the tarball, as you've configured that for yourself. Here, I
have layman's overlays under $PORTDIR as well (the storage setting in
layman.conf, by default set to /var/lib/layman), with an appropriate
rsync-exclude set so emerge sync doesn't clear them out when I sync.
Were I to switch to webrsync I might have to do something different as I
guess webrsync would clear them out.

Which reminds me, in all the discussion so far we've not talked about
overlays or layman. But since that is optional, it can be treated as a
separate topic. Suffice it to say here that the webrsync discussion
does /not/ cover overlays, etc, only the main gentoo tree.

> In the old days the Gentoo Install Guide used to have us download the
> portage snapshots from a location such as
>
> http://distfiles.gentoo.org/snapshots/
>
> That's now been replaced by a call to emerge-webrsync so newbies might
> not have that view.

Good point. I had noticed that change in passing when I found and
referenced the handbook webrsync stuff too, but didn't think it worth
mentioning. But you're correct, without the perspective of what it
replaced, newbies might miss the connection.

> Additionally, even if we're downloading the snapshot
> tarball it appears, at least on my system, it's deleted after it's
> expanded. Or at least it's not showing up in a locate command.

Interesting. Deleting by default after extraction does make sense,
however, since otherwise you'd have an ever-growing cache of mostly
identical content with only incremental change, tho I imagine there's
some sort of config option to turn it off, in case you don't want it
deleted.

Tho I don't use locate here and in fact don't even have it installed.
I never found it particularly useful. But are you sure locate would show
it anyway, given that locate only knows about what is indexed, and the
indexing only runs periodically, once a day or week or some such? If it
hasn't indexed files since you started doing the emerge-webrsync thing,
it probably won't know anything about them, even if they are kept.

(Actually, that was my problem with locate in the first place. My
schedule is never regular enough to properly select a time when the
computer will be on to do the indexing, yet I won't be using it for
something else so it can do the indexing without bothering whatever else
I'm doing. Additionally, since it only knows about what it has already
indexed, I could never completely rely on it having indexed the file I
was looking for anyway, so it was easier to simply forget about locate
and to use other means to find files. So at some point, when I was doing
an update and the locate/slocate/whatever package was set to update,
since I had never actually used it in years, I just decided to unmerge it
instead. That must have been years ago now, and I've never missed it...)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman