Mailing List Archive

Incoming NATTkA upgrade
Hello, everyone.

TL;DR:

1. There have been a few NATTkA misfires around 2 PM UTC today.
I'm sorry for the noise.

2. In the next hour, a major NATTkA + pkgcore upgrade should roll out.
No problems are expected but please contact me if you see weird
behavior after the upgrade (especially incorrect sanity-check
results).

3. A workaround has been added that should hopefully finally fix
occasional misbehavior due to Bugzilla race conditions. As a side
effect, NATTkA may be a bit slower in responding to new bugs (up to
4 minutes of delay).

Full explanation follows.


Infra's been running an old version of NATTkA for quite some time.
The previous upgrade attempt (that involved an incompatible pkgcheck API
change) failed due to some cryptic bugs. A lot of stable/keywording
requests suddenly started failing -- and it seemed that pkgcheck was
checking keyworded ebuilds in the temporary against old dependencies
in /usr/portage.

I've been doing some new development in NATTkA today, and in order to
deploy it cleanly I've finally decided to try figuring out what's wrong
with new NATTkA + pkgcore. I've installed the new versions on martin
(the Infra host that used to run NATTkA in the past), and started
testing them.

I didn't notice that puppet had failed to remove the old NATTkA cronjob
from martin. So when NATTkA was installed again, the cronjob started
running the broken NATTkA version, and it started fighting with
the correct instance over bugs. As a result, a few bugs have seen
ping-pong between sanity-check+ and sanity-check- results. After noticing
the problem, I've removed the old cronjob. I apologize for the bugspam
caused by this.

The good news is that I've discovered that upgrading to the latest ~arch
pkgcore & co. (unmasked versions) resolves the problem in question.
Since NATTkA is run on a different host than other services requiring
old pkgcore, I am going to deploy the full set of new versions shortly.
The initial testing run didn't yield any suspicious results, so
hopefully there will be no major problems this time.

The new version also includes a workaround for weird NATTkA behavior --
you might have noticed in the past that NATTkA was readding arch teams
to fixed stabilization requests, or that today it reverted 'package
list' to an earlier state while expanding it. I've been trying to
figure out what's wrong with NATTkA's logic for a long time, and I've
finally come to the conclusion that the problem is actually in Bugzilla.

I haven't verified the exact cause but it's most likely that Bugzilla is
executing multiple SELECT queries while performing the bug search,
and therefore could end up with a combination of bug properties from
before and after an update. This is the only way I can explain
bug #779535 [1]. In a single action, CC-ARCHES was added to the bug and
the package list was changed. However, NATTkA reverted to the old
package list while expanding -- which can happen only if the bug had
CC-ARCHES already. Both the keywords and the package list are grabbed
from Bugzilla via a single REST API query, so my only explanation for
this is that the Bugzilla API returned new keywords but the old package
list.
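
To illustrate the suspected race, here is a minimal Python sketch. It is
only a simulation of the hypothesis above, not Bugzilla's actual code or
schema: BugStore, update() and torn_read() are hypothetical names, and
the package names are made up. It shows how a reader that fetches two
fields in two separate steps can see the new keywords together with
the old package list if an update lands in between.

```python
from dataclasses import dataclass, field


@dataclass
class BugStore:
    """Hypothetical stand-in for the bug database (not Bugzilla's schema)."""
    keywords: list = field(default_factory=list)
    package_list: str = ""


def update(store, keywords, package_list):
    """A single user action that changes both fields at once."""
    store.keywords = list(keywords)
    store.package_list = package_list


def torn_read(store, between_queries):
    """Read the two fields in two separate steps, like two SELECTs.

    `between_queries` is called after the first read to simulate
    a concurrent update landing in the middle of the search.
    """
    package_list = store.package_list   # first query: still the old value
    between_queries()                   # the update is committed here
    keywords = list(store.keywords)     # second query: already the new value
    return {"keywords": keywords, "package_list": package_list}


store = BugStore(keywords=[], package_list="app-misc/foo-1")
snapshot = torn_read(
    store,
    lambda: update(store, ["CC-ARCHES"], "app-misc/foo-1 app-misc/bar-2"),
)
print(snapshot)
```

The resulting snapshot never existed as a consistent state of the bug:
it carries CC-ARCHES but still the pre-update package list, which
matches the combination NATTkA appears to have acted on.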

To avoid this, NATTkA now skips bugs that were updated less than 60
seconds before the search is run. These bugs will be deferred to
the next run (i.e. 4 minutes later), and Bugzilla should have synced up
by then. Of course, this is going to work only if the 'last change time'
field is updated no later than other bug data.
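
The deferral logic can be sketched roughly as follows. This is not
NATTkA's actual code: split_settled(), SETTLE_TIME and the dict layout
are assumptions made for illustration, and the only thing taken from
the description above is the 60-second threshold applied to the 'last
change time' of each bug.

```python
from datetime import datetime, timedelta, timezone

# Assumed name; the 60-second value comes from the description above.
SETTLE_TIME = timedelta(seconds=60)


def split_settled(bugs, now=None, settle_time=SETTLE_TIME):
    """Split bugs into (ready, deferred) by their last change time.

    Each bug is a dict with a timezone-aware 'last_change_time'
    (hypothetical layout). Bugs changed less than `settle_time` ago
    are deferred to the next run.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - settle_time
    ready, deferred = [], []
    for bug in bugs:
        if bug["last_change_time"] <= cutoff:
            ready.append(bug)
        else:
            deferred.append(bug)
    return ready, deferred


# Made-up example data: one bug changed 5 minutes ago, one 10 seconds ago.
now = datetime(2021, 1, 1, 14, 0, 0, tzinfo=timezone.utc)
bugs = [
    {"id": 1, "last_change_time": now - timedelta(minutes=5)},
    {"id": 2, "last_change_time": now - timedelta(seconds=10)},
]
ready, deferred = split_settled(bugs, now=now)
print([b["id"] for b in ready], [b["id"] for b in deferred])
```

With this data, bug 1 is processed immediately and bug 2 is left for
the next run, by which time its last change is older than the threshold.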

If you have any questions or problems, please do not hesitate to contact
me or report a bug (either on Gentoo Bugzilla, or on NATTkA's GitHub
issue tracker). That said, I realize there's quite a number of
problems reported already, and I hope I'll be able to start addressing
them ~next month.

[1] https://bugs.gentoo.org/779535#c8

--
Best regards,
Michał Górny