Mailing List Archive: unique_id discussion/problem

Jun 4, 2003, 9:52 AM

Post #1 of 14

Hello,

We're still having issues with unique_id's not always being
unique, so - in the past there have been 2 proposed solutions,
the first to just use the message_idnr (which already exists, is
guaranteed unique and takes no additional db storage space), the
second is to use UUID's (universally unique id's). I'd be glad
to work on one or the other, but just wanted to head down the
right path.

Is there any good reason to not use the message_idnr? It looks
like the unique_id is sent right to the pop3 client in response to
a UIDL command, so it might be handy to still have that present for
migration purposes (if someone wanted to load the unique_id's with
the same values as their previous mail server had) - what if it
uses unique_id if present, and message_idnr if NULL (or simply save
message_idnr into that field, checking for duplicates)?
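
A minimal sketch of the fallback proposed above, in Python for brevity (dbmail itself is C). The function name and the treatment of an empty string as "not set" are assumptions for illustration, not dbmail code:

```python
from typing import Optional


def uidl_value(message_idnr: int, unique_id: Optional[str]) -> str:
    """Return the string that would be sent to a POP3 client for UIDL.

    Hypothetical helper: prefer a stored unique_id (e.g. one migrated
    from a previous mail server), otherwise fall back to message_idnr,
    which is already unique within the database.
    """
    if unique_id:  # None or "" both count as "not set" in this sketch
        return unique_id
    return str(message_idnr)
```

With this shape, migrated messages keep their old UIDL values while newly inserted messages need no extra stored data.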

UUID's would work, but aside from seeming to be almost superfluous
overkill, there are a couple of issues to work out, namely the proper
place for non-volatile storage of state info (disk file vs. database),
and testing on multiple platforms. I've been referring to the draft at
http://lists.research.netsol.com/pipermail/urn-nid/2002-September/000323.html
as a reference. I suppose if we didn't use real MAC addresses for the
node part, but a randomly generated number, there would be no platform
compatibility issues (i.e. how to look up the MAC address). The
implementation included in the above URL says it works for Linux and
Windows, but I can't test anything but Linux myself.
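
To illustrate the random-node idea, here is a hedged Python sketch (not the draft's reference implementation). It assumes the convention from the UUID specification that a randomly chosen node id has its multicast bit set so it cannot collide with a real MAC address; the function names are hypothetical:

```python
import secrets
import uuid


def random_node_id() -> int:
    """Return a random 48-bit node id with the multicast bit set."""
    node = secrets.randbits(48)
    # The multicast bit is the least significant bit of the first octet,
    # i.e. bit 40 of the 48-bit value.
    return node | (1 << 40)


def new_unique_id(node: int) -> str:
    """Build a time-based (version 1) UUID string for the given node id."""
    return str(uuid.uuid1(node=node))
```

The node id would be generated once and then stored, so that every UUID from the same machine carries the same node part.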

Comments / etc.?

Jesse

--
Jesse Norell
jesse (at) kci.net

RE: unique_id discussion/problem

jesse at kci

Jun 11, 2003, 3:17 PM

Post #2 of 14

Hello,

I'm working on implementing UUID's in dbmail, and need both
a place for non-volatile storage, for which I plan on using an
/etc/dbmail/ directory (with a "nodeid" and "state" file), and
an inter-process locking mechanism, for which I'm planning on
using flock() on the state file. Does anyone see any obvious
red flags going up? Is there a better/more portable/whatever
way I should be locking the file?
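
As a rough illustration of locking the state file with flock(), here is a Python sketch using the standard `fcntl` module (Unix only). The file format, a single decimal counter, and the function name are assumptions for the example; the real state file would hold the timestamp and sequence data:

```python
import fcntl
import os


def next_sequence(state_path: str) -> int:
    """Atomically read, increment and store a counter kept in state_path."""
    # Create the file if it does not exist yet, without truncating it.
    fd = os.open(state_path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as f:
        # Block until this process holds the exclusive lock.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            text = f.read().strip()
            value = int(text) + 1 if text else 1
            f.seek(0)
            f.truncate()
            f.write(str(value))
            f.flush()
            os.fsync(f.fileno())  # make sure the new state reaches the disk
            return value
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

flock() locks are advisory, so this only helps if every process that touches the file takes the lock first, and behaviour on network filesystems may differ.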

On the issue of unique_id in general, I think our whole problem
arose because we added constraints to guarantee that the field actually
was unique - that is not necessary, and even arguably wrong. The
POP3 RFC (RFC 1939) actually says that a client should be able to handle
multiple messages with the same unique id if they are identical copies
of a message. This consideration may actually come into play if/when
there are shared folders, if you can move messages from one folder to a
shared folder. So... just don't actually force unique_id to be unique. :)

I plan on making one function to generate uuid's (for unique_id or
anywhere else, if someone needs them), and then make all the pop3
stuff use it everywhere it currently generates a unique_id (in
multiple functions).

Also, with having an /etc/dbmail/ directory, should that be the
default location for dbmail.conf? Debian packages already do that,
and it's cleaner if there are multiple conf/etc. files.

Jesse

--
Jesse Norell
jesse (at) kci.net

Re: RE: unique_id discussion/problem

Magnus.Sundberg at dican

Jun 12, 2003, 12:41 AM

Post #3 of 14

Hi,
I have a small question, that I don't understand.
What reason is there to not store this data in the database?

/Magnus


Re: RE: unique_id discussion/problem

Magnus.Sundberg at dican

Jun 12, 2003, 4:42 AM

Post #4 of 14

Jesse Norell wrote:
> Hello,
<snip>
> On the issue of unique_id in general, I think our whole problem
> was because we added constraints to guarantee that that field actually
> was unique - that is not necessary, and even arguably wrong. The
> rfc actually says that a client should be able to handle multiple
> messages with the same unique id if it's a duplicate message. This
> consideration may actually come into play if/when there are shared
> folders, if you can move messages from one folder to a shared folder
> so... just don't actually force unique_id to be unique. :)

I believe Roel mentioned that he was changing the database
structure in version 2.0, with links to a single message, for
performance improvement. This way, there will still be only one
message block.

>
> I plan on making one function to generate uuid's (for unique_id or
> anywhere else, if someone needs them), and then make all the pop3
> stuff use it everywhere it currently generates a unique_id (in
> multiple functions).

Is it the mail sender or the mail receiver that generates the UUID?

>
> Also, with having an /etc/dbmail/ directory, should that be the
> default location for dbmail.conf? Debian packages already do that,
> and it's cleaner if there are multiple conf/etc. files.

How many configuration files are there?

<snip>

Magnus

Re: RE: unique_id discussion/problem

aaron at engr

Jun 12, 2003, 7:21 AM

Post #5 of 14

In fact, I would highly recommend that a database be used. I envision a
table that has a row for each server in the cluster and a "uuid prefix" or
something of the like. Synchronization information might also be stored in
this table, such as the IP address of each server as it links up with a
row in the database and the timestamp of when it last attached.

Naturally this table will have to be replicated, and so it should not have
an auto_increment column, but something else more unique. Hostnames or IP
addresses are an obvious answer, if not a good one ;-)

I'm not sure what the "non-volatile" storage is needed for in your
proposal beyond what I see as a unique prefix for each dbmail in the
cluster as it writes to a replicated database server...

Aaron


Re: RE: unique_id discussion/problem

jesse at kci

Jun 12, 2003, 7:47 AM

Post #6 of 14

Hello,

The uuid generation needs to know the nodeid of the machine, and
save that and some state info (timestamp and a sequence number).
My plan was to make it read the mac addr off a hardware ethernet
card - but if for some reason that doesn't work, or if a user
doesn't want to give that info out in uuids, it would generate a
random nodeid. If this is kept in a database, it would be in a
table consisting of simply the nodeid and its state info, and if
that entry did not previously exist, the program would create it
(ie. so it can save the updated state info). So.. if it's failing
to get a mac addr from the hardware, then every message that's
inserted ends up generating a new random nodeid and that table
keeps growing forever.

One solution would be the ability to specify a node id in
dbmail.conf, but then you have to make sure it's different on all
your machines, not whatever ships in the default dbmail.conf, and
generally seems like a nuisance from a user-friendliness perspective.
The more things that "just work" out of the box the better, imho.

There's also the issue of inter-process locking of state info.
That could be done in the database too, but would be dependent
on the capabilities of the specific database. Filesystem files just
seem much cleaner for this.

Jesse

--
Jesse Norell
jesse (at) kci.net

Re: RE: unique_id discussion/problem

jesse at kci

Jun 12, 2003, 7:54 AM

Post #7 of 14

Hello,

> > I plan on making one function to generate uuid's (for unique_id or
> > anywhere else, if someone needs them), and then make all the pop3
> > stuff use it everywhere it currently generates a unique_id (in
> > multiple functions).
>
> Is it the mail sender or the mail receiver that generates the UUID?

What I'm working on is the receiver, ie. anywhere that inserts a
message to the db (primarily dbmail-smtp, but should fix the supplied
conversion programs, too).

> > Also, with having an /etc/dbmail/ directory, should that be the
> > default location for dbmail.conf? Debian packages already do that,
> > and it's cleaner if there are multiple conf/etc. files.
>
> How many configuration files are there?

One conf file right now, and after these changes 2 "etc." files,
i.e. 'nodeid' and 'state'. There could be others in time, too...

--
Jesse Norell
jesse (at) kci.net

Re: RE: unique_id discussion/problem

jesse at kci

Jun 12, 2003, 8:11 AM

Post #8 of 14

Hello,

---- Original Message ----
From: Aaron Stone <dbmail-dev@dbmail.org>
To: dbmail-dev@dbmail.org
Subject: Re: [Dbmail-dev] RE: unique_id discussion/problem
Sent: Thu, 12 Jun 2003 07:21:42 -0700 (PDT)

> In fact, I would highly recommend that a database is used. I envision a
> table that has a row for each server in the cluster and a "uuid prefix" or
> something to the like. Synchronization information might also be stored in
> this table, such as the IP address of each server as it links up with a
> row in the database and the timestamp of when it last attached.
>
> Naturally this table will have to be replicated, and so it should not have
> an auto_increment column, but something else more unique. Hostnames or IP
> addresses are an obvious answer, if not a good one ;-)

I don't think ip addresses would be unique enough - some cluster
implementations have multiple machines with the same address (eg. via
load-balancing hardware switches). Nor hostname (eg. we have multiple
machines for mail.kci.net - while they do have unique hostnames also,
there's no reason they would necessarily have to). The mac addr
seems like the best almost-always-unique identifier that's readily
available cross-platform.

> I'm not sure what the "non-volatile" storage is needed for in your
> proposal beyond what I see as a unique prefix for each dbmail in the
> cluster as it writes to a replicated database server...

Saved state info is basically for rollbacks in time (eg. machine
reboots) and to make multiple uuids generated w/in the same clock
tick be unique (because they're based largely upon time).
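
A simplified Python sketch of how saved state can cover both cases (a clock that has not advanced, and a clock that has gone backwards). It assumes a 14-bit clock sequence as in time-based UUIDs and bumps it whenever the timestamp fails to move forward; this is an illustration, not the algorithm from the draft verbatim, and the function name is hypothetical:

```python
from typing import Tuple

CLOCK_SEQ_MASK = 0x3FFF  # 14-bit clock sequence, as in version 1 UUIDs


def next_state(last_ts: int, clock_seq: int, now: int) -> Tuple[int, int]:
    """Return the (timestamp, clock_seq) pair to use for the next UUID.

    last_ts and clock_seq are the values loaded from saved state;
    now is the current clock reading in the same units as last_ts.
    """
    if now > last_ts:
        # Normal case: time moved forward, keep the clock sequence.
        return now, clock_seq
    # Same tick, or the clock went backwards (e.g. reset after a reboot):
    # change the clock sequence so the pair differs from earlier ones.
    return now, (clock_seq + 1) & CLOCK_SEQ_MASK
```

The returned pair would be written back to the state file under the lock before the UUID is handed out.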

> Aaron
>
-- End Original Message --

--
Jesse Norell
jesse (at) kci.net

Re: unique_id discussion/problem

lou at 0xffff

Jun 12, 2003, 3:55 PM

Post #9 of 14

Jesse Norell writes:

>
> Hello,
>
> ---- Original Message ----
> From: Aaron Stone <dbmail-dev@dbmail.org>
> To: dbmail-dev@dbmail.org
> Subject: Re: [Dbmail-dev] RE: unique_id discussion/problem
> Sent: Thu, 12 Jun 2003 07:21:42 -0700 (PDT)
>
>> In fact, I would highly recommend that a database is used. I envision a
>> table that has a row for each server in the cluster and a "uuid prefix" or
>> something to the like. Synchronization information might also be stored in
>> this table, such as the IP address of each server as it links up with a
>> row in the database and the timestamp of when it last attached.
>>
>> Naturally this table will have to be replicated, and so it should not have
>> an auto_increment column, but something else more unique. Hostnames or IP
>> addresses are an obvious answer, if not a good one ;-)
>
> I don't think ip addresses would be unique enough - some cluster
> implementations have multiple machines with the same address (eg. via
> load-balancing hardware switches). Nor hostname (eg. we have multiple
> machines for mail.kci.net - while they do have unique hostnames also,
> there's no reason they would necessarily have to). The mac addr
> seems like the best almost-always-unique identifier that's readily
> available cross-platform.

Why the mac addr? Entropy is constantly growing; there is nothing more
unique than a generated id, and there are tons of generators out there to
do so. However, you can always use specific sequences which are in a way
predictable, where you basically know exactly what you're going to get.
Synchronizing is a different matter, which can be done on the fly: let's say
a machine N joins a cluster A, where the machine identifies itself and
waits for a specific id from which it derives the sequence factor. With
these words I assume that you're aware of such things as negotiation
algorithms and so on.

What I personally use (besides postfix and dbmail itself, with pgsql): the
beginning was easy, let pgsql handle the sequence generation, where _ANY_ one
who has access to the database can alter the sequence generation on the
fly. In other words, the app is able to re-assign a new sequence to the
different servers by simply altering the sequence factor, which is contained
in the sequence table itself.
Being aware of the algorithm which is used for the generation, it can simply
calculate what the factor for the next server would be, and this can be
totally automated and in a way unique.

In my eyes unique email addrs and login ids are a different case, where
things get more complicated. Since in my approach I'd prefer to escape
from the collisions produced by some half-finished async replication
processes, I'd search for a more complex and sophisticated solution there.

However, the above also solves the problem if any of those machines has to
work on its own due to link failure or whatever: we won't get any bloody
collisions, since each machine is already using a unique factor for this
generation. The IDs themselves don't matter; the factor is the one that
should be unique, and it would be a huge advantage if it's predictable by
any of the servers.
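
One way to read the "sequence factor" idea is as interleaved sequences: each server gets its own offset and all servers share a common stride, so their id ranges never overlap even when a server runs on its own. The Python sketch below is an interpretation under that assumption, with hypothetical names; it is not lou's actual PgSQL setup:

```python
from typing import Iterator


def server_sequence(offset: int, stride: int) -> Iterator[int]:
    """Yield offset, offset + stride, offset + 2 * stride, ...

    offset is the per-server factor (unique within the cluster) and
    stride is an upper bound on the number of servers.
    """
    if not 0 <= offset < stride:
        raise ValueError("offset must satisfy 0 <= offset < stride")
    n = offset
    while True:
        yield n
        n += stride
```

Because each id is congruent to its server's offset modulo the stride, any server can also tell which machine generated a given id.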

Guys, if I'm being annoying or I'm not writing on the right topic, pretty
please let me know; I don't want to be boring and stuff. But again, if you
have any arguments against this solution, spread them across the list before
jumping into something like UUIDs. Not that I have something against them,
just that email is so atomic that it doesn't need such a complicated solution.

For clustering, here is how my stuff works:
Postfix + PgSQL Patch
DBmail + some dumb connection checks. + PgSQL
PgSQL itself is used with PgReplicator.

I use pgsql sequences to generate the ids (which was the first approach with
dbmail). I chose PgSQL because sequences are highly granular and it's easy
to control them. Each machine in the cluster has its own ClusterID; also,
with a kind of HeartBeat monitor, it's aware of how many servers there are
and what their IDs are, basically reading from a conf file where the primary
source for those settings is a table inside PgSQL. This file is just a
redundant option in case PgSQL on this machine somehow fails to respond.
Basically both database and files are updated at the same time.

I use them in the following order:
mx1: RR(A records) dmx1 and dmx2
mx2: RR(A records) dmx2 and dmx3
and so on

Also, when I install a dbmail system on a new cluster server, it generates
the PgSQL schema on the fly, being aware of the cluster ID which was
negotiated using the HB monitor. A huge role is also played by
PgReplicator, which gives me the ability _NOT_ to replicate sequences and
other tables like the postfix aliases (which in fact are totally useless, but
somehow I have to tell postfix to shut up with the annoying msg). In this
case the setup is not a free mailserver but for dedicated corporate use, so
I don't have users coming around and registering.

For now I haven't seen many problems, not to say any.

One thing is for sure: after the crash I had with MySQL, I'm staying
away from it; as my CTO says, MySqueel :)

>> I'm not sure what the "non-volatile" storage is needed for in your
>> proposal beyond what I see as a unique prefix for each dbmail in the
>> cluster as it writes to a replicated database server...
>
> Saved state info is basically for rollbacks in time (eg. machine
> reboots) and to make multiple uuids generated w/in the same clock
> tick be unique (because they're based largely upon time).

Here, do you mean fail-over support, which is supposed to be handled by the
replication process, or something in dbmail itself for maximum portability?
Or am I assuming wrong?

cheers,
-lou

Re: RE: unique_id discussion/problem

aaron at engr

Jun 12, 2003, 8:41 PM

Post #10 of 14

I think that this can be solved pretty simply by having each instance
generate a set of keys and insert them into the database along with a
timestamp. The database can be flushed regularly if these keys are
regenerated every X days / week / something; anything older than this can
be deleted from the table.

Simply inserting the generated key into the database assures its
uniqueness; if the insert fails due to uniqueness constraint, then
someone else has the key! So, generate a new one, insert again... etc.

Actually, that doesn't quite work, does it -- the insert for the same key
can succeed on two machines and then collide upon replication. HMM! This
is the real main problem; you *can't* generate keys without using some
kind of scheme to assure that two machines *can't* generate the same key.

I will assert, however, that MAC addresses are safe by definition. They
are frequently used for nodelocked licenses, for example, and they are
required to be unique [at the very least, within the local collision
domain] by the ARP specifications. The issues surrounding MAC addresses
are well known as well. IP addresses are probably not safe because you
certainly can get machines spoofing and faking for one another in a
failover heartbeat configuration.

I truly don't understand why the filesystem state and whatnot matters
here, because unless I'm truly missing something, what we're looking at is
the need to ensure that two messages inserted into two different
databases on two different dbmail hosts cannot ever have the same message
id. Our proposed approach is to prefix each key with some value that is
unique to the host / instance of dbmail which is doing the inserting and
therefore cannot be shared by any of the other machines in the cluster.
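
The prefix approach can be sketched in a few lines of Python. The key format and function names are assumptions for illustration; `uuid.getnode()` is used here only as a convenient way to obtain a MAC-derived value, and it may return a random number on machines where no hardware address can be read:

```python
import uuid


def local_node_prefix() -> str:
    """Return this host's 48-bit node id as 12 hex digits."""
    return format(uuid.getnode(), "012x")


def cluster_key(node_prefix: str, local_id: int) -> str:
    """Combine a per-host prefix with a locally unique id."""
    return f"{node_prefix}-{local_id}"
```

Two hosts that both assign local id 42 then produce different keys, for example `cluster_key("00a0c9112233", 42)` and `cluster_key("00a0c9445566", 42)`, so the rows cannot collide on replication as long as the prefixes differ.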

Aaron

On Thu, 12 Jun 2003, Jesse Norell wrote:

>
> Hello,
>
> The uuid generation needs to know the nodeid of the machine, and
> save that and some state info (timestamp and a sequence number).
> My plan was to make it read the mac addr off a hardware ethernet
> card - but if for some reason that doesn't work, or if a user
> doesn't want to give that info out in uuids, it would generate a
> random nodeid. If this is kept in a database, it would be in a
> table consisting of simply the nodeid and it's state info, and if
> that entry did not previously exist, the program would create it
> (ie. so it can save the updated state info). So.. if it's failing
> to get a mac addr from the hardware, then every message that's
> inserted ends up generating a new random nodeid and that table
> keeps growing forever.
>
> One solution would be the ability to specify a node id in
> dbmail.conf, but then you have to make sure it's different on all
> your machines, not whatever ships in the default dbmail.conf, and
> generally seems like a nuisance from a user-friendliness perspective.
> The more things that "just work" out of the box the better, imho.
>
> There's also the issue of inter-process locking of state info.
> That could be done in the database too, but would be dependent
> on the capabilities of the specific database. Filesystem files just
> seem much cleaner for this.
>
> Jesse
>
> ---- Original Message ----
> From: Magnus Sundberg <dbmail-dev@dbmail.org>
> To: dbmail-dev@dbmail.org
> Subject: Re: [Dbmail-dev] RE: unique_id discussion/problem
> Sent: Thu, 12 Jun 2003 09:41:31 +0200
>
> > Hi,
> > I have a small question, that I don't understand.
> > What reason is there to not store this data in the database?
> >
> > /Magnus
> >
> -- End Original Message --
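
The inter-process locking of state info mentioned in the quoted message
could be sketched roughly as follows. This is a Python illustration on a
Unix-like system, not dbmail code: the function name is invented, and the
state file holds a single integer where the real state would also carry a
timestamp.

```python
import fcntl
import os


def next_sequence(state_path):
    """Read, increment and store a counter under an exclusive flock()."""
    fd = os.open(state_path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as state_file:
        # Blocks until no other process holds the lock on this file.
        fcntl.flock(state_file, fcntl.LOCK_EX)
        try:
            text = state_file.read().strip()
            value = int(text) + 1 if text else 1
            state_file.seek(0)
            state_file.truncate()
            state_file.write(str(value))
            state_file.flush()
            os.fsync(state_file.fileno())
            return value
        finally:
            fcntl.flock(state_file, fcntl.LOCK_UN)
```

flock() locks are advisory, so this only helps if every process that touches
the state file takes the lock the same way.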

RE: RE: unique_id discussion/problem [ In reply to ]

vlads at tech

Jun 12, 2003, 10:30 PM

Post #11 of 14 (2728 views)

Permalink

Just some thoughts from me.

Is it possible to change the MAC address either in the hardware
or in the packets? If yes, then this makes the system vulnerable to
hackers.

But this idea of pregenerating the unique keys works, to my mind.
There is only one instance of dbmail-maintenance that should be
executed now and then. This instance should pregenerate the keys.

best wishes,

Vlads


_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://twister.fastxs.net/mailman/listinfo/dbmail-dev

Re: RE: unique_id discussion/problem [ In reply to ]

Magnus.Sundberg at dican

Jun 13, 2003, 1:18 AM

Post #12 of 14 (2737 views)

Permalink

Aaron Stone wrote:
<snip>
> I truly don't understand why the filesystem state and whatnot matters
> here, because unless I'm truly missing something, what we're looking at is
> the need to assume that the two messages inserted into two different
> databases on two different dbmail hosts cannot ever have the same message
> id. Our proposed approach is the prefix each key with some value that is
> unique to the host / instance of dbmail which is doing the inserting and
> therefore cannot be shared by any of the other machines in the cluster.
>
> Aaron
>
>
From reading the RFC, I understand that the UUID consists of
three parts, MAC+random_number+time, where the random number is
re-evaluated each time the delivery process starts.
What is the probability of two colliding UUIDs?
What happens when two UUIDs collide?
Is it worth all the trouble to avoid duplicate UUIDs?
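
For a rough feel for the first question, a birthday-bound estimate can be
sketched in Python. It is only an approximation, and it assumes ids whose
distinguishing part is a uniformly random field of `random_bits` bits. The
time-based layout in the draft also mixes in the node id and a timestamp, so
this is not a statement about real UUIDs. The function and parameter names
are illustrative.

```python
import math


def collision_probability(n_ids, random_bits):
    """Approximate chance that at least two of n_ids random ids collide.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / (2 * 2^bits)).
    """
    space = 2.0 ** random_bits
    # -expm1(-x) equals 1 - exp(-x) but keeps precision for tiny x.
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))


if __name__ == "__main__":
    for bits in (16, 32, 64):
        print(bits, collision_probability(1_000_000, bits))
```

The useful takeaway is the shape: the probability grows roughly with the
square of the number of ids and halves with every extra random bit.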

<snip>

Then
Vladimir Rüntü wrote:
> Just some thoughts from me.
>
> Is it possible to change the MAC address either in the hardware
> or in the packets? If yes, then this makes the system
> vulnerable to hackers.
>
> But this idea of pregenerating the unique keys works, to my mind.
> There is only one instance of dbmail-maintenance that should be
> executed now and then. This instance should pregenerate the
> keys.
>
>
> best wishes,
>
> Vlads
>
You can change the MAC on some ethernet boards; I believe the
command is "ifconfig".
To do this you have to be root. If a hacker has acquired root, my
opinion is that you are cooked anyway.

/Magnus

RE: Re: unique_id discussion/problem [ In reply to ]

jesse at kci

Jun 13, 2003, 7:58 AM

Post #13 of 14 (2734 views)

Permalink

Hello,

I keep noticing everyone commenting on "is it worth all this?"
in this thread, and I've asked myself that several times... so
I'm going to guess no, it's not. The current implementation
works pretty well for pop3 unique_id's - our problems were actually
caused by me ignorantly putting a unique constraint on that field.
There is a small probability that the random numbers generated for
each id will collide, and I'll probably give up on the uuid thing
and just make sure they never do by incorporating the message_idnr.
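
Incorporating the message_idnr could look something like the sketch below
(Python for brevity, not the actual dbmail change). The function name, the
separator and the 16-character random part are assumptions. The one firm
constraint is from RFC 1939: a UIDL unique-id is 1 to 70 printable ASCII
characters (0x21 to 0x7E).

```python
import secrets


def make_unique_id(message_idnr, random_part=None):
    """Build a POP3 unique-id from message_idnr plus a random suffix.

    Because message_idnr is unique per message, two ids differ even if
    their random parts happen to collide.
    """
    if random_part is None:
        random_part = secrets.token_hex(8)  # 16 hex characters
    unique_id = f"{message_idnr}_{random_part}"
    if not 1 <= len(unique_id) <= 70:
        raise ValueError("UIDL unique-id must be 1 to 70 characters")
    return unique_id


if __name__ == "__main__":
    print(make_unique_id(12345))
```

This relies on message_idnr being unique within the database, as described
earlier in the thread; it says nothing about ids generated independently on
replicated databases.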

As far as security, the uuid stuff isn't meant to circumvent
crackers or anything, it's just a means of generating unique numbers
that will be unique on every machine at any time, and future id's
do fall within a very predictable set of possibilities (which is the
basis of knowing that they have been/are/will be unique).

Lou - as for clustering, you seem quite a bit more experienced than
others on the list (or those posting, at least), and probably a lot
of the discussion there is meaningless rambling to some (I only catch
about half of what you explain in the algorithm stuff..), but don't
stop! Working the semantics and code out would be a huge boon to the
dbmail community, and I'm sure others will join in as they can. Plus
it'll all be logged for review, etc. in the future. :)

Later,
Jesse

---- Original Message ----
From: Lou Kamenov <dbmail-dev@dbmail.org>
To: dbmail-dev@dbmail.org
Subject: [Dbmail-dev] Re: unique_id discussion/problem
Sent: Thu, 12 Jun 2003 23:55:44 +0100

> Jesse Norell writes:
>
> >
> > Hello,
> >
> > ---- Original Message ----
> > From: Aaron Stone <dbmail-dev@dbmail.org>
> > To: dbmail-dev@dbmail.org
> > Subject: Re: [Dbmail-dev] RE: unique_id discussion/problem
> > Sent: Thu, 12 Jun 2003 07:21:42 -0700 (PDT)
> >
> >> In fact, I would highly recommend that a database is used. I envision a
> >> table that has a row for each server in the cluster and a "uuid prefix" or
> >> something to the like. Synchronization information might also be stored in
> >> this table, such as the IP address of each server as it links up with a
> >> row in the database and the timestamp of when it last attached.
> >>
> >> Naturally this table will have to be replicated, and so it should not have
> >> an auto_increment column, but something else more unique. Hostnames or IP
> >> addresses are an obvious answer, if not a good one ;-)
> >
> > I don't think ip addresses would be unique enough - some cluster
> > implementations have multiple machines with the same address (eg. via
> > load-balancing hardware switches). Nor hostname (eg. we have multiple
> > machines for mail.kci.net - while they do have unique hostnames also,
> > there's no reason they would necessarily have to). The mac addr
> > seems like the best almost-always-unique identifier that's readily
> > available cross-platform.
>
> Why the MAC address? The entropy is constantly growing, and there is nothing
> more unique than a generated id; there are tons of generators out there for
> that. However, you can always use specific sequences which are in a way
> predictable, where you basically know exactly what you're going to get.
> Synchronizing is a different matter which can be done on the fly: say a
> machine N joins a cluster A, identifies itself, and waits for a specific id
> from which it derives the sequence factor. I'm assuming here that you're
> familiar with things like negotiation algorithms and so on.
>
> What I personally use (besides postfix and dbmail itself, with pgsql): in the
> beginning it was easy, let pgsql handle the sequence generation, where _ANY_one
> who has access to the database can alter the sequence generation on the
> fly. In other words, the app can re-assign a new sequence to the
> different servers by simply altering the sequence factor which is contained
> in the sequence table itself.
> Knowing the algorithm used for the generation, it can simply
> calculate what the factor for the next server would be, and this can be
> totally automated and in a way unique.
>
> In my eyes unique email addresses and login ids are a different case, where
> things get more complicated. Since in my approach I'd prefer to escape
> the collisions produced by some not-finished-mad-mah
> async replication process, I'd search for a more complex and sophisticated
> solution.
>
> However, the above also solves the problem if any of those machines has to
> work on its own due to link failure or whatever: we won't get bloody
> collisions, since each machine is already using a unique factor for this
> generation. The IDs themselves don't matter; the factor is what has to
> be unique, and it would be a huge advantage if it's predictable by any of the
> servers.
>
>
> Guys, if I'm being annoying or I'm not writing on the right topic, pretty
> please let me know, I don't want to be boring and stuff. But again, if you
> have any arguments against this solution, spread them across the list before
> jumping into something like UUIDs. Not that I have something against it,
> just that email is so atomic that it shouldn't need such a complicated solution.
>
> For clustering, here's how my stuff works:
> Postfix + PgSQL Patch
> DBmail + some dumb connection checks. + PgSQL
> PgSQL itself is used with PgReplicator.
>
> I use pgsql sequences to generate the ids (which was the first approach with
> dbmail). I chose PgSQL because sequences are highly granular and it's easy
> to control them. Each machine in the cluster has its own ClusterID; also,
> with a kind of HeartBeat monitor, it's aware of how many servers there are and
> what their IDs are, basically reading from a conf file. The primary
> source for those settings is a table inside PgSQL; the file is just a
> redundant option in case PgSQL on this machine fails to respond.
> Basically both database and files are updated at the same time.
>
> I use them in the following order:
> mx1: RR(A records) dmx1 and dmx2
> mx2: RR(A records) dmx2 and dmx3
> and so on
>
> Also, when I install a dbmail system on a new cluster server, it generates
> the PgSQL schema on the fly, being aware of the cluster ID which was
> negotiated using the HB monitor. A huge role is also played by
> PgReplicator, which gives me the ability _NOT_ to replicate sequences and
> other tables like postfix aliases (which in fact are totally useless, but
> somehow I have to tell postfix to shut up with the annoying msg). In this case
> the setup is not a free mailserver but dedicated corporate use, so I don't
> have users coming around and registering.
>
> So far I haven't seen many problems, not to say any.
>
> One thing is for sure: after the crash I had with MySQL, I'm so staying
> away from it. As my CTO says, MySqueel :)
>
>
> >> I'm not sure what the "non-volatile" storage is needed for in your
> >> proposal beyond what I see as a unique prefix for each dbmail in the
> >> cluster as it writes to a replicated database server...
> >
> > Saved state info is basically for rollbacks in time (eg. machine
> > reboots) and to make multiple uuids generated w/in the same clock
> > tick be unique (because they're based largely upon time).
>
> Here you mean fail-over support, which is supposed to be handled by the
> replication process, or in dbmail itself for maximum portability? Or am I
> assuming wrong?
>
> cheers,
> -lou
-- End Original Message --
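
The per-server "sequence factor" in the quoted message is not spelled out.
One possible reading, sketched here in Python rather than with PostgreSQL
sequences, is that each server starts at its own cluster id and steps by the
number of servers, so the streams interleave without overlapping. The names
and the zero-based cluster ids are assumptions for illustration.

```python
import itertools


def node_sequence(cluster_id, cluster_size):
    """Yield cluster_id, cluster_id + cluster_size, ... for one server.

    Servers with different cluster_id values (0 <= id < cluster_size)
    produce disjoint streams, even while cut off from each other.
    """
    if not 0 <= cluster_id < cluster_size:
        raise ValueError("cluster_id must be in range(cluster_size)")
    return itertools.count(start=cluster_id, step=cluster_size)


if __name__ == "__main__":
    for node in range(3):
        print(node, list(itertools.islice(node_sequence(node, 3), 5)))
```

Any server can predict another's ids from its cluster id alone. Adding a
server changes the step, though, so existing sequences would need to be
renegotiated, which is presumably where the heartbeat monitor comes in.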

--
Jesse Norell
jesse (at) kci.net

Re: Re: unique_id discussion/problem [ In reply to ]

lou at 0xffff

Jun 13, 2003, 8:21 AM

Post #14 of 14 (2737 views)

Permalink

In some email I received from "Jesse Norell" <jesse@kci.net> on Fri, 13 Jun 2003 08:58:38 -0600 (MDT), wrote:

>
> Hello,
>
> I keep noticing everyone commenting on "is it worth all this?"
> in this thread, and I've asked myself that several times... so
> I'm going to guess no, it's not. The current implementation
> works pretty well for pop3 unique_id's - our problems were actually
> caused by me ignorantly putting a unique constraint on that field.
> There is a small probability that the random numbers generated for
> each id will collide, and I'll probably give up on the uuid thing
> and just make sure they never do by incorporating the message_idnr.

Ah okay, I read the draft you emailed to the list, but I didn't get much out of it,
maybe because I don't see such a big problem that would be solved by having UUIDs.
Uniqueness is something that we can't go without, at least not now; we don't have a smart
system which would be able to distinguish who is who and what is what, and maybe some fuzzy
logic and pattern matching is not enough.

Collisions are not good, in general.

> As far as security, the uuid stuff isn't meant to circumvent
> crackers or anything, it's just a means of generating unique numbers
> that will be unique on every machine at any time, and future id's
> do fall within a very predictable set of possibilities (which is the
> basis of knowing that they have been/are/will be unique).
>
> Lou - as for clustering, you seem quite a bit more experienced than
> others on the list (or those posting, at least), and probably a lot
> of the discussion there is meaningless rambling to some (I only catch
> about half of what you explain in the algorithm stuff..), but don't
> stop! Working the semantics and code out would be a huge boon to the
> dbmail community, and I'm sure others will join in as they can. Plus
> it'll all be logged for review, etc. in the future. :)

Clustering is the least of it; it was just an example of how I managed to work out these
unique ids without having them actually be unique, or in other words not uniquely
defined in the database schema. There's one very, very important ID which should be
unique, and that's the login name and the alias itself. I'm pushing my head around to find
how we can have two of the same addresses and login names that are identical and at the same time
easy to distinguish, since mail is too simple to be pushed into such an environment.

basically my call is for few things:

1) Portability
2) Redundancy
3) Transparent Distribution
4) Real-time customization

Anyway, that's further down the road: a much smarter application which will do this and
would be able to have 1000mil john.doe@email.addr :))

Actually if you're interested in distributed programming/computing take a look at Plan 9,
http://plan9.bell-labs.com/plan9dist/ it has some damn good concepts :)

just my two bits

cheers,
-lou


--
Lou Kamenov / Network Infrastructure/Security Analyst
AEYE R&D - http://www.aeye.net lou.k@hq.aeye.net
AEYE Technologies - http://www.aeyetech.co.uk lou.k@aeyetech.co.uk
phone: +44 (0) 20 8879 9832 fax: +44 (0) 7092 129079
mobile: +44 (0) 79 3945 3026 PGP Key ID - 0xA297084A

AEYE(=AI) stands for Artificial Intelligence.