Mailing List Archive

AWS S3 DNS load balancer
They seem to do something a little unusual where every DNS request provides a different IP out of a small pool with those IPs not changing very frequently. (I’m talking specifically about S3 not Route5x or whatever the DNS product is).

Basically like round robin, but instead of providing all of the IPs they are only offering one. This eliminates options for the client DNS resolvers, but may make some things more deterministic.

Is this a “normal” or expected solution or just some local hackery?

Thanks in advance,

DJ
Re: AWS S3 DNS load balancer [ In reply to ]
On Tue, 2021-06-15 at 11:37 +0000, Deepak Jain wrote:
> (I’m talking specifically about S3 not Route5x or whatever the DNS
> product is).

Route53.

Not sure what you mean by "S3 DNS". I wasn't aware S3 had any DNS
functionality at all... on the other hand, there is much indeed that I
do not know.

Regards, K.

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karl Auer (kauer@biplane.com.au)
http://www.biplane.com.au/kauer
Re: AWS S3 DNS load balancer [ In reply to ]
The IP addresses for S3 do not change very often, and are region specific (as you would expect).

You are correct that this can cause problems for clients that never re-resolve (e.g. Java with networkaddress.cache.ttl=-1).

You may be interested in the (periodically updated) list of AWS IP ranges by using their IP ranges JSON API. Refer to:
* https://ip-ranges.amazonaws.com/ip-ranges.json
* https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html

To get all S3 IP ranges currently in use:
"""
curl -sf 'https://ip-ranges.amazonaws.com/ip-ranges.json' \
| jq '.prefixes | map(select(.service == "S3"))'
"""

To get all S3 IP ranges in your region:
"""
curl -sf 'https://ip-ranges.amazonaws.com/ip-ranges.json' \
| jq '.prefixes | map(select(.service == "S3" and .region == "eu-central-1"))'
"""

These ranges are not (to my knowledge) queryable via DNS.
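
As a follow-on to the jq queries above, here is a minimal sketch of checking whether a single address (say, one returned by dig) falls inside a list of prefixes. The function names ip_to_int, ip_in_cidr and matching_prefixes are made up for this illustration, it handles IPv4 only, and the prefix used in the closing comment is a hypothetical example rather than a confirmed S3 range.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1                 # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit 0) if address $1 lies inside CIDR prefix $2.
ip_in_cidr() {
    net=${2%/*}               # network part, before the slash
    bits=${2#*/}              # prefix length, after the slash
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# Read one IPv4 CIDR per line on stdin; print those that contain address $1.
matching_prefixes() {
    while IFS= read -r cidr; do
        [ -n "$cidr" ] || continue
        if ip_in_cidr "$1" "$cidr"; then
            echo "$cidr"
        fi
    done
}

# Example (hypothetical prefix, not taken from the real list):
#   printf '52.216.0.0/15\n' | matching_prefixes 52.216.105.101
```

To run it against the published list, the prefixes can be fed in one per line, for example from
curl -sf 'https://ip-ranges.amazonaws.com/ip-ranges.json' | jq -r '.prefixes[] | select(.service == "S3") | .ip_prefix'
piped into matching_prefixes with the address of interest as its argument.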

In terms of this as a general behaviour, it is not uncommon. If I remember correctly this is how Route53 weighted records are implemented. So at least anyone using that feature of Route53 would be doing the same.

Kind regards,

Toby Lorne

Re: AWS S3 DNS load balancer [ In reply to ]
On Tue, Jun 15, 2021 at 8:07 AM Karl Auer <kauer@biplane.com.au> wrote:

> On Tue, 2021-06-15 at 11:37 +0000, Deepak Jain wrote:
> > (I’m talking specifically about S3 not Route5x or whatever the DNS
> > product is).
>
> Route53.
>
> Not sure what you mean by "S3 DNS". I wasn't aware S3 had any DNS
> functionality at all... on the other hand, there is much indeed that I
> do not know.
>
>
Maybe Deepak means:
"When I ask for an S3 endpoint I get 1 answer, which is 1 of a set of N.
Why would the 'loadbalancer' send me all N?"

(I don't know an AWS S3 URL to test this out with; an example from Deepak
would be handy.)


Re: AWS S3 DNS load balancer [ In reply to ]
On Tue, 2021-06-15 at 10:33 -0400, Christopher Morrow wrote:
> Maybe Deepak means:
> "When I ask for an S3 endpoint I get 1 answer, which is 1 of a set
> of N.
> Why would
> the 'loadbalancer' send me all N?"

I've just taken a squiz at an S3-based website we have, and via the S3
URL it is a CNAME with a 60-second TTL pointing at a set of A records
with 5-second TTLs.

Any one dig returns the CNAME and a single IP address:

dig our-domain.s3-website-ap-southeast-2.amazonaws.com.
our-domain.s3-website-ap-southeast-2.amazonaws.com. 14 IN CNAME s3-website-ap-southeast-2.amazonaws.com.
s3-website-ap-southeast-2.amazonaws.com. 5 IN A 52.95.134.145

If the query is repeated, the returned IP address changes,
roughly every five seconds.

What's interesting is the name attached to the A records, which does
not include "our-domain". It seems to be a record pointing to ALL S3
websites in the region. And all of the addresses I saw reverse-resolve
to that one name. So there is definitely some under-the-bonnet magic
discrimination going on.

In Route53 the picture is very different, with the published website
host name (think "our-domain.com.au") resolving to four IP addresses
that are all returned in the response to a single dig query. There is
an A-ALIAS (a non-standard AWS record type) that points to a CloudFront
distribution that has the relevant S3 bucket as its origin.

Using the CNAME bypasses the CloudFront distribution unless steps are
taken to forbid direct access to the bucket. It would be usual to use
(and enforce) access via CloudFront, if for no other reason than to
provide for HTTPS access.

Regards, K.


--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Karl Auer (kauer@biplane.com.au)
http://www.biplane.com.au/kauer
RE: AWS S3 DNS load balancer [ In reply to ]
> Maybe Deepak means:
> "When I ask for an S3 endpoint I get 1 answer, which is 1 of a set of N.
> Why would the 'loadbalancer' send me all N?"
>
> (I don't know a aws s3 url to test this out with, an example from Deepak would be handy)

First, thanks for translating “Deepak” for everyone.
Second, I was in the back of a car, so I didn’t have a convenient dig prompt. I considered it, but went for it anyway. I’ll blame the time of day and a lack of caffeine.
You’ll see from the time stamps that these were done in rapid succession at a command prompt. Even though I used 8.8.8.8, I can replicate the results with a single recursive server. I just wanted something easy for anyone to replicate.
[Deleted the dig information. For giggles, run:
dig @8.8.8.8 s3.amazonaws.com
a few times in rapid succession.]
The TL;DR is that I got the set of IPs below. With more runs, I might get more. There is an obvious operational impact here: say AWS is doing geo-based load balancing and spitting things out, while networks with eyeballs are doing their own traffic management and trying to take the shortest paths to things. Responsible operators want to minimize undesirable and non-deterministic behaviors.
s3.amazonaws.com. 3 IN A 52.216.105.101
s3.amazonaws.com. 1 IN A 52.216.171.13
s3.amazonaws.com. 2 IN A 52.216.236.45
s3.amazonaws.com. 2 IN A 52.216.105.101
s3.amazonaws.com. 2 IN A 52.216.138.197
s3.amazonaws.com. 2 IN A 52.217.107.14
s3.amazonaws.com. 3 IN A 52.216.206.53
s3.amazonaws.com. 2 IN A 52.217.129.32
s3.amazonaws.com. 1 IN A 52.216.236.45
s3.amazonaws.com. 3 IN A 52.216.243.22
The question is: how are they programmatically spitting out one IP from their pool? There are a lot of reasons why someone may want this, particularly to manage *other* people geo-basing their transport, but is this a local hack or a feature of one of the major auth-DNS packages? If it’s local hackery, trying to manage for it becomes a thankless activity. If there is a standard or published method, then the feedback loop stuff can be curtailed.
Thanks again!
Deepak
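
For anyone repeating the experiment, a minimal sketch of tallying the answers rather than eyeballing them. The function name tally_answers is made up for this illustration, and it assumes the usual five-column dig answer layout (name, TTL, class, type, address) shown in the list above. Fed the ten answers above, it reports eight distinct addresses, two of which were seen twice.

```shell
#!/bin/sh
# Read dig answer lines ("name TTL IN A address") on stdin and print
# "count address" for each distinct address, most frequent first.
tally_answers() {
    awk '$3 == "IN" && $4 == "A" { seen[$5]++ }
         END { for (ip in seen) print seen[ip], ip }' |
        sort -k1,1nr -k2,2
}

# Example (needs network access, so it is left commented out):
#   for i in 1 2 3 4 5 6 7 8 9 10; do
#       dig +noall +answer @8.8.8.8 s3.amazonaws.com A
#   done | tally_answers
```

The +noall +answer options limit dig's output to the answer section, so only the record lines reach the function.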
Re: AWS S3 DNS load balancer [ In reply to ]
Hello,

On Tue, 15 Jun 2021 at 13:37, Deepak Jain <deepak@ai.net> wrote:
> Is this a “normal” or expected solution or just some local hackery?

It's absolutely normal and expected for a huge service like this to
keep round robin on the DNS server side. YMMV with client-side
DNS-based round robin (Amazon needs to be in control, not your client
application), and steering traffic from one edge location or host to
another is perfectly legitimate. Also, as the provider of such a huge
service you likely want to keep breaking connections from applications
with clearly hardcoded (or "resolve at startup only") IP addresses, so
that client applications never adopt this approach (in the long term
at least). After all, as a service provider you want to avoid hitting
the news cycle over a legitimate DNS change just because you make such
changes rarely and this one triggered a myriad of simultaneous outages
in broken customer applications. So they just do it often, or all the
time.

Amazon needs to stay in control of what edge nodes and locations the
clients are hitting, just like CDNs and other endpoints with major
traffic volumes.


None of this is local hackery, it's just basic DNS.


Lukas
Re: AWS S3 DNS load balancer [ In reply to ]
Hello,


> AWS is doing Geo-based load balancing and spitting things out,
> and networks with eyeballs are doing their own things for traffic
> management and trying to do shortest paths to things – and responsible
> operators want to minimize the non-desirable and non-deterministic
> behaviors.

You can't use DNS to get "all" service IPs of a service like S3 or a
CDN for traffic engineering purposes. That will not work, ever (for
services of such scale).

The hackery is assuming you can build a list of service IPs by querying DNS.


> There are a lot of reasons why someone may want this… particularly
> to manage *other* people geo-basing their transport, but is this a
> local hack or is this a feature of one of the major auth-DNS packages.
> If its local hackery, trying to manage for it becomes a thankless activity.

CDNs and huge services work like this, and they use the standardized
tools like DNS they have at their disposal.

Building lists of service IPs from DNS is what the "local-hackery" here is.


Toby explained the proper way to get the IP ranges. It's not via DNS,
it never was.


Lukas
Re: AWS S3 DNS load balancer [ In reply to ]
Hi Deepak.

Amazon documents the IPs for their public and private cloud services:
https://docs.aws.amazon.com/general/latest/gr/aws-ip-ranges.html

(I know this because Batfish uses these in its reachability analysis, for
example, "Make sure all outgoing flows to S3 are permitted by the
firewall".)

Thanks,
Dan

RE: AWS S3 DNS load balancer [ In reply to ]
> I've just taken a squiz at an S3-based website we have, and via the S3 URL it is a CNAME with a 60-second TTL pointing at a set of A records with 5-second TTLs.
>
> Any one dig returns the CNAME and a single IP address:
>
> dig our-domain.s3-website-ap-southeast-2.amazonaws.com.
> our-domain.s3-website-ap-southeast-2.amazonaws.com. 14 IN CNAME s3-website-ap-southeast-2.amazonaws.com.
> s3-website-ap-southeast-2.amazonaws.com. 5 IN A 52.95.134.145
>
> If the query is multiply repeated, the returned IP address changes, roughly every five seconds.
>
> What's interesting is the name attached to the A records, which does not include "our-domain". It seems to be a record pointing to ALL S3 websites in the region. And all of the addresses I saw reverse-resolve to that one name. So there is definitely some under-the-bonnet magic discrimination going on.
>
> In Route53 the picture is very different, with the published website host name (think "our-domain.com.au") resolving to four IP addresses that are all returned in the response to a single dig query. There is an A-ALIAS (a non-standard AWS record type) that points to a CloudFront distribution that has the relevant S3 bucket as its origin.
>
> Using the CNAME bypasses the CloudFront distribution unless steps are taken to forbid direct access to the bucket. It would be usual to use (and enforce) access via CloudFront, if for no other reason than to provide for HTTPS access.

So, depending on what query you make, you get very different answers. For example, if you try s3.amazon.com you get a CNAME to rewrite.amazon.com, which seems reasonable for any subdomain request that they would have a better response for.

I don't remember, and they may be moving to deterministic subdomains as you've shown above, and only "legacy" uses go to s3.amazonaws.com. I remember hearing a big uproar about it. Perhaps an AWS person will chime in with some color on this.

So deterministic subdomain to a group of relatively deterministic endpoints, even round-robin, makes sense to me as in... "usual in the practice of the art." Even if those systems end up being load balancers for other systems behind them.

The s3.amazonaws.com is different than that. I'm guessing that no one (else) uses this sort of single IP from a pool trick and therefore it's not standard. Further, given that AWS appears to be moving *back* to the traditional way of doing things, there must be undesirable limitations to this model.

[just spitballing here]

Deepak
RE: AWS S3 DNS load balancer [ In reply to ]
> You can't use DNS to get "all" service IPs of a service like S3 or a CDN for traffic engineering purposes. That will not work, ever (for services of such scale).
>
> The hackery is assuming you can build a list of service IPs by querying DNS.
>
> > There are a lot of reasons why someone may want this… particularly to
> > manage *other* people geo-basing their transport, but is this a local
> > hack or is this a feature of one of the major auth-DNS packages.
> > If its local hackery, trying to manage for it becomes a thankless activity.
>
> CDNs and huge services work like this, and they use the standardized tools like DNS they have at their disposal.
>
> Building lists of service IPs from DNS is what the "local-hackery" here is.
>
> Toby explained the proper way to get the IP ranges. It's not via DNS, it never was.

I'm not sure where you got the idea that I wanted a list of all of their IPs. Sorry for any confusion and any offense at using the word "hackery" in a way you deemed inappropriate.

Deepak
Re: AWS S3 DNS load balancer [ In reply to ]
On Tue, Jun 15, 2021 at 10:33 AM Christopher Morrow <morrowc.lists@gmail.com>
wrote:

>
> On Tue, Jun 15, 2021 at 8:07 AM Karl Auer <kauer@biplane.com.au> wrote:
>
>> On Tue, 2021-06-15 at 11:37 +0000, Deepak Jain wrote:
>> > (I’m talking specifically about S3 not Route5x or whatever the DNS
>> > product is).
>>
>> Route53.
>>
>> Not sure what you mean by "S3 DNS". I wasn't aware S3 had any DNS
>> functionality at all... on the other hand, there is much indeed that I
>> do not know.
>>
>>
> Maybe Deepak means:
> "When I ask for an S3 endpoint I get 1 answer, which is 1 of a set of N.
> Why would
> the 'loadbalancer' send me all N?"
>
> (I don't know a aws s3 url to test this out with, an example from Deepak
> would be handy)
>
>

also, just for grins:
$ while /bin/true; do dig +short s3.amazonaws.com @ns-63.awsdns-07.com. >> /tmp/aws; sleep 1; done

after a time:
$ wc -l /tmp/aws
17787 /tmp/aws

and:
$ sort -n /tmp/aws | uniq -c | sort -rn | wc -l
6457

Some of the results appear ~11 times? most likely only 1x.
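
To put numbers on "how many addresses show up how often", here is a rough sketch that turns a file of one address per line (such as /tmp/aws above) into a small histogram. The function name frequency_histogram is made up for this illustration; each output line "N K" means K distinct addresses were each seen exactly N times.

```shell
#!/bin/sh
# Read one address per line on stdin and print "N K" lines, meaning
# K distinct addresses were each seen exactly N times.
frequency_histogram() {
    grep -v '^$' |            # drop empty lines left by failed queries
        sort | uniq -c |
        awk '{ hist[$1]++ } END { for (n in hist) print n, hist[n] }' |
        sort -n
}

# Example: frequency_histogram < /tmp/aws
```

Summing the K column gives the number of distinct addresses, which should agree with the wc -l figure above as long as the file contains no empty lines.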


Re: AWS S3 DNS load balancer [ In reply to ]
S3 is not the only service where you will see a single IP returned
for a DNS query; www.microsoft.com and www.apple.com (via Akamai) do the
same - see further below.

When you look up <bucketname>.s3.amazonaws.com you get back an answer that
directs you to the correct region where the S3 bucket is located. For
example test.s3.amazonaws.com points to s3-w.us-east-1.amazonaws.com
because the bucket exists in the us-east-1 region.
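
A small sketch of reading the region back out of such a CNAME target. The function name region_from_target is made up for this illustration, and the hostname layout it matches (a region label immediately before ".amazonaws.com") is an assumption based only on the names that appear in this thread, not a documented contract.

```shell
#!/bin/sh
# Print the region embedded in an S3 endpoint name, or nothing if the
# name does not end in "<region>.amazonaws.com" (assumed layout).
region_from_target() {
    printf '%s\n' "$1" |
        sed -nE 's/^.*[.-]([a-z]{2}(-[a-z]+)+-[0-9]+)\.amazonaws\.com\.?$/\1/p'
}

# Example (needs network access, so it is left commented out):
#   region_from_target "$(dig +short test.s3.amazonaws.com CNAME)"
```

The global name s3.amazonaws.com carries no region label, so the function prints nothing for it.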

The endpoints are listed at
https://docs.aws.amazon.com/general/latest/gr/s3.html and the DNS format is
described at
https://docs.aws.amazon.com/AmazonS3/latest/userguide/WebsiteEndpoints.html


$ dig www.microsoft.com
...
;; ANSWER SECTION:
www.microsoft.com. 1281 IN CNAME www.microsoft.com-c-3.edgekey.net.
www.microsoft.com-c-3.edgekey.net. 664 IN CNAME www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net.
www.microsoft.com-c-3.edgekey.net.globalredir.akadns.net. 664 IN CNAME e13678.dscb.akamaiedge.net.
e13678.dscb.akamaiedge.net. 19 IN A 23.40.73.65

$ dig www.apple.com
...
;; ANSWER SECTION:
www.apple.com. 368 IN CNAME www.apple.com.edgekey.net.
www.apple.com.edgekey.net. 8138 IN CNAME www.apple.com.edgekey.net.globalredir.akadns.net.
www.apple.com.edgekey.net.globalredir.akadns.net. 2168 IN CNAME e6858.dscx.akamaiedge.net.
e6858.dscx.akamaiedge.net. 14 IN A 23.1.23.66


Disclaimer: I work for AWS.

Regards,
Andras

