This excessive inter-AZ data transfer pricing is distorting engineering best practices. It _should_ be cheap to operate HA systems across 2-3 AZs, but because of this price distortion on inter-AZ traffic charges, we lean towards designs that either silo data within an AZ, or that leverage S3 or other hosted solutions as a sort of accounting workaround (i.e. there are no data transfer charges to read/write an S3 bucket from any AZ in the same region).
While AWS egress pricing gets a lot of attention, I think that the high cost of inter-AZ traffic is much less defensible. This is transfer on short fat pipes completely owned by Amazon. And at $0.01/GB, that's 2~10X what smaller providers charge for _internet_ egress.
However, I do work for a company with >1 million servers, and scaling inter-datacentre bandwidth is quite hard. Sure, the datacentres might be geographically close, but laying network cable over distance is expensive. Moreover, unless you spend uber millions, you're never going to get as much bandwidth as you have inside the datacentre.
So you either apply hard limits per account, or price it so that people think twice about using it.
In Ashburn, VA, I can buy Dark Fiber for $750 MRC to any datacenter in the same city. I can buy Dark Fiber for $3-5K MRC to any random building in the same city.
That Duplex Dark Fiber with DWDM can run 4 Tbps of capacity at 100GE (40x 100GE). Each 100GE transceiver costs $2-4K NRC depending on the manufacturer - $160K NRC for 40x. (There are higher densities as well, like 200/400/800GE; 100GE is just getting cheap.)
In AWS, utilizing 1x 100GE will cost you >$1MM MRC. For significantly less than that - let's say an absolutely worst-case $5K MRC + $200K NRC - you can get 40x 100GE.
Now you have extra money for 4x redundancy, fancy routers, over-spec'd servers, world-class talent, and maybe a yacht if your heart desires.
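A quick back-of-envelope check on that ">$1MM MRC" figure. The $0.01/GB-billed-on-each-side rate and full link saturation are my assumptions here, not quoted AWS prices, so verify against current pricing:

```python
# Rough cost of saturating a 100GE link at assumed AWS inter-AZ rates.
# Assumption: $0.01/GB charged on each side of the transfer, i.e.
# $0.02/GB total. Not a quote of current AWS pricing.
gbps = 100
seconds_per_month = 30 * 24 * 3600            # 2,592,000 s
gb_one_way = gbps / 8 * seconds_per_month     # 32,400,000 GB/month
cost_per_gb = 0.02                            # $0.01 in + $0.01 out

one_way = gb_one_way * cost_per_gb
full_duplex = 2 * one_way                     # saturated in both directions
print(f"${one_way:,.0f}/mo one way, ${full_duplex:,.0f}/mo full duplex")
# -> $648,000/mo one way, $1,296,000/mo full duplex
```

So a one-direction saturated link is already ~$650K/month, and a full-duplex one clears the $1MM mark.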
I’m just throwing out a hypothetical, so I may be completely off base: perhaps AWS charges high inter-AZ bandwidth prices to keep users from tunneling traffic between availability zones to arbitrage lower Internet/egress costs at AZ 1 vs AZ 3.
Outside of my statement above, I do agree that the cost Amazon pays for bandwidth between their sites has to be practically nothing at their scale/size (and thus they should charge their customers very little for it, especially considering that easy multi-AZ is a big differentiator for cloud vs self-hosting/colo). The user above’s dark fiber MRC prices are spot on.
OK but $10/TB has gotta be like >99% profit margin for AWS. After massively jacking up their prices, Hetzner internet egress is only €1/TB. Also AWS encourages / in some cases practically forces you to do multi-AZ.
I remember switching to autoscaling spot instances to save a few bucks, then occasionally spot spinup would fail due to lack of availability within an AZ so I enabled multi-AZ spot. Then got hit with the inter-AZ bandwidth charges and wasn't actually saving any money vs single-AZ reserved. This was about the point I decided DIY Kubernetes was simpler to reason about.
Apples and oranges. Hetzner doesn't even have multiple AZs by AWS's definition - all of Hetzner's DCs, e.g. Falkenstein 1-14, would be in the same AZ.
AWS network is designed with a lot more internal capacity and reliability than Hetzner which costs a lot more - multiple uplinks to independent switches, etc.
AWS is also buying current-gen network gear, which is much more pricey - Hetzner is mostly doing 1 gig ports, or 10 gig at a push, which means they can get away with >10 year old switches (if you think they buy new switches, I have a bridge you might be interested in buying).
I agree with this post that Hetzner is a bad example. They are focused on a budget deployment.
I do not agree that a state-of-the-art high capacity deployment is as expensive as you think it is. If an organization pays MSRP on everything, has awful procurement with nonexistent negotiation, and multiple project failures, sure, maybe. In the real world though, we're not all working for the federal government (-:
While your caveats are all noteworthy, I'll add that Hetzner also offers unlimited/free bandwidth between their datacenters in Germany and Finland. That's sort of like AWS offering free data transfer between us-east-1 and us-east-2.
I assume it’s to discourage people from architecting designs that abuse the network. A good example would be collecting every single metric, every second, across every instance for no real business reason.
Price discrimination is when you charge different customers different amounts for the same thing. And usually the difference in those prices is not made apparent - like when travel websites quote iOS users higher prices than Android users because they can generally afford to pay more.
It's a bit of a mix, but price discrimination isn't far off. It's like the SSO tax; all organizations are paying for effectively the same service, but the provider has found a minor way to cripple the service that selectively targets people who can afford to pay more.
If we want to call this just regular ole pricing, it's not a leap to call most textbook cases of price discrimination "regular ole pricing" as well. An online game charges more if your IP is from a certain geography? That's not discrimination; we've simply priced the product differently if you live in Silicon Valley; don't buy it if you don't want it.
Price discrimination has a clear definition. It’s not illegal in the US (when consumers are the victims anyway) but it has a clear meaning and you’re blurring the lines for I’m not sure what reason. Your example of a video game doing regional pricing is a perfect example of textbook price discrimination.
> Price discrimination ("differential pricing",[1][2] "equity pricing", "preferential pricing",[3] "dual pricing",[4] "tiered pricing",[5] and "surveillance pricing"[6]) is a microeconomic pricing strategy where identical or largely similar goods or services are sold at different prices by the same provider to different buyers based on which market segment they are perceived to be part of.
That sounds exactly like what is happening here.
Intra-zone and inter-zone network traffic are two very similar services. One is free and one costs 1¢ per GB. And customers who need inter-AZ traffic are probably in a different market segment. Now, it is more expensive for AWS to build the infrastructure for inter-zone networking, so it isn't exclusively price discrimination, but assuming that getting more money from wealthier clients was a motivation, it seems to match the definition to me.
Re: excludability, yes it is excludable since there is a price, but that doesn't have much to do with how the price is much higher than the cost to AWS for providing the service.
Alright, let's take a look at that first link as if it's gospel. AWS charging excessively for inter-AZ networking is:
1. a microeconomic pricing strategy
2. where largely similar goods (AWS with or without substantial inter-AZ bandwidth)
3. are sold at different prices (excessive inter-AZ networking fees) to different buyers
4. based on perceived market segments (most customers don't need (or don't know they need till they're locked in) much inter-AZ bandwidth, but larger, richer corporations likely do)
I'm not trying to blur the lines. On top of any juggling of our favorite sources of definitions, that particular pricing strategy has all the qualitative hallmarks of price discrimination. Everyone still buys AWS, most customers are unaffected by the lack of bulk inter-AZ bandwidth, and AWS can successfully charge much more to those who can afford to pay.
Sending traffic between AZs doesn’t necessarily improve availability and can decrease it. Each of your services can be multi-AZ, but with hosts that talk only to other endpoints in their own AZ.
Unless your app is completely stateless, you will need some level of communication across AZs.
And you often want cross-zone routing on your load balancers so that if you lose all the instances in one AZ traffic will still get routed to healthy instances.
It very much is, because scaling bandwidth between physical datacenters which are not located next to each other is very expensive. So pricing it means that people don't use it as much as they would if it were free.
> It _should_ be cheap to operate HA systems across 2-3 AZs,
In the steady state. HA systems tend towards large data bursts when failures or upgrades occur.
> And at $0.01/GB, that's 2~10X what smaller providers charge for _internet_ egress.
It's a lower latency network with a high SLA and automatic credits if the SLA isn't maintained. I think the inter-AZ option provides a level of service that's much higher than what most people want or need.
It might be nice if there was a "best effort" inter-AZ network. This would probably fit better with the synchronization methods built into most HA software anyways.
So, to me, it's a good product, it's just designed for a very niche segment of the market and often mistaken for something more general than it actually is.
I've used VictoriaMetrics in the past (~4 years ago) for collecting not just service monitoring data but also network switch and cell tower module metrics. At the time I found it to be the most efficient Prometheus-like service in terms of query speed, data compression and, more importantly, the ability to handle high cardinality (10s or 100s of millions of series).
However, I later switched to ClickHouse because I needed the extra flexibility of running occasional async updates or deletes. In VictoriaMetrics you usually need to wipe out the entire series and re-ingest it. That may not be possible, or would be quite annoying, if you are dealing with a long history and just want to update/delete some bad data in one month.
So, if you want a more efficient Prometheus drop-in replacement and don't think the limited update/delete ability is an issue, then I highly recommend VictoriaMetrics. Otherwise, ClickHouse (larger scale) or Timescale (smaller scale) has been my go-to for anything time series.
When I see a license on a project I expect that project will provide the code under that license and function fully at runtime, not play games of "Speak to a sales rep to flip that bit or three to enable that codepath".
I find it frustrating that it is not immediately clear it is open core (in which case we shall never touch it, as per our lawyers). So hopefully people will keep commenting on that.
I'd love to see a comparison with Mimir. Some of the problems that this article describes with Prometheus are also solved by Mimir. I'm running it in single binary mode, and everything is stored in S3. I'm deploying Prometheus in agent mode so it just scrapes and remote writes to Mimir, but doesn't store anything. The helm chart is a bit hairy because I have to use a fork for single binary mode, but it has actually been extremely stable and cheap to run. The same AZ cost saving rules apply, but my traffic is low enough right now for it not to matter. But I suppose I could also run ingesters per AZ to eliminate cross-AZ traffic.
I was on a team once where we ran agent-mode Prometheus into a Mimir cluster and it was endless pain and suffering.
Parts of it would time out and blow up, one of the dozen components (slight hyperbole) they have you run would go down and take half the cluster with it. It often had to be nursed back to health by hand, it was expensive to run, and queries weren't even that fast.
Absolutely would not repeat the experience. We cheered the afternoon we landed the PR to dump it.
I definitely think that running the microservices deployment of Mimir (and Loki) looks hairy. But the monolithic deployments can handle pretty large volumes.
Interesting. I'm fairly new to the field, but would this configuration help reduce the cost of logging security events from multiple zones/regions/providers to a colocated cluster?
Not really. On AWS, you're always going to pay an egress cost to get those logs out of AWS to your colo. If you were to ship your security logs to S3 and host your security log indexing and search services on EC2 within the same AWS region as the S3 bucket, you wouldn't have to worry about egress.
> while it’s tempting to use the infinitely-scalable object storage (like S3), the good old block storage is just cheaper and more performant
How is it cheaper? Object storage is cheaper per GB. Does using S3 have another component that is more expensive, maybe a caching layer? Is the storage format significantly less efficient? Are you not using a VPC endpoint to avoid egress charges?
You are correct that storage is cheaper in S3, but S3 charges per request to GET, LIST, POST, COPY, etc objects in your bucket. Block storage can be cheaper when you are frequently modifying or querying your data.
It is, but it's not _that_ many. AWS pricing is complicated, but for fairly standard services and assuming bulk discounts at the ~100TB level, your break-even points for requests/network vs storage happen at:
1. (modifications) 4200 requests per GB stored per month
2. (bandwidth) Updating each byte more than once every 70 days
You'll hit the break-even sooner, typically, since you incur both bandwidth and request charges.
That might sound like a lot, but updating some byte in each 250KB chunk of your data once a month isn't that hard to imagine. Say each user has 1KB of data, 1% are active each month, and you record login data. You'll have 2.5x the break-even request count and pay 2.5x more for requests than storage, and that's only considering the mutations, not the accesses.
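To make those break-even points concrete, here's the arithmetic as a sketch. The ~$0.021/GB-month storage, $0.005-per-1K-request, and ~$0.05/GB transfer prices are illustrative bulk-discount assumptions on my part, not quoted AWS rates:

```python
# Break-even sketch: S3 request/bandwidth spend vs storage spend.
# All prices are assumed bulk-discount figures (~100TB level);
# substitute the real numbers from your own bill.
storage = 0.021        # $/GB-month stored
put = 0.005 / 1000     # $/request (PUT/COPY/POST/LIST class)
transfer = 0.05        # $/GB moved

# 1. Requests per stored GB per month where request spend == storage spend
req_break_even = storage / put

# 2. Days between full rewrites at which transfer spend == storage spend
days_per_full_rewrite = 30 * transfer / storage

print(round(req_break_even), round(days_per_full_rewrite))
# -> 4200 71
```

That's where the "4200 requests per GB per month" and "once every ~70 days" figures come from. The login example works the same way: 1GB holds ~1M users at 1KB each, so 1% monthly activity is ~10,000 requests per stored GB, about 2.5x the break-even.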
You can reduce request costs (not bandwidth though) if you can batch them, but that's not even slightly tenable till a certain scale because of latency, and even when it is you might find that user satisfaction and retention are more expensive than the extra requests you're trying to avoid. Batching is a tool to reduce costs for offline workloads.
Ok, there are definitely cases where it would be more expensive, like using it for user login data.
But for metrics, like you would use for prometheus:
- Data is usually write-once/append-only. There isn't usually any reason to modify the metrics after you have recorded them.
- The bulk of your data isn't going to be used very often. It will probably be processed by monitors/alerts, and maybe the most recent data will be shown in a dashboard (and that data could be cached on disk or in memory). But most of it is just going to sit there until you need it for an ad-hoc query, and you should probably have an index to reduce how much data you need to read for those.
- This metrics data is very amenable to batching. You do probably want to make recent data available from memory or disk for alerts, dashboards, queries, etc. But for longer term storage it is very reasonable to use chunks of at least several megabytes. If your metrics volume is low enough that you have to use tiny objects, then you probably aren't storing enough to be worried about the cost anyway.
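The batching point is easy to quantify. Assuming an illustrative $0.005-per-1K-PUT price (my assumption, not a quoted rate) and 100 GB/month of ingested metrics:

```python
# Request-cost gap between tiny objects and batched chunks when
# writing metrics to an object store. The PUT price and the 100 GB/mo
# ingest volume are assumptions for illustration only.
put_cost = 0.005 / 1000                        # $/request
data_bytes = 100 * 10**9                       # 100 GB of metrics per month

tiny = data_bytes / 1024 * put_cost            # one PUT per 1 KB object
chunked = data_bytes / (8 * 2**20) * put_cost  # one PUT per 8 MB chunk
print(f"tiny: ${tiny:,.2f}/mo, chunked: ${chunked:.2f}/mo")
# -> tiny: $488.28/mo, chunked: $0.06/mo
```

Four orders of magnitude, which is why several-MB chunks for long-term storage make the request charges a rounding error.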
I like the new generation of metric database, but are there systems that allow for true distributed deployments? E.g. where some machines might be offline for a few days, and need to sync (send some/receive some) metrics with other machines when back online?