
Your AWS Bill Isn’t Wrong - It’s Just Following Defaults


There’s a pattern I’ve seen often enough that it no longer feels like a coincidence.

A team builds something sensible. They separate environments properly - Development, Acceptance, Production.
They follow best practices: tagging, scaling, monitoring… the works.

And then, a few months later, someone opens the bill and asks:

“Why is this still costing so much when nothing’s really happening?”

That’s the interesting part.

Because in most cases, nothing is happening.

At least not in the way the system was originally designed.

The workloads are quiet. The deployments are stable.
The traffic is predictable.

But the infrastructure?

That’s still behaving like it’s launch day.


This article looks at three places where AWS defaults quietly keep things running (and billing), even when your workload has moved on:

  • Compute that doesn’t know when to rest

  • Logs that remember more than anyone reads

  • Storage that outlives its purpose

And more importantly - what it looks like when you design with intent instead.


1) Compute That Never Learned When to Stop

Let’s start with compute, because it’s the easiest to reason about (and the easiest to ignore).

The Shape of a Real Environment

In most modern AWS setups we have something like:

Development

  • A few small EC2 instances

  • Created and destroyed somewhat unpredictably

  • Often left running longer than intended

Acceptance

  • Auto Scaling groups

  • Lower baseline than production

  • Scales up during testing, then… doesn’t always scale back down

Production

  • Auto Scaling done properly (usually)

  • But includes non-customer-facing components that don’t need 24/7 uptime

So the problem isn’t just “instances running too long.”

It’s:

Parts of the system behaving like they’re always needed, even when they’re not.

On a side note: one could also do this with RDS, but for the purposes of this blog I will stick to EC2.


Where Scheduling Still Matters (Even with Auto Scaling)

There’s a common assumption:

“We use Auto Scaling, so we’re already optimized.”

That’s only partially true.

Auto Scaling optimises for load.
It does not optimise (by default) for:

  • Time-of-day usage patterns

  • Human working hours

  • Internal systems that nobody uses at night

So you still end up with:

  • Baseline capacity that never drops to zero

  • Dev instances quietly running overnight

  • Acceptance environments idling between test cycles


The Pattern: Scheduled Intent

The solution is not replacing Auto Scaling.

It’s complementing it.

  • Dev EC2 → scheduled stop/start

  • Acceptance ASG → scheduled min capacity = 0 off-hours

  • Prod (select workloads) → scheduled scale-down or shutdown

Using:

  • EventBridge schedules

  • Lambda functions

  • Tag-driven targeting
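The tag-driven targeting piece can be sketched in a few lines. Everything here is an assumption about your own conventions - the `Schedule=office-hours` tag and the 07:00–19:00 UTC window are illustrative - and the real boto3 calls (`describe_instances`, `stop_instances`) are only referenced in comments so the decision logic stays testable on its own:

```python
from datetime import datetime, timezone

# Illustrative tag convention (an assumption, not an AWS standard):
# instances tagged Schedule=office-hours should only run 07:00-19:00 UTC
# on weekdays.
OFFICE_HOURS = range(7, 19)

def should_be_stopped(tags: dict, now: datetime) -> bool:
    """Decide whether a scheduled instance is outside its window."""
    if tags.get("Schedule") != "office-hours":
        return False  # no schedule tag -> leave it alone
    return now.hour not in OFFICE_HOURS or now.weekday() >= 5  # 5 = Saturday

def select_stop_targets(instances: list, now: datetime) -> list:
    """Return IDs of running instances due for a stop. In a real Lambda,
    `instances` comes from ec2.describe_instances() and the result goes
    to ec2.stop_instances(InstanceIds=...)."""
    return [
        i["InstanceId"]
        for i in instances
        if i["State"] == "running" and should_be_stopped(i["Tags"], now)
    ]

fleet = [
    {"InstanceId": "i-dev1", "State": "running", "Tags": {"Schedule": "office-hours"}},
    {"InstanceId": "i-prod1", "State": "running", "Tags": {}},
]
night = datetime(2024, 6, 1, 23, 0, tzinfo=timezone.utc)  # a Saturday night
print(select_stop_targets(fleet, night))  # -> ['i-dev1']
```

An EventBridge schedule (hourly, say) would invoke a Lambda wrapping this, and a mirror-image rule handles the morning start.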


Pricing Context (eu-west-1)

  • c7g.large: ~$0.078/hour

  • c7g.xlarge: ~$0.155/hour

Monthly saving per instance (~510 hours avoided):

  • Dev-sized: ~$40/month

  • Prod-sized: ~$79/month
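For transparency, here is one way to arrive at the ~510-hour figure (my reconstruction, not an AWS number): roughly 10 running hours on ~22 weekdays leaves ~220 hours of a ~730-hour month.

```python
# Rough origin of the "~510 hours avoided" figure (a reconstruction):
# ~10 running hours on ~22 weekdays leaves ~220 of a ~730-hour month.
hours_in_month = 730
running_hours = 10 * 22          # office hours, weekdays only
avoided = hours_in_month - running_hours
print(avoided)                   # -> 510

# Applied to the eu-west-1 rates quoted above:
for name, rate in [("c7g.large", 0.078), ("c7g.xlarge", 0.155)]:
    print(f"{name}: ~${avoided * rate:.2f}/month avoided")
```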


What This Looks Like

Small environment

  • Dev: 3 instances

  • Acc: baseline 2

  • Prod: 1 schedulable internal workload

Savings:

  • Dev: $120

  • Acc: $80

  • Prod: $79

~$280/month


Medium environment

  • Dev: ~10 instances

  • Acc: baseline ~6

  • Prod: internal workloads

~$1,100/month


Large environment

  • Dev: dozens of instances

  • Acc: multiple ASGs

  • Prod: internal services at scale

~$3,500–$4,000/month


The Subtle Realisation

Auto Scaling optimises for demand.
Scheduling optimises for the absence of demand.

You need both.


2) CloudWatch Logs: The Cost of Remembering Everything

Logs are comforting.

They make us feel prepared, responsible, and mildly in control.

They also quietly accumulate into something nobody explicitly designed.

CloudWatch Logs defaults to never expire.

Which is great - right up until you realise you’re paying to remember things nobody reads anymore.


The Pattern: Make Retention Explicit

Instead of relying on good intentions:

  • Dev → 7 days

  • Acc → 30 days

  • Prod → 90 days

And enforce it using:

  • AWS Config custom rule

  • SSM auto-remediation (or Lambda function if you need custom logic)

This turns retention into a system, not a habit.
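The remediation side reduces to a small mapping. The `Environment` tag values below are assumptions about your own tagging scheme, not AWS conventions; the actual API call, `logs.put_retention_policy`, is noted in a comment (7, 30, and 90 are all values CloudWatch Logs accepts):

```python
# Tag-driven retention mapping (the Environment tag values are assumptions
# about your own tagging scheme). A remediation Lambda would call
# logs.put_retention_policy(logGroupName=..., retentionInDays=...)
# for any log group whose current retention differs from the target.
RETENTION_DAYS = {"dev": 7, "acc": 30, "prod": 90}

def target_retention(tags: dict, default: int = 30) -> int:
    """Map a log group's Environment tag to its intended retention."""
    return RETENTION_DAYS.get(tags.get("Environment", "").lower(), default)

print(target_retention({"Environment": "prod"}))  # -> 90
print(target_retention({}))                       # -> 30 (safe default)
```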


The Important Bit Most People Miss

CloudWatch Logs has two separate cost drivers:

  1. Ingestion (writing logs)

  2. Storage (keeping them)

Retention only affects storage.

Which means:

You don’t reduce logging cost by deleting logs.
You reduce storage cost.

That distinction matters more than it should.


A Note on Compression (Because This Gets Confusing)

CloudWatch Logs stores data in a compressed format, but AWS does not publish a fixed compression ratio.

In their pricing examples, log data is often shown compressing to roughly 15–20% of its original size (roughly 5:1 to 6:1).

Actual compression depends heavily on:

  • log format (JSON vs text)

  • repetition

  • structure

For the purposes of these examples, we’ll assume ~5:1 compression.

Not because it’s exact — but because it’s a reasonable approximation for comparative analysis.


Pricing Assumptions (eu-west-1)

  • Ingestion: $0.57/GB

  • Storage: $0.03/GB-month

  • Free tier: 5 GB

  • Compression ratio of 5:1


Small Environment

Log volume

  • Dev: 5 GB/day

  • Acc: 10 GB/day

  • Prod: 30 GB/day

Total: 45 GB/day → 1,350 GB/month


Ingestion cost (unchanged by retention)

(1,350 - 5) × $0.57 = $766.65/month

This is the baseline reality.

Retention won’t fix it.


Storage with retention (7/30/90)

We compress and store:

  • Dev: (5 × 7) ÷ 5 = 7 GB

  • Acc: (10 × 30) ÷ 5 = 60 GB

  • Prod: (30 × 90) ÷ 5 = 540 GB

Total: 607 GB

(607 - 5) × $0.03 = $18.06/month


Storage without retention (~1 year)

Total stored:

  • 3,285 GB

(3,285 - 5) × $0.03 = $98.40/month


Savings

$98.40 - $18.06 = $80.34/month
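These numbers reproduce directly from the stated assumptions, which is worth checking yourself. A quick sketch using the same rates, a 30-day month, and a 365-day year:

```python
# Reproducing the small-environment numbers from the stated assumptions:
# eu-west-1 rates, ~5:1 compression, 5 GB free tier.
INGEST, STORE, FREE, COMPRESS = 0.57, 0.03, 5, 5

daily = {"dev": 5, "acc": 10, "prod": 30}        # GB/day, uncompressed
retention = {"dev": 7, "acc": 30, "prod": 90}    # days

ingestion = (sum(daily.values()) * 30 - FREE) * INGEST
print(round(ingestion, 2))        # -> 766.65 (retention can't touch this)

stored = sum(daily[env] * retention[env] / COMPRESS for env in daily)
with_retention = (stored - FREE) * STORE
print(round(with_retention, 2))   # -> 18.06

stored_year = sum(daily.values()) * 365 / COMPRESS   # ~1 year, no expiry
without = (stored_year - FREE) * STORE
print(round(without - with_retention, 2))            # -> 80.34
```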


Cost of enforcing it

Assume ~40 log groups/month:

  • Config: $0.12

  • Rule eval: $0.04

$0.16/month


Net

~$80/month saved


Medium Environment

  • 225 GB/day → 6,750 GB/month

Storage:

  • With retention: $90.90

  • Without: $492.60

Savings:

~$401/month

Governance cost:

~$0.80

Net:

~$400/month


Large Environment

  • 900 GB/day → 27,000 GB/month

Storage:

  • With retention: $364.05

  • Without: $1,970.85

Savings:

~$1,606/month

Governance:

~$4/month

Net:

~$1,602/month


What the Numbers Are Actually Saying

Two things become clear:

  1. Retention is still worth it - even though it doesn’t touch ingestion

  2. Ingestion is the real problem at scale

Which leads to the natural next step:


The Better Pattern

Split responsibilities:

  • CloudWatch → recent, searchable logs

  • S3 → long-term storage

This gives you:

  • Fast debugging

  • Cheap retention

Without paying CloudWatch rates forever.


3) Storage That Doesn’t Leave

Storage is different.

It doesn’t spike.
It doesn’t scale dynamically.
It doesn’t politely disappear when the workload that created it has moved on.

It just stays.

That’s part of why storage waste is so persistent. Compute waste tends to at least look busy. Storage waste just sits there quietly, billing with excellent emotional discipline.

There are two recurring offenders here:

  • Unattached EBS volumes

  • Snapshots with no lifecycle

They look different on the bill, but the root cause is usually the same: nobody came back to decide what should happen after the original workload stopped mattering.


3a) Unattached EBS Volumes: Paying for “Maybe We Still Need It”

Unattached EBS volumes are one of the simplest forms of waste in AWS.

They usually come from very ordinary events:

  • an EC2 instance was terminated, but the volume was retained

  • someone detached a volume during troubleshooting

  • a migration got halfway through and then became “tomorrow’s problem”

  • a test environment was cleaned up, except for the storage, which apparently was granted diplomatic immunity

The volume state in EBS for these is available, which is AWS’s way of saying: “This is attached to nothing, but still very much attached to your bill.”

The pattern

The cleanup pattern here is straightforward:

  1. Run a daily Lambda

  2. Find volumes in state available

  3. Tag newly found volumes with:

    • CleanupCandidate=true

    • FirstSeenAvailable=<date>

    • DeleteAfter=<date+7d>

  4. On subsequent runs, if the volume is still available and not exempt, delete it

That 7-day grace period matters. It keeps the automation practical rather than reckless. You want to clean up forgotten storage, not turn someone’s live troubleshooting session into a memorable learning experience.
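The decision logic from steps 2–4 is small enough to sketch. The tag names follow the convention above, `CleanupExempt` is an illustrative name for the exemption mechanism, and the real `describe_volumes`/`delete_volume` calls are left as comments so the logic stays testable:

```python
from datetime import date, timedelta

GRACE = timedelta(days=7)  # the 7-day grace period from the pattern above

def plan_action(volume: dict, today: date) -> str:
    """Decide what today's cleanup run does with an `available` volume.
    `volume` mirrors what you'd build from ec2.describe_volumes(); the
    delete itself would be ec2.delete_volume(VolumeId=...)."""
    tags = volume.get("Tags", {})
    if tags.get("CleanupExempt") == "true":   # illustrative escape hatch
        return "skip"
    first_seen = tags.get("FirstSeenAvailable")
    if first_seen is None:
        return "tag"        # first sighting: start the clock
    if today >= date.fromisoformat(first_seen) + GRACE:
        return "delete"     # grace period expired, nobody objected
    return "wait"

today = date(2024, 6, 10)
print(plan_action({"Tags": {}}, today))                                    # -> tag
print(plan_action({"Tags": {"FirstSeenAvailable": "2024-06-01"}}, today))  # -> delete
print(plan_action({"Tags": {"FirstSeenAvailable": "2024-06-08"}}, today))  # -> wait
```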

The underlying EBS pricing model is simple: you pay for provisioned storage until you release it. AWS’s own public pricing examples use $0.08/GB-month for gp3 volume storage, so that’s the rate used here.

The cost model

For the purposes of calculation, I’ll use:

Monthly unattached volume cost = Provisioned GB × $0.08

That gives us a clean, conservative baseline using AWS’s own public gp3 example rate.


Small environment

A small estate might not look dramatic at all:

  • Development: 1.0 TB of detached test volumes spread across a few teams

  • Acceptance: 0.5 TB left behind from refreshes and release validation

  • Production: 0.5 TB of retained-but-unused storage from old hosts or cutovers

That’s 2.0 TB total, or 2,048 GB.

Calculation:

2,048 GB × $0.08/GB-month = $163.84/month

So even in a fairly small environment, you’re at:

→ $163.84/month

Not catastrophic. Just pointless.

That’s usually the theme with unattached volumes: not one bad decision, but a pile of harmless-looking ones.


Medium environment

Now move to a more typical multi-team setup:

  • Development: 4 TB of ad hoc test storage and abandoned instance volumes

  • Acceptance: 3 TB from repeated environment refreshes and short-lived testing cycles

  • Production: 3 TB from older migrations, retained rollbacks, and “leave it there for now” volumes

That’s 10 TB total, or 10,240 GB.

Calculation:

10,240 GB × $0.08/GB-month = $819.20/month

So the monthly waste becomes:

→ $819.20/month

This is usually where the conversation changes.

At small scale, unattached storage is an annoyance.
At medium scale, it becomes a recurring bill for infrastructure that is literally helping nobody.


Large environment

At larger scale, unattached storage becomes less of an exception and more of a background condition:

  • Development: 20 TB across many teams, ephemeral workloads, and inconsistent clean-up

  • Acceptance: 15 TB across multiple shared services and test refreshes

  • Production: 15 TB of retained volumes from migrations, replacements, and rollback caution

That’s 50 TB total, or 51,200 GB.

Calculation:

51,200 GB × $0.08/GB-month = $4,096.00/month

So now the cost is:

→ $4,096/month

At that point, unattached EBS is no longer a housekeeping issue.
It’s a line item.

And the unpleasant thing is that the fix is still not complicated. It’s the same daily scan, just applied consistently.
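Since the cost is a flat per-GB rate, all three estates collapse into one loop (same $0.08/GB-month example rate as above):

```python
# All three estates above, under the flat $0.08/GB-month gp3 example rate.
RATE = 0.08  # $/GB-month

for name, tb in [("small", 2), ("medium", 10), ("large", 50)]:
    gb = tb * 1024
    print(f"{name}: {gb:,} GB -> ${gb * RATE:,.2f}/month")
```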


What this is really saying

The important detail with unattached volumes is that the cost is wonderfully boring.

It doesn’t depend on CPU.
It doesn’t depend on traffic.
It doesn’t depend on whether the application is doing anything useful.

You are simply paying for storage that exists.

That makes it one of the cleanest cost-optimisation targets you’ll find in AWS, because there’s very little ambiguity about whether the spend is justified. If the volume is unattached for a week and no one objected, the answer is probably no.


3b) Snapshots: The Slow Archive Nobody Meant to Build

Snapshots start with good intentions.

They are created for safety.
For rollback.
For recovery.
For “just in case.”

All reasonable things.

The problem is that snapshots are usually very easy to create and much less often given a proper lifecycle afterward. So they accumulate. Quietly. Incrementally. With a sort of patient confidence that deserves respect, if not continued funding.

Two details matter here

First, EBS snapshots in the standard tier are incremental. AWS stores only the blocks that have changed, not a full copy every time. That means snapshot cost should be based on actual billed snapshot data, not the raw size of the source volumes.

Second, if you don’t set lifecycle intentionally, you tend to keep:

  • too many daily restore points

  • too many monthly “for safety” snapshots

  • too much old data in the standard tier

And that is where Data Lifecycle Manager helps.


The pattern: Amazon Data Lifecycle Manager (DLM)

Amazon Data Lifecycle Manager lets you automate the creation, retention, and deletion of EBS snapshots based on tags. AWS documents DLM as a complete backup solution for EC2 instances and EBS volumes at no additional cost.

A practical configuration looks like this:

Tag strategy

Apply tags such as:

  • SnapshotPolicy=dev

  • SnapshotPolicy=acc

  • SnapshotPolicy=prod

This is important because it lets policy follow the resource rather than relying on manual selection, which tends to age poorly.

Retention strategy

Use different schedules by environment:

  • Development: daily snapshots, retain 7–14 days

  • Acceptance: daily snapshots, retain 30 days

  • Production: daily snapshots, retain 90 days, plus monthly snapshots retained longer

DLM supports custom policies for EBS snapshots and can automate retention and deletion.
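As a sketch of what such a policy looks like for the acceptance tier - the field names follow the shape of the DLM `create-lifecycle-policy` API, but treat this as illustrative and verify against current AWS documentation before using it:

```python
# Sketch of a DLM policy body for the acceptance tier. Field names follow
# the shape of the dlm create-lifecycle-policy API; verify against current
# AWS documentation before relying on this.
acc_policy = {
    "ResourceTypes": ["VOLUME"],
    "TargetTags": [{"Key": "SnapshotPolicy", "Value": "acc"}],
    "Schedules": [
        {
            "Name": "acc-daily",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
            "RetainRule": {"Count": 30},  # 30 daily restore points
        }
    ],
}
# With boto3 this would go to something like:
#   dlm.create_lifecycle_policy(ExecutionRoleArn=..., Description="acc tier",
#                               State="ENABLED", PolicyDetails=acc_policy)
print(acc_policy["Schedules"][0]["RetainRule"])  # -> {'Count': 30}
```

The point of the `TargetTags` block is exactly the tag strategy above: the policy follows the `SnapshotPolicy=acc` tag, not a hand-picked list of volumes.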

Archive strategy

For older snapshots that are rarely restored, use EBS Snapshot Archive.

AWS documents:

  • Standard snapshot storage: $0.05/GB-month

  • Archive storage: $0.0125/GB-month

  • Archive retrieval: $0.03/GB

  • Minimum archive period: 90 days

So the simplest cost model becomes:

Current monthly snapshot cost = Standard-tier GB × $0.05

If you move older snapshots to archive and keep only a smaller working set in standard:

New monthly cost = (Standard-tier GB kept hot × $0.05) + (Archived GB × $0.0125)

For the examples below, I’ll use a pragmatic split:

  • 25% kept in standard tier

  • 75% moved to archive

That is not universal, but it is easy to explain and fairly realistic for environments where recent snapshots matter more than old ones.
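The split turns into a two-line cost model. This is a sketch under the 25%/75% assumption; it deliberately ignores archive retrieval fees ($0.03/GB) and the 90-day archive minimum, both of which matter if you restore old snapshots often:

```python
# The 25%/75% standard-vs-archive split as a reusable model. Ignores
# archive retrieval fees and the 90-day minimum archive period.
STD, ARCHIVE = 0.05, 0.0125  # $/GB-month, standard vs archive tier

def snapshot_costs(total_gb: float, hot_fraction: float = 0.25):
    """Return (flat standard-tier cost, cost with the hot/archive split)."""
    flat = total_gb * STD
    tiered = total_gb * hot_fraction * STD + total_gb * (1 - hot_fraction) * ARCHIVE
    return round(flat, 2), round(tiered, 2)

for name, tb in [("small", 4), ("medium", 20), ("large", 100)]:
    flat, tiered = snapshot_costs(tb * 1024)
    print(f"{name}: ${flat:,.2f} -> ${tiered:,.2f} (saves ${flat - tiered:,.2f})")
```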


Small environment

Assume the actual billed snapshot footprint across the three accounts is:

  • Development: 0.5 TB

  • Acceptance: 1.0 TB

  • Production: 2.5 TB

That’s 4 TB total, or 4,096 GB.

Without lifecycle

Everything stays in standard tier:

4,096 GB × $0.05 = $204.80/month

With DLM + archive

Keep 25% hot:

1,024 GB × $0.05 = $51.20

Archive 75%:

3,072 GB × $0.0125 = $38.40

Total:

$51.20 + $38.40 = $89.60/month

Savings:

$204.80 - $89.60 = $115.20/month

So even in a small environment:

→ $115.20/month saved

Again, not dramatic. But also not nothing. And unlike a lot of “optimisation” work, this doesn’t require a philosophical debate about whether the resource is really needed.


Medium environment

Now assume a more typical billed snapshot footprint:

  • Development: 2 TB

  • Acceptance: 6 TB

  • Production: 12 TB

That’s 20 TB total, or 20,480 GB.

Without lifecycle

20,480 GB × $0.05 = $1,024.00/month

With DLM + archive

Keep 25% in standard:

5,120 GB × $0.05 = $256.00

Archive 75%:

15,360 GB × $0.0125 = $192.00

Total:

$256.00 + $192.00 = $448.00/month

Savings:

$1,024.00 - $448.00 = $576.00/month

So the medium environment saves:

→ $576/month

This is where snapshot lifecycle starts to become one of those unusually polite cost controls: low drama, predictable outcome, very little downside if you configure it sensibly.


Large environment

At larger scale, snapshot history becomes a storage system in its own right.

Assume:

  • Development: 10 TB

  • Acceptance: 30 TB

  • Production: 60 TB

That’s 100 TB total, or 102,400 GB.

Without lifecycle

102,400 GB × $0.05 = $5,120.00/month

With DLM + archive

Keep 25% hot:

25,600 GB × $0.05 = $1,280.00

Archive 75%:

76,800 GB × $0.0125 = $960.00

Total:

$1,280.00 + $960.00 = $2,240.00/month

Savings:

$5,120.00 - $2,240.00 = $2,880.00/month

So the large environment saves:

→ $2,880/month

And the key point is that this is not the result of aggressive deletion. It is simply the result of matching storage tier to restore likelihood.

That is usually the whole game in cloud cost work: not removing value, just stopping the most expensive version of value from being the default.


What the storage numbers are really saying

There are two different stories here.

Unattached volumes

This is straightforward waste.
If a volume is detached, untouched, and no one claims it during a grace period, there is very little strategic ambiguity.

Snapshot lifecycle

This is not usually waste in the same way. It is more often good intent with poor follow-through.

The snapshots were created for valid reasons.
They just weren’t moved, expired, or tiered afterward.

That distinction matters. One is clean-up. The other is lifecycle design.


Storage section summary

| Size | Unattached EBS savings | Snapshot lifecycle savings | Total storage savings |
| ------ | --------- | --------- | --------------- |
| Small | $163.84 | $115.20 | $279.04/month |
| Medium | $819.20 | $576.00 | $1,395.20/month |
| Large | $4,096.00 | $2,880.00 | $6,976.00/month |

These are not exotic savings. They are the result of asking two fairly plain questions:

  • Should this volume still exist?

  • Should this snapshot still live in the expensive tier?

Those questions are not glamorous, but they do tend to produce results.


Final Thought

Nothing here is particularly clever.

There’s no trick.

Just intent.

  • Compute learns when it’s not needed

  • Logs learn when to expire

  • Storage learns when to leave

Defaults don’t do that.

They can’t.

Because defaults are designed for safety, not specificity.

And the longer you rely on them, the more your infrastructure behaves like it’s still day one.

Even when your workload clearly isn’t.


If you want to take this further, the next step isn’t more automation.

It’s making these behaviours your platform defaults so that nobody has to remember them in the first place.

Which, in the end, is the only kind of optimisation that really scales.

