
Your AWS Bill Isn’t Wrong - It’s Just Following Defaults


There’s a pattern I’ve seen often enough that it no longer feels like a coincidence.

A team builds something sensible. They separate environments properly - Development, Acceptance, Production.
They follow best practices: tagging, scaling, monitoring… the works.

And then, a few months later, someone opens the bill and asks:

“Why is this still costing so much when nothing’s really happening?”

That’s the interesting part.

Because in most cases, nothing is happening.

At least not in the way the system was originally designed.

The workloads are quiet. The deployments are stable.
The traffic is predictable.

But the infrastructure?

That’s still behaving like it’s launch day.


This article looks at three places where AWS defaults quietly keep things running (and billing), even when your workload has moved on:

  • Compute that doesn’t know when to rest

  • Logs that remember more than anyone reads

  • Storage that outlives its purpose

And more importantly - what it looks like when you design with intent instead.


1) Compute That Never Learned When to Stop

Let’s start with compute, because it’s the easiest to reason about (and the easiest to ignore).

The Shape of a Real Environment

In most modern AWS setups we have something like:

Development

  • A few small EC2 instances

  • Created and destroyed somewhat unpredictably

  • Often left running longer than intended

Acceptance

  • Auto Scaling groups

  • Lower baseline than production

  • Scales up during testing, then… doesn’t always scale back down

Production

  • Auto Scaling done properly (usually)

  • But includes non-customer-facing components that don’t need 24/7 uptime

So the problem isn’t just “instances running too long.”

It’s:

Parts of the system behaving like they’re always needed, even when they’re not.

On a side note: one could also do this with RDS, but for the purposes of this blog I will stick to EC2.


Where Scheduling Still Matters (Even with Auto Scaling)

There’s a common assumption:

“We use Auto Scaling, so we’re already optimized.”

That’s only partially true.

Auto Scaling optimises for load.
It does not optimise (by default) for:

  • Time-of-day usage patterns

  • Human working hours

  • Internal systems that nobody uses at night

So you still end up with:

  • Baseline capacity that never drops to zero

  • Dev instances quietly running overnight

  • Acceptance environments idling between test cycles


The Pattern: Scheduled Intent

The solution is not replacing Auto Scaling.

It’s complementing it.

  • Dev EC2 → scheduled stop/start

  • Acceptance ASG → scheduled min capacity = 0 off-hours

  • Prod (select workloads) → scheduled scale-down or shutdown

Using:

  • EventBridge schedules

  • Lambda functions

  • Tag-driven targeting
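The tag-driven targeting piece can be sketched in a few lines. Everything here is an assumption about your own conventions - the `Schedule=office-hours` tag and the 07:00–19:00 UTC window are illustrative - and the real boto3 calls (`describe_instances`, `stop_instances`) are only referenced in comments so the decision logic stays testable on its own:

```python
from datetime import datetime, timezone

# Illustrative tag convention (an assumption, not an AWS standard):
# instances tagged Schedule=office-hours should only run 07:00-19:00 UTC
# on weekdays.
OFFICE_HOURS = range(7, 19)

def should_be_stopped(tags: dict, now: datetime) -> bool:
    """Decide whether a scheduled instance is outside its window."""
    if tags.get("Schedule") != "office-hours":
        return False  # no schedule tag -> leave it alone
    return now.hour not in OFFICE_HOURS or now.weekday() >= 5  # 5 = Saturday

def select_stop_targets(instances: list, now: datetime) -> list:
    """Return IDs of running instances due for a stop. In a real Lambda,
    `instances` comes from ec2.describe_instances() and the result goes
    to ec2.stop_instances(InstanceIds=...)."""
    return [
        i["InstanceId"]
        for i in instances
        if i["State"] == "running" and should_be_stopped(i["Tags"], now)
    ]

fleet = [
    {"InstanceId": "i-dev1", "State": "running", "Tags": {"Schedule": "office-hours"}},
    {"InstanceId": "i-prod1", "State": "running", "Tags": {}},
]
night = datetime(2024, 6, 1, 23, 0, tzinfo=timezone.utc)  # a Saturday night
print(select_stop_targets(fleet, night))  # -> ['i-dev1']
```

An EventBridge schedule (hourly, say) would invoke a Lambda wrapping this, and a mirror-image rule handles the morning start.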


Pricing Context (eu-west-1)

  • c7g.large: ~$0.078/hour

  • c7g.xlarge: ~$0.155/hour

Monthly saving per instance (~510 hours avoided):

  • Dev-sized: ~$40/month

  • Prod-sized: ~$79/month
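For transparency, here is one way to arrive at the ~510-hour figure (my reconstruction, not an AWS number): roughly 10 running hours on ~22 weekdays leaves ~220 hours of a ~730-hour month.

```python
# Rough origin of the "~510 hours avoided" figure (a reconstruction):
# ~10 running hours on ~22 weekdays leaves ~220 of a ~730-hour month.
hours_in_month = 730
running_hours = 10 * 22          # office hours, weekdays only
avoided = hours_in_month - running_hours
print(avoided)                   # -> 510

# Applied to the eu-west-1 rates quoted above:
for name, rate in [("c7g.large", 0.078), ("c7g.xlarge", 0.155)]:
    print(f"{name}: ~${avoided * rate:.2f}/month avoided")
```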


What This Looks Like

Small environment

  • Dev: 3 instances

  • Acc: baseline 2

  • Prod: 1 schedulable internal workload

Savings:

  • Dev: $120

  • Acc: $80

  • Prod: $79

~$280/month


Medium environment

  • Dev: ~10 instances

  • Acc: baseline ~6

  • Prod: internal workloads

~$1,100/month


Large environment

  • Dev: dozens of instances

  • Acc: multiple ASGs

  • Prod: internal services at scale

~$3,500–$4,000/month


The Subtle Realisation

Auto Scaling optimises for demand.
Scheduling optimises for the absence of demand.

You need both.


2) CloudWatch Logs: The Cost of Remembering Everything

Logs are comforting.

They make us feel prepared, responsible, and mildly in control.

They also quietly accumulate into something nobody explicitly designed.

CloudWatch Logs defaults to never expire.

Which is great - right up until you realise you’re paying to remember things nobody reads anymore.


The Pattern: Make Retention Explicit

Instead of relying on good intentions:

  • Dev → 7 days

  • Acc → 30 days

  • Prod → 90 days

And enforce it using:

  • AWS Config custom rule

  • SSM auto-remediation (or Lambda function if you need custom logic)

This turns retention into a system, not a habit.
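The remediation side reduces to a small mapping. The `Environment` tag values below are assumptions about your own tagging scheme, not AWS conventions; the actual API call, `logs.put_retention_policy`, is noted in a comment (7, 30, and 90 are all values CloudWatch Logs accepts):

```python
# Tag-driven retention mapping (the Environment tag values are assumptions
# about your own tagging scheme). A remediation Lambda would call
# logs.put_retention_policy(logGroupName=..., retentionInDays=...)
# for any log group whose current retention differs from the target.
RETENTION_DAYS = {"dev": 7, "acc": 30, "prod": 90}

def target_retention(tags: dict, default: int = 30) -> int:
    """Map a log group's Environment tag to its intended retention."""
    return RETENTION_DAYS.get(tags.get("Environment", "").lower(), default)

print(target_retention({"Environment": "prod"}))  # -> 90
print(target_retention({}))                       # -> 30 (safe default)
```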


The Important Bit Most People Miss

CloudWatch Logs has two separate cost drivers:

  1. Ingestion (writing logs)

  2. Storage (keeping them)

Retention only affects storage.

Which means:

You don’t reduce logging cost by deleting logs.
You reduce storage cost.

That distinction matters more than it should.


A Note on Compression (Because This Gets Confusing)

CloudWatch Logs stores data in a compressed format, but AWS does not publish a fixed compression ratio.

In their pricing examples, log data is often shown compressing to roughly 15–20% of its original size (roughly 5:1 to 6:1).

Actual compression depends heavily on:

  • log format (JSON vs text)

  • repetition

  • structure

For the purposes of these examples, we’ll assume ~5:1 compression.

Not because it’s exact — but because it’s a reasonable approximation for comparative analysis.


Pricing Assumptions (eu-west-1)

  • Ingestion: $0.57/GB

  • Storage: $0.03/GB-month

  • Free tier: 5 GB

  • Compression ratio of 5:1


Small Environment

Log volume

  • Dev: 5 GB/day

  • Acc: 10 GB/day

  • Prod: 30 GB/day

Total: 45 GB/day → 1,350 GB/month


Ingestion cost (unchanged by retention)

(1,350 - 5) × $0.57 = $766.65/month

This is the baseline reality.

Retention won’t fix it.


Storage with retention (7/30/90)

We compress and store:

  • Dev: (5 × 7) ÷ 5 = 7 GB

  • Acc: (10 × 30) ÷ 5 = 60 GB

  • Prod: (30 × 90) ÷ 5 = 540 GB

Total: 607 GB

(607 - 5) × $0.03 = $18.06/month


Storage without retention (~1 year)

Total stored:

  • 3,285 GB

(3,285 - 5) × $0.03 = $98.40/month


Savings

$98.40 - $18.06 = $80.34/month
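These numbers reproduce directly from the stated assumptions, which is worth checking yourself. A quick sketch using the same rates, a 30-day month, and a 365-day year:

```python
# Reproducing the small-environment numbers from the stated assumptions:
# eu-west-1 rates, ~5:1 compression, 5 GB free tier.
INGEST, STORE, FREE, COMPRESS = 0.57, 0.03, 5, 5

daily = {"dev": 5, "acc": 10, "prod": 30}        # GB/day, uncompressed
retention = {"dev": 7, "acc": 30, "prod": 90}    # days

ingestion = (sum(daily.values()) * 30 - FREE) * INGEST
print(round(ingestion, 2))        # -> 766.65 (retention can't touch this)

stored = sum(daily[env] * retention[env] / COMPRESS for env in daily)
with_retention = (stored - FREE) * STORE
print(round(with_retention, 2))   # -> 18.06

stored_year = sum(daily.values()) * 365 / COMPRESS   # ~1 year, no expiry
without = (stored_year - FREE) * STORE
print(round(without - with_retention, 2))            # -> 80.34
```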


Cost of enforcing it

Assume ~40 log groups/month:

  • Config: $0.12

  • Rule eval: $0.04

$0.16/month


Net

~$80/month saved


Medium Environment

  • 225 GB/day → 6,750 GB/month

Storage:

  • With retention: $90.90

  • Without: $492.60

Savings:

~$401/month

Governance cost:

~$0.80

Net:

~$400/month


Large Environment

  • 900 GB/day → 27,000 GB/month

Storage:

  • With retention: $364.05

  • Without: $1,970.85

Savings:

~$1,606/month

Governance:

~$4/month

Net:

~$1,602/month


What the Numbers Are Actually Saying

Two things become clear:

  1. Retention is still worth it - even though it doesn’t touch ingestion

  2. Ingestion is the real problem at scale

Which leads to the natural next step:


The Better Pattern

Split responsibilities:

  • CloudWatch → recent, searchable logs

  • S3 → long-term storage

This gives you:

  • Fast debugging

  • Cheap retention

Without paying CloudWatch rates forever.


3) Storage That Doesn’t Leave

Storage is different.

It doesn’t spike.
It doesn’t scale dynamically.
It doesn’t politely disappear when the workload that created it has moved on.

It just stays.

That’s part of why storage waste is so persistent. Compute waste tends to at least look busy. Storage waste just sits there quietly, billing with excellent emotional discipline.

There are two recurring offenders here:

  • Unattached EBS volumes

  • Snapshots with no lifecycle

They look different on the bill, but the root cause is usually the same: nobody came back to decide what should happen after the original workload stopped mattering.


3a) Unattached EBS Volumes: Paying for “Maybe We Still Need It”

Unattached EBS volumes are one of the simplest forms of waste in AWS.

They usually come from very ordinary events:

  • an EC2 instance was terminated, but the volume was retained

  • someone detached a volume during troubleshooting

  • a migration got halfway through and then became “tomorrow’s problem”

  • a test environment was cleaned up, except for the storage, which apparently was granted diplomatic immunity

The volume state in EBS for these is available, which is AWS’s way of saying: “This is attached to nothing, but still very much attached to your bill.”

The pattern

The cleanup pattern here is straightforward:

  1. Run a daily Lambda

  2. Find volumes in state available

  3. Tag newly found volumes with:

    • CleanupCandidate=true

    • FirstSeenAvailable=<date>

    • DeleteAfter=<date+7d>

  4. On subsequent runs, if the volume is still available and not exempt, delete it

That 7-day grace period matters. It keeps the automation practical rather than reckless. You want to clean up forgotten storage, not turn someone’s live troubleshooting session into a memorable learning experience.
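The decision logic from steps 2–4 is small enough to sketch. The tag names follow the convention above, `CleanupExempt` is an illustrative name for the exemption mechanism, and the real `describe_volumes`/`delete_volume` calls are left as comments so the logic stays testable:

```python
from datetime import date, timedelta

GRACE = timedelta(days=7)  # the 7-day grace period from the pattern above

def plan_action(volume: dict, today: date) -> str:
    """Decide what today's cleanup run does with an `available` volume.
    `volume` mirrors what you'd build from ec2.describe_volumes(); the
    delete itself would be ec2.delete_volume(VolumeId=...)."""
    tags = volume.get("Tags", {})
    if tags.get("CleanupExempt") == "true":   # illustrative escape hatch
        return "skip"
    first_seen = tags.get("FirstSeenAvailable")
    if first_seen is None:
        return "tag"        # first sighting: start the clock
    if today >= date.fromisoformat(first_seen) + GRACE:
        return "delete"     # grace period expired, nobody objected
    return "wait"

today = date(2024, 6, 10)
print(plan_action({"Tags": {}}, today))                                    # -> tag
print(plan_action({"Tags": {"FirstSeenAvailable": "2024-06-01"}}, today))  # -> delete
print(plan_action({"Tags": {"FirstSeenAvailable": "2024-06-08"}}, today))  # -> wait
```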

The underlying EBS pricing model is simple: you pay for provisioned storage until you release it. AWS’s own public pricing examples use $0.08/GB-month for gp3 volume storage, so that’s the rate used here.

The cost model

For the purposes of calculation, I’ll use:

Monthly unattached volume cost = Provisioned GB × $0.08

That gives us a clean, conservative baseline using AWS’s own public gp3 example rate.


Small environment

A small estate might not look dramatic at all:

  • Development: 1.0 TB of detached test volumes spread across a few teams

  • Acceptance: 0.5 TB left behind from refreshes and release validation

  • Production: 0.5 TB of retained-but-unused storage from old hosts or cutovers

That’s 2.0 TB total, or 2,048 GB.

Calculation:

2,048 GB × $0.08/GB-month = $163.84/month

So even in a fairly small environment, you’re at:

→ $163.84/month

Not catastrophic. Just pointless.

That’s usually the theme with unattached volumes: not one bad decision, but a pile of harmless-looking ones.


Medium environment

Now move to a more typical multi-team setup:

  • Development: 4 TB of ad hoc test storage and abandoned instance volumes

  • Acceptance: 3 TB from repeated environment refreshes and short-lived testing cycles

  • Production: 3 TB from older migrations, retained rollbacks, and “leave it there for now” volumes

That’s 10 TB total, or 10,240 GB.

Calculation:

10,240 GB × $0.08/GB-month = $819.20/month

So the monthly waste becomes:

→ $819.20/month

This is usually where the conversation changes.

At small scale, unattached storage is an annoyance.
At medium scale, it becomes a recurring bill for infrastructure that is literally helping nobody.


Large environment

At larger scale, unattached storage becomes less of an exception and more of a background condition:

  • Development: 20 TB across many teams, ephemeral workloads, and inconsistent clean-up

  • Acceptance: 15 TB across multiple shared services and test refreshes

  • Production: 15 TB of retained volumes from migrations, replacements, and rollback caution

That’s 50 TB total, or 51,200 GB.

Calculation:

51,200 GB × $0.08/GB-month = $4,096.00/month

So now the cost is:

→ $4,096/month

At that point, unattached EBS is no longer a housekeeping issue.
It’s a line item.

And the unpleasant thing is that the fix is still not complicated. It’s the same daily scan, just applied consistently.
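Since the cost is a flat per-GB rate, all three estates collapse into one loop (same $0.08/GB-month example rate as above):

```python
# All three estates above, under the flat $0.08/GB-month gp3 example rate.
RATE = 0.08  # $/GB-month

for name, tb in [("small", 2), ("medium", 10), ("large", 50)]:
    gb = tb * 1024
    print(f"{name}: {gb:,} GB -> ${gb * RATE:,.2f}/month")
```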


What this is really saying

The important detail with unattached volumes is that the cost is wonderfully boring.

It doesn’t depend on CPU.
It doesn’t depend on traffic.
It doesn’t depend on whether the application is doing anything useful.

You are simply paying for storage that exists.

That makes it one of the cleanest cost-optimisation targets you’ll find in AWS, because there’s very little ambiguity about whether the spend is justified. If the volume is unattached for a week and no one objected, the answer is probably no.


3b) Snapshots: The Slow Archive Nobody Meant to Build

Snapshots start with good intentions.

They are created for safety.
For rollback.
For recovery.
For “just in case.”

All reasonable things.

The problem is that snapshots are usually very easy to create and much less often given a proper lifecycle afterward. So they accumulate. Quietly. Incrementally. With a sort of patient confidence that deserves respect, if not continued funding.

Two details matter here

First, EBS snapshots in the standard tier are incremental. AWS stores only the blocks that have changed, not a full copy every time. That means snapshot cost should be based on actual billed snapshot data, not the raw size of the source volumes.

Second, if you don’t set lifecycle intentionally, you tend to keep:

  • too many daily restore points

  • too many monthly “for safety” snapshots

  • too much old data in the standard tier

And that is where Data Lifecycle Manager helps.


The pattern: Amazon Data Lifecycle Manager (DLM)

Amazon Data Lifecycle Manager lets you automate the creation, retention, and deletion of EBS snapshots based on tags. AWS documents DLM as a complete backup solution for EC2 instances and EBS volumes at no additional cost.

A practical configuration looks like this:

Tag strategy

Apply tags such as:

  • SnapshotPolicy=dev

  • SnapshotPolicy=acc

  • SnapshotPolicy=prod

This is important because it lets policy follow the resource rather than relying on manual selection, which tends to age poorly.

Retention strategy

Use different schedules by environment:

  • Development: daily snapshots, retain 7–14 days

  • Acceptance: daily snapshots, retain 30 days

  • Production: daily snapshots, retain 90 days, plus monthly snapshots retained longer

DLM supports custom policies for EBS snapshots and can automate retention and deletion.
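As a sketch of what such a policy looks like for the acceptance tier - the field names follow the shape of the DLM `create-lifecycle-policy` API, but treat this as illustrative and verify against current AWS documentation before using it:

```python
# Sketch of a DLM policy body for the acceptance tier. Field names follow
# the shape of the dlm create-lifecycle-policy API; verify against current
# AWS documentation before relying on this.
acc_policy = {
    "ResourceTypes": ["VOLUME"],
    "TargetTags": [{"Key": "SnapshotPolicy", "Value": "acc"}],
    "Schedules": [
        {
            "Name": "acc-daily",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
            "RetainRule": {"Count": 30},  # 30 daily restore points
        }
    ],
}
# With boto3 this would go to something like:
#   dlm.create_lifecycle_policy(ExecutionRoleArn=..., Description="acc tier",
#                               State="ENABLED", PolicyDetails=acc_policy)
print(acc_policy["Schedules"][0]["RetainRule"])  # -> {'Count': 30}
```

The point of the `TargetTags` block is exactly the tag strategy above: the policy follows the `SnapshotPolicy=acc` tag, not a hand-picked list of volumes.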

Archive strategy

For older snapshots that are rarely restored, use EBS Snapshot Archive.

AWS documents:

  • Standard snapshot storage: $0.05/GB-month

  • Archive storage: $0.0125/GB-month

  • Archive retrieval: $0.03/GB

  • Minimum archive period: 90 days

So the simplest cost model becomes:

Current monthly snapshot cost = Standard-tier GB × $0.05

If you move older snapshots to archive and keep only a smaller working set in standard:

New monthly cost = (Standard-tier GB kept hot × $0.05) + (Archived GB × $0.0125)

For the examples below, I’ll use a pragmatic split:

  • 25% kept in standard tier

  • 75% moved to archive

That is not universal, but it is easy to explain and fairly realistic for environments where recent snapshots matter more than old ones.
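The split turns into a two-line cost model. This is a sketch under the 25%/75% assumption; it deliberately ignores archive retrieval fees ($0.03/GB) and the 90-day archive minimum, both of which matter if you restore old snapshots often:

```python
# The 25%/75% standard-vs-archive split as a reusable model. Ignores
# archive retrieval fees and the 90-day minimum archive period.
STD, ARCHIVE = 0.05, 0.0125  # $/GB-month, standard vs archive tier

def snapshot_costs(total_gb: float, hot_fraction: float = 0.25):
    """Return (flat standard-tier cost, cost with the hot/archive split)."""
    flat = total_gb * STD
    tiered = total_gb * hot_fraction * STD + total_gb * (1 - hot_fraction) * ARCHIVE
    return round(flat, 2), round(tiered, 2)

for name, tb in [("small", 4), ("medium", 20), ("large", 100)]:
    flat, tiered = snapshot_costs(tb * 1024)
    print(f"{name}: ${flat:,.2f} -> ${tiered:,.2f} (saves ${flat - tiered:,.2f})")
```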


Small environment

Assume the actual billed snapshot footprint across the three accounts is:

  • Development: 0.5 TB

  • Acceptance: 1.0 TB

  • Production: 2.5 TB

That’s 4 TB total, or 4,096 GB.

Without lifecycle

Everything stays in standard tier:

4,096 GB × $0.05 = $204.80/month

With DLM + archive

Keep 25% hot:

1,024 GB × $0.05 = $51.20

Archive 75%:

3,072 GB × $0.0125 = $38.40

Total:

$51.20 + $38.40 = $89.60/month

Savings:

$204.80 - $89.60 = $115.20/month

So even in a small environment:

→ $115.20/month saved

Again, not dramatic. But also not nothing. And unlike a lot of “optimisation” work, this doesn’t require a philosophical debate about whether the resource is really needed.


Medium environment

Now assume a more typical billed snapshot footprint:

  • Development: 2 TB

  • Acceptance: 6 TB

  • Production: 12 TB

That’s 20 TB total, or 20,480 GB.

Without lifecycle

20,480 GB × $0.05 = $1,024.00/month

With DLM + archive

Keep 25% in standard:

5,120 GB × $0.05 = $256.00

Archive 75%:

15,360 GB × $0.0125 = $192.00

Total:

$256.00 + $192.00 = $448.00/month

Savings:

$1,024.00 - $448.00 = $576.00/month

So the medium environment saves:

→ $576/month

This is where snapshot lifecycle starts to become one of those unusually polite cost controls: low drama, predictable outcome, very little downside if you configure it sensibly.


Large environment

At larger scale, snapshot history becomes a storage system in its own right.

Assume:

  • Development: 10 TB

  • Acceptance: 30 TB

  • Production: 60 TB

That’s 100 TB total, or 102,400 GB.

Without lifecycle

102,400 GB × $0.05 = $5,120.00/month

With DLM + archive

Keep 25% hot:

25,600 GB × $0.05 = $1,280.00

Archive 75%:

76,800 GB × $0.0125 = $960.00

Total:

$1,280.00 + $960.00 = $2,240.00/month

Savings:

$5,120.00 - $2,240.00 = $2,880.00/month

So the large environment saves:

→ $2,880/month

And the key point is that this is not the result of aggressive deletion. It is simply the result of matching storage tier to restore likelihood.

That is usually the whole game in cloud cost work: not removing value, just stopping the most expensive version of value from being the default.


What the storage numbers are really saying

There are two different stories here.

Unattached volumes

This is straightforward waste.
If a volume is detached, untouched, and no one claims it during a grace period, there is very little strategic ambiguity.

Snapshot lifecycle

This is not usually waste in the same way. It is more often good intent with poor follow-through.

The snapshots were created for valid reasons.
They just weren’t moved, expired, or tiered afterward.

That distinction matters. One is clean-up. The other is lifecycle design.


Storage section summary

| Size | Unattached EBS savings | Snapshot lifecycle savings | Total storage savings |
| ------ | --------- | --------- | --------------- |
| Small | $163.84 | $115.20 | $279.04/month |
| Medium | $819.20 | $576.00 | $1,395.20/month |
| Large | $4,096.00 | $2,880.00 | $6,976.00/month |

These are not exotic savings. They are the result of asking two fairly plain questions:

  • Should this volume still exist?

  • Should this snapshot still live in the expensive tier?

Those questions are not glamorous, but they do tend to produce results.


Final Thought

Nothing here is particularly clever.

There’s no trick.

Just intent.

  • Compute learns when it’s not needed

  • Logs learn when to expire

  • Storage learns when to leave

Defaults don’t do that.

They can’t.

Because defaults are designed for safety, not specificity.

And the longer you rely on them, the more your infrastructure behaves like it’s still day one.

Even when your workload clearly isn’t.


If you want to take this further, the next step isn’t more automation.

It’s making these behaviours your platform defaults so that nobody has to remember them in the first place.

Which, in the end, is the only kind of optimisation that really scales.

