Your AWS Bill Isn’t Wrong - It’s Just Following Defaults

There’s a pattern I’ve seen often enough that it no longer feels like a coincidence.
A team builds something sensible. They separate environments properly - Development, Acceptance, Production.
They follow best practices: tagging, scaling, monitoring… the works.
And then, a few months later, someone opens the bill and asks:
“Why is this still costing so much when nothing’s really happening?”
That’s the interesting part.
Because in most cases, nothing is happening.
At least not in the way the system was originally designed.
The workloads are quiet. The deployments are stable.
The traffic is predictable.
But the infrastructure?
That’s still behaving like it’s launch day.
This article looks at three places where AWS defaults quietly keep things running (and billing), even when your workload has moved on:
Compute that doesn’t know when to rest
Logs that remember more than anyone reads
Storage that outlives its purpose
And more importantly - what it looks like when you design with intent instead.
1) Compute That Never Learned When to Stop
Let’s start with compute, because it’s the easiest to reason about (and the easiest to ignore).
The Shape of a Real Environment
In most modern AWS setups we have something like:
Development
A few small EC2 instances
Created and destroyed somewhat unpredictably
Often left running longer than intended
Acceptance
Auto Scaling groups
Lower baseline than production
Scales up during testing, then… doesn’t always scale back down
Production
Auto Scaling done properly (usually)
But includes non-customer-facing components that don’t need 24/7 uptime
So the problem isn’t just “instances running too long.”
It’s:
Parts of the system behaving like they’re always needed, even when they’re not.
On a side note: one could also do this with RDS, but for the purposes of this blog I will stick to EC2.
Where Scheduling Still Matters (Even with Auto Scaling)
There’s a common assumption:
“We use Auto Scaling, so we’re already optimized.”
That’s only partially true.
Auto Scaling optimises for load.
It does not optimise (by default) for:
Time-of-day usage patterns
Human working hours
Internal systems that nobody uses at night
So you still end up with:
Baseline capacity that never drops to zero
Dev instances quietly running overnight
Acceptance environments idling between test cycles
The Pattern: Scheduled Intent
The solution is not replacing Auto Scaling.
It’s complementing it.
Dev EC2 → scheduled stop/start
Acceptance ASG → scheduled min capacity = 0 off-hours
Prod (select workloads) → scheduled scale-down or shutdown
Using:
EventBridge schedules
Lambda functions
Tag-driven targeting
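Here is a minimal sketch of the evening half of that pattern, assuming a hypothetical Schedule=office-hours tag on both instances and Auto Scaling groups, and an EventBridge cron rule invoking it each night (the tag name and schedule are assumptions, not something AWS gives you out of the box):

```python
import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# Assumed tag convention: anything tagged Schedule=office-hours
# is allowed to sleep outside working hours.
TAG_KEY = "Schedule"
TAG_VALUE = "office-hours"


def handler(event, context):
    # 1) Stop running EC2 instances carrying the schedule tag (dev boxes, mostly).
    running = ec2.describe_instances(
        Filters=[
            {"Name": f"tag:{TAG_KEY}", "Values": [TAG_VALUE]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        i["InstanceId"]
        for reservation in running["Reservations"]
        for i in reservation["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)

    # 2) Scale tagged Auto Scaling groups down to zero for the night.
    paginator = autoscaling.get_paginator("describe_auto_scaling_groups")
    for page in paginator.paginate():
        for group in page["AutoScalingGroups"]:
            tags = {t["Key"]: t["Value"] for t in group.get("Tags", [])}
            if tags.get(TAG_KEY) == TAG_VALUE:
                autoscaling.update_auto_scaling_group(
                    AutoScalingGroupName=group["AutoScalingGroupName"],
                    MinSize=0,
                    DesiredCapacity=0,
                )
```

The morning rule would invoke a mirror-image function that starts the instances again and restores the original minimum capacity, which you would typically stash in a tag before scaling down.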
Pricing Context (eu-west-1)
c7g.large: ~$0.078/hour
c7g.xlarge: ~$0.155/hour
Monthly saving per instance (~510 hours avoided):
Dev-sized: ~$40/month
Prod-sized: ~$79/month
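Where the ~510 hours comes from: if an instance only needs to run roughly 10 hours on weekdays, that is about 220 of a ~730-hour month, leaving ~510 hours avoidable. That split is an assumption for illustration; your working hours will differ. The per-instance numbers then fall out directly:

510 h × $0.078/h ≈ $40/month (dev-sized)
510 h × $0.155/h ≈ $79/month (prod-sized)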
What This Looks Like
Small environment
Dev: 3 instances
Acc: baseline 2
Prod: 1 schedulable internal workload
Savings:
Dev: $120
Acc: $80
Prod: $79
→ ~$280/month
Medium environment
Dev: ~10 instances
Acc: baseline ~6
Prod: internal workloads
→ ~$1,100/month
Large environment
Dev: dozens of instances
Acc: multiple ASGs
Prod: internal services at scale
→ ~$3,500–$4,000/month
The Subtle Realisation
Auto Scaling optimises for demand.
Scheduling optimises for the absence of demand.
You need both.
2) CloudWatch Logs: The Cost of Remembering Everything
Logs are comforting.
They make us feel prepared, responsible, and mildly in control.
They also quietly accumulate into something nobody explicitly designed.
CloudWatch Logs defaults to never expire.
Which is great - right up until you realise you’re paying to remember things nobody reads anymore.
The Pattern: Make Retention Explicit
Instead of relying on good intentions:
Dev → 7 days
Acc → 30 days
Prod → 90 days
And enforce it using:
AWS Config custom rule
SSM auto-remediation (or Lambda function if you need custom logic)
This turns retention into a system, not a habit.
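If you go the Lambda route, the remediation itself is tiny. A sketch, assuming the environment can be read from the log group name (the naming convention and the fallback value are mine; swap in tags or whatever convention you actually use):

```python
import boto3

logs = boto3.client("logs")

# Assumed retention per environment, in days.
RETENTION = {"dev": 7, "acc": 30, "prod": 90}
DEFAULT_RETENTION = 30  # fallback for anything we cannot classify


def environment_for(log_group_name):
    # Hypothetical convention: the environment appears in the log group name,
    # e.g. /aws/lambda/dev-order-service.
    for env in RETENTION:
        if f"/{env}-" in log_group_name or f"/{env}/" in log_group_name:
            return env
    return None


def handler(event, context):
    paginator = logs.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page["logGroups"]:
            # 'retentionInDays' is absent when the group is set to never expire.
            if "retentionInDays" in group:
                continue
            env = environment_for(group["logGroupName"])
            logs.put_retention_policy(
                logGroupName=group["logGroupName"],
                retentionInDays=RETENTION.get(env, DEFAULT_RETENTION),
            )
```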
The Important Bit Most People Miss
CloudWatch Logs has two separate cost drivers:
Ingestion (writing logs)
Storage (keeping them)
Retention only affects storage.
Which means:
You don’t reduce logging cost by deleting logs.
You reduce storage cost.
That distinction matters more than it should.
A Note on Compression (Because This Gets Confusing)
CloudWatch Logs stores data in a compressed format, but AWS does not publish a fixed compression ratio.
In their pricing examples, log data is often shown compressing to roughly 15–20% of its original size (about 5:1 to 7:1).
Actual compression depends heavily on:
log format (JSON vs text)
repetition
structure
For the purposes of these examples, we’ll assume ~5:1 compression.
Not because it’s exact — but because it’s a reasonable approximation for comparative analysis.
Pricing Assumptions (eu-west-1)
Ingestion: $0.57/GB
Storage: $0.03/GB-month
Free tier: 5 GB
Compression ratio of 5:1
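The numbers in the next three subsections all come from one very small model. A sketch, using the assumptions above (the function names and structure are mine):

```python
# CloudWatch Logs cost model used for the examples below (eu-west-1 assumptions).
INGEST_PER_GB = 0.57    # ingestion, $/GB
STORAGE_PER_GB = 0.03   # storage, $/GB-month
FREE_TIER_GB = 5
COMPRESSION = 5         # assumed ~5:1 compression ratio


def ingestion_cost(gb_per_month):
    return max(gb_per_month - FREE_TIER_GB, 0) * INGEST_PER_GB


def storage_cost(sources):
    # Each entry is (raw GB/day, retention in days); stored data is compressed.
    stored = sum(daily * days / COMPRESSION for daily, days in sources)
    return max(stored - FREE_TIER_GB, 0) * STORAGE_PER_GB


# Small environment: dev 5 GB/day (7d), acc 10 GB/day (30d), prod 30 GB/day (90d)
print(ingestion_cost(1350))                        # ~766.65
print(storage_cost([(5, 7), (10, 30), (30, 90)]))  # ~18.06 with retention
print(storage_cost([(45, 365)]))                   # ~98.40 without, after ~1 year
```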
Small Environment
Log volume
Dev: 5 GB/day
Acc: 10 GB/day
Prod: 30 GB/day
Total: 45 GB/day → 1,350 GB/month
Ingestion cost (unchanged by retention)
(1,350 - 5) × $0.57 = $766.65/month
This is the baseline reality.
Retention won’t fix it.
Storage with retention (7/30/90)
We compress and store:
Dev: (5 × 7) ÷ 5 = 7 GB
Acc: (10 × 30) ÷ 5 = 60 GB
Prod: (30 × 90) ÷ 5 = 540 GB
Total: 607 GB
(607 - 5) × $0.03 = $18.06/month
Storage without retention (~1 year)
Total stored: 45 GB/day × 365 days ÷ 5 ≈ 3,285 GB
(3,285 - 5) × $0.03 = $98.40/month
Savings
$98.40 - $18.06 = $80.34/month
Cost of enforcing it
Assume ~40 log groups/month:
Config: $0.12
Rule eval: $0.04
→ $0.16/month
Net
→ ~$80/month saved
Medium Environment
Log volume: 225 GB/day → 6,750 GB/month
Storage:
With retention: $90.90
Without: $492.60
Savings:
→ ~$401/month
Governance cost:
→ ~$0.80
Net:
→ ~$400/month
Large Environment
Log volume: 900 GB/day → 27,000 GB/month
Storage:
With retention: $364.05
Without: $1,970.85
Savings:
→ ~$1,606/month
Governance:
→ ~$4/month
Net:
→ ~$1,602/month
What the Numbers Are Actually Saying
Two things become clear:
Retention is still worth it, even though it doesn’t touch ingestion
Ingestion is the real problem at scale
Which leads to the natural next step:
The Better Pattern
Split responsibilities:
CloudWatch → recent, searchable logs
S3 → long-term storage
This gives you:
Fast debugging
Cheap retention
Without paying CloudWatch rates forever.
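One way to implement that split is a scheduled export of each day that falls out of the hot window, sketched below with the CreateExportTask API (the bucket name and window are placeholders; a subscription filter into Kinesis Data Firehose is the streaming alternative if you want to avoid the one-active-export-task-per-account limit):

```python
import time
import boto3

logs = boto3.client("logs")

# Placeholders: your archive bucket and how long logs stay "hot" in CloudWatch.
ARCHIVE_BUCKET = "my-log-archive-bucket"
HOT_WINDOW_DAYS = 30


def export_day_leaving_hot_window(log_group_name):
    # Export the one-day slice that is about to age out of the hot window.
    # Note: only one export task can be RUNNING or PENDING per account at a time.
    day_ms = 24 * 60 * 60 * 1000
    end = int(time.time() * 1000) - HOT_WINDOW_DAYS * day_ms
    start = end - day_ms
    logs.create_export_task(
        taskName=f"archive-{log_group_name.strip('/').replace('/', '-')}-{start}",
        logGroupName=log_group_name,
        fromTime=start,
        to=end,
        destination=ARCHIVE_BUCKET,
        destinationPrefix=log_group_name.strip("/"),
    )
```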
3) Storage That Doesn’t Leave
Storage is different.
It doesn’t spike.
It doesn’t scale dynamically.
It doesn’t politely disappear when the workload that created it has moved on.
It just stays.
That’s part of why storage waste is so persistent. Compute waste tends to at least look busy. Storage waste just sits there quietly, billing with excellent emotional discipline.
There are two recurring offenders here:
Unattached EBS volumes
Snapshots with no lifecycle
They look different on the bill, but the root cause is usually the same: nobody came back to decide what should happen after the original workload stopped mattering.
3a) Unattached EBS Volumes: Paying for “Maybe We Still Need It”
Unattached EBS volumes are one of the simplest forms of waste in AWS.
They usually come from very ordinary events:
an EC2 instance was terminated, but the volume was retained
someone detached a volume during troubleshooting
a migration got halfway through and then became “tomorrow’s problem”
a test environment was cleaned up, except for the storage, which apparently was granted diplomatic immunity
The volume state in EBS for these is available, which is AWS’s way of saying: “This is attached to nothing, but still very much attached to your bill.”
The pattern
The cleanup pattern here is straightforward:
Run a daily Lambda
Find volumes in state available
Tag newly found volumes with:
CleanupCandidate=true
FirstSeenAvailable=<date>
DeleteAfter=<date+7d>
On subsequent runs, if the volume is still available and not exempt, delete it
That 7-day grace period matters. It keeps the automation practical rather than reckless. You want to clean up forgotten storage, not turn someone’s live troubleshooting session into a memorable learning experience.
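A sketch of that daily scan, using the tag names from the list above and a 7-day grace period (the DoNotDelete exemption tag is an assumption; use whatever opt-out mechanism your teams already know):

```python
from datetime import datetime, timedelta, timezone

import boto3

ec2 = boto3.client("ec2")

GRACE_DAYS = 7
EXEMPT_TAG = "DoNotDelete"  # assumed opt-out tag


def handler(event, context):
    today = datetime.now(timezone.utc).date()
    paginator = ec2.get_paginator("describe_volumes")
    for page in paginator.paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    ):
        for volume in page["Volumes"]:
            tags = {t["Key"]: t["Value"] for t in volume.get("Tags", [])}
            if EXEMPT_TAG in tags:
                continue

            if "DeleteAfter" not in tags:
                # First sighting: mark the volume and start the clock.
                ec2.create_tags(
                    Resources=[volume["VolumeId"]],
                    Tags=[
                        {"Key": "CleanupCandidate", "Value": "true"},
                        {"Key": "FirstSeenAvailable", "Value": str(today)},
                        {"Key": "DeleteAfter",
                         "Value": str(today + timedelta(days=GRACE_DAYS))},
                    ],
                )
            elif str(today) >= tags["DeleteAfter"]:
                # Still unattached after the grace period: delete it.
                ec2.delete_volume(VolumeId=volume["VolumeId"])
```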
The underlying EBS pricing model is simple: you pay for provisioned storage until you release it. AWS’s public EBS pricing examples use $0.08/GB-month for gp3 volume storage in a region that charges that rate.
The cost model
For the purposes of calculation, I’ll use:
Monthly unattached volume cost = Provisioned GB × $0.08
That gives us a clean, conservative baseline using AWS’s own public gp3 example rate.
Small environment
A small estate might not look dramatic at all:
Development: 1.0 TB of detached test volumes spread across a few teams
Acceptance: 0.5 TB left behind from refreshes and release validation
Production: 0.5 TB of retained-but-unused storage from old hosts or cutovers
That’s 2.0 TB total, or 2,048 GB.
Calculation:
2,048 GB × $0.08/GB-month = $163.84/month
So even in a fairly small environment, you’re at:
→ $163.84/month
Not catastrophic. Just pointless.
That’s usually the theme with unattached volumes: not one bad decision, but a pile of harmless-looking ones.
Medium environment
Now move to a more typical multi-team setup:
Development: 4 TB of ad hoc test storage and abandoned instance volumes
Acceptance: 3 TB from repeated environment refreshes and short-lived testing cycles
Production: 3 TB from older migrations, retained rollbacks, and “leave it there for now” volumes
That’s 10 TB total, or 10,240 GB.
Calculation:
10,240 GB × $0.08/GB-month = $819.20/month
So the monthly waste becomes:
→ $819.20/month
This is usually where the conversation changes.
At small scale, unattached storage is an annoyance.
At medium scale, it becomes a recurring bill for infrastructure that is literally helping nobody.
Large environment
At larger scale, unattached storage becomes less of an exception and more of a background condition:
Development: 20 TB across many teams, ephemeral workloads, and inconsistent clean-up
Acceptance: 15 TB across multiple shared services and test refreshes
Production: 15 TB of retained volumes from migrations, replacements, and rollback caution
That’s 50 TB total, or 51,200 GB.
Calculation:
51,200 GB × $0.08/GB-month = $4,096.00/month
So now the cost is:
→ $4,096/month
At that point, unattached EBS is no longer a housekeeping issue.
It’s a line item.
And the unpleasant thing is that the fix is still not complicated. It’s the same daily scan, just applied consistently.
What this is really saying
The important detail with unattached volumes is that the cost is wonderfully boring.
It doesn’t depend on CPU.
It doesn’t depend on traffic.
It doesn’t depend on whether the application is doing anything useful.
You are simply paying for storage that exists.
That makes it one of the cleanest cost-optimisation targets you’ll find in AWS, because there’s very little ambiguity about whether the spend is justified. If the volume is unattached for a week and no one objected, the answer is probably no.
3b) Snapshots: The Slow Archive Nobody Meant to Build
Snapshots start with good intentions.
They are created for safety.
For rollback.
For recovery.
For “just in case.”
All reasonable things.
The problem is that snapshots are usually very easy to create and much less often given a proper lifecycle afterward. So they accumulate. Quietly. Incrementally. With a sort of patient confidence that deserves respect, if not continued funding.
Two details matter here
First, EBS snapshots in the standard tier are incremental. AWS stores only the blocks that have changed, not a full copy every time. That means snapshot cost should be based on actual billed snapshot data, not the raw size of the source volumes.
Second, if you don’t set lifecycle intentionally, you tend to keep:
too many daily restore points
too many monthly “for safety” snapshots
too much old data in the standard tier
And that is where Data Lifecycle Manager helps.
The pattern: Amazon Data Lifecycle Manager (DLM)
Amazon Data Lifecycle Manager lets you automate the creation, retention, and deletion of EBS snapshots based on tags. AWS documents DLM as a complete backup solution for EC2 instances and EBS volumes at no additional cost.
A practical configuration looks like this:
Tag strategy
Apply tags such as:
SnapshotPolicy=dev
SnapshotPolicy=acc
SnapshotPolicy=prod
This is important because it lets policy follow the resource rather than relying on manual selection, which tends to age poorly.
Retention strategy
Use different schedules by environment:
Development: daily snapshots, retain 7–14 days
Acceptance: daily snapshots, retain 30 days
Production: daily snapshots, retain 90 days, plus monthly snapshots retained longer
DLM supports custom policies for EBS snapshots and can automate retention and deletion.
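As a concrete example, the development policy from that table could look roughly like this (the role ARN and schedule time are placeholders; acceptance and production would be the same shape with different retention counts):

```python
import boto3

dlm = boto3.client("dlm")

# Placeholder role; it needs the Data Lifecycle Manager service permissions.
ROLE_ARN = "arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole"

dlm.create_lifecycle_policy(
    ExecutionRoleArn=ROLE_ARN,
    Description="Dev volumes: daily snapshot, keep 14 days",
    State="ENABLED",
    PolicyDetails={
        "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
        "ResourceTypes": ["VOLUME"],
        # The policy follows the tag, not a hand-picked list of volumes.
        "TargetTags": [{"Key": "SnapshotPolicy", "Value": "dev"}],
        "Schedules": [
            {
                "Name": "daily-dev",
                "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS",
                               "Times": ["03:00"]},
                "RetainRule": {"Count": 14},
                "CopyTags": True,
            }
        ],
    },
)
```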
Archive strategy
For older snapshots that are rarely restored, use EBS Snapshot Archive.
AWS documents:
Standard snapshot storage: $0.05/GB-month
Archive storage: $0.0125/GB-month
Archive retrieval: $0.03/GB
Minimum archive period: 90 days
So the simplest cost model becomes:
Current monthly snapshot cost = Standard-tier GB × $0.05
If you move older snapshots to archive and keep only a smaller working set in standard:
New monthly cost = (Standard-tier GB kept hot × $0.05) + (Archived GB × $0.0125)
For the examples below, I’ll use a pragmatic split:
25% kept in standard tier
75% moved to archive
That is not universal, but it is easy to explain and fairly realistic for environments where recent snapshots matter more than old ones.
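All of the with-archive numbers below come from the same two-line calculation, sketched here with that 25/75 split as a parameter (the function is mine, for illustration):

```python
STANDARD_PER_GB = 0.05    # snapshot standard tier, $/GB-month
ARCHIVE_PER_GB = 0.0125   # snapshot archive tier, $/GB-month


def snapshot_cost(total_gb, hot_fraction=0.25):
    # hot_fraction stays in the standard tier; the rest is archived.
    hot = total_gb * hot_fraction
    cold = total_gb - hot
    return hot * STANDARD_PER_GB + cold * ARCHIVE_PER_GB


print(snapshot_cost(4096))     # small:  ~89.60  vs 204.80 all-standard
print(snapshot_cost(20480))    # medium: ~448.00 vs 1,024.00
print(snapshot_cost(102400))   # large:  ~2,240.00 vs 5,120.00
```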
Small environment
Assume the actual billed snapshot footprint across the three accounts is:
Development: 0.5 TB
Acceptance: 1.0 TB
Production: 2.5 TB
That’s 4 TB total, or 4,096 GB.
Without lifecycle
Everything stays in standard tier:
4,096 GB × $0.05 = $204.80/month
With DLM + archive
Keep 25% hot:
1,024 GB × $0.05 = $51.20
Archive 75%:
3,072 GB × $0.0125 = $38.40
Total:
$51.20 + $38.40 = $89.60/month
Savings:
$204.80 - $89.60 = $115.20/month
So even in a small environment:
→ $115.20/month saved
Again, not dramatic. But also not nothing. And unlike a lot of “optimisation” work, this doesn’t require a philosophical debate about whether the resource is really needed.
Medium environment
Now assume a more typical billed snapshot footprint:
Development: 2 TB
Acceptance: 6 TB
Production: 12 TB
That’s 20 TB total, or 20,480 GB.
Without lifecycle
20,480 GB × $0.05 = $1,024.00/month
With DLM + archive
Keep 25% in standard:
5,120 GB × $0.05 = $256.00
Archive 75%:
15,360 GB × $0.0125 = $192.00
Total:
$256.00 + $192.00 = $448.00/month
Savings:
$1,024.00 - $448.00 = $576.00/month
So the medium environment saves:
→ $576/month
This is where snapshot lifecycle starts to become one of those unusually polite cost controls: low drama, predictable outcome, very little downside if you configure it sensibly.
Large environment
At larger scale, snapshot history becomes a storage system in its own right.
Assume:
Development: 10 TB
Acceptance: 30 TB
Production: 60 TB
That’s 100 TB total, or 102,400 GB.
Without lifecycle
102,400 GB × $0.05 = $5,120.00/month
With DLM + archive
Keep 25% hot:
25,600 GB × $0.05 = $1,280.00
Archive 75%:
76,800 GB × $0.0125 = $960.00
Total:
$1,280.00 + $960.00 = $2,240.00/month
Savings:
$5,120.00 - $2,240.00 = $2,880.00/month
So the large environment saves:
→ $2,880/month
And the key point is that this is not the result of aggressive deletion. It is simply the result of matching storage tier to restore likelihood.
That is usually the whole game in cloud cost work: not removing value, just stopping the most expensive version of value from being the default.
What the storage numbers are really saying
There are two different stories here.
Unattached volumes
This is straightforward waste.
If a volume is detached, untouched, and no one claims it during a grace period, there is very little strategic ambiguity.
Snapshot lifecycle
This is not usually waste in the same way. It is more often good intent with poor follow-through.
The snapshots were created for valid reasons.
They just weren’t moved, expired, or tiered afterward.
That distinction matters. One is clean-up. The other is lifecycle design.
Storage section summary
| Size   | Unattached EBS savings | Snapshot lifecycle savings | Total storage savings |
| ------ | ---------------------- | -------------------------- | --------------------- |
| Small  | $163.84                | $115.20                    | $279.04/month         |
| Medium | $819.20                | $576.00                    | $1,395.20/month       |
| Large  | $4,096.00              | $2,880.00                  | $6,976.00/month       |
These are not exotic savings. They are the result of asking two fairly plain questions:
Should this volume still exist?
Should this snapshot still live in the expensive tier?
Those questions are not glamorous, but they do tend to produce results.
Final Thought
Nothing here is particularly clever.
There’s no trick.
Just intent.
Compute learns when it’s not needed
Logs learn when to expire
Storage learns when to leave
Defaults don’t do that.
They can’t.
Because defaults are designed for safety, not specificity.
And the longer you rely on them, the more your infrastructure behaves like it’s still day one.
Even when your workload clearly isn’t.
If you want to take this further, the next step isn’t more automation.
It’s making these behaviours your platform defaults so that nobody has to remember them in the first place.
Which, in the end, is the only kind of optimisation that really scales.

