BigQuery Storage Billing Model Mismatch

The problem

BigQuery defaults to logical storage billing, but the most cost-efficient model depends on dataset behavior. Some datasets remain on logical billing even when compression and stability would make physical billing cheaper. Others use physical billing despite high churn, retention overhead, or historical storage that increases cost. The issue is a mismatch between billing model and dataset behavior.

Why it happens

BigQuery uses logical billing by default, and teams rarely revisit the billing model as datasets evolve.
Compression benefits of physical billing are not evaluated against real dataset characteristics.
Time-travel retention, fail-safe storage, and historical physical bytes are often overlooked in cost estimates.
Cost optimization efforts tend to focus on query cost rather than on whether the storage billing model still fits the dataset.

What this means for cost

This waste pattern often shows up as $150 to $3,000 in recurring monthly cost, or roughly $1,800 to $36,000 if it sits untouched for a year.

How to detect BigQuery storage billing model mismatch

The key signal is a dataset whose forecasted monthly storage cost is materially lower under the alternate billing model than under the one it uses today.

Start by collecting the current dataset billing model and the storage metrics the detector actually uses.

-- Per-dataset storage totals from the TABLE_STORAGE view.
-- Replace `region-us` with the region qualifier that matches your datasets.
SELECT
  table_schema,  -- dataset name
  SUM(active_logical_bytes) AS active_logical_bytes,
  SUM(long_term_logical_bytes) AS long_term_logical_bytes,
  -- active_physical_bytes already includes time-travel bytes
  SUM(active_physical_bytes) AS active_physical_bytes,
  SUM(long_term_physical_bytes) AS long_term_physical_bytes,
  SUM(time_travel_physical_bytes) AS time_travel_physical_bytes,
  SUM(fail_safe_physical_bytes) AS fail_safe_physical_bytes
FROM `region-us`.INFORMATION_SCHEMA.TABLE_STORAGE
GROUP BY table_schema;

Then compare those totals against the current regional storage rates for logical and physical billing. The detector logic is intentionally simple and transparent:

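The logical forecast uses active_logical_bytes and long_term_logical_bytes.
The physical forecast uses active_physical_bytes, long_term_physical_bytes, and fail_safe_physical_bytes. time_travel_physical_bytes is not added separately because active_physical_bytes already includes it; the query reports it so you can see how much of the active total is retention overhead.
A finding appears only when the alternate billing model is materially cheaper than the active one.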
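
The comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not Cloud Waste Hunter's actual implementation: the per-GiB rates are placeholders modeled on US multi-region list prices and should be replaced with the current rates for the dataset's region, the MATERIALITY threshold is a hypothetical value, and the function names are invented for this example. The input dictionary uses the column names from the query above.

```python
GIB = 1024 ** 3

# Placeholder monthly rates in USD per GiB (assumption); use your region's current rates.
RATES = {
    "LOGICAL": {"active": 0.02, "long_term": 0.01},
    "PHYSICAL": {"active": 0.04, "long_term": 0.02},
}

# Hypothetical threshold: minimum relative saving before a finding is raised.
MATERIALITY = 0.20


def forecast_monthly_cost(m):
    """Return the forecast monthly cost (USD) of one dataset under each model."""
    logical = (
        m["active_logical_bytes"] * RATES["LOGICAL"]["active"]
        + m["long_term_logical_bytes"] * RATES["LOGICAL"]["long_term"]
    ) / GIB
    # Assumption: fail-safe bytes are charged at the active physical rate.
    active_physical = m["active_physical_bytes"] + m["fail_safe_physical_bytes"]
    physical = (
        active_physical * RATES["PHYSICAL"]["active"]
        + m["long_term_physical_bytes"] * RATES["PHYSICAL"]["long_term"]
    ) / GIB
    return {"LOGICAL": logical, "PHYSICAL": physical}


def find_mismatch(current_model, m):
    """Return a finding dict if the alternate model is materially cheaper, else None."""
    costs = forecast_monthly_cost(m)
    alternate = "PHYSICAL" if current_model == "LOGICAL" else "LOGICAL"
    current_cost = costs[current_model]
    saving = current_cost - costs[alternate]
    if current_cost > 0 and saving / current_cost >= MATERIALITY:
        return {"switch_to": alternate, "monthly_saving_usd": round(saving, 2)}
    return None
```

Using a relative threshold rather than an absolute dollar amount is a design choice for the sketch; a real review queue might require both so that tiny datasets do not generate noise.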

If you also need to confirm the active model, inspect the dataset metadata directly:

bq show --format=prettyjson my-project:my_dataset | jq '.storageBillingModel'
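
If the check is scripted, the JSON printed by the command above can be parsed directly. The sketch below is illustrative only: the function name is hypothetical, and it assumes that a dataset whose metadata omits storageBillingModel, or reports it as unspecified, is on the default logical model. Verify that assumption against your own project defaults before relying on it.

```python
import json


def active_billing_model(dataset_json):
    """Return 'LOGICAL' or 'PHYSICAL' from `bq show --format=prettyjson` output."""
    dataset = json.loads(dataset_json)
    model = dataset.get("storageBillingModel")
    # Assumption: a missing or unspecified value means default (logical) billing.
    if model in (None, "STORAGE_BILLING_MODEL_UNSPECIFIED"):
        return "LOGICAL"
    return model
```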

What this detector actually checks

This page is about billing-model fit, not generic BigQuery cleanup. Cloud Waste Hunter looks at the current billing model, dataset-level storage metrics, and forecasted cost under both models. It does not decide that physical billing is always better, and it does not treat stale tables alone as sufficient evidence.

That distinction matters because the best model can flip over time. A compressible, stable dataset can favor physical billing, while heavy rewrites, time-travel retention, or historical physical storage can make logical billing the better choice.
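
A small worked example shows how the same table can flip. It assumes the physical rate is twice the logical rate (true of the placeholder rates used here, but check your region), ignores the active versus long-term split for simplicity, and uses a hypothetical function name. Under those assumptions physical billing wins only while billable physical bytes, including time-travel and fail-safe bytes, stay below half of the logical bytes.

```python
def physical_is_cheaper(logical_bytes, stored_physical_bytes,
                        time_travel_bytes=0, fail_safe_bytes=0,
                        logical_rate=0.02, physical_rate=0.04):
    """Single-tier comparison; rates are placeholder USD per GiB per month.

    stored_physical_bytes is the compressed size of current data only,
    excluding time-travel and fail-safe bytes. Units cancel in the
    comparison, so any consistent size unit works.
    """
    billable_physical = stored_physical_bytes + time_travel_bytes + fail_safe_bytes
    return billable_physical * physical_rate < logical_bytes * logical_rate


# Stable dataset with 5:1 compression and little retention overhead.
print(physical_is_cheaper(1000, 200, time_travel_bytes=10, fail_safe_bytes=10))    # True

# Same table after heavy rewrites: retained history inflates billable bytes.
print(physical_is_cheaper(1000, 200, time_travel_bytes=200, fail_safe_bytes=150))  # False
```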

How to fix BigQuery storage billing model mismatch

Use the detector as a review queue, not an auto-flip rule:

Confirm the current billing model and dataset location.
Validate that recent churn and retention behavior are representative rather than temporary.
Switch billing models only when the cheaper path is likely to stay cheaper.
Reduce stale tables, duplicate outputs, or excessive retention when those are the real cost drivers.
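
One way to make the "likely to stay cheaper" check concrete is to require the saving to hold across several forecast snapshots before acting. The sketch below is a hypothetical review helper, not part of Cloud Waste Hunter: the function name, the 20% threshold, and the four-snapshot minimum are all assumptions chosen for illustration.

```python
def durable_recommendation(snapshots, min_saving_ratio=0.20, min_snapshots=4):
    """Return True if every snapshot shows a material saving from switching.

    snapshots: list of (current_model_cost, alternate_model_cost) forecasts,
    for example one pair per week, oldest first.
    """
    if len(snapshots) < min_snapshots:
        return False  # not enough history to call the pattern representative
    for current_cost, alternate_cost in snapshots:
        if current_cost <= 0:
            return False
        if (current_cost - alternate_cost) / current_cost < min_saving_ratio:
            return False  # saving disappeared in at least one period
    return True
```

Requiring consistency matters because a switch is not instantly reversible: the BigQuery documentation describes a waiting period (14 days at the time of writing) before a dataset's billing model can be changed again, so confirm the current limits before switching.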

Caveats and overlap boundaries

This detector uses modeled storage forecasts rather than direct billing-export attribution, so it is best for prioritization and review. It also excludes the free storage tier because that allowance cannot be assigned cleanly to one dataset.

If the bigger problem is abandoned derived tables or stale marts, that is adjacent to but different from a billing-model mismatch. The storage-billing question is “is this dataset on the right model?” The stale-table question is “should these tables exist at all?”

How Cloud Waste Hunter helps

Cloud Waste Hunter compares forecasted logical and physical dataset storage cost, shows which model looks cheaper from current storage characteristics, and gives operators enough context to decide whether the mismatch is durable or just temporary churn. For the broader storage review, continue into the GCP Storage Cost Optimization guide.

FAQ

Is physical storage billing always cheaper?

No. Physical billing is cheaper when datasets are stable and compress well, but can be more expensive when churn, time-travel retention, or historical storage is high.

Why do most datasets stay on logical billing?

Logical billing is the default in BigQuery, and teams often do not revisit the billing model as datasets evolve.

What factors determine the best billing model?

The key factors are compression ratio, data churn, time-travel retention, and the amount of historical or fail-safe storage retained.

Related detectors

GCS bucket lifecycle policy cleanup

GCS versioning without noncurrent cleanup

Unattached Persistent Disks