Bloom Filters in Postgres: The Index Type Most Developers Overlook

This is the second post in a short series on Postgres features you can reach for instead of bolting on another piece of infrastructure. The first one, You Don't Need Redis, Postgres Already Has Pub/Sub, built a real-time app on LISTEN and NOTIFY. This one is about the bloom index.

Postgres has shipped a bloom index since 9.6. It can replace a stack of B-trees with a single, much smaller index, and almost nobody uses it. This post is a short tour: what a bloom filter is, how Postgres turns it into an index, when that index wins, and a benchmark on a realistic cache table.

Bloom filters are everywhere outside Postgres too. Google uses them inside Chrome's safe browsing list, Cassandra and HBase use them to skip SSTable lookups on the read path, Akamai uses them in CDN cache hints, Medium uses them to dedupe recommendations, and Bitcoin SPV clients use them to filter relevant transactions. The Postgres bloom extension is the same idea, just turned into an index type.

What a bloom filter is

Picture a tiny cache for usernames you've already seen. The naive version is a Set<string>: every username is stored in full, so a million users means a million strings in memory. A bloom filter is the cheap version of that set.

Instead of storing usernames, a bloom filter keeps a fixed-size array of bits, all starting at zero. To remember a name, hash it a few times, flip the bits at each hash position. To check whether a name was added, hash it again and look at the same positions. If any bit is 0, the name was never added. If all bits are 1, it was probably added.

The demo below walks through the idea: adding two names, then three lookups that show the three possible outcomes (probably present, definitely absent, and the false positive that gives bloom filters their reputation).

Empty filter

const bits = new Uint8Array(16);
// all zeros, nothing added

A 16-bit array, all zeros. Nothing has been added yet.

ready

You traded exactness for size, so a few hundred KB of bits can stand in for millions of items. False positives are tolerable because they just trigger a recheck against the real data. False negatives never happen, because we never clear bits we set. Burton Bloom's 1970 paper has the math.

What the hash function actually does

The "hash it a few times" part is small. Pick three independent hash functions, feed them the item, take each result mod the bit-array size. Those three numbers are the bit positions to flip. The same three positions get probed on lookup, which is why the operation is symmetric.

Input

function hashes(item) {
  const h1 = fnv1a(item)   % 16;
  const h2 = djb2(item)    % 16;
  const h3 = murmur3(item) % 16;
  return [h1, h2, h3];
}

hashes("alice"); // [2, 7, 11]

item"alice"

fnv1a

...

djb2

...

murmur3

...

We need three independent bit positions for "alice". Start by feeding the name into the first hash.

From a bloom filter to a bloom index

The visualization above is one bloom filter over a set of names. A Postgres bloom index applies the same trick row by row: for each row in the table, Postgres computes one small bloom signature over the indexed columns and stores it. The index is a stack of those per-row signatures, the same way a B-tree is a stack of (value, pointer) entries, except each entry here is a bit signature instead of an exact value.

When you query WHERE tenant_id = $1 AND region = $2, Postgres hashes the filter values, scans the signatures looking for ones whose bits cover those hashes, then rechecks each candidate against the heap to drop false positives.

The shape of the change in your schema is small: enable the bloom extension once, and create a single index that covers every column you might filter on.

CREATE EXTENSION IF NOT EXISTS bloom;

CREATE INDEX cache_bloom_idx ON cache_entries
  USING bloom (
    tenant_id, user_id, endpoint,
    locale, region, api_version
  );

Any subset of those six columns can use the index for equality lookups. You don't have to anticipate every combination up front, and your SELECT statements stay exactly the same. The planner just reaches for the bloom index instead of one or two of the B-trees.

The wide-table problem

The shape of table where a bloom index pays off is the wide one: many columns, queried by many different filter combinations. A response cache is a clean example:

CREATE TABLE cache_entries (
  id          BIGSERIAL PRIMARY KEY,
  tenant_id   TEXT,
  user_id     TEXT,
  endpoint    TEXT,
  locale      TEXT,
  region      TEXT,
  api_version INT,
  payload     JSONB
);

Different code paths look up entries differently:

WHERE tenant_id = $1 AND endpoint = $2
WHERE user_id = $1 AND locale = $2
WHERE tenant_id = $1 AND region = $2 AND api_version = $3

The default move is one B-tree per column, plus a few composite indexes for hot paths. That works, but:

Six B-trees on six columns add up in disk usage and write amplification.
A new query that mixes columns differently might still miss the indexes you have.
You end up maintaining indexes for combinations you never knew the app would ask for.

A quick refresher on B-trees

The demo in the next section compares one bloom index against six B-trees on the same table, so it helps to know what the baseline is doing.

When you write CREATE INDEX … ON cache_entries (tenant_id) without specifying a type, Postgres builds a B-tree. A B-tree is a sorted tree of (column_value, row_pointer) entries. To answer WHERE tenant_id = 't42', Postgres descends the tree from the root, finds the leaf page that holds 't42', and follows the pointers to the matching rows.

Start at the root

function lookup(tree, key) {
  let node = tree.root;
  while (!node.isLeaf) {
    const child = node.childFor(key);
    node = child;
  }
  return node.findEntry(key)?.rowPointer;
}

lookup(idx, "t42"); // -> row 142

idle

root

t20t50

leaf 0

t05row 105

t12row 112

t18row 118

leaf 1

t30row 130

t40row 140

t42row 142

leaf 2

t60row 160

t75row 175

t90row 190

A B-tree is a sorted tree of (key, pointer) entries. Lookups always start at the root.

That shape is why B-trees do so much:

Exact lookups are O(log n).
Range and prefix queries (<, BETWEEN, LIKE 'foo%') work because entries are kept sorted.
The result can flow straight into ORDER BY without an extra sort.

But the cost grows with the number of indexes:

Each B-tree stores its own copy of the column value plus a row pointer, so disk usage scales with rows × indexes.
Every INSERT and UPDATE has to update every relevant B-tree.
A single B-tree only helps if your WHERE clause hits its leading columns, so wide tables end up with many B-trees to cover many query shapes.

Demo: cache lookups

We'll seed a cache_entries table with 10,000 rows, run three realistic lookup queries, and compare:

A. Six B-tree indexes, one per column.
B. One bloom index covering all six columns.

Setup

bun init -y
bun add pg create-db
bun add -d @types/pg

The whole demo is a single index.ts. It provisions a temporary Prisma Postgres database via the programmatic create-db API, seeds the table, then builds each index strategy in turn and times the same three lookups against both.

Run it

Step through what the script does. The left panel highlights the code currently running. The right panel shows the cumulative terminal output and you can click any step to jump between steps.

index.tsStep 1 of 6 · Provision DB

Spin up a temporary Prisma Postgres database with a 1 hour TTL and connect over pg.

index.ts

import { create, isDatabaseSuccess } from "create-db";
import { Client } from "pg";

console.log("Provisioning a temporary Prisma Postgres database (1h TTL)...");
const db = await create({ ttl: "1h" });
if (!isDatabaseSuccess(db)) throw new Error(db.message);
console.log(`   claim URL: ${db.claimUrl}`);

const client = new Client({ connectionString: db.connectionString! });
await client.connect();

await client.query(`CREATE EXTENSION IF NOT EXISTS bloom`);
await client.query(`
  DROP TABLE IF EXISTS cache_entries;
  CREATE TABLE cache_entries (
    id BIGSERIAL PRIMARY KEY,
    tenant_id TEXT, user_id TEXT, endpoint TEXT,
    locale TEXT, region TEXT, api_version INT,
    payload JSONB
  );
`);

console.log(`Seeding ${N.toLocaleString()} rows...`);
for (let start = 0; start < N; start += BATCH) {
  // build BATCH rows mixing tenants / users / endpoints / locales / regions / versions
  await client.query(`INSERT INTO cache_entries (...) VALUES ${placeholders}`, params);
}
await client.query(`ANALYZE cache_entries`);

// A. Six B-tree indexes (one per column)
for (const c of COLS) {
  await client.query(`CREATE INDEX btree_${c} ON cache_entries (${c})`);
}
const btreeMB = await totalIndexMB();
const btreeMs = await runLookups();

// B. One bloom index covering all six columns
await client.query(`
  CREATE INDEX cache_bloom_idx ON cache_entries
  USING bloom (${COLS.join(", ")})
`);
const bloomMB = await totalIndexMB();
const bloomMs = await runLookups();

const shrink = ((1 - bloomMB / btreeMB) * 100).toFixed(0);
console.log(
  `Bloom index is ${shrink}% smaller ` +
  `(${bloomMB.toFixed(1)} MB vs ${btreeMB.toFixed(1)} MB)`,
);

terminal output

Provisioning a temporary Prisma Postgres database (1h TTL)...

claim URL: https://create-db.prisma.io/claim?projectID=...

Creating cache_entries table and enabling bloom extension...

Seeding 10,000 rows...

seeded in 1.2s

A. Six B-tree indexes (one per column)...

index size: 0.5 MB

3 lookups: 306.5 ms

B. One bloom index (all six columns)...

index size: 0.2 MB

3 lookups: 302.7 ms

Bloom index is 60% smaller (0.2 MB vs 0.5 MB),

and one index covers any subset of those six columns.

The bloom index covers the same three lookups with one index that is roughly 2.5x smaller than the stack of six B-trees. Every write updates one index instead of six.

The bloom index is a touch slower per lookup because Postgres rechecks each candidate row against the heap. For a wide table where any column might be a filter and disk is not free, that is usually a good trade.

When to use it

Use the bloom index when:

A table has many columns that get filtered on equality. Wide tables that store one row per resource and get queried by different attributes are the canonical fit.

CREATE TABLE events (
  id           BIGSERIAL PRIMARY KEY,
  tenant_id    TEXT,
  user_id      TEXT,
  device_id    TEXT,
  action       TEXT,
  feature_flag TEXT,
  -- ...
);

Different code paths filter by different subsets of those columns. No single composite index covers them all; a bloom index covers any subset on equality.

SELECT * FROM events WHERE tenant_id = $1 AND action = $2;
SELECT * FROM events WHERE user_id = $1 AND feature_flag = $2;
SELECT * FROM events WHERE tenant_id = $1 AND device_id = $2 AND action = $3;

Storage or write amplification from many B-trees is becoming a problem. Six B-trees on a 10M-row table are not free.

SELECT pg_size_pretty(SUM(pg_relation_size(indexrelid)))
FROM pg_index
WHERE indrelid = 'events'::regclass;

A small recheck overhead on reads is acceptable. Bloom indexes fetch a candidate set and re-verify; that costs a few extra heap touches per query.

When not to use it

Skip the bloom index when:

You only ever filter on one or two known columns. A single B-tree on those columns is faster and cheaper.
You need range, prefix, sort, or LIKE queries. Bloom indexes only support equality (=, IN).
Your columns are highly selective and you want index-only scans. B-trees keep the value, so they can answer some queries without touching the heap. Bloom indexes always recheck.
Your table is small enough that sequential scans are already fast. No index pays for itself until the table is big.

Recap

Postgres ships a bloom extension (in core since 9.6, available on Prisma Postgres).
A bloom index stores one bloom signature per row, so one index covers equality lookups on any subset of its columns.
It trades a recheck against the heap and no support for ranges or sorts for a much smaller footprint and a single write per row.
The fit is wide tables with unpredictable filter combinations: caches, lookup tables, audit logs, event streams.