I Built a Coin Ledger That Guilts Customers Into Coming Back 🧇
The Problem We Were Actually Solving
Glazed runs on Next.js + Supabase. Ordering, payments, menu, analytics - all solid. Loyalty was the missing piece, but the real problem wasn't "give people points." It was: how do we get someone who bought one waffle to come back for a second one?
That's a retention problem, not a points problem. And retention problems have a dirty little secret - they respond way better to loss than to gain. "You've earned 10 coins!" is a shrug. "Your ₹47 disappears in 7 days" is a reason to open WhatsApp at 11pm and order a waffle you didn't need.
So before writing a line of SQL, I locked in four rules:
- Coins are money, not points. ₹1 spent = 1 coin, 1 coin = ₹1. No mental math for the customer.
- Loss beats gain. Every message is framed around what's about to vanish, not what could be earned.
- The ledger is the truth. The cached balance is just a cache. If it disagrees with the transaction log, the log wins. Always.
- Rolling expiry, not batch expiry. No "all coins expire Dec 31." Every earn event gets its own 21-day countdown, so there's always something about to expire for somebody.
That last point is the one that shaped the entire schema.
Why an Append-Only Ledger (And Not Just a Balance Column)
My first instinct was a single current_balance integer on the user row. Increment on earn, decrement on redeem. It would've worked for about a week, until the first support ticket: "I had 80 coins yesterday, now I have 12, what happened?"
With a single mutable number, the answer is "I have no idea." There's no history, no audit trail, no way to reconstruct what happened.
So the balance became a cache, and the real system became a coin_ledger table where rows are never updated or deleted - only inserted.
CREATE TYPE coin_event_type AS ENUM (
'earn',
'redeem',
'expire',
'bonus',
'tier_upgrade',
'admin_adjust'
);
CREATE TABLE public.coin_ledger (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
user_id uuid NOT NULL REFERENCES users(id),
order_id uuid REFERENCES orders(id),
event_type coin_event_type NOT NULL,
coins numeric NOT NULL,
expiry_at timestamptz,
source_ledger_id uuid REFERENCES coin_ledger(id),
balance_after numeric NOT NULL,
note text,
created_at timestamptz NOT NULL DEFAULT now(),
CONSTRAINT earn_and_bonus_must_have_expiry
CHECK (event_type NOT IN ('earn', 'bonus') OR expiry_at IS NOT NULL)
);
A few decisions worth calling out:
coinsis signed. Earn and bonus are positive, redeem and expire are negative. Summing the column gives you the balance - for free.source_ledger_idlinks a redeem or expire row back to the earn row it consumed. Full traceability: every coin that ever leaves an account can be traced to the exact order it was earned from.balance_afteris a snapshot taken at insert time, mostly there to make debugging painless.- The
CHECKconstraint physically prevents anyone from inserting an earn row without an expiry date. The database enforces the business rule, not just the app code.
Now that support ticket becomes a one-line query: filter coin_ledger by user_id, and you get a complete, chronological, tamper-evident story of every coin that's ever touched that account. Refunds, disputes, "where did my coins go" - all answerable without guessing.
Rolling Expiry: The PostgreSQL View Doing the Heavy Lifting
Batch expiry ("everything expires Dec 31") is simpler to build, but it's bad psychology. Most of the year, nothing is at stake, so there's no urgency. Then on Dec 31 everyone scrambles at once - or doesn't, and you've just burned a pile of goodwill for nothing.
Rolling expiry fixes this: every earn row carries its own expiry_at, set to 21 days from the order. That means on any given day, someone is close to losing coins, which means the "your coins expire soon" message is always true for a meaningful slice of customers - not a once-a-year blast.
The catch is that "soon" now means computing per-user expiry windows on the fly, across potentially thousands of ledger rows. That's where this view comes in:
CREATE VIEW public.coins_expiring_soon AS
SELECT
cl.user_id,
u.phone_number,
u.name,
SUM(cl.coins) FILTER (WHERE cl.expiry_at <= now() + interval '14 days') AS coins_expiring_14d,
SUM(cl.coins) FILTER (WHERE cl.expiry_at <= now() + interval '7 days') AS coins_expiring_7d,
SUM(cl.coins) FILTER (WHERE cl.expiry_at <= now() + interval '1 day') AS coins_expiring_1d,
MIN(cl.expiry_at) AS earliest_expiry
FROM coin_ledger cl
JOIN users u ON u.id = cl.user_id
WHERE cl.event_type = 'earn'
AND cl.expiry_at > now()
AND cl.coins > 0
GROUP BY cl.user_id, u.phone_number, u.name;
The FILTER clause is doing the trick here. Without it, getting three different time-window aggregates (14-day, 7-day, 1-day) would mean three separate queries - or one query with three CASE WHEN expressions buried inside SUM(), which works but reads badly. FILTER lets you say "sum this column, but only for rows matching this condition" right next to the aggregate, in plain SQL, in a single pass over the table.
It's a regular view, not materialized, on purpose. The dataset is a few thousand users, not a few million, so recomputing it live every time the cron runs costs nothing - and it guarantees the cron is always reading the current truth, never a stale snapshot.
To keep this cheap as the ledger grows, I added two partial indexes scoped to earn rows only, since that's the only event type these expiry queries ever touch:
CREATE INDEX idx_coin_ledger_expiry ON coin_ledger(expiry_at) WHERE event_type = 'earn';
CREATE INDEX idx_coin_ledger_user_expiry ON coin_ledger(user_id, expiry_at) WHERE event_type = 'earn';
redeem, expire, and admin_adjust rows never show up in these indexes at all, so they stay small no matter how large the ledger gets.
The Cron Job: One Route, Four Jobs
Everything ties together in a single daily cron route, hit by an external scheduler (cron-job.org) once a day. Conceptually it does four things in order: expire old coins, find coins about to expire and warn people, find inactive users and nudge them, then fan all those messages out over WhatsApp.
The tricky part is that this is a Vercel serverless function, and serverless functions don't love long-running background work. cron-job.org expects a fast response, but expiring coins for thousands of rows and sending hundreds of messages is not fast.
The fix is Next.js's after():
export async function GET(req: NextRequest) {
const authHeader = req.headers.get("authorization");
const cronSecret = process.env.CRON_SECRET;
if (cronSecret && authHeader !== `Bearer ${cronSecret}`) {
return NextResponse.json({ error: "Unauthorized" }, { status: 401 });
}
// Return 200 immediately, process in background
after(async () => {
// steps 1-4 happen here
});
return NextResponse.json({ success: true, message: "Cron job triggered" });
}
after() lets the function return its 200 response right away - so the cron scheduler is satisfied and moves on - while the actual work keeps executing in the background on Vercel's infrastructure.
The four steps inside it:
- Expire old coins. Find every earn row where
expiry_athas passed andcoins > 0(meaning it hasn't been fully consumed by a redemption), and insert a matching negative expire row. - Query
coins_expiring_soonand build a list of who needs a 7-day warning. - Query inactive users - no earn activity in 10+ days, with a non-zero balance - for a re-engagement nudge.
- Queue everything through QStash, chunked into batches of 20.
Notice step 1 never deletes or updates the original earn row. It just adds an expire row that cancels it out in the sum. The ledger stays append-only even when coins die of old age.
The Dedup Trick: One Upsert, Zero Duplicate Messages
Here's the part I'm most pleased with.
Cron jobs that send messages have an obvious failure mode: what if the job runs twice, or retries after a partial failure? Without protection, customers get the same "your coins are expiring" text three times and start tuning out your brand entirely.
The naive fix is "check if we already sent this, then send if not" - a read, then a write, with a race condition sitting right between them. Two cron invocations close together can both read "not sent yet" and both send.
Instead, there's an idempotency table:
CREATE TABLE public.whatsapp_trigger_log (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
user_id uuid NOT NULL REFERENCES users(id),
trigger_type whatsapp_trigger_type NOT NULL,
dedup_key text NOT NULL UNIQUE,
sent_at timestamptz NOT NULL DEFAULT now(),
metadata jsonb
);
And the whole dedup check becomes one batch upsert:
const { data: insertedLogs } = await supabase
.from("whatsapp_trigger_log")
.upsert(warningsToLog, {
onConflict: "dedup_key",
ignoreDuplicates: true,
})
.select("dedup_key");
// Only messages that were ACTUALLY inserted (not duplicates) get sent
if (insertedLogs) {
for (const log of insertedLogs) {
const msg = warningMap.get(log.dedup_key);
if (msg) {
messagesToQueue.push(msg);
}
}
}
upsert with ignoreDuplicates: true tries to insert every row in the batch, and silently skips any that collide on the unique dedup_key. It then returns only the rows that actually got inserted. Cross-reference those returned keys against the original message map, and you've got your "definitely new, definitely send these" list - computed in a single round trip to the database, no race condition possible, no separate read-then-write.
The dedup_key design is the other half of the trick. Each trigger type builds its key so that "duplicate" means exactly what you want it to mean:
| Trigger | Key Format | Why |
|---|---|---|
| 7-day expiry warning | expiry_warning_7d:{user_id}:{earliest_expiry_date} |
One per user per distinct expiry date |
| Re-engagement | reengagement_10d:{user_id}:{YYYY-MM} |
Max one per user per calendar month |
| Tier upgrade | tier_upgrade:{user_id}:{tier_id} |
Once per tier, ever |
| Birthday bonus | birthday_bonus:{user_id}:{YYYY} |
Once per year |
The database's unique constraint is doing all the deduplication work. No application-level locking, no Redis, no extra infrastructure.
Why QStash Instead of Just Promise.all
For the actual sending, the obvious move is Promise.all over every message and call the WhatsApp gateway directly. I do exactly that - as a fallback. But it's not the primary path, for three reasons.
First, the after() callback still runs inside a Vercel function with a bounded execution window. If a slow day produces 300 expiry warnings, firing 300 concurrent HTTP requests and waiting on all of them risks running past that window - and unlike a normal request, there's no client waiting to retry for you.
Second, plain Promise.all has no retry story. If the WhatsApp gateway hiccups on request #47, that message is just gone.
Third, hammering a third-party API with 300 simultaneous requests is a great way to get rate-limited or banned.
So messages get chunked into batches of 20 and pushed onto Upstash QStash instead:
const chunkSize = 20;
const qstashMessages = [];
for (let i = 0; i < messagesToQueue.length; i += chunkSize) {
const chunk = messagesToQueue.slice(i, i + chunkSize);
qstashMessages.push({
url: workerUrl,
body: {
messages: chunk.map((m) => ({
phoneNumber: m.phoneNumber,
firstName: m.firstName,
coins: m.coins,
days: m.days,
})),
},
headers: { "Content-Type": "application/json" },
});
}
await qstash.batchJSON(qstashMessages);
QStash takes over the delivery problem entirely: it retries failed deliveries automatically, and the batches naturally throttle how hard the WhatsApp gateway gets hit. The cron route's job shrinks down to "figure out who needs a message" - the actual sending becomes someone else's problem, with retries built in.
The Worker Route: Where Signature Verification Earns Its Keep
QStash delivers each batch by POSTing it to a worker route, which is just a normal Next.js API route - except it can't be a normal open API route, because anyone who finds that URL could spam your entire customer list with WhatsApp messages on your dime.
// app/api/admin/loyalty/cron/worker/route.ts
import { verifySignatureAppRouter } from "@upstash/qstash/dist/nextjs";
async function handler(request: Request) {
const { messages } = await request.json();
const results = await Promise.all(
messages.map(async (msg) => {
try {
await axios.get(buildWhatsAppUrl(msg));
return { phoneNumber: msg.phoneNumber, status: "success" };
} catch (waErr: any) {
return { phoneNumber: msg.phoneNumber, status: "failed", error: waErr.message };
}
})
);
return NextResponse.json({ success: true, processed: messages.length, results });
}
export const POST = verifySignatureAppRouter(handler);
verifySignatureAppRouter wraps the handler and checks a cryptographic signature that QStash attaches to every request it makes, using a shared signing key. If the signature doesn't check out, the request never reaches the handler. So even if someone discovers the worker URL, they can't trigger it - only QStash, holding the matching key, can.
Inside the worker itself, Promise.all is exactly the right tool - this isn't the unbounded fan-out from before, it's a pre-sized batch of 20 that QStash handed over specifically because it's a safe size to run concurrently.
FIFO Redemption: Spend the Oldest Coins First
When a customer redeems coins at checkout, which coins actually get spent? With rolling ex
Comments
No comments yet. Start the discussion.