Go
OTP
Passwordless Auth
Redis Sessions

EduQueue Deep Dive: OTP-Authenticated Operator Console & Append-Only Email Analytics

A platform-engineering deep dive into UIU EduQueue's operator surface — passwordless OTP login backed by Redis-TTL sessions, the append-only EmailEvent schema, on-demand analytics rollups, the campaign builder's draft → scheduled → sending → sent state machine, and the live "pause everything" toggle that runs over Redis pub/sub.

Azraf Al Monzim
56 views
Listen to this post··:··

Companion post to the UIU EduQueue platform overview. This is the long-form version of "how the operator console actually works" and "where the analytics numbers come from."

The operator console is the part of EduQueue I rely on most and the part most users never see. It's also the part where the platform-engineering decisions show up most clearly — every "should this be a YAML setting or a Redis pub/sub message" call gets made in this layer.

This post walks through the auth model, the analytics schema, the on-demand rollups, and the live-control surfaces that turn the dashboard from a viewer into a control plane.


Auth — passwordless, Redis-backed, single-use

There is no password anywhere in the operator console. The full login flow:

1. Operator hits  /admin-dashboard
2. Frontend POSTs /api/eduqueue/admin/request-otp
3. Server generates 6-digit OTP via crypto/rand
4. Server stores  eduqueue_otp:{otp} → "valid"  in Redis (TTL 10m)
5. Server posts the OTP to a Discord webhook  (out-of-band channel)
6. Operator copies OTP into the UI
7. Frontend POSTs /api/eduqueue/admin/verify-otp { otp }
8. Server DELs the OTP key (single-use), generates a session token
9. Server stores  eduqueue_session:{token} → "valid"  in Redis (TTL 24h)
10. Token returned to frontend, stored in localStorage, sent via X-Session-Token

OTP generation lives in eduqueue_admin.go:generateEduQueueOTP. The token is sha256(otp + nanoTimestamp) truncated and base64-encoded.

The constants:

ConstValueWhy
eduQueueOTPPrefix"eduqueue_otp:"Namespacing for safe Redis sharing
eduQueueOTPTTL10 * time.MinuteLong enough for a copy-paste, short enough for security
eduQueueSessionPrefix"eduqueue_session:"Same namespacing rule
eduQueueSessionTTL24 * time.HourOne day of operator work, then re-auth

Why server-side sessions, not JWTs

I deliberately chose Redis sessions over JWTs:

  • Revocation is a DEL. No token denylist, no key-rotation game.
  • TTL is the source of truth. Redis evicts; I don't have to.
  • No secrets in the cookie. The token is opaque.
  • Server-side, not browser-side. A stolen token without DB access is useless.

The trade-off is one Redis round-trip per admin request. For a console that an operator hits a few hundred times per day, that's negligible. JWTs would matter at 100k req/sec. They do not matter at 100 req/sec.

Discord as the OTP channel

OTPs flow through a Discord webhook because:

  1. The channel I check anyway. I'm in Discord all day.
  2. Out of band from email. If email is broken, I can still get into the console to fix it.
  3. Audit trail for free. Every OTP request shows up in #eduqueue-ops with timestamp, requester IP, and the 10-minute warning.

The webhook payload includes the requester IP so I can spot a wrong actor immediately.


The analytics schema — append-only events, on-demand rollups

The hardest decision in the analytics layer was whether to pre-aggregate. I chose not to. Here's the schema and the reasoning.

TableShapeCardinalityRead pattern
EmailMessagePer-send record (UUID, status, OpenCount, ClickCount, CampaignID)1 per recipient × campaignKPI denominators, drill-down
EmailEventAppend-only (MessageID, EventType, OccurredAt)10–100× EmailMessageTime-series, funnel
EmailCampaignAggregate (TotalSent, TotalOpened, TotalClicked, TotalBounced)1 per campaignOverview cards
EmailRateLimitTrackerPer-provider window counterSmall (one row per provider × window)Live operator console

The discipline is: EmailEvent is the only mutating writer for engagement signals. EmailMessage.OpenCount and EmailMessage.ClickCount are denormalized counters maintained by the same handler that writes the event row, in the same transaction. No background job recomputes them. That keeps the read-side fast and the write-side simple.

The query layer

KPIs reduce to grouped SUMs against EmailCampaign:

SELECT
  campaign_id,
  SUM(total_sent)   AS sent,
  SUM(total_opened) AS opens,
  SUM(total_clicked) AS clicks,
  SUM(total_bounced) AS bounces
FROM email_campaigns
WHERE created_at > now() - interval '30 days'
GROUP BY campaign_id;

Drill-down for a single campaign joins EmailMessage to EmailEvent:

SELECT date_trunc('hour', e.occurred_at) AS hour,
       e.event_type,
       count(*) AS n
FROM email_events e
JOIN email_messages m ON m.id = e.message_id
WHERE m.campaign_id = $1
GROUP BY hour, e.event_type
ORDER BY hour;

These are not pre-aggregated. With covering indexes on (campaign_id, occurred_at) and (message_id, event_type), the queries return in tens of milliseconds at current cardinality. The upgrade path when this stops being true is a MATERIALIZED VIEW refreshed by a cron task — not a rewrite.

Why on-demand wins at this scale

Pre-aggregation is the classic "cleverness debt." It costs you:

  1. An extra writer. Now every event hits two places.
  2. A reconciliation problem. If the rollup drifts, which side is right?
  3. Schema rigidity. The rollup encodes today's questions; tomorrow's analytics question requires a backfill.

The append-only schema with on-demand queries trades query latency (which is currently fine) for maximum schema flexibility (which is always valuable). When traffic warrants the trade-off, materialized views are a one-migration upgrade — not a rewrite.


The campaign builder

Custom campaigns share the email pipeline but discriminate on EmailCampaign.Type = "custom". They have their own state machine:

draft  ─► scheduled  ─► sending  ─► sent
  │          │            │           
  ▼          ▼            ▼           
cancel    cancel       cancel        
StateWhat's allowedWhat blocks transition
draftEdit Subject, HTMLBody, TextBody, recipient query
scheduledSet SendAt in the future; worker won't pick up early
sendingThe queue is draining; per-recipient rows being writtenNo edits
sentFrozen for analyticsNo edits, no resend (must clone)

The transitions are gated server-side. The UI never owns campaign state. This is the kind of invariant that seems like overkill until the day a half-ready campaign accidentally fires because a button race condition didn't disable on mouseup.

The template editor

Campaigns and templates use the same EmailTemplate table ((slug, version) unique index, IsActive boolean). Editing a template:

  1. Inserts a new row with Version = current + 1, IsActive = false.
  2. Renders a preview against sample data via POST /api/eduqueue/admin/email-templates/preview-draft.
  3. On approve, flips the new version's IsActive to true and the previous version's to false in a single transaction.

Rollback is a single SQL UPDATE — flip IsActive from N back to N-1. No migration, no redeploy, no editing in production.


The live-control surfaces

The operator console exposes the parts of the platform that are too dangerous for a YAML config and too operationally important to require a redeploy.

CapabilityEndpointEffect
Pause all sendsPOST /api/eduqueue/admin/email/pauseSets email_processor_enabled=false via Redis pub/sub; workers honor on next 30s tick
Send a test routinePOST /api/eduqueue/admin/email/testRenders + sends without touching analytics
Inspect the queueGET /api/eduqueue/admin/email-queuePending / in-flight / failed snapshot
Reset rate limitsDELETE /api/eduqueue/admin/rate-limitsTruncates EmailRateLimitTracker, publishes eduqueue:ratelimit:clear
Live settingsGET/PUT /api/eduqueue/admin/settingsHot-reload via SettingsService + eduqueue:settings:update pub/sub
Block a studentPOST /api/eduqueue/admin/blocked-studentsInserts into EduQueueBlockedStudent; SQL-side filter applied immediately

Why this matters

A console that only displays state is a graveyard. A console that controls state is a platform. Every row in the table above represents a decision I would otherwise have to make by editing code, redeploying, and waiting — at 2am, during exam week, while students are watching.

The pause toggle is the one I use most. It exists because the worst day of EduQueue's life was an accidental triple-send to 4,000 subscribers in 90 seconds. The post-mortem said: needs a kill switch the operator can hit from a phone. So it has one.


The "stuck queue" panel

Asynq's MaxRetry(2) parks failed tasks in an archived state. The console exposes them:

FieldSource
Task typeasynq.TaskInfo.Type
Last errorasynq.TaskInfo.LastErr
Failed atasynq.TaskInfo.LastFailedAt
PayloadDecoded from asynq.TaskInfo.Payload
Replay buttonPOST /api/eduqueue/admin/email-queue/replay/{id}

The replay path is intentionally manual. Auto-retry beyond MaxRetry(2) is the path to infinite loops; manual replay forces me to look at the error first. In platform engineering, a slow path you take consciously beats a fast path you forgot you wired.


What I'd change

Three things on the roadmap for this subsystem:

  1. WebAuthn alongside OTP. OTP-via-Discord works for me; a hardware key would scale to a small ops team without giving anyone my Discord.
  2. Saved analytics views. Right now every drill-down is built from scratch. A SavedQuery table with a few canonical KPIs would speed up daily report-building.
  3. Audit log. Every operator action should write a row I can grep. Currently I rely on Discord history and Redis TTLs — fine for one operator, fragile for two.

None of these are urgent. All become important the day a second person logs in.


Part of the UIU EduQueue platform case study.

Tags:
Go
OTP
Passwordless Auth
Redis Sessions
Analytics
Append-Only
Email Tracking
Campaign Builder
State Machine
Operator Console
Platform Engineering
Deep Dive
EduQueue
Azraf Al Monzim

Written by Azraf Al Monzim

Platform Engineer passionate about building scalable systems and sharing knowledge through writing.