Catching content rule violations at build time with Astro Content Collections + Zod

If you run a Markdown-based blog long enough, the frontmatter starts accumulating rules. "A reviews post must carry an ad disclosure." "FAQ questions in JSON-LD must also appear in the body." Eventually a README checklist isn't enough, because you forget.

Astro Content Collections plus Zod lets you push most of those rules into build failures. .refine() couples two fields, nested z.object types your structured data, and a violation gets caught at astro build.

This post is the code-first version of the setup I use on aulvem.com. The longer version with operational notes is linked at the end.

Minimal setup: `defineCollection` + `z.object`

// src/content.config.ts
import { defineCollection, z } from "astro:content";
import { glob } from "astro/loaders";

const blog = defineCollection({
  loader: glob({
    pattern: "**/[^_]*.{md,mdx}",
    base: "./src/content/blog",
  }),
  schema: z.object({
    title: z.string(),
    description: z.string(),
    pubDate: z.coerce.date(),
    category: z.enum(["build", "reviews"]),
    tags: z.array(z.string()).default([]),
    draft: z.boolean().default(false),
    affiliate: z.boolean().default(false),
  }),
});

export const collections = { blog };

Four moves cover most of the surface:

z.enum pins the category to a fixed set — typos break the build
z.coerce.date reads 2026-05-23 as a Date
.default(false) lets you omit the field in the YAML and still get a boolean after parsing
z.array(z.string()) and other composites work as-is
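
For reference, here is roughly what those four moves mean for the data your templates receive. This is a hand-written stand-in for the type Zod infers from the schema above, kept dependency-free so it runs on its own. `publishedPosts` and the sample posts are hypothetical, not part of the Aulvem codebase.

```typescript
// Hand-written equivalent of the schema's output type. In a real project
// you would derive this from the schema instead of typing it out.
type BlogData = {
  title: string;
  description: string;
  pubDate: Date; // z.coerce.date(): always a Date after parsing
  category: "build" | "reviews"; // z.enum: a closed union, not string
  tags: string[]; // .default([]): never undefined
  draft: boolean; // .default(false): never undefined
  affiliate: boolean;
};

// Hypothetical helper: non-draft posts dated on or before `now`, newest first.
function publishedPosts(posts: BlogData[], now: Date): BlogData[] {
  return posts
    .filter((p) => !p.draft && p.pubDate.getTime() <= now.getTime())
    .sort((a, b) => b.pubDate.getTime() - a.pubDate.getTime());
}

// Made-up sample data for illustration only.
const samplePosts: BlogData[] = [
  {
    title: "Older",
    description: "",
    pubDate: new Date("2026-01-10"),
    category: "build",
    tags: [],
    draft: false,
    affiliate: false,
  },
  {
    title: "Draft",
    description: "",
    pubDate: new Date("2026-02-01"),
    category: "build",
    tags: [],
    draft: true,
    affiliate: false,
  },
  {
    title: "Newer",
    description: "",
    pubDate: new Date("2026-03-05"),
    category: "reviews",
    tags: ["zod"],
    draft: false,
    affiliate: true,
  },
];

console.log(publishedPosts(samplePosts, new Date("2026-05-23")).map((p) => p.title));
```

Because the defaults are applied during parsing, downstream code like the filter above never needs an `undefined` check for `draft` or `tags`.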

This is straight out of the Astro 5 docs. The interesting work starts with .refine().

`.refine()` for "two fields must move together"

When two fields are coupled — change one, the other must follow — .refine() at the end of the schema is the right shape. Aulvem's case: category: reviews posts must have affiliate: true so the disclosure banner and rel="sponsored" injection both kick in.

const blog = defineCollection({
  loader: glob({ /* ... */ }),
  schema: z
    .object({
      title: z.string(),
      category: z.enum(["build", "reviews"]),
      affiliate: z.boolean().default(false),
      // ...
    })
    .refine((data) => (data.category === "reviews") === data.affiliate, {
      message: "affiliate must be true iff category is 'reviews'",
      path: ["affiliate"],
    }),
});

(data.category === "reviews") === data.affiliate reads as "these two conditions always agree". Strictly that is the negation of XOR (an XNOR), but it is easier to scan months later.
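
If the "iff" feels abstract, a quick truth table helps. The sketch below pulls the same predicate out into a standalone function (the name `disclosureIsConsistent` is mine, for illustration) and prints the verdict for all four combinations.

```typescript
type Category = "build" | "reviews";

// Same predicate as the .refine() callback above, extracted as a function.
function disclosureIsConsistent(category: Category, affiliate: boolean): boolean {
  return (category === "reviews") === affiliate;
}

const categories: Category[] = ["build", "reviews"];
for (const category of categories) {
  for (const affiliate of [false, true]) {
    const verdict = disclosureIsConsistent(category, affiliate) ? "ok" : "build fails";
    console.log(`category=${category} affiliate=${String(affiliate)} -> ${verdict}`);
  }
}
```

Only two of the four combinations pass: a reviews post with affiliate: true, and a build post with affiliate: false. The predicate rejects both directions, so a build post that sets affiliate: true also fails, which a one-directional check like the first branch of the .superRefine variant later in this post would let through.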

Build error from a reviews post that forgot affiliate: true:

[ContentEntryInvalidError] Content config error in `blog → 2026-05-...`:
affiliate must be true iff category is 'reviews'
  at affiliate

message lands in the output verbatim, so it's worth writing it as instructions for future-you.

`.refine` vs `.superRefine`

When you need more than one independent constraint on an object — or per-field error messages — .superRefine is easier:

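// Chained onto the z.object({ ... }) schema in place of .refine().
// Assumes the schema also declares an optional updatedDate field.
.superRefine((data, ctx) => {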
  if (data.category === "reviews" && !data.affiliate) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: "reviews posts must set affiliate: true",
      path: ["affiliate"],
    });
  }
  if (data.draft && data.updatedDate) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: "draft posts should not carry updatedDate",
      path: ["updatedDate"],
    });
  }
})

For a single relationship between two fields, .refine() stays lighter.

Typed structured data in frontmatter

HowTo and FAQPage JSON-LD blocks pull their data from frontmatter rather than from parsed body text. The reasons:

Frontmatter is what Zod validates, so the shape is enforced for free
A heading rename doesn't quietly break JSON-LD
The JSON-LD generator can trust frontmatter without re-parsing MDX

Schema:

howto: z
  .object({
    name: z.string().optional(),
    description: z.string().optional(),
    totalTime: z.string().optional(),
    steps: z.array(
      z.object({
        name: z.string(),
        text: z.string(),
        image: z.string().optional(),
      }),
    ),
  })
  .optional(),
faq: z
  .array(
    z.object({
      question: z.string(),
      answer: z.string(),
    }),
  )
  .optional(),

YAML side:

---
title: "Astro Content Collections tips"
faq:
  - question: "When do you reach for .superRefine over .refine?"
    answer: "When one object needs more than one independent constraint..."
  - question: "What breaks when the schema changes?"
    answer: "Every existing post — by design..."
---

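A howto that omits steps, or a faq entry missing answer, fails the build. An empty list (steps: []) still passes as written; chaining .min(1) onto the steps array closes that gap.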
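
Once the shape is validated, the JSON-LD generator can be a plain mapping with no defensive parsing. Here is a minimal sketch for the FAQ case, assuming the `faq` field above. The function name `buildFaqJsonLd` and the sample entry are hypothetical, and how the result gets into a `<script type="application/ld+json">` tag is left to your layout.

```typescript
// Shape of the validated `faq` frontmatter field (mirrors the schema above).
type FaqItem = { question: string; answer: string };

// Build a schema.org FAQPage object from frontmatter. Returns null when the
// post has no FAQ entries so the caller can skip the script tag entirely.
function buildFaqJsonLd(faq: FaqItem[] | undefined): Record<string, unknown> | null {
  if (!faq || faq.length === 0) return null;
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faq.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
}

// Made-up entry for illustration only.
const faqJsonLd = buildFaqJsonLd([
  { question: "Example question?", answer: "Example answer." },
]);
console.log(JSON.stringify(faqJsonLd, null, 2));
```

The function never checks whether question or answer exist, because by the time it runs the build has already guaranteed that.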

What Zod can't reach

Zod only inspects frontmatter — the body MDX is outside its scope.

Google's structured data guidelines expect marked-up content to be visible on the page, and JSON-LD without a body counterpart can cost the page its rich-result eligibility. A post with frontmatter FAQs that never appear in the body passes the schema and silently disqualifies itself.

The fix is a separate layer. A small substring-matching script covers it:

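import { readFile } from "node:fs/promises";
import { parse as parseYaml } from "yaml";

// Assumed here: the post path arrives as the first CLI argument.
const path = process.argv[2];
const raw = await readFile(path, "utf8");
// m[1] is the frontmatter YAML, m[2] is the body.
const m = /^---\r?\n([\s\S]*?)\r?\n---\r?\n([\s\S]*)$/.exec(raw);
if (!m) process.exit(0); // no frontmatter, nothing to check

// Empty frontmatter parses to null, so fall back to an empty object.
const data = parseYaml(m[1]) ?? {};
// Collapse whitespace and lowercase so line wrapping and casing don't matter.
const body = m[2].replace(/\s+/g, " ").toLowerCase();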

const mismatches = [];
if (Array.isArray(data.faq)) {
  for (const [i, q] of data.faq.entries()) {
    const needle = q.question.replace(/\s+/g, " ").toLowerCase();
    if (!body.includes(needle)) {
      mismatches.push(`faq[${i}].question not in body: "${q.question}"`);
    }
  }
}

if (mismatches.length) {
  for (const e of mismatches) console.error(e);
  process.exit(1);
}

It's substring presence only. The script doesn't catch a wrong answer under the right question — that's a review-time concern.
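
One practical consequence of "substring presence only" is that Markdown formatting inside the body counts as different text. The sketch below reuses the same normalization as the script above in a standalone function (`normalize` and `questionInBody` are names I made up for illustration) and compares a plain heading with one that wraps identifiers in backticks.

```typescript
// Same normalization as the script above: collapse whitespace, lowercase.
function normalize(text: string): string {
  return text.replace(/\s+/g, " ").toLowerCase();
}

// Same check as the script above, expressed as a function.
function questionInBody(question: string, body: string): boolean {
  return normalize(body).includes(normalize(question));
}

const question = "When do you reach for .superRefine over .refine?";

// Heading written as plain text, with a line wrap and extra spaces.
const plainBody =
  "### When do you reach for\n.superRefine   over .refine?\n\nSome answer text.";

// Same heading, but the identifiers are wrapped in inline-code backticks.
const codeBody =
  "### When do you reach for `.superRefine` over `.refine`?\n\nSome answer text.";

console.log("plain heading matches:", questionInBody(question, plainBody));
console.log("backticked heading matches:", questionInBody(question, codeBody));
```

The plain heading matches despite the wrapping, while the backticked one does not, because the backticks break the substring. If that bites in practice, one option is to strip backticks in the normalization step; whether that is worth it depends on how your headings are written.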

The three-layer split

Once you split rules across three layers, "where should this rule live?" becomes answerable:

| Layer | Fires at | Catches | Misses |
| --- | --- | --- | --- |
| Zod schema | `astro build` | types, enums, required/optional, field relations | meaning, body parity |
| Lint script | pre-commit, CI | banned phrases, substring parity | meaning |
| Review | pre-publish | meaning, judgment calls | not automatable |

Rule of thumb: if a layer higher in the table can catch a rule, don't push it down to a lower one.

The full operational notes — the failure modes I keep an eye on, the disclosure-strength judgments, the decision history of why some rules stay in review — live on Aulvem → Pushing operational rules into Astro Content Collections with Zod
