If you run a Markdown-based blog long enough, the frontmatter starts accumulating rules. "A reviews post must carry an ad disclosure." "FAQ questions in JSON-LD must also appear in the body." Eventually a README checklist isn't enough: you forget.
Astro Content Collections plus Zod lets you push most of those rules into build failures. .refine() couples two fields, nested z.object types your structured data, and a violation gets caught at astro build.
This post is the code-first version of the setup I use on aulvem.com. The longer version with operational notes is linked at the end.
Minimal setup: defineCollection + z.object
// src/content.config.ts
import { defineCollection, z } from "astro:content";
import { glob } from "astro/loaders";

const blog = defineCollection({
  loader: glob({
    pattern: "**/[^_]*.{md,mdx}",
    base: "./src/content/blog",
  }),
  schema: z.object({
    title: z.string(),
    description: z.string(),
    pubDate: z.coerce.date(),
    category: z.enum(["build", "reviews"]),
    tags: z.array(z.string()).default([]),
    draft: z.boolean().default(false),
    affiliate: z.boolean().default(false),
  }),
});

export const collections = { blog };
Four moves cover most of the surface:
- z.enum pins the category to a fixed set, so typos break the build
- z.coerce.date reads 2026-05-23 as a Date
- .default(false) makes the field omissible at the YAML side
- z.array(z.string()) and other composites work as-is
This is straight out of the Astro 5 docs. The interesting work starts with .refine().
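One footnote on the z.coerce.date move before getting to .refine(): per the Zod docs, the coercion passes the input through new Date(input) and then rejects an invalid Date. The sketch below shows what that implies for a date-only string, using plain TypeScript rather than Zod itself. The sample values are mine, not taken from the blog's content.

```typescript
// z.coerce.date() is documented as applying new Date(input) before validating.
// A date-only ISO string is parsed as UTC midnight, not local midnight.
const pubDate = new Date("2026-05-23");
console.log(pubDate.toISOString()); // 2026-05-23T00:00:00.000Z

// A string that Date cannot parse becomes an "Invalid Date" whose timestamp
// is NaN. Zod reports that as a validation error, which fails the build.
const broken = new Date("not a date");
console.log(Number.isNaN(broken.getTime())); // true
```

Because the value is UTC midnight, formatting it in a time zone behind UTC can display the previous day. Passing timeZone: "UTC" when formatting is one way to keep the rendered date in line with the frontmatter.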
.refine() for "two fields must move together"
When two fields are coupled — change one, the other must follow — .refine() at the end of the schema is the right shape. Aulvem's case: category: reviews posts must have affiliate: true so the disclosure banner and rel="sponsored" injection both kick in.
const blog = defineCollection({
  loader: glob({ /* ... */ }),
  schema: z
    .object({
      title: z.string(),
      category: z.enum(["build", "reviews"]),
      affiliate: z.boolean().default(false),
      // ...
    })
    .refine((data) => (data.category === "reviews") === data.affiliate, {
      message: "affiliate must be true iff category is 'reviews'",
      path: ["affiliate"],
    }),
});
(data.category === "reviews") === data.affiliate reads as "these two booleans are always equal". That is the negation of XOR (XNOR, the "iff" in the message), and it is easier to scan months later.
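To make the "always equal" reading concrete, here is a small truth-table sketch. It lifts the same predicate out of .refine() into a plain function so it runs without Astro or Zod. The isConsistent name is mine, introduced only for illustration.

```typescript
type Category = "build" | "reviews";

// Same predicate as the .refine() callback, as a standalone function.
const isConsistent = (category: Category, affiliate: boolean): boolean =>
  (category === "reviews") === affiliate;

// All four combinations of the two fields.
const cases: Array<[Category, boolean]> = [
  ["build", false],
  ["build", true],
  ["reviews", false],
  ["reviews", true],
];

const results = cases.map(([category, affiliate]) => ({
  category,
  affiliate,
  ok: isConsistent(category, affiliate),
}));

// Only build/false and reviews/true pass.
console.table(results);
```

Note the second row: a build post that sets affiliate: true fails as well. If affiliate links should be allowed outside reviews, the rule would need to be a one-way implication instead of an equality.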
Build error from a reviews post that forgot affiliate: true:
[ContentEntryInvalidError] Content config error in `blog → 2026-05-...`:
affiliate must be true iff category is 'reviews'
at affiliate
The message string lands in the output verbatim, so it's worth writing it as instructions for future-you.
.refine vs .superRefine
When you need more than one independent constraint on an object — or per-field error messages — .superRefine is easier:
.superRefine((data, ctx) => {
  if (data.category === "reviews" && !data.affiliate) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: "reviews posts must set affiliate: true",
      path: ["affiliate"],
    });
  }
  if (data.draft && data.updatedDate) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: "draft posts should not carry updatedDate",
      path: ["updatedDate"],
    });
  }
})
For a single relationship between two fields, .refine() stays lighter.
Typed structured data in frontmatter
HowTo and FAQPage JSON-LD blocks pull their data from frontmatter rather than from parsed body text. The reasons:
- Frontmatter is what Zod validates, so the shape is enforced for free
- A heading rename doesn't quietly break JSON-LD
- The JSON-LD generator can trust frontmatter without re-parsing MDX
Schema:
howto: z
  .object({
    name: z.string().optional(),
    description: z.string().optional(),
    totalTime: z.string().optional(),
    steps: z.array(
      z.object({
        name: z.string(),
        text: z.string(),
        image: z.string().optional(),
      }),
    ),
  })
  .optional(),
faq: z
  .array(
    z.object({
      question: z.string(),
      answer: z.string(),
    }),
  )
  .optional(),
YAML side:
---
title: "Astro Content Collections tips"
faq:
  - question: "When do you reach for .superRefine over .refine?"
    answer: "When one object needs more than one independent constraint..."
  - question: "What breaks when the schema changes?"
    answer: "Every existing post — by design..."
---
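A howto that omits steps, or a faq entry missing answer, fails the build. An empty steps: [] still passes as written, because z.array() accepts zero items; chain .min(1) onto the array if a step-less howto should fail too.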
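For the generator side, here is a minimal sketch of turning the validated faq array into a schema.org FAQPage object. The buildFaqJsonLd helper and the sample entry are hypothetical; the post doesn't show the real generator, so treat this as one plausible shape and not the actual implementation.

```typescript
// Shape of one validated faq entry, mirroring the Zod schema above.
interface FaqEntry {
  question: string;
  answer: string;
}

// Hypothetical helper: maps frontmatter entries to a schema.org FAQPage object.
function buildFaqJsonLd(faq: FaqEntry[]) {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    mainEntity: faq.map(({ question, answer }) => ({
      "@type": "Question",
      name: question,
      acceptedAnswer: { "@type": "Answer", text: answer },
    })),
  };
}

// Sample data for illustration only.
const jsonLd = buildFaqJsonLd([
  { question: "What breaks when the schema changes?", answer: "Every existing post, by design." },
]);

// Escape "<" so an answer containing "</script>" cannot close the tag early
// when the string is inlined into a script element.
const scriptBody = JSON.stringify(jsonLd).replace(/</g, "\\u003c");
console.log(scriptBody);
```

Because the input type matches what Zod already enforced, the helper needs no defensive checks of its own.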
What Zod can't reach
Zod only inspects frontmatter — the body MDX is outside its scope.
Google's structured data guidelines require marked-up content to be visible on the page, so JSON-LD without a body counterpart counts as a structured-data mismatch and can cost the page its rich-result eligibility. A post with frontmatter FAQs that never appear in the body passes the schema and silently disqualifies itself.
The fix is a separate layer. A small substring-matching validator covers it:
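import { readFile } from "node:fs/promises";
import { parse as parseYaml } from "yaml";

// Assumption: the post path is passed as the first CLI argument.
const path = process.argv[2];
const raw = await readFile(path, "utf8");

// Split the file into frontmatter (m[1]) and body (m[2]).
const m = /^---\r?\n([\s\S]*?)\r?\n---\r?\n([\s\S]*)$/.exec(raw);
if (!m) process.exit(0); // no frontmatter, nothing to check

const data = parseYaml(m[1]);
// Collapse whitespace and lowercase so line wraps and casing don't matter.
const body = m[2].replace(/\s+/g, " ").toLowerCase();

const mismatches = [];
// data is null when the frontmatter block is empty, hence the ?.
if (Array.isArray(data?.faq)) {
  for (const [i, q] of data.faq.entries()) {
    const needle = q.question.replace(/\s+/g, " ").toLowerCase();
    if (!body.includes(needle)) {
      mismatches.push(`faq[${i}].question not in body: "${q.question}"`);
    }
  }
}

if (mismatches.length) {
  for (const e of mismatches) console.error(e);
  process.exit(1);
}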
It's substring presence only. The script doesn't catch a wrong answer under the right question — that's a review-time concern.
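A quick way to see what "substring presence" does and doesn't tolerate is to pull the comparison into a pure function and feed it a tricky body. This is a sketch under my own naming (normalize, missingQuestions), not the script's actual structure.

```typescript
// Collapse whitespace and lowercase: the same normalization the script applies.
const normalize = (s: string): string => s.replace(/\s+/g, " ").toLowerCase();

// Hypothetical pure version of the check: returns questions absent from the body.
function missingQuestions(body: string, questions: string[]): string[] {
  const haystack = normalize(body);
  return questions.filter((q) => !haystack.includes(normalize(q)));
}

// First heading is wrapped across two lines; second has inline emphasis.
const body = `
## When do you reach for
.superRefine over .refine?

## What breaks when the **schema** changes?
`;

const missing = missingQuestions(body, [
  "When do you reach for .superRefine over .refine?",
  "What breaks when the schema changes?",
]);

// The wrapped heading matches; the emphasized one is reported as missing.
console.log(missing);
```

Whitespace and case differences are absorbed, but inline Markdown inside a question is not, so the second heading is flagged even though a reader would see the same text. Either keep FAQ headings free of inline markup or strip it before comparing.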
The three-layer split
Once you split rules across three layers, "where should this rule live?" becomes answerable:
| Layer | Fires at | Catches | Misses |
|---|---|---|---|
| Zod schema | astro build | types, enums, required/optional, field relations | meaning, body parity |
| Lint script | pre-commit, CI | banned phrases, substring parity | meaning |
| Review | pre-publish | meaning, judgment calls | not automatable |
Rule of thumb: if a layer higher in the table can catch a rule, don't push it down to a later one.
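The lint row's banned-phrase check isn't shown in this post. As a rough idea of how little code that layer needs, here is a sketch with a made-up phrase list; both BANNED and findBannedPhrases are hypothetical names, and the real rules will differ.

```typescript
// Hypothetical phrase list, for illustration only.
const BANNED = ["best ever", "guaranteed"];

interface Hit {
  line: number; // 1-based line number within the body
  phrase: string;
}

// Case-insensitive substring scan, one body line at a time.
function findBannedPhrases(body: string, banned: string[]): Hit[] {
  const hits: Hit[] = [];
  body.split(/\r?\n/).forEach((text, i) => {
    const lower = text.toLowerCase();
    for (const phrase of banned) {
      if (lower.includes(phrase.toLowerCase())) {
        hits.push({ line: i + 1, phrase });
      }
    }
  });
  return hits;
}

const hits = findBannedPhrases("A solid keyboard.\nGuaranteed to last.", BANNED);
console.log(hits);
```

Like the FAQ check, this only sees literal text. Whether a phrase is acceptable in context stays a review-layer question.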
The full operational notes — the failure modes I keep an eye on, the disclosure-strength judgments, the decision history of why some rules stay in review — live on Aulvem → Pushing operational rules into Astro Content Collections with Zod








