Contents
- Pre-bring-up: From datasheet to expectations
- Power, clocks, and register checks that prevent the common P1 delays
- Incremental driver bring-up and minimal firmware patterns
- Validation strategies: test vectors, CI pipelines, and regression control
- Practical application: step-by-step bring-up checklist
Pre-bring-up: From datasheet to expectations
Before you solder or power anything, translate the schematic and BOM into a short, concrete list of expectations you can hand to the bench.
- Create a concise expectations document (one page) that answers: which UART will provide boot logs, which PMIC rails are required for CPU core/IO/PHY, which chip-selects or strap pins define boot mode, and what oscillator(s)/PLLs must lock first. Get these answers from the datasheet and the PMIC reference design.
- Run a BOM sanity pass: confirm package variants, voltage ranges, and boot-critical alternate parts (e.g., a 1.8 V vs 1.71 V regulator replacement can change POR behavior). Add expected power-good (PG) signals and which PG you’ll use to hold reset. Use the PMIC datasheet to identify `POWER_GOOD`/`RESET` pins.
- Identify debug access early: JTAG / SWD pinout, a usable UART brought to the board edge, and accessible I2C/SPI test points. If any of these are missing in hardware, escalate immediately; adding them later costs days, not hours.
- Extract a minimal register map from the datasheet: base addresses, reset values, and reserved bits. Put the first 8–12 registers into a spreadsheet column with expected reset and acceptable range columns so bench checks are binary: pass/fail.
- Agree a definition of “P0 / P1 / P2” success states with the project: e.g., P0 = CPU comes out of reset and prints UART bootloader banner; P1 = kernel boots to prompt and enumerates basic buses; P2 = device driver functional. Use those success states to scope what you test first.
Important: The checklist above prevents the single largest class of bring-up delays: misaligned expectations between hardware, firmware, and software teams.
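One way to make the P0/P1/P2 definitions checkable is to classify a captured UART log against agreed marker strings. This is a minimal sketch, assuming hypothetical markers (`U-Boot`, `login:`, `mydevice: probe ok`) and a made-up sample log; replace them with whatever your bootloader, kernel, and driver actually print.

```python
# Classify a captured UART boot log into the agreed success states.
# The marker strings are hypothetical placeholders, not real output
# from any specific bootloader, kernel, or driver.
MARKERS = [
    ("P0", "U-Boot"),               # assumed bootloader banner
    ("P1", "login:"),               # assumed sign that the kernel reached a prompt
    ("P2", "mydevice: probe ok"),   # assumed message from the driver under test
]


def classify_boot_log(log_text):
    """Return the highest state reached in order, or None if P0 was not reached."""
    reached = None
    for state, marker in MARKERS:
        if marker not in log_text:
            break  # states are cumulative: stop at the first missing marker
        reached = state
    return reached


if __name__ == "__main__":
    # Made-up sample text for illustration only.
    sample = "U-Boot banner\nStarting kernel ...\nboard login:"
    print(classify_boot_log(sample))
```

Because the states are treated as cumulative, a log that shows the driver message but never reached the prompt is still reported as P0, which matches the idea of scoping tests by the earliest unmet state.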
Power, clocks, and register checks that prevent the common P1 delays
Most first-failures are power or clock related. Take an engineer’s approach: measure, don’t guess.
- Verify power rails in order. Confirm each regulator’s startup voltage, ramp time, and power-good sequencing from the PMIC/SoC documentation. Check for absolute-maximum differential constraints between rails during ramp (some processors forbid certain voltage differences during power up). Use the PMIC evaluation manual or the SoC reference manual to find these numbers.
- Use a current-limited bench supply for the first power-up, with the limit set slightly above the expected quiescent current. That limits damage and reveals short circuits quickly.
- Validate oscillator/clock trees early: check crystal drive circuits and PLL lock indicators (if available). If the SoC requires a stable reference clock for SDRAM/PLL, the board will not reach P0 without it.
- Connect a serial console (hardware UART) to the designated debug UART and confirm boot ROM / bootloader activity before trying kernel-level bring-up. Bootloaders frequently give the first clues about strap pin and boot source mis-configuration.
- Register validation pattern:
- Read reset values of the first mapped register window and compare to datasheet values. `0xFFFFFFFF` from reads often means an unpowered rail, wrong MMIO base, or bus not enabled.
- Check control registers for clock enable and reset de-assert bits before enabling DMA or interrupts.
- Confirm ID or revision registers early to verify you’re talking to the right silicon.
Example: quick MMIO read in Python (run as root; use with care):
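# mmio_read.py: read a 32-bit value from a physical address
import mmap
import os
import struct

BASE = 0x40000000    # change to your device; must be page-aligned for mmap
OFFSET = 0x0         # byte offset of the register inside the mapped window
LENGTH = 0x1000      # map one 4 KiB page

fd = os.open("/dev/mem", os.O_RDONLY)
try:
    mm = mmap.mmap(fd, LENGTH, prot=mmap.PROT_READ, flags=mmap.MAP_SHARED, offset=BASE)
    try:
        # unpack_from returns a tuple; take the single little-endian 32-bit word
        (val,) = struct.unpack_from("<I", mm, OFFSET)
        print("0x%08x" % val)
    finally:
        mm.close()
finally:
    os.close(fd)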
Caution: `mmap`/`/dev/mem` access and direct register pokes bypass kernel protection and can hang or brick a board. Prefer regulated bench voltages and JTAG when possible. Use these tools only for early validation and under bench supervision.
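Once raw values have been read, the register validation pattern can be turned into a binary pass/fail check. The sketch below is illustrative only: the register name, offset, reset value, and mask are made-up placeholders, and the mask convention (1 bits are compared, reserved bits are 0) is an assumption rather than anything taken from a specific datasheet.

```python
# Compare raw 32-bit register reads against datasheet expectations.
# All register names, offsets, reset values, and masks are placeholders.
from dataclasses import dataclass

ALL_ONES = 0xFFFFFFFF


@dataclass(frozen=True)
class RegExpect:
    name: str
    offset: int
    reset: int   # expected reset value from the datasheet
    mask: int    # 1 bits are compared; reserved/undefined bits are 0


def check_register(exp, value):
    """Return (passed, note) for one 32-bit read."""
    # All-ones is treated as suspicious unless the datasheet really expects it.
    if value == ALL_ONES and exp.reset != ALL_ONES:
        return False, "all ones: suspect unpowered rail, wrong MMIO base, or bus not enabled"
    if (value & exp.mask) != (exp.reset & exp.mask):
        return False, "expected 0x%08x under mask 0x%08x, read 0x%08x" % (
            exp.reset, exp.mask, value)
    return True, "ok"


if __name__ == "__main__":
    id_reg = RegExpect("ID", 0x00, 0x12340001, 0xFFFF0000)  # hypothetical register
    for raw in (0x1234ABCD, 0xFFFFFFFF, 0x00000000):
        print(id_reg.name, "0x%08x" % raw, check_register(id_reg, raw))
```

Masking out reserved bits keeps the check from failing on fields the datasheet does not define, while the all-ones shortcut surfaces the most common bus-level failure with a more useful message.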
- Use a logic analyzer to validate clock/alignment and bus-level toggles. Decode the physical protocol (SPI, I2C, UART) and verify ACK/NAK, CS timing, and CPOL/CPHA settings. The Saleae guides show practical steps to decode SPI/I2C captures and common alignment issues; the open Sigrok ecosystem provides low-cost capture and scripting for automation.
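The rail-ordering check from the start of this section can be scripted once rail-up timestamps are exported from a scope capture. This is a minimal sketch under stated assumptions: the rail names, the required order, and the minimum gap are all hypothetical, and the real values must come from your PMIC and SoC documentation.

```python
# Check that measured rail-up times follow the required power sequence.
# Rail names, order, and the minimum gap are hypothetical placeholders.
REQUIRED_ORDER = ["VDD_CORE", "VDD_IO", "VDD_PHY"]   # assumed sequence
MIN_GAP_MS = 0.5                                     # assumed minimum spacing


def check_rail_sequence(measured_ms, order=REQUIRED_ORDER, min_gap_ms=MIN_GAP_MS):
    """measured_ms maps rail name -> time (ms) at which the rail reached regulation.

    Returns a list of human-readable violations; an empty list means pass.
    """
    missing = [rail for rail in order if rail not in measured_ms]
    if missing:
        return ["no measurement for: " + ", ".join(missing)]
    problems = []
    for earlier, later in zip(order, order[1:]):
        gap = measured_ms[later] - measured_ms[earlier]
        if gap < min_gap_ms:
            problems.append(
                "%s came up %.2f ms after %s (need >= %.2f ms)"
                % (later, gap, earlier, min_gap_ms)
            )
    return problems


if __name__ == "__main__":
    print(check_rail_sequence({"VDD_CORE": 0.0, "VDD_IO": 1.0, "VDD_PHY": 2.5}))
```

Returning a list of violations rather than a single boolean makes it easier to attach the result to a bench log next to the scope capture.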
Incremental driver bring-up and minimal firmware patterns
Bring drivers up in tiny, verifiable increments. The right step order reduces the blast radius of bugs.
- Start in userspace first:
- Use `i2c-tools` (`i2cdetect`, `i2cget`, `i2cset`), `spidev` test programs, or a small userspace app to verify basic read/write and interrupt lines. Userspace tests give fast feedback without the complexity of driver probe ordering.
- Minimal firmware / bootloader pattern:
- Ship a minimal bootloader or a small bring-up firmware that: holds the device reset line asserted until all PMIC rails are stable; configures clocks to known-good defaults; provides a serial console; and leaves peripherals in a conservative (powered-down) state. The bare-minimum boot guides show why having this minimal control shortens the software bring-up window.
- Where possible, disable aggressive power saving or boot-time runtime configuration in the bootloader so the kernel sees consistent hardware states.
- Incremental kernel integration:
- Create a tiny kernel probe that uses `ioremap`/`readl` to read the device’s ID/revision register and prints its contents in `probe()`. Confirm mapping and interrupt routing before allocating IRQs or enabling DMA. This follows the kernel device model `probe()` contract.
- Move functionality into the kernel in small steps: register mapping → clock/regulator enable → reset de-assert → basic interrupts → DMA tx/rx → full feature set.
- Return `-EPROBE_DEFER` from `probe()` when you depend on other drivers (clocks, regulators, PHYs) to delay binding until the resources are present. This avoids fragile ordering bugs.
Minimal platform_driver skeleton (drop-in starting point):
// minimal_probe.c (skeleton)
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/io.h>
#include <linux/of.h>

struct mydev {
	void __iomem *regs;
};

static int my_probe(struct platform_device *pdev)
{
	struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	struct mydev *m;

	m = devm_kzalloc(&pdev->dev, sizeof(*m), GFP_KERNEL);
	if (!m)
		return -ENOMEM;

	/* devm_ioremap_resource() rejects a NULL resource, so no separate check */
	m->regs = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(m->regs))
		return PTR_ERR(m->regs);

	/* Offset 0x0 is a placeholder; read your device's ID/revision register */
	dev_info(&pdev->dev, "REG0 = 0x%08x\n", readl(m->regs + 0x0));
	platform_set_drvdata(pdev, m);
	return 0;
}

/* Named match table so MODULE_DEVICE_TABLE can export it for autoloading */
static const struct of_device_id my_of_match[] = {
	{ .compatible = "acme,mydevice" },
	{ /* sentinel */ }
};
MODULE_DEVICE_TABLE(of, my_of_match);

static struct platform_driver my_driver = {
	.probe = my_probe,
	.driver = {
		.name = "acme,mydevice",
		.of_match_table = my_of_match,
	},
};
module_platform_driver(my_driver);

MODULE_LICENSE("GPL");
- Build test-only userspace utilities that mirror driver operations (e.g., a small spidev-based loopback tester, or a DMA injector) so failing kernel behavior can be reproduced in userspace and captured in a logic analyzer or oscilloscope trace. Bootlin’s experience with developing standalone testing tools for VPU bring-up is a good example of how userspace harnesses drastically reduce kernel debugging time.
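A loopback tester of the kind described above can be kept independent of the bus by injecting the transfer function. In this sketch, `transfer` is a hypothetical callable (bytes in, bytes out); on hardware it would wrap your `spidev` or similar full-duplex transfer with the data lines looped back, and here a software echo stands in for it.

```python
# Loopback check with an injectable transfer function.
# `transfer` is a hypothetical callable: it takes bytes and returns bytes.
def make_pattern(length, seed=0):
    """Deterministic byte pattern so a failing run can be replayed exactly."""
    return bytes((seed + 7 * i) & 0xFF for i in range(length))


def loopback_check(transfer, lengths=(1, 4, 64, 256)):
    """Return a list of (length, first_bad_index) for transfers that did not echo."""
    failures = []
    for length in lengths:
        tx = make_pattern(length, seed=length)
        rx = transfer(tx)
        if rx != tx:
            bad = next(
                (i for i, (a, b) in enumerate(zip(tx, rx)) if a != b),
                min(len(tx), len(rx)),  # no differing byte: the lengths differ
            )
            failures.append((length, bad))
    return failures


if __name__ == "__main__":
    # Software echo stands in for real hardware in this illustration.
    print(loopback_check(lambda data: data))
```

Reporting the first bad byte index gives a concrete position to look for in the matching logic analyzer capture.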
Validation strategies: test vectors, CI pipelines, and regression control
Hardening drivers is about repeatability: deterministic test vectors, automated runs, and a hardware-backed CI.
- Test vector taxonomy (use all four types):
- Functional vectors: nominal transactions that exercise happy-path (read ID, init sequence, mode change).
- Edge vectors: clock jitter, stray CS edges, unaligned transfers, maximal payload sizes.
- Stress vectors: sustained DMA transfers, interrupt floods (start low, ramp), thermal/power cycling.
- Negative vectors: bus NACK/timeout, corrupted payload, incomplete transactions.
- Example low-level register vectors (pattern list):
- Walk-one: 0x00000001, 0x00000002, ...
- Walk-zero: inverse.
- Alternating: 0xAAAAAAAA, 0x55555555.
- Burst fill: repeating 64KB known pattern followed by read-back validate.
- Automate with the right kernel frameworks:
- Unit tests: write `KUnit` tests for pure logic in your driver (state machines, register bit-decoding) so you can exercise code in UML or headless builds quickly. KUnit is a fast unit testing framework for kernel logic.
- Selftests / integration: add `kselftest` tests under `tools/testing/selftests/` for userspace-to-kernel interactions that require a real kernel.
- System/regression suites: run LTP-style stress and regression tests to catch regressions under load.
- Hardware CI: push validated builds to a hardware-backed CI such as KernelCI to catch regressions across kernels and boards at scale. KernelCI standardizes hardware testing for the upstream kernel.
- CI practical pattern:
- Run `kunit.py` as a fast pre-merge gate for logic changes. Commit KUnit tests with your driver so they travel with the code.
- Run fast unit tests in PR checks, and gate hardware-in-the-loop testing on a submit queue that runs the longer test batteries nightly. Use KernelCI or a self-hosted lab for hardware runs.
- Maintain a reproducible test fixture description: board id, kernel commit, bootloader version, PMIC firmware, and serial logs attached to test results. Save the logic analyzer capture that corresponds to a failing test to a trace archive; name it by test-case ID and kernel revision.
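The low-level register patterns listed above (walk-one, walk-zero, alternating) are easy to generate programmatically. The sketch below assumes hypothetical `write` and `read` callables that access a single scratch register, and a `mask` of bits that are safe to write; only point it at registers where writing arbitrary values has no side effects.

```python
# Generators for classic register test patterns, plus a write/read-back check.
# `write` and `read` are hypothetical callables for one scratch register.
def walking_ones(width=32):
    """Yield 0x1, 0x2, 0x4, ... with exactly one bit set."""
    for bit in range(width):
        yield 1 << bit


def walking_zeros(width=32):
    """Yield the bitwise inverse of walking ones within `width` bits."""
    full = (1 << width) - 1
    for bit in range(width):
        yield full ^ (1 << bit)


def alternating(width=32):
    """Return the (0xAAAA..., 0x5555...) pair for an even bit width."""
    full = (1 << width) - 1
    fives = full // 3          # 0x5555... when width is even
    return (full ^ fives, fives)


def verify_patterns(write, read, patterns, mask=0xFFFFFFFF):
    """Write each pattern, read it back, and return (expected, got) mismatches."""
    mismatches = []
    for pattern in patterns:
        expected = pattern & mask
        write(expected)
        got = read() & mask
        if got != expected:
            mismatches.append((expected, got))
    return mismatches
```

A stuck or shorted data bit shows up as exactly one mismatch in the walk-one run, which is why the walking patterns are usually run before the burst fill.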
Table: comparing quick test types
| Test level | What it proves | When to run |
|---|---|---|
| KUnit | Logic correctness, bitfields, small state machines | Pre-merge, fast |
| kselftest | Kernel <-> userspace interactions | CI per-commit on emulated/hardware runners |
| LTP | System stability, IO stress | Nightly / release candidates |
| KernelCI | Cross-kernel hardware regression | Continuous hardware lab runs |
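The reproducible fixture description and the trace-naming convention from the list above can be captured in a small record type. This is a sketch under assumptions: the field names, the JSON layout, the 12-character commit abbreviation, and the default `sr` file extension are illustrative choices, not a format required by any CI system.

```python
# Record the test fixture alongside each result and derive a trace file name.
# Field names, JSON layout, and the default extension are illustrative choices.
import json
import re
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Fixture:
    board_id: str
    kernel_commit: str
    bootloader_version: str
    pmic_firmware: str


def trace_name(test_case_id, kernel_commit, ext="sr"):
    """Build an archive file name from the test-case ID and kernel revision."""
    safe_id = re.sub(r"[^A-Za-z0-9_.-]+", "_", test_case_id)
    return "%s__%s.%s" % (safe_id, kernel_commit[:12], ext)


def result_record(fixture, test_case_id, passed, serial_log_path):
    """Serialize one test result together with its fixture description."""
    record = asdict(fixture)
    record.update(
        test_case_id=test_case_id,
        passed=passed,
        serial_log=serial_log_path,
        trace=trace_name(test_case_id, fixture.kernel_commit),
    )
    return json.dumps(record, sort_keys=True)
```

Deriving the trace name from the same record that stores the result keeps the capture and the log from drifting apart when a test is re-run on a different kernel revision.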
Practical application: step-by-step bring-up checklist
A compact, ordered checklist you can paste into a ticket and follow.
- Paperwork & access (Day 0)
- Confirm the BOM, PCB revision, and who signed off the gerbers.
- Confirm JTAG/SWD and UART test points exist and are accessible.
- Pre-power checks (30–60 minutes)
- Verify soldering quality, shorts with DMM, correct polarity on rails and connectors.
- Power rails check: set bench PSU to expected voltage, current limit ~1.5× expected idle.
- First power-up (P0, ~1–2 hours)
- Power the board; watch current; connect UART at `115200 8N1` (or the board’s documented baud).
- Confirm boot ROM / bootloader banner. Capture full boot output.
- If no UART output: measure core/reference clocks and PG signals; try holding CPU in reset and probing I2C for PMIC presence.
- Capture logic analyzer traces on boot-critical lines (reset, SCL/SDA, SPI CLK/CS) for later correlation.
- Basic hardware checks (P1, next day)
- Verify ID registers and device revision values against datasheet via the minimal kernel probe or userspace MMIO read.
- Validate clock PLLs and oscillator lock states.
- Enable and test each peripheral bus in isolation (I2C then SPI then USB, etc).
- Minimal driver integration (P1 → P2)
- Add minimal `probe()` that maps registers and prints a few key values (ID, STATUS).
- Wire up regulator/clock consumer calls in the driver; de-assert reset last.
- Add interrupt handling but keep handler minimal (ack and log).
- Tests and validation (ongoing)
- Run functional vectors, edge and stress vectors. Save logs + LA captures to artifact storage.
- Add failing cases as regression tests and include them in nightly CI (kunit/kselftest/LTP as appropriate).
- Pre-release (stability)
- Run long-duration stress tests (hours) on KernelCI/self-hosted lab.
- Verify regression test pass-rate across kernel versions you support.
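For the "probe I2C for PMIC presence" step in the checklist, the text output of `i2cdetect` can be parsed so the result lands in a log instead of a screenshot. This sketch assumes the fixed-width grid that `i2cdetect -y <bus>` prints (a header row, then rows starting with `NN: ` followed by sixteen 3-character cells); verify that against the version of i2c-tools on your system before relying on it.

```python
# Parse the text grid printed by `i2cdetect -y <bus>`.
# Assumes fixed-width rows: "NN: " followed by sixteen 3-character cells.
def parse_i2cdetect(output):
    """Return (responding, in_use) lists of 7-bit addresses."""
    responding, in_use = [], []
    for line in output.splitlines():
        if len(line) < 3 or line[2] != ":":
            continue  # header row or unrelated text
        try:
            row_base = int(line[:2], 16)
        except ValueError:
            continue
        for col in range(16):
            cell = line[4 + 3 * col : 6 + 3 * col].strip()
            if cell == "UU":
                in_use.append(row_base + col)      # address claimed by a kernel driver
            elif cell and cell != "--":
                responding.append(int(cell, 16))   # a device answered the probe
    return responding, in_use
```

On a board, the input could come from something like `subprocess.run(["i2cdetect", "-y", "1"], capture_output=True, text=True).stdout`, where bus number 1 is only an example. An expected PMIC address that appears in neither list points back at rails, pull-ups, or strap pins rather than at software.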
Small CI example (job snippet):
# .github/workflows/kunit.yml (illustrative)
name: KUnit quick-run
on: [pull_request]
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # kunit.py configures and builds its own UML kernel, so no separate make step is needed
      - name: Run KUnit
        run: ./tools/testing/kunit/kunit.py run
Run fast checks in PRs and offload long tests to nightly hardware runners. KernelCI provides the model and community infra for hardware-backed regression.
Sources
Device Drivers — The Linux Kernel documentation - Kernel device model, probe() semantics, sync_state() and driver registration guidance used to build incremental driver steps and the minimal platform_driver pattern.
Linux and the Devicetree — The Linux Kernel documentation - How the kernel uses device tree, recommendations for minimal DT usage during board bring-up and structuring board-vs-soc bindings.
Board Bring Up Considerations — Intel documentation - Practical recommendations for power sequencing, boot UART visibility, and board-level bring-up sequences.
SPI Analyzer - User Guide | Saleae Support - Practical guidance for capturing and decoding SPI with a logic analyzer and common alignment issues.
I2C Analyzer - User Guide | Saleae Support - I2C decoding best-practices and common noise/ACK issues to check during register validation.
KUnit — KUnit documentation - Unit testing framework for kernel logic; recommended approach for fast pre-merge tests and how to run kunit.py.
KernelCI Foundation - Community hardware-backed CI for testing kernels and catching driver regressions across platform/board combinations.
Bootlin: Wrapping up the Allwinner VPU crowdfunded Linux driver work - Example of developing standalone userspace test tools (v4l2-request-test) and using register dumps to drive kernel driver development.
OSD335x Bare Minimum Board Boot Process | Octavo Systems - Practical guidance for minimal boot circuitry and why a small bring-up firmware helps hardware validation.
Getting started with a logic analyzer - Sigrok - Open-source logic analyzer tooling (PulseView / sigrok) for capture, decode, and scripting in bring-up workflows.
Linux Test Project — LTP documentation - System-level kernel and system regression suites for long-running stress and conformance testing.












