How we structured a visa requirements database across 15 languages with ISO standardization, caching layers, and translation pipelines.
Most travel APIs return data in English. That's fine if your users are American or British. It's useless if they're Japanese, Thai, Arabic-speaking, or any of the 5.5 billion people whose first language isn't English.
When a Vietnamese traveler looks up visa requirements for France, they need the answer in Vietnamese — not just the visa type, but the required documents list, the application process, the embassy information, the travel tips. Translating "Valid passport with 6 months minimum validity" into 15 languages isn't a string replacement. It's a content localization problem at scale.
The Orizn Visa API serves 39,585 passport-destination pairs in 15 languages: English, French, Spanish, Portuguese, German, Japanese, Korean, Chinese, Russian, Italian, Arabic, Hindi, Thai, Vietnamese, and Filipino. Here's how the system is structured.
Why these 15 languages
Not random. These 15 cover approximately 75% of the world's internet users by primary language:
- English — lingua franca, baseline
- Chinese — 1.1B speakers, China's visa-free program expanding rapidly
- Spanish — 550M speakers, Latin America is a growing travel market
- Hindi — 600M speakers, Indian passport holders are one of the most visa-restricted populations
- Arabic — 370M speakers, Gulf states have some of the fastest-growing passports
- Portuguese — 260M speakers, Brazil alone has 210M people
- French — 280M speakers, major passport in Africa and Europe
- Japanese, Korean — high-value travel markets, strong outbound tourism
- Russian — 250M speakers, complex visa landscape post-2022
- German, Italian — core EU passports
- Thai, Vietnamese, Filipino — Southeast Asia, where digital nomads concentrate
Each language isn't just a translation — it's a market. A Thai translation means Thai travel bloggers can embed our widgets and Thai developers can build apps with localized visa data.
Data architecture
The core data model is simple:
visa_pair {
  passport:       ISO 3166-1 alpha-3 (e.g. "FRA")
  destination:    ISO 3166-1 alpha-3 (e.g. "JPN")
  requirement:    enum (visa_free | visa_required | e_visa |
                        visa_on_arrival | eta | no_admission)
  visa_free_days: integer | null
  verified:       boolean
  source_url:     string
  last_updated:   timestamp
}
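To make the schema concrete, here is a minimal Python sketch of the visa_pair record as a frozen dataclass. The field names and the six requirement values come from the model above; the class names, the regex check and the positivity rule on visa_free_days are illustrative assumptions, not the actual Orizn implementation.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

ISO3_RE = re.compile(r"[A-Z]{3}")  # three uppercase ASCII letters


class Requirement(str, Enum):
    """The six standardized requirement types from the data model."""
    VISA_FREE = "visa_free"
    VISA_REQUIRED = "visa_required"
    E_VISA = "e_visa"
    VISA_ON_ARRIVAL = "visa_on_arrival"
    ETA = "eta"
    NO_ADMISSION = "no_admission"


@dataclass(frozen=True)
class VisaPair:
    passport: str                 # ISO 3166-1 alpha-3, e.g. "FRA"
    destination: str              # ISO 3166-1 alpha-3, e.g. "JPN"
    requirement: Requirement
    visa_free_days: Optional[int]
    verified: bool
    source_url: str
    last_updated: datetime

    def __post_init__(self) -> None:
        # Reject anything that is not shaped like an alpha-3 code.
        for name in ("passport", "destination"):
            code = getattr(self, name)
            if not ISO3_RE.fullmatch(code):
                raise ValueError(f"{name} must be 3 uppercase letters, got {code!r}")
        # Assumption: a stay length, when present, must be positive.
        if self.visa_free_days is not None and self.visa_free_days <= 0:
            raise ValueError("visa_free_days must be positive when set")


# Illustrative record; the source URL is a placeholder.
example = VisaPair(
    passport="FRA", destination="JPN", requirement=Requirement.VISA_FREE,
    visa_free_days=90, verified=True,
    source_url="https://example.org/source",
    last_updated=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

Making the record frozen keeps a cached or shared pair from being mutated in place; updates become new records, which fits the versioning idea discussed later in the post.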
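199 passports × 199 destinations = 39,601 theoretical pairs. The live dataset holds 39,585 of them. (Self-referential pairs are meaningless, since you don't need a visa to visit your own country, but dropping all 199 of them would leave 199 × 198 = 39,402, so the 39,585 count is not simply 39,601 minus the self-pairs.)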
Each pair has a base record in English. Translations are stored separately:
visa_translation {
  passport:    ISO3
  destination: ISO3
  lang:        enum (15 values)
  description: text
  documents:   text[]
  process:     text[]
  tips:        text[]
}
39,585 pairs × 15 languages = 593,775 translation records. That's the real scale of the system.
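A rough Python sketch of the translation side: one row per (passport, destination, lang), plus a helper that reports which languages a pair is still missing. The language codes are assumptions: "ja" and "ar" match the lang= values in the curl examples at the end of this post, while the rest (notably "fil" for Filipino) are guessed ISO 639 codes, and missing_langs is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Assumed ISO 639 codes for the 15 supported languages.
LANGS = ("en", "fr", "es", "pt", "de", "ja", "ko", "zh",
         "ru", "it", "ar", "hi", "th", "vi", "fil")


@dataclass
class VisaTranslation:
    passport: str
    destination: str
    lang: str
    description: str
    documents: List[str] = field(default_factory=list)
    process: List[str] = field(default_factory=list)
    tips: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.lang not in LANGS:
            raise ValueError(f"unsupported lang {self.lang!r}")


def missing_langs(records: Iterable[VisaTranslation],
                  passport: str, destination: str) -> List[str]:
    """Languages with no translation row yet for one passport-destination pair."""
    have = {r.lang for r in records
            if r.passport == passport and r.destination == destination}
    return [code for code in LANGS if code not in have]


PAIRS = 39_585
TOTAL_RECORDS = PAIRS * len(LANGS)  # 593,775 translation rows
```

A completeness check like missing_langs is one way to find pairs where the English baseline exists but some of the 14 translations have not been generated yet.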
The translation pipeline
Raw visa data comes from 136 government portals. Most publish in their national language plus English. Some only publish in their national language.
The pipeline has 5 stages:
1. Extraction — Pull structured visa rules from government sources. This is the hardest part. Every government formats their visa information differently. Some have clean REST APIs. Most have PDF documents or HTML pages with inconsistent formatting.
2. Normalization — Map to the 6 standardized requirement types. A government might say "no visa needed for stays under 90 days" — that maps to visa_free with visa_free_days: 90. Another might say "electronic authorization required prior to travel" — that's eta. The mapping isn't always obvious and edge cases are everywhere.
3. English baseline — Generate the English version with all fields: description, documents_required, process, tips, country_info. This is the canonical record that everything else derives from.
4. Translation — Generate the 14 other language versions. This isn't word-for-word translation. Document names, process steps, and tips need to be culturally adapted. "Apply at the embassy" in Japanese includes the Japanese name of the embassy and Japanese-language application forms. "Proof of sufficient funds" in Arabic needs to reflect local banking norms.
5. Verification — Cross-check against at least 2 independent sources per pair. The verified flag indicates whether the data has been confirmed. Unverified pairs are still returned but flagged — better to have data with a caveat than no data at all.
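Stages 2 and 5 can be sketched together in Python. This is a deliberately naive, keyword-based normalizer, not the production mapping: the regex rules, their order, and the rule that a pair counts as verified only when at least two independent readings agree exactly are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Normalized:
    requirement: str
    visa_free_days: Optional[int]


# Hypothetical rules, checked in order; the first match wins.
RULES = [
    (re.compile(r"electronic (travel )?authori[sz]ation", re.I), "eta"),
    (re.compile(r"e-?visa|electronic visa", re.I), "e_visa"),
    (re.compile(r"visa on arrival", re.I), "visa_on_arrival"),
    (re.compile(r"no visa (is )?(needed|required)|visa[- ]free", re.I), "visa_free"),
    (re.compile(r"entry (is )?(refused|prohibited|not permitted)", re.I), "no_admission"),
    (re.compile(r"visa (is )?required", re.I), "visa_required"),
]
DAYS_RE = re.compile(r"(\d+)\s*days", re.I)


def normalize(text: str) -> Optional[Normalized]:
    """Map free-text government wording to one of the 6 requirement types."""
    for pattern, requirement in RULES:
        if pattern.search(text):
            days = None
            if requirement == "visa_free":
                match = DAYS_RE.search(text)
                days = int(match.group(1)) if match else None
            return Normalized(requirement, days)
    return None  # no rule matched: send to manual review


def is_verified(readings: List[Normalized], min_sources: int = 2) -> bool:
    """Verified only if enough independent sources agree on the same result."""
    if len(readings) < min_sources:
        return False
    return all(r == readings[0] for r in readings)
```

The order of RULES matters: "no visa required" must hit the visa_free rule before the generic "visa required" rule, which is exactly the kind of edge case that makes stage 2 harder than it looks.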
Caching strategy
With 593K+ records and 15 language variants, caching is critical:
Request flow:
Client → API Gateway → Cache (Redis) → Database
Cache key pattern: visa:{passport}:{destination}:{lang}
The TTL strategy is split by volatility:
- 24 hours for stable pairs: visa types don't change hourly. A visa_free pair that's been stable for 3 years doesn't need real-time freshness.
- 1 hour for recently changed pairs: if Thailand just modified its policy, the cache needs to reflect that quickly.
- No cache for the /changes endpoint: it queries the diff table directly. When someone asks "what changed this week?", they need the latest data, not a cached snapshot.
The /check endpoint (quick check, no documents) is cached aggressively — it returns 5 fields. The /visa endpoint (full details with documents, process, tips) has shorter TTLs because embassies can update document requirements at any time.
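A minimal cache-aside sketch of this flow in Python. A dict-backed TTLCache stands in for Redis (with redis-py the write would be r.set(key, value, ex=ttl), with the value serialized to JSON), and the 7-day "recently changed" window is a hypothetical threshold, since the post doesn't define one.

```python
import time
from typing import Callable, Dict, Optional, Tuple

HOUR = 60 * 60
DAY = 24 * HOUR
RECENT_WINDOW = 7 * DAY  # hypothetical cutoff for "recently changed"


def cache_key(passport: str, destination: str, lang: str) -> str:
    return f"visa:{passport}:{destination}:{lang}"


def ttl_for(last_changed: float, now: float) -> int:
    """1 hour for recently changed pairs, 24 hours for stable ones."""
    return HOUR if now - last_changed < RECENT_WINDOW else DAY


class TTLCache:
    """In-process stand-in for Redis: key -> (expires_at, value)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self._data: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._data.get(key)
        if entry is None or entry[0] <= self.clock():
            self._data.pop(key, None)  # lazily drop expired entries
            return None
        return entry[1]

    def set(self, key: str, value: dict, ttl: int) -> None:
        self._data[key] = (self.clock() + ttl, value)


def get_visa(cache: TTLCache, load: Callable[[str, str, str], dict],
             passport: str, destination: str, lang: str) -> dict:
    """Cache-aside read: serve from cache, else load from the DB and cache it."""
    key = cache_key(passport, destination, lang)
    cached = cache.get(key)
    if cached is not None:
        return cached
    record = load(passport, destination, lang)
    cache.set(key, record, ttl_for(record["last_changed"], cache.clock()))
    return record
```

Because the language is part of the key, each of the 15 variants of a pair expires independently. A /changes handler would skip get_visa entirely and query the diff table, matching the "no cache" rule above.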
ISO standardization decisions
We use ISO 3166-1 alpha-3 exclusively. Not alpha-2 (FR, JP), not country names (France, Japan), not IATA codes (CDG, NRT). Three reasons:
- Unambiguous: alpha-3 has no collisions across all 199 countries
- Universal: the same codes work regardless of language (a Japanese developer sends FRA, not フランス)
- Machine-readable: 3 uppercase ASCII characters, trivially validatable with a regex
The API auto-uppercases inputs (fra → FRA) and returns clear error messages for invalid codes:
{
"error": "\"passport\" value \"JP\" is not a valid ISO 3166-1 alpha-3 code. Did you mean \"JPN\"?"
}
This matters especially for MCP and agent usage where the LLM might send lowercase or alpha-2 codes. A clear error message lets the agent self-correct and retry.
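A Python sketch of the validation behavior described above: auto-uppercase, regex shape check, membership check, and a "Did you mean" hint for alpha-2 input. The lookup tables are a tiny illustrative subset (a real service would load the full ISO 3166-1 list), and InvalidCountryCode and validate_country are hypothetical names.

```python
import re
from typing import Dict, Set

ISO3_RE = re.compile(r"[A-Z]{3}")

# Illustrative subset only; load the full ISO 3166-1 table in practice.
ALPHA3: Set[str] = {"FRA", "JPN", "THA", "VNM", "USA", "GBR"}
ALPHA2_TO_ALPHA3: Dict[str, str] = {
    "FR": "FRA", "JP": "JPN", "TH": "THA", "VN": "VNM", "US": "USA", "GB": "GBR",
}


class InvalidCountryCode(ValueError):
    pass


def validate_country(param: str, raw: str) -> str:
    """Return the normalized alpha-3 code or raise with an agent-friendly message."""
    code = raw.strip().upper()  # auto-uppercase: "fra" -> "FRA"
    if ISO3_RE.fullmatch(code) and code in ALPHA3:
        return code
    msg = f'"{param}" value "{raw}" is not a valid ISO 3166-1 alpha-3 code.'
    suggestion = ALPHA2_TO_ALPHA3.get(code)
    if suggestion:
        msg += f' Did you mean "{suggestion}"?'
    raise InvalidCountryCode(msg)
```

Serializing the message with json.dumps({"error": str(err)}) yields the escaped-quote form shown in the error example. The suggestion is what lets an LLM agent retry with JPN instead of giving up.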
What I'd do differently
Start with fewer languages. Launching with 15 simultaneously was ambitious. Starting with 5 (English, Spanish, French, Chinese, Arabic) and adding based on demand would have been faster and let us focus quality on the highest-impact languages first.
Invest in government source monitoring earlier. The hardest operational challenge isn't translation — it's knowing when a government changes its visa policy. Thailand's 60→30 day rollback happened via a cabinet resolution. That's not an RSS feed. Building automated monitoring for 136 government portals is an ongoing project.
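One common starting point for portal monitoring is content fingerprinting: hash the normalized page text and alert when the hash changes. The Python sketch below assumes the page HTML has already been fetched; the crude regex tag stripping and the in-memory seen dict are simplifications. Note that it would not have caught a change announced only through a cabinet resolution, since it watches portals, not every official channel.

```python
import hashlib
import re
from typing import Dict


def fingerprint(html: str) -> str:
    """Hash the page's visible text, ignoring markup and whitespace changes."""
    text = re.sub(r"<[^>]+>", " ", html)           # crude tag stripping
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def check_portal(url: str, html: str, seen: Dict[str, str]) -> bool:
    """Return True if the page text changed since the last check.

    The first observation of a URL only records a baseline and returns False.
    """
    digest = fingerprint(html)
    previous = seen.get(url)
    seen[url] = digest
    return previous is not None and previous != digest
```

A changed hash only says "something moved"; a human or the normalization stage still has to decide whether the change affects any visa rule.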
Build the diff system from day one. We added the /changes endpoint later. If we'd built temporal versioning into the data model from the start (valid_from, valid_to on every record), the change detection and audit trail features would have been trivial instead of retrofitted.
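A minimal Python sketch of what valid_from/valid_to versioning could look like: a change closes the current version instead of overwriting it, point-in-time lookups fall out naturally, and a "what changed this week" query becomes a range filter. The names PairVersion, apply_change, as_of and changes_between are hypothetical, and the list stands in for a database table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class PairVersion:
    passport: str
    destination: str
    requirement: str
    visa_free_days: Optional[int]
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "current version"


def apply_change(history: List[PairVersion], passport: str, destination: str,
                 requirement: str, days: Optional[int], at: datetime) -> None:
    """Close the pair's current version and append the new one."""
    for v in history:
        if (v.passport, v.destination) == (passport, destination) and v.valid_to is None:
            v.valid_to = at
    history.append(PairVersion(passport, destination, requirement, days, at))


def as_of(history: List[PairVersion], passport: str, destination: str,
          when: datetime) -> Optional[PairVersion]:
    """The version that was in force at a given moment (audit trail)."""
    for v in history:
        if ((v.passport, v.destination) == (passport, destination)
                and v.valid_from <= when
                and (v.valid_to is None or when < v.valid_to)):
            return v
    return None


def changes_between(history: List[PairVersion], start: datetime,
                    end: datetime) -> List[PairVersion]:
    """Versions taking effect in [start, end); initial loads count as changes too."""
    return [v for v in history if start <= v.valid_from < end]
```

With this shape, a diff table is just a view over versions, rather than a separate structure that has to be kept in sync.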
Try the API
The multilingual response in action:
# English (free, no API key)
curl "https://visa.orizn.app/api/v1/visa/check?passport=JPN&destination=FRA"
# Japanese (needs free API key)
curl -H "x-api-key: YOUR_KEY" \
"https://visa.orizn.app/api/v1/visa?passport=JPN&destination=FRA&lang=ja"
# Arabic
curl -H "x-api-key: YOUR_KEY" \
"https://visa.orizn.app/api/v1/visa?passport=JPN&destination=FRA&lang=ar"
Quick checks are free, no API key needed. Full details in 15 languages with a free key (3,000 req/month) from visa.orizn.app.
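The same two calls can be built with nothing but the Python standard library. This sketch only reproduces the URLs and the x-api-key header from the curl examples above; it does not assume anything about the response fields, and it does not use the official SDKs, whose interfaces aren't described here. build_request and fetch are hypothetical helper names.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE = "https://visa.orizn.app/api/v1"


def build_request(passport: str, destination: str,
                  lang: Optional[str] = None,
                  api_key: Optional[str] = None) -> Request:
    params = {"passport": passport, "destination": destination}
    if api_key is None:
        # Free quick check: no key, English only, so lang is ignored here.
        return Request(f"{BASE}/visa/check?{urlencode(params)}")
    if lang:
        params["lang"] = lang
    return Request(f"{BASE}/visa?{urlencode(params)}",
                   headers={"x-api-key": api_key})


def fetch(req: Request) -> dict:
    """Send the request and decode the JSON body (makes a network call)."""
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

For example, fetch(build_request("JPN", "FRA", lang="ja", api_key="YOUR_KEY")) mirrors the Japanese curl call.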
SDKs: npm install orizn · pip install orizn · cargo add orizn