License Data Freshness: Real-Time vs Cached API

Every verification API makes a decision that affects your product: serve cached data fast and cheap, or scrape the authoritative source live and pay for it in latency and cost. This tradeoff is not obvious, and getting it wrong in either direction costs you.

Serve everything in real time and you will hit state board rate limits, wreck your API response times, and pay for 10x more scrapes than you actually need. Serve everything from cache and you will occasionally serve stale data that misses a license status change from two days ago.

This post walks through how we think about this problem at ContractorVerify - when data change rates justify caching, when they do not, and how to build client-side logic that uses the right mode for each situation.

How Often Does License Data Actually Change?

Most engineers assume license data changes frequently because it is important. The reality is different. A contractor license database is structurally stable - the vast majority of records do not change from one day to the next.

We have been collecting change rate data across all 50 state boards since we started building this infrastructure. The numbers are surprisingly stable:

On any given day, roughly 0.3-0.8% of all active licenses in a state change status
Most status changes are expiration events - predictable, because expiration dates are known in advance
Disciplinary suspensions and revocations represent under 0.05% of daily changes - uncommon, but high-consequence when they occur
New license issuances (relevant for contractors who just renewed with a new number) are also rare relative to total database size

What this means in practice: if you verified a contractor's license yesterday and it came back active, there is a 99.2-99.7% probability it is still active today. For most routine checks, serving yesterday's result is nearly as accurate as hitting the live source - and dramatically cheaper and faster.
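
To see how that one-day figure compounds as cached data ages, here is a minimal sketch. It assumes each day's change probability is independent and uniform across licenses, which is a simplification (expirations cluster around known dates). The function name is illustrative and not part of any API.

```javascript
// Probability that a license verified as active is still unchanged after
// `days` days, assuming an independent daily change rate (a simplification).
function probabilityUnchanged(dailyChangeRate, days) {
  if (dailyChangeRate < 0 || dailyChangeRate > 1) {
    throw new RangeError('dailyChangeRate must be between 0 and 1');
  }
  return Math.pow(1 - dailyChangeRate, days);
}

// Daily change rates from the range quoted above: 0.3% and 0.8%.
for (const rate of [0.003, 0.008]) {
  const oneDay = (probabilityUnchanged(rate, 1) * 100).toFixed(1);
  const oneWeek = (probabilityUnchanged(rate, 7) * 100).toFixed(1);
  console.log(`daily rate ${rate}: unchanged after 1 day ${oneDay}%, after 7 days ${oneWeek}%`);
}
```

For small daily rates the chance of a stale result grows roughly in proportion to cache age, so each extra day of TTL costs about one more day's worth of change risk.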

Status Change Types and Their Urgency

Not all license changes carry the same urgency. Understanding the type distribution matters for deciding your caching strategy:

Change Type	Urgency	Predictability	Frequency
License expiration	Medium	High - date known in advance	Highest
Disciplinary suspension	Critical	None - happens without warning	Low (0.05%/day)
License revocation	Critical	None - final action	Very low
License renewal (reactivation)	Low	Somewhat - tied to renewal cycle	Moderate
New classification added	Low	Contractor-initiated	Low
Bond amount change	Medium	None - carrier-initiated	Low

The pattern here is important: the high-urgency changes (suspension, revocation) are also the least predictable. You cannot plan your cache around them because there is no advance signal. But they are also the rarest. Expirations - which are predictable and manageable - make up the bulk of actual status changes.

The Case for 24-Hour Cache TTL

A 24-hour TTL for cached license lookups is the right default for most platforms. Here is the reasoning:

State Boards Are Slow Sources

State licensing board websites are not built for machine-readable API consumption. They are public portals designed for occasional human lookups. Response times are consistently slow - our infrastructure data across all 50 states shows:

Fastest state boards (California CSLB, Florida DBPR modern endpoints): 1.5-2.5 seconds per lookup
Mid-tier state boards: 3-5 seconds per lookup
Slowest state boards (some use ASP.NET legacy systems with session-based navigation): 6-12 seconds per lookup

An average real-time lookup takes 3-4 seconds. For a platform doing 10,000 lookups per day across contractors in multiple states, that is 8-11 hours of cumulative blocking time - before you account for the error rate from timeouts.

Rate Limits and IP Blocking Are Real

State board websites notice when a single IP or user agent makes hundreds of requests per day. Responses to this range from passive (CAPTCHA challenges that break scraping) to active (IP blocks, temporary bans). Getting blocked by a state board takes your verification for that state offline until you rotate IPs and potentially appeal the block.

ContractorVerify maintains a distributed scraping infrastructure with state-specific rate throttling precisely to avoid this. But even with that infrastructure, scraping every lookup in real time would push against state board tolerance limits for platforms doing meaningful volume.

A 24-hour cache dramatically reduces scrape volume. If you have 1,000 contractors and do 5 lookups per contractor per day (job checks, re-verification, consumer profile views), that is 5,000 potential scrapes without caching. With a 24-hour cache and a 5% cache miss rate, that drops to 250 actual scrapes. State boards tolerate 250 lookups per day very differently from 5,000.
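
The arithmetic above can be sketched as a small estimator. The function name and parameter names are illustrative, and the cache miss rate is an input you would measure on your own traffic rather than a fixed property of a 24-hour TTL.

```javascript
// Estimate daily scrape volume with and without a cache.
// All inputs are illustrative; measure the miss rate on your own traffic.
function estimateDailyScrapes({ contractors, lookupsPerContractor, cacheMissRate }) {
  const totalLookups = contractors * lookupsPerContractor;
  return {
    withoutCache: totalLookups,
    withCache: Math.round(totalLookups * cacheMissRate),
  };
}

// The example from the text: 1,000 contractors, 5 lookups each, 5% miss rate.
console.log(estimateDailyScrapes({
  contractors: 1000,
  lookupsPerContractor: 5,
  cacheMissRate: 0.05,
}));
// { withoutCache: 5000, withCache: 250 }
```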

Expiration Is Predictable - Cache It Accordingly

For the most common status change type - expiration - a 24-hour TTL is actually fine. You know the expiration date from the cached data. Build a pre-expiration monitoring sweep that force-refreshes licenses approaching expiration (within 30 days) on a daily or more frequent cycle. This gives you ahead-of-time warning without needing real-time lookups for everyone.

When to Force a Cache Bypass

A blanket "always use cache" policy is wrong, just like "always scrape live" is wrong. Here are the specific triggers that should bypass the cache and force a fresh lookup:

Job Value Threshold

For any job acceptance above your configured value threshold (typically $5,000), force a real-time check. The incremental latency (2-8 seconds) is acceptable for a high-stakes transaction. The cost of acting on a 23-hour-old cache hit when the contractor's license was suspended this morning is not acceptable.

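// Job acceptance handler.
// Assumes `contractorVerify` is an initialized API client and that
// blockJobAcceptance / confirmJobAcceptance are defined by your platform.
const FORCE_FRESH_JOB_VALUE = 5000; // configured value threshold, in dollars

async function handleJobAcceptance(contractorId, jobValue) {
  // High-value jobs bypass the cache and go to the state board directly.
  const forceFresh = jobValue >= FORCE_FRESH_JOB_VALUE;

  // A cache hit returns in < 100ms; force_fresh returns in 2-8 seconds.
  const result = await contractorVerify.lookup({
    contractor_id: contractorId,
    force_fresh: forceFresh,
  });

  if (result.status !== 'active') {
    return blockJobAcceptance(contractorId, result);
  }

  return confirmJobAcceptance(contractorId);
}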

Consumer Complaint Signal

When a consumer files a complaint against a contractor, trigger an immediate fresh lookup. A complaint and a disciplinary action can co-occur - the state board may have already suspended the license in response to the same or a related complaint. Fresh lookup at complaint time catches this quickly.

License Approaching Expiration

When cached data shows expiration_date within 30 days, shift to daily force-fresh lookups for that contractor. Many contractors renew close to their expiration date - daily checks ensure you pick up the renewal quickly rather than waiting for the cache to expire.

Contractor Self-Reported Renewal

When a contractor tells you they just renewed their license, give them a button in their dashboard to trigger a fresh check. This is good UX (contractors do not want to wait 24 hours for their renewed status to reflect) and good data practice (you get fresh data on demand rather than on a fixed schedule).
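
The four triggers above can be folded into a single decision function. This is a minimal sketch, not ContractorVerify's implementation: every field name on `ctx` is hypothetical, and the once-per-day limit on expiring licenses is one reading of "daily force-fresh lookups".

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Decide whether a lookup should bypass the cache.
// Field names on `ctx` are hypothetical; adapt them to your own data model.
function shouldForceFresh(ctx, now = new Date()) {
  const {
    jobValue = 0,
    jobValueThreshold = 5000,        // "typically $5,000" per the text
    hasNewComplaint = false,         // consumer complaint just filed
    contractorRequestedRefresh = false, // dashboard "re-check" button
    cachedExpirationDate = null,     // Date from cached data, or null
    expirationWindowDays = 30,
    lastFreshAt = null,              // Date of the last force-fresh lookup, or null
  } = ctx;

  if (jobValue >= jobValueThreshold) return true;
  if (hasNewComplaint) return true;
  if (contractorRequestedRefresh) return true;

  if (cachedExpirationDate) {
    const daysLeft = (cachedExpirationDate.getTime() - now.getTime()) / MS_PER_DAY;
    const daysSinceFresh = lastFreshAt
      ? (now.getTime() - lastFreshAt.getTime()) / MS_PER_DAY
      : Infinity;
    // Near expiration: force a fresh lookup at most once per day.
    if (daysLeft <= expirationWindowDays && daysSinceFresh >= 1) return true;
  }

  return false;
}
```

Keeping the policy in one pure function makes it easy to unit test and to adjust thresholds without touching the handlers that call it.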

How ContractorVerify Handles Caching Internally

Our caching layer is not a single global TTL applied uniformly. State boards update their data at different frequencies, and we tune TTLs per state to match:

States with known batch-update schedules (some process renewals overnight and update their database at 2am): We schedule our refresh sweeps to run after their update window.
States with real-time databases (a minority - a few states have modern systems that update instantly): We use shorter TTLs (4-6 hours) because the cost of a fresh lookup is low and data freshness is achievable.
States with weekly batch updates (a few legacy states that process everything in a weekly job): We use longer TTLs (48-72 hours) because fresher data is not available anyway, and more frequent scraping provides no benefit.
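
As a rough illustration of per-state tuning, the sketch below maps a state's update behavior to a TTL. The mode names, the profile shape, and the placeholder state codes are all assumptions made for this example; the TTL values are taken from the low end of the ranges above.

```javascript
// TTL in hours for each kind of state board update behavior.
// Mode names are hypothetical labels for the three categories described above.
const TTL_HOURS_BY_UPDATE_MODE = {
  realtime: 4,        // real-time databases: 4-6 hours
  nightly_batch: 24,  // default TTL; refresh sweeps run after the board's update window
  weekly_batch: 48,   // weekly batch states: 48-72 hours
};

// Look up the TTL for a state, falling back to the 24-hour default
// when the state has no profile or an unrecognized update mode.
function ttlHoursForState(stateCode, stateProfiles, defaultTtlHours = 24) {
  const profile = stateProfiles[stateCode];
  if (!profile) return defaultTtlHours;
  return TTL_HOURS_BY_UPDATE_MODE[profile.updateMode] ?? defaultTtlHours;
}

// Placeholder codes, not real classifications of any state board.
const exampleProfiles = {
  AA: { updateMode: 'realtime' },
  BB: { updateMode: 'weekly_batch' },
};

console.log(ttlHoursForState('AA', exampleProfiles)); // 4
console.log(ttlHoursForState('ZZ', exampleProfiles)); // 24 (no profile, default)
```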

The data_timestamp field in every API response tells you when the underlying data was last retrieved from the state board. Use this field when displaying verification information to consumers - "Verified as of March 16, 2026 at 11:42am" is more accurate than implying real-time status.
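
A consumer-facing label can be built from that field along these lines. The sketch assumes data_timestamp arrives as an ISO 8601 string; the function name, locale, and default time zone are illustrative choices, and exact time formatting can vary slightly between JavaScript runtimes.

```javascript
// Build a consumer-facing label from the data_timestamp field.
// Assumes an ISO 8601 timestamp; locale and time zone are illustrative defaults.
function formatVerifiedLabel(dataTimestamp, timeZone = 'UTC') {
  const date = new Date(dataTimestamp);
  if (Number.isNaN(date.getTime())) {
    // Never imply a verification time we cannot parse.
    return 'Verification date unavailable';
  }
  const datePart = date.toLocaleDateString('en-US', {
    timeZone,
    year: 'numeric',
    month: 'long',
    day: 'numeric',
  });
  const timePart = date.toLocaleTimeString('en-US', {
    timeZone,
    hour: 'numeric',
    minute: '2-digit',
  });
  return `Verified as of ${datePart} at ${timePart}`;
}

console.log(formatVerifiedLabel('2026-03-16T11:42:00Z'));
```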

Performance Impact at Scale

The latency difference between cached and real-time lookups is significant enough to affect your product experience:

Lookup Type	P50 Latency	P95 Latency	Reliability
Cache hit (Redis)	< 100ms	< 150ms	99.9%+
Cache miss, fast state board	1.5 - 2.5s	3 - 4s	99.5%
Cache miss, slow state board	4 - 8s	10 - 15s	98%
Force fresh, any state	2 - 8s	12 - 20s	97%

For routine verification operations (contractor profile loading, background sweep checks), use cached lookups. Sub-100ms responses can be incorporated seamlessly into any page load. A 4-8 second stall waiting for a state board is not.

For transactional gates (job acceptance, onboarding approval), a user waiting 3-5 seconds for a verification check is acceptable if the UI communicates what is happening. Use a loading indicator with a specific message: "Verifying license with state board..." is better than a generic spinner that implies general slowness.

Multi-State Contractors

Contractors who hold licenses in multiple states - common for larger contractors operating across state lines - require independent verification and independent caching per state. A plumber licensed in both Tennessee and Kentucky has two separate license records, two separate expiration dates, and two separate status checks.

Cache each state license independently. Do not aggregate to a single "licensed" status until all required state checks have been completed. When displaying multi-state verification to consumers, be specific: "Licensed in TN (verified Mar 15) and KY (verified Mar 16)" is more useful than a single combined badge.

For platforms operating in multiple states, this also affects your data freshness logic. One state's board might update its data daily while another runs weekly batch updates, so the same contractor's two licenses can have very different achievable freshness. The data_timestamp field on each state-check result lets you display appropriately calibrated recency information for each.
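
One way to avoid collapsing to a single status too early is sketched below. The shape of `results` (state code mapped to an object with status and data_timestamp) and the overall labels are assumptions for illustration.

```javascript
// Combine per-state results into a summary without hiding per-state detail.
// `results` maps state code -> { status, data_timestamp }; this shape is assumed.
function summarizeMultiState(requiredStates, results) {
  const perState = requiredStates.map((state) => {
    const r = results[state];
    return {
      state,
      status: r ? r.status : 'pending', // no result yet for this state
      verifiedAt: r ? r.data_timestamp : null,
    };
  });

  let overall;
  if (perState.some((s) => s.status !== 'active' && s.status !== 'pending')) {
    overall = 'not_licensed'; // at least one required state failed
  } else if (perState.some((s) => s.status === 'pending')) {
    overall = 'incomplete';   // do not claim "licensed" until every check is done
  } else {
    overall = 'licensed';
  }
  return { overall, perState };
}

const summary = summarizeMultiState(['TN', 'KY'], {
  TN: { status: 'active', data_timestamp: '2026-03-15T09:00:00Z' },
});
console.log(summary.overall); // incomplete
```

Returning the per-state array alongside the overall value lets the UI render each state with its own verification date.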

Cache Invalidation Events Beyond TTL

TTL-based expiration is not your only cache invalidation mechanism. Two additional patterns are worth building:

Nightly Expiration Sweep

Run a nightly batch job that pulls all contractors with expiration_date within 45 days and forces fresh lookups for each. This catches renewals proactively and surfaces upcoming expirations well before a license actually lapses, rather than leaving discovery to whenever a cached entry happens to expire.

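// Nightly sweep - runs at 3am.
// Assumes `db`, `contractorVerify`, `addDays`, and `now` come from your own
// codebase or a date library.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runNightlyExpirationSweep() {
  const expiringSoon = await db.contractors.findWhere({
    expiration_date: { $lte: addDays(now(), 45) },
    status: 'active'
  });

  for (const contractor of expiringSoon) {
    try {
      await contractorVerify.lookup({
        contractor_id: contractor.id,
        force_fresh: true
      });
    } catch (err) {
      // One failed lookup (e.g. a state board timeout) should not abort the sweep.
      console.error(`Sweep lookup failed for ${contractor.id}:`, err);
    }
    // 500ms delay between requests to avoid hammering state boards
    await sleep(500);
  }
}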

Monitoring Webhook Integration

Some state boards and third-party data providers offer webhook or alert services for license status changes. Where available, subscribe to these and use them as cache invalidation triggers. When a webhook fires indicating a license suspension in your monitored contractor set, immediately invalidate the cache for that contractor and trigger a fresh lookup.

This is not available for all states - most state boards do not offer outbound webhooks - but for high-value contractor categories where you want maximum data freshness, monitoring services can bridge the gap.
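
Where such a notification source exists, the invalidation step might look like the sketch below. The event shape, the `cache` object (anything with Map-like `delete` and `set`), and the `lookupFresh` function are hypothetical stand-ins, not a real provider's webhook format.

```javascript
// Handle a license-change notification by invalidating and refreshing.
// The event shape, `cache`, and `lookupFresh` are hypothetical stand-ins.
async function handleLicenseChangeEvent(event, { cache, monitoredIds, lookupFresh }) {
  const contractorId = event.contractor_id;

  // Ignore notifications for contractors outside the monitored set.
  if (!monitoredIds.has(contractorId)) {
    return { handled: false };
  }

  // Invalidate first: if the fresh lookup fails, the next read is a cache
  // miss instead of a stale hit.
  cache.delete(contractorId);

  const fresh = await lookupFresh(contractorId);
  cache.set(contractorId, fresh);
  return { handled: true, status: fresh.status };
}
```

Treating the notification as a trigger for a fresh lookup, rather than trusting its payload as the new status, keeps the state board as the source of truth.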

Choosing the Right Strategy for Your Platform Type

The right caching posture depends on your platform's risk profile and use case:

High-Frequency Home Services Marketplace

Contractors accepting multiple jobs per day. Use 24-hour cache for routine checks. Force-fresh for jobs above value threshold. Nightly sweep for expiring licenses. This keeps your scrape volume manageable while ensuring fresh data at high-stakes decision points.

Low-Frequency Insurance Underwriting Platform

License verification at policy issuance and annual renewal. Force-fresh at every underwriting decision - latency is acceptable, data freshness is critical for actuarial accuracy and E&O exposure. For any lookups outside those decision points, a shorter TTL (4-6 hours) is appropriate given lower total lookup volume.

Property Management Platform

Verification when onboarding a new vendor, then quarterly re-verification. Use force-fresh at onboarding. Use 24-hour cache during quarterly sweeps (running thousands of checks, cache hit rate will be very high since most licenses do not change between quarters). Force-fresh when a specific contractor is flagged for a high-value job or incident.

Background Check / Due Diligence Provider

Your clients are likely requesting a point-in-time verification for a specific business purpose. Force-fresh on every lookup - your clients are paying for authoritative current data, and a cached result may not meet their compliance requirements. Build the latency cost into your SLA and pricing.

Building scrapers for each of these state sources is not a simple engineering problem - our post on building scrapers for 50 state licensing boards covers the infrastructure complexity involved. And if you are deciding between building this yourself and using an API, the analysis of why manual checks fail at scale applies as much to DIY scraper infrastructure as it does to human-driven verification.

Summary

The right answer to "real-time vs. cached" is not binary - it is contextual. Use cached data (24-hour TTL default) for routine operations where state board data is structurally stable and the cost of a 24-hour-old result is low. Force fresh lookups for high-stakes transaction gates, expiring licenses, and consumer complaints where stale data carries real consequences.

Build your system to be aware of the difference. A verification layer that serves every lookup the same way - whether it is a background dashboard refresh or a $20,000 job acceptance - is optimizing the wrong thing. The goal is accurate data at the moments that matter, not fresh data everywhere all the time.
