
A technical SEO audit checklist for 2026

SEOany · June 21, 2026 · 7 min read

Technical SEO is the foundation every other tactic sits on: if crawlers can't reach, render, and parse a page, no amount of content or links will rank it. In 2026 the stakes are higher, because the same technical signals also influence whether AI engines like ChatGPT and Google's AI Overviews can retrieve and cite you. This checklist walks through the audit in the order in which problems compound, from crawl access to machine-readability. Run it quarterly, or after any migration, redesign, or CMS change.

What does a technical SEO audit actually check in 2026?

A technical SEO audit checks whether search engines and AI crawlers can access, render, and understand your site — not what your content says, but whether machines can reach it. In 2026 it spans four layers: crawlability and indexation, rendering and speed, structured data, and machine-readability for generative engines.

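Classic audits stopped at Googlebot; 2026 audits add a second audience. GPTBot, ClaudeBot, and PerplexityBot crawl the same pages, and each can be blocked or starved independently of Googlebot. Google-Extended is a different kind of control: it is a robots.txt token with no crawler of its own, which tells Google whether content fetched by its existing crawlers may be used for its Gemini models.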

Order matters because failures compound downward. A page that can't be crawled will never be indexed, rendered, scored for speed, or cited — so audit access first and machine-readability last.

  • Crawl & index — can bots fetch the URL, and is it allowed into the index?
  • Render & performance — does the page paint fast and stay stable for real users?
  • Structured data — can machines extract unambiguous facts, not just prose?
  • Machine-readability — can AI retrieval pipelines find and quote the right pages?

Can search engines crawl and index your pages?

Start with access: confirm robots.txt isn't blocking important paths, that each page returns a 200, and that no stray noindex or wrong canonical is quietly removing you from the index. Crawlability and indexability are separate problems — a page can be crawled and still be excluded.

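The most common self-inflicted wound is a noindex or mispointed canonical left over from staging. Audit every template's rendered HTML as well as the raw source: Google indexes the rendered DOM, so a tag injected or rewritten by JavaScript can differ from what view-source shows, while a noindex in the raw HTML may stop Google from rendering the page at all.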
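As a rough illustration of that check, the sketch below parses an HTML string (raw or rendered) with Python's standard library and reports a noindex directive or a canonical that doesn't match the page's own URL. The names `HeadSignals` and `audit_head` are made up for this example, and the trailing-slash comparison is a simplifying assumption; a canonical that points elsewhere is sometimes intentional, so treat that message as a prompt to review rather than an error.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects robots meta directives and the first canonical link."""

    def __init__(self):
        super().__init__()
        self.robots = []       # directives from <meta name="robots">
        self.canonical = None  # href of the first <link rel="canonical">

    def handle_starttag(self, tag, attrs):
        a = {k.lower(): (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots += [d.strip().lower() for d in a.get("content", "").split(",")]
        elif tag == "link" and "canonical" in a.get("rel", "").lower().split():
            if self.canonical is None:
                self.canonical = a.get("href", "")


def audit_head(html, page_url):
    """Return a list of human-readable findings for one page."""
    parser = HeadSignals()
    parser.feed(html)
    parser.close()
    problems = []
    if "noindex" in parser.robots:
        problems.append("noindex present")
    if parser.canonical is None:
        problems.append("canonical missing")
    elif parser.canonical.rstrip("/") != page_url.rstrip("/"):
        # Assumption: only a trailing slash is treated as equivalent.
        problems.append(f"canonical points elsewhere: {parser.canonical}")
    return problems
```

Running it once on the fetched source and once on HTML saved from a headless browser makes any difference between the two visible.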

On large sites, wasted crawls are a silent tax. Duplicate URLs, endless faceted filters, and redirect chains burn crawl budget that should reach your money pages — consolidate them and keep XML sitemaps to canonical, 200-status URLs only.

A clean canonical URL strategy is the standard cure for the same content living at many addresses: point parameter pages, print views, and syndicated copies back to one authoritative URL.
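One way to derive that single authoritative URL for parameter pages is sketched below. The function name `canonical_target` and the `TRACKING_PARAMS` list are assumptions for illustration; which parameters are safe to drop depends on the site, since some (a colour filter, say) change the content and may deserve their own URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed list of parameters that never change page content.
TRACKING_PARAMS = {
    "utm_source", "utm_medium", "utm_campaign",
    "utm_term", "utm_content", "gclid", "fbclid",
}


def canonical_target(url, drop=TRACKING_PARAMS):
    """Strip tracking parameters and the fragment, lowercase the host."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in drop
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", urlencode(kept), "")
    )
```

Grouping crawled URLs by this value is a quick way to see how many addresses share one canonical target.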

  • robots.txt allows every indexable section; disallows only true junk paths.
  • No accidental noindex in the rendered head of important templates.
  • Every canonical is self-referential or points to a live 200 URL.
  • Redirect chains collapsed to a single 301 hop, with no loops.
  • XML sitemap lists only canonical, indexable URLs and is referenced in robots.txt.

Do your Core Web Vitals still pass in 2026?

Pull field data, not lab scores: pass thresholds are LCP ≤ 2.5s, CLS ≤ 0.1, and INP ≤ 200ms on real users at the 75th percentile. INP replaced FID in 2024 and trips up JavaScript-heavy sites hardest, so audit interaction latency, not just load time.

Core Web Vitals are measured on real visitors in the Chrome UX Report, so a green Lighthouse score on your fast laptop can hide a failing experience on mid-range phones. Trust the field data in Search Console over any single lab test.
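To make the 75th-percentile rule concrete, here is a small sketch that applies the thresholds above to raw field samples, for example from your own real-user monitoring. The metric keys, the `assess` helper, and the nearest-rank percentile method are assumptions for illustration; CrUX and Search Console compute their own aggregates, so use this only to sanity-check your own data.

```python
import math

# Pass thresholds quoted in this article (LCP and INP in milliseconds).
THRESHOLDS = {"lcp_ms": 2500, "cls": 0.1, "inp_ms": 200}


def p75(samples):
    """75th percentile using the nearest-rank method."""
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))
    return ordered[rank - 1]


def assess(field_samples):
    """Map each metric name to (p75 value, passes threshold?)."""
    results = {}
    for metric, values in field_samples.items():
        value = p75(values)
        results[metric] = (value, value <= THRESHOLDS[metric])
    return results
```

A metric can pass on the median and still fail here, which is the point of judging the slowest quarter of visits rather than the typical one.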

INP is the metric most sites now fail. It captures the delay between a tap and the screen updating, so heavy third-party scripts and long main-thread tasks hurt it — audit your tag manager and hydration cost before anything else.

Speed also affects AI retrieval: crawlers and answer engines that fetch pages live tend to work within a time budget, so a page that takes six seconds to render may simply be skipped.

  • LCP ≤ 2.5s — the largest image or text block paints quickly.
  • CLS ≤ 0.1 — no layout jumps from late-loading images, ads, or fonts.
  • INP ≤ 200ms — taps and clicks respond without lag.
  • Field data (CrUX / Search Console) passing, not just lab scores.

Is your structured data giving machines unambiguous facts?

Structured data turns prose into facts a machine can lift verbatim. Audit that every eligible template carries valid JSON-LD — Article, Product, FAQPage, Organization — with zero errors in Google's Rich Results Test. In 2026 schema doubles as the cleanest way to feed facts to AI engines.

Schema markup is one of the highest-leverage, lowest-risk fixes in modern search: it powers rich results in Google and hands AI engines pre-parsed facts instead of asking them to infer meaning from your paragraphs.

Consistency is the audit's real job. Your Organization schema, footer, and About page should state the same name, logo, and social profiles, because that agreement is how systems resolve your brand as a single entity across the web.

Validate rendered output, not templates. A schema block that is correct in source but broken after JavaScript execution earns nothing — test the live URL.

  • Every template maps to the right type (Article, Product, FAQPage, HowTo, Organization).
  • JSON-LD validates with zero errors in the Rich Results Test (for types Google shows no rich result for, use the Schema Markup Validator).
  • Organization and sameAs facts match your site, socials, and Knowledge Panel.
  • No schema for content that isn't actually visible on the page.
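Before a page reaches an external validator, a quick local pass can catch JSON-LD that no longer parses. The sketch below pulls every JSON-LD script out of an HTML string and reports the types found plus any syntax errors. `JsonLdExtractor` and `audit_jsonld` are hypothetical names, and this checks JSON syntax only, not whether the properties are valid for the schema.org type.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def audit_jsonld(html):
    """Return (types_found, errors) for all JSON-LD blocks in the HTML."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    types, errors = [], []
    for index, raw in enumerate(extractor.blocks):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"block {index}: invalid JSON ({exc.msg})")
            continue
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict):
                types.append(item.get("@type", "MISSING @type"))
    return types, errors
```

Feeding it the rendered HTML of the live URL, rather than the template source, keeps the check in line with the advice above.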

Does your architecture guide crawlers and AI to what matters?

Site architecture decides what gets discovered and how authority flows. Audit for orphan pages with zero internal links, bloated navigation, and important pages buried more than three clicks deep. Descriptive anchor text and a shallow, logical hierarchy help both crawlers and AI retrieval find your best content.

Internal linking is the cheapest ranking lever most sites underuse: links distribute authority, define topical clusters, and give crawlers the shortest path to your priority pages. Every important URL should be reachable from at least two descriptive internal links.

Orphan pages are the classic audit finding — content that exists in the sitemap but is linked from nowhere, so it accrues no authority and is often skipped. A crawl that compares your sitemap against your link graph surfaces them fast.
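That comparison is a set difference. A minimal sketch, assuming the sitemap has been reduced to a list of URLs and the crawl to a dict of page to outgoing internal links (both input shapes, and the name `find_orphans`, are assumptions):

```python
def find_orphans(sitemap_urls, internal_links):
    """Return sitemap URLs that no other page links to.

    internal_links maps a source URL to the URLs it links to.
    """
    linked = set()
    for source, targets in internal_links.items():
        # A page linking to itself does not rescue it from being an orphan.
        linked.update(target for target in targets if target != source)
    return sorted(set(sitemap_urls) - linked)
```

URLs must be normalized the same way in both inputs (scheme, host, trailing slash), otherwise well-linked pages will show up as false orphans.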

Anchor text is a relevance signal, not decoration. 'Technical SEO audit checklist' tells a crawler what the target page is about; 'click here' tells it nothing — audit and rewrite generic anchors on high-value links.
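A simple filter can surface the generic anchors worth rewriting. This sketch assumes links have already been extracted as (anchor text, href) pairs; the `GENERIC_ANCHORS` list is a starting point, not an exhaustive one.

```python
# Assumed starter list; extend it with phrases common on your own site.
GENERIC_ANCHORS = {"click here", "read more", "learn more", "here", "more", "this page"}


def generic_anchors(links):
    """Return the (anchor_text, href) pairs whose text says nothing about the target."""
    flagged = []
    for text, href in links:
        # Normalize case, inner whitespace, and trailing punctuation.
        normalized = " ".join(text.lower().split()).strip(" .!:")
        if normalized in GENERIC_ANCHORS:
            flagged.append((text, href))
    return flagged
```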

  • No orphan pages — every indexable URL has at least two internal links in.
  • Priority pages reachable within three clicks of the homepage.
  • Anchor text is descriptive, not 'click here' or 'read more'.
  • Breadcrumbs present and marked up with BreadcrumbList schema.
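The three-click rule in the checklist above maps onto a breadth-first search over the internal link graph. A minimal sketch, again assuming a dict of page to outgoing internal links; `click_depths` and `too_deep` are illustrative names:

```python
from collections import deque


def click_depths(home, internal_links):
    """Shortest number of clicks from the homepage to each reachable URL."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in internal_links.get(page, ()):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(home, internal_links, max_clicks=3):
    """URLs that need more than max_clicks clicks to reach from the homepage."""
    depths = click_depths(home, internal_links)
    return sorted(url for url, depth in depths.items() if depth > max_clicks)
```

URLs missing from the result of `click_depths` altogether are unreachable from the homepage, which is a stronger problem than being buried.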

Is your site machine-readable for AI search engines?

The 2026 addition to every audit: check that AI crawlers aren't blocked, that your key facts survive extraction, and that you publish an llms.txt manifest. Generative engines cite the clearest self-contained passages — technical hygiene now decides AI visibility as much as classic rankings.

First, confirm you haven't accidentally blocked the crawlers you want. GPTBot, ClaudeBot, and PerplexityBot are documented by their operators as obeying robots.txt, and Google-Extended is a robots.txt token that Google's own crawlers honor, so a blanket disallow can silently remove you from AI answers. Audit these user-agents explicitly.
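That audit can be scripted with Python's built-in robots.txt parser. The sketch below takes the robots.txt body as a string and reports, per user-agent, whether a given URL may be fetched; the helper name `ai_access` is made up for this example. One caveat: Python's parser applies the first matching rule in file order, which can differ from the longest-match behaviour some crawlers use, so treat a surprising result as a reason to read the file by hand.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def ai_access(robots_txt, url, agents=AI_AGENTS):
    """Return {agent: allowed?} for one URL, given the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}
```

For instance, a file that disallows everything for GPTBot and only /cart/ for all other agents would report GPTBot as blocked and the other three as allowed for a blog URL.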

Publishing a /llms.txt manifest, a proposed convention rather than a ratified standard, gives language models a curated map of your most important pages; our guide on what llms.txt is covers the format, and the file takes minutes to maintain.
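For a sense of what the file contains, the sketch below renders a small manifest from a list of pages. The layout (an H1 title, a blockquote summary, then H2 sections of Markdown links with short notes) follows the public llms.txt proposal as commonly described; check the current proposal before relying on it. `render_llms_txt` and the sample values are hypothetical.

```python
def render_llms_txt(site_name, summary, sections):
    """Build llms.txt text.

    sections maps a section heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}"]
    for heading, pages in sections.items():
        lines += ["", f"## {heading}", ""]
        for title, url, note in pages:
            lines.append(f"- [{title}]({url}): {note}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_llms_txt(
        "Example Co",
        "Docs and pricing for Example Co.",
        {"Key pages": [("Pricing", "https://example.com/pricing", "Plans and limits")]},
    ))
```

Generating the file from the same source as the XML sitemap helps keep it pointed at canonical URLs only.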

Finally, structure content for extraction: self-contained paragraphs, one claim each, with a direct answer under every heading. This is the same discipline that wins featured snippets, and it's how you get cited by ChatGPT, Perplexity, and AI Overviews.

  • AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended) allowed in robots.txt if you want citations.
  • /llms.txt published and pointing at your canonical, high-value pages.
  • Key facts stated in plain text, not locked inside images or scripts.
  • Each section answers one question in a self-contained, quotable paragraph.
