How SiteGuru Calculates Content Relevance
TL;DR: We convert your keyword cluster and your page into vectors (“embeddings”), compare every page section to the cluster using cosine similarity, and then combine the best-matching sections into one page score. Higher = more on-topic.
What Is an Embedding?
An embedding is a numeric representation of text. Similar ideas map to similar vectors in space. We use the same embedding model for your keyword cluster and for your page sections, so comparisons are apples-to-apples.
From Keywords to a Page Score (Step-by-Step)
- Build the cluster centroid. We embed each keyword and average them into one vector—the centroid—that represents the overall topic.
- Chunk the page into sections. We split the body into coherent sections (structure-aware windows) and embed each section.
- Compare with cosine similarity. After normalizing vectors, we compute the cosine similarity between the centroid and every section. A score near 0 means unrelated and a score near 1 means a very strong match; reports show these values as percentages (0–100%). In practice, strong sections often land around 40%–60%; 60%+ is excellent but uncommon.
- Aggregate the best sections. Your Page Relevance is a weighted top-K average of the strongest sections (K ≈ 10 by default). The very best sections get slightly more weight because a few excellent matches matter more than many weak ones.
- Show what helps (and what doesn’t). Reports list sections from most to least relevant and flag a few low-relevance sections you might prune or improve.
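The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SiteGuru's actual implementation: the toy 3-dimensional vectors stand in for real embeddings, and the `decay` weighting is a hypothetical way to give the best sections "slightly more weight", since the exact weights are not documented here.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

def centroid(vectors):
    """Average the keyword embeddings component-wise into one topic vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity: dot product of the two normalized vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def page_relevance(section_sims, k=10, decay=0.9):
    """Weighted average of the top-k section similarities.

    ASSUMPTION: geometric weights (1, decay, decay^2, ...) are illustrative
    only; they simply favor the strongest sections a little.
    """
    top = sorted(section_sims, reverse=True)[:k]
    if not top:
        return 0.0
    weights = [decay ** i for i in range(len(top))]
    return sum(w * s for w, s in zip(weights, top)) / sum(weights)

# Toy data: two keyword embeddings and three section embeddings.
keyword_vecs = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]
section_vecs = [[0.9, 0.3, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.7]]

topic = centroid(keyword_vecs)
sims = [cosine(topic, s) for s in section_vecs]
score = page_relevance(sims, k=2)
```

With these toy vectors, the first section points in the same direction as the centroid, the second is orthogonal to it, and the third falls in between. Because the weights decay, the page score sits slightly above the plain average of the top two sections.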
How to Read the Numbers
- Per-section similarity (0–100%)
- 0 - 15%: off-topic
- 15% - 30%: weakly related
- 30% - 45%: on-topic but light
- 45% - 60%: strong alignment
- 60%+: excellent alignment (rare)
- Page Relevance (0–1) — the weighted average of your best sections. Higher usually indicates clearer topical focus and better coverage.
Note: Ranges are guidelines; distributions vary by niche and intent.
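As a rough illustration, the bands above can be expressed as a small lookup function. The function name is hypothetical, and because the listed ranges share their endpoints (15%, 30%, and so on), this sketch assumes each boundary value belongs to the higher band.

```python
def similarity_band(similarity):
    """Map a per-section similarity (0.0-1.0) to the guideline label.

    ASSUMPTION: boundary values (e.g. exactly 15%) fall into the higher band.
    """
    pct = similarity * 100
    if pct < 15:
        return "off-topic"
    if pct < 30:
        return "weakly related"
    if pct < 45:
        return "on-topic but light"
    if pct < 60:
        return "strong alignment"
    return "excellent alignment"

labels = [similarity_band(s) for s in (0.08, 0.22, 0.41, 0.53, 0.67)]
```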
Keyword Coverage and “Depth” Suggestions
- Keyword coverage: For each cluster keyword, we find its best-matching section. If similarity is below a calibrated threshold, it’s marked uncovered and shown in Content Gaps.
- Depth opportunities: We look for sections whose similarity falls in a mid-band (relevant but shallow) and that miss high-importance subtopics. Those sections get a “go deeper” recommendation listing the top missing subtopics.
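The two checks above might look something like the following sketch. The function names, the 0.30 coverage threshold, and the 0.30–0.45 mid-band are all assumptions chosen for illustration; the calibrated values are not documented here, and the step that matches sections against missing subtopics is left out.

```python
def find_content_gaps(keyword_sims, threshold=0.30):
    """Return (keyword, best_similarity) pairs whose best section is too weak.

    keyword_sims maps each keyword to its similarities against every section.
    ASSUMPTION: threshold=0.30 is a placeholder, not the calibrated value.
    """
    gaps = []
    for keyword, sims in keyword_sims.items():
        best = max(sims, default=0.0)
        if best < threshold:
            gaps.append((keyword, best))
    # Weakest coverage first, so the biggest gaps appear at the top.
    return sorted(gaps, key=lambda gap: gap[1])

def mid_band_sections(section_sims, low=0.30, high=0.45):
    """Indices of sections that are relevant but shallow (low <= sim < high).

    ASSUMPTION: the band limits are illustrative only.
    """
    return [i for i, s in enumerate(section_sims) if low <= s < high]

keyword_sims = {
    "seo audit": [0.52, 0.10],
    "site speed": [0.20, 0.25],
    "backlinks": [0.05],
}
gaps = find_content_gaps(keyword_sims)
candidates = mid_band_sections([0.52, 0.35, 0.10, 0.44])
```

Here "seo audit" counts as covered because one section reaches 0.52, while the other two keywords would appear in Content Gaps.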
Why Two Pages Can Score Differently
- Focus vs. breadth: A tight, focused page can beat a longer but scattered page.
- Structure: Clear headings and cohesive sections help the model recognize the topic.
- Language & consistency: Using the same language and model for keywords and content keeps relevance scores reliable.
How to Improve Your Score (and Rankings)
- Strengthen your best sections with examples, data, or step-by-step guidance.
- Fill Content Gaps: cover uncovered, high-importance subtopics first.
- Add helpful modules where relevant: comparison tables, FAQs, pricing/ROI, checklists, citations.
- Keep sections cohesive: one subtopic per section with clear headings.
FAQ
Is this keyword density?
No. It measures semantic similarity: coverage of ideas, not raw keyword counts. Because embeddings capture meaning rather than exact wording, a section can score well even when it uses synonyms or paraphrases instead of your exact keywords.
Do titles and H1s matter?
Yes. Clear, on-topic headings improve alignment and help readers and models.
Can a single great section rescue a page?
It helps—our aggregation gives extra weight to your strongest sections—but broad, useful coverage still matters.