What 25,000 URLs reveal about citations

admin 3 hours ago

0 0 4 minutes read

Large language models (LLMs) excel at synthesizing large amounts of information into personalized responses to plain-language prompts. These answers draw on large training datasets and are often augmented through web searches. The fastest way to influence what LLMs say about your product is to influence the content they find through those searches.

At Evertune Research, we use the Evertune AI marketing platform to track hundreds of brands across 250 categories on all major LLMs. This gives us a clear view of which sources AI models tend to cite, especially when users ask for product or service recommendations across industries.

For this analysis, we reviewed the 6,000 most cited URLs for each of six models (ChatGPT, Copilot, Gemini, Google AI Mode, Google AI Overviews, and Perplexity) for March and April. We found that these models share an important behavior: they mostly cite listicles.

Half of LLMs’ most cited URLs are listicles

Of the nearly 25,000 unique URLs we reviewed, half were listicles. Of the nearly 400 million citations across all models, 63% pointed to listicles.
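The two figures above measure different things: one is the share of URLs that are listicles, the other is the share of citations that land on listicles. A minimal sketch of the difference follows, assuming each URL has already been labeled as a listicle or not and carries a citation count. The `CitedUrl` record, the `listicle_shares` helper, and the sample rows are hypothetical illustrations, not Evertune's data or method.

```python
from dataclasses import dataclass


@dataclass
class CitedUrl:
    """Hypothetical record: one URL, its label, and how often it was cited."""
    url: str
    is_listicle: bool
    citations: int


def listicle_shares(rows):
    """Return (share of URLs that are listicles, share of citations to listicles)."""
    if not rows:
        raise ValueError("rows must not be empty")
    total_citations = sum(r.citations for r in rows)
    if total_citations == 0:
        raise ValueError("rows must contain at least one citation")
    listicle_urls = sum(1 for r in rows if r.is_listicle)
    listicle_citations = sum(r.citations for r in rows if r.is_listicle)
    return listicle_urls / len(rows), listicle_citations / total_citations


# Made-up sample: half the URLs are listicles, but they draw more citations.
sample = [
    CitedUrl("https://example.com/best-laptops", True, 630),
    CitedUrl("https://example.com/laptop-review", False, 370),
]
url_share, citation_share = listicle_shares(sample)
print(f"URL share: {url_share:.0%}, citation share: {citation_share:.0%}")
```

When the citation share is higher than the URL share, as in the figures above, listicles are being cited more often per URL than other page types.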

Listicles have several qualities that make them ideal sources for models.

They focus on a single topic, such as “best laptops for gamers,” which makes them highly relevant to user queries.
Their structured format also makes the content easy for models to parse and reproduce.
For product-related questions, listicles do a lot of the work for LLMs by comparing products directly on features, price, materials, and more. ChatGPT now features this comparison format prominently in its shopping widget.

Listicles featured prominently in every model we reviewed. They account for 40–65% of the most cited URLs, with Copilot at the low end and Gemini at the high end.

Most of the listicles in our analysis were ranked lists, such as “Top 5 CRM Tools.” Depending on the model, these make up 71% to 86% of listicles. Unranked lists, such as “7 Ways to Save on Shopping,” were a distant second. Institutional rankings (e.g., data-heavy lists such as US News & World Report’s Best Colleges rankings) accounted for only 1.4% to 4.7% of listicles.

Corporations, earned media, and owned domains were the top sources of listicles in our analysis. It’s worth noting, however, that individual pages may contain affiliate content even if the broader site does not.

For example, Forbes.com is an earned media site, but it also includes affiliate sections such as Forbes Advisor and Forbes Vetted. It ranks among the top three listicle sources for every model in our URL dataset.

A word of warning before making listicles the basis of a GEO strategy: Google has already signaled its intention to crack down on self-promotional listicles. Simply ranking your own product No. 1 ahead of competitors may also run afoul of a Federal Trade Commission rule that “prohibits a business from misrepresenting that a website or an entity it controls provides independent reviews or opinions about a class of products or services that include its products or services,” among other prohibitions.

URLs overlap across models

We reviewed the 6,000 most cited URLs for each of the six LLMs, which in theory produces a pool of 36,000 URLs. In practice, the dataset contains about 25,000 unique URLs, because many appeared among the most cited results for multiple models.

Among the models, the three Gemini-powered Google products (Gemini, AI Mode, and AI Overviews) showed the highest overlap. More than half of the most cited URLs in Google AI Mode also appear among the most cited URLs in Google AI Overviews. Gemini similarly shared a large portion of its top-cited URLs with both Google AI Mode and Google AI Overviews.

The remaining models also shared many URLs with Google AI Mode and Google AI Overviews, although the overlap was smaller. Perplexity shared more than 20% of its URLs with both models, while ChatGPT shared more than 15% with each.

Given the thousands of URLs a model could cite on any topic, that is still meaningful overlap. Copilot, by contrast, shared just 4% to 6% of its URLs with any other model.
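The overlap figures in this section come down to set arithmetic: six lists of 6,000 URLs collapse to a smaller unique pool, and each pair of models shares some fraction of its list. The sketch below shows one way to compute both, assuming URLs are compared as exact strings (a real analysis would likely normalize them first). The function names, model names, and tiny URL sets are hypothetical, not the data behind this article.

```python
from itertools import permutations


def unique_pool(top_urls):
    """Union of every model's top-cited URLs (the deduplicated pool)."""
    return set().union(*top_urls.values())


def overlap_share(top_urls):
    """For each ordered pair (a, b): fraction of a's URLs that also appear in b's list."""
    shares = {}
    for a, b in permutations(top_urls, 2):
        urls_a = top_urls[a]
        shares[(a, b)] = len(urls_a & top_urls[b]) / len(urls_a) if urls_a else 0.0
    return shares


# Made-up example: three models with four top URLs each.
top_urls = {
    "model_a": {"u1", "u2", "u3", "u4"},
    "model_b": {"u3", "u4", "u5", "u6"},
    "model_c": {"u4", "u7", "u8", "u9"},
}

pool = unique_pool(top_urls)
shares = overlap_share(top_urls)
theoretical = sum(len(urls) for urls in top_urls.values())
print(f"{len(pool)} unique URLs out of a theoretical {theoretical}")
print(f"model_a shares {shares[('model_a', 'model_b')]:.0%} of its URLs with model_b")
```

The share is computed per ordered pair because the denominator is the first model's list. With equal-sized lists, as in this analysis, the two directions give the same value.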

Models’ most cited URLs diverge for many reasons, including model training, site crawling permissions, and other factors. Traditional SEO that moves content higher in search results, whether the search is run by a bot or a human, also plays a role, especially in Google AI Mode and Google AI Overviews.

Page components of the most cited URLs

Our review of the nearly 25,000 URLs most cited by LLMs found that these pages typically run 1,000 to 2,000 words, average 18 words per sentence, include links regularly, and use structured headings (H2s and H3s) throughout.

Copilot favored the shortest content, typically citing pages with 964 words and 24 paragraphs. Gemini skewed longest, typically citing pages with 1,977 words and 53 paragraphs.
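To compare your own pages against these figures, the same metrics can be approximated from a page's plain text. This is a rough sketch under stated assumptions: paragraphs are separated by blank lines, sentences end in a period, question mark, or exclamation point, and the `page_metrics` helper is a hypothetical illustration rather than the tooling used for this analysis.

```python
import re


def page_metrics(text):
    """Approximate word, paragraph, and sentence counts for plain page text."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Naive sentence split: break on whitespace that follows ., !, or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words may contain internal apostrophes or hyphens, e.g. "it's" or "top-cited".
    words = re.findall(r"\w+(?:['-]\w+)*", text)
    return {
        "words": len(words),
        "paragraphs": len(paragraphs),
        "sentences": len(sentences),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
    }


sample_text = (
    "Listicles are easy to parse. They compare products directly.\n\n"
    "Models cite them often."
)
m = page_metrics(sample_text)
print(m)
```

The sentence splitter will miscount text with abbreviations such as "e.g." or "U.S.", so treat the words-per-sentence value as an estimate.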

While there’s no cookie-cutter formula for success in AI visibility, we’ve found that the most cited pages typically include the following components:

GEO takeaways

Each LLM has its own preferences and characteristics, and a strong GEO strategy accounts for them. But our analysis of nearly 25,000 URLs suggests that some GEO best practices can improve brand visibility and experience across models.

All LLMs cite a large volume of highly structured, highly detailed content such as listicles. Avoid the spammy, self-promotional listicles that Google penalizes, but otherwise aim to create listicles and appear on them where appropriate.
Traditional SEO supports GEO. Pages that perform well in human search results tend to perform well in bot-driven searches. This is especially true for Gemini-based models.
Pay attention to the page layouts most often cited by the model you want to target. Copilot tends to prefer brevity, while Gemini responds better to longer content. In general, keep pages under 2,000 words, include links regularly, use structured headings, and include images and lists where appropriate.

The opinions expressed in this article are those of the sponsor. Search Engine Land neither confirms nor disputes any of the conclusions presented above.
