Search Engines: From PageRank to Perplexity
A comprehensive guide to how search engines work—from Google's PageRank and inverted indices to Perplexity's AI-powered answer engine. Understand the architectures, algorithms, and trade-offs shaping how we find information.
The Two Eras of Search
For over two decades, search meant one thing: type keywords, get a list of blue links, click through to find your answer. Google perfected this model, building an empire on the simple insight that the web's link structure reveals which pages matter most.
Now we're witnessing a fundamental shift. AI-powered "answer engines" like Perplexity don't return links—they return answers. They synthesize information from multiple sources, cite their work, and engage in conversation. The experience is radically different: instead of hunting through ten tabs, you get a direct response.
But here's what most people miss: AI search engines don't replace classical search—they build on top of it. Perplexity still needs to find relevant documents before it can synthesize answers. Understanding how classical search works illuminates why AI search works the way it does, and why the hybrid approaches emerging today may be the future.
This post takes you through both eras: the elegant algorithms that made web search possible, and the AI systems that are transforming it.
Part I: Classical Search
The Web Search Problem
Before diving into solutions, let's understand the problem. In 2025, the web contains over 200 billion pages. When you type a query like "best restaurants in Tokyo," the search engine must:
- Find pages that might be relevant (out of 200 billion)
- Rank them by quality and relevance (the hard part)
- Return results in under 200 milliseconds
The first web search engines in the 1990s solved problem #1 but struggled with #2. They ranked pages by keyword frequency—if you searched "car," pages that mentioned "car" more often ranked higher. This was easily gamed: stuff keywords into your page, rank higher.
Google's breakthrough was realizing that the web itself contains signals about quality. Links between pages encode human judgments about what's valuable. A page that many other pages link to is probably important. A page linked to by important pages is even more important.
This insight—that link structure reveals quality—became the foundation of modern search.
How Crawling Works
Before you can search the web, you must have a copy of it. Web crawlers (also called spiders or bots) systematically browse the internet, downloading pages and following links to discover new content.
┌─────────────────────────────────────────────────────────────────────────┐
│ THE CRAWLING PROCESS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: SEED URLS │
│ ───────────────── │
│ Start with a list of known URLs (seed set). These might be major │
│ websites, popular pages, or URLs discovered from sitemaps. │
│ │
│ STEP 2: FETCH AND PARSE │
│ ─────────────────────── │
│ For each URL in the queue: │
│ 1. Download the page (HTTP GET request) │
│ 2. Parse the HTML to extract: │
│ • Page content (text, images, metadata) │
│ • Outgoing links to other pages │
│ 3. Store the page content for indexing │
│ │
│ STEP 3: DISCOVER NEW URLS │
│ ───────────────────────── │
│ Add discovered links to the crawl queue (frontier). │
│ The frontier can grow to billions of URLs. │
│ │
│ STEP 4: REPEAT │
│ ───────────── │
│ Continue crawling, prioritizing: │
│ • Important pages (high PageRank) │
│ • Frequently updated pages (news sites) │
│ • Recently discovered pages │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ POLITENESS AND ROBOTS.TXT: │
│ ────────────────────────── │
│ Crawlers must be "polite"—not overwhelming servers with requests. │
│ They respect robots.txt files that specify what can be crawled. │
│ │
│ Example robots.txt: │
│ User-agent: Googlebot │
│ Disallow: /private/ │
│ Crawl-delay: 10 │
│ │
│ This tells Google's crawler to avoid /private/ and wait 10 seconds │
│ between requests. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
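The fetch-parse-discover loop in the box above can be sketched in a few lines of Python. This is a toy, single-threaded illustration that assumes the requests and beautifulsoup4 libraries; a real crawler adds robots.txt handling, per-host politeness delays, URL normalization and deduplication, and massive parallelism.

# Toy breadth-first crawler sketch (assumes: pip install requests beautifulsoup4)
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl(seed_urls, max_pages=100):
    frontier = deque(seed_urls)   # crawl queue (the "frontier")
    seen = set(seed_urls)         # avoid re-crawling the same URL
    pages = {}                    # url -> extracted text, ready for indexing

    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        try:
            response = requests.get(url, timeout=5)        # STEP 2: fetch
        except requests.RequestException:
            continue
        soup = BeautifulSoup(response.text, "html.parser")  # STEP 2: parse
        pages[url] = soup.get_text(" ", strip=True)         # store content

        for link in soup.find_all("a", href=True):          # STEP 3: discover
            absolute = urljoin(url, link["href"])
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages

A production crawler replaces the FIFO queue with a priority queue so that important and frequently updated pages are fetched first, as described in STEP 4.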
The Scale of Crawling
Google's crawler operates at staggering scale. Consider the numbers:
- Billions of pages to keep track of
- Continuous re-crawling to catch updates (some pages change hourly, others yearly)
- Distributed across thousands of machines worldwide
- Petabytes of raw data downloaded and processed daily
The crawler must make intelligent decisions about where to spend its limited resources. Crawling a news site's homepage every hour makes sense; crawling a static documentation page from 2015 every hour does not. Prioritization algorithms determine crawl frequency based on page importance and update patterns.
Rendering adds another layer of complexity. Modern websites use JavaScript extensively—a page might load empty HTML and populate content via JavaScript. Google's crawler runs a full Chrome browser to render pages, executing JavaScript to see what users actually see. This is computationally expensive but necessary for indexing the modern web.
The Inverted Index: The Heart of Search
Once pages are crawled, they must be organized for fast retrieval. The core data structure is the inverted index—perhaps the most important concept in information retrieval.
Forward Index vs. Inverted Index
Imagine you have three documents:
Document 1: "the cat sat on the mat"
Document 2: "the dog sat on the log"
Document 3: "the cat chased the dog"
A forward index maps documents to words:
Doc 1 → [the, cat, sat, on, the, mat]
Doc 2 → [the, dog, sat, on, the, log]
Doc 3 → [the, cat, chased, the, dog]
This is how we naturally think about documents. But it's terrible for search. To find all documents containing "cat," you'd have to scan every document—impossible at web scale.
An inverted index flips this relationship, mapping words to documents:
the → [Doc 1, Doc 2, Doc 3]
cat → [Doc 1, Doc 3]
sat → [Doc 1, Doc 2]
on → [Doc 1, Doc 2]
mat → [Doc 1]
dog → [Doc 2, Doc 3]
log → [Doc 2]
chased → [Doc 3]
Now finding documents containing "cat" is instant: just look up "cat" in the index and get [Doc 1, Doc 3]. This is random access rather than sequential scan—the difference between O(1) and O(n) when n is 200 billion.
┌─────────────────────────────────────────────────────────────────────────┐
│ INVERTED INDEX STRUCTURE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ BASIC STRUCTURE: │
│ ──────────────── │
│ │
│ Term Dictionary Posting Lists │
│ ┌──────────────┐ ┌─────────────────────────────────┐ │
│ │ apple │───────→│ Doc4, Doc17, Doc203, Doc891, ...│ │
│ ├──────────────┤ ├─────────────────────────────────┤ │
│ │ banana │───────→│ Doc2, Doc45, Doc789, ... │ │
│ ├──────────────┤ ├─────────────────────────────────┤ │
│ │ cherry │───────→│ Doc1, Doc3, Doc99, ... │ │
│ └──────────────┘ └─────────────────────────────────┘ │
│ │
│ The dictionary is sorted alphabetically (or stored in a trie/hash) │
│ for fast term lookup. Each term points to a "posting list" of │
│ documents containing that term. │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ ENHANCED POSTING LISTS: │
│ ─────────────────────── │
│ Real search engines store more than just document IDs: │
│ │
│ "apple" → [ │
│ (Doc4, freq=3, positions=[12, 45, 89]), │
│ (Doc17, freq=1, positions=[234]), │
│ (Doc203, freq=7, positions=[1, 5, 23, 67, 102, 156, 201]), │
│ ... │
│ ] │
│ │
│ Additional data stored: │
│ • Term frequency: How many times the term appears │
│ • Positions: Where in the document (for phrase queries) │
│ • Field information: Title vs body vs URL │
│ • Payloads: Custom scoring data │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ QUERY PROCESSING: │
│ ───────────────── │
│ │
│ Query: "apple pie" │
│ │
│ 1. Look up "apple" → [Doc4, Doc17, Doc203, Doc891] │
│ 2. Look up "pie" → [Doc4, Doc56, Doc203, Doc445] │
│ 3. Intersect lists → [Doc4, Doc203] │
│ 4. Score and rank the matching documents │
│ │
│ For OR queries, union the lists instead of intersecting. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
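As a concrete illustration of the lookup-and-intersect step, here is a minimal sketch that builds an inverted index over the three toy documents from above and answers an AND query by intersecting posting lists.

# Minimal inverted index with AND-query intersection
from collections import defaultdict

docs = {
    1: "the cat sat on the mat",
    2: "the dog sat on the log",
    3: "the cat chased the dog",
}

# Build: map each term to the set of documents containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def and_query(*terms):
    """Return documents containing ALL terms (posting-list intersection)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

print(and_query("cat"))         # {1, 3}
print(and_query("cat", "dog"))  # {3}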
Text Processing: From Raw Text to Index Terms
Before building the index, text must be processed into standardized tokens. This pipeline is crucial for matching queries to documents:
1. Tokenization: Split text into individual tokens.
- "New York City" → ["New", "York", "City"] or ["New York City"] (depends on strategy)
- Handle punctuation, numbers, special characters
2. Lowercasing: Convert to lowercase for case-insensitive matching.
- "Apple" and "apple" should match the same documents
3. Stop Word Removal: Optionally remove very common words.
- "the", "a", "is", "on" appear in almost every document
- Removing them reduces index size but loses phrase information
- Modern engines often keep stop words for phrase matching
4. Stemming/Lemmatization: Reduce words to their root form.
- "running", "runs", "ran" → "run"
- "better" → "good" (lemmatization uses vocabulary)
- Helps match different word forms
5. Synonyms and Expansion: Optionally expand terms.
- "car" might also match "automobile", "vehicle"
- Usually done at query time, not index time
This processing ensures that a search for "running shoes" matches documents containing "run," "runs," or "running" combined with "shoe" or "shoes."
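A minimal sketch of this pipeline is shown below, assuming the NLTK library for stemming (the Porter stemmer needs no extra data downloads); the stop-word list is an illustrative subset. Real analyzers in Lucene, Elasticsearch, and Solr make each of these steps configurable per field and per language.

# Toy text-processing pipeline: tokenize -> lowercase -> stop words -> stem
# Assumes: pip install nltk
import re
from nltk.stem import PorterStemmer

STOP_WORDS = {"the", "a", "an", "is", "on", "of", "for"}  # illustrative subset
stemmer = PorterStemmer()

def analyze(text, remove_stop_words=True):
    tokens = re.findall(r"[a-z0-9]+", text.lower())   # tokenize + lowercase
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stemmer.stem(t) for t in tokens]           # reduce to root forms

print(analyze("Running shoes for the marathon"))  # ['run', 'shoe', 'marathon']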
BM25: The Ranking Algorithm That Survived 30 Years
Once you've found documents containing the query terms, how do you rank them? The dominant algorithm for three decades has been BM25 (Best Matching 25), introduced in the mid-1990s as part of the Okapi retrieval system (building on probabilistic ranking work from the 1970s and 80s) and still the default in Elasticsearch, Solr, and most search systems today.
BM25 improves on simpler TF-IDF by handling two key problems:
┌─────────────────────────────────────────────────────────────────────────┐
│ BM25: THE INTUITION │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ CORE INSIGHT #1: TERM FREQUENCY SATURATION │
│ ────────────────────────────────────────── │
│ │
│ If a document mentions "apple" 100 times, is it 100× more relevant │
│ than one mentioning "apple" once? Probably not. │
│ │
│ Simple TF-IDF: Score increases linearly with frequency │
│ BM25: Score saturates—diminishing returns from repetition │
│ │
│ Relevance │
│ ↑ │
│ │ ____________________ TF-IDF │
│ │ _____/ │
│ │ ____/ │
│ │ ____/ ___________________________ BM25 │
│ │ ____/ _____/ │
│ │____/ _____/ │
│ │ _____/ │
│ └─────────────────────────────────────────────→ Term Frequency │
│ 1 2 5 10 20 50 100 │
│ │
│ BM25's saturation prevents keyword stuffing from gaming rankings. │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ CORE INSIGHT #2: DOCUMENT LENGTH NORMALIZATION │
│ ─────────────────────────────────────────────── │
│ │
│ A 10,000-word document naturally contains more words than a │
│ 500-word document. Without normalization, long documents would │
│ unfairly rank higher simply because they mention more terms. │
│ │
│ BM25 normalizes by document length: │
│ • Documents longer than average are penalized │
│ • Documents shorter than average get a boost │
│ • The parameter b controls how much length matters (typically 0.75) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ CORE INSIGHT #3: INVERSE DOCUMENT FREQUENCY (IDF) │
│ ────────────────────────────────────────────────── │
│ │
│ Rare terms are more informative than common terms. │
│ │
│ • "the" appears in 99% of documents → low IDF, low value │
│ • "quantum" appears in 0.1% of documents → high IDF, high value │
│ │
│ If someone searches "quantum physics," the word "quantum" tells us │
│ much more about what they want than "physics" (which is common). │
│ │
│ IDF = log(N / df) │
│ Where N = total documents, df = documents containing the term │
│ │
└─────────────────────────────────────────────────────────────────────────┘
The BM25 Formula
The complete BM25 formula combines these insights:
Score(D, Q) = Σ IDF(qi) × [ f(qi, D) × (k1 + 1) / (f(qi, D) + k1 × (1 - b + b × |D|/avgdl)) ]
Where:
- Q = query with terms q1, q2, ..., qn
- D = document being scored
- f(qi, D) = frequency of term qi in document D
- |D| = length of document D
- avgdl = average document length in the collection
- k1 = term frequency saturation parameter (typically 1.2)
- b = length normalization parameter (typically 0.75)
The formula looks complex but encodes simple ideas:
- Sum over all query terms
- Weight each term by its IDF (rare terms matter more)
- Apply saturation to term frequency (diminishing returns)
- Normalize by document length
Despite being over 30 years old, BM25 remains remarkably effective. Its simplicity makes it fast, its parameters are well-understood, and it's surprisingly hard to beat with more complex methods for many retrieval tasks.
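To see the saturation effect numerically, the per-term score can be computed directly from the formula above. This quick sketch assumes k1 = 1.2, b = 0.75, a document of average length, and IDF fixed at 1:

# BM25 term score vs. raw term frequency (document at average length)
def bm25_term_score(tf, idf=1.0, k1=1.2, b=0.75, doc_len=1000, avg_doc_len=1000):
    norm = 1 - b + b * doc_len / avg_doc_len   # length normalization (1.0 here)
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

for tf in (1, 2, 5, 10, 100):
    print(tf, round(bm25_term_score(tf), 2))
# 1 1.0
# 2 1.38
# 5 1.77
# 10 1.96
# 100 2.17

The per-term score can never exceed IDF × (k1 + 1), so the hundredth repetition of a keyword buys almost nothing over the tenth.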
BM25's Limitation: No Semantic Understanding
BM25's fundamental limitation is that it matches terms, not concepts. It cannot understand that:
- "automobile" and "car" mean the same thing
- "Apple the company" and "apple the fruit" are different
- "not good" has opposite meaning to "good"
- "king - man + woman = queen" (word relationships)
This is why semantic search using embeddings has become important—but BM25 remains valuable for exact matching and is often combined with semantic methods in hybrid approaches.
PageRank: The Algorithm That Changed Search
While BM25 tells you which documents match a query, it doesn't tell you which matching documents are trustworthy or authoritative. A random blog post and the New York Times might both contain the query terms—but the Times article is probably more reliable.
Larry Page and Sergey Brin's insight at Stanford in 1996 was that the web's link structure encodes quality signals. If many pages link to a page, it's probably important. If important pages link to it, it's probably very important.
┌─────────────────────────────────────────────────────────────────────────┐
│ PAGERANK: THE INTUITION │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ THE RANDOM SURFER MODEL: │
│ ───────────────────────── │
│ Imagine a person randomly browsing the web: │
│ │
│ 1. Start on a random page │
│ 2. With probability d (typically 0.85): click a random link │
│ 3. With probability (1-d): jump to a completely random page │
│ 4. Repeat forever │
│ │
│ PageRank = the probability of being on each page after infinite │
│ random browsing. Pages you'd visit more often have higher PageRank. │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ WHY IT WORKS: │
│ ───────────── │
│ │
│ • Pages with many incoming links are visited more often │
│ • Links from high-PageRank pages pass more value │
│ • Pages that link to many sites dilute their vote │
│ │
│ Example: │
│ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │ A │────────→│ B │────────→│ C │ │
│ └─────┘ └─────┘ └─────┘ │
│ │ ↑ ↑ │
│ │ │ │ │
│ └───────────────┴───────────────┘ │
│ │
│ If A has high PageRank, it passes value to both B and C. │
│ B also passes value to C. So C gets value from multiple paths. │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ THE MATH (SIMPLIFIED): │
│ ────────────────────── │
│ │
│ PR(A) = (1-d)/N + d × Σ PR(Ti)/C(Ti) │
│ │
│ Where: │
│ • d = damping factor (0.85) │
│ • N = total number of pages │
│ • Ti = pages that link to A │
│ • C(Ti) = number of outgoing links from Ti │
│ │
│ A page's PageRank is: │
│ • A base amount (1-d)/N that everyone gets │
│ • Plus value passed from pages linking to it │
│ • Divided by how many pages they link to (dilution) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ COMPUTING PAGERANK: │
│ ─────────────────── │
│ The formula is recursive (A's PageRank depends on B's, which │
│ depends on C's, which might depend on A's). │
│ │
│ Solution: Iterative computation │
│ 1. Initialize all PageRanks to 1/N │
│ 2. Apply the formula to update all PageRanks │
│ 3. Repeat until values converge (stop changing) │
│ │
│ This is computing the principal eigenvector of the link matrix. │
│ Convergence is guaranteed by the damping factor (random jumps). │
│ │
└─────────────────────────────────────────────────────────────────────────┘
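The iterative computation described in the box is short enough to write out in full. Here is a minimal power-iteration sketch on a toy graph in the spirit of the diagram (with an extra C → A link added so every page has at least one outgoing link and there are no dangling nodes):

# Power-iteration PageRank on a toy link graph
def pagerank(links, d=0.85, iterations=50, tol=1e-9):
    """links: page -> list of pages it links to (every page needs outlinks)."""
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}              # initialize uniformly: 1/N each

    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            # Sum of PR(Ti)/C(Ti) over the pages Ti that link to p
            incoming = sum(pr[t] / len(links[t]) for t in pages if p in links[t])
            new_pr[p] = (1 - d) / n + d * incoming
        if all(abs(new_pr[p] - pr[p]) < tol for p in pages):
            break                                  # values have converged
        pr = new_pr
    return pr

# Toy graph: A links to B and C, B links to C, C links back to A
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(pagerank(graph))  # C ends up highest: it receives links from both A and B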
PageRank in Practice
PageRank transformed search quality, but it also became a target for manipulation. Early "link farms" created networks of pages linking to each other to inflate PageRank. Google's response was continuously evolving:
Link quality evaluation: Not all links are equal. Links from trusted domains (universities, news organizations) carry more weight. Links from known spam sites carry negative weight.
Anchor text analysis: The text of a link provides context. If many pages link to a page with anchor text "digital camera reviews," that page is probably about digital camera reviews—even if it never uses those exact words.
Nofollow and sponsored attributes: Google introduced link attributes to let webmasters indicate paid or untrusted links that shouldn't pass PageRank.
Penguin algorithm: Introduced in 2012 to penalize sites with unnatural link profiles—too many exact-match anchor texts, links from irrelevant sites, etc.
PageRank remains part of Google's algorithm today, as confirmed by Google employees and revealed in the 2024 algorithm leak. However, it's now one of over 200 ranking factors, weighted alongside relevance, freshness, user engagement, and many others.
Query Understanding
Finding and ranking documents is only valuable if you understand what the user wants. Query understanding transforms raw queries into structured search intent.
┌─────────────────────────────────────────────────────────────────────────┐
│ QUERY UNDERSTANDING PIPELINE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ RAW QUERY: "apple store nyc hours" │
│ │
│ STEP 1: TOKENIZATION AND NORMALIZATION │
│ ──────────────────────────────────────── │
│ • Split into tokens: ["apple", "store", "nyc", "hours"] │
│ • Normalize: lowercase, expand abbreviations │
│ • "nyc" → "new york city" │
│ │
│ STEP 2: ENTITY RECOGNITION │
│ ─────────────────────────── │
│ • "apple store" = Apple Store (retail location) │
│ • "nyc" = New York City (location) │
│ • "hours" = operating hours (attribute) │
│ │
│ STEP 3: INTENT CLASSIFICATION │
│ ───────────────────────────── │
│ What type of query is this? │
│ │
│ • Navigational: User wants a specific site │
│ • Informational: User wants to learn something │
│ • Transactional: User wants to do something (buy, book, etc.) │
│ • Local: User wants nearby results │
│ │
│ This query: LOCAL + INFORMATIONAL │
│ │
│ STEP 4: QUERY EXPANSION │
│ ─────────────────────── │
│ Add related terms: │
│ • "hours" → "open", "close", "schedule", "timing" │
│ • Location variants: "manhattan", "5th avenue" │
│ │
│ STEP 5: STRUCTURED QUERY │
│ ───────────────────────── │
│ { │
│ intent: "local_business_hours", │
│ entity: "Apple Store", │
│ location: "New York City", │
│ attribute: "hours", │
│ expanded_terms: ["hours", "open", "schedule"] │
│ } │
│ │
│ This structured understanding enables: │
│ • Triggering local business card (not organic results) │
│ • Showing hours directly in search results │
│ • Connecting to maps and business listings │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Spell Correction and Query Rewriting
Users make mistakes. "reciepe for choclate cake" should match results for "recipe for chocolate cake." Search engines handle this through:
Spell correction: Suggest or automatically correct misspellings. Based on edit distance, query logs (common corrections), and language models.
Query rewriting: Transform queries based on learned patterns.
- "pics of Eiffel Tower" → "Eiffel Tower images"
- "what is the capital of France" → "France capital"
Personalization: Adjust based on user history and context.
- User in UK searching "football" → soccer
- User in US searching "football" → American football
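A bare-bones spell corrector can be built from edit similarity alone: compare each query term against a dictionary of known terms and pick the closest match. The sketch below uses Python's difflib for the fuzzy comparison; the term list is a stand-in for what production systems derive from query logs, combined with term frequencies and language-model context.

# Toy spell correction against a dictionary of known terms
import difflib

# In practice this comes from query logs / index vocabulary, with frequencies
KNOWN_TERMS = ["recipe", "chocolate", "cake", "restaurant", "weather"]

def correct(term, cutoff=0.7):
    """Return the closest known term, or the original if nothing is close enough."""
    matches = difflib.get_close_matches(term, KNOWN_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else term

query = "reciepe for choclate cake"
print(" ".join(correct(t) for t in query.split()))
# recipe for chocolate cake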
Putting It Together: The Search Pipeline
When you type a query into Google, here's what happens in roughly 200 milliseconds:
┌─────────────────────────────────────────────────────────────────────────┐
│ CLASSICAL SEARCH PIPELINE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ USER QUERY: "best hiking trails california" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ QUERY UNDERSTANDING (5-10ms) │ │
│ │ • Tokenize and normalize │ │
│ │ • Identify entities: "hiking trails", "california" │ │
│ │ • Classify intent: informational, seeking recommendations │ │
│ │ • Expand query: add "hikes", "hiking paths", etc. │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ CANDIDATE RETRIEVAL (20-50ms) │ │
│ │ • Query the inverted index │ │
│ │ • Find all documents containing query terms │ │
│ │ • Apply BM25 for initial scoring │ │
│ │ • Return top ~1000 candidates │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ RANKING (50-100ms) │ │
│ │ • Apply PageRank and link signals │ │
│ │ • Apply freshness signals │ │
│ │ • Apply quality signals (E-E-A-T) │ │
│ │ • Apply personalization │ │
│ │ • Neural re-ranking (BERT-based models) │ │
│ │ • Return top ~100 results │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ RESULT ASSEMBLY (20-30ms) │ │
│ │ • Generate snippets (extract relevant text) │ │
│ │ • Add rich features (images, ratings, dates) │ │
│ │ • Add knowledge panels, featured snippets │ │
│ │ • Filter adult content, remove duplicates │ │
│ │ • Diversify results (avoid same-site dominance) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ SEARCH RESULTS PAGE (~200ms total) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ INFRASTRUCTURE: │
│ • Query routed to nearest data center │
│ • Index sharded across thousands of machines │
│ • Each shard queried in parallel │
│ • Results merged and re-ranked │
│ • Cached results for common queries │
│ │
└─────────────────────────────────────────────────────────────────────────┘
This pipeline, refined over 25+ years, is remarkably efficient. It can search 200+ billion documents and return results in under 200 milliseconds. But it has a fundamental limitation: it returns documents, not answers. The user must still click through links, read pages, and synthesize information themselves.
This is where AI search engines enter the picture.
Part II: AI-Powered Search
The Paradigm Shift
Classical search answers the question: "Which documents might contain what you're looking for?"
AI search answers a different question: "What is the answer to your question?"
This shift from document retrieval to answer generation changes everything—the user experience, the underlying architecture, the business model, and the relationship between search engines and content creators.
┌─────────────────────────────────────────────────────────────────────────┐
│ CLASSICAL VS AI SEARCH │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ CLASSICAL SEARCH (Google): │
│ ────────────────────────── │
│ │
│ User: "What's the best way to learn Python?" │
│ │
│ Response: │
│ • 10 blue links to various articles │
│ • User must click, read, compare, synthesize │
│ • Time to answer: 5-15 minutes │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ AI SEARCH (Perplexity): │
│ ─────────────────────── │
│ │
│ User: "What's the best way to learn Python?" │
│ │
│ Response: │
│ "Based on current resources and community recommendations: │
│ │
│ For beginners, the most effective approach combines: │
│ │
│ 1. Interactive tutorials: Start with Python.org's official │
│ tutorial or Codecademy's Python course for fundamentals [1][2] │
│ │
│ 2. Project-based learning: Build small projects after basics. │
│ Automate the Boring Stuff with Python is excellent for this [3] │
│ │
│ 3. Practice platforms: LeetCode and HackerRank for algorithms, │
│ real-world projects on GitHub for portfolio [4][5] │
│ │
│ Timeline: Most learners achieve basic proficiency in 2-3 months │
│ with 1-2 hours daily practice. │
│ │
│ Sources: [1] python.org [2] codecademy.com [3] automatetheboringstuff│
│ [4] leetcode.com [5] github.com" │
│ │
│ Time to answer: 30 seconds │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ KEY DIFFERENCES: │
│ │
│ Classical: AI: │
│ • Returns links • Returns synthesized answer │
│ • User synthesizes • AI synthesizes │
│ • Many clicks needed • Zero clicks needed │
│ • No conversation • Conversational follow-ups │
│ • Static results • Dynamic, personalized answers │
│ │
└─────────────────────────────────────────────────────────────────────────┘
How Perplexity Works: Architecture Deep Dive
Perplexity has emerged as the leading AI search engine, reaching 780 million monthly queries in 2025. Understanding its architecture reveals how AI search engines actually work.
At its core, Perplexity is a Retrieval-Augmented Generation (RAG) system at massive scale:
┌─────────────────────────────────────────────────────────────────────────┐
│ PERPLEXITY ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ USER QUERY │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ QUERY UNDERSTANDING │ │
│ │ │ │
│ │ • Parse user intent │ │
│ │ • Identify entities and concepts │ │
│ │ • Determine if real-time data needed │ │
│ │ • Route to appropriate model/mode │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ MULTI-STAGE RETRIEVAL │ │
│ │ │ │
│ │ Stage 1: Fast Retrieval │ │
│ │ • Query web index (200+ billion URLs tracked) │ │
│ │ • Lexical search (BM25-style) │ │
│ │ • Embedding-based semantic search │ │
│ │ • Return ~1000 candidate documents │ │
│ │ │ │
│ │ Stage 2: Reranking │ │
│ │ • Cross-encoder models score relevance │ │
│ │ • Consider freshness, authority, relevance │ │
│ │ • Return top ~10-20 documents │ │
│ │ │ │
│ │ Stage 3: On-Demand Crawling (if needed) │ │
│ │ • For time-sensitive queries, fetch pages in real-time │ │
│ │ • Extract and process fresh content │ │
│ │ • Bypass stale index data │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ ANSWER GENERATION │ │
│ │ │ │
│ │ • Select appropriate LLM (GPT-4, Claude, Sonar, etc.) │ │
│ │ • Construct prompt with retrieved context │ │
│ │ • Generate answer with inline citations │ │
│ │ • Stream response to user │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ANSWER WITH CITATIONS │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ INFRASTRUCTURE SCALE: │
│ • 200+ billion URLs indexed │
│ • Tens of thousands of CPUs for crawling │
│ • 400+ petabytes of storage │
│ • 200 million daily queries │
│ • Powered by Vespa AI for search infrastructure │
│ • Indexes updated tens of thousands of times per second │
│ │
└─────────────────────────────────────────────────────────────────────────┘
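The reranking stage in the middle of this pipeline is typically a cross-encoder: a model that reads the query and a candidate document together and emits a relevance score. A minimal sketch of that general technique follows, assuming the sentence-transformers library and one of its public MS MARCO checkpoints; the model name is an illustrative choice, not anything Perplexity has disclosed.

# Cross-encoder reranking sketch (assumes: pip install sentence-transformers)
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative model

query = "python list comprehension syntax error"
candidates = [
    "SyntaxError in a Python list comprehension usually means a missing bracket.",
    "A guide to hiking trails in California.",
    "List comprehensions provide a concise way to create lists in Python.",
]

# Score each (query, document) pair jointly, then sort best-first
scores = reranker.predict([(query, doc) for doc in candidates])
reranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
for doc, score in reranked:
    print(round(float(score), 2), doc)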
The Multi-Model Strategy
Unlike a single-model chatbot, Perplexity operates as an orchestration layer that routes queries to different models based on the task:
┌─────────────────────────────────────────────────────────────────────────┐
│ PERPLEXITY'S MODEL ROUTING │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ QUERY ANALYSIS → MODEL SELECTION │
│ │
│ Simple factual query: │
│ "What year was the Eiffel Tower built?" │
│ → Route to Sonar (fast, cost-effective in-house model) │
│ │
│ Complex research query: │
│ "Compare the economic policies of the EU and US regarding AI" │
│ → Route to Claude 3.5 Sonnet or GPT-4 (higher reasoning) │
│ │
│ Coding query: │
│ "Write a Python function to merge two sorted lists" │
│ → Route to code-specialized model │
│ │
│ Math/reasoning query: │
│ "Solve this calculus integral step by step" │
│ → Route to reasoning model (DeepSeek R1, GPT-4) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ AVAILABLE MODELS (2025): │
│ │
│ • Sonar: Perplexity's in-house models, fine-tuned for search │
│ • GPT-4o: OpenAI's flagship model │
│ • Claude 3.5 Sonnet: Anthropic's model │
│ • Gemini Flash 2.0: Google's fast model │
│ • DeepSeek R1: Strong reasoning model │
│ • Llama 3.1: Meta's open-source model │
│ │
│ This multi-model approach lets Perplexity: │
│ • Optimize cost (use cheaper models for simple queries) │
│ • Maximize quality (use best model for each task type) │
│ • Reduce latency (fast models for time-sensitive queries) │
│ • Provide choice (Pro users can select models) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
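Perplexity does not publish its routing logic, but the orchestration idea can be illustrated with a deliberately naive sketch: classify the query, then map the class to a model. The keyword rules and model names below are purely hypothetical placeholders.

# Hypothetical illustration of query-to-model routing (not Perplexity's actual logic)
def classify_query(query):
    q = query.lower()
    if any(kw in q for kw in ("def ", "function", "code", "bug", "python")):
        return "coding"
    if any(kw in q for kw in ("solve", "integral", "prove", "calculate")):
        return "reasoning"
    if len(q.split()) > 12 or "compare" in q:
        return "complex_research"
    return "simple_factual"

ROUTES = {
    "simple_factual": "fast-inhouse-model",     # cheap, low latency
    "complex_research": "frontier-llm",         # strongest reasoning
    "coding": "code-specialized-model",
    "reasoning": "reasoning-model",
}

def route(query):
    return ROUTES[classify_query(query)]

print(route("What year was the Eiffel Tower built?"))                         # fast-inhouse-model
print(route("Compare the economic policies of the EU and US regarding AI"))   # frontier-llm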
Real-Time Information: The Freshness Challenge
A key differentiator of AI search engines is handling real-time information. When you ask "What's the weather in Tokyo?" or "Latest news about Apple," the answer must be current—not from a pre-training cutoff months ago.
Perplexity solves this through multiple mechanisms:
1. Continuous Index Updates: The index is updated tens of thousands of times per second. High-priority pages (news sites, frequently changing pages) are re-crawled frequently.
2. On-Demand Crawling: For time-sensitive queries, Perplexity can fetch and process web pages in real-time, bypassing the index entirely. This adds latency but ensures freshness.
3. API Integrations: For structured data (weather, stock prices, sports scores), direct API calls to authoritative sources provide instant, accurate data.
4. Freshness Signals in Ranking: When relevance scores are similar, prefer recent content. The retrieval system weighs publication dates and last-modified timestamps.
┌─────────────────────────────────────────────────────────────────────────┐
│ FRESHNESS STRATEGIES │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Query: "SpaceX launch today" │
│ │
│ TRADITIONAL APPROACH (Google): │
│ • Return indexed news articles │
│ • May be hours old │
│ • User must check multiple sources for updates │
│ │
│ AI SEARCH APPROACH (Perplexity): │
│ • Detect time-sensitive query ("today") │
│ • Trigger on-demand crawl of SpaceX, NASA, news sites │
│ • Fetch and process pages in real-time │
│ • Synthesize current status with sources │
│ │
│ Result: "SpaceX's Starship launch is scheduled for 4:00 PM EST │
│ today (January 4, 2025). The launch window extends until 6:00 PM. │
│ Weather conditions are currently favorable with 80% probability │
│ of acceptable conditions. [1][2][3]" │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ CRAWL FREQUENCY BY CONTENT TYPE: │
│ │
│ Breaking news sites: Every few minutes │
│ Major news sites: Every 15-60 minutes │
│ Blogs, articles: Daily to weekly │
│ Documentation: Weekly to monthly │
│ Static pages: Monthly or on-demand │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Other AI Search Engines
Perplexity isn't alone. The AI search landscape in 2025 includes several major players:
┌─────────────────────────────────────────────────────────────────────────┐
│ AI SEARCH ENGINE LANDSCAPE (2025) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ PERPLEXITY AI │
│ ───────────── │
│ Focus: Research-oriented, citation-first search │
│ Strengths: │
│ • Best-in-class citation quality │
│ • Multi-model selection │
│ • Strong for deep research │
│ • 780M monthly queries │
│ Limitations: │
│ • Weaker for local/transactional queries │
│ • No vertical integrations (maps, shopping) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ GOOGLE AI OVERVIEWS │
│ ──────────────────── │
│ Focus: Enhancing traditional search with AI summaries │
│ Strengths: │
│ • 2 billion monthly users │
│ • Seamless integration with existing Google services │
│ • Strong for local, shopping, and transactional queries │
│ • Massive index and infrastructure │
│ Limitations: │
│ • Higher error rate (26% on some topics) │
│ • Summaries can feel forced/unnecessary │
│ • Copyright and publisher concerns │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ CHATGPT SEARCH │
│ ─────────────── │
│ Focus: Conversational search within ChatGPT │
│ Strengths: │
│ • Strong language model (GPT-4) │
│ • Natural conversation flow │
│ • Good for exploratory queries │
│ Limitations: │
│ • Not designed as primary search interface │
│ • Citation quality inconsistent │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ MICROSOFT COPILOT (Bing) │
│ ───────────────────────── │
│ Focus: AI-enhanced Bing search │
│ Strengths: │
│ • Deep Microsoft/Office integration │
│ • Enterprise features │
│ • Good for productivity tasks │
│ Limitations: │
│ • Smaller index than Google │
│ • Less refined than Perplexity │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ YOU.COM │
│ ──────── │
│ Focus: Customizable, multi-app search │
│ Strengths: │
│ • Highly customizable interface │
│ • Multiple search "apps" for different tasks │
│ • Privacy-focused options │
│ Limitations: │
│ • Smaller user base │
│ • Less refined than leaders │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ SPECIALIZED AI SEARCH: │
│ │
│ • Phind: Developer/code-focused search │
│ • Consensus: Academic paper search │
│ • Elicit: Research assistant │
│ • Brave Search Leo: Privacy-first AI search │
│ │
└─────────────────────────────────────────────────────────────────────────┘
The RAG Foundation: How AI Search Actually Retrieves Information
At their core, AI search engines are large-scale RAG (Retrieval-Augmented Generation) systems. Understanding RAG illuminates how these systems work:
┌─────────────────────────────────────────────────────────────────────────┐
│ RAG FOR AI SEARCH │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ THE RAG PATTERN: │
│ ──────────────── │
│ │
│ 1. RETRIEVE: Find relevant documents for the query │
│ 2. AUGMENT: Add retrieved content to the LLM prompt │
│ 3. GENERATE: LLM produces answer using the context │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ WHY RAG, NOT JUST LLMs? │
│ ─────────────────────── │
│ │
│ Problem 1: Knowledge Cutoff │
│ LLMs are trained on data up to a cutoff date. They can't know │
│ about events after training. │
│ → RAG provides real-time information │
│ │
│ Problem 2: Hallucination │
│ LLMs confidently generate plausible-sounding but false information. │
│ → RAG grounds responses in retrieved documents │
│ │
│ Problem 3: Source Attribution │
│ LLMs can't tell you where information came from. │
│ → RAG enables citations to specific sources │
│ │
│ Problem 4: Specialized Knowledge │
│ LLMs may have shallow knowledge of niche topics. │
│ → RAG retrieves deep, authoritative sources │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ AI SEARCH = RAG AT WEB SCALE │
│ ───────────────────────────── │
│ │
│ Standard RAG: │
│ • Your documents (thousands to millions) │
│ • Single embedding model │
│ • Single retrieval step │
│ │
│ AI Search (Perplexity): │
│ • The entire web (200+ billion pages) │
│ • Multiple retrieval strategies (lexical + semantic) │
│ • Multi-stage retrieval and reranking │
│ • Real-time crawling when needed │
│ • Multi-model generation │
│ │
│ The principles are the same; the scale is different. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Hybrid Retrieval: Combining Lexical and Semantic Search
Modern AI search engines don't rely on a single retrieval method. They combine multiple approaches:
Lexical Search (BM25-style):
- Matches exact terms
- Fast and precise
- Great for specific queries ("error code 0x8007045D")
- Misses semantic similarity
Semantic Search (Embeddings):
- Matches meaning, not just words
- "car" matches "automobile"
- "how to fix a bug" matches "debugging techniques"
- Can miss exact matches
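A sketch of the embedding side, assuming the sentence-transformers library (the checkpoint name is an illustrative public model): documents and queries are embedded into the same vector space and compared by cosine similarity, so "fix a bug" can match "debugging" without sharing a keyword.

# Embedding-based semantic search sketch (assumes: pip install sentence-transformers)
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative public model

docs = [
    "How to repair a flat bicycle tire",
    "Debugging techniques for fixing software bugs",
    "Best automobiles under $30,000",
]
doc_vecs = model.encode(docs, normalize_embeddings=True)    # unit-length vectors

query_vec = model.encode(["how to fix a bug"], normalize_embeddings=True)[0]
similarities = doc_vecs @ query_vec                         # cosine similarity
best = int(np.argmax(similarities))
print(docs[best])  # matches the debugging doc despite sharing few exact words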
Hybrid Approach: Combine both and let the reranker figure out relevance:
┌─────────────────────────────────────────────────────────────────────────┐
│ HYBRID RETRIEVAL │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Query: "Python list comprehension syntax error" │
│ │
│ LEXICAL RETRIEVAL (BM25): │
│ Looks for exact terms: "Python", "list", "comprehension", "syntax", │
│ "error" │
│ Results: Pages with these exact words │
│ ✓ Finds: Stack Overflow posts with exact error messages │
│ ✗ Misses: Articles about "Python collection expressions" │
│ │
│ SEMANTIC RETRIEVAL (Embeddings): │
│ Understands meaning: Python programming, list operations, errors │
│ Results: Conceptually similar pages │
│ ✓ Finds: General Python tutorials on iterations │
│ ✗ Might miss: Exact error message matches │
│ │
│ HYBRID (RRF - Reciprocal Rank Fusion): │
│ Combine rankings from both methods: │
│ Score = 1/(k + rank_lexical) + 1/(k + rank_semantic) │
│ │
│ Results: Best of both worlds │
│ ✓ Finds exact error message matches │
│ ✓ Also finds conceptually relevant tutorials │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ RERANKING: │
│ After hybrid retrieval, a cross-encoder model re-scores all │
│ candidates together, considering the full query-document │
│ relationship. This is slower but more accurate. │
│ │
│ Stage 1: Fast retrieval → 1000 candidates │
│ Stage 2: Cross-encoder reranking → top 20 │
│ Stage 3: LLM selects most relevant for answer → top 5-10 │
│ │
└─────────────────────────────────────────────────────────────────────────┘
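Reciprocal Rank Fusion itself is only a few lines. Here is a minimal sketch of the fusion step, assuming each retriever has already returned a ranked list of document IDs; k = 60 is the value commonly used in the literature.

# Reciprocal Rank Fusion: merge ranked lists from lexical and semantic retrieval
def rrf(ranked_lists, k=60):
    """ranked_lists: list of rankings, each a list of doc_ids (best first)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest combined score first
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc_stackoverflow", "doc_error_codes", "doc_python_faq"]
semantic = ["doc_python_tutorial", "doc_stackoverflow", "doc_iterators"]
print(rrf([lexical, semantic]))
# doc_stackoverflow ranks first: it appears high in BOTH lists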
Part III: Head-to-Head Comparison
When Classical Search Wins
Despite the hype around AI search, traditional search engines retain significant advantages in specific scenarios:
┌─────────────────────────────────────────────────────────────────────────┐
│ WHEN TO USE CLASSICAL SEARCH │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. NAVIGATIONAL QUERIES │
│ ──────────────────────── │
│ User knows what site they want: │
│ • "facebook login" │
│ • "amazon" │
│ • "new york times" │
│ │
│ Classical search: One click to destination │
│ AI search: Unnecessary synthesis, slower │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 2. LOCAL QUERIES │
│ ──────────────── │
│ "restaurants near me" │
│ "gas stations open now" │
│ "pharmacy 10001" │
│ │
│ Google: Maps integration, reviews, hours, directions—all in one │
│ AI search: Less mature local integration │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 3. TRANSACTIONAL QUERIES │
│ ───────────────────────── │
│ "buy iPhone 15 Pro" │
│ "book flight to Paris" │
│ "cheapest hotels in Barcelona" │
│ │
│ Google: Shopping results, price comparisons, booking integrations │
│ AI search: Can summarize options but can't complete transactions │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 4. EXPLORATORY BROWSING │
│ ──────────────────────── │
│ When you want to browse, not just get answers: │
│ • Shopping for inspiration │
│ • Reading multiple perspectives on news │
│ • Discovering new websites │
│ │
│ Classical: Multiple sources to explore │
│ AI search: Single synthesized view (may miss perspectives) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 5. VISUAL/MEDIA SEARCH │
│ ─────────────────────── │
│ "cute cat pictures" │
│ "how to tie a tie video" │
│ │
│ Google: Image/video results, visual browsing │
│ AI search: Better at describing than showing │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 6. REAL-TIME STRUCTURED DATA │
│ ───────────────────────────── │
│ "Apple stock price" │
│ "weather in Tokyo" │
│ "Lakers score" │
│ │
│ Google: Direct answers from authoritative sources, instant │
│ AI search: Similar, but Google's integrations are more mature │
│ │
└─────────────────────────────────────────────────────────────────────────┘
When AI Search Wins
AI search engines excel in scenarios requiring synthesis, understanding, and conversation:
┌─────────────────────────────────────────────────────────────────────────┐
│ WHEN TO USE AI SEARCH │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. RESEARCH QUERIES │
│ ─────────────────── │
│ "Compare renewable energy policies across EU countries" │
│ "What are the pros and cons of different database architectures?" │
│ │
│ Classical: Hours clicking through multiple sources, manual synthesis │
│ AI search: Comprehensive summary with citations in 30 seconds │
│ │
│ Studies show: AI search reduces research time by up to 30% │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 2. COMPLEX QUESTIONS │
│ ──────────────────── │
│ "How does CRISPR gene editing work and what are its applications?" │
│ "Explain the causes and effects of the 2008 financial crisis" │
│ │
│ Classical: Need to read multiple articles, piece together │
│ AI search: Structured explanation with key points │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 3. FOLLOW-UP QUESTIONS │
│ ─────────────────────── │
│ Initial: "What is machine learning?" │
│ Follow-up: "How is it different from deep learning?" │
│ Follow-up: "What are some practical applications?" │
│ │
│ Classical: Each query is independent, no context │
│ AI search: Maintains conversation context, builds on previous │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 4. SYNTHESIS ACROSS SOURCES │
│ ──────────────────────────── │
│ "What do experts say about the future of electric vehicles?" │
│ "Compare reviews of the iPhone 15 Pro" │
│ │
│ Classical: Manual comparison across 10+ sources │
│ AI search: Synthesized summary of multiple perspectives │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 5. LEARNING AND EXPLANATION │
│ ──────────────────────────── │
│ "Explain quantum computing to a 10 year old" │
│ "What's the intuition behind backpropagation?" │
│ │
│ Classical: Results may be too technical or too basic │
│ AI search: Tailored explanation at requested level │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 6. TECHNICAL QUESTIONS WITH NUANCE │
│ ─────────────────────────────────── │
│ "When should I use PostgreSQL vs MongoDB?" │
│ "Best practices for error handling in async JavaScript" │
│ │
│ Classical: Generic articles, may not address your specific case │
│ AI search: Can consider context and provide tailored advice │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Quality and Accuracy Comparison
The question everyone wants answered: which is more accurate?
┌─────────────────────────────────────────────────────────────────────────┐
│ ACCURACY COMPARISON │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ RESEARCH FINDINGS (2025): │
│ ───────────────────────── │
│ │
│ Wordstream July 2025 Study: │
│ • Google AI Overviews: 26% error rate on PPC topics │
│ • Perplexity: 13% error rate on same topics │
│ │
│ This doesn't mean AI search is "better"—it means different │
│ approaches have different failure modes. │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ ERROR TYPES: │
│ ──────────── │
│ │
│ CLASSICAL SEARCH ERRORS: │
│ • Ranking errors: Best result buried on page 2 │
│ • Freshness: Outdated information ranked highly │
│ • SEO manipulation: Low-quality content ranks high │
│ • User still responsible for evaluating sources │
│ │
│ AI SEARCH ERRORS: │
│ • Hallucination: Plausible-sounding but false claims │
│ • Synthesis errors: Misunderstanding or misrepresenting sources │
│ • Citation mismatch: Citation doesn't support claim │
│ • Confidence without uncertainty: Presents uncertain info as fact │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ SOURCE QUALITY: │
│ ─────────────── │
│ │
│ Google AI Overviews: │
│ • Prefers older, established domains (49% over 15 years old) │
│ • Leans on well-known sources (Wikipedia, CNN) │
│ │
│ Perplexity: │
│ • More willing to cite niche sources (whitepapers, specialized) │
│ • Domain age mix more diverse (26% are 10-15 years old) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ KEY INSIGHT: │
│ │
│ Classical search gives you sources to evaluate yourself. │
│ AI search evaluates for you, then gives you the synthesis. │
│ │
│ If AI search is wrong, you might not know. │
│ If classical search ranks poorly, you can still find truth. │
│ │
│ → Always check citations in AI search results │
│ → Use both approaches for important research │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Part IV: The Future of Search
Market Dynamics
The search market is undergoing its biggest transformation since Google's founding:
┌─────────────────────────────────────────────────────────────────────────┐
│ MARKET TRANSFORMATION │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ CURRENT STATE (2025): │
│ ───────────────────── │
│ • Google: 89.7% market share │
│ • Bing: ~3% │
│ • Others: ~7% │
│ • Perplexity: 780M monthly queries (tiny vs Google's billions) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ USER BEHAVIOR SHIFT: │
│ ───────────────────── │
│ │
│ • 52% of US adults have used an AI LLM │
│ • 2/3 of LLM users use them "like search engines" │
│ • 98% of ChatGPT users also still use Google │
│ • Users are learning which tool to use for which queries │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ PREDICTIONS: │
│ ──────────── │
│ │
│ Gartner: Traditional search volume drops 25% by 2026 │
│ Semrush: AI search visitors surpass traditional by 2028 │
│ │
│ Reality check: Google isn't going away. They have: │
│ • Massive infrastructure advantage │
│ • Vertical integrations (Maps, Shopping, YouTube) │
│ • 25 years of data and refinement │
│ • AI Overviews with 2B monthly users already │
│ │
│ Most likely outcome: Hybrid approaches dominate │
│ │
└─────────────────────────────────────────────────────────────────────────┘
The Traffic Problem
AI search creates an existential challenge for content creators:
┌─────────────────────────────────────────────────────────────────────────┐
│ THE ZERO-CLICK PROBLEM │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ TRADITIONAL SEARCH: │
│ ─────────────────── │
│ User → Google → Click to Website → Website gets traffic & ad revenue │
│ │
│ AI SEARCH: │
│ ─────────── │
│ User → Perplexity → Gets answer → (Maybe clicks citation) │
│ │
│ IMPACT: │
│ • Google AI Overviews: 15.5% drop in organic CTR │
│ • Position 1 CTR: 34.5% drop when AI Overview present │
│ • Publishers report 1-25% traffic losses │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ THE TENSION: │
│ ──────────── │
│ │
│ AI search engines need content to synthesize. │
│ Content creators need traffic to survive. │
│ If traffic disappears, who will create content? │
│ │
│ POSSIBLE RESOLUTIONS: │
│ │
│ 1. Revenue sharing: AI engines pay publishers │
│ 2. Premium content: Paywalled content excluded from AI │
│ 3. New metrics: Citations and brand awareness over clicks │
│ 4. Licensing deals: OpenAI/Perplexity partnerships with publishers │
│ │
│ This is an unsolved problem with significant implications. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Hybrid Future
The future likely isn't AI replacing classical search—it's intelligent combination:
┌─────────────────────────────────────────────────────────────────────────┐
│ THE HYBRID FUTURE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ INTELLIGENT QUERY ROUTING: │
│ ────────────────────────── │
│ │
│ "Facebook login" │
│ → Direct navigation (classical) │
│ │
│ "restaurants near me" │
│ → Local search with maps (classical + structured data) │
│ │
│ "Compare React vs Vue for a new project" │
│ → AI synthesis with sources (AI search) │
│ │
│ "buy running shoes" │
│ → Shopping results (classical + commerce) │
│ │
│ "What caused the 2008 financial crisis?" │
│ → AI explanation with follow-ups (AI search) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ GOOGLE'S APPROACH: │
│ ───────────────── │
│ Add AI Overviews when helpful, traditional results when not. │
│ Keep vertical integrations (Maps, Shopping, YouTube). │
│ Let users choose their experience. │
│ │
│ PERPLEXITY'S APPROACH: │
│ ────────────────────── │
│ AI-first, but show sources prominently. │
│ Add features for specific verticals over time. │
│ Position as research tool, not everything tool. │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ USER RECOMMENDATION: │
│ ───────────────────── │
│ │
│ "Use Google when it works best. Switch to Perplexity when you │
│ need answers, not links. The future of search isn't one │
│ engine—it's the ability to move between them intelligently." │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Emerging Trends
Several developments will shape search's evolution:
1. Agentic Search: Search becomes action. Instead of "find me flights to Paris," you say "book me the cheapest flight to Paris next weekend." The search agent not only finds information but executes tasks.
2. Personalized AI Models: Search engines that learn your preferences, writing style, and expertise level. A query from a doctor gets different results than the same query from a patient.
3. Multimodal Search: Search with images, voice, and video as naturally as text. "Find me a dress like this" (uploads photo). "What's this plant?" (points phone camera).
4. Real-Time Knowledge: Instant integration of breaking information. Today's AI search can be minutes behind; future systems will be seconds behind or real-time.
5. Specialized Vertical AI: Domain-specific search engines with deep expertise: legal research, medical literature, scientific papers, code search. General search for general queries; specialized search for specialized needs.
Building Your Own: Key Concepts
For those building search or RAG systems, here are the key concepts to understand:
Simplified Search Implementation
# Conceptual search pipeline (simplified)
import math
import re
from collections import Counter

class SimpleSearch:
    def __init__(self):
        self.inverted_index = {}    # term -> [doc_ids]
        self.term_frequencies = {}  # doc_id -> {term: count}
        self.documents = {}         # doc_id -> content
        self.doc_lengths = {}       # doc_id -> length in tokens
        self.avg_doc_length = 0

    def tokenize(self, text):
        """Lowercase and split into alphanumeric tokens."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def index_document(self, doc_id, content):
        """Add a document to the index."""
        tokens = self.tokenize(content)
        self.documents[doc_id] = content
        self.doc_lengths[doc_id] = len(tokens)
        self.term_frequencies[doc_id] = Counter(tokens)
        for token in set(tokens):
            # Append this document to the token's posting list
            self.inverted_index.setdefault(token, []).append(doc_id)
        # Update average document length
        self.avg_doc_length = sum(self.doc_lengths.values()) / len(self.doc_lengths)

    def search(self, query, top_k=10, k1=1.2, b=0.75):
        """Search and rank documents using BM25."""
        query_tokens = self.tokenize(query)
        scores = {}
        n_docs = len(self.documents)
        for token in query_tokens:
            if token not in self.inverted_index:
                continue
            # IDF calculation (the +1 keeps scores non-negative for common terms)
            doc_freq = len(self.inverted_index[token])
            idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
            for doc_id in self.inverted_index[token]:
                # Term frequency in this document (token count, not substring count)
                tf = self.term_frequencies[doc_id][token]
                # BM25 scoring: saturating TF, normalized by document length
                doc_len = self.doc_lengths[doc_id]
                tf_component = (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / self.avg_doc_length))
                scores[doc_id] = scores.get(doc_id, 0.0) + idf * tf_component
        # Return top-k (doc_id, score) pairs by score
        ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
        return ranked[:top_k]
This simplified example shows the core concepts. Production systems add:
- Efficient data structures (tries, compressed posting lists)
- Distributed indexing across many machines
- Caching for common queries
- Real-time index updates
- Query understanding and expansion
RAG for AI Search
# Conceptual RAG pipeline (simplified)
class SimpleRAGSearch:
    def __init__(self, retriever, llm):
        self.retriever = retriever  # Search system; assumed to return docs with 'title'/'content'
        self.llm = llm              # Language model client

    def search(self, query):
        """AI search: retrieve, then generate an answer grounded in the results."""
        # Step 1: Retrieve relevant documents
        documents = self.retriever.search(query, top_k=10)

        # Step 2: Construct a prompt with numbered sources as context
        context = "\n\n".join(
            f"[{i + 1}] {doc['title']}\n{doc['content']}"
            for i, doc in enumerate(documents)
        )
        prompt = f"""Answer the question based on the provided sources.
Cite sources using [1], [2], etc.

Sources:
{context}

Question: {query}

Answer:"""

        # Step 3: Generate the answer from the retrieved context
        answer = self.llm.generate(prompt)
        return {"answer": answer, "sources": documents}
Production AI search adds:
- Multi-stage retrieval (fast recall → slow precision)
- Query understanding and routing
- Multi-model selection
- Citation verification
- Streaming responses
- Conversation context
Summary
Search has evolved from simple keyword matching to sophisticated AI systems that understand and synthesize information. Here's what we've learned:
┌─────────────────────────────────────────────────────────────────────────┐
│ KEY TAKEAWAYS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ CLASSICAL SEARCH (Google, Bing): │
│ ───────────────────────────────── │
│ • Foundation: Crawling, inverted indices, BM25 ranking │
│ • Innovation: PageRank—link structure reveals quality │
│ • Returns: Links to documents for users to evaluate │
│ • Strengths: Local, transactional, navigational queries │
│ • 25+ years of refinement, massive infrastructure │
│ │
│ AI SEARCH (Perplexity, ChatGPT Search): │
│ ──────────────────────────────────────── │
│ • Foundation: RAG at web scale │
│ • Innovation: LLM synthesis with real-time retrieval │
│ • Returns: Direct answers with citations │
│ • Strengths: Research, complex questions, follow-ups │
│ • Rapid evolution, challenging traditional search │
│ │
│ THE TRUTH: │
│ ────────── │
│ • AI search builds ON classical search, not instead of it │
│ • Different tools for different jobs │
│ • 98% of AI search users also use Google │
│ • Hybrid approaches are the future │
│ │
│ FOR BUILDERS: │
│ ───────────── │
│ • Understand inverted indices and BM25 │
│ • Understand RAG and when to use it │
│ • Hybrid retrieval (lexical + semantic) often works best │
│ • Multi-stage retrieval: fast recall, precise reranking │
│ │
└─────────────────────────────────────────────────────────────────────────┘
The search landscape is more exciting than ever. Classical algorithms like PageRank and BM25 remain relevant even as AI transforms user expectations. Understanding both eras equips you to build better search experiences and use existing tools more effectively.