
Generative AI for Recommendation Systems: LLMs Meet Personalization

A comprehensive guide to LLM-powered recommendation systems. From feature augmentation to conversational agents, understand how generative AI is transforming personalization.

9 min read

The Convergence of LLMs and RecSys

At RecSys 2025 in Prague, one trend dominated: Large Language Models and recommendation systems are converging. This isn't hype—it's a fundamental shift in how we think about personalization.

Traditional recommendation systems excel at collaborative filtering: finding patterns in user-item interactions. But they struggle with:

  • Cold start: New users and items have no interaction history
  • Explainability: Why was this recommended?
  • Natural interaction: Users want to converse, not just click
  • Semantic understanding: "I want something like that movie but more uplifting"

LLMs address all of these. They understand language, reason about preferences, and generate explanations. The question is no longer "should we use LLMs for recommendations?" but "how?"

Code
┌─────────────────────────────────────────────────────────────────────────┐
│           TRADITIONAL RECSYS vs LLM-ENHANCED RECSYS                      │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  TRADITIONAL RECSYS:                                                     │
│  ────────────────────                                                    │
│                                                                          │
│  User → [Interaction History] → Collaborative Filtering → Items         │
│                                                                          │
│  Strengths:                                                              │
│  + Fast inference (embeddings + ANN)                                    │
│  + Captures behavioral patterns                                         │
│  + Well-understood, mature                                              │
│                                                                          │
│  Weaknesses:                                                             │
│  - Cold start for new users/items                                       │
│  - No natural language understanding                                    │
│  - Black box (limited explainability)                                   │
│  - Static (can't reason about context)                                  │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  LLM-ENHANCED RECSYS:                                                    │
│                                                                          │
│  User → [Natural Language + History] → LLM Reasoning → Items           │
│                                                                          │
│  Strengths:                                                              │
│  + Handles cold start via content understanding                         │
│  + Natural conversational interface                                     │
│  + Explainable ("I recommended this because...")                        │
│  + Can reason about complex preferences                                 │
│  + Zero/few-shot adaptation to new domains                              │
│                                                                          │
│  Weaknesses:                                                             │
│  - Higher latency and cost                                              │
│  - Less precise on behavioral patterns                                  │
│  - Hallucination risks                                                  │
│  - Harder to A/B test and control                                       │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  THE WINNING APPROACH: HYBRID                                           │
│                                                                          │
│  LLMs augment traditional systems, not replace them                     │
│  • LLM for understanding, reasoning, explanation                        │
│  • Traditional models for fast retrieval, behavioral patterns           │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

2025 State of the Art: A comprehensive survey analyzing 50+ studies identifies three fundamental paradigms: Recommender-oriented (LLMs enhance recommendation mechanisms), Interaction-oriented (conversational recommendations), and Simulation-oriented (multi-agent systems modeling user-item dynamics).


Part I: The LLM-RecSys Taxonomy

Three Paradigms for LLM Integration

Research has converged on three primary ways to integrate LLMs with recommendation systems:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                    LLM-RECSYS INTEGRATION PARADIGMS                      │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  1. RECOMMENDER-ORIENTED (Enhance the Model)                            │
│  ─────────────────────────────────────────────                           │
│                                                                          │
│  LLM augments or replaces traditional recommendation components         │
│                                                                          │
│  Approaches:                                                             │
│  • Knowledge Enhancement: LLM generates item descriptions, features     │
│  • Interaction Enhancement: LLM enriches user-item signals              │
│  • Model Enhancement: LLM as scorer, ranker, or full recommender        │
│                                                                          │
│  Example: LLMRec, CoLLM, P5                                             │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  2. INTERACTION-ORIENTED (Conversational)                               │
│  ──────────────────────────────────────────                              │
│                                                                          │
│  LLM enables natural language interaction for recommendations           │
│                                                                          │
│  Approaches:                                                             │
│  • Conversational recommendation systems (CRS)                          │
│  • Explainable recommendations via dialogue                             │
│  • Preference elicitation through conversation                          │
│                                                                          │
│  Example: Chat-REC, RecLLM, InteRecAgent                                │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  3. SIMULATION-ORIENTED (Multi-Agent)                                   │
│  ───────────────────────────────────────                                 │
│                                                                          │
│  LLM-powered agents simulate users, items, and system dynamics          │
│                                                                          │
│  Approaches:                                                             │
│  • User simulation for training/evaluation                              │
│  • Item agents for dynamic pricing/availability                         │
│  • Ecosystem simulation for policy testing                              │
│                                                                          │
│  Example: RecAgent, Agent4Rec, CRAVE                                    │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Operational Distinctions

Within these paradigms, systems differ in how the LLM operates:

Model-Centric LLMRec: The LLM is fine-tuned or prompt-engineered to directly produce recommendations. Items are mapped to tokens, and the LLM generates item sequences.

Hybrid LLMRec: The LLM augments traditional models—generating features, enhancing embeddings, or providing semantic signals that feed into collaborative filtering.

Agentic LLMRec: The LLM acts as an autonomous agent, using tools (search, database queries, APIs) to gather information and make recommendations through multi-step reasoning.
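
To make the model-centric pattern concrete, here is a minimal sketch of item-to-token mapping. The vocabulary and item IDs below are hypothetical, invented purely for illustration:

Python
# Sketch of model-centric LLMRec: items become special tokens, and
# recommendation is next-token (i.e., next-item) generation.

# Hypothetical mapping from catalog item IDs to special tokens
item2token = {
    "B00X4WHP5E": "<item_101>",
    "B01N5IB20Q": "<item_102>",
    "B07FZ8S74R": "<item_103>",
}
token2item = {v: k for k, v in item2token.items()}

def history_to_prompt(item_ids: list[str]) -> str:
    """Render an interaction history as the token sequence the model was fine-tuned on."""
    tokens = " ".join(item2token[i] for i in item_ids if i in item2token)
    return f"User interacted with: {tokens} Next item:"

# A fine-tuned model would generate e.g. "<item_103>", which maps back
# to a real catalog item via token2item, with no free-text hallucination.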


Part II: Knowledge Enhancement

LLMs as Feature Generators

The simplest integration: use LLMs to generate rich features for items and users. This approach is low-risk and immediately valuable—you're not replacing your recommendation system, just making it smarter with better features.

Why LLM-generated features are powerful:

Traditional item features are either:

  • Structured metadata: Category, brand, price. Limited and requires manual curation.
  • Embeddings: Dense vectors from models trained on similar items. Good but opaque.

LLMs can generate semantic features that capture nuances humans understand but traditional systems miss:

Code
Traditional features for "Patagonia Fleece Jacket":
─────────────────────────────────────────────────────────────────────────
category: "outerwear"
brand: "patagonia"
price: 150
color: "blue"

LLM-generated features:
─────────────────────────────────────────────────────────────────────────
target_audience: "environmentally-conscious outdoor enthusiasts, 25-45"
use_cases: ["hiking", "casual everyday wear", "light camping"]
emotional_appeal: "rugged reliability, environmental responsibility"
style: "casual athletic, works with jeans or hiking pants"
similar_buyers_also_like: ["hiking boots", "wool base layers", "camping gear"]

The LLM-generated features enable recommendations that traditional systems can't make: "Users who care about sustainability might also like these eco-friendly products."

When to use LLM feature generation:

  • Cold-start items: New products with no user interaction data
  • Long-tail items: Products with sparse interaction history
  • Cross-category recommendations: Understanding that camping gear buyers might want sustainable products
  • Explanation generation: Why did we recommend this?

Cost considerations:

LLM calls are expensive, so don't make them per request. Instead:

  • Batch processing: Generate features for all items offline (see the sketch after the code below)
  • Caching: Store generated features in your feature store
  • Selective enrichment: Only use LLMs for items where traditional features are insufficient
Python
import json
import re

from anthropic import Anthropic

client = Anthropic()


def parse_json(text: str) -> dict:
    """Extract the first JSON object from an LLM response (used throughout this post)."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    return json.loads(match.group(0)) if match else {}

def generate_item_features(item: dict) -> dict:
    """
    Use LLM to generate rich semantic features for items.
    These features can augment traditional embeddings.
    """

    prompt = f"""Analyze this product and extract structured features:

Product: {item['title']}
Category: {item['category']}
Description: {item['description']}
Price: ${item['price']}

Extract:
1. Target audience (demographics, interests)
2. Use cases (when/why someone would buy this)
3. Key attributes (quality level, style, features)
4. Emotional appeal (what feelings it evokes)
5. Similar products (what else might interest this buyer)

Format as JSON."""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1000,
        messages=[{"role": "user", "content": prompt}]
    )

    return parse_json(response.content[0].text)


def generate_user_profile(user_history: list[dict]) -> dict:
    """
    Generate semantic user profile from interaction history.
    """

    history_text = "\n".join([
        f"- {item['title']} ({item['category']}) - {item['action']}"
        for item in user_history[-20:]  # Recent history
    ])

    prompt = f"""Based on this user's recent activity, create a preference profile:

Recent Activity:
{history_text}

Extract:
1. Primary interests (top 3 categories/themes)
2. Price sensitivity (budget, mid-range, premium)
3. Style preferences (if discernible)
4. Likely needs (what problems they're solving)
5. Recommendation strategy (what to show next)

Format as JSON."""

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}]
    )

    return parse_json(response.content[0].text)
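
As promised above, here is a minimal sketch of the batch-and-cache pattern, assuming a plain dict as a stand-in for a real feature store:

Python
import time

feature_store: dict[str, dict] = {}  # stand-in for a real feature store

def enrich_catalog(items: list[dict], pause_s: float = 0.1) -> None:
    """Offline batch enrichment: one LLM call per item, cached indefinitely."""
    for item in items:
        if item["id"] in feature_store:
            continue  # already enriched: never pay for the same item twice
        feature_store[item["id"]] = generate_item_features(item)
        time.sleep(pause_s)  # crude rate limiting for the API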

LLMRec: Graph Augmentation with LLMs

LLMRec (WSDM 2024) uses LLMs to augment the user-item interaction graph. This is a clever approach: instead of replacing your graph-based recommender, use LLMs to add missing edges to the graph.

The sparsity problem in recommendation graphs:

User-item interaction graphs are extremely sparse. A typical user interacts with <0.01% of items. This sparsity hurts recommendations:

  • Users with few interactions get poor recommendations (cold start)
  • Items with few interactions are never recommended (popularity bias)
  • Implicit similarities aren't captured (if no user bought both A and B, no edge exists)

LLMRec's insight: LLMs can infer missing edges

LLMs understand semantic relationships that aren't in the interaction data:

Code
User History: [Python Book, Machine Learning Course, GPU]
─────────────────────────────────────────────────────────────────────────

What the graph knows:
  User → Python Book (purchased)
  User → ML Course (enrolled)
  User → GPU (purchased)

What LLM can infer (new edges to add):
  Python Book ↔ ML Course (both for learning ML)
  GPU ↔ ML Course (GPU needed for ML training)
  User → "Data Science Tools" (implicit interest cluster)

Three types of augmentation LLMRec performs:

  1. User profile augmentation: Generate textual profile from interaction history, embed it as a new node connected to the user

  2. Item relationship augmentation: Ask LLM to identify semantically related items, add edges between them

  3. Interaction reasoning: For each user-item pair, generate explanation of why this interaction happened, use explanation embedding to enrich the edge

Why this works better than just using LLM embeddings:

  • Preserves graph structure: GNN-based recommenders rely on graph topology. Adding edges improves message passing.
  • Cheaper than inference-time LLM: Augmentation is done once offline. Inference uses fast GNN.
  • Combines strengths: LLM semantic understanding + GNN collaborative filtering

Implementation pattern:

Code
OFFLINE PIPELINE:
1. For each item: LLM generates "related items" → add item-item edges
2. For each user: LLM generates "interest summary" → add user profile node
3. Retrain GNN on augmented graph

ONLINE INFERENCE:
Same as before—fast GNN inference, no LLM calls
Python
class LLMRecAugmenter:
    """
    LLMRec-style graph augmentation.
    Uses LLM to generate synthetic interactions and enrich item features.
    """

    def __init__(self, llm_client, item_catalog: dict):
        self.llm = llm_client
        self.items = item_catalog

    def augment_item_graph(self, item_id: str) -> list[tuple[str, float]]:
        """
        Generate synthetic item-item relationships via LLM reasoning.
        """

        item = self.items[item_id]

        prompt = f"""Given this item:
Title: {item['title']}
Category: {item['category']}
Description: {item['description'][:200]}

List 5 items that would strongly appeal to the same customer.
For each, explain why and rate confidence (0-1).

Format:
1. [Item type/category] - [Reason] - [Confidence]"""

        response = self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )

        # Parse and match to actual catalog items
        synthetic_edges = self._match_to_catalog(response.content[0].text)
        return synthetic_edges

    def generate_user_augmentation(
        self,
        user_history: list[str],
        num_synthetic: int = 5
    ) -> list[str]:
        """
        Generate synthetic interactions for sparse users.
        Helps with cold start.
        """

        history_items = [self.items[i] for i in user_history if i in self.items]

        prompt = f"""A user has interacted with these items:
{self._format_items(history_items)}

Based on these preferences, what other items would they likely enjoy?
List {num_synthetic} item types/categories with confidence scores.
Focus on items that reveal underlying preferences (not obvious similar items)."""

        response = self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=300,
            messages=[{"role": "user", "content": prompt}]
        )

        synthetic_items = self._match_to_catalog(response.content[0].text)
        return synthetic_items

Part III: LLMs as Recommenders

Direct Recommendation via Prompting

The most direct approach: ask the LLM to recommend items. This sounds simple, but doing it well requires understanding LLM limitations and designing around them.

Why direct prompting is appealing:

  • Zero training: No ML infrastructure needed. Just prompt.
  • Rich reasoning: LLM can explain why each recommendation fits.
  • Context awareness: Can incorporate real-time context ("I'm shopping for a gift for my mom").
  • Language understanding: Handles natural language queries that keyword search can't.

Why direct prompting is dangerous:

The LLM doesn't actually know your catalog. It hallucinates:

Code
User: "Recommend running shoes under $100"
─────────────────────────────────────────────────────────────────────────

LLM response (without grounding):
  "I recommend the Nike Air Zoom Pegasus 38..."

Problems:
  ❌ That shoe might cost $130 in your store
  ❌ You might not carry Nike at all
  ❌ The "Pegasus 38" might be discontinued
  ❌ LLM might invent products that don't exist

The solution: Retrieval-Augmented Recommendation

Never let the LLM recommend from its imagination. Always:

  1. Use traditional retrieval to get candidate items from YOUR catalog
  2. Provide those candidates in the prompt
  3. Ask LLM to rank/select from the provided candidates only
Code
Correct pattern:
─────────────────────────────────────────────────────────────────────────
1. Embedding search: "running shoes" → 100 candidates from your catalog
2. Filter: price < $100 → 40 candidates
3. Prompt LLM: "From these 40 shoes, which 5 best match [user history]?"
4. LLM returns IDs from your candidate list (can't hallucinate)

Why candidate pre-filtering is essential:

LLMs can't efficiently process millions of items. Context windows are limited: even a 200K-token window holds only a few hundred detailed product descriptions at a few hundred tokens each. Pre-filter to 50-200 candidates using fast traditional methods, then use the LLM for intelligent ranking.

When to use direct LLM recommendation:

  • Conversational commerce: User is chatting, asking questions
  • Complex queries: "Something for a dinner party with vegetarians"
  • Explanation-heavy: When users want to know WHY this recommendation
  • Low-volume, high-value: B2B sales, luxury goods where personalization matters

When NOT to use:

  • High-volume feeds: Homepage recommendations (too slow, too expensive)
  • Latency-sensitive: Search results where 100ms matters
  • Simple queries: "Show me popular laptops" (traditional RecSys is faster/cheaper)
Python
class LLMRecommender:
    """
    LLM as direct recommender via in-context learning.
    """

    def __init__(self, llm_client, item_catalog: list[dict]):
        self.llm = llm_client
        self.items = item_catalog
        self.item_index = {item['id']: item for item in item_catalog}

    def recommend(
        self,
        user_history: list[str],
        context: str = None,
        num_recommendations: int = 10,
    ) -> list[dict]:
        """
        Generate recommendations via LLM reasoning.
        """

        # Format user history
        history_text = self._format_history(user_history)

        # Format candidate items (subset for efficiency)
        candidates = self._get_candidates(user_history, n=100)
        candidates_text = self._format_candidates(candidates)

        prompt = f"""You are a recommendation system. Based on the user's history,
recommend items they would enjoy.

## User History (most recent first):
{history_text}

{f"## Current Context: {context}" if context else ""}

## Available Items:
{candidates_text}

## Task:
Select the {num_recommendations} best items for this user.
For each, explain why it matches their preferences.

Format:
1. [Item ID] - [Title] - [Reason]
2. ..."""

        response = self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1000,
            messages=[{"role": "user", "content": prompt}]
        )

        recommendations = self._parse_recommendations(response.content[0].text)
        return recommendations

    def _get_candidates(self, user_history: list[str], n: int) -> list[dict]:
        """
        Pre-filter candidates using traditional retrieval.
        LLM can't efficiently search millions of items.
        """
        # Placeholder: in production, use embedding similarity, popularity,
        # or collaborative filtering to build the candidate set.
        seen = set(user_history)
        return [item for item in self.items if item["id"] not in seen][:n]

P5: Pretrain, Prompt, and Predict

P5 frames multiple recommendation tasks as text generation:

Python
class P5Recommender:
    """
    P5-style unified recommendation via text generation.
    All tasks formulated as sequence-to-sequence.
    """

    # Task templates
    TEMPLATES = {
        "sequential": (
            "User {user_id} has purchased {history}. "
            "What will they purchase next?"
        ),
        "rating": (
            "How will user {user_id} rate {item}? "
            "User's previous ratings: {history}"
        ),
        "explanation": (
            "User {user_id} purchased {item}. "
            "Explain why based on their history: {history}"
        ),
        "search": (
            "User {user_id} searched for '{query}'. "
            "Given their history {history}, recommend items."
        ),
    }

    def __init__(self, model_name: str = "google/flan-t5-xl"):
        from transformers import T5ForConditionalGeneration, T5Tokenizer

        self.tokenizer = T5Tokenizer.from_pretrained(model_name)
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)

    def recommend_next(self, user_id: str, history: list[str]) -> str:
        """Sequential recommendation via text generation."""

        prompt = self.TEMPLATES["sequential"].format(
            user_id=user_id,
            history=", ".join(history[-10:])
        )

        inputs = self.tokenizer(prompt, return_tensors="pt")
        outputs = self.model.generate(**inputs, max_length=50)

        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)

    def explain_recommendation(
        self,
        user_id: str,
        item: str,
        history: list[str]
    ) -> str:
        """Generate explanation for a recommendation."""

        prompt = self.TEMPLATES["explanation"].format(
            user_id=user_id,
            item=item,
            history=", ".join(history[-10:])
        )

        inputs = self.tokenizer(prompt, return_tensors="pt")
        outputs = self.model.generate(**inputs, max_length=200)

        return self.tokenizer.decode(outputs[0], skip_special_tokens=True)
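
A usage sketch. Note that an off-the-shelf Flan-T5 checkpoint has not been trained on P5's templates, so outputs are free text that must still be grounded against your catalog:

Python
p5 = P5Recommender(model_name="google/flan-t5-small")  # small checkpoint for cheap testing

next_item = p5.recommend_next(
    user_id="u42",
    history=["Python Crash Course", "Hands-On Machine Learning", "RTX 4090 GPU"],
)
print(next_item)  # free-text prediction; match it back to real catalog items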

CoLLM: Collaborative Embeddings in LLMs

CoLLM (TKDE 2025) integrates collaborative filtering embeddings directly into the LLM:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         CoLLM ARCHITECTURE                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  TRADITIONAL LLM RECOMMENDATION:                                        │
│  ─────────────────────────────────                                       │
│                                                                          │
│  "User liked: iPhone, MacBook, AirPods" → LLM → "Recommend: iPad"      │
│                                                                          │
│  Problem: LLM only sees text, not collaborative signals                 │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  CoLLM APPROACH:                                                         │
│  ────────────────                                                        │
│                                                                          │
│  1. Train collaborative filtering model (e.g., matrix factorization)   │
│     → User embeddings U, Item embeddings V                              │
│                                                                          │
│  2. Map CF embeddings to LLM token space                                │
│     CF embedding → Projection → "Soft tokens" in LLM vocabulary        │
│                                                                          │
│  3. Inject soft tokens into LLM prompt                                  │
│     "[USER_EMB] liked: iPhone, MacBook. Recommend: [ITEM_EMB]?"        │
│                                                                          │
│  Benefits:                                                               │
│  + LLM sees collaborative signals (who else liked these items)         │
│  + Combines semantic understanding with behavioral patterns             │
│  + Can be fine-tuned end-to-end                                         │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
Python
import torch
import torch.nn as nn

class CoLLM(nn.Module):
    """
    Collaborative LLM: Inject CF embeddings into LLM.
    """

    def __init__(
        self,
        llm_model,  # Pre-trained LLM
        cf_user_embeddings: torch.Tensor,  # (num_users, cf_dim)
        cf_item_embeddings: torch.Tensor,  # (num_items, cf_dim)
        llm_dim: int = 4096,
        cf_dim: int = 64,
    ):
        super().__init__()
        self.llm = llm_model

        # Store CF embeddings
        self.user_cf = nn.Embedding.from_pretrained(cf_user_embeddings, freeze=False)
        self.item_cf = nn.Embedding.from_pretrained(cf_item_embeddings, freeze=False)

        # Project CF embeddings to LLM hidden dimension
        self.user_proj = nn.Sequential(
            nn.Linear(cf_dim, llm_dim),
            nn.LayerNorm(llm_dim),
        )
        self.item_proj = nn.Sequential(
            nn.Linear(cf_dim, llm_dim),
            nn.LayerNorm(llm_dim),
        )

    def forward(
        self,
        input_ids: torch.Tensor,
        user_ids: torch.Tensor,
        item_ids: torch.Tensor = None,
    ):
        """
        Forward pass with collaborative embedding injection.
        """

        # Get LLM input embeddings
        input_embeds = self.llm.get_input_embeddings()(input_ids)

        # Get collaborative embeddings
        user_cf_emb = self.user_proj(self.user_cf(user_ids))  # (B, llm_dim)
        user_cf_emb = user_cf_emb.unsqueeze(1)  # (B, 1, llm_dim)

        # Prepend user collaborative embedding as soft token
        input_embeds = torch.cat([user_cf_emb, input_embeds], dim=1)

        # If item_ids provided (for scoring), append item embedding
        if item_ids is not None:
            item_cf_emb = self.item_proj(self.item_cf(item_ids))
            item_cf_emb = item_cf_emb.unsqueeze(1)
            input_embeds = torch.cat([input_embeds, item_cf_emb], dim=1)

        # Forward through LLM
        outputs = self.llm(inputs_embeds=input_embeds)

        return outputs
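
A hypothetical smoke test, using GPT-2 as the backbone (hidden dimension 768) and random tensors in place of trained CF embeddings, just to show the shapes involved:

Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

llm = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

model = CoLLM(
    llm_model=llm,
    cf_user_embeddings=torch.randn(1000, 64),  # 1,000 users, untrained
    cf_item_embeddings=torch.randn(5000, 64),  # 5,000 items, untrained
    llm_dim=768,   # GPT-2 hidden size
    cf_dim=64,
)

inputs = tokenizer("liked: iPhone, MacBook. Recommend:", return_tensors="pt")
out = model(
    input_ids=inputs["input_ids"],
    user_ids=torch.tensor([42]),
    item_ids=torch.tensor([17]),
)
print(out.logits.shape)  # (1, seq_len + 2 soft tokens, vocab_size)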

Part IV: Conversational Recommendation

Chat-REC: Interactive LLM Recommendations

Chat-REC enables multi-turn conversational recommendations:

Python
class ChatREC:
    """
    Conversational Recommendation System using LLM.
    Supports multi-turn dialogue for preference elicitation.
    """

    def __init__(self, llm_client, retriever, item_catalog):
        self.llm = llm_client
        self.retriever = retriever  # Traditional RecSys for candidates
        self.items = item_catalog

    def chat(
        self,
        user_message: str,
        conversation_history: list[dict],
        user_profile: dict,
    ) -> dict:
        """
        Process user message and generate response with recommendations.
        """

        # Classify intent
        intent = self._classify_intent(user_message, conversation_history)

        if intent == "ask_recommendation":
            return self._handle_recommendation_request(
                user_message, conversation_history, user_profile
            )
        elif intent == "provide_feedback":
            return self._handle_feedback(
                user_message, conversation_history, user_profile
            )
        elif intent == "ask_explanation":
            return self._handle_explanation_request(
                user_message, conversation_history
            )
        elif intent == "refine_preferences":
            return self._handle_preference_refinement(
                user_message, conversation_history, user_profile
            )
        else:
            return self._handle_general_query(
                user_message, conversation_history
            )

    def _classify_intent(
        self,
        message: str,
        history: list[dict]
    ) -> str:
        """Classify user intent for routing."""

        prompt = f"""Classify the user's intent in this conversation:

Conversation:
{self._format_history(history[-5:])}

User: {message}

Intent categories:
- ask_recommendation: User wants item suggestions
- provide_feedback: User gives opinion on suggested items
- ask_explanation: User wants to know why something was recommended
- refine_preferences: User clarifying or updating preferences
- general: Other queries

Reply with just the intent category."""

        response = self.llm.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=20,
            messages=[{"role": "user", "content": prompt}]
        )

        return response.content[0].text.strip().lower()

    def _handle_recommendation_request(
        self,
        message: str,
        history: list[dict],
        profile: dict,
    ) -> dict:
        """Generate recommendations based on conversation."""

        # Extract preferences from conversation
        preferences = self._extract_preferences(message, history)

        # Get candidates via traditional retrieval
        candidates = self.retriever.retrieve(
            user_profile=profile,
            preferences=preferences,
            n=50
        )

        # LLM selects and explains best matches
        prompt = f"""Based on this conversation, recommend items:

Conversation:
{self._format_history(history[-5:])}

User's current request: {message}

Extracted preferences: {preferences}

Available items:
{self._format_items(candidates[:20])}

Select the 5 best items and explain why each matches the user's needs.
Be conversational and helpful."""

        response = self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=800,
            messages=[{"role": "user", "content": prompt}]
        )

        recommendations = self._parse_recommendations(response.content[0].text)

        return {
            "response": response.content[0].text,
            "recommendations": recommendations,
            "intent": "ask_recommendation",
        }

    def _extract_preferences(
        self,
        message: str,
        history: list[dict]
    ) -> dict:
        """Extract structured preferences from conversation."""

        prompt = f"""Extract user preferences from this conversation:

Conversation:
{self._format_history(history)}

Current message: {message}

Extract:
- Category/type preferences
- Price range
- Specific features wanted
- Features to avoid
- Style/aesthetic preferences
- Use case/occasion

Format as JSON."""

        response = self.llm.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=300,
            messages=[{"role": "user", "content": prompt}]
        )

        return parse_json(response.content[0].text)
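
Wiring it up looks like this; `my_retriever` and `catalog` are hypothetical stand-ins for your own retrieval service and item data:

Python
chat = ChatREC(llm_client=client, retriever=my_retriever, item_catalog=catalog)

turn = chat.chat(
    user_message="I need a warm jacket for spring hikes, under $200",
    conversation_history=[],
    user_profile={"preferences": {"style": "outdoor casual"}},
)
print(turn["response"])         # conversational reply with picks
print(turn["recommendations"])  # structured items for the UI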

Proactive Preference Elicitation

The best conversational systems don't just respond—they proactively ask questions to understand preferences:

Python
class ProactiveRecommender:
    """
    Proactively elicits preferences through strategic questions.
    """

    def __init__(self, llm_client, item_catalog):
        self.llm = llm_client
        self.items = item_catalog

    def generate_clarifying_question(
        self,
        user_query: str,
        known_preferences: dict,
        candidate_items: list[dict],
    ) -> str:
        """
        Generate a clarifying question to narrow down recommendations.
        """

        # Identify dimensions with high variance in candidates
        differentiating_dims = self._find_differentiating_dimensions(
            candidate_items, known_preferences
        )

        prompt = f"""The user asked: "{user_query}"

We know these preferences: {known_preferences}

We have {len(candidate_items)} potential matches, varying mainly in:
{differentiating_dims}

Generate ONE clarifying question that would most help narrow down
the recommendations. Make it natural and conversational.

Don't ask about preferences we already know.
Focus on the most impactful differentiator."""

        response = self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=100,
            messages=[{"role": "user", "content": prompt}]
        )

        return response.content[0].text

    def should_ask_question(
        self,
        candidates: list[dict],
        confidence_threshold: float = 0.7
    ) -> bool:
        """
        Decide whether to ask a clarifying question or recommend.
        """

        # If top candidates are very similar, we're confident
        # If they're diverse, we should clarify

        diversity = self._compute_diversity(candidates[:10])

        return diversity > confidence_threshold
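
The `_compute_diversity` helper is left undefined above. A minimal sketch of one reasonable implementation, assuming each candidate dict carries an `embedding` vector: average pairwise cosine distance, where 0 means the candidates are interchangeable and higher values mean a clarifying question is worthwhile:

Python
import numpy as np

def compute_diversity(items: list[dict]) -> float:
    """Mean pairwise cosine distance between item embeddings (0 = identical items)."""
    if len(items) < 2:
        return 0.0
    vecs = np.array([item["embedding"] for item in items], dtype=float)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T
    n = len(items)
    mean_sim = (sims.sum() - n) / (n * (n - 1))  # off-diagonal average
    return 1.0 - mean_sim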

Part V: Agentic Recommendations

LLM Agents for Recommendations

The most sophisticated approach: LLMs as autonomous agents that use tools to gather information and make decisions.

Python
from typing import Callable

class RecommendationAgent:
    """
    LLM-powered recommendation agent with tool use.
    """

    def __init__(self, llm_client, tools: dict[str, Callable]):
        self.llm = llm_client
        self.tools = tools

    def recommend(
        self,
        user_request: str,
        user_context: dict,
        max_steps: int = 10,
    ) -> dict:
        """
        Multi-step recommendation via agent reasoning.
        """

        messages = [{
            "role": "user",
            "content": f"""You are a recommendation agent. Help the user find what they need.

User request: {user_request}

User context:
- Previous purchases: {user_context.get('purchase_history', [])}
- Browsing history: {user_context.get('browsing_history', [])}
- Preferences: {user_context.get('preferences', {})}

Available tools:
- search_catalog(query): Search items by text query
- get_item_details(item_id): Get detailed information about an item
- get_similar_items(item_id): Find items similar to a given item
- get_user_history(user_id): Get user's full interaction history
- get_trending_items(category): Get trending items in a category
- check_availability(item_id): Check stock and delivery options

Think step by step. Use tools to gather information, then make recommendations."""
        }]

        for step in range(max_steps):
            response = self.llm.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=1000,
                messages=messages,
                tools=self._format_tools(),
            )

            # Check if agent wants to use a tool
            if response.stop_reason == "tool_use":
                tool_use = response.content[-1]
                tool_name = tool_use.name
                tool_input = tool_use.input

                # Execute tool
                tool_result = self.tools[tool_name](**tool_input)

                # Add to conversation
                messages.append({"role": "assistant", "content": response.content})
                messages.append({
                    "role": "user",
                    "content": [{
                        "type": "tool_result",
                        "tool_use_id": tool_use.id,
                        "content": str(tool_result)
                    }]
                })
            else:
                # Agent is done, return final response
                return {
                    "response": response.content[0].text,
                    "steps": step + 1,
                    "messages": messages,
                }

        return {"response": "Max steps reached", "steps": max_steps}

    def _format_tools(self) -> list[dict]:
        """Format tools for Claude API."""
        return [
            {
                "name": "search_catalog",
                "description": "Search the product catalog by text query",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string", "description": "Search query"}
                    },
                    "required": ["query"]
                }
            },
            {
                "name": "get_item_details",
                "description": "Get detailed information about a specific item",
                "input_schema": {
                    "type": "object",
                    "properties": {
                        "item_id": {"type": "string", "description": "Item ID"}
                    },
                    "required": ["item_id"]
                }
            },
            # ... more tools
        ]
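
A hypothetical wiring example, with stub tools standing in for real catalog services:

Python
def search_catalog(query: str) -> list[dict]:
    # Stub: a real implementation would query your search index
    return [{"id": "sku-1", "title": f"Result for {query}", "price": 89.0}]

def get_item_details(item_id: str) -> dict:
    # Stub: a real implementation would hit your item database
    return {"id": item_id, "title": "Example item", "in_stock": True}

agent = RecommendationAgent(
    llm_client=client,  # Anthropic client from earlier examples
    tools={"search_catalog": search_catalog, "get_item_details": get_item_details},
)

result = agent.recommend(
    user_request="Running shoes under $100 for marathon training",
    user_context={"purchase_history": ["trail shoes"], "preferences": {}},
)
print(result["response"])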

Multi-Agent Recommendation Systems

RecAgent and Agent4Rec use multiple specialized agents:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                MULTI-AGENT RECOMMENDATION ARCHITECTURE                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│                         ┌─────────────────┐                             │
│                         │  ORCHESTRATOR   │                             │
│                         │     AGENT       │                             │
│                         └────────┬────────┘                             │
│                                  │                                       │
│              ┌───────────────────┼───────────────────┐                  │
│              │                   │                   │                  │
│              ▼                   ▼                   ▼                  │
│     ┌────────────────┐  ┌────────────────┐  ┌────────────────┐        │
│     │   RETRIEVAL    │  │    RANKING     │  │  EXPLANATION   │        │
│     │     AGENT      │  │     AGENT      │  │     AGENT      │        │
│     └────────────────┘  └────────────────┘  └────────────────┘        │
│              │                   │                   │                  │
│     - Search catalog    - Score relevance   - Generate reasons        │
│     - Filter by rules   - Apply preferences - Answer questions        │
│     - Get candidates    - Re-rank results   - Justify choices         │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  COMMUNICATION FLOW:                                                     │
│                                                                          │
│  1. User: "I need running shoes for marathon training"                  │
│                                                                          │
│  2. Orchestrator → Retrieval: "Search for marathon running shoes"      │
│     Retrieval → Orchestrator: [100 candidate shoes]                    │
│                                                                          │
│  3. Orchestrator → Ranking: "Rank for marathon training"               │
│     Ranking → Orchestrator: [Top 10 ranked shoes]                      │
│                                                                          │
│  4. Orchestrator → Explanation: "Explain top 3 picks"                  │
│     Explanation → Orchestrator: [Detailed explanations]                │
│                                                                          │
│  5. Orchestrator → User: Final recommendations with explanations       │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
Python
class MultiAgentRecommender:
    """
    Multi-agent system for recommendations.
    Specialized agents for different tasks.
    """

    def __init__(self, llm_client, item_catalog, user_db):
        self.llm = llm_client
        self.items = item_catalog
        self.users = user_db

        # Specialized agents
        self.agents = {
            "retrieval": RetrievalAgent(llm_client, item_catalog),
            "ranking": RankingAgent(llm_client),
            "explanation": ExplanationAgent(llm_client),
            "personalization": PersonalizationAgent(llm_client, user_db),
        }

    async def recommend(
        self,
        user_id: str,
        query: str,
    ) -> dict:
        """
        Coordinate agents to generate recommendations.
        """

        # Step 1: Understand user context
        user_profile = await self.agents["personalization"].get_profile(user_id)

        # Step 2: Retrieve candidates
        candidates = await self.agents["retrieval"].retrieve(
            query=query,
            user_preferences=user_profile["preferences"],
            n=100
        )

        # Step 3: Rank candidates (retrieve() returns a dict; pass the list)
        ranked = await self.agents["ranking"].rank(
            candidates=candidates["candidates"],
            user_profile=user_profile,
            query=query,
        )

        # Step 4: Generate explanations
        explained = await self.agents["explanation"].explain(
            items=ranked[:10],
            user_profile=user_profile,
            query=query,
        )

        return {
            "recommendations": explained,
            "query_understanding": candidates["query_analysis"],
            "personalization": user_profile["summary"],
        }


class RetrievalAgent:
    """Agent specialized in candidate retrieval."""

    def __init__(self, llm_client, item_catalog):
        self.llm = llm_client
        self.items = item_catalog
        self.vector_store = self._build_vector_store(item_catalog)

    async def retrieve(
        self,
        query: str,
        user_preferences: dict,
        n: int = 100
    ) -> dict:
        """
        Retrieve candidates using multiple strategies.
        """

        # LLM analyzes query
        query_analysis = await self._analyze_query(query)

        # Multiple retrieval strategies
        semantic_results = self.vector_store.search(query, k=n)
        category_results = self._category_filter(query_analysis["categories"])
        attribute_results = self._attribute_filter(query_analysis["attributes"])

        # LLM merges and deduplicates
        merged = await self._merge_results(
            semantic_results,
            category_results,
            attribute_results,
            user_preferences,
        )

        return {
            "candidates": merged[:n],
            "query_analysis": query_analysis,
        }


class ExplanationAgent:
    """Agent specialized in generating explanations."""

    def __init__(self, llm_client):
        self.llm = llm_client

    async def explain(
        self,
        items: list[dict],
        user_profile: dict,
        query: str,
    ) -> list[dict]:
        """
        Generate personalized explanations for recommendations.
        """

        explained_items = []

        for item in items:
            explanation = await self._generate_explanation(
                item, user_profile, query
            )

            explained_items.append({
                **item,
                "explanation": explanation["short"],
                "detailed_explanation": explanation["detailed"],
                "match_reasons": explanation["reasons"],
            })

        return explained_items

    async def _generate_explanation(
        self,
        item: dict,
        user_profile: dict,
        query: str,
    ) -> dict:
        """Generate explanation for single item."""

        prompt = f"""Explain why this item is recommended:

Item: {item['title']}
Category: {item['category']}
Features: {item['features']}
Price: ${item['price']}

User query: {query}
User preferences: {user_profile['preferences']}
User history themes: {user_profile['themes']}

Generate:
1. Short explanation (1 sentence)
2. Detailed explanation (2-3 sentences)
3. List of specific match reasons

Format as JSON."""

        response = self.llm.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=300,
            messages=[{"role": "user", "content": prompt}]
        )

        return parse_json(response.content[0].text)

Part VI: User Simulation for Evaluation

Synthetic Users via LLMs

LLMs can simulate user behavior for testing and evaluation:

Python
import json
import random

class LLMUserSimulator:
    """
    Simulate user behavior for recommendation evaluation.
    """

    def __init__(self, llm_client):
        self.llm = llm_client

    def create_persona(self, persona_description: str) -> dict:
        """Create a detailed user persona."""

        prompt = f"""Create a detailed user persona for recommendation testing:

Description: {persona_description}

Generate:
1. Demographics (age, location, occupation)
2. Interests and hobbies
3. Shopping preferences (price sensitivity, brand loyalty)
4. Past purchase patterns
5. Decision-making style
6. Common objections/concerns

Format as JSON."""

        response = self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )

        return parse_json(response.content[0].text)

    def simulate_response(
        self,
        persona: dict,
        recommendations: list[dict],
        context: str = None,
    ) -> dict:
        """
        Simulate how this persona would respond to recommendations.
        """

        prompt = f"""You are simulating this user persona:
{json.dumps(persona, indent=2)}

They received these recommendations:
{self._format_recommendations(recommendations)}

{f"Context: {context}" if context else ""}

Simulate their response:
1. Which items would they click on? Why?
2. Which would they ignore? Why?
3. What would they say about the recommendations?
4. Would they convert (purchase)? Which item?
5. What's missing that they would want?

Be consistent with the persona's characteristics.
Format as JSON."""

        response = self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=500,
            messages=[{"role": "user", "content": prompt}]
        )

        return parse_json(response.content[0].text)

    def generate_interaction_trajectory(
        self,
        persona: dict,
        item_catalog: list[dict],
        num_interactions: int = 20,
    ) -> list[dict]:
        """
        Generate a realistic interaction sequence for a persona.
        Useful for creating synthetic training data.
        """

        trajectory = []
        browsing_context = []

        for i in range(num_interactions):
            prompt = f"""User persona:
{json.dumps(persona, indent=2)}

Previous interactions in this session:
{self._format_trajectory(trajectory[-5:])}

Available items (sample):
{self._format_items(random.sample(item_catalog, 20))}

What would this user do next?
- Browse a category?
- Search for something?
- Click on an item?
- Add to cart?
- Purchase?
- Leave?

Consider: time in session, previous actions, persona preferences.
Format: {{"action": "...", "item_id": "...", "reason": "..."}}"""

            response = self.llm.messages.create(
                model="claude-3-5-haiku-20241022",
                max_tokens=150,
                messages=[{"role": "user", "content": prompt}]
            )

            action = parse_json(response.content[0].text)
            trajectory.append(action)

            if action["action"] == "leave":
                break

        return trajectory
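
Typical usage: build a persona once, then replay recommendation lists against it. Here `top_recs` is a hypothetical list from whatever system you're evaluating:

Python
sim = LLMUserSimulator(llm_client=client)

persona = sim.create_persona(
    "budget-conscious new parent in a small city apartment"
)
reaction = sim.simulate_response(persona, recommendations=top_recs)

print(reaction)  # e.g. which items the persona would click, ignore, or buy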

CRAVE: Collaborative Verbalized Experience

CRAVE (Best Paper at GenAIRecP 2025) uses agent experiences to improve recommendations:

Python
from datetime import datetime

class CRAVESystem:
    """
    CRAVE: Collaborative Verbalized Experience for Recommendations.
    Agents learn from each other's experiences.
    """

    def __init__(self, llm_client):
        self.llm = llm_client
        self.experience_bank = []  # Stored experiences

    def collect_experience(
        self,
        user_query: str,
        recommendations: list[dict],
        user_feedback: dict,
        agent_reasoning: str,
    ):
        """
        Store verbalized experience from an interaction.
        """

        # Verbalize the experience
        experience = self._verbalize_experience(
            user_query, recommendations, user_feedback, agent_reasoning
        )

        self.experience_bank.append(experience)

    def _verbalize_experience(
        self,
        query: str,
        recommendations: list[dict],
        feedback: dict,
        reasoning: str,
    ) -> dict:
        """Convert interaction to verbalized experience."""

        prompt = f"""Summarize this recommendation interaction as a learning experience:

User query: {query}

Agent reasoning: {reasoning}

Recommendations made:
{self._format_recommendations(recommendations)}

User feedback:
- Clicked: {feedback.get('clicked', [])}
- Purchased: {feedback.get('purchased', [])}
- Dismissed: {feedback.get('dismissed', [])}
- Comments: {feedback.get('comments', '')}

Create a verbalized experience that captures:
1. What worked well
2. What could be improved
3. Key insight for similar future queries

Format as a concise lesson (2-3 sentences)."""

        response = self.llm.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=200,
            messages=[{"role": "user", "content": prompt}]
        )

        return {
            "query_type": self._classify_query(query),
            "lesson": response.content[0].text,
            "success_rate": len(feedback.get('purchased', [])) / len(recommendations),
            "timestamp": datetime.now().isoformat(),
        }

    def retrieve_relevant_experiences(
        self,
        current_query: str,
        n: int = 5
    ) -> list[dict]:
        """
        Find experiences relevant to current query.
        """

        # Embed and search (simplified)
        query_type = self._classify_query(current_query)

        relevant = [
            exp for exp in self.experience_bank
            if exp["query_type"] == query_type
        ]

        # Sort by success rate and recency
        relevant.sort(
            key=lambda x: (x["success_rate"], x["timestamp"]),
            reverse=True
        )

        return relevant[:n]

    def recommend_with_experience(
        self,
        query: str,
        candidates: list[dict],
        user_profile: dict,
    ) -> list[dict]:
        """
        Make recommendations informed by past experiences.
        """

        experiences = self.retrieve_relevant_experiences(query)

        prompt = f"""Make recommendations based on query and past learnings.

User query: {query}
User profile: {user_profile}

Lessons from similar past queries:
{self._format_experiences(experiences)}

Candidate items:
{self._format_items(candidates[:20])}

Apply the lessons learned to select and rank the best items.
Explain how past experiences informed your choices."""

        response = self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=800,
            messages=[{"role": "user", "content": prompt}]
        )

        return self._parse_recommendations(response.content[0].text)
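
An end-to-end usage sketch of the collect-then-reuse loop; `candidate_items` is a hypothetical pre-retrieved list:

Python
crave = CRAVESystem(llm_client=client)

# After serving a request, store what happened
crave.collect_experience(
    user_query="gift for a coffee lover",
    recommendations=[{"id": "sku-9", "title": "Pour-over kit"}],
    user_feedback={"clicked": ["sku-9"], "purchased": ["sku-9"]},
    agent_reasoning="Prioritized brewing equipment over consumables.",
)

# Later, similar queries are answered with the stored lesson in-context
recs = crave.recommend_with_experience(
    query="present for a tea enthusiast",
    candidates=candidate_items,
    user_profile={"preferences": "artisanal kitchenware"},
)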

Part VII: Production Considerations

Latency and Cost Management

LLMs are expensive and slow compared to traditional RecSys:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                    LATENCY & COST COMPARISON                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  TRADITIONAL RECSYS:                                                     │
│  • Embedding lookup: ~1ms                                               │
│  • ANN retrieval: ~5ms                                                  │
│  • Ranking model: ~10ms                                                 │
│  • Total: ~20ms                                                         │
│  • Cost: ~$0.0001 per request                                          │
│                                                                          │
│  LLM-BASED RECSYS:                                                       │
│  • LLM API call: 500-2000ms                                             │
│  • Multiple calls (agent): 2000-10000ms                                 │
│  • Total: 1-10 seconds                                                  │
│  • Cost: $0.01-0.10 per request                                        │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  MITIGATION STRATEGIES:                                                  │
│                                                                          │
│  1. HYBRID ARCHITECTURE                                                  │
│     Traditional model for fast retrieval + LLM for explanation         │
│     LLM only for complex queries or high-value users                   │
│                                                                          │
│  2. CACHING                                                              │
│     Cache LLM responses for similar queries                             │
│     Pre-compute explanations for popular items                          │
│     Semantic caching (similar queries → cached response)               │
│                                                                          │
│  3. SMALLER MODELS                                                       │
│     Use Haiku/small models for simple tasks                            │
│     Reserve large models for complex reasoning                          │
│                                                                          │
│  4. ASYNC PROCESSING                                                     │
│     Show fast traditional recs immediately                              │
│     Enhance with LLM explanations async                                 │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
Python
class HybridRecommender:
    """
    Hybrid system: fast traditional + smart LLM.
    """

    def __init__(
        self,
        traditional_model,
        llm_client,
        cache,
        llm_threshold: float = 0.7,  # score threshold for LLM routing (unused in this simplified sketch)
    ):
        self.traditional = traditional_model
        self.llm = llm_client
        self.cache = cache
        self.llm_threshold = llm_threshold

    async def recommend(
        self,
        user_id: str,
        query: str | None = None,
        context: dict | None = None,
    ) -> dict:
        """
        Recommend with intelligent LLM usage.
        """

        # Always start with fast traditional recommendations
        traditional_recs = self.traditional.recommend(user_id, n=20)

        # Decide if LLM is needed
        needs_llm = self._should_use_llm(query, context)

        if not needs_llm:
            return {
                "recommendations": traditional_recs,
                "explanations": None,
                "method": "traditional",
            }

        # Check cache first
        cache_key = self._make_cache_key(user_id, query, traditional_recs)
        cached = await self.cache.get(cache_key)

        if cached:
            return {**cached, "method": "cached_llm"}

        # Use LLM to enhance/re-rank
        enhanced = await self._llm_enhance(
            traditional_recs, query, context
        )

        # Cache result
        await self.cache.set(cache_key, enhanced, ttl=3600)

        return {**enhanced, "method": "llm"}

    def _should_use_llm(self, query: str | None, context: dict | None) -> bool:
        """Decide if LLM adds value for this request."""

        # Use LLM for:
        # - Natural language queries
        # - Complex multi-criteria requests
        # - Explanation requests
        # - High-value user segments

        if query and len(query.split()) > 3:
            return True

        if context and context.get("wants_explanation"):
            return True

        if context and context.get("user_tier") == "premium":
            return True

        return False
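
The mitigation box above also mentions semantic caching. Below is a minimal sketch of a `cache` that would plug into `HybridRecommender`, assuming an embedding callable and an in-memory list in place of a real vector store; `SemanticCache`, `embed_fn`, and the 0.92 threshold are all illustrative, not a specific library's API.

Python
import numpy as np

class SemanticCache:
    """Cache keyed by query embedding instead of exact string match."""

    def __init__(self, embed_fn, threshold: float = 0.92):
        self.embed_fn = embed_fn      # any text-embedding callable
        self.threshold = threshold    # cosine similarity cutoff (assumed)
        self.entries: list[tuple[np.ndarray, dict]] = []

    async def get(self, key: str) -> dict | None:
        q = self.embed_fn(key)
        for emb, value in self.entries:
            sim = float(np.dot(q, emb) / (np.linalg.norm(q) * np.linalg.norm(emb)))
            if sim >= self.threshold:
                return value  # similar enough query: reuse the cached LLM output
        return None

    async def set(self, key: str, value: dict, ttl: int = 3600) -> None:
        # TTL-based eviction is elided in this sketch
        self.entries.append((self.embed_fn(key), value))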

Evaluation Challenges

LLM-based recommendations are harder to evaluate than traditional ones: outputs are free-form, recommended IDs can be hallucinated, and explanation quality has no ground-truth labels. Combine standard offline metrics with LLM-as-judge scoring:

Python
import numpy as np

class LLMRecEvaluator:
    """
    Evaluation metrics for LLM-based recommendations.
    """

    def __init__(self, judge_llm, num_items: int):
        self.judge_llm = judge_llm   # LLM client used as explanation judge
        self.num_items = num_items   # catalog size, used for coverage

    def evaluate_offline(
        self,
        model,
        test_data: list[dict],
    ) -> dict:
        """Standard offline evaluation."""

        metrics = {
            "hr@10": [],
            "ndcg@10": [],
            "coverage": set(),
            "diversity": [],
        }

        for sample in test_data:
            recs = model.recommend(
                user_id=sample["user_id"],
                history=sample["history"],
            )

            rec_ids = [r["id"] for r in recs[:10]]

            # Hit rate
            hit = sample["target"] in rec_ids
            metrics["hr@10"].append(int(hit))

            # NDCG
            if hit:
                rank = rec_ids.index(sample["target"])
                ndcg = 1 / np.log2(rank + 2)
            else:
                ndcg = 0
            metrics["ndcg@10"].append(ndcg)

            # Coverage
            metrics["coverage"].update(rec_ids)

            # Diversity (intra-list)
            diversity = self._compute_diversity(recs[:10])
            metrics["diversity"].append(diversity)

        return {
            "hr@10": np.mean(metrics["hr@10"]),
            "ndcg@10": np.mean(metrics["ndcg@10"]),
            "coverage": len(metrics["coverage"]) / self.num_items,
            "diversity": np.mean(metrics["diversity"]),
        }

    def evaluate_explanations(
        self,
        explanations: list[str],
        items: list[dict],
        user_profiles: list[dict],
    ) -> dict:
        """
        Evaluate explanation quality.
        """

        # Use LLM to judge explanation quality
        scores = []

        for exp, item, profile in zip(explanations, items, user_profiles):
            prompt = f"""Rate this recommendation explanation:

Item: {item['title']}
User profile: {profile['summary']}
Explanation: {exp}

Rate 1-5 on:
1. Relevance: Does it address why this item fits the user?
2. Specificity: Does it mention specific features/preferences?
3. Accuracy: Is the reasoning sound?
4. Helpfulness: Would this help the user decide?

Format: {{"relevance": X, "specificity": X, "accuracy": X, "helpfulness": X}}"""

            response = self.judge_llm.messages.create(
                model="claude-3-5-haiku-20241022",
                max_tokens=100,
                messages=[{"role": "user", "content": prompt}]
            )

            # parse_json: helper (not shown) that extracts the JSON dict from the reply
            scores.append(parse_json(response.content[0].text))

        return {
            "relevance": np.mean([s["relevance"] for s in scores]),
            "specificity": np.mean([s["specificity"] for s in scores]),
            "accuracy": np.mean([s["accuracy"] for s in scores]),
            "helpfulness": np.mean([s["helpfulness"] for s in scores]),
        }

Part VIII: Future Directions

Emerging Research Areas

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                    FUTURE OF LLM + RECSYS                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  1. MULTIMODAL RECOMMENDATIONS                                          │
│  ──────────────────────────────                                          │
│  • Image + text + behavior signals                                      │
│  • "Find me something like this photo but cheaper"                      │
│  • Video understanding for content recommendations                      │
│                                                                          │
│  2. REAL-TIME PERSONALIZATION                                           │
│  ─────────────────────────────                                           │
│  • LLMs that update beliefs within conversation                         │
│  • Streaming recommendations that adapt instantly                       │
│  • Edge-deployed small LLMs for latency                                 │
│                                                                          │
│  3. PRIVACY-PRESERVING LLM RECS                                         │
│  ───────────────────────────────                                         │
│  • On-device processing of preferences                                  │
│  • Federated learning for collaborative signals                         │
│  • Differential privacy for LLM fine-tuning                            │
│                                                                          │
│  4. AUTONOMOUS SHOPPING AGENTS                                          │
│  ───────────────────────────────                                         │
│  • Agents that browse, compare, and purchase                           │
│  • Multi-platform optimization                                          │
│  • Negotiation and deal-finding                                         │
│                                                                          │
│  5. GENERATIVE ITEM CREATION                                            │
│  ─────────────────────────────                                           │
│  • "Generate a product that would appeal to users like X"               │
│  • Personalized content generation                                      │
│  • Dynamic bundle creation                                              │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Part IX: LLM RecSys in Production (2024-2025)

Industry Deployments

Major tech companies have moved beyond research to deploy LLM-powered recommendations at scale:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│              LLM RECSYS IN PRODUCTION (2024-2025)                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  NETFLIX                                                                 │
│  ─────────                                                               │
│  • UniCoRn: Unified contextual ranker for search + recommendations      │
│  • FM-Intent: Predicts user intent AND next item simultaneously         │
│  • Trace: Meta-optimization of rec pipelines with LLM agents            │
│  • Conversational RS: Context-aware preference understanding            │
│                                                                          │
│  SPOTIFY                                                                 │
│  ─────────                                                               │
│  • Semantic IDs: Discretized embeddings added to LLaMA vocabulary       │
│  • Domain-aware LLMs: Fine-tuned on catalog entities                    │
│  • Unified model: Combined search + recommendation retrieval            │
│  • Use cases: Playlist sequencing, podcast recs, explanations           │
│                                                                          │
│  AMAZON                                                                  │
│  ─────────                                                               │
│  • Semantic IDs for product retrieval                                   │
│  • 30% recall increase in beauty category                               │
│  • LLM-powered product descriptions and comparisons                     │
│                                                                          │
│  MICROSOFT                                                               │
│  ───────────                                                             │
│  • RecAI: Open-source LLM4Rec research platform                        │
│  • InteRecAgent: LLMs + traditional RecSys integration                  │
│  • Copilot Shopping: Conversational commerce                            │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Netflix: From Static Models to LLM-Powered Personalization

Netflix has been at the forefront of LLM adoption for recommendations. Key insights from the Netflix PRS 2025 Workshop:

Python
# Netflix's approach: unified model consolidation (illustrative sketch)
import torch.nn as nn

class NetflixUniCoRn:
    """
    UniCoRn: Unified Contextual Ranker
    Serves both search and recommendations with a single model.
    """

    def __init__(self, hidden_dim: int = 512):
        # Single transformer model for multiple tasks
        self.unified_model = UnifiedRanker()

        # Task-specific heads
        self.search_head = nn.Linear(hidden_dim, 1)
        self.rec_head = nn.Linear(hidden_dim, 1)

        # Context encoder (handles diverse signals)
        self.context_encoder = ContextEncoder()

        # Item encoder for candidates (placeholder, used in rank())
        self.item_encoder = ItemEncoder()

    def rank(
        self,
        user_context: dict,
        candidates: list[dict],
        task: str,  # "search" or "recommend"
    ) -> list[float]:
        """
        Unified ranking for search and recommendations.
        Key insight: Same user signals, same item features,
        just different task heads.
        """
        # Encode context (same for both tasks)
        context_emb = self.context_encoder(user_context)

        # Encode candidates
        candidate_embs = self.item_encoder(candidates)

        # Cross-attention
        hidden = self.unified_model(context_emb, candidate_embs)

        # Task-specific scoring
        if task == "search":
            scores = self.search_head(hidden)
        else:
            scores = self.rec_head(hidden)

        return scores


# FM-Intent: Predict intent and item together
class FMIntent:
    """
    Netflix's intent-aware recommendation.
    Predicts WHAT user wants to do and WHICH item simultaneously.
    """

    def predict(self, user_state: dict) -> tuple[str, list[dict]]:
        """
        Returns:
            intent: "browse", "search", "continue_watching", etc.
            items: Recommended items for that intent
        """
        # Joint prediction of intent and items
        # Not sequential (intent → items) but parallel
        pass

Netflix key learnings:

  • Model consolidation: Fewer specialized models, more unified architectures
  • LLMs for meta-optimization: Trace uses LLM agents to optimize recommendation pipelines
  • Periodic fine-tuning + RAG: Keeps models fresh without constant retraining

Spotify: Domain-Aware LLMs with Semantic IDs

Spotify's approach makes LLMs "domain-aware" by grounding them in catalog knowledge:

Python
from transformers import AutoModelForCausalLM, AutoTokenizer

class SpotifyDomainLLM:
    """
    Spotify's approach: add catalog knowledge to the LLM vocabulary.
    """

    def __init__(self, base_llm: str = "meta-llama/Meta-Llama-3-8B"):
        self.llm = AutoModelForCausalLM.from_pretrained(base_llm)
        self.tokenizer = AutoTokenizer.from_pretrained(base_llm)

        # Content encoder for catalog entities (placeholder)
        self.content_encoder = ContentEncoder()

        # Semantic tokenization of catalog entities
        self.semantic_tokenizer = SemanticTokenizer()

    def add_catalog_to_vocabulary(self, catalog: list[dict]):
        """
        Convert catalog entities to semantic IDs and add to vocabulary.

        Process:
        1. Encode entities (artists, tracks, podcasts) with embeddings
        2. Discretize embeddings via LSH into "semantic tokens"
        3. Add semantic tokens to LLM vocabulary
        4. Fine-tune LLM on recommendation tasks
        """
        for entity in catalog:
            # Get embedding from content encoder
            embedding = self.content_encoder(entity)

            # Discretize to semantic ID (e.g., 4-8 tokens)
            semantic_id = self.semantic_tokenizer.encode(embedding)

            # Add to vocabulary with special prefix
            token_str = f"<{entity['type']}:{semantic_id}>"
            self.tokenizer.add_tokens([token_str])

        # Resize model embeddings
        self.llm.resize_token_embeddings(len(self.tokenizer))

    def recommend_with_instructions(
        self,
        user_history: list[str],  # Semantic IDs of past interactions
        instruction: str,  # e.g., "Create an upbeat workout playlist"
    ) -> list[str]:
        """
        Generate recommendations that follow user instructions.
        Unique capability: Steerable recommendations via natural language.
        """
        prompt = f"""User's listening history:
{' '.join(user_history[-20:])}

Instruction: {instruction}

Generate a sequence of recommended tracks:"""

        inputs = self.tokenizer(prompt, return_tensors="pt")
        output_ids = self.llm.generate(**inputs, max_new_tokens=100)
        output = self.tokenizer.decode(output_ids[0])
        return self._parse_semantic_ids(output)

Spotify use cases enabled:

  • Playlist sequencing with coherent flow
  • Cold-start video recommendations
  • Personalized podcast discovery
  • Natural language recommendation explanations
  • Unified search + recommendation

Key Frameworks and Tools

InteRecAgent (Microsoft, TOIS 2025)

InteRecAgent bridges LLMs and traditional recommenders:

Python
class InteRecAgent:
    """
    InteRecAgent: LLM as brain, RecSys as tools.
    Paper: https://dl.acm.org/doi/10.1145/3731446
    """

    def __init__(self, llm_client, rec_tools: dict):
        self.llm = llm_client

        # Traditional RecSys models as tools
        self.tools = {
            "collaborative_filter": rec_tools["cf_model"],
            "content_based": rec_tools["content_model"],
            "popularity": rec_tools["popularity_model"],
            "search": rec_tools["search_index"],
        }

        # Memory for conversation state
        self.memory = ConversationMemory()

        # Task planner
        self.planner = TaskPlanner(llm_client)

    async def interact(self, user_message: str, user_id: str) -> str:
        """
        Interactive recommendation through conversation.
        LLM decides which tools to use and how to combine results.
        """
        # Plan tasks based on user message
        tasks = await self.planner.plan(user_message, self.memory)

        results = {}
        for task in tasks:
            if task.type == "get_recommendations":
                results["recs"] = self.tools["collaborative_filter"].recommend(
                    user_id, n=task.params.get("n", 10)
                )
            elif task.type == "search":
                results["search"] = self.tools["search"].search(
                    task.params["query"]
                )
            elif task.type == "explain":
                results["explanation"] = await self._generate_explanation(
                    results.get("recs", [])
                )

        # Synthesize response
        response = await self._synthesize_response(results, user_message)

        # Update memory
        self.memory.add(user_message, response)

        return response

InteRecAgent benefits:

  • Traditional RecSys handles behavioral patterns efficiently
  • LLM handles natural language understanding and explanation
  • Modular: Can upgrade either component independently

TALLRec (RecSys 2023)

TALLRec provides a tuning framework for aligning LLMs with recommendations:

Python
# TALLRec: two-stage tuning for recommendation LLMs
from transformers import AutoModelForCausalLM, AutoTokenizer

class TALLRecTrainer:
    """
    TALLRec tuning framework.
    Stage 1: Instruction tuning (general capability)
    Stage 2: Recommendation tuning (domain-specific)
    """

    def __init__(self, base_model: str = "huggyllama/llama-7b"):
        # The TALLRec paper used LLaMA-7B; any causal LM checkpoint works here
        self.model = AutoModelForCausalLM.from_pretrained(base_model)
        self.tokenizer = AutoTokenizer.from_pretrained(base_model)

    def stage1_instruction_tuning(self, instruction_data: list[dict]):
        """
        Stage 1: General instruction following.
        Uses Stanford Alpaca or similar data.
        """
        # Standard instruction tuning
        for example in instruction_data:
            prompt = f"Instruction: {example['instruction']}\nResponse:"
            target = example['response']
            # Train with cross-entropy loss
            pass

    def stage2_recommendation_tuning(self, rec_data: list[dict]):
        """
        Stage 2: Recommendation-specific tuning.
        Teaches the model to recommend items.
        """
        # Recommendation-specific prompts
        for example in rec_data:
            prompt = f"""User has interacted with: {example['history']}
Based on this history, recommend the next item."""
            target = example['next_item']
            # Train with cross-entropy loss
            pass

    def create_rec_prompt(self, history: list[str], task: str, item: str = "") -> str:
        """Create a recommendation prompt in TALLRec format."""
        templates = {
            "sequential": "Given the user's history: {history}, predict the next item.",
            "rating": "How would this user rate {item}? History: {history}",
            "explanation": "Explain why {item} is recommended given: {history}",
        }
        # Pass both fields; the "rating" and "explanation" templates need {item}
        return templates[task].format(history=", ".join(history), item=item)

MSRBench: Evaluating LVLMs for Recommendations

MSRBench (ACM Web Conference 2025) provides the first comprehensive benchmark for Large Vision-Language Models in multimodal sequential recommendation:

Python
class MSRBenchEvaluator:
    """
    MSRBench: Benchmark for LVLMs in recommendation.
    Tests GPT-4V, GPT-4o, Claude-3-Opus on next-item prediction.
    """

    # Integration strategies tested
    STRATEGIES = [
        "lvlm_as_recommender",  # Direct recommendation
        "lvlm_as_item_enhancer",  # Generate item descriptions
        "lvlm_as_reranker",  # Rerank traditional candidates
        "hybrid_enhance_rerank",  # Combination
    ]

    def evaluate(
        self,
        model: str,  # "gpt-4-vision", "gpt-4o", "claude-3-opus"
        user: dict,
        dataset: str = "amazon_review_plus",
    ) -> dict:
        """
        Evaluate an LVLM on next-item prediction with images,
        trying each integration strategy in turn.
        """
        results = {}

        for strategy in self.STRATEGIES:
            if strategy == "lvlm_as_reranker":
                # Best performing strategy:
                # traditional model retrieves, LVLM reranks
                candidates = self.traditional_model.retrieve(user, k=100)
                reranked = self.lvlm_rerank(model, user, candidates)
                results[strategy] = self.compute_metrics(reranked)

        return results

    def lvlm_rerank(
        self,
        model: str,
        user_context: dict,
        candidates: list[dict],
    ) -> list[dict]:
        """
        Use LVLM to rerank candidates based on images + text.
        """
        prompt = f"""Given this user's recent purchases:
{self._format_history_with_images(user_context['history'])}

Rank these candidate items by relevance:
{self._format_candidates_with_images(candidates)}

Return ranked item IDs."""

        response = self.call_lvlm(model, prompt)
        return self._parse_ranking(response)

MSRBench key findings:

  • LVLMs as rerankers is the most effective strategy
  • GPT-4o consistently outperforms GPT-4V and Claude-3-Opus
  • Computational cost remains a barrier to real-time adoption
  • Multimodal context significantly improves cold-start performance

RecSys 2025 Best Paper Insights

The RecSys 2025 Best Paper focused on conformal risk control for mitigating unwanted recommendations—a key concern as LLMs generate more creative outputs.

Key 2025 research themes:

  1. Fine-tuning + RAG combination: Keeps models fresh without constant retraining
  2. LLM agents for pipeline optimization: Meta-level improvements
  3. Multimodal integration: Images, video, audio in recommendations
  4. Scalability solutions: Efficient LLM serving for real-time recommendations

Part X: Prompt Engineering for Recommendations

Why Prompting Matters for RecSys

The quality of LLM-powered recommendations depends heavily on how you structure prompts. Unlike traditional ML where the model architecture determines capability, LLMs can perform radically different tasks based on prompt design. A well-crafted prompt can mean the difference between generic suggestions and personalized, actionable recommendations.

The fundamental insight: The same LLM with different prompts produces vastly different recommendation quality. Prompts determine what user context the LLM considers, how it reasons about preferences, and whether outputs are reliable enough for production use.

Five key dimensions of recommendation prompts:

  1. Context Framing: How you present user history and preferences. Recency, relevance, and diversity of context all matter. Dumping entire history is counterproductive—selective context yields better results.

  2. Task Specification: What exactly you want the LLM to do. "Recommend items" is vague. "Select 5 items under $100 that match their casual style preferences" is actionable.

  3. Output Structure: Format for reliable parsing. Free-text responses are hard to use programmatically. JSON arrays of item IDs integrate cleanly with downstream systems.

  4. Reasoning Guidance: Whether to encourage chain-of-thought. For complex recommendations, asking the LLM to first analyze preferences, then match candidates, improves quality and provides explainability.

  5. Constraints: Guardrails on what can/cannot be recommended. In-stock items only, price limits, excluded categories, and valid item ID lists prevent hallucination.

Core Prompt Patterns

Pattern 1: Direct Recommendation

The simplest pattern: provide context, request recommendations, specify format. Best for fast recommendations when you have a good candidate set. Structure the prompt with clear sections: user history (most recent first), candidate items (with IDs, titles, categories, prices), the task (select exactly N items), and output format (JSON array of item IDs).
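
A minimal template following this structure; the item fields (`title`, `category`, `price`) and the fifteen-interaction cutoff are illustrative choices, not requirements:

Python
def direct_rec_prompt(history: list[dict], candidates: list[dict], n: int = 5) -> str:
    """Direct recommendation: context, candidates, task, output format."""
    history_str = "\n".join(
        f"- {h['title']} ({h['category']}, ${h['price']})"
        for h in history[:15]  # most recent first
    )
    candidate_str = "\n".join(
        f"- [{c['id']}] {c['title']} ({c['category']}, ${c['price']})"
        for c in candidates
    )
    return f"""User's recent purchases (most recent first):
{history_str}

Candidate items:
{candidate_str}

Task: Select exactly {n} items this user is most likely to buy next.
Output: a JSON array of item IDs only, e.g. ["id1", "id2"]."""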

Pattern 2: Chain-of-Thought Recommendation

Encourage explicit reasoning for better recommendations and explainability. Structure the prompt to guide step-by-step analysis: first identify patterns in user history (categories, price range, brands, time patterns), then understand current intent (browsing vs buying, new interest vs continuing pattern), then match candidates (explain fit for each), and finally provide ranked recommendations with confidence scores.

This pattern is more expensive (more tokens) but produces higher-quality recommendations for complex queries and provides reasoning that can be shown to users or used for debugging.
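
A sketch of such a prompt, to be filled with `.format(history=..., candidates=..., n=...)`; the step wording is illustrative:

Python
COT_REC_PROMPT = """Analyze this user and recommend items step by step.

User history:
{history}

Candidate items:
{candidates}

Step 1 - Preference patterns: what categories, price range, brands,
and time-of-day patterns appear in the history?
Step 2 - Current intent: is the user browsing or buying? Continuing
an existing interest or starting a new one?
Step 3 - Candidate fit: for each candidate, briefly explain why it
does or does not match the preferences from Step 1.
Step 4 - Final answer: return the top {n} items as JSON:
[{{"id": "...", "confidence": 0.0, "reason": "..."}}]"""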

Pattern 3: Persona-Based Prompting

Assign the LLM a specific expert persona for domain-specific recommendations. A fashion recommendation prompt might begin: "You are a personal stylist with 15 years of experience at luxury fashion houses. You understand body types, color theory, occasion dressing, and current trends."

Different domains benefit from different personas—a sommelier for wine, a tech reviewer for electronics, a literary curator for books. The persona shapes the recommendation style, vocabulary, and what factors the LLM emphasizes.
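
A sketch of a persona registry; the personas and the helper below are hypothetical, and note that Anthropic's API takes the system prompt as a separate `system` parameter rather than a message role:

Python
# Hypothetical persona registry: the persona shapes style and emphasis
PERSONAS = {
    "fashion": (
        "You are a personal stylist with 15 years of experience at luxury "
        "fashion houses. You understand body types, color theory, occasion "
        "dressing, and current trends."
    ),
    "wine": "You are a certified sommelier who matches wines to taste profiles, dishes, and budgets.",
    "books": "You are a literary curator who matches readers to books by theme, prose style, and mood.",
}

def persona_messages(domain: str, user_prompt: str) -> list[dict]:
    """Attach the domain persona as a system message (OpenAI-style roles)."""
    return [
        {"role": "system", "content": PERSONAS[domain]},
        {"role": "user", "content": user_prompt},
    ]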

Pattern 4: Few-Shot Learning

Show examples of good recommendations to guide the model's output style. Include 2-3 examples showing: user history summary, user query, recommended item, and explanation. Then present the current task in the same format. This is particularly effective for maintaining consistent tone and reasoning depth across your recommendation system.
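
A sketch of the few-shot structure; the examples below are invented for illustration:

Python
FEW_SHOT_EXAMPLES = [  # illustrative; real systems curate these from logs
    {
        "history": "Trail running shoes, GPS watch, energy gels",
        "query": "Something for recovery days",
        "item": "Foam roller",
        "explanation": "You train hard on trails; a foam roller supports recovery between runs.",
    },
    # ... one or two more examples in the same shape
]

def few_shot_prompt(history: str, query: str) -> str:
    """Prepend worked examples so the model copies their style and depth."""
    shots = "\n\n".join(
        f"History: {ex['history']}\nQuery: {ex['query']}\n"
        f"Recommendation: {ex['item']}\nWhy: {ex['explanation']}"
        for ex in FEW_SHOT_EXAMPLES
    )
    return f"""{shots}

History: {history}
Query: {query}
Recommendation:"""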

Optimization Techniques

Dynamic Context Selection: Not all user history is equally relevant. For a query about running shoes, recent athletic wear purchases matter more than a book bought last year. Select context based on recency, relevance to current query (via embedding similarity or keyword matching), and diversity (include variety of categories to capture full preference profile). Typically 10-20 carefully selected interactions outperform hundreds of undifferentiated history items.
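
A sketch of relevance-plus-recency selection, assuming each history item carries an `embedding` and an epoch-seconds `timestamp`; the blend weight is an assumption to tune, and the diversity re-ranking mentioned above is omitted:

Python
import numpy as np

def select_context(
    history: list[dict],          # each item: {"embedding": ..., "timestamp": ...}
    query_embedding: np.ndarray,
    k: int = 15,
    recency_weight: float = 0.3,  # assumed blend; tune per domain
) -> list[dict]:
    """Score history by query similarity blended with recency; keep top-k."""
    now = max(h["timestamp"] for h in history)
    scored = []
    for h in history:
        emb = h["embedding"]
        sim = float(
            np.dot(query_embedding, emb)
            / (np.linalg.norm(query_embedding) * np.linalg.norm(emb))
        )
        recency = 1.0 / (1.0 + (now - h["timestamp"]) / 86400)  # decays per day
        scored.append((recency_weight * recency + (1 - recency_weight) * sim, h))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [h for _, h in scored[:k]]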

Output Constraints and Validation: The most critical technique for production systems. Constrain the LLM to ONLY recommend from a provided list of valid item IDs. Specify constraints explicitly: maximum price, allowed categories, excluded brands, in-stock only. After receiving the response, always validate that returned IDs exist in your catalog—never trust LLM output without verification.
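
A minimal validation pass over the parsed response; the field names (`price`, `in_stock`) are assumptions about your catalog schema:

Python
def validate_recommendations(
    raw_ids: list[str],
    valid_ids: set[str],       # the candidate IDs offered in the prompt
    catalog: dict,             # item_id -> metadata
    max_price: float | None = None,
) -> list[str]:
    """Keep only IDs that exist, were offered, and satisfy constraints."""
    validated = []
    for item_id in raw_ids:
        if item_id not in valid_ids:
            continue  # hallucinated or out-of-candidate ID: drop it
        item = catalog[item_id]
        if max_price is not None and item["price"] > max_price:
            continue
        if not item.get("in_stock", True):
            continue
        validated.append(item_id)
    return validated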

Temperature for Diversity: Lower temperature (0.3) produces focused, consistent recommendations—good for "more like this" scenarios. Higher temperature (0.9-1.0) produces more creative, unexpected suggestions—good for discovery. For most use cases, balanced temperature (0.6-0.7) provides a mix of safe bets and discoveries.

Multi-Sample Aggregation: For discovery-focused recommendations, generate multiple recommendation sets with high temperature and aggregate. Items appearing in multiple samples are more robust recommendations. Items appearing in only one sample are more exploratory.
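
A sketch of the aggregation step, assuming you have already generated several recommendation lists at temperature 0.9 or above; the `min_votes` cutoff is an assumption:

Python
from collections import Counter

def aggregate_samples(
    sample_sets: list[list[str]],  # one ID list per high-temperature sample
    min_votes: int = 2,
) -> dict:
    """Split aggregated items into robust (multi-sample) and exploratory."""
    votes = Counter(item for sample in sample_sets for item in set(sample))
    robust = [item for item, v in votes.most_common() if v >= min_votes]
    exploratory = [item for item, v in votes.items() if v == 1]
    return {"robust": robust, "exploratory": exploratory}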

Versioned Prompt Templates

Production systems need tested, versioned prompt templates for different scenarios:

  • Quick suggestions: Fast, low-token prompts for homepage recommendations. Temperature 0.5, max 100 tokens.
  • Detailed recommendation: Full context, chain-of-thought, explanations. Temperature 0.7, max 1000 tokens.
  • Cold start: For new users with no history. Focus on stated interests and popular items. Temperature 0.6.
  • Explanation only: Generate explanations for recommendations made by traditional models. Temperature 0.5, max 150 tokens.

Version your templates, track which versions are in production, and A/B test changes. Prompt engineering is iterative—small wording changes can significantly impact recommendation quality.
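
One way to sketch such a registry; the template text, version tags, and lookup helper are all illustrative:

Python
# Hypothetical versioned template registry; A/B test before promoting a version
PROMPT_TEMPLATES = {
    ("quick_suggestions", "v3"): {
        "template": "Recommend 5 items for a user who likes: {interests}. Return a JSON array of item IDs only.",
        "temperature": 0.5,
        "max_tokens": 100,
    },
    ("cold_start", "v2"): {
        "template": "New user. Stated interests: {interests}. Recommend popular items matching these interests.",
        "temperature": 0.6,
        "max_tokens": 200,
    },
}

ACTIVE_VERSIONS = {"quick_suggestions": "v3", "cold_start": "v2"}

def get_template(scenario: str) -> dict:
    """Resolve the production version of a scenario's template."""
    return PROMPT_TEMPLATES[(scenario, ACTIVE_VERSIONS[scenario])]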

Common Mistakes to Avoid

Code
┌─────────────────────────────────────────────────────────────────────────┐
│              COMMON PROMPTING MISTAKES IN RECSYS                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  1. TOO MUCH CONTEXT                                                    │
│  ─────────────────────                                                   │
│  ✗ "Here's the user's entire 3-year history..."                        │
│  ✓ Select 10-20 most relevant recent interactions                       │
│                                                                          │
│  2. VAGUE INSTRUCTIONS                                                  │
│  ───────────────────────                                                 │
│  ✗ "Recommend some good items"                                          │
│  ✓ "Recommend 5 items matching their style preferences, under $100"    │
│                                                                          │
│  3. NO OUTPUT FORMAT                                                    │
│  ─────────────────────                                                   │
│  ✗ "Give me your recommendations"                                       │
│  ✓ "Return a JSON array of item IDs: [\"id1\", \"id2\", ...]"          │
│                                                                          │
│  4. ALLOWING HALLUCINATION                                              │
│  ───────────────────────────                                             │
│  ✗ "Recommend items for this user"                                      │
│  ✓ "Recommend ONLY from this list: [item_id_1, item_id_2, ...]"        │
│                                                                          │
│  5. IGNORING CONSTRAINTS                                                │
│  ─────────────────────────                                               │
│  ✗ Generic recommendations regardless of availability                   │
│  ✓ Specify: in_stock, max_price, excluded_categories                   │
│                                                                          │
│  6. ONE-SIZE-FITS-ALL                                                   │
│  ──────────────────────                                                  │
│  ✗ Same prompt for all recommendation scenarios                         │
│  ✓ Different templates for: quick, detailed, cold_start, explanation   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
