
Building a Local Concierge: Combining Retrieval and Agentic AI for Real-World Actions

A comprehensive guide to building AI systems that both retrieve information (restaurants, events, attractions) and take real-world actions (booking reservations, purchasing tickets). Covers hybrid architectures combining MCP servers for API integrations with browser automation for universal site access.


Introduction

Most AI assistants today excel at one thing: answering questions. Ask about the best Italian restaurants nearby, and they'll provide a helpful list. But ask them to book a table at one of those restaurants, and they hit a wall. The gap between information retrieval and real-world action represents one of the most significant opportunities in applied AI.

This post explores how to build a local concierge system—an AI that can both find information about restaurants, bars, events, and attractions, and actually take actions like making reservations, booking tickets, and scheduling appointments. We'll examine the architectural patterns that make this possible, the decision frameworks for choosing between different implementation approaches, and the practical considerations for production deployment.

The value proposition is clear: users want to go from intent to outcome in a single conversation. "Find me a romantic Italian restaurant with outdoor seating and book a table for two this Saturday" should just work. Building systems that deliver on this promise requires thoughtful architecture spanning retrieval, action execution, conversation management, and production operations.

The Two Sides of a Concierge

A useful local concierge must excel at two fundamentally different tasks that require different architectural approaches.

The Retrieval Side

When a user asks "What are some good tapas places near Union Square?", the system needs to search across multiple data sources, aggregate and rank results, filter based on user preferences, and present information in a helpful format. This is a retrieval problem—the system is finding and presenting existing information.

Retrieval queries have clear characteristics: they seek information rather than action, they can be answered from existing data sources, they don't require authentication or user-specific state, and their success is measured by relevance and completeness.

The retrieval side draws on established RAG (Retrieval-Augmented Generation) patterns but with domain-specific considerations. Place data has strong geographic and temporal dimensions. Relevance depends heavily on context—"good restaurant" means different things for a business dinner versus a casual lunch. User preferences significantly shape what counts as a good result.

The Agentic Side

When the same user says "Book me a table for 4 at Coqueta tonight at 7pm", the system needs to navigate a booking interface, handle authentication, input reservation details, confirm availability, complete the booking, and report the outcome. This is an agentic problem—the system is taking actions in the world.

Agentic tasks have different characteristics: they modify state in external systems, they often require authentication and user credentials, they involve multi-step workflows with branching logic, their success is measured by whether the action completed successfully, and failures have real consequences (missed reservations, double bookings, wasted money).

The agentic side requires careful attention to reliability, error handling, and user control. Unlike retrieval where a suboptimal result is merely unhelpful, agentic failures can have tangible negative consequences. The system must be trustworthy enough that users feel comfortable delegating real-world actions.

The Hybrid Challenge

The interesting challenge is that real conversations often involve both. A user might ask "What's a good Italian place with outdoor seating that I can get a reservation at tonight?"—a query that requires searching for restaurants, filtering by features, checking availability across multiple venues, and potentially making a reservation. The system must seamlessly blend retrieval and action.

Hybrid queries create interesting ordering considerations. Should the system search first and then check availability? Or should it only show restaurants that definitely have tables? The former gives more options but some may be unavailable. The latter is more actionable but might miss good options whose availability simply isn't reflected in the APIs.

The best approach often depends on context. For casual exploration, show all relevant options and indicate availability where known. For urgent bookings, filter aggressively to only show actionable options. The system should adapt based on user signals and explicit preferences.

High-Level Architecture

A local concierge system requires several interconnected components working together. The architecture must support both information retrieval and action execution while maintaining a coherent conversation experience.

System Components Overview

The complete system comprises several layers. The conversation layer manages dialogue state, handles user input, and coordinates responses. The intent layer classifies user requests and routes them appropriately. The retrieval layer searches for places, events, and information. The action layer executes bookings, reservations, and purchases. The integration layer connects to external APIs and services. The persistence layer stores user preferences, conversation history, and cached data.

These layers communicate through well-defined interfaces. The conversation layer doesn't need to know whether a request was handled by retrieval or action—it receives structured results in a consistent format. This separation enables independent development and testing of each layer.

Intent Classification Layer

The first decision point is understanding what the user wants. Intent classification determines whether a query is purely informational (retrieval), action-oriented (agentic), or a combination requiring both capabilities.

Rule-Based Classification

Simple rule-based classification catches obvious cases. Queries containing explicit action words like "book", "reserve", "buy tickets", "make an appointment", "purchase", or "schedule" likely require action. Questions starting with "what", "where", "which", or "how many" typically seek information. Imperatives often indicate desired actions.

Rule-based classification is fast and predictable but limited. It misses implicit requests—"I need a table at a nice restaurant" could be a search query or a booking request depending on context. It also fails with ambiguous phrasing and doesn't handle the many ways users express the same intent.

LLM-Based Classification

More sophisticated classification uses the LLM itself to determine intent. The model receives the user message along with conversation history and outputs a structured classification.

The classification output includes the primary intent type (search, action, hybrid, clarification_needed), confidence level (high, medium, low), extracted entities (location, date, time, party size, venue name, cuisine type, price range), required slots for action (what information is still needed), and suggested clarification questions if intent is unclear.

LLM-based classification handles nuance better. It understands that "I'm looking for somewhere to take my parents for their anniversary" implies a search for special occasion restaurants, while "Can you get me into State Bird Provisions this Friday?" implies a booking request for a specific venue. Context from conversation history helps disambiguate—if the user just received restaurant recommendations, "book the second one" clearly requests action.
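To make this concrete, here is a minimal sketch of LLM-based classification with a structured output schema. The `call_llm` callable and the exact field names are illustrative placeholders rather than a prescribed interface.

```python
# A minimal sketch of LLM-based intent classification with a structured schema.
# `call_llm` is a hypothetical placeholder for whatever chat-completion client you use.
from enum import Enum
from typing import Optional
from pydantic import BaseModel

class IntentType(str, Enum):
    SEARCH = "search"
    ACTION = "action"
    HYBRID = "hybrid"
    CLARIFICATION_NEEDED = "clarification_needed"

class ExtractedEntities(BaseModel):
    location: Optional[str] = None
    date: Optional[str] = None
    time: Optional[str] = None
    party_size: Optional[int] = None
    venue_name: Optional[str] = None
    cuisine_type: Optional[str] = None
    price_range: Optional[str] = None

class IntentClassification(BaseModel):
    intent: IntentType
    confidence: str                       # "high" | "medium" | "low"
    entities: ExtractedEntities
    missing_slots: list[str] = []         # information still needed before acting
    clarification_question: Optional[str] = None

CLASSIFY_PROMPT = """Classify the user's latest message given the conversation so far.
Return JSON matching the IntentClassification schema.

Conversation:
{history}

Latest message: {message}
"""

def classify_intent(history: str, message: str, call_llm) -> IntentClassification:
    raw = call_llm(CLASSIFY_PROMPT.format(history=history, message=message))
    return IntentClassification.model_validate_json(raw)
```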

Hybrid Classification Approach

The practical approach combines both methods. Rule-based classification handles clear cases quickly. LLM-based classification handles ambiguous cases. Confidence thresholds determine when to ask for clarification versus proceeding with the most likely interpretation.

The classification layer should also identify multi-part requests. "Find Italian restaurants and book the best one" requires both search and action in sequence. "Book tables at two restaurants for the same time as backup" requires parallel actions. Decomposing complex requests into atomic operations enables proper handling.

Retrieval Layer

The retrieval layer handles all information-seeking queries, pulling data from multiple sources and presenting unified results.

Data Sources for Places

Different data sources serve different query types with varying strengths and coverage.

Google Places API provides the most comprehensive coverage for restaurants, bars, and attractions. It offers ratings, reviews, hours, photos, price level, and contact information. The API supports text search, nearby search, and place details. Pricing is per-request with different tiers for basic and detailed information. The data tends to be accurate and up-to-date due to Google's scale.

Yelp Fusion API offers detailed reviews and business attributes. Yelp's strength is review depth—users write longer, more detailed reviews than on Google. The API provides business search, business details, and review access. Yelp categories are often more specific than Google's, which helps for niche queries. The free tier has limited requests; commercial use requires a partnership.

OpenStreetMap provides free geographic data with community contributions. Coverage varies by region—excellent in some areas, sparse in others. The data includes geographic features that other sources lack. OSM is particularly useful for outdoor activities, parks, and public spaces. The Overpass API enables complex geographic queries.

Foursquare Places API offers location data with user tips and check-in patterns. Foursquare's strength is understanding venue popularity patterns—when places are busy, who visits them. The Tastes feature captures subjective attributes like "romantic" or "trendy". API access includes search, venue details, and tips.

For restaurants specifically, The Infatuation, Eater, and local food critics provide curated recommendations with editorial quality. These sources don't typically have APIs but can be scraped or accessed through partnerships. Their value is curation—a curated list of "best pizza" is often more useful than algorithmic ranking.

Data Sources for Events

Event data requires different sources than place data.

Eventbrite API covers ticketed events with comprehensive search and filtering. It's strong for concerts, conferences, workshops, and community events. The API supports event search by location, date, category, and keyword. For events hosted on Eventbrite, you can also access ticket availability and pricing.

Ticketmaster Discovery API covers major concerts, sports, and entertainment events. Coverage focuses on larger venues and professional events. The API provides event search, venue information, and ticket availability. Integration with Ticketmaster's ticketing system enables purchase flows.

Meetup API covers community events and gatherings. Strong for tech meetups, hobby groups, and social events. The API supports group search, event search, and RSVP functionality. Many smaller, informal events only appear on Meetup.

Facebook Events covers social events but is increasingly difficult to access programmatically. The Graph API has limited event access. Web scraping is fragile and against ToS. Consider Facebook Events as a gap in coverage rather than a reliable source.

Local sources vary by market. City event calendars, venue websites, and local publications often have event listings. Some provide APIs or structured data. Others require scraping. Building local source integrations provides differentiation but requires market-by-market effort.

Multi-Source Aggregation

Real queries often require combining multiple sources. A search for "romantic restaurants" might query Google Places for restaurant listings, Yelp for detailed ambiance reviews, and OpenTable for availability—all to provide a comprehensive answer.

Entity resolution is the first challenge. The same restaurant appears in multiple sources with slightly different names, addresses, and identifiers. "Café de la Presse" in Google might be "Cafe de la Presse" in Yelp and "Café De La Presse" in OpenTable. The aggregation layer must recognize these as the same entity.

Entity resolution approaches include exact address matching (most reliable but addresses have variations), fuzzy name matching with location proximity (handles name variations), phone number matching (reliable when available), and cross-reference databases (some services provide ID mappings).

Score normalization makes ratings from different sources comparable. Google uses a 5-star scale with typically high ratings (4.0+ is common). Yelp uses 5 stars with more variation and lower averages. Foursquare uses a 10-point scale. Direct comparison requires normalization—converting all scores to a common scale while preserving relative rankings within each source.
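As an illustration, one simple approach is to z-score each rating against per-source statistics and squash to a common 0-1 range. The means and spreads below are assumed for the sketch, not published figures.

```python
# A minimal sketch of cross-source rating normalization. The per-source statistics
# are illustrative assumptions, not published figures.
SOURCE_STATS = {
    # source: (observed_mean, observed_std)
    "google":     (4.2, 0.35),
    "yelp":       (3.8, 0.55),
    "foursquare": (7.5, 1.10),   # 10-point scale
}

def normalize_rating(source: str, rating: float) -> float:
    """Convert a source-specific rating to a z-score, then map to [0, 1]."""
    mean, std = SOURCE_STATS[source]
    z = (rating - mean) / std
    # Clamp to roughly +/-3 standard deviations before mapping.
    z = max(-3.0, min(3.0, z))
    return (z + 3.0) / 6.0

def blended_score(ratings: dict[str, float]) -> float:
    """Average the normalized ratings a place has across sources."""
    normalized = [normalize_rating(src, r) for src, r in ratings.items()]
    return sum(normalized) / len(normalized) if normalized else 0.0
```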

Result ranking combines signals from multiple sources into a unified ranking. The ranking function considers relevance to query (semantic match to what the user asked), quality signals (ratings, review counts, editorial mentions), freshness (recent reviews weighted more for restaurants), user preferences (personalization based on past behavior), and availability (for hybrid queries, prioritize bookable options).

Semantic Search Over Place Data

Beyond structured API queries, semantic search enables natural language understanding of place attributes. Not everything users care about is captured in structured fields. "Restaurants with a speakeasy vibe" matches places where reviews mention "hidden entrance", "cocktail bar feel", or "prohibition era decor"—even if no structured field captures this.

Building semantic search over places involves several steps. First, gather text for each place: descriptions, reviews, tips, menu items, and any other textual content. Then, generate embeddings for this text using a model like OpenAI's text-embedding-3-small or an open-source alternative. Store embeddings in a vector database with place metadata. At query time, embed the query and find places with similar embeddings.

This creates a two-stage retrieval pipeline. Stage one uses structured API queries to filter by location, category, price range, and basic attributes—reducing the candidate set to a manageable size. Stage two uses semantic search over the filtered results to rank by match to the natural language query. This combination provides both precision (structured filters) and nuance (semantic understanding).
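A minimal sketch of that pipeline follows. The `places_api_search` and `embed` callables stand in for your structured place search and embedding model; in production the place embeddings would be precomputed and stored in a vector database rather than computed per query.

```python
# A minimal sketch of the two-stage pipeline: structured filtering, then semantic
# re-ranking. `places_api_search` and `embed` are hypothetical placeholders.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def two_stage_search(query: str, location: str, category: str,
                     places_api_search, embed, top_k: int = 10):
    # Stage 1: structured filters shrink the candidate set to a manageable size.
    candidates = places_api_search(location=location, category=category, limit=100)

    # Stage 2: semantic ranking over each candidate's descriptive text.
    # (In production, precompute and store these embeddings; only embed the query here.)
    query_vec = embed(query)
    scored = []
    for place in candidates:
        text = " ".join([place.get("description", "")] + place.get("review_snippets", []))
        scored.append((cosine(query_vec, embed(text)), place))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [place for _, place in scored[:top_k]]
```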

Caching Strategies

Place data changes slowly—restaurant hours, locations, and ratings don't change minute to minute. A cache layer with appropriate TTLs reduces API calls, improves response latency, and reduces costs.

Different data types warrant different cache durations. Basic place information (name, address, phone) can be cached for days or weeks. Ratings and review counts can be cached for hours to a day. Hours of operation should be checked daily or when queried near opening/closing. Availability information should not be cached or cached only for minutes.
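Expressed as configuration, the durations above might look like the following sketch; the exact TTLs are illustrative defaults, not recommendations from any particular provider.

```python
# A minimal sketch of per-field cache TTLs mirroring the durations described above.
from datetime import datetime, timedelta

CACHE_TTLS = {
    "basic_info":   timedelta(days=7),      # name, address, phone
    "ratings":      timedelta(hours=12),    # ratings and review counts
    "hours":        timedelta(days=1),      # re-check near open/close
    "availability": timedelta(minutes=2),   # or skip caching entirely
}

def is_fresh(cached_at: datetime, data_type: str, now: datetime) -> bool:
    """Return True if a cached value of this type is still usable."""
    return now - cached_at < CACHE_TTLS[data_type]
```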

Cache invalidation is straightforward for time-based expiry but harder for event-driven changes. A restaurant might close permanently, change hours, or undergo significant changes. Periodic full refreshes combined with user feedback ("this place is permanently closed") handles most cases.

Geographic caching organizes data by location. When a user searches in a new area, prefetch data for nearby places anticipating follow-up queries. Tile-based caching (similar to map tiles) enables efficient geographic coverage.

Agentic Layer

The agentic layer handles all action-oriented tasks, executing multi-step workflows to accomplish user goals.

Tool Architecture

Actions are implemented as tools that the LLM can invoke. Each tool has a clear interface: name and description (for the model to understand when to use it), input schema (structured parameters the tool accepts), execution logic (the code that performs the action), and output schema (structured results including success/failure and relevant data).

Well-designed tool interfaces share common characteristics. Names are verb-based and descriptive: search_restaurant_availability, make_reservation, cancel_booking. Descriptions explain when to use the tool and what it accomplishes. Input schemas use clear parameter names with descriptions. Required versus optional parameters are explicit. Output schemas include success/failure indication, relevant data on success, and error details on failure.
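As an example, a make_reservation tool definition in the JSON-schema style most function-calling APIs accept might look like this; the exact wrapper format varies by provider, and the fields shown are the ones called out above.

```python
# A minimal sketch of a tool definition: verb-based name, usage description,
# explicit required vs. optional parameters.
MAKE_RESERVATION_TOOL = {
    "name": "make_reservation",
    "description": (
        "Book a table at a specific restaurant once the user has confirmed the "
        "venue, date, time, and party size. Returns a confirmation number on success."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "slot_id": {"type": "string",
                        "description": "Slot returned by search_restaurant_availability"},
            "guest_name": {"type": "string"},
            "guest_email": {"type": "string"},
            "guest_phone": {"type": "string"},
            "special_requests": {"type": "string",
                                 "description": "Optional free-text requests"},
        },
        "required": ["slot_id", "guest_name", "guest_email", "guest_phone"],
    },
}
```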

MCP Servers for API Integrations

When booking systems offer APIs, MCP (Model Context Protocol) servers provide clean integration. An MCP server wraps the API, exposing tools that the agent can call.

MCP server architecture separates concerns effectively. The MCP server handles API authentication, manages rate limits, translates between tool interface and API format, handles errors and retries, and maintains any necessary session state. The agent sees only the clean tool interface.

For restaurant reservations, OpenTable and Resy offer APIs for searching availability and making reservations. An MCP server for OpenTable might expose these tools:

The search_availability tool accepts restaurant_id, date, time, party_size, and optional time_flexibility parameters. It returns a list of available time slots with slot_id, time, and any notes.

The make_reservation tool accepts slot_id, guest_name, guest_email, guest_phone, and optional special_requests parameters. It returns confirmation_number, confirmed_time, and any restaurant-specific confirmation details.

The cancel_reservation tool accepts confirmation_number and optional reason parameters. It returns cancellation_status and any applicable cancellation policies or fees.

The get_reservation_details tool accepts confirmation_number and returns full reservation details for user reference.
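A minimal sketch of an MCP server exposing these reservation tools, using the Python SDK's FastMCP helper, might look like the following. The `opentable_client` object is a hypothetical stand-in for the real reservation API, whose actual surface may differ.

```python
# A minimal MCP server sketch for the reservation tools described above.
from mcp.server.fastmcp import FastMCP

class OpenTableClientStub:
    """Hypothetical placeholder for the real reservation API client."""
    def search_slots(self, **kwargs): ...
    def book(self, **kwargs): ...
    def cancel(self, confirmation_number, reason=""): ...

opentable_client = OpenTableClientStub()
mcp = FastMCP("reservations")

@mcp.tool()
def search_availability(restaurant_id: str, date: str, time: str,
                        party_size: int, time_flexibility_minutes: int = 30) -> list[dict]:
    """Return available slots (slot_id, time, notes) near the requested time."""
    return opentable_client.search_slots(
        restaurant_id=restaurant_id, date=date, time=time,
        party_size=party_size, flexibility=time_flexibility_minutes,
    )

@mcp.tool()
def make_reservation(slot_id: str, guest_name: str, guest_email: str,
                     guest_phone: str, special_requests: str = "") -> dict:
    """Book the given slot and return confirmation details."""
    return opentable_client.book(
        slot_id=slot_id, name=guest_name, email=guest_email,
        phone=guest_phone, requests=special_requests,
    )

@mcp.tool()
def cancel_reservation(confirmation_number: str, reason: str = "") -> dict:
    """Cancel an existing reservation and report any fees or policies."""
    return opentable_client.cancel(confirmation_number, reason=reason)

if __name__ == "__main__":
    mcp.run()
```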

For event tickets, Ticketmaster and Eventbrite provide ticketing APIs. An MCP server for Eventbrite might expose:

The search_events tool for finding events by location, date range, category, and keywords.

The get_event_details tool for retrieving comprehensive event information including description, venue, schedule, and ticket types.

The check_ticket_availability tool for checking what tickets are available and at what prices.

The reserve_tickets tool for temporarily holding tickets while the user confirms.

The purchase_tickets tool for completing the ticket purchase with payment information.

MCP servers handle authentication complexity. Many APIs use OAuth2, requiring token management, refresh flows, and scope handling. The MCP server manages this, presenting a simple authenticated interface to the agent. For user-specific credentials (booking on behalf of the user), the server handles secure credential storage and per-user authentication state.

Rate limiting happens at the MCP server level. The server tracks API usage, queues requests when approaching limits, and returns appropriate errors when limits are exceeded. This prevents the agent from overwhelming external APIs and incurring overage charges.

Browser Automation for Universal Access

Many booking systems don't offer public APIs—or their APIs don't support the actions users want. Browser automation provides universal access—if a human can book it through a website, an agent can too.

Browser automation works by controlling a real browser (typically headless Chrome or Firefox). The automation layer navigates to booking pages, extracts page structure to understand available actions, fills forms with user-provided information, handles interactive elements like date pickers and seat maps, completes checkout flows including payment, and extracts confirmation information from result pages.

The Browser-Use library provides a foundation for this. It handles DOM extraction (converting page structure to a format the agent can understand), element identification (finding buttons, forms, and interactive elements), and action execution (clicking, typing, scrolling, waiting).

The agent receives a structured representation of the page including visible text, interactive elements with identifiers, form fields with current values, and navigation options. Based on this representation, the agent decides what action to take: click a specific element, type into a field, scroll to reveal more content, or wait for a page to load.
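As a rough sketch, driving a booking flow with Browser-Use can be as simple as handing the agent a task description. The constructor arguments below follow the library's documented quickstart and may differ across versions; the URL is a placeholder.

```python
# A minimal sketch of a browser-automated booking using the Browser-Use Agent.
import asyncio
from browser_use import Agent
from langchain_openai import ChatOpenAI

async def book_via_browser(venue_url: str, date: str, time: str, party_size: int):
    task = (
        f"Go to {venue_url}, find the reservation widget, and book a table "
        f"for {party_size} people on {date} at {time}. Stop and report back "
        f"before entering any payment details."
    )
    agent = Agent(task=task, llm=ChatOpenAI(model="gpt-4o"))
    history = await agent.run()        # executes the navigate/click/type loop
    return history.final_result()      # extracted confirmation text, if any

if __name__ == "__main__":
    asyncio.run(book_via_browser("https://example.com/restaurant", "2025-01-11", "19:00", 4))
```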

Browser automation is more fragile than API integration for several reasons. Sites change layouts, breaking element selectors. A/B tests mean different users see different interfaces. Anti-bot measures detect and block automated access. JavaScript-heavy sites may not render properly in headless mode. CAPTCHAs require human intervention or solving services.

Strategies for improving browser automation reliability include using semantic selectors (button containing "Reserve" rather than specific CSS classes), implementing retry logic with re-evaluation of page state, maintaining fallback selectors for common patterns, monitoring for failures and alerting when success rates drop, and having human fallback for cases automation can't handle.

The Hybrid Routing Approach

The hybrid approach uses MCP servers where APIs are available and falls back to browser automation otherwise. This combines the reliability of API integration with the universality of browser automation.

A routing layer determines which approach to use for each action. The router maintains a registry of supported services and their integration methods. When an action is requested, the router looks up the target service and routes to the appropriate handler.

Routing logic considers several factors. Is there an API integration available? Route to MCP. Is there a tested browser automation flow? Route to browser automation. Is this an unknown service? Attempt generic browser automation with lower confidence. Has this service recently failed via browser? Alert for human review.
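A minimal routing-registry sketch might look like this; service names and handler callables are illustrative placeholders.

```python
# A minimal sketch of the routing registry and API-to-browser failover described above.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ServiceRoute:
    mcp_handler: Optional[Callable] = None       # preferred: API-backed MCP tool
    browser_handler: Optional[Callable] = None   # fallback: tested browser flow

def generic_browser_attempt(action: dict):
    """Hypothetical lower-confidence fallback for unknown services."""
    ...

REGISTRY: dict[str, ServiceRoute] = {
    "opentable": ServiceRoute(mcp_handler=lambda action: ...,
                              browser_handler=lambda action: ...),
    "smallvenue.example": ServiceRoute(browser_handler=lambda action: ...),
}

def route_action(service: str, action: dict):
    route = REGISTRY.get(service)
    if route is None:
        return generic_browser_attempt(action)
    if route.mcp_handler is not None:
        try:
            return route.mcp_handler(action)
        except Exception:
            if route.browser_handler is not None:   # failover: API -> browser
                return route.browser_handler(action)
            raise
    return route.browser_handler(action)
```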

For services where the API is limited, use API for what's available and browser for the rest. OpenTable's API might support availability search but not special requests—use API for availability, browser for completing reservations with special requests.

The routing layer also handles failover. If an API call fails, the system can fall back to browser automation for that action. If browser automation fails repeatedly, the system can escalate to human handling or inform the user that automatic booking isn't available.

Conversation Management

Effective concierge interactions require sophisticated conversation management that tracks context, manages multi-turn flows, and handles interruptions gracefully.

State Tracking

The conversation state must track several things to support coherent multi-turn interactions.

Current task state tracks what the user is trying to accomplish right now. This might be "searching for restaurants", "booking at Coqueta", or "browsing events for Saturday". The task state affects how follow-up messages are interpreted.

Gathered information tracks what we know so far. For a booking task, this includes venue, date, time, party size, and any special requests. For a search, this includes search criteria that should be maintained across refinements.

Entity references track mentioned entities for later reference. When the user says "the second one" or "that Italian place", the system needs to resolve these references to specific entities.

Action status tracks in-progress and completed actions. If a booking is in progress, the state tracks where we are in the flow. Completed actions are recorded for reference and potential modification.

Conversation history provides context for understanding new messages. The LLM needs access to recent messages to understand references and maintain coherence.
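Expressed as a data structure, the state described above might be a plain serializable object along these lines; the field names are illustrative.

```python
# A minimal sketch of conversation state that can be persisted between turns.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ConversationState:
    current_task: Optional[str] = None                      # e.g. "booking:coqueta"
    gathered_slots: dict = field(default_factory=dict)      # venue, date, time, party_size...
    entity_references: list = field(default_factory=list)   # entities shown, in order ("the second one")
    pending_actions: list = field(default_factory=list)     # in-progress bookings and their current step
    completed_actions: list = field(default_factory=list)   # confirmations kept for later reference
    history: list = field(default_factory=list)             # recent messages passed to the LLM
```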

Slot Filling for Actions

Many actions require multiple pieces of information. Booking a restaurant requires venue, date, time, and party size at minimum. More complex bookings might need guest name, contact information, seating preferences, and special requests.

The system must track which slots are filled, prompt for missing information naturally, and handle partial information. Slot filling should feel conversational, not like filling out a form.
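A minimal sketch of slot tracking: detect which required slots are still empty and fold them into a single conversational question. Slot names and phrasing are illustrative.

```python
# A minimal sketch of slot tracking and a combined clarification prompt.
REQUIRED_SLOTS = ["venue", "date", "time", "party_size"]

SLOT_QUESTIONS = {
    "venue": "which restaurant",
    "date": "what day",
    "time": "what time",
    "party_size": "how many people",
}

def missing_slots(filled: dict) -> list[str]:
    return [slot for slot in REQUIRED_SLOTS if not filled.get(slot)]

def clarification_prompt(filled: dict) -> str | None:
    """Combine all missing slots into one conversational question."""
    gaps = missing_slots(filled)
    if not gaps:
        return None
    parts = [SLOT_QUESTIONS[g] for g in gaps]
    return "Just to confirm the booking: " + ", and ".join(parts) + "?"
```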

Proactive slot filling infers information when possible. If the user says "book dinner for my wife and me", party size is 2. If they say "this Saturday evening", date is the upcoming Saturday and time is approximately 6-9 PM. If they've been searching for Italian restaurants, cuisine preference is established.

Multi-slot prompts feel more natural than one-by-one questioning. Rather than asking "What date?" then "What time?" then "How many people?", combine: "What day were you thinking, and how many people?" Let users provide information in any order: "Saturday, 7pm, four of us" fills three slots at once.

Handling ambiguity requires clarification. "Next Friday" is ambiguous near week boundaries. "Evening" could mean 5pm or 9pm depending on context. When ambiguity affects the action, ask for clarification. When it doesn't materially matter, make a reasonable choice and be transparent about it.

Slot modification handles changes. The user might say "Actually, make that 6 people" after providing party size. The system should update the slot and confirm the change, not restart the entire flow.

Confirmation and Feedback

Before taking irreversible actions, the system should confirm with the user. This is especially important for actions with financial implications (ticket purchases) or social commitments (reservations).

Confirmation should summarize what the system is about to do in clear terms. "I'm going to book a table for 4 at Coqueta on Saturday, January 11th at 7:00 PM. The reservation will be under John Smith. Should I proceed?"

The confirmation includes all relevant details so the user can verify correctness. It should be clear what happens if they confirm—the action is taken, and they're committed to the reservation. Users should be able to modify details at this stage: "Yes, but change it to 7:30" should work without restarting.

After actions complete, feedback should be immediate and clear. Success messages include confirmation numbers, relevant details, and any next steps: "Your reservation is confirmed! Confirmation number: R12345. They'll hold the table for 15 minutes past your reservation time."

Failure messages explain what went wrong and suggest alternatives. "Sorry, that time is no longer available. I found tables at 6:30 PM and 8:00 PM instead. Would either of those work?"

Handling Interruptions and Context Switches

Users change their minds mid-conversation. A user might start booking one restaurant, then ask about another, then return to the first. The system must handle these context switches gracefully.

Context switch detection recognizes when the user is changing topics versus continuing the current task. "Actually, what about Italian places?" while booking at a Spanish restaurant suggests a context switch. "Do they have vegetarian options?" while booking is a question about the current task.

State preservation saves progress when switching contexts. If the user was halfway through booking restaurant A and asks about restaurant B, the state for booking A should be preserved. If they decide to return to A, they shouldn't have to restart.

Return detection recognizes when the user wants to resume a previous task. "Let's go with the first one" after looking at alternatives returns to a previous context. "Never mind, book Coqueta" explicitly returns to a prior booking task.

Abandoned task handling decides when to discard preserved state. A task abandoned for hours is unlikely to be resumed. A task abandoned for a different booking can probably be discarded. Explicit cancellation ("forget about that") clears state immediately.

Multi-Turn Flow Management

Complex tasks span multiple turns with dependencies between steps. Booking dinner and a show requires finding restaurants, selecting one, finding nearby theaters, selecting a show, and potentially booking both with timing coordination.

Flow orchestration manages these multi-step processes. The system tracks the overall goal, current step, completed steps, and dependencies. Each step might itself be multi-turn (selecting a restaurant involves search, refinement, and final selection).

Dependency handling ensures steps happen in the right order. You need to select a restaurant before checking availability. You need availability before booking. Some dependencies are flexible—you might book dinner before or after booking show tickets.

Parallel execution handles independent steps simultaneously. While checking restaurant availability, the system can also search for shows. This reduces overall latency for complex requests.

Failure handling in multi-step flows is nuanced. If restaurant booking fails after show tickets are purchased, what happens? The system should either roll back the entire flow or clearly communicate the partial state to the user.

Tool Design Patterns

Effective tools follow patterns that make them reliable and composable.

Idempotency and Safety

Tools that modify state should be designed for safety. Idempotent tools can be called multiple times with the same effect—important when retrying after failures. If a network error occurs after a booking is made but before confirmation is received, retrying the booking shouldn't create a duplicate reservation.

Achieving idempotency often requires server-side support. The client provides an idempotency key (a unique identifier for the request). If the server sees the same key again, it returns the previous result rather than processing again. For APIs without idempotency support, the client must track in-flight requests and detect duplicates.
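A minimal client-side sketch: derive a stable idempotency key from the booking details and reuse it on retries. The `do_booking` callable is a hypothetical stand-in for the underlying booking call.

```python
# A minimal sketch of client-side idempotency: a stable key per logical request,
# plus a local cache of completed results for APIs without native support.
import hashlib
import json

def idempotency_key(user_id: str, action: str, params: dict) -> str:
    payload = json.dumps({"user": user_id, "action": action, "params": params},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

_completed: dict[str, dict] = {}   # key -> previous result

def book_with_idempotency(user_id: str, params: dict, do_booking) -> dict:
    key = idempotency_key(user_id, "make_reservation", params)
    if key in _completed:                                   # retry of a request we already finished
        return _completed[key]
    result = do_booking(params, idempotency_key=key)        # pass through if the API supports it
    _completed[key] = result
    return result
```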

Non-destructive tools avoid irreversible actions without explicit confirmation. Rather than a single book_and_charge tool, separate tools for reserve (holdable) and confirm (commits) enable user verification before irreversible commitment.

For booking tools, this means separating "hold" from "confirm" where possible. A hold_reservation tool temporarily reserves a slot without committing. A confirm_reservation tool finalizes the booking. This allows the agent to secure availability while waiting for user confirmation—critical when availability is limited and might disappear.

Error Handling and Recovery

Tools should return structured errors that the agent can act on. Rather than generic "booking failed", errors should indicate the type of failure, whether it's recoverable, and suggested next steps.

Error taxonomy for a concierge system includes several categories.

Availability errors indicate the requested slot is taken. These are common and expected. Response: suggest alternatives. The error should include what alternatives are available if known.

Authentication errors indicate credentials are invalid or expired. Response: prompt user to re-authenticate or provide credentials. The error should indicate which service needs authentication.

Validation errors indicate input doesn't meet requirements. Response: explain what's wrong and what valid input looks like. Example: party size must be 1-20, time must be during business hours.

Rate limit errors indicate too many requests to the external service. Response: wait and retry, or inform user of delay. The error should include retry timing if known.

Transient errors indicate temporary failures that might succeed on retry. Response: automatic retry with backoff. After multiple failures, inform user and offer manual alternatives.

Permission errors indicate the user isn't authorized for this action. Response: explain what's needed. Example: premium events require a verified account.

Service errors indicate the external service is having problems. Response: inform user, suggest trying later. Monitor for widespread issues.

Each error type maps to different agent behavior. The agent should understand the error taxonomy and take appropriate action—not just report the error to the user.
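Expressed as structured data the agent can branch on, the taxonomy might look like this sketch; the field names and mapped behaviors are illustrative.

```python
# A minimal sketch of structured tool errors and the behavior each category maps to.
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class ErrorCategory(str, Enum):
    AVAILABILITY = "availability"
    AUTHENTICATION = "authentication"
    VALIDATION = "validation"
    RATE_LIMIT = "rate_limit"
    TRANSIENT = "transient"
    PERMISSION = "permission"
    SERVICE = "service"

@dataclass
class ToolError:
    category: ErrorCategory
    message: str
    recoverable: bool
    retry_after_seconds: Optional[int] = None        # set for rate-limit errors
    alternatives: list = field(default_factory=list)  # e.g. other available time slots

def next_step(error: ToolError) -> str:
    """Map an error category to the agent behavior described above."""
    if error.category == ErrorCategory.AVAILABILITY:
        return "offer_alternatives"
    if error.category == ErrorCategory.AUTHENTICATION:
        return "prompt_reauthentication"
    if error.category == ErrorCategory.TRANSIENT:
        return "retry_with_backoff"
    if error.category == ErrorCategory.RATE_LIMIT:
        return "wait_and_retry"
    return "explain_to_user"
```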

Composability

Complex actions often require multiple tools working together. Booking dinner and a show requires finding restaurants, checking availability, finding nearby theaters, checking show times, and potentially booking both.

Tools should be composable—the output of one tool serves as input to another. This requires consistent data formats for entities. A restaurant entity includes a standard identifier, name, location, and other attributes. The same format is returned by search tools and accepted by availability tools.

Entity references enable multi-tool workflows. When a search returns restaurants, each includes an identifier. The availability tool accepts that identifier. The booking tool references the slot from availability. This chain works because the entity format is consistent.

Batch operations improve efficiency for common patterns. Rather than checking availability at five restaurants sequentially, a batch_check_availability tool takes multiple restaurants and returns all results. This reduces latency and simplifies agent logic.

Observability

Tools should emit events that enable monitoring and debugging. Comprehensive logging captures tool invocations with parameters, execution time and stages, external API calls made, responses received, errors encountered, and retry attempts.

Structured logging enables analysis. Each log entry includes tool name, invocation ID (for correlation), timestamp, parameters (sanitized of PII), outcome (success/failure), duration, and any error details.
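A minimal sketch of such a log entry, emitted as one JSON line per tool call; which fields count as PII to strip will depend on your tools.

```python
# A minimal sketch of structured logging for tool invocations.
import json
import time
import uuid

def log_tool_call(tool: str, params: dict, outcome: str,
                  duration_ms: float, error: str | None = None) -> None:
    entry = {
        "tool": tool,
        "invocation_id": str(uuid.uuid4()),   # correlate with downstream API logs
        "timestamp": time.time(),
        "params": {k: v for k, v in params.items()
                   if k not in {"guest_email", "guest_phone"}},  # sanitize PII
        "outcome": outcome,                   # "success" | "failure"
        "duration_ms": duration_ms,
        "error": error,
    }
    print(json.dumps(entry))                  # or ship to your log pipeline
```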

Metrics aggregation tracks tool health over time. Key metrics include success rate by tool, latency percentiles, error rate by type, external API usage, and cost per operation.

Alerting triggers on anomalies. If restaurant booking success rate drops from 95% to 60%, something has changed—maybe an API change or site update. Early detection enables quick response.

This observability is crucial for understanding agent behavior, debugging failures, improving tool reliability over time, and maintaining user trust.

Location and Personalization

A local concierge must understand location and user preferences to provide relevant results.

Location Handling

Location can come from multiple sources with different reliability and precision.

Explicit query location is most reliable: "restaurants in the Mission" clearly specifies the area. Parse location references from the query and geocode to coordinates or boundaries.

User profile locations provide defaults. Home and work addresses enable "restaurants near home" or "quick lunch near work". Saved favorite locations support "near the gym" or "close to mom's place".

Device location (with permission) provides current position. Useful for "restaurants near me" when the user is mobile. Requires permission handling and privacy considerations.

Inferred location uses context when no location is specified. Time of day suggests home versus work. Recent queries suggest area of interest. Previous bookings suggest frequented neighborhoods.

The system should handle location at different granularities. Specific addresses support directions and precise proximity ("5-minute walk"). Neighborhoods support discovery queries ("restaurants in SoMa"). Cities support trip planning ("best pizza in Chicago"). Regions support broader exploration ("wine country restaurants").

Ambiguous location references need resolution. "Near work" requires knowing where the user works. "Downtown" means different areas in different cities. "That neighborhood with all the galleries" requires understanding local geography. "Where we went last time" requires conversation history.

Location parsing handles various formats. Street addresses, neighborhood names, landmarks ("near the ballpark"), relative locations ("10 minutes from here"), and colloquial references ("the Haight") all need translation to searchable location data.

User Preference Learning

Preferences shape both retrieval and action. For retrieval, preferences filter and rank results—a vegetarian user sees vegetarian-friendly restaurants ranked higher. For actions, preferences provide defaults—preferred party size, dietary restrictions communicated to restaurants, seating preferences.

Explicit preferences come from user-stated information. The user says "I'm vegetarian" or sets preferences in a profile. These are highly reliable but require user effort.

Inferred preferences emerge from behavior. The user consistently chooses Italian restaurants—they probably prefer Italian. They always book for 2—that's likely their default party size. They often ask about outdoor seating—factor that into results.

Preference confidence varies. Stated preferences are certain. Inferred preferences have confidence levels based on consistency and recency. Low-confidence inferences should be applied gently or confirmed before acting.

Preference application should be transparent. "I'm showing you Italian restaurants since you've enjoyed them before" lets users understand the system's reasoning. "Would you like me to mention your shellfish allergy when booking?" confirms before sharing personal information.

Preference conflicts require resolution. The user prefers vegetarian restaurants but is booking for a group where others eat meat. The system should recognize this tension and optimize for the group, perhaps highlighting vegetarian options at meat-friendly restaurants.

Time Awareness

Time context affects most queries. "Restaurants open now" requires knowing the current time and day. "Events this weekend" requires understanding date references. "Brunch spots" implies morning/early afternoon timing.

Temporal expressions need parsing and resolution. "Tonight" means different times at 2pm versus 10pm. "Next Friday" is ambiguous near week boundaries. "In a couple hours" is relative to now. "Valentine's Day" requires knowing the date.

Operating hours filtering uses time context. A midnight query for food should filter for late-night options. A Sunday morning query should check Sunday hours (which often differ). Holiday hours require special handling.

Time-based ranking surfaces timely options. Events happening soon might rank higher (urgency). Restaurants with available reservations rank higher (actionability). Places that are about to close rank lower (limited utility).

Availability timing affects action feasibility. You can't book a table for a time that's passed. Event tickets might not be available close to the event. Cancellation policies depend on how close to the reservation time. The system should handle these constraints gracefully, explaining limitations rather than just failing.

Multi-Agent Coordination

Complex concierge tasks often benefit from specialized agents working together.

Agent Specialization

Different aspects of the concierge task benefit from different expertise. A restaurant specialist agent understands cuisines, dining occasions, neighborhood dining scenes, and restaurant-specific considerations. An events specialist understands event types, venues, ticket purchasing patterns, and scheduling. A logistics agent handles timing, transportation, and coordination.

Specialized agents can be implemented as separate prompts with tailored instructions, separate models fine-tuned for specific domains, or separate retrieval indices with domain-specific data. The right approach depends on the complexity of each domain and available resources.

Coordination Patterns

When multiple agents contribute to a task, coordination ensures coherent results.

Orchestrator pattern: A master agent receives the user request, decomposes it into subtasks, delegates to specialist agents, and synthesizes their outputs. "Plan date night" might delegate restaurant search to the dining agent, activity search to the events agent, and timing coordination to logistics.

Critique pattern: One agent proposes, another critiques. The dining agent suggests a restaurant; the logistics agent notes it's far from the planned activity; the dining agent suggests an alternative nearby.

Consensus pattern: Multiple agents independently evaluate options and converge on recommendations. Each brings different perspectives, and the aggregated ranking reflects multiple considerations.

Handoff Management

Conversation handoff between agents must be seamless for users. The user shouldn't notice when they're talking to a different specialized agent. Context must transfer completely. The receiving agent needs full conversation history, user preferences, current task state, and prior agent conclusions.

Testing and Evaluation

Concierge systems require comprehensive testing across both retrieval and action capabilities.

Retrieval Evaluation

Retrieval quality measures how well the system finds relevant places and events. Standard metrics include precision at K (how many top K results are relevant), recall (what fraction of relevant options are found), NDCG (how well-ranked are the results), and diversity (do results cover different options).
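For reference, precision@K and NDCG@K over binary relevance judgments are only a few lines each; the sketch below assumes place IDs as the unit of comparison.

```python
# A minimal sketch of precision@K and NDCG@K over binary relevance judgments.
import math

def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    top = ranked_ids[:k]
    return sum(1 for pid in top if pid in relevant_ids) / k

def ndcg_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    dcg = sum(1.0 / math.log2(i + 2)
              for i, pid in enumerate(ranked_ids[:k]) if pid in relevant_ids)
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```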

Test sets include queries with known relevant results. These might come from editorial judgments ("best Italian restaurants in SoMa" with expert-selected answers), user feedback (queries where users found satisfactory results), and synthetic tests (generate queries that should match specific places).

Evaluation should cover various query types: specific searches ("quiet coffee shops for working"), broad discovery ("things to do this weekend"), preference-heavy queries ("vegetarian-friendly with great cocktails"), and location variations (different neighborhoods, cities).

Action Evaluation

Action reliability measures how often actions complete successfully. Track success rate by action type (search availability, make reservation, buy tickets), by service (OpenTable versus Resy), by integration type (API versus browser automation), and over time (detecting regressions).

End-to-end testing exercises complete flows in test environments. Create test reservations, verify they appear in the booking system, cancel them. Purchase test tickets (in sandbox mode) and verify the purchase completed. These tests catch integration issues that unit tests miss.

Staging environments mirror production for testing without affecting real bookings. Some services offer sandbox APIs for testing. For browser automation, test against staging sites when available or carefully against production with immediate cancellation.

Conversation Evaluation

Conversation quality affects user experience beyond retrieval and action success. Evaluate slot filling efficiency (how many turns to gather needed information), clarification appropriateness (does the system ask when needed, not when obvious), error recovery (does the system handle failures gracefully), and context handling (does the system understand references and maintain state).

User studies provide qualitative evaluation. Have users complete realistic tasks and gather feedback on the experience. What confused them? What felt effortful? What delighted them? This qualitative data guides improvements that metrics might miss.

Production Deployment

Deploying a local concierge system at scale requires attention to several operational concerns.

Authentication and Credentials

Agentic actions often require user credentials for booking platforms. The system must handle credentials securely throughout their lifecycle.

Credential storage uses encryption at rest, access controls, and secure retrieval. Never store credentials in conversation logs or error messages. Use proper secrets management (HashiCorp Vault, AWS Secrets Manager, or similar).

OAuth flows handle services that support them. The system redirects users to authenticate with the service, receives tokens, and manages refresh. Token storage must be secure, and refresh flows must be reliable.

Scope management ensures credentials are used only for their intended purpose. A credential granted for booking shouldn't be used for profile modification. Principle of least privilege applies.

Session management tracks active sessions across services. Sessions expire; the system should handle re-authentication gracefully rather than failing mid-task.

User-provided credentials require special handling. Some users may want to use their own API keys or accounts. Support this while maintaining security—never log or store these credentials beyond encrypted storage.

Rate Limiting and Quotas

External APIs have rate limits that the system must respect.

Limit tracking monitors API usage against limits. Each API has different limits (requests per second, per minute, per day) and different consequences for exceeding them (errors, temporary blocks, account suspension).

Request queuing handles limit approaches gracefully. Rather than failing immediately, queue requests and process them as budget allows. Priority queuing ensures user-facing requests take precedence over background tasks.
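A minimal sketch of that behavior is a token bucket in front of each external API, refilled at the provider's allowed rate so requests wait instead of failing outright; the limits below are illustrative.

```python
# A minimal token-bucket sketch: requests wait for capacity rather than erroring.
import asyncio
import time

class TokenBucket:
    def __init__(self, rate_per_second: float, burst: int):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = time.monotonic()

    async def acquire(self) -> None:
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            await asyncio.sleep((1 - self.tokens) / self.rate)   # wait for the next token

places_limiter = TokenBucket(rate_per_second=10, burst=20)       # illustrative limits

async def limited_call(api_call, *args, **kwargs):
    await places_limiter.acquire()
    return await api_call(*args, **kwargs)
```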

Graceful degradation handles exhausted limits. If Google Places is rate-limited, fall back to cached data or alternative sources. Communicate limitations to users: "I'm getting a lot of traffic right now, so this might take a moment."

Quota allocation distributes limited resources fairly. If there are 10,000 daily API calls and 1,000 users, how are calls allocated? Per-user quotas, priority for active conversations, and background versus foreground priorities all matter.

Browser automation faces implicit limits. Too many requests trigger anti-bot measures. Implement reasonable delays between requests, rotate IP addresses if needed, respect robots.txt, and back off when blocked.

Cost Management

API calls cost money, and costs can grow quickly at scale.

Per-request costs vary by API and operation. Google Places charges differently for basic versus detailed place information. Booking APIs may charge per search, per booking, or both. Track costs by API, by operation type, and by user.

Caching reduces costs for read operations. A cached search result costs nothing beyond storage. Aggressive caching for slowly-changing data (place information, business hours) dramatically reduces API costs.

Compute costs include running browsers for automation. Headless browsers consume memory and CPU. Container costs scale with usage. Optimize by sharing browsers where possible and terminating idle browsers promptly.

Cost attribution tracks costs to user actions. This enables usage-based pricing, identifies expensive operations, and informs optimization priorities. Cost per successful booking is a key metric.

Consider tiered access where premium features require payment. Heavy users might pay more than casual users. API-intensive features might be premium. This aligns costs with value delivered.

Monitoring and Alerting

Production systems need comprehensive monitoring across all components.

Availability monitoring tracks whether the system is operational. Can users connect? Are queries being processed? Are actions completing? Basic health checks catch catastrophic failures.

Latency monitoring tracks how long operations take. P50, P95, and P99 latencies for searches and actions. Alert on latency degradation that affects user experience.

Success rate monitoring tracks operation outcomes. What fraction of searches return useful results? What fraction of booking attempts succeed? Segment by service, by action type, by time.

Error monitoring captures and categorizes errors. Which errors are common? Are error rates increasing? What's the root cause distribution? This drives debugging and prioritization.

Specific alerts for a concierge system include booking failure rate spike (something changed in a booking flow), API error rate increase (service issue or credential problem), browser automation failure patterns (site may have changed), latency degradation (infrastructure or API issues), and cost anomaly (unexpected API usage or runaway compute).

On-call procedures ensure issues are addressed promptly. Define severity levels, escalation paths, and response expectations. Major booking systems failing during peak hours is higher severity than minor features degraded during off-hours.

Compliance and Privacy

Location data and booking history are sensitive information requiring careful handling.

Data retention policies define how long data is kept. Conversation logs, booking history, location data—each has appropriate retention based on user value and privacy risk. Users should be able to see and control what's retained.

Deletion requests must be honored promptly. When a user requests data deletion (GDPR, CCPA, or good practice), actually delete the data. This includes backups and derived data, not just primary stores.

Consent and transparency inform users about data usage. What data is collected? How is it used? Who is it shared with? Users should understand and consent to data practices.

Third-party data sharing requires special care. When booking, user information is shared with restaurants and services. The user should understand what's shared. Minimize sharing to what's necessary. Don't share data with third parties beyond what's required for the requested action.

Cross-border data considerations apply for international services. Where is data stored? Does it cross borders? Different jurisdictions have different requirements.

Scalability Considerations

Growth requires systems that scale efficiently.

Stateless design enables horizontal scaling. Conversation state stored externally (Redis, database) allows any server to handle any request. This simplifies scaling and failover.

Connection pooling manages external API connections efficiently. Don't open a new connection for every request. Pool and reuse connections to external services.

Queue-based architecture handles traffic spikes. Rapid increases in requests queue rather than overwhelming backends. Workers process queued requests as capacity allows.

Geographic distribution reduces latency for users in different regions. Deploy in multiple regions, routing users to nearby instances. Consider data residency requirements.


Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
