Agent Architecture and Integration: How Visitor-Aware Design Connects to Your AI Infrastructure


The platform decides when and why. The agent backend decides what to say. These are separate concerns with separate requirements.


Abstract

The conversational agent in a Visitor-Aware site is its most visible intelligence component and its most significant operational cost. It is also the component with the widest range of integration requirements: some organizations are content with a hosted LLM answering questions from their website content; others need retrieval-augmented generation against proprietary databases; others require fine-tuned models for specialized domains; and regulated industries may require that no conversation data leaves their infrastructure. This paper defines the agent architecture for Visitor-Aware Design: how the platform separates the decision to engage (always platform-controlled) from the conversation itself (pluggable, configurable, and tier-appropriate), and how this architecture accommodates everything from a standard hosted agent to a fully self-hosted, client-controlled LLM.

Part I: The Two Layers of Agent Intelligence

Layer 1: Engagement Intelligence (Platform)

The platform controls when to engage the agent, what context to provide, and how to use the outcome. This layer is the visitor-aware architecture — it cannot be removed or replaced without losing the paradigm:

When to engage. The agent does not appear for every visitor. The platform's journey detection and behavioral analysis determine when a conversational interaction would serve the visitor. Signals include:

  • Journey type (evaluation and decision journeys are agent-appropriate; discovery and exploration are not)
  • Engagement depth (deep readers who have consumed multiple content pieces are ready; quick scanners are not)
  • Return frequency (a third-visit evaluator is more agent-appropriate than a first-time browser)
  • Behavioral intent signals (pricing page engagement, comparison behavior, search queries indicating specific needs)
  • Time and context (a mobile visitor at 11pm may prefer a different interaction than a desktop visitor at 2pm)
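The signals above can be combined into a simple scoring sketch. The signal names, weights, and threshold below are illustrative assumptions, not the platform's actual engagement model:

```python
# Illustrative engagement-scoring sketch. Signal names, weights, and the
# threshold are hypothetical assumptions, not the platform's actual model.

AGENT_APPROPRIATE_JOURNEYS = {"evaluation", "decision"}

def engagement_score(visitor: dict) -> float:
    """Combine behavioral signals into a 0..1 engagement score."""
    score = 0.0
    if visitor.get("journey_type") in AGENT_APPROPRIATE_JOURNEYS:
        score += 0.4
    # Deep readers (several content pieces consumed) are agent-ready.
    score += min(visitor.get("pieces_read", 0), 5) * 0.06
    # Returning evaluators rate higher than first-time browsers.
    if visitor.get("visit_number", 1) >= 3:
        score += 0.2
    # Explicit intent signals: pricing views, comparisons, targeted searches.
    score += 0.1 * len(visitor.get("intent_signals", []))
    return min(score, 1.0)

def should_engage(visitor: dict, threshold: float = 0.6) -> bool:
    return engagement_score(visitor) >= threshold
```

A third-visit evaluator with deep reading and a pricing-page signal clears the threshold easily; a first-time discovery browser never does, which is the point: most visitors generate no agent cost at all.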

What context to provide. When the agent engages, the platform provides the visitor's complete behavioral context:

  • Pages visited and time spent on each
  • Content read carefully vs. skimmed
  • Detected journey type
  • Inferred visitor profile (industry, role, organization size, decision stage)
  • Search queries made on the site
  • Previous visit history (if returning visitor)
  • Any previous agent conversations in prior sessions

This context enables the agent to open with relevance rather than a generic greeting. "I see you've been looking at our enterprise training programs, particularly for manufacturing teams" is possible because the platform provides the behavioral summary to the agent.
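A behavioral context payload of this shape might look as follows. The field names are illustrative, not the platform's published schema:

```python
# Hypothetical behavioral-context payload handed to the agent at
# conversation start. Field names are illustrative, not the real schema.
visitor_context = {
    "journey_type": "evaluation",
    "pages": [
        {"path": "/enterprise-training", "seconds": 210, "read_depth": "careful"},
        {"path": "/case-studies/manufacturing", "seconds": 95, "read_depth": "careful"},
    ],
    "inferred_profile": {
        "industry": "manufacturing",
        "role": "director",
        "org_size": "enterprise",
        "decision_stage": "comparison",
    },
    "site_searches": ["enterprise training pricing"],
    "visit_number": 3,
    "prior_conversations": [],
}

def opening_summary(ctx: dict) -> str:
    """Turn the payload into the one-line summary the agent opens with."""
    top = ctx["pages"][0]["path"].strip("/").replace("-", " ")
    industry = ctx["inferred_profile"]["industry"]
    return f"I see you've been looking at our {top}, particularly for {industry} teams"

# opening_summary(visitor_context)
# → "I see you've been looking at our enterprise training, particularly for manufacturing teams"
```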

How to use the outcome. After the conversation, the platform:

  • Captures the full conversation as a structured artifact
  • Extracts stated needs, questions, and intent signals
  • Updates the visitor model with conversation-derived data
  • Routes qualified leads with the behavioral journey + conversation summary to the appropriate human
  • Feeds conversation patterns back into the engagement intelligence to improve future timing and context

Layer 2: Conversational Intelligence (Pluggable)

The agent backend — the model that generates responses, the knowledge base it draws from, and the infrastructure it runs on — is a pluggable component. The platform provides a standard integration interface; the backend can be swapped without changing the platform architecture.

This separation is fundamental. The platform's value is in knowing when and why to engage. The agent backend's value is in knowing what to say. These are different competencies with different requirements.
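One way to express this separation in code is a backend interface the platform calls regardless of tier. This is a sketch with hypothetical class and method names, not the platform's published API:

```python
# Sketch of a pluggable agent-backend interface. Class and method names
# are assumptions for illustration, not the platform's published API.
from abc import ABC, abstractmethod

class AgentBackend(ABC):
    """The one thing every tier must implement; the platform supplies the rest."""

    @abstractmethod
    def respond(self, visitor_context: dict, conversation: list) -> str:
        """Generate the next agent turn from visitor context + history."""

class HostedBackend(AgentBackend):    # Tier 1/2: platform-hosted LLM (+ optional RAG)
    def respond(self, visitor_context, conversation):
        return "..."  # call the platform's hosted model

class RemoteBackend(AgentBackend):    # Tier 3/4: client model behind an API endpoint
    def __init__(self, endpoint_url: str):
        self.endpoint_url = endpoint_url
    def respond(self, visitor_context, conversation):
        return "..."  # POST the context payload to the client's endpoint
```

Swapping tiers is then a configuration change, not an architectural one: the engagement intelligence calls `respond()` the same way whether the model runs on the platform or inside a client's network.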

Part II: The Four Tiers

Tier 1: Standard (Platform-Hosted)

How it works: The platform hosts the LLM. The client's website content is automatically indexed into a knowledge base. The agent answers visitor questions, qualifies intent, and routes to humans — all from the site's own content.

Knowledge source: Client's website content (pages, blog posts, case studies, whitepapers, FAQs) — automatically crawled and indexed by the platform. No client effort required beyond having the content on the site.

Best for: Most organizations. The site's existing content is sufficient for qualification conversations and common questions. The agent doesn't need to know things that aren't on the site.

Cost profile: Included in the platform subscription up to a defined conversation volume. The platform optimizes model selection (smaller models for FAQ-level queries, larger models for complex conversations) to control costs.

Example interaction:

Visitor (after reading enterprise training page and two case studies): "How long does the enterprise program take to implement?"

Agent (drawing from the site's delivery methods page): "Implementation typically takes 8-12 weeks depending on team size and customization requirements. I notice you've been looking at our manufacturing case studies — for manufacturing teams, we usually recommend the 12-week track because it includes industry-specific scenarios. Would you like to talk to someone who's done this for companies like yours?"

Tier 2: RAG-Enhanced (Proprietary Data)

How it works: The platform-hosted LLM is augmented with Retrieval-Augmented Generation against the client's proprietary data stores. When a visitor asks a question, the agent retrieves relevant information from the client's data before generating a response.

Knowledge sources: Everything in Tier 1, plus:

  • Product catalogs with detailed specifications, pricing, and availability
  • Internal knowledge bases and documentation
  • CRM data (anonymized) for organizational context
  • Case study databases with detailed outcome data
  • Policy documents, compliance information, eligibility criteria
  • FAQ databases maintained by support teams

Best for: Organizations with rich proprietary content that visitors need access to but that isn't published on the website. E-commerce with complex product lines. Professional services with detailed engagement models. Healthcare organizations with provider networks and insurance compatibility data. Government agencies with eligibility requirements and process documentation.

Technical requirements: The client provides access to their data via API, database connection, or document upload. The platform manages the vector store, embedding pipeline, and retrieval logic. The client maintains their data; the platform maintains the retrieval infrastructure.

Cost profile: Higher than Tier 1 due to retrieval operations and potentially larger context windows. Priced per conversation or as a tier upgrade.

Example interaction:

Visitor (on a healthcare site, after researching Type 2 diabetes): "Do you have endocrinologists near Austin who take Blue Cross?"

Agent (retrieving from provider database + insurance compatibility data): "Yes, we have three endocrinologists in the Austin area who accept Blue Cross Blue Shield. Dr. [Name] at our North Austin location specializes in newly diagnosed patients and has appointments available next week. Would you like me to help you schedule?"

Tier 3: Custom Model

How it works: The client provides a fine-tuned model or specifies model requirements. The platform manages engagement intelligence (when to engage, what context to provide) and the conversation infrastructure (session management, artifact capture, handoff). The client controls the model.

Knowledge sources: Whatever the client's model was trained on or has access to. The platform provides visitor context; the model provides domain expertise.

Best for:

  • Legal: Firms with specific terminology, precedent databases, and compliance requirements that general models handle poorly
  • Medical: Organizations requiring clinical accuracy and appropriate caveating that generic models cannot guarantee
  • Financial: Institutions with regulatory constraints on what an automated system can and cannot say about products and services
  • Defense/Government: Agencies with classification requirements or domain-specific knowledge

Technical requirements: The client provides a model endpoint (API) that accepts the platform's standard context payload and returns conversational responses. The platform handles everything else.

Cost profile: The client bears model hosting/inference costs. The platform charges for engagement intelligence and infrastructure.

Example interaction:

Visitor (on a financial services site, evaluating wealth management): "What's the difference between your discretionary and advisory services for accounts over $5M?"

Agent (using a model fine-tuned on the firm's product specifications and compliance-approved language): [Provides detailed comparison using exactly the terminology and disclaimers the firm requires, drawing from proprietary product documentation that a general model would not have access to]

Tier 4: Self-Hosted (Client Infrastructure)

How it works: The entire agent stack — model, inference, vector store, conversation data — runs on the client's infrastructure. No visitor conversation data leaves the client's network. The platform provides the engagement intelligence (when to engage, what context to provide) via API; the client's infrastructure handles everything else.

Knowledge sources: Entirely client-controlled. The platform has no access to the client's knowledge base or conversation content.

Best for:

  • Healthcare (HIPAA): Patient conversations may contain protected health information. No third-party processing acceptable.
  • Financial services (SOC 2, PCI): Client financial data and conversations must remain within the institution's security boundary.
  • Government (FedRAMP, classified): Citizen interactions with government services may be subject to data sovereignty requirements.
  • Legal (privilege): Attorney-client conversations routed through the site may be privileged.
  • Any organization with strict data residency requirements.

Technical requirements: The client deploys a model endpoint within their infrastructure that conforms to the platform's agent API contract. The platform sends visitor context (behavioral summary, journey type, inferred profile) to the client's endpoint; the client's model generates the response; the platform renders it in the visitor interface.

Cost profile: The client bears all inference and infrastructure costs. The platform charges for engagement intelligence, visitor modeling, and the platform itself. This is the most expensive tier from the client's perspective but provides the strongest data sovereignty guarantees.

Data flow:

Platform (visitor model, context) → API call to client's infrastructure
Client's infrastructure (model, knowledge base) → Response
Platform (renders response, captures metadata) → Visitor sees response

The platform captures conversation metadata (duration, turn count, outcome) for analytics and engagement optimization but does not capture conversation content. The content stays on the client's infrastructure.

Part III: What the Platform Always Controls

Regardless of tier, the platform provides:

  1. Engagement timing. When to surface the agent based on behavioral signals. No tier changes this — the platform decides when a conversation would serve the visitor.

  2. Visitor context. The behavioral summary provided to the agent at conversation start. The agent always knows what the visitor has done on the site, regardless of which backend generates the responses.

  3. Conversation orchestration. Session management, turn handling, handoff detection (when to route to a human), and graceful degradation (what happens if the model is unavailable).

  4. Outcome capture. Conversation metadata (timing, length, outcome classification) feeds the analytics pipeline regardless of tier. Content capture varies by tier and client preference.

  5. Engagement optimization. The platform learns from conversation outcomes across all tiers: which behavioral patterns lead to productive conversations, which lead to early abandonment, and how to improve the timing and context of future engagements.

This is the platform's core value. The agent backend is the voice. The platform is the judgment.

Part IV: The Cost Conversation

Why CEOs Ask About Cost

The conversational agent is the most visible AI component in a visitor-aware site. It is the feature that prospects and stakeholders immediately associate with "AI cost." The concern is legitimate: LLM inference at scale is not free. But the concern is usually framed wrong.

The Wrong Frame: "What does the AI cost?"

This frames the agent as a line-item expense, like hosting or software licenses. Under this frame, every agent conversation is a cost to be minimized.

The Right Frame: "What is the ROI of automated qualification?"

Agent conversations are not costs. They are automated sales and service functions that replace — or augment — human functions that cost far more:

Function                       | Human cost                           | Agent cost                                    | Multiplier
Lead qualification call (SDR)  | $50-150 per call                     | $0.50-5 per conversation                      | 10-100x cheaper
After-hours inquiry handling   | Lost (visitor leaves)                | Available 24/7 at marginal cost               | Infinite (previously zero)
Support question deflection    | $15-50 per call                      | $0.50-2 per interaction                       | 10-50x cheaper
Initial needs discovery        | 15-30 min of sales professional time | Captured from behavioral data + conversation  | Time cost → near zero

[REVIEW: Per-conversation cost estimates are based on current LLM API pricing (2026). Model costs are declining rapidly — these figures should be updated quarterly. The directional comparison with human costs is robust regardless of specific LLM pricing.]

The Natural Cost Curve

Agent costs in a visitor-aware system are naturally self-regulating:

  • 70-80% of visitors never interact with the agent (browsing, discovery, exploration). Cost: zero beyond baseline analytics.
  • 15-25% of visitors receive adapted experiences without agent engagement. Cost: minimal (client-side computation).
  • 3-8% of visitors have short agent interactions (1-3 turns, FAQ-level). Cost: small per interaction.
  • 1-3% of visitors have substantive qualification conversations. Cost: moderate per interaction, but these are the highest-value visitors.

The cost curve is shaped like the value curve: the visitors who cost the most to serve are the ones most likely to generate revenue. This is not an accident — it is a consequence of the engagement intelligence layer that controls when the agent surfaces.
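The distribution above can be checked with simple arithmetic. The per-interaction dollar figures below are illustrative assumptions (see the pricing caveat in the table note), chosen from the middle of the stated ranges:

```python
# Blended agent cost per 1,000 visitors under the distribution above.
# Per-interaction dollar figures are illustrative assumptions.
segments = [
    # (share of visitors, agent cost per visitor in this segment)
    (0.75, 0.00),   # never interact with the agent
    (0.20, 0.01),   # adapted experience only (client-side computation)
    (0.04, 1.00),   # short FAQ-level interactions
    (0.01, 4.00),   # substantive qualification conversations
]

visitors = 1000
total = sum(share * visitors * cost for share, cost in segments)
per_visitor = total / visitors
print(f"${total:.2f} total, ${per_visitor:.4f} per visitor")
# → $82.00 total, $0.0820 per visitor
```

Even with generous per-conversation costs, the blended figure stays at pennies per visitor, because the expensive conversations are confined to the 1-3% of visitors most likely to convert.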

Budget Controls

For organizations that need hard cost boundaries:

  • Monthly conversation budget. Configurable ceiling on agent conversations per month. When reached, visitors see an alternative engagement path (contact form, scheduling tool, callback request).
  • Tier-based routing. Short interactions use lighter (cheaper) models. Deep conversations escalate to more capable (more expensive) models. The platform manages this automatically based on conversation depth signals.
  • Peak/off-peak pricing. If using a platform-hosted model, inference costs vary by load. The platform can defer non-urgent interactions or use cached responses during peak periods.
  • Client-side cost dashboards. Real-time visibility into agent conversation volume, cost per conversation, and attributed outcomes. The client always knows what they're spending and what it's producing.
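Tier-based routing can be sketched as a simple selection function. The model names and the depth heuristic are illustrative assumptions, not the platform's actual routing logic:

```python
# Sketch of tier-based model routing: cheap models for shallow turns,
# capable models for deep conversations. Model names and the depth
# heuristic are illustrative assumptions.
def select_model(turn_count: int, qualification_signals: int) -> str:
    """Pick an inference tier from conversation-depth signals."""
    if turn_count <= 3 and qualification_signals == 0:
        return "small-faq-model"        # FAQ-level: lightest, cheapest
    if qualification_signals >= 2 or turn_count > 8:
        return "large-reasoning-model"  # substantive qualification
    return "mid-tier-model"             # everything in between
```

The escalation only happens when the conversation itself signals value, so the expensive model is reserved for the conversations most likely to justify it.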

Part V: RAG Integration in Depth

Why RAG Changes the Value Proposition

Standard Tier 1 agents draw from the client's published website content. This is sufficient for common questions and basic qualification. But the most valuable conversations — the ones that convert — often require information that is not on the website:

  • "What's the pricing for a team of 200?" (pricing data not published)
  • "Do you have experience with [specific industry niche]?" (case study details not on the site)
  • "Can your program integrate with our existing LMS?" (technical compatibility data)
  • "What's the timeline for a custom implementation?" (scoping data from past engagements)

RAG unlocks these conversations by giving the agent access to the client's proprietary knowledge while keeping that knowledge under the client's control.

Architecture

Visitor asks question
  → Platform provides visitor context + question to agent
  → Agent generates search queries from the question
  → Queries hit the client's vector store (proprietary data)
  → Relevant documents/chunks retrieved
  → Agent generates response using retrieved context + visitor context
  → Platform renders response and captures metadata
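The flow above can be sketched end to end. The vector store and model are stubbed, and all names are illustrative assumptions rather than the platform's actual interfaces:

```python
# End-to-end RAG turn following the flow above. The vector store and
# model are stubbed; all names are illustrative assumptions.
def rag_turn(question: str, visitor_context: dict, vector_store, model) -> str:
    # 1. Derive search queries from the visitor's question.
    queries = model.make_queries(question)
    # 2. Retrieve relevant chunks from the client's vector store.
    chunks = []
    for q in queries:
        chunks.extend(vector_store.search(q, top_k=4))
    # 3. Generate the response from retrieved context + visitor context.
    return model.generate(question=question, retrieved=chunks, visitor=visitor_context)

class StubModel:
    def make_queries(self, question):
        return [question]
    def generate(self, question, retrieved, visitor):
        return f"Answer to '{question}' using {len(retrieved)} retrieved chunks"

class StubStore:
    def search(self, query, top_k=4):
        return [f"chunk for {query}"] * top_k
```

In the client-hosted variant, `vector_store.search` becomes an API call across the network boundary and the retrieved chunks never persist on the platform side.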

Data Sovereignty

The client's proprietary data stays in the client's vector store. The platform queries it at inference time but does not copy, cache, or retain the retrieved documents. The vector store can be:

  • Platform-hosted but client-owned (the platform manages infrastructure; the client owns the data and controls access)
  • Client-hosted (the vector store runs on the client's infrastructure; the platform queries it via API)

This distinction matters for organizations with data classification requirements.

Keeping RAG Content Current

Proprietary data changes: prices update, products launch, staff changes, policies evolve. The RAG pipeline must handle this:

  • Automated sync. For structured data sources (CRM, product catalog, knowledge base), the platform provides connectors that re-index on a schedule or trigger.
  • Manual upload. For unstructured documents (PDFs, presentations, internal memos), the client uploads to the platform which processes and indexes them.
  • Freshness signals. The platform tracks when each document was last indexed. Agent responses from stale documents are flagged for review. [REVIEW: Staleness detection logic needs to be designed. The concept is sound but implementation details are TBD.]
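The freshness-signal idea, with detection details still to be designed, amounts to tracking a last-indexed timestamp per document and flagging anything past a window. A minimal sketch, with an assumed 30-day window:

```python
# Minimal sketch of freshness tracking for indexed documents. The
# staleness window and data shape are illustrative assumptions; the
# paper notes the full detection logic is still to be designed.
from datetime import datetime, timedelta

def stale_documents(index_times: dict, max_age_days: int = 30, now=None):
    """Return ids of documents whose last indexing exceeds the window."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=max_age_days)
    return [doc_id for doc_id, indexed_at in index_times.items() if indexed_at < cutoff]
```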

Part VI: Self-Hosted Models — The Regulated Industry Path

Why Self-Hosting Matters

For some organizations, the question is not "which model?" but "where does it run?" Regulated industries have constraints that no amount of platform security can satisfy:

  • HIPAA (healthcare): Patient conversations may reference symptoms, conditions, medications, or providers. Processing this data on third-party infrastructure requires Business Associate Agreements and creates compliance risk.
  • SOC 2 / PCI (financial services): Client financial data discussed in agent conversations must remain within the institution's security boundary.
  • FedRAMP (government): Government services may be subject to data sovereignty requirements that restrict processing to approved infrastructure.
  • Attorney-client privilege (legal): Conversations that touch legal matters may be privileged. Processing them on shared infrastructure could be argued to compromise privilege.

The Self-Hosted Architecture

The platform provides:

  • A documented agent API contract (request/response format, context payload schema)
  • SDKs for common infrastructure (Python, Node.js, Rust)
  • Reference implementations for popular model servers (vLLM, Ollama, TensorRT-LLM)
  • Monitoring and observability hooks (latency, error rates, availability)

The client deploys:

  • A model server within their infrastructure
  • A knowledge base / vector store (if RAG is needed)
  • An endpoint that conforms to the platform's agent API contract

The platform sends visitor context to the client's endpoint, receives the response, and renders it. The conversation content never touches the platform's infrastructure. The platform captures metadata (conversation duration, turn count, outcome classification) for engagement optimization.
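A client-side handler behind such an endpoint can be sketched framework-free. The request and response field names are hypothetical illustrations of what the documented API contract might specify, and the model call is stubbed:

```python
# Sketch of a client-side handler behind a self-hosted agent endpoint.
# Request/response field names are hypothetical illustrations of the
# platform's documented API contract; the model call is a stub.
def handle_agent_request(request: dict, generate_fn) -> dict:
    """Receive visitor context from the platform; return the agent's turn.

    Conversation content never leaves this host: the platform sees only
    the response text it renders and its own engagement metadata.
    """
    context = request["visitor_context"]       # behavioral summary, journey, profile
    history = request.get("conversation", [])  # prior turns, stored client-side
    reply = generate_fn(context, history)      # local inference (vLLM, Ollama, ...)
    return {
        "reply": reply,
        "handoff": False,  # set True to route the visitor to a human
    }
```

In production this function would sit behind the client's own HTTP server and call a local model server; the essential property is that `history` and `reply` are produced and stored entirely inside the client's security boundary.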

The Trade-off

Self-hosting gives the client complete data control. It also gives them complete operational responsibility: model updates, scaling, monitoring, latency optimization, and cost management. The platform provides the architecture that makes the agent valuable (engagement timing, visitor context, journey detection). The client provides the infrastructure that makes it compliant.

For organizations where compliance is non-negotiable, this trade-off is straightforward. For others, the platform-hosted tiers are simpler, cheaper, and sufficient.

Part VII: Integration Patterns

CRM Integration

Agent conversations are most valuable when they flow into the client's existing sales infrastructure:

  • Lead creation. When an agent conversation identifies a qualified lead, the platform creates a lead record in the client's CRM with the full behavioral journey and conversation summary.
  • Contact enrichment. When a known contact engages with the agent, the conversation data enriches their CRM record — new needs, updated context, stated timeline.
  • Activity logging. Agent conversations appear as activities on the CRM record, visible to sales alongside calls, emails, and meetings.

Supported via standard CRM APIs (HubSpot, Salesforce, Dynamics, Pipedrive, or custom webhook).

Analytics Integration

Agent conversation data feeds the platform's analytics pipeline:

  • Which journey types produce the most productive conversations?
  • Which content paths lead to agent engagement?
  • What questions are visitors asking that the site doesn't answer?
  • How does agent engagement affect conversion rates compared to non-agent paths?

This data is available in the platform's analytics dashboard and exportable to the client's BI tools.

Workflow Integration

Agent conversations can trigger workflows beyond CRM:

  • Schedule a meeting (calendar integration)
  • Send a follow-up resource (email integration)
  • Route to a specialist (team routing rules)
  • Create a support ticket (ticketing system integration)

The platform provides a webhook/event system that fires when agent conversations reach defined outcomes. The client configures integrations to their existing tools.
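A client-side consumer of such events might look like the following. The event names and payload fields are illustrative assumptions, not the platform's actual event schema:

```python
# Sketch of a client-side dispatcher for conversation-outcome webhooks.
# Event names and payload fields are illustrative assumptions.
HANDLERS = {}

def on(event_name):
    """Register a handler for a named conversation outcome."""
    def register(fn):
        HANDLERS[event_name] = fn
        return fn
    return register

@on("lead.qualified")
def create_crm_lead(payload):
    return f"CRM lead for {payload['visitor_id']}: {payload['summary']}"

@on("support.requested")
def open_ticket(payload):
    return f"Ticket opened for {payload['visitor_id']}"

def dispatch(event: dict):
    """Route an incoming webhook event to its registered handler."""
    handler = HANDLERS.get(event["name"])
    return handler(event["payload"]) if handler else None
```

Each integration (CRM, calendar, email, ticketing) becomes one registered handler, and unrecognized events are safely ignored.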


Paper 8 of 8 in the Visitor-Aware Design series

PKG Systems — Defining the Visitor-Aware Design and User-Aware Design Paradigms
