
Building Deep Research AI: From Query to Comprehensive Report

How to build AI systems that conduct thorough, multi-source research and produce comprehensive reports rivaling human analysts.


The Deep Research Challenge

Surface-level answers are easy. Ask an LLM a question, get a response. But real research—the kind that informs decisions—requires depth: exploring multiple angles, synthesizing contradictory sources, identifying gaps, and producing structured analysis.

2025: The year deep research went mainstream: Both OpenAI and Google launched production deep research capabilities in 2025. OpenAI's Deep Research uses a version of o3 "trained using end-to-end reinforcement learning on hard browsing and reasoning tasks," learning to "plan and execute a multi-step trajectory to find needed data, backtracking and reacting to real-time information." Google's Gemini Deep Research "formulates a detailed research plan, breaking the problem into smaller sub-tasks" and "intelligently determines which sub-tasks can be tackled simultaneously and which need to be done sequentially."

Why this matters for your organization: According to Deutsche Bank Research, deep research AI will have "profound consequences for knowledge work and the economy." The models produce research analyst-quality reports by synthesizing hundreds of online sources—work that previously took days now takes minutes.

Deep research AI systems tackle questions like:

  • "What are the key risks and opportunities in the quantum computing market over the next 5 years?"
  • "How do different countries regulate AI in healthcare, and what are the implications for our product?"
  • "What caused our competitor's recent market share gain?"

These aren't questions with simple answers. They require investigation, synthesis, and judgment.

At Goji AI, we've built deep research systems that produce analyst-quality reports in minutes instead of days. This post shares the architecture and techniques that make this possible.

Architecture Overview

A deep research system orchestrates multiple capabilities:

Code
Research Query
    ↓
[Query Understanding]
    ↓
[Research Planning] → Generate research outline
    ↓
[Parallel Investigation]
    ├── Web Search Agent
    ├── Document Analysis Agent
    ├── Data Analysis Agent
    └── Expert Knowledge Agent
    ↓
[Information Synthesis]
    ↓
[Report Generation]
    ↓
[Quality Assurance]
    ↓
Final Report with Citations

Phase 1: Query Understanding

Transform the user's question into a research specification.

Why query understanding determines research quality: A vague query produces vague research. "How is AI changing the legal industry?" could generate a 500-page treatise or a two-paragraph summary. Without explicit scope, depth, and focus, the system has no way to know what level of detail is appropriate. Query understanding forces these implicit decisions to become explicit, ensuring the research matches what the user actually needs.

The specification serves as a contract: Once generated, the research specification becomes the document against which the final output is evaluated. Did we cover all the aspects? Did we hit the right depth? Did we respect the constraints? Without a specification, you can't objectively evaluate whether the research succeeded.

Interactive refinement is often necessary: For complex research requests, the system should present the specification back to the user for approval before proceeding. "You asked about AI in legal—I'm planning to cover document review, contract analysis, legal research, and predictive analytics, focusing on US/EU markets, with a 3-5 year outlook. Should I proceed, or would you like me to adjust the scope?" This prevents hours of wasted research in the wrong direction.

Input: "How is AI changing the legal industry?"

Output:

Code
Research Specification:
- Core question: Impact of AI on legal industry
- Scope: Global, focus on US/EU markets
- Timeframe: Current state + 3-5 year outlook
- Aspects to cover:
  - Current AI applications in legal
  - Adoption rates and barriers
  - Impact on jobs and workflows
  - Regulatory considerations
  - Key vendors and technologies
  - Case studies
- Output format: Executive report with sections
- Depth: Comprehensive (suitable for strategic planning)
- Constraints: Public sources only
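
To make the "specification as a contract" idea concrete, here is a minimal Python sketch (the class name and fields are illustrative assumptions, not a prescribed schema). Later phases can validate the finished report against the aspects listed in the spec.

Code
from dataclasses import dataclass, field

@dataclass
class ResearchSpecification:
    """Structured contract produced by the query-understanding phase (illustrative fields)."""
    core_question: str
    scope: str
    timeframe: str
    aspects: list[str] = field(default_factory=list)
    output_format: str = "executive report"
    depth: str = "comprehensive"
    constraints: list[str] = field(default_factory=list)

def coverage_gaps(spec: ResearchSpecification, covered_aspects: set[str]) -> list[str]:
    """Return aspects the final report failed to cover, for evaluation against the contract."""
    return [a for a in spec.aspects if a not in covered_aspects]

spec = ResearchSpecification(
    core_question="Impact of AI on legal industry",
    scope="Global, focus on US/EU markets",
    timeframe="Current state + 3-5 year outlook",
    aspects=["current applications", "adoption", "jobs impact",
             "regulation", "vendors", "case studies"],
    constraints=["public sources only"],
)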

Phase 2: Research Planning

Generate a structured research plan.

Why upfront planning beats iterative exploration: You could let the system start searching immediately and see what it finds. But this leads to rabbit holes, missed topics, and inconsistent depth. An outline forces comprehensive coverage—you can see at a glance whether important topics are missing. It also enables parallelization: once you have an outline, different agents can work on different sections simultaneously.

The outline is a hypothesis, not a commitment: The initial outline is based on the system's prior knowledge of what topics typically matter for a given research area. As investigation proceeds, the outline may need revision. A section might need to be split (more content than expected), merged (topics overlap), or added (investigation revealed something important not in the original outline). The system should track these revisions and explain why they occurred.

Query generation is the bridge to investigation: Each section in the outline needs to become search queries. This is non-trivial: "Impact on jobs" might generate queries like "AI legal job displacement statistics," "law firm layoffs AI," "legal AI augmentation vs replacement," and "paralegal AI impact studies." The quality of generated queries directly determines what information the investigation phase will find.

Outline Generation:

Code
1. Executive Summary
2. Current State of AI in Legal
   2.1 Document review and e-discovery
   2.2 Contract analysis
   2.3 Legal research
   2.4 Predictive analytics
3. Market Adoption
   3.1 Adoption rates by firm size
   3.2 Regional differences
   3.3 Barriers to adoption
4. Impact Analysis
   4.1 Efficiency gains
   4.2 Job displacement vs augmentation
   4.3 Quality and accuracy implications
5. Regulatory Landscape
   5.1 Bar association guidance
   5.2 Liability considerations
   5.3 Ethical frameworks
6. Key Players and Technologies
7. Case Studies
8. Future Outlook
9. Recommendations

Query Generation: For each section, generate specific search queries:

  • "AI legal document review market size 2024"
  • "law firm AI adoption statistics"
  • "AI contract analysis accuracy studies"
  • "ABA AI ethics guidelines"

Phase 3: Parallel Investigation

Multiple specialized agents work simultaneously.

Why parallelization matters for research: Serial investigation is slow. If each section requires 5 search queries, each taking 2 seconds, and you have 10 sections, that's 100 seconds just for search—before any processing. With parallel agents, all sections can be researched simultaneously, reducing total time to ~10 seconds for the search phase. For comprehensive reports that might require hundreds of queries, parallelization is the difference between minutes and hours.
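
A sketch of that fan-out with asyncio: sections are investigated concurrently while each section runs its own queries serially. `run_query` is a stand-in for whatever search API you use.

Code
import asyncio

async def run_query(query: str) -> dict:
    """Stand-in for a real search API call (assume ~2 s of network latency)."""
    await asyncio.sleep(2)
    return {"query": query, "results": []}

async def investigate_section(queries: list[str]) -> list[dict]:
    """Queries within a section run serially."""
    return [await run_query(q) for q in queries]

async def investigate(sections: dict[str, list[str]]) -> dict[str, list[dict]]:
    """Sections run in parallel, so wall-clock time is the slowest section, not the sum."""
    names = list(sections)
    results = await asyncio.gather(*(investigate_section(sections[n]) for n in names))
    return dict(zip(names, results))

# 10 sections x 5 queries x 2 s each: ~10 s wall-clock instead of ~100 s serially.
# asyncio.run(investigate({f"section {i}": [f"query {j}" for j in range(5)] for i in range(10)}))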

Specialized agents outperform generalist agents: A Web Search Agent that only does web search can be optimized for that task: better query formulation, more sophisticated source filtering, smarter passage extraction. A generalist agent that does everything tends to do everything poorly. Specialization also enables easier debugging—if document extraction is failing, you know exactly which agent to examine.

Information handoff between agents is critical: Agents need to pass information to each other in structured formats. The Web Search Agent might find a PDF link that the Document Analysis Agent needs to process. The Data Analysis Agent might need raw numbers that the Web Search Agent extracted. These handoffs require clear protocols: what format, what metadata, how to handle failures.
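
One way to make these handoffs explicit is a shared record type passed between agents; the fields below are illustrative assumptions rather than a fixed protocol.

Code
from dataclasses import dataclass
from typing import Optional

@dataclass
class Handoff:
    """A work item passed from one agent to another (illustrative fields)."""
    source_agent: str                 # e.g. "web_search"
    target_agent: str                 # e.g. "document_analysis"
    artifact_type: str                # "pdf_url", "raw_table", "passage", ...
    payload: str                      # URL, extracted text, or serialized data
    source_url: Optional[str] = None
    published: Optional[str] = None   # ISO date, used for recency checks
    error: Optional[str] = None       # set when the producing agent failed

pdf_for_analysis = Handoff(
    source_agent="web_search",
    target_agent="document_analysis",
    artifact_type="pdf_url",
    payload="https://example.com/legal-ai-report.pdf",  # hypothetical URL
    published="2024-03-01",
)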

Web Search Agent:

  • Executes search queries
  • Filters for authoritative sources
  • Extracts relevant passages
  • Notes publication dates for recency

Document Analysis Agent:

  • Processes PDFs, reports, whitepapers
  • Extracts data from tables and charts
  • Identifies key findings and quotes

Data Analysis Agent:

  • Finds quantitative data
  • Normalizes across sources
  • Identifies trends and patterns
  • Creates visualizations

Expert Knowledge Agent:

  • Provides domain context
  • Identifies gaps in gathered information
  • Suggests additional investigation angles

Phase 4: Information Synthesis

Combine findings across agents.

Why synthesis is harder than collection: Collection is mechanical—run queries, extract passages, store results. Synthesis requires judgment: which findings matter most, how do they relate to each other, what story do they tell together? This is where the quality of deep research diverges from simple search-and-summarize systems.

The synthesis challenges in practice:

Deduplication: Same fact from multiple sources → single fact with multiple citations. This sounds simple but is surprisingly hard. "Market size of $5.2B" and "market valued at 5.2 billion" are the same. But "market size of $5.2B in 2024" and "market size of $4.8B in 2023" are different: one is newer data. The system must recognize semantic equivalence while preserving meaningful distinctions.
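
A minimal sketch of this dedup-and-merge step: claims are keyed on a normalized (metric, value, period) tuple so that rephrasings collapse into one claim with merged citations, while different periods stay distinct. The normalization here is deliberately crude.

Code
import re

def normalize_value(text: str) -> str:
    """Crude normalization: '$5.2B' and '5.2 billion' both become '5.2e9'."""
    m = re.search(r"\$?\s*([\d.]+)\s*(billion|b)\b", text, re.IGNORECASE)
    return f"{float(m.group(1))}e9" if m else text.strip().lower()

def merge_claims(claims: list[dict]) -> list[dict]:
    """Collapse semantically equivalent claims, merging their citations."""
    merged: dict[tuple, dict] = {}
    for c in claims:
        key = (c["metric"], normalize_value(c["value"]), c.get("period"))
        if key in merged:
            merged[key]["citations"].extend(c["citations"])
        else:
            merged[key] = {**c, "citations": list(c["citations"])}
    return list(merged.values())

claims = [
    {"metric": "market size", "value": "$5.2B", "period": "2024", "citations": ["[1]"]},
    {"metric": "market size", "value": "5.2 billion", "period": "2024", "citations": ["[2]"]},
    {"metric": "market size", "value": "$4.8B", "period": "2023", "citations": ["[3]"]},
]
# merge_claims(claims) -> two claims: 2024 (citations [1][2]) and 2023 (citation [3])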

Conflict Resolution: Contradictory claims → note disagreement, prefer authoritative/recent sources. What happens when Gartner says the market is $5.2B and McKinsey says $6.1B? You can't just pick one. Good synthesis notes the disagreement, explains possible reasons (different market definitions, different methodologies), and either triangulates a reasonable estimate or presents the range with caveats.

Gap Identification: What questions remain unanswered? Trigger additional research or note as limitation. After the first round of investigation, the system should evaluate: "I found adoption rates for large firms but nothing about solo practitioners. I found US data but limited EU data." These gaps might trigger targeted follow-up searches, or might be noted as limitations in the final report.

Narrative Construction: Organize findings into coherent structure following the outline. Raw findings are disjointed bullet points. A good report tells a story: here's the current state, here's how we got here, here's where it's going, here's what you should do. Narrative construction transforms data into insight.

Phase 5: Report Generation

Transform synthesized information into polished output:

Section-by-Section Generation: Each section generated with:

  • Relevant findings from synthesis
  • Required length/depth
  • Tone and style guidelines
  • Citation requirements

Cross-Reference Verification:

  • Numbers mentioned in executive summary match body
  • Claims have supporting citations
  • Internal references are consistent

Phase 6: Quality Assurance

Before delivery.

Why QA is non-negotiable for deep research: The stakes for research reports are high. Strategic decisions, investments, and policy choices may depend on the findings. A single wrong number or misattributed claim can undermine the entire report's credibility. QA is the last line of defense against errors that slipped through earlier phases.

Automated QA can catch many errors: Citation verification can be automated: does the cited source actually contain the claimed information? Numerical consistency can be checked: does "revenue of $5.2B" in the executive summary match "revenue of $5.2B" in the detailed section, or did a typo create "revenue of $52B"? Coverage can be verified: does every outline section have content, or did a generation failure leave a section empty?
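
A sketch of two of these automated checks, numerical consistency and coverage; how the summary, body, and sections are represented is an assumption.

Code
import re

def check_numeric_consistency(exec_summary: str, body: str) -> list[str]:
    """Flag figures in the executive summary that never appear in the body."""
    figures = set(re.findall(r"\$[\d.,]+[BMK]?|\d+(?:\.\d+)?%", exec_summary))
    return [f for f in figures if f not in body]

def check_coverage(outline: list[str], sections: dict[str, str]) -> list[str]:
    """Flag outline sections that are missing or empty in the generated report."""
    return [s for s in outline if not sections.get(s, "").strip()]

issues = check_numeric_consistency(
    "Revenue reached $5.2B, growing 14% year over year.",
    "The vendor reported revenue of $5.2B in its latest filing.",
)
# -> ["14%"]: the growth figure has no support in the body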

Human review remains essential for judgment calls: Automated QA can't evaluate whether the synthesis makes sense, whether the recommendations follow from the evidence, or whether the report answers the original question well. For high-stakes research, human review of the final output is worth the time investment.

Factual Verification:

  • Spot-check claims against sources
  • Verify calculations
  • Confirm citations are accurate

Completeness Check:

  • All outline sections covered
  • Original question answered
  • Appropriate depth achieved

Quality Scoring:

  • Source diversity score
  • Citation density
  • Recency of sources
  • Coverage of key aspects

Source Management

Source Credibility Assessment

Not all sources are equal. Build a credibility framework:

Source Type | Base Credibility | Notes
Academic journals | High | Peer-reviewed, may be dated
Government sources | High | Official but potentially biased
Industry analysts | Medium-High | Expert but may have conflicts
Major news outlets | Medium | Current but variable depth
Company websites | Low-Medium | Primary for company info, biased
Blogs/social media | Low | Current but unverified

Adjust based on:

  • Author credentials
  • Publication date
  • Corroboration by other sources
  • Potential conflicts of interest
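
One way to encode this framework is a base score per source type plus adjustments for the factors above; the numbers are placeholder assumptions to be tuned, not recommendations.

Code
BASE_CREDIBILITY = {
    "academic_journal": 0.9,
    "government": 0.85,
    "industry_analyst": 0.75,
    "major_news": 0.6,
    "company_website": 0.45,
    "blog_social": 0.3,
}

def credibility_score(source_type: str, age_months: int,
                      corroborating_sources: int, has_conflict: bool) -> float:
    """Base score by source type, adjusted for recency, corroboration, and conflicts."""
    score = BASE_CREDIBILITY.get(source_type, 0.4)
    if age_months > 24:
        score -= 0.1                        # stale data is less trustworthy
    score += min(corroborating_sources, 3) * 0.05
    if has_conflict:
        score -= 0.15                       # e.g. vendor-funded research
    return max(0.0, min(1.0, score))

credibility_score("industry_analyst", age_months=6, corroborating_sources=2, has_conflict=True)
# -> 0.7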

Citation Management

Every claim needs attribution:

Citation Format:

Code
AI-powered document review can reduce review time by 60-80% compared to manual review [1][2].

[1] "AI in Legal: 2024 Market Report", Gartner, March 2024
[2] "E-Discovery Technology Survey", ILTA, January 2024

Citation Requirements:

  • Statistics: Always cite source
  • Opinions/predictions: Attribute to specific analysts
  • Common knowledge: No citation needed
  • Controversial claims: Multiple corroborating sources

Handling Paywalled Content

Many valuable sources are behind paywalls:

Strategies:

  • Check for free summaries/abstracts
  • Use institutional access if available
  • Find similar information from free sources
  • Acknowledge when key sources couldn't be accessed
  • Use press releases/coverage of paywalled reports

Report Quality Techniques

Depth vs. Breadth Balance

Breadth: Cover all relevant aspects
Depth: Sufficient detail for decision-making

Balance through:

  • Tiered detail: Executive summary → sections → appendices
  • Priority ranking: More depth on higher-priority topics
  • User feedback: Adjust based on stated needs

Handling Uncertainty

Research rarely produces certainty. Communicate uncertainty appropriately:

Quantified uncertainty: "Market size estimates range from $2.1B to $3.4B, with most analysts projecting $2.7-2.9B"

Source agreement: "Three of four analysts surveyed expect >20% growth; one predicts consolidation"

Knowledge gaps: "Limited public data available on adoption rates in mid-size firms"

Confidence levels:

  • High confidence: Multiple corroborating sources, established facts
  • Medium confidence: Single authoritative source or partial corroboration
  • Low confidence: Limited sources, extrapolation required
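
A simple rule-based mapping from evidence to these confidence levels, using assumed thresholds.

Code
def confidence_level(num_sources: int, max_source_credibility: float,
                     requires_extrapolation: bool) -> str:
    """Map the evidence behind a claim to a reported confidence level."""
    if requires_extrapolation or num_sources == 0:
        return "low"
    if num_sources >= 2 and max_source_credibility >= 0.7:
        return "high"     # multiple corroborating, credible sources
    if num_sources >= 2 or max_source_credibility >= 0.7:
        return "medium"   # single authoritative source, or partial corroboration
    return "low"          # limited or weak sources

confidence_level(num_sources=3, max_source_credibility=0.9, requires_extrapolation=False)
# -> "high"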

Bias Awareness

Research systems can inherit biases:

Source bias: Over-reliance on sources with particular perspectives
Recency bias: Favoring recent over historically important information
Availability bias: Favoring easily searchable information
Confirmation bias: Finding evidence for expected conclusions

Mitigate through:

  • Deliberate source diversity
  • Explicit search for counterarguments
  • Including dissenting views
  • Transparency about limitations

Specialized Research Modes

Competitive Intelligence

Research on competitors requires specialized handling:

Sources:

  • SEC filings, earnings calls
  • Patent databases
  • Job postings (signal priorities)
  • Press releases, news coverage
  • Industry analyst reports
  • Customer reviews

Analysis:

  • Product/feature comparison
  • Pricing analysis
  • Market positioning
  • Strategic direction signals

Market Research

Understanding markets and opportunities:

Quantitative:

  • Market size and growth rates
  • Segment breakdowns
  • Geographic distribution
  • Key metrics and benchmarks

Qualitative:

  • Customer needs and pain points
  • Competitive dynamics
  • Regulatory factors
  • Technology trends

Technical Research

Deep dives into technology topics:

Sources:

  • Academic papers (arXiv, Google Scholar)
  • Technical documentation
  • GitHub repositories
  • Conference proceedings
  • Expert blogs

Analysis:

  • State of the art
  • Comparative evaluation
  • Implementation considerations
  • Limitations and open problems

Performance Optimization

Parallelization

Research is embarrassingly parallel:

  • Multiple search queries execute simultaneously
  • Multiple documents process in parallel
  • Multiple sections generate concurrently

Typical speedup: 5-10x vs. sequential execution

Caching

Cache at multiple levels:

  • Search result cache (refresh daily)
  • Document processing cache (refresh on change)
  • Intermediate synthesis (per-report)
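
A sketch of the search-result layer with a daily TTL; a production system would typically back this with Redis or a similar store, but an in-memory dict shows the idea.

Code
import time

class SearchCache:
    """In-memory search-result cache with a time-to-live (default: one day)."""

    def __init__(self, ttl_seconds: int = 24 * 3600):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, list]] = {}

    def get(self, query: str):
        entry = self._store.get(query)
        if entry is None:
            return None
        stored_at, results = entry
        if time.time() - stored_at > self.ttl:
            del self._store[query]        # expired: refresh on next fetch
            return None
        return results

    def put(self, query: str, results: list) -> None:
        self._store[query] = (time.time(), results)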

Progressive Generation

For long reports, stream output:

  1. Generate outline → Show immediately
  2. Generate executive summary → Append
  3. Generate each section → Append as ready
  4. Final quality check → Mark complete

User sees progress rather than waiting for complete report.
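
A generator-based sketch of that flow, where the caller streams each piece to the user as soon as it is ready; `generate_section` stands in for the actual model call.

Code
from typing import Iterator

def generate_section(title: str, findings: list[str]) -> str:
    """Stand-in for the model call that writes one section."""
    return f"{title}\n" + "\n".join(findings)

def stream_report(outline: list[str], findings: dict[str, list[str]]) -> Iterator[str]:
    """Yield the report piece by piece so the user sees progress immediately."""
    yield "Outline:\n" + "\n".join(outline)
    for title in outline:
        yield generate_section(title, findings.get(title, []))
    yield "[quality check complete]"

for chunk in stream_report(["Executive Summary", "Market Adoption"],
                           {"Market Adoption": ["Adoption is rising among large firms."]}):
    print(chunk)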

Token Efficiency

Research generates lots of text. Optimize:

  • Summarize retrieved documents before synthesis
  • Use hierarchical summarization for long documents
  • Generate sections at appropriate length, not maximum
  • Compress intermediate representations

Evaluation Framework

Automated Metrics

Metric | Measurement | Target
Query coverage | % of research questions addressed | > 95%
Source diversity | Unique sources per section | > 3
Citation density | Claims with citations | > 80%
Recency | % sources < 12 months old | > 60%
Readability | Flesch-Kincaid grade level | 12-14
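
Citation density and recency are straightforward to compute automatically; the claim and source-date representations below are assumptions.

Code
from datetime import date

def citation_density(claims: list[dict]) -> float:
    """Fraction of claims that carry at least one citation (target: > 0.8)."""
    if not claims:
        return 0.0
    return sum(bool(c.get("citations")) for c in claims) / len(claims)

def recency(source_dates: list[date], today: date, months: int = 12) -> float:
    """Fraction of sources published within the last `months` months (target: > 0.6)."""
    if not source_dates:
        return 0.0
    cutoff_days = months * 30
    return sum((today - d).days <= cutoff_days for d in source_dates) / len(source_dates)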

Human Evaluation

Expert review on:

  • Factual accuracy (spot-check claims)
  • Analytical depth (vs. surface summary)
  • Actionability (insights support decisions)
  • Balance (multiple perspectives represented)
  • Completeness (key aspects covered)

A/B Testing

Compare system versions:

  • User satisfaction ratings
  • Report usage metrics (time spent, sections read)
  • Decision quality (if measurable)
  • Iteration requests (fewer = better first draft)

Production Architecture

Scalability

Handle multiple simultaneous research requests:

  • Queue management for resource allocation
  • Priority levels (urgent vs. background)
  • Resource limits per request
  • Graceful degradation under load

Cost Management

Deep research is expensive. Manage through:

  • Tiered depth options (quick scan vs. comprehensive)
  • Token budgets per report type
  • Caching to avoid redundant processing
  • Model selection based on subtask complexity

Reliability

Research for decisions needs high reliability:

  • Retry logic for failed searches
  • Fallback sources when primary unavailable
  • Timeout handling with partial results
  • Clear indication of incomplete research

Case Study: Investment Research

We built a deep research system for investment analysis:

Input: Company name or ticker Output: Comprehensive investment analysis report

Components:

  • Financial data extraction (SEC filings, earnings)
  • News sentiment analysis
  • Competitor positioning
  • Industry trend synthesis
  • Risk factor identification
  • Valuation analysis

Results:

  • 15-minute generation time (vs. 2-day analyst process)
  • 87% alignment with human analyst conclusions
  • Identified factors that human analysts had missed in 23% of reports
  • Significant cost reduction for routine coverage

Conclusion

Deep research AI systems combine multiple capabilities—search, analysis, synthesis, writing—to produce comprehensive reports that previously required hours or days of human effort.

The key is orchestration: breaking research into manageable subtasks, executing in parallel, synthesizing intelligently, and maintaining quality throughout. The result is a system that augments human analysts, handling routine investigation so humans can focus on judgment and decisions.


Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
