Turn Documents into Structured Intelligence
Extract, verify, and structure data from any document — with zero hallucinations. Deterministic-first processing keeps 70%+ of extraction at $0 cost.
15-day sandbox · No credit card · Full extraction included
curl -X POST http://api.getclearsight.in/v1/documents/upload \
  -H "Authorization: Bearer cs_live_xxxxx" \
  -F "file=@annual_report.pdf" \
  -F "domain=corporate_finance"
{
  "document_id": "doc_8f3a2b1c",
  "status": "processing",
  "pages": 47,
  "classification": "annual_report",
  "estimated_cost": "$0.00",
  "tier": "T0_DETERMINISTIC",
  "extraction_ready": "~12s"
}
The Problem
Enterprise data is trapped in documents
Drowning in Unstructured Data
PDFs, scanned documents, regulatory filings — critical data trapped in formats that resist automation.
LLM Hallucinations
LLM-only approaches fabricate data points. ClearSight uses deterministic extraction first — LLMs only verify what rules can't resolve.
Manual Extraction Costs
Teams spend hours copying data from documents into systems. One API call replaces the entire workflow.
How It Works
Four steps to structured intelligence
Upload
Send any PDF, scanned doc, or text file via API
Single POST endpoint. Automatic classification.
Extract
Tables, text, and metadata extracted deterministically
pdfplumber + camelot + OCR. Zero LLM cost for 70%+ of docs.
Verify
Every data point cross-referenced with source text
Page-level citations. Separate verification step catches gaps.
Structure
Clean, typed JSON with confidence scores
Role-specific outputs via persona lenses. Ready for downstream.
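As a rough illustration of the deterministic-first flow in the four steps above, the sketch below tries a rule-based pass first and calls an LLM fallback only when the rules return nothing. The regex, the `Extraction` type, the function names, and the `LLM_FALLBACK` label are hypothetical; only the `T0_DETERMINISTIC` tier name comes from the sample responses on this page. It is a sketch of the idea, not ClearSight's implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Extraction:
    value: Optional[str]  # extracted value, or None if nothing was found
    tier: str             # which tier produced the value
    page: int             # page the text came from


def rule_based_extract(page_text: str) -> Optional[str]:
    """Deterministic pass: regex for a hypothetical 'Equity: NN.N%' field."""
    match = re.search(r"Equity:\s*(\d+(?:\.\d+)?)%", page_text)
    return match.group(1) if match else None


def extract_field(
    page_text: str,
    page: int,
    llm_fallback: Callable[[str], Optional[str]],
) -> Extraction:
    """Use the rules first; call the (costly) fallback only if they fail."""
    value = rule_based_extract(page_text)
    if value is not None:
        return Extraction(value, "T0_DETERMINISTIC", page)
    # "LLM_FALLBACK" is an illustrative label, not a documented tier name.
    return Extraction(llm_fallback(page_text), "LLM_FALLBACK", page)
```

Because the fallback is passed in as a callable, it is never invoked for pages the rules can handle, which is where the zero-cost share of extraction would come from under this design.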
Capabilities
Everything you need for document intelligence
Document Extraction
Tables, text, and metadata from PDFs — deterministic-first with LLM fallback.
Zero-Hallucination Verification
Every claim cross-referenced against source text. Page-level citations included.
Semantic Search & RAG
Ask questions across your document corpus. Get answers with citations, not guesses.
Persona-Driven Outputs
Same document, different intelligence. Lens system tailors outputs by role.
Document Management
Folders, versions, ACLs, and full audit trail. Enterprise-grade DMS built in.
Knowledge Management
Synthesize insights across documents. Entity graphs and gap detection.
Industry Coverage
Document intelligence across verticals
Add new verticals with zero code changes — just YAML configuration.
Mutual Funds
Production Ready · 6 document types
NPS / Pensions
Production Ready · 4 document types
Insurance
In Development · 4 document types
Banking & Lending
Planned · 3 document types
For Developers
Ship document intelligence this week
REST API with structured JSON responses, verification scores, and page-level citations. Full OpenAPI spec and Postman collection included.
/v1/documents/upload
Upload and process a document end-to-end
/v1/documents/{id}/extract
Retrieve extraction results with citations
/v1/ask
Semantic search with RAG synthesis
/v1/search
Vector similarity search across documents
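For teams calling the API from Python rather than curl, the sketch below builds the same multipart POST to /v1/documents/upload using only the standard library. The function name `build_upload_request` is hypothetical, and the `application/pdf` content type for the file part is an assumption; the URL, the bearer-token header, and the `file` and `domain` fields mirror the curl example on this page.

```python
import urllib.request
import uuid

API_BASE = "http://api.getclearsight.in"  # base URL as shown in the curl examples


def build_upload_request(
    api_key: str, filename: str, file_bytes: bytes, domain: str
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    parts = [
        # Plain form field: domain
        f'--{boundary}\r\nContent-Disposition: form-data; name="domain"'
        f"\r\n\r\n{domain}\r\n".encode(),
        # File field header; the PDF content type is an assumption
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: application/pdf\r\n\r\n'
        ).encode(),
        file_bytes,
        # Closing boundary
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        f"{API_BASE}/v1/documents/upload",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body could then be parsed with `json.loads` to read fields such as `document_id` and `status`.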
curl -X POST http://api.getclearsight.in/v1/ask \
  -H "Authorization: Bearer cs_live_xxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the total equity exposure?",
    "document_ids": ["doc_8f3a2b1c"],
    "persona": "portfolio_manager"
  }'
{
  "answer": "Total equity exposure across the portfolio is 68.4%, comprising 45.2% in large-cap, 15.8% in mid-cap, and 7.4% in small-cap allocations.",
  "citations": [
    {
      "text": "Equity: 68.4% of AUM",
      "page": 12,
      "confidence": 0.98
    }
  ],
  "verification_score": 0.97,
  "tier_used": "T0_DETERMINISTIC",
  "cost": "$0.00"
}
Pricing
Start free. Scale when ready.
Sandbox
15-day trial with full API access
- ✓ Tier 0 deterministic extraction
- ✓ 10 documents per day
- ✓ Pre-seeded demo data (MF + NPS)
- ✓ Full API access + Swagger UI
Pro
- ✓ Tiers 0–3 extraction
- ✓ Unlimited documents
- ✓ Semantic search + RAG
- ✓ Budget caps per tenant
Enterprise
- ✓ All tiers including Tier 4
- ✓ Dedicated PostgreSQL + Redis
- ✓ Custom domain repositories
- ✓ Meeting intelligence
FAQ
Frequently asked questions
ClearSight processes PDFs, scanned documents (via OCR), and structured text files. It supports domain-specific documents like Scheme Information Documents, CAS statements, policy wordings, financial statements, and regulatory filings. New document types are added through YAML configuration — no code changes needed.
ClearSight uses a deterministic-first pipeline. Over 70% of extraction happens using rule-based methods (pdfplumber for text, camelot for tables, OCR for scans) — with zero LLM involvement. When LLMs are used for verification, every claim is cross-referenced against the source text with page-level citations. A separate verification step catches discrepancies.
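The cross-referencing step described above can be pictured as a literal lookup of each claim in the page text. The sketch below is one simple way to do that, assuming whitespace- and case-insensitive substring matching; the function names and the output shape are hypothetical and are not taken from ClearSight's API.

```python
import re
from typing import Dict, List, Optional


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_citation(claim: str, pages: Dict[int, str]) -> Optional[dict]:
    """Return the first page whose text contains the claim, else None."""
    needle = normalize(claim)
    for page_number in sorted(pages):
        if needle in normalize(pages[page_number]):
            return {"text": claim, "page": page_number}
    return None


def verify(claims: List[str], pages: Dict[int, str]) -> dict:
    """Split claims into those backed by a page citation and those that are not."""
    citations, unverified = [], []
    for claim in claims:
        citation = find_citation(claim, pages)
        if citation is not None:
            citations.append(citation)
        else:
            unverified.append(claim)
    return {"citations": citations, "unverified": unverified}
```

Anything that lands in `unverified` has no supporting text in the source, so a pipeline built this way could drop or flag it rather than return it as fact.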
The 15-day sandbox gives you full API access with Tier 0 extraction (deterministic, $0 cost). You get pre-loaded sample documents, a Postman collection, and OpenAPI documentation. No credit card required. Process up to 10 documents per day.
Tier 0 (deterministic extraction) is $0 — it handles 70%+ of processing. When LLMs are needed for verification or synthesis, costs scale based on tier: Tier 2 at $0.15/M tokens, Tier 3 at $3/$15/M tokens. Average cost per document is under $0.05. You set budget caps per tenant.
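To make the per-token pricing concrete, the sketch below computes LLM cost from a token count and checks it against a budget cap. The rates for Tier 0 and Tier 2 are the ones quoted above; Tier 3 is left out because the source does not explain how its "$3/$15" figure splits. The short keys `T0` and `T2`, the function names, and the cap logic are assumptions for illustration only.

```python
from decimal import Decimal

# USD per million tokens, as quoted in the answer above.
# Tier 3 is omitted: the source does not explain its "$3/$15" split.
RATE_PER_MILLION = {
    "T0": Decimal("0"),     # deterministic extraction
    "T2": Decimal("0.15"),
}


def llm_cost(tier: str, tokens: int) -> Decimal:
    """Cost in USD for the given number of tokens at a tier's rate."""
    return RATE_PER_MILLION[tier] * tokens / Decimal(1_000_000)


def within_budget(spent: Decimal, next_cost: Decimal, cap: Decimal) -> bool:
    """True if adding next_cost keeps a tenant at or under its cap."""
    return spent + next_cost <= cap
```

For example, 200,000 tokens at the Tier 2 rate works out to 0.15 × 0.2 = $0.03, which is consistent with the stated average of under $0.05 per document.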
Yes. ClearSight's domain repository system uses YAML configuration files to define document types, extraction rules, validation schemas, and lens configurations. Adding a new document type requires no code changes — just a new YAML definition.
Yes. ClearSight uses PostgreSQL Row-Level Security (RLS) enforced at the database level on every table. Tenant isolation cannot be bypassed by application code. Each tenant's data is cryptographically separated.
ClearSight is API-first. A single POST to /v1/documents/upload processes a document end-to-end. You get structured JSON back with extracted data, verification scores, and citations. Most integrations are live within a day using the Postman collection.
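Once the structured JSON comes back, downstream code typically only needs a few fields. The sketch below parses a response shaped like the /v1/ask example on this page and keeps citations above a confidence threshold. The function name and the 0.9 default threshold are hypothetical; the field names (`citations`, `text`, `page`, `confidence`) are taken from that sample response.

```python
import json

# Trimmed copy of the sample /v1/ask response shown on this page.
SAMPLE_RESPONSE = """
{
  "citations": [
    {"text": "Equity: 68.4% of AUM", "page": 12, "confidence": 0.98}
  ],
  "verification_score": 0.97,
  "tier_used": "T0_DETERMINISTIC",
  "cost": "$0.00"
}
"""


def confident_citations(response_body: str, min_confidence: float = 0.9) -> list:
    """Return (page, text) pairs for citations at or above min_confidence."""
    payload = json.loads(response_body)
    return [
        (citation["page"], citation["text"])
        for citation in payload.get("citations", [])
        if citation.get("confidence", 0.0) >= min_confidence
    ]
```

Calling `confident_citations(SAMPLE_RESPONSE)` on the sample yields the single page-12 citation; raising the threshold above 0.98 would filter it out.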
Ship document intelligence this week
15-day sandbox. No credit card. Full Tier 0 extraction included. Pre-loaded with ClearSight sample documents.
Start Free Trial