DocStrange by Nanonets
Document extraction API — convert PDFs, images, and documents to markdown, JSON, or CSV with per-field confidence scoring.
Get your API key: https://docstrange.nanonets.com/app
Quick Start
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Response:
{
  "success": true,
  "record_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "result": {
    "markdown": {
      "content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."
    }
  }
}
Setup
1. Get Your API Key
# Visit the dashboard
https://docstrange.nanonets.com/app
Save your API key:
export DOCSTRANGE_API_KEY="your_api_key_here"
2. OpenClaw Configuration (Optional)
Recommended: Use environment variables (most secure):
{
  skills: {
    entries: {
      "docstrange": {
        enabled: true,
        // API key loaded from environment variable DOCSTRANGE_API_KEY
      },
    },
  },
}
Alternative: Store in config file (use with caution):
{
  skills: {
    entries: {
      "docstrange": {
        enabled: true,
        env: {
          DOCSTRANGE_API_KEY: "your_api_key_here",
        },
      },
    },
  },
}
Security Note: If storing API keys in ~/.openclaw/openclaw.json:
Set file permissions: chmod 600 ~/.openclaw/openclaw.json
Never commit this file to version control
Prefer environment variables or your agent's secret store when possible
Rotate keys regularly and limit API key permissions if supported
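The permission step can also be applied and verified from a script. The following is a minimal sketch, assuming a POSIX system; `restrict_to_owner` is a hypothetical helper name, and the demo runs against a throwaway temporary file rather than the real config.

```python
import os
import stat
import tempfile


def restrict_to_owner(path: str) -> int:
    """Set owner read/write only (mode 600) and return the resulting permission bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrate on a throwaway file instead of ~/.openclaw/openclaw.json.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name
os.chmod(demo_path, 0o644)  # start from a typical world-readable mode
mode = restrict_to_owner(demo_path)
print(oct(mode))
os.remove(demo_path)
```

Returning the mode that was actually read back lets a caller confirm the change took effect instead of trusting the `chmod` call.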
Common Tasks
Extract to Markdown
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
Access content: response["result"]["markdown"]["content"]
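Once the response body has been decoded, the access path above can be followed defensively. This is a minimal sketch in Python, assuming the response has the shape shown in Quick Start; `extract_markdown` is a hypothetical helper, not part of any DocStrange SDK.

```python
from typing import Any, Optional


def extract_markdown(response: dict) -> Optional[str]:
    """Return result -> markdown -> content, or None if any level is missing."""
    node: Any = response
    for key in ("result", "markdown", "content"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Sample body copied from the Quick Start response.
sample = {
    "success": True,
    "record_id": "550e8400-e29b-41d4-a716-446655440000",
    "status": "completed",
    "result": {
        "markdown": {"content": "# Invoice\n\n**Invoice Number:** INV-2024-001..."}
    },
}
print(extract_markdown(sample))
```

Returning `None` instead of raising keeps a missing `markdown` key (for example, when only JSON output was requested) from crashing the caller.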
Extract JSON Fields
Simple field list:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=json" \
-F 'json_options=["invoice_number", "date", "total_amount", "vendor"]' \
-F "include_metadata=confidence_score"
With JSON schema:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=json" \
-F 'json_options={"type": "object", "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}}'
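Hand-quoting the `json_options` value is error-prone once schemas grow. A small sketch, assuming the field accepts any JSON text equivalent to the curl examples above, is to build the value from Python data; `json_options_value` is a hypothetical helper name.

```python
import json


def json_options_value(spec) -> str:
    """Serialize a field list or a JSON schema for the json_options form field."""
    if not isinstance(spec, (list, dict)):
        raise TypeError("json_options must be a field list or a schema object")
    return json.dumps(spec)


field_list = ["invoice_number", "date", "total_amount", "vendor"]
schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total_amount": {"type": "number"},
    },
}

print(json_options_value(field_list))
print(json_options_value(schema))
```

The printed strings can be passed as the `json_options` form value in place of the hand-written literals.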
Response with confidence scores:
{
  "result": {
    "json": {
      "content": {
        "invoice_number": "INV-2024-001",
        "total_amount": 500.00
      },
      "metadata": {
        "confidence_score": {
          "invoice_number": 98,
          "total_amount": 99
        }
      }
    }
  }
}
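Values and scores arrive in separate branches of the response, so it can help to join them by field name. The sketch below assumes the response shape shown above; `fields_with_confidence` is a hypothetical helper name.

```python
def fields_with_confidence(response: dict) -> dict:
    """Map each extracted field to a (value, score) pair; score is None if absent."""
    json_result = response.get("result", {}).get("json", {})
    content = json_result.get("content", {})
    scores = json_result.get("metadata", {}).get("confidence_score", {})
    return {name: (value, scores.get(name)) for name, value in content.items()}


# Sample body copied from the response above.
sample = {
    "result": {
        "json": {
            "content": {"invoice_number": "INV-2024-001", "total_amount": 500.00},
            "metadata": {
                "confidence_score": {"invoice_number": 98, "total_amount": 99}
            },
        }
    }
}
for name, (value, score) in fields_with_confidence(sample).items():
    print(name, value, score)
```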
Extract Tables to CSV
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/sync" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=csv" \
-F "csv_options=table"
Async Extraction (Large Documents)
For documents >5 pages, use async and poll:
Queue the document:
curl -X POST "https://extraction-api.nanonets.com/api/v1/extract/async" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY" \
-F "file=@document.pdf" \
-F "output_format=markdown"
# Returns: {"record_id": "12345", "status": "processing"}
Poll for results:
curl -X GET "https://extraction-api.nanonets.com/api/v1/extract/results/12345" \
-H "Authorization: Bearer $DOCSTRANGE_API_KEY"
# Returns: {"status": "completed", "result": {...}}
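The queue-then-poll flow can be wrapped in a bounded loop. The following is a sketch under stated assumptions: `fetch_result` is a hypothetical callable standing in for the GET request above, the interval and attempt limit are illustrative defaults, and any status other than "processing" is treated as final because "processing" and "completed" are the only values shown here.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_result: Callable[[str], dict],
    record_id: str,
    interval_seconds: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_result(record_id) until the status is no longer "processing"."""
    for attempt in range(max_attempts):
        body = fetch_result(record_id)
        if body.get("status") != "processing":
            return body
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # wait before the next poll
    raise TimeoutError(
        f"record {record_id} still processing after {max_attempts} polls"
    )


# Stand-in for the GET request: two "processing" replies, then "completed".
_replies = iter(
    [
        {"status": "processing"},
        {"status": "processing"},
        {"status": "completed", "result": {}},
    ]
)
final = poll_until_done(lambda rid: next(_replies), "12345", sleep=lambda s: None)
print(final["status"])
```

Passing the fetch and sleep functions in as arguments keeps the loop testable without network access and lets the caller choose any HTTP client.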
Advanced Features
Bounding Boxes
Get element coordinates for layout analysis:
-F "include_metadata=bounding_boxes"
Hierarchy Output
Extract document structure (sections, tables, key-value pairs):
-F "json_options=hierarchy_output"
Financial Documents Mode
Enhanced table and number formatting:
-F "markdown_options=financial-docs"
Custom Instructions
Guide extraction with prompts:
-F "custom_instructions=Focus on financial data. Ignore headers."
-F "prompt_mode=append"
Multiple Formats
Request multiple formats in one call:
-F "output_format=markdown,json"
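The option fragments above are all plain form fields, so they can be collected in one place before a request is sent. This sketch assumes the options may be combined in a single request, which is not confirmed here; `build_form_fields` is a hypothetical helper, and only field names that appear in this document are used.

```python
from typing import Optional, Sequence, Union


def build_form_fields(
    output_formats: Union[str, Sequence[str]],
    include_metadata: Optional[str] = None,
    custom_instructions: Optional[str] = None,
    prompt_mode: Optional[str] = None,
) -> dict:
    """Collect the text form fields for one extraction request."""
    if isinstance(output_formats, str):
        output_formats = [output_formats]  # avoid joining a string character by character
    if not output_formats:
        raise ValueError("at least one output format is required")
    fields = {"output_format": ",".join(output_formats)}
    optional = {
        "include_metadata": include_metadata,
        "custom_instructions": custom_instructions,
        "prompt_mode": prompt_mode,
    }
    # Leave out anything the caller did not set, so only requested options are sent.
    fields.update({k: v for k, v in optional.items() if v is not None})
    return fields


fields = build_form_fields(
    ["markdown", "json"],
    custom_instructions="Focus on financial data. Ignore headers.",
    prompt_mode="append",
)
print(fields)
```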
When to Use
Use DocStrange For:
Invoice and receipt processing
Contract text extraction
Bank statement parsing
Form digitization
Image OCR (scanned documents)
Don't Use For:
Documents >5 pages with sync (use async)
Video/audio transcription
Non-document images
Best Practices
| Document Size | Endpoint | Notes |
|---|---|---|
| <=5 pages | /extract/sync | Immediate response |
| >5 pages | /extract/async | Poll for results |
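The table reduces to a single comparison on page count. A minimal sketch follows, assuming the page count is already known to the caller; `pick_endpoint` and the constant names are hypothetical, while the base URL and the 5-page threshold come from this document.

```python
BASE_URL = "https://extraction-api.nanonets.com/api/v1"
SYNC_PAGE_LIMIT = 5  # sync is recommended up to and including this many pages


def pick_endpoint(page_count: int) -> str:
    """Return the sync URL for short documents and the async URL otherwise."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    path = "/extract/sync" if page_count <= SYNC_PAGE_LIMIT else "/extract/async"
    return BASE_URL + path


print(pick_endpoint(3))
print(pick_endpoint(12))
```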
JSON Extraction:
Field list: ["field1", "field2"] for quick extractions
JSON schema: {"type": "object", ...} for strict typing and nested data
Confidence Scores:
Add include_metadata=confidence_score
Scores are 0-100 per field
Review fields <80 manually
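The review rule can be applied mechanically to the `confidence_score` mapping. In this sketch the threshold of 80 comes from the guidance above; `fields_to_review` is a hypothetical helper, and the `vendor` and `date` scores in the sample are made up for illustration.

```python
def fields_to_review(confidence_scores: dict, threshold: int = 80) -> list:
    """Return field names scoring below the threshold, lowest score first."""
    flagged = [
        (score, name)
        for name, score in confidence_scores.items()
        if score < threshold
    ]
    return [name for score, name in sorted(flagged)]


# invoice_number and total_amount match the earlier sample; the rest are hypothetical.
scores = {"invoice_number": 98, "total_amount": 99, "vendor": 62, "date": 79}
print(fields_to_review(scores))
```

Sorting by score puts the least certain fields at the front of the manual review queue.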
Schema Templates
Invoice
{
  "type": "object",
  "properties": {
    "invoice_number": {"type": "string"},
    "date": {"type": "string"},
    "vendor": {"type": "string"},
    "total": {"type": "number"},
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": {"type": "string"},
          "quantity": {"type": "number"},
          "price": {"type": "number"}
        }
      }
    }
  }
}
Receipt
{
  "type": "object",
  "properties": {
    "merchant": {"type": "string"},
    "date": {"type": "string"},
    "total": {"type": "number"},
    "items": {
      "type": "array",
      "items": {"type": "object", "properties": {"name": {"type": "string"}, "price": {"type": "number"}}}
    }
  }
}
Security & Privacy
Data Handling
Important: Documents uploaded to DocStrange are transmitted to https://extraction-api.nanonets.com and processed on external servers.
Before uploading sensitive documents:
Review Nanonets' privacy policy and data retention policies: https://docstrange.nanonets.com/docs
Verify encryption in transit (HTTPS) and at rest
Confirm data deletion/retention timelines
Test with non-sensitive sample documents first
Best practices:
Do not upload highly sensitive PII (SSNs, medical records, financial account numbers) until you've confirmed the service's security and compliance posture
Use API keys with limited permissions/scopes if available
Rotate API keys regularly (every 90 days recommended)
Monitor API usage logs for unauthorized access
Never log or commit API keys to repositories or examples
File Size Limits
Sync endpoint: Recommended for documents ≤5 pages
Async endpoint: Use for documents >5 pages to avoid timeouts
Large files: Consider using file_url with publicly accessible URLs instead of uploading large files directly
Operational Safeguards
Always use environment variables or secure secret stores for API keys
Never include real API keys in code examples or documentation
Use placeholder values like "your_api_key_here" in examples
Set appropriate file permissions on configuration files (600 for JSON configs)
Enable API key rotation and monitor usage through the dashboard
Troubleshooting
400 Bad Request:
Provide exactly one input: file, file_url, or file_base64
Verify API key is valid
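The one-input rule can be checked client-side before a request is sent. A minimal sketch, assuming the three field names listed above are the only input options; `check_single_input` is a hypothetical helper.

```python
def check_single_input(file=None, file_url=None, file_base64=None) -> str:
    """Return the name of the single input provided, or raise ValueError."""
    candidates = (
        ("file", file),
        ("file_url", file_url),
        ("file_base64", file_base64),
    )
    provided = [name for name, value in candidates if value is not None]
    if len(provided) != 1:
        found = ", ".join(provided) or "none"
        raise ValueError(
            f"provide exactly one of file, file_url, file_base64 (got: {found})"
        )
    return provided[0]


print(check_single_input(file_url="https://example.com/document.pdf"))
```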
Sync Timeout:
Use async for documents >5 pages
Poll /extract/results/{record_id}
Missing Confidence Scores:
Requires json_options (field list or schema)
Add include_metadata=confidence_score
Authentication Errors:
Verify DOCSTRANGE_API_KEY environment variable is set
Check API key hasn't expired or been revoked
Ensure no extra whitespace in API key value
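Two of these checks (variable set, no stray whitespace) can be run locally; expiry and revocation can only be confirmed by the service. The sketch below takes any mapping so it can be tried without touching the real environment; `api_key_problems` is a hypothetical helper name.

```python
import os


def api_key_problems(env) -> list:
    """Return human-readable problems found with DOCSTRANGE_API_KEY in env."""
    key = env.get("DOCSTRANGE_API_KEY")
    if key is None:
        return ["DOCSTRANGE_API_KEY is not set"]
    problems = []
    if key.strip() == "":
        problems.append("DOCSTRANGE_API_KEY is empty")
    elif key != key.strip():
        problems.append("DOCSTRANGE_API_KEY has leading or trailing whitespace")
    return problems


# Check the current process environment; an empty list means no local issue was found.
print(api_key_problems(os.environ))
```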
Pre-Publish Security Checklist
Before publishing or updating this skill, verify:
[ ] package.json declares requiredEnv and primaryEnv for DOCSTRANGE_API_KEY
[ ] package.json lists API endpoints in endpoints array
[ ] All code examples use placeholder values ("your_api_key_here") not real keys
[ ] No API keys or secrets are embedded in SKILL.md or package.json
[ ] Security & Privacy section documents data handling and risks
[ ] Configuration examples include security warnings for plaintext storage
[ ] File permission guidance is included for config files
References
API Docs: https://docstrange.nanonets.com/docs
Get API Key: https://docstrange.nanonets.com/app
Privacy Policy: https://docstrange.nanonets.com/docs (check for privacy/data policy links)