# A Stack View of the Document Processing Market

Custody, representation, and review in practice

Published 2026-06-05 · Placet Experiri · canonical: https://placetexperiri.com/posts/a-stack-view-of-the-document-processing-market/

---
"AI document processing" is a vast product category. It encompasses enterprise content platforms,[^1] hyperscaler APIs,[^2][^3][^4] frontier parsing services,[^5][^6] open-source converters,[^7][^8] OCR baselines,[^9] web-capture tools,[^10] review queues,[^11] and full workflow systems.[^12] The definition hides a complex stack, one running from record custody to approved output.

The products examined here approach document processing with different concerns. As a tentative map of the market, we follow the responsibility boundary across three complementary concerns: document custody, representation and review.

## Criteria cut across markets


| Criterion             | Layer                   | Responsibility                                                                     |
| --------------------- | ----------------------- | ---------------------------------------------------------------------------------- |
| Intake coverage       | Layer 1: custody        | Accept the source forms the workflow has to govern.                                |
| Provenance            | Layer 1: custody        | Keep each derivative and field attached to source, page, block and version.        |
| Governance            | Layer 1: custody        | Enforce access, audit, retention and data residency around the record.             |
| Layout fidelity       | Layer 2: representation | Preserve page structure needed for extraction and review.                          |
| Structured extraction | Layer 2: representation | Turn recognised content into fields with schemas, confidence and source locations. |
| Agent-readable output | Layer 2: representation | Produce a derivative an agent can read without losing needed structure.            |
| Review routing        | Layer 3: review         | Send uncertain or policy-sensitive output to a reviewer as work.                   |


Criteria grouped by stack layer.

The boundaries we see in the table correspond to specific product capabilities. A DMS can own the record and route the work, while saying little about how a table cell turned into a field. A cloud API can return polygons, spans and confidence, and be reticent about which version is authoritative. A converter can give the agent readable Markdown, while dropping the page geometry that a reviewer later needs.

The literature treats document intelligence as a pipeline of services. Quality checks, classification, extraction, rule evaluation, routing and human review each occupy a separate step, and model inference is only one operation in the system.[^13]

These considerations are not merely taxonomical. When procurement buys one layer and just assumes the adjacent ones, the gap shows up at implementation time. The team is left to reconstruct the missing controls: the location of the approved version, the reason an outdated policy was used and the person responsible for approving the extracted field.
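The procurement gap can be made explicit before implementation starts. The following is a minimal sketch, assuming the criteria and layers from the table above; the `ProductCoverage` shape and the `uncoveredCriteria` helper are hypothetical names introduced for illustration.

```ts
// Criteria and layers as listed in the criteria table.
type Layer = "custody" | "representation" | "review"
type Criterion =
  | "intake_coverage"
  | "provenance"
  | "governance"
  | "layout_fidelity"
  | "structured_extraction"
  | "agent_readable_output"
  | "review_routing"

const criterionLayer: Record<Criterion, Layer> = {
  intake_coverage: "custody",
  provenance: "custody",
  governance: "custody",
  layout_fidelity: "representation",
  structured_extraction: "representation",
  agent_readable_output: "representation",
  review_routing: "review",
}

// Hypothetical shape: the criteria a buyer believes one product covers.
type ProductCoverage = { product: string; covers: Criterion[] }

// Return every criterion that no purchased product claims to cover.
function uncoveredCriteria(stack: ProductCoverage[]): Criterion[] {
  const all = Object.keys(criterionLayer) as Criterion[]
  return all.filter(criterion =>
    !stack.some(item => item.covers.indexOf(criterion) !== -1))
}
```

A stack that contains only a layout API would return the three custody criteria, agent-readable output and review routing, which are the controls the team would otherwise have to reconstruct later.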

The stack can be read through three responsibilities: custody of the record, representation of the record as usable derivatives and review of the output accepted for use.

## Layer 1: Custody

### DMS custody and workflow

We start from custody, since every later layer depends on it. Custody ensures that a source can be unambiguously archived, identified and referenced. The DMS is the institutional home of the document record. Its custody work covers intake, versions, permissions, routing, approvals, retention and audit.

We analyse [Doxis](https://www.doxis.com/) as a worked example. In Q1 2026, SER Group renamed itself after its platform and repositioned as "The Document Intelligence Company". It also appointed the co-founder and CEO of the document-processing vendor it had acquired the previous year as Chief AI Officer.[^14] The repositioning exemplifies a trend among established vendors: putting document management and document AI into the same stack.

The custody surface can be broad. Doxis describes lifecycle management from intake to archive, routing into digital workflows and third-party systems, compliance tracking, ERP and CRM integrations, and platform certifications for security and compliance.[^12] Its no-code application layer lets users build workflow applications on top of the record system. The same offering covers governance and review routing. The record environment includes files, users, roles, retention rules and process state.

Doxis decomposes automated text capture into four steps: capture from email, scanner and portal; classification for document type and routing; extraction of fields with or without a predefined schema; and validation for formal accuracy, duplicates and missing required details.[^15] The acquired layer adds a set of utilities, such as conversion to structured formats, anonymisation, verification against trusted sources and fraud detection, with human-in-the-loop review as a dedicated module.[^16]

The integration point is the transaction system around the record. Doxis offers business connectors for ERP, accounting, CRM and HR systems, vendor-specific SAP, Microsoft and Salesforce interfaces, and a universal ERP connector that can pass extracted invoice data into an ERP workflow.[^17][^18][^19] Those connectors sit at the boundary between custody and execution, letting the DMS keep the approved document record while SAP, Salesforce or another line-of-business system receives the data it needs to continue the process.

### Web intake as custody work

Not all sources come as files. For example, a policy page, institutional FAQ, public regulation or vendor manual may enter the workflow through a URL. We have previously treated this as a source-interface problem.[^20] Modern agentic workflows may choose a representation at access time, with Markdown, HTML, JSON, links and status fields preserving different classes of evidence.

As a comparison, legal scholarship treats URL evidence as an archive problem. A Harvard Law Review essay separates link rot from reference rot. Reference rot names the case where a URL still resolves but no longer contains the cited material, and the essay treats page capture at citation time as the remedy.[^21] Administrative workflows need the same separation. The URL identifies where the source was found, while the record is the captured snapshot that can be reviewed later.
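The separation between link rot and reference rot can be checked against a stored snapshot. This is a minimal sketch under stated assumptions: `CaptureSnapshot`, `LiveCheck` and `classifyLink` are hypothetical names, the live check is assumed to follow redirects, and a changed content hash is used as a coarse proxy for "no longer contains the cited material".

```ts
// Hypothetical record written at capture time.
type CaptureSnapshot = {
  url: string
  capturedAt: string  // ISO 8601 timestamp of the capture
  contentHash: string // hash of the captured body
}

// Hypothetical result of re-fetching the URL later.
type LiveCheck = { status: number; contentHash?: string }

type LinkState = "intact" | "link_rot" | "reference_rot"

function classifyLink(snapshot: CaptureSnapshot, live: LiveCheck): LinkState {
  // The URL no longer resolves to a body: link rot.
  const resolved = live.status >= 200 && live.status < 300
  if (!resolved || live.contentHash === undefined) return "link_rot"
  // The URL resolves, but the body differs from the capture: reference rot.
  return live.contentHash === snapshot.contentHash ? "intact" : "reference_rot"
}
```

In both rot cases the reviewer falls back to the snapshot, which is why the snapshot rather than the URL is the record.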

## Layer 2: Representation

A working system needs to preserve identity, page location, confidence, approval state and custody while the file inevitably changes shape. A **derivative** is the agent-readable representation generated from a record, usually Markdown, JSON, HTML or another structured document format. The agent usually works on the derivative rather than the canonical file, so the derivative has to give the model a readable surface with enough structure for reasoning and enough provenance for review.

### Cloud APIs expose page mechanics

A hyperscaler API is a hosted document-analysis service from a large public cloud provider, such as Microsoft Azure, Google Cloud or AWS. The name comes from hyperscale computing, where infrastructure and software architecture scale as demand grows.[^22] Page mechanics are the concrete facts returned by this layer. They give the surrounding system an input contract and an evidence model. The input contract says which source files and output formats the service supports, while the evidence model ties extracted content back to page structure. A DMS or the run record in a prototype uses those facts to attach a derivative to the right source page or route a difficult page to another parser.

Among the reviewed cloud APIs, [Azure AI Document Intelligence](https://azure.microsoft.com/en-us/products/ai-foundry/tools/document-intelligence) covers a broad product surface. It accepts PDF, images, Office formats and HTML, and can return either JSON with page polygons and character spans or Markdown through a documented output mode.[^2]

```jsonc
// Azure layout response sketch
{
  "status": "succeeded",
  "analyzeResult": {
    "apiVersion": "2024-11-30",
    "modelId": "prebuilt-layout",
    "stringIndexType": "textElements",
    "content": "Payment due\nEUR 184.20",
    "pages": [
      {
        "pageNumber": 1,
        "width": 8.5,
        "height": 11,
        "unit": "inch",
        "words": [
          {
            "content": "184.20",
            "polygon": [4.12, 6.38, 4.78, 6.38, 4.78, 6.55, 4.12, 6.55],
            "confidence": 0.997,
            "span": { "offset": 16, "length": 6 }
          }
        ]
      }
    ],
    "paragraphs": [
      {
        "role": "sectionHeading",
        "content": "Payment due",
        "boundingRegions": [
          { "pageNumber": 1, "polygon": [0.92, 5.8, 2.1, 5.8, 2.1, 6.0, 0.92, 6.0] }
        ],
        "spans": [{ "offset": 0, "length": 11 }]
      }
    ]
  }
}
```

*Azure output ties text spans to page geometry.*
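A consumer can use the two anchors in the response sketch above, the span and the polygon, to recover the text and an axis-aligned box for a word. The helpers below are a hedged sketch: `textForSpan` and `boundingBox` are hypothetical names, and the polygon is assumed to be a flat list of x, y pairs as in the sample.

```ts
type Span = { offset: number; length: number }
type Box = { x: number; y: number; width: number; height: number }

// Slice the top-level content string with a span.
// String.prototype.slice counts UTF-16 code units, so this is only safe when
// offsets use that unit or the content is plain ASCII, as in the sample.
function textForSpan(content: string, span: Span): string {
  return content.slice(span.offset, span.offset + span.length)
}

// Reduce a flat polygon (x1, y1, x2, y2, ...) to an axis-aligned box
// in the page unit reported by the service.
function boundingBox(polygon: number[]): Box {
  const xs = polygon.filter((_, i) => i % 2 === 0)
  const ys = polygon.filter((_, i) => i % 2 === 1)
  const x = Math.min(...xs)
  const y = Math.min(...ys)
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y }
}
```

With the sample values, the span `{ offset: 16, length: 6 }` selects `184.20` from the content string, and the word polygon reduces to a box whose top-left corner is at (4.12, 6.38) inches.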

When plain Markdown cannot express tables with merged cells or multi-row headers, the service embeds HTML tables. It can also mark paragraph roles, section nesting and handwriting spans with confidence. A workflow can stay on this route when embedded text is enough. If it needs embedded Office images or workbook-level spreadsheet logic, the file has to go elsewhere.

[Google Document AI](https://cloud.google.com/document-ai) is organised around processors, meaning configured parser or extractor endpoints. The examples relevant here are processor families. A form parser extracts key-value pairs, tables and checkboxes without a schema. A layout parser produces structure-aware chunks. Custom extractors take a developer-defined schema and can run through a generative foundation model, a trained custom model or a fixed template.

```jsonc
// Google processor object sketch
{
  "type": "CUSTOM_EXTRACTION_PROCESSOR",
  "displayName": "travel-expense-invoices",
  "name": "projects/123456/locations/eu/processors/a1b2c3d4e5f6",
  "state": "ENABLED",
  "processEndpoint": "https://eu-documentai.googleapis.com/v1/projects/123456/locations/eu/processors/a1b2c3d4e5f6:process",
  "defaultProcessorVersion": "projects/123456/locations/eu/processors/a1b2c3d4e5f6/processorVersions/pretrained"
}
```

*A processor object gives the endpoint and version behind a configured extractor.*

The useful planning parameter is the amount of labelled data each route expects. Google's own table puts production quality at three documents for fixed templates, 10-100+ documents for custom models, and 0-50+ documents for foundation-model extraction depending on layout variation.[^3] This determines whether an administrative prototype can begin with a few sample forms or needs time for a preliminary labelling project.
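The planning question can be written down as a small feasibility check. This sketch uses only the lower bounds quoted above (0, 3 and 10 documents) and is not a Google API; `feasibleRoutes` and the `fixedLayout` flag are hypothetical, and the real requirement grows with layout variation.

```ts
type ExtractionRoute = "foundation_model" | "fixed_template" | "custom_model"

// Lower bounds of the labelled-document ranges quoted in the text.
const minimumLabelledDocs: Record<ExtractionRoute, number> = {
  foundation_model: 0,
  fixed_template: 3,
  custom_model: 10,
}

// List the routes a prototype could start with today.
// A fixed template is only considered when the layout does not vary.
function feasibleRoutes(labelledDocs: number, fixedLayout: boolean): ExtractionRoute[] {
  const routes: ExtractionRoute[] = ["foundation_model", "fixed_template", "custom_model"]
  return routes.filter(route =>
    labelledDocs >= minimumLabelledDocs[route] &&
    (route !== "fixed_template" || fixedLayout))
}
```

A team with five labelled samples of a fixed form could begin with foundation-model or template extraction, while a custom model would first need a labelling project.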

[Amazon Textract](https://aws.amazon.com/textract/) is the most explicit about page geometry in this group, though its input and output surface is narrower. Its analysis response returns text, forms, tables, queries, signatures and layout as blocks, and each block can carry page number, bounding box, confidence and relationships to other blocks.[^4] Queries answer developer-supplied questions with answer blocks, while adapters customise those query responses after the team uploads representative documents, annotates query answers and trains the adapter.[^23]

```jsonc
// Textract query request sketch
{
  "Document": {
    "S3Object": {
      "Bucket": "expense-intake",
      "Name": "receipt-page-1.png"
    }
  },
  "FeatureTypes": ["TABLES", "FORMS", "QUERIES", "LAYOUT"],
  "QueriesConfig": {
    "Queries": [
      {
        "Text": "What is the reimbursable total?",
        "Alias": "reimbursable_total",
        "Pages": ["1"]
      },
      {
        "Text": "What is the invoice date?",
        "Alias": "invoice_date",
        "Pages": ["1"]
      }
    ]
  }
}
```

*Textract queries turn business questions into named answer fields.*
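On the response side, each query comes back as a `QUERY` block that points to its `QUERY_RESULT` block through an `ANSWER` relationship. The sketch below pairs them by alias. The `TextractBlock` type is a trimmed, hand-written subset of the block shape rather than the SDK type, and `answersByAlias` is a hypothetical helper.

```ts
// Trimmed subset of a Textract block; geometry is omitted for brevity.
type TextractBlock = {
  Id: string
  BlockType: string
  Text?: string
  Confidence?: number
  Page?: number
  Query?: { Text: string; Alias?: string }
  Relationships?: { Type: string; Ids: string[] }[]
}

type QueryAnswer = {
  alias: string
  text: string | undefined
  confidence: number | undefined
  page: number | undefined
}

function answersByAlias(blocks: TextractBlock[]): QueryAnswer[] {
  // Index blocks by id so relationships can be followed.
  const byId: { [id: string]: TextractBlock } = {}
  for (const block of blocks) byId[block.Id] = block

  const answers: QueryAnswer[] = []
  for (const block of blocks) {
    if (block.BlockType !== "QUERY" || !block.Query) continue
    const alias = block.Query.Alias ?? block.Query.Text
    const answerIds = (block.Relationships ?? [])
      .filter(rel => rel.Type === "ANSWER")
      .reduce<string[]>((ids, rel) => ids.concat(rel.Ids), [])
    // Take the first linked result; an unanswered query keeps undefined fields.
    const result = answerIds
      .map(id => byId[id])
      .filter(b => b !== undefined && b.BlockType === "QUERY_RESULT")[0]
    answers.push({
      alias,
      text: result?.Text,
      confidence: result?.Confidence,
      page: result?.Page,
    })
  }
  return answers
}
```

Keeping unanswered queries in the output, instead of dropping them, lets the review layer treat a missing total as an exception rather than as silence.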

[Mistral Document AI](https://mistral.ai/solutions/document-ai/) is closer to the representation role than to custody or review. Its primary output is a readable derivative of the page, with structure and hierarchy preserved in Markdown. Tables can arrive inline or as separate Markdown or HTML objects, and confidence is available at page or word granularity under a broad language-coverage claim.[^5] That is a good shape when the output is immediately passed to a language model, but a weaker choice when the next consumer is a database field that later has to be audited against the source pixel. A word-level confidence score on generated Markdown anchors provenance less tightly than a polygon on the original page.

```jsonc
// Mistral OCR derivative sketch
{
  "pages": [
    {
      "index": 0,
      "markdown": "# Receipt\n\nTotal due [tbl-0.html]\n\n![img-0.jpeg](img-0.jpeg)",
      "tables": [
        "<table><tr><th>Total</th><td>EUR 184.20</td></tr></table>"
      ],
      "images": [
        {
          "id": "img-0.jpeg",
          "top_left_x": 118,
          "top_left_y": 220,
          "bottom_right_x": 412,
          "bottom_right_y": 540,
          "image_base64": "..."
        }
      ],
      "dimensions": {
        "dpi": 200,
        "height": 2200,
        "width": 1700
      },
      "confidence_scores": {
        "average_page_confidence_score": 0.982,
        "minimum_page_confidence_score": 0.91,
        "word_confidence_scores": [
          { "word": "Total", "confidence": 0.99 },
          { "word": "184.20", "confidence": 0.96 }
        ]
      }
    }
  ],
  "model": "mistral-ocr-latest",
  "usage_info": {
    "pages_processed": 1
  }
}
```

*Mistral centres the derivative on Markdown, extracted tables and image metadata.*
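Because the confidence values in this derivative carry no page geometry, a review gate built on them can point only to a page and a word. The sketch below assumes the field names from the response sketch above; `gatePage`, `PageGate` and the threshold are hypothetical.

```ts
type WordScore = { word: string; confidence: number }

// Subset of a page object, using the field names from the sketch above.
type OcrPageScores = {
  index: number
  confidence_scores?: {
    minimum_page_confidence_score: number
    word_confidence_scores?: WordScore[]
  }
}

type PageGate = { pageIndex: number; needsReview: boolean; weakWords: WordScore[] }

function gatePage(page: OcrPageScores, threshold: number): PageGate {
  const scores = page.confidence_scores
  // A page without scores is unverified, so it is gated rather than passed.
  if (!scores) return { pageIndex: page.index, needsReview: true, weakWords: [] }
  const weakWords = (scores.word_confidence_scores ?? [])
    .filter(w => w.confidence < threshold)
  const needsReview =
    scores.minimum_page_confidence_score < threshold || weakWords.length > 0
  return { pageIndex: page.index, needsReview, weakWords }
}
```

With a threshold of 0.97, the sample page is gated and `184.20` is listed as the weak word, but the reviewer still has to find it on the page by eye.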

The model-side literature explains why page mechanics remain necessary even when the parser is a vision-language model. OCR-first pipelines can lose layout and reading-order information before the model sees the page. Model-native readers remove the separate OCR step, but long documents still stress context windows, and small layout changes can change what the model reads.[^24][^25] Newer OCR systems keep structure explicit by detecting layout and reading order first, then recognising content inside page regions.[^26] For difficult pages, page mechanics therefore move inside the model pipeline, ahead of extraction.

### Hard-tail parsing

Cloud APIs handle ordinary page description. The hard tail starts when a page needs more than text, tables and coordinates from a general layout API. Dense tables, charts, handwriting, nested sections and mixed scans are common in administrative work, and a few such pages are enough to break the pipeline.

The ParseBench comparison makes the hard-tail problem visible at the level of the model by testing whether parsers preserve source text, keep reading order and avoid omissions or hallucinations on enterprise pages.[^27] In the published benchmarks, extra VLM capability and compute produced only marginal gains on the parsing metrics. Parser quality is better measured on document-parsing tasks than inferred from the model's general rank.[^27][^28]

[LlamaCloud](https://www.llamaindex.ai/llamacloud) decomposes the product surface into Parse, Extract, Classify, Split, Sheets and Index. The platform maps those products to LLM-ready text, schema-shaped JSON, document categories, concatenated-document separation, spreadsheet-like data and hosted vector search.[^6] The managed service acts as an escalation route for pages in which the baseline API does not preserve enough layout or visual structure.

Document-ETL research reaches the same result by treating the whole processing pipeline as an object of optimisation. Complex document tasks improve when the pipeline rewrites the task, decomposes the data and evaluates candidate plans before execution.[^29] Parsing therefore encompasses orchestration concerns: the pipeline chooses the representation, routes page regions that need specialist handling and passes only acceptable results to extraction.

### Open converters and local baselines

A converter is the component that turns the source file into a derivative that the rest of the stack can consume. A quick converter is useful when the first goal is to make a large source set readable, while a more structured parser is necessary when tables, reading order and layout objects will later be more thoroughly inspected. Research systems that construct knowledge from heterogeneous documents follow this split by separating layout, metadata and semantic layers, with human review deciding which extracted relations survive.[^30]

[MarkItDown](https://github.com/microsoft/markitdown) is a fast converter for LLM-readable text rather than a high-fidelity document renderer.[^7] It converts a wide intake range, from Office and PDF to images, HTML and archives, while preserving headings, lists, tables and links where it can. Scanned pages use a vision-model plugin, and Azure AI Document Intelligence supplies the cloud escalation path. Its value is early corpus access. A team can turn mixed files into rough Markdown quickly enough to inspect, search and prototype over them, then reserve structured parsing for the files whose tables, reading order or provenance need inspection.

[Docling](https://docling-project.github.io/docling/) is the structured parser in this group. It parses each input into a unified document representation and exports Markdown, HTML, structured text or lossless JSON.[^8] It draws a distinction: Markdown helps the agent read, while lossless JSON records the parser's objects. A pipeline that stores both can answer later questions about a table cell or reading-order decision that a Markdown-only pipeline usually cannot.

[OCRmyPDF](https://ocrmypdf.readthedocs.io/en/stable/) and [Tesseract](https://github.com/tesseract-ocr/tesseract) define the local OCR baseline. OCRmyPDF wraps Tesseract to add a searchable OCR text layer to scanned PDFs, locally and deterministically.[^9] The baseline works for clean printed scans because it produces repeatable searchable PDFs, but it is the wrong tool for complex layout, tables and handwriting. It gives the pipeline a cheap first pass and the evaluator a control row for downstream paid layers.

The deterministic baseline has also gained a model-native neighbour, [Surya](https://github.com/datalab-to/surya), an open-source OCR toolkit that installs from the package index and can run through local inference backends.[^31] A local fast tier can now add model-native OCR before it reaches a cloud API.

[LiteParse](https://www.llamaindex.ai/blog/liteparse-local-document-parsing-for-ai-agents) extends the local tier from OCR into parsing. It gives a pipeline a first-pass parser that can run locally, while failed pages escalate to LlamaCloud or another frontier parser.[^32] The local baseline can therefore include parser logic as well as OCR, with managed escalation reserved for pages that fail the local pass.
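The local-first pattern reduces to trying tiers in cost order and accepting the first result that passes a check. The sketch below is generic and does not use the LiteParse or LlamaCloud APIs: parsers are injected as plain functions, modelled as synchronous to keep the example short, and `parseWithEscalation` and the acceptance check are hypothetical.

```ts
type ParseResult = { parser: string; markdown: string }
type PageParser = (pageId: string) => ParseResult
type Acceptance = (result: ParseResult) => boolean

// Try each tier in order (cheapest first) and return the first acceptable result.
// Returns undefined when no tier passes, so the page stops before extraction.
function parseWithEscalation(
  pageId: string,
  tiers: PageParser[],
  accept: Acceptance,
): ParseResult | undefined {
  for (const parse of tiers) {
    const result = parse(pageId)
    if (accept(result)) return result
  }
  return undefined
}
```

The acceptance check carries most of the design weight. A non-empty-output test is the weakest useful version, and a stricter one might compare table row counts or require a minimum confidence before the local result is kept.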

The remaining tools serve specialised use cases around representation. [Apache Tika](https://tika.apache.org/) belongs at intake, where email attachments and legacy repositories arrive in many formats; it detects file types and extracts text and metadata across more than a thousand formats.[^33] [GROBID](https://grobid.readthedocs.io/en/latest/) belongs to scholarly PDFs, where PDF-to-TEI, references and citation links are the needed derivative.[^34] [Pandoc](https://pandoc.org/) belongs at publication, turning approved Markdown back into DOCX or PDF for circulation.[^35] These tools cover boundary cases around the representation layer and leave layout-aware extraction to a dedicated parser.

## Layer 3: Review

Review is where a proposed field becomes an institutionally accepted value. The parser supplies candidate evidence; the review layer gives it institutional standing by recording the review outcome under the governing source version.

### Validation workbenches

Validation products make the review role into a concrete system interface. [Rossum](https://rossum.ai/) sends empty fields and low-confidence fields into a validation stage and points the reviewer to the relevant document area.[^11] [ABBYY Vantage](https://www.abbyy.com/vantage/) treats human-in-the-loop review as the step used when validation rules fail or when a document class and extracted data need manual correction, and it ties those corrections to continuous learning and straight-through-processing analytics.[^36] [Instabase](https://www.instabase.com/product/ai-hub/automate) turns failed validations from a deployment into review tasks by file, run or packet, with queues, escalation groups and service-level targets.[^37]

Validation is packaged as an exception workbench. The reviewer typically receives a queue, a document viewer, a field editor, validation rules, assignment, and a return path into downstream systems. The review layer should therefore be evaluated through work objects as much as through extraction metrics. The useful ergonomics test for these systems is whether a low-confidence field becomes a traceable task with a source location, reviewer, decision and replayable output.

## A sample implementation

A travel-expense workflow shows a full stack traversal at a small scale. The design choice is to separate responsibilities before choosing parsers. The DMS owns the case, evidence, policy version and approval trail. The agentic workflow reads those records, routes each source to a parser, creates field records, sends unsafe fields to review and writes the approved result back to the case.

This sample implementation uses an arbitrary DMS for custody, LlamaIndex for orchestration, LiteParse for the local parser route, LlamaParse for the escalation parser route, Apache Tika for uncertain intake metadata, DMS-attached derivative storage, a finance review queue and an ERP connector for the approved reimbursement payload. We treat the optimal stack as the configuration that preserves provenance and acceptance state at the lowest operational cost.

The workflow keeps three source classes: the case, the evidence and the governing policy. The hotel invoice and ticket screenshot are ordinary evidence. The receipt photo is the escalation candidate. The travel-expense policy is versioned governing material.

Evidence enters through ordinary administrative channels, usually an email attachment or self-service portal upload. The DMS creates the case record, while the intake edge may still receive forwarded messages, screenshots, zipped attachments or files with weak MIME metadata. Apache Tika belongs at that edge, before parsing, where file type and attachment metadata have to be established before a routing decision is made.

```ts
// Workflow record types
type WorkflowCase = {
  caseId: string
  dmsRecordId: string
  employeeId: string
  policyRecordId: string
  state: "submitted" | "in_review" | "approved" | "rejected"
}

type WorkflowSource = {
  sourceId: string
  role: "evidence" | "governing_policy"
  kind: "hotel_invoice" | "b2c_receipt" | "ticket_screenshot" | "travel_expense_policy"
  dmsRecordId: string
  fileName?: string
  version?: string
  custodyState: "submitted" | "approved" | "restricted"
}

type RouteDecision = {
  routeId: string
  sourceId: string
  route: "local_liteparse" | "managed_llamaparse" | "manual_reject"
  parser?: "liteparse" | "llamaparse"
  reason: string
  outputVersion?: string
}

type DocumentDerivative = {
  derivativeId: string
  sourceId: string
  routeId: string
  format: "markdown" | "structured_json"
  uri: string
  contentHash: string
}

type ExtractedField = {
  fieldId: string
  sourceId: string
  routeId?: string
  name: "vendor" | "date" | "total_amount" | "currency" | "policy_clause"
  value: string
  location: { page?: number; bbox?: number[]; section?: string }
  confidence?: number
  reviewState: "accepted" | "queued" | "corrected" | "reference"
}

type ReviewTask = {
  taskId: string
  caseId: string
  fieldIds: string[]
  trigger: "low_confidence_total" | "policy_conflict" | "total_mismatch"
  assigneeGroup: "finance_ops"
  state: "open" | "accepted" | "corrected" | "rejected"
}

type ApprovalPayload = {
  caseId: string
  approvedAmount: string
  accountCode: string
  approvalState: "approved" | "rejected"
}
```

*Record types keep custody, routing, derivatives, review and approval separate.*

Custody stays narrow in this example. The case is the parent DMS record. Evidence documents attach to it, while the policy remains an approved record referenced by version.

LlamaIndex makes the routing decision source by source.[^32] The hotel invoice stays on LiteParse because stable text positions are enough for extraction. The photographed receipt escalates to LlamaParse because the reimbursable amount may depend on skew, faint print, discounts and tax lines.[^38] The route record gives the review layer the parser, reason and output version behind each field.
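The routing rule described above can be written as a pure function that emits a route record. This is a sketch rather than LlamaIndex code: the `SourceSignals` shape, with its `readable` and `photographed` flags, is an assumption added for illustration, and `RouteDecision` repeats the record type defined earlier so the block stands alone.

```ts
type SourceKind = "hotel_invoice" | "b2c_receipt" | "ticket_screenshot" | "travel_expense_policy"

type RouteDecision = {
  routeId: string
  sourceId: string
  route: "local_liteparse" | "managed_llamaparse" | "manual_reject"
  parser?: "liteparse" | "llamaparse"
  reason: string
  outputVersion?: string
}

// Hypothetical intake signals; the workflow types above do not define them.
type SourceSignals = {
  sourceId: string
  kind: SourceKind
  readable: boolean     // file could be opened and typed at intake
  photographed: boolean // camera capture rather than a digital original
}

function decideRoute(source: SourceSignals): RouteDecision {
  const base = { routeId: `route-${source.sourceId}`, sourceId: source.sourceId }
  if (!source.readable) {
    return { ...base, route: "manual_reject", reason: "file could not be opened or typed" }
  }
  if (source.kind === "b2c_receipt" && source.photographed) {
    return {
      ...base,
      route: "managed_llamaparse",
      parser: "llamaparse",
      reason: "photographed receipt: skew, faint print, discounts and tax lines",
    }
  }
  return {
    ...base,
    route: "local_liteparse",
    parser: "liteparse",
    reason: "stable text positions are enough for extraction",
  }
}
```

Storing the `reason` string with the route lets a reviewer see why a field came from the managed parser without re-running the router.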

The two parser routes produce different derivatives, but the workflow treats both as records. Markdown gives the agent a readable surface for comparing the receipt, invoice and policy. Parser JSON keeps the layout objects a reviewer needs when a total or line item is questioned.

The field record is the unit the agent and reviewer can share. It keeps the extracted value with the source, page location, route id, confidence and review state. That is enough for the agent to compare the receipt total with the policy clause and for the reviewer to reconstruct the path that produced the value.

Review starts where the workflow cannot safely validate its output. Low-confidence totals, policy conflicts and total-line mismatches become review tasks for finance operations. Once a reviewer accepts or corrects the field, the ERP connector receives the approved amount, account and approval state.
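Two of the three triggers can be derived from the field records alone. The sketch below covers low-confidence totals and total-line mismatches; policy conflicts are left out because they depend on the policy text. `reviewTaskFor`, `toCents` and the confidence threshold are hypothetical, and amounts are compared in integer cents to avoid floating-point drift.

```ts
type Trigger = "low_confidence_total" | "total_mismatch"
type TotalField = { fieldId: string; value: string; confidence?: number }

type ReviewTask = {
  taskId: string
  caseId: string
  fieldIds: string[]
  trigger: Trigger
  assigneeGroup: "finance_ops"
  state: "open"
}

// Parse a decimal amount such as "184.20" into integer cents.
const toCents = (amount: string): number => Math.round(parseFloat(amount) * 100)

// Return a review task when the total cannot be safely accepted, else undefined.
function reviewTaskFor(
  caseId: string,
  total: TotalField,
  lineAmounts: string[],
  minConfidence: number,
): ReviewTask | undefined {
  let trigger: Trigger | undefined
  if (total.confidence === undefined || total.confidence < minConfidence) {
    // A missing score is treated like a low one.
    trigger = "low_confidence_total"
  } else if (
    lineAmounts.length > 0 &&
    lineAmounts.reduce((sum, amount) => sum + toCents(amount), 0) !== toCents(total.value)
  ) {
    trigger = "total_mismatch"
  }
  if (trigger === undefined) return undefined
  return {
    taskId: `task-${caseId}-${total.fieldId}`,
    caseId,
    fieldIds: [total.fieldId],
    trigger,
    assigneeGroup: "finance_ops",
    state: "open",
  }
}
```

A total that passes both checks produces no task, so only the exceptions reach finance operations.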

![Sequence view of the travel-expense reimbursement workflow](/document-processing-prototype-sequence.png)

*Parser routing, review, and approved ERP handoff.*

The sequence view shows the same separation in time. Tika types uncertain intake files before the DMS creates the case, and the DMS later receives the approval record. LlamaIndex routes sources to LiteParse or LlamaParse. Extraction emits field records, the review layer records corrections, and ERP receives only the approved payload.

Provenance keeps the agentic part tied to the record system. Source links and page locations show why a value was accepted. Confidence decides whether the value enters the queue, but the source relation makes the decision reviewable.[^39][^40]

## Stack failure points

Failure points concentrate in representation and provenance. Tables, handwriting, dense layouts and mixed scan-and-digital bundles do not behave like ordinary text extraction. A reliable stack routes at page level, keeps simple pages on the local or lower-cost path, escalates difficult pages, and stops pages that lack enough evidence before extraction. Tables need a separate test set drawn from the institution's documents, because recent parsing research separates page layout, table structure and region-level recognition instead of treating them as one text-recognition problem.[^25][^26]

The provenance failure is less visible during a successful run. A derivative can fall out of step with the record: the DMS record can move on while an agent keeps reading an older corpus. Confidence scores can only route review; they do not prove that a field is correct.[^41] A reviewer can correct a wrong amount or date only when the field still points to the source version and page region that produced it.
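A stale-derivative check is one way to surface this failure before an agent reads the corpus. The sketch assumes each derivative stores the source version it was generated from; `sourceVersion`, `RecordVersion` and `staleDerivatives` are hypothetical and not part of the record types shown earlier.

```ts
// Current state of a record in the DMS.
type RecordVersion = { dmsRecordId: string; version: string }

// A derivative that remembers which record version produced it.
type DerivativeRef = { derivativeId: string; dmsRecordId: string; sourceVersion: string }

// Return derivatives whose source record has moved on or can no longer be found.
function staleDerivatives(
  derivatives: DerivativeRef[],
  current: RecordVersion[],
): DerivativeRef[] {
  return derivatives.filter(derivative => {
    const record = current.filter(r => r.dmsRecordId === derivative.dmsRecordId)[0]
    // A derivative without a matching record is treated as stale too.
    return record === undefined || record.version !== derivative.sourceVersion
  })
}
```

Running the check before each agent session keeps an outdated policy derivative from silently governing a new case.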

At some point, further parser accuracy gives diminishing returns. The more useful investment is a fast review path: the system identifies uncertainty, routes it to the right owner, and turns the decision back into maintained product state. Model accuracy can reduce the number of exceptions, but review design determines whether the remaining exceptions stay cheap and bounded.

## Appendix: Competitor matrix by stack role


| Tool                                                            | Role           | Product category                   | Strongest position                                                             |
| --------------------------------------------------------------- | -------------- | ---------------------------------- | ------------------------------------------------------------------------------ |
| Doxis                                                           | custody        | enterprise content platform        | system of record, workflow routing, retention, audit and review surfaces       |
| Azure AI Document Intelligence                                  | representation | cloud document analysis API        | broad intake, page geometry, spans, confidence and Markdown output             |
| Google Document AI                                              | representation | cloud document extraction platform | schema-driven extraction, form parsing and foundation-model extraction         |
| Amazon Textract                                                 | representation | cloud document analysis API        | PDF and image intake, explicit geometry, confidence, forms, tables and queries |
| Mistral Document AI                                             | representation | Markdown-native document API       | page Markdown, HTML tables, word confidence and language coverage              |
| LlamaCloud                                                      | representation | managed frontier parser            | enterprise tables, charts and agentic parse/extract                            |
| Docling                                                         | representation | structured open converter          | lossless document object plus Markdown export                                  |
| MarkItDown                                                      | representation | fast open converter                | rapid corpus conversion into LLM-readable Markdown                             |
| Unstructured                                                    | representation | document partitioning library      | element partitioning with opt-in table structure                               |
| OCRmyPDF / Tesseract                                            | representation | local OCR toolchain                | deterministic searchable PDF/A and sidecar text                                |
| Apache Tika / GROBID / Pandoc                                   | representation | format utility set                 | file-type extraction, scholarly TEI and approved Markdown export               |
| Cloudflare crawl                                                | custody        | web capture API                    | site capture to Markdown, HTML or JSON with crawl status                       |
| ABBYY Vantage                                                   | review         | validation workbench               | human verification, correction workflow and continuous-learning analytics      |
| Rossum                                                          | review         | transactional validation queue     | low-confidence and empty-field review tied to document context                 |
| Instabase                                                       | review         | deployment review queue            | failed validations become review tasks with queues and service targets         |
| UiPath Document Understanding[^42]                              | review         | RPA document automation platform   | validation actions suspend and resume orchestration                            |
| Automation Anywhere Document Automation / Tungsten TotalAgility[^43][^44] | review         | business automation platform       | extraction, validation, routing and audit inside automation                    |
| Amazon Augmented AI[^45]                                        | review         | managed human-review service       | human-review workflows around ML predictions and Textract                      |


Tools grouped by their strongest stack responsibility.

---

[^1]: [OpenText Enterprise Content Management Software](https://www.opentext.com/products/content-management).
[^2]: [Azure AI Document Intelligence, layout model](https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/prebuilt/layout).
[^3]: [Google Document AI, extraction overview](https://docs.cloud.google.com/document-ai/docs/extracting-overview).
[^4]: [Amazon Textract, analyzing documents](https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html).
[^5]: [Mistral Document AI, OCR](https://docs.mistral.ai/studio-api/document-processing/basic_ocr).
[^6]: [LlamaParse platform quickstart](https://developers.llamaindex.ai/llamaparse/).
[^7]: [microsoft/markitdown](https://github.com/microsoft/markitdown).
[^8]: [Docling, supported formats](https://docling-project.github.io/docling/usage/supported_formats/).
[^9]: [OCRmyPDF](https://ocrmypdf.readthedocs.io/en/stable/).
[^10]: [Cloudflare Browser Run crawl endpoint](https://developers.cloudflare.com/browser-run/quick-actions/crawl-endpoint/).
[^11]: [Rossum, validation and correction](https://rossum.ai/data-Capture/).
[^12]: [Doxis Intelligent Content Automation](https://www.doxis.com/en/business-platform/doxis-intelligent-content-automation).
[^13]: [Ferreira et al., 2025](https://consensus.app/papers/details/8e44625670265298a8e2c2b05cb09bab/).
[^14]: [Doxis, Jan 2026](https://www.doxis.com/en/about-us/news-press/ser-group-rebrands-to-doxis).
[^15]: [Doxis Content Understanding](https://www.doxis.com/en/business-platform/content-understanding).
[^16]: [Klippa DocHorizon / Doxis AI.dp](https://www.klippa.com/en/dochorizon/).
[^17]: [Doxis business connectors](https://www.doxis.com/en/solutions/business-connectors).
[^18]: [Doxis ERP integration](https://www.doxis.com/en/solutions/erp-integration).
[^19]: [Doxis Salesforce integration](https://www.doxis.com/en/solutions/salesforce).
[^20]: ["Towards a Reliance Layer in Document Agents", source interfaces](/posts/towards-a-reliance-layer-in-document-agents/#3-source-interfaces).
[^21]: [Zittrain, Albert and Lessig, 2014](https://harvardlawreview.org/forum/vol-127/perma-scoping-and-addressing-the-problem-of-link-and-reference-rot-in-legal-citations/).
[^22]: [Red Hat, hyperscaler](https://www.redhat.com/en/topics/cloud-computing/what-is-a-hyperscaler).
[^23]: [Amazon Textract, customizing query responses](https://docs.aws.amazon.com/textract/latest/dg/textract-using-adapters.html).
[^24]: [Gao et al., 2025](https://consensus.app/papers/details/915461db8875515290d8dca927ceb53c/).
[^25]: [Duan et al., 2026](https://consensus.app/papers/details/50006a512b095cbcb6661f202914eb59/).
[^26]: [MonkeyOCR v1.5, 2025](https://arxiv.org/abs/2511.10390).
[^27]: [LlamaIndex ParseBench, Apr 2026](https://www.llamaindex.ai/blog/parsebench).
[^28]: [Liu, Jun 2026](https://x.com/jerryjliu0/status/2064519456966205905).
[^29]: [Shankar et al., 2024](https://consensus.app/papers/details/dc078e808481514684409faf434cc5e4/).
[^30]: [Sun et al., 2025](https://consensus.app/papers/details/d2c7ec831d695f5fb3a02d3cd10ae6b0/).
[^31]: [surya](https://github.com/datalab-to/surya).
[^32]: [LiteParse, local document parsing](https://www.llamaindex.ai/blog/liteparse-local-document-parsing-for-ai-agents).
[^33]: [Apache Tika](https://tika.apache.org/).
[^34]: [GROBID](https://grobid.readthedocs.io/en/latest/Principles/).
[^35]: [Pandoc](https://pandoc.org/).
[^36]: [ABBYY, human-in-the-loop verification](https://www.abbyy.com/ai-document-processing/human-in-the-loop-verification/).
[^37]: [Instabase, reviewing results](https://docs.instabase.com/automate/review).
[^38]: [LlamaParse overview](https://developers.llamaindex.ai/llamaparse/parse/).
[^39]: [Kale et al., 2022](https://consensus.app/papers/details/a44647b0932e5b889d31ab6ca157da06/).
[^40]: [Macdonald et al., 2025](https://consensus.app/papers/details/86c2f8963eeb577e8bf631ed37d991df/).
[^41]: [Alan engineering, Mar 2026](https://medium.com/alan/lessons-from-running-an-llm-document-processing-pipeline-in-production-33d87f99cdb1).
[^42]: [UiPath, Create Document Validation Action](https://docs.uipath.com/activities/other/latest/document-understanding/create-document-validation-action).
[^43]: [Automation Anywhere Document Automation](https://www.automationanywhere.com/products/document-automation).
[^44]: [Tungsten TotalAgility Features Guide](https://docshield.tungstenautomation.com/KTA/en_US/8.1.0-rmx0b1ux3q/print/TungstenTotalAgilityFeaturesGuide_EN.pdf).
[^45]: [Amazon Augmented AI with Amazon Textract](https://docs.aws.amazon.com/sagemaker/latest/dg/a2i-textract-task-type.html).