Intelligence Profile
Tower AI
End-to-end AI due diligence platform. Built for M&A lawyers, asset managers, and real estate deal teams who need to collect, organise, and extract structured insights from messy data rooms -- replacing disconnected VDRs, Excel trackers, and email threads with a single collaborative workspace.
Due Diligence AI | End-to-End Workflow | Data Room Organisation | Document Extraction | SOC 2 Type II | YC W25
Rich coverage
Q1 2026 -- Run #2
520 tasks -- CaliperDiligence-v1
Frontier update: GPT-5.5 (April 2026) has been released with improved long-document reasoning and multi-step instruction following. Baseline recalculation is in progress for cross-document synthesis and workflow organisation tasks. Updated gap scores will be published within 14 days.
Capability Assessment -- Independent, Q1 2026
Tower is an infrastructure product first and an extraction product second. The relevant benchmark question is not just whether it extracts clauses accurately -- it is whether the end-to-end workflow of collecting, organising, and querying a real data room produces reliable, decision-grade output. That is a harder question than document-level accuracy, and it is the right one for the buyers Tower serves.
1. Where the product leads
On document extraction and citation traceability -- the tasks that underpin the entire workflow -- Tower performs above the category average and within 6 points of the GPT-5.4 frontier on the Lab's structured extraction task battery. The product's structural edge is the combination of extraction capability with workflow infrastructure: automated data room organisation, document naming, missing document flagging, and role-based access controls in a single platform. General-purpose LLMs cannot replicate this combination without significant integration overhead.
  • Citation traceability on contract extraction: 88.4%, above the 81% category average. Structured answers are grounded in the source document with page-level references.
  • Automated data room organisation accuracy (correct folder structure and naming conventions applied): 84.1% on the benchmark data room set. Materially faster than manual organisation.
  • Change-of-control and material contract flag detection: 86.7% recall on the Lab's annotated M&A contract set, above the category average of 79%.
2. The frontier question
The frontier is improving at 3.6 points per quarter on document extraction tasks. Tower's extraction gap from GPT-5.4 is 5.8 points -- manageable but compressing. The product's durable advantage is the workflow layer: data collection, request list management, missing document detection, team collaboration, and access controls are features that a frontier model accessed via API cannot provide without Tower-equivalent infrastructure. The extraction engine is the commodity risk. The workflow is the moat.
  • Document extraction gap: 5.8 points from frontier. At current frontier velocity, extraction-only products face meaningful compression by Q3 2026.
  • Workflow and collaboration features have no frontier equivalent. These are not subject to the same compression dynamic and represent the product's defensible differentiation.
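The Q3 2026 estimate can be reproduced with simple arithmetic on the figures above. The following is a minimal sketch, assuming the runway is the current gap divided by the frontier velocity, rounded up to whole quarters and counted from the Q1 2026 run. The Lab's actual method is not stated here, and the function names are hypothetical.

```python
import math


def runway_quarters(gap_points: float, velocity_per_quarter: float) -> int:
    """Whole quarters of frontier improvement equal to the current gap (assumed definition)."""
    if velocity_per_quarter <= 0:
        raise ValueError("velocity must be positive")
    return math.ceil(gap_points / velocity_per_quarter)


def add_quarters(year: int, quarter: int, n: int) -> tuple[int, int]:
    """Advance a (year, quarter) pair by n quarters."""
    index = year * 4 + (quarter - 1) + n
    return index // 4, index % 4 + 1


# Figures from the profile: 5.8-point gap, +3.6 points per quarter, run in Q1 2026.
runway = runway_quarters(5.8, 3.6)
year, quarter = add_quarters(2026, 1, runway)
print(f"{runway} quarters -> Q{quarter} {year}")
```

Under these assumptions, 5.8 / 3.6 is roughly 1.6, which rounds up to the 2-quarter runway reported in the profile.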
3. Decision implication
For M&A lawyers and asset management teams, the buy decision for Tower is not primarily about extraction accuracy -- it is about whether replacing disconnected VDRs, Excel trackers, and email threads with a unified platform saves enough time to justify the switch. The Lab's workflow benchmarks suggest it does: automated data room organisation alone reduces the setup time in the benchmark test by an estimated 60--70% versus manual organisation. The AmLaw 50 and Chambers Band 1 deployment signal is the strongest commercial validation in the product's current category. For teams with high-volume transaction pipelines, the workflow infrastructure case is strong regardless of extraction accuracy relative to the frontier.
4. What the data does not yet cover
  • Asset management and real estate diligence workflows have not been separately benchmarked. Current scores reflect M&A legal diligence task sets only.
  • Cross-document synthesis accuracy -- finding the same clause across 200 contracts and surfacing discrepancies -- is a separate task type not yet in the CaliperDiligence dataset.
  • Integration reliability with third-party document sources (SharePoint, Google Drive, existing VDRs) has not been independently tested.
  • Panel signal is based on 16 practitioners, all M&A legal. Asset management and CRE segments require additional panel cycles.
Benchmark Scorecard vs. GPT-5.4 baseline -- 520 tasks
Task (level): Tower AI vs. Frontier (GPT-5.4), gap
Formula generation from natural language (L1): 91.4 vs. 93.8, -2.4
Error detection -- logical correctness (L2): 94.2 vs. 95.1, -0.9
Scenario and sensitivity build (L3): 82.7 vs. 89.4, -6.7
Cross-sheet model restructuring (L4): 67.3 vs. 81.4, -14.1
Analytical judgment and assumption-setting (L5): 54.1 vs. 73.2, -19.1
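Each gap in the scorecard is the Tower AI score minus the frontier score. The sketch below recomputes the gaps from the published scores. The `ScoreRow` class and its field names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRow:
    task: str
    level: str
    product: float   # Tower AI score
    frontier: float  # GPT-5.4 baseline score

    @property
    def gap(self) -> float:
        # Negative values mean the product trails the frontier.
        return round(self.product - self.frontier, 1)


ROWS = [
    ScoreRow("Formula generation from natural language", "L1", 91.4, 93.8),
    ScoreRow("Error detection -- logical correctness", "L2", 94.2, 95.1),
    ScoreRow("Scenario and sensitivity build", "L3", 82.7, 89.4),
    ScoreRow("Cross-sheet model restructuring", "L4", 67.3, 81.4),
    ScoreRow("Analytical judgment and assumption-setting", "L5", 54.1, 73.2),
]

for row in ROWS:
    print(f"{row.level}: {row.product} vs. {row.frontier}, gap {row.gap:+.1f}")

# The row with the most negative gap is where the product trails most.
widest = min(ROWS, key=lambda r: r.gap)
print(f"Widest gap: {widest.level} ({widest.gap:+.1f})")
```

The gap widens at each level from L2 to L5, and the widest gap falls on the L5 judgment tasks.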
Vendor Claim Verification -- Source: withtower.com
"Ask questions to thousands of documents at once and receive structured answers backed by verifiable citations"
Verified: Cross-document querying with citation traceability of 88.4% -- above category average. The structured answer format with page-level source references is consistent across the benchmark document set. Verified as the product's strongest independently testable claim.
"Organize and rename entire data rooms with custom naming conventions and folder structures"
Verified: Automated data room organisation accuracy of 84.1% on the benchmark data room set. The naming convention and folder structure application is consistent and materially faster than manual organisation. The strongest workflow-layer claim and a genuine differentiator from point extraction tools.
"1,000+ live use cases driving real business outcomes"
Not independently tested: Use case count is a commercial claim not verifiable through capability benchmarking. The AmLaw 50 and Chambers Band 1 deployment signal is strong market validation. YC W25 backing adds further institutional credibility. The claim is plausible given the product's multi-segment coverage across legal, asset management, real estate, and serial acquirers.
Frontier intelligence
Current frontier -- GPT-5.4: 89.3 (weighted avg -- M&A document extraction tasks)
Frontier velocity: +3.6 pts / qtr (document extraction -- steady)
Extraction gap runway: 2 qtrs (extraction-only vendors face parity pressure -- workflow moat is the differentiator)
The extraction layer faces frontier compression. The workflow infrastructure layer -- data collection, organisation, collaboration, access controls -- does not. Tower's strategic position depends on whether the workflow layer commands sufficient switching cost as extraction becomes commoditised.
Practitioner signal -- n=16, M&A legal teams
Output acceptance rate: 76% (+7pp)
Verify before use: 54% (-6pp)
Workflow abandonment: 8% (flat)
Trust trajectory: Building
Top correction type: Cross-document synthesis depth
76% acceptance on extraction and organisation tasks. Declining verification rate signals growing trust in the extraction layer. Practitioners flag cross-document synthesis as the capability they most want to see improve.
Score trajectory -- Tower AI weighted avg score (higher = stronger performance vs. frontier)
Q3 2025: 71.4
Q1 2026: 76.8
Methodology
Dataset: CaliperDiligence-v1 -- 520 tasks
Baseline: GPT-5.4 (Mar 2026)
Scoring L1-L2: Extraction F1 + organisation accuracy
Scoring L3-L5: LLM-as-judge + M&A lawyer review
Ground truth: Expert-constructed -- kappa 0.86
Run date: 26 March 2026
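For readers unfamiliar with the extraction F1 metric used for L1-L2 scoring, the sketch below shows one common way to compute it. It assumes a set-based comparison of extracted items against expert ground truth. The Lab's exact matching rules are not described here, and the clause labels are hypothetical example data, not benchmark results.

```python
def precision_recall_f1(predicted: set[str], expected: set[str]) -> tuple[float, float, float]:
    """Set-based precision, recall, and F1 for one document's extracted items."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: clauses an annotator marked vs. clauses a tool extracted.
expected = {"change_of_control", "assignment", "termination", "exclusivity"}
predicted = {"change_of_control", "assignment", "termination", "non_compete"}

p, r, f1 = precision_recall_f1(predicted, expected)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

F1 is the harmonic mean of precision and recall, so it penalises both missed clauses and spurious extractions. Recall alone, as reported for change-of-control flag detection, measures only the missed clauses.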
Representative profile for discussion -- all scores and findings are illustrative, based on the Lab's published methodology applied to Tower AI's publicly stated capabilities. Full benchmark data will be published upon completion of the formal evaluation programme. thecaliperlab.com