Tower AI -- Intelligence Profile -- The Caliper Lab


End-to-end AI due diligence platform. Built for M&A lawyers, asset managers, and real estate deal teams who need to collect, organise, and extract structured insights from messy data rooms -- replacing disconnected VDRs, Excel trackers, and email threads with a single collaborative workspace.

Due Diligence AI | End-to-End Workflow | Data Room Organisation | Document Extraction | SOC 2 Type II | YC W25

Coverage: Rich

Q1 2026 -- Run #2: 520 tasks, CaliperDiligence-v1

Frontier update: GPT-5.5 (April 2026) has been released with improved long-document reasoning and multi-step instruction following. Baseline recalculation is in progress for cross-document synthesis and workflow organisation tasks. Updated gap scores will be published within 14 days.


Capability Assessment (Independent, Q1 2026)

Tower is an infrastructure product first and an extraction product second. The relevant benchmark question is not just whether it extracts clauses accurately -- it is whether the end-to-end workflow of collecting, organising, and querying a real data room produces reliable, decision-grade output. That is a harder question than document-level accuracy, and it is the right one for the buyers Tower serves.

Where the product leads

On document extraction and citation traceability -- the tasks that underpin the entire workflow -- Tower performs above the category average and within 6 points of the GPT-5.4 frontier on the Lab's structured extraction task battery. The product's structural edge is the combination of extraction capability with workflow infrastructure: automated data room organisation, document naming, missing document flagging, and role-based access controls in a single platform. This combination is what general-purpose LLMs cannot replicate without significant integration overhead.

Citation traceability on contract extraction: 88.4%, above the 81% category average. Structured answers are grounded in the source document with page-level references.
Automated data room organisation accuracy -- correct folder structure and naming conventions applied: 84.1% on the benchmark data room set. Materially faster than manual organisation, with an estimated 60--70% reduction in setup time in the benchmark test.
Change-of-control and material contract flag detection: 86.7% recall on the Lab's annotated M&A contract set. Above category average of 79%.

The frontier question

The frontier is improving at 3.6 points per quarter on document extraction tasks. Tower's extraction gap from GPT-5.4 is 5.8 points -- manageable but compressing. The product's durable advantage is the workflow layer: data collection, request list management, missing document detection, team collaboration, and access controls are features that a frontier model accessed via API cannot provide without Tower-equivalent infrastructure. The extraction engine is the commodity risk. The workflow is the moat.

Document extraction gap: 5.8 points from frontier. At current frontier velocity, extraction-only products face meaningful compression by Q3 2026.
Workflow and collaboration features have no frontier equivalent. These are not subject to the same compression dynamic and represent the product's defensible differentiation.

Decision implication

For M&A lawyers and asset management teams, the buy decision for Tower is not primarily about extraction accuracy -- it is about whether replacing disconnected VDRs, Excel trackers, and email threads with a unified platform saves enough time to justify the switch. The Lab's workflow benchmarks suggest it does: automated data room organisation alone reduces the setup time in the benchmark test by an estimated 60--70% versus manual organisation. The AmLaw 50 and Chambers Band 1 deployment signal is the strongest commercial validation in the product's current category. For teams with high-volume transaction pipelines, the workflow infrastructure case is strong regardless of extraction accuracy relative to the frontier.

What the data does not yet cover

Asset management and real estate diligence workflows have not been separately benchmarked. Current scores reflect M&A legal diligence task sets only.
Cross-document synthesis accuracy -- finding the same clause across 200 contracts and surfacing discrepancies -- is a separate task type not yet in the CaliperDiligence dataset.
Integration reliability with third-party document sources (SharePoint, Google Drive, existing VDRs) has not been independently tested.
Panel signal is based on 16 practitioners, all M&A legal. Asset management and CRE segments require additional panel cycles.

Benchmark Scorecard vs. GPT-5.4 baseline -- 520 tasks

Tower AI vs. Frontier (GPT-5.4), with gap in points:

L1 Formula generation from natural language: 91.4 vs. 93.8 (-2.4)
L2 Error detection -- logical correctness: 94.2 vs. 95.1 (-0.9)
L3 Scenario and sensitivity build: 82.7 vs. 89.4 (-6.7)
L4 Cross-sheet model restructuring: 67.3 vs. 81.4 (-14.1)
L5 Analytical judgment and assumption-setting: 54.1 vs. 73.2 (-19.1)
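The gap figures are Tower's score minus the frontier's score at each level. A minimal sketch that recomputes them from the transcribed scores, plus an unweighted mean across the five levels. The profile's headline numbers are weighted averages whose weights are not given here, so the unweighted mean is for orientation only and will not match them; the names SCORECARD, gaps and unweighted_mean are hypothetical.

```python
# Scorecard values transcribed from the profile: (Tower AI, frontier GPT-5.4).
SCORECARD = {
    "L1": (91.4, 93.8),
    "L2": (94.2, 95.1),
    "L3": (82.7, 89.4),
    "L4": (67.3, 81.4),
    "L5": (54.1, 73.2),
}


def gaps(scorecard: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Tower-minus-frontier gap per level, rounded to one decimal place."""
    return {level: round(tower - frontier, 1)
            for level, (tower, frontier) in scorecard.items()}


def unweighted_mean(values: list[float]) -> float:
    """Plain average; the profile's own weights are not given, so none are applied."""
    return sum(values) / len(values)


level_gaps = gaps(SCORECARD)
for level, gap in level_gaps.items():
    print(f"{level}: {gap:+.1f}")
print(f"Unweighted mean gap: {unweighted_mean(list(level_gaps.values())):+.2f}")
```

The largest gaps sit at L4 and L5, which is where the unweighted mean of about -8.6 points draws most of its weight.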

Vendor Claim Verification (source: withtower.com)

"Ask questions to thousands of documents at once and receive structured answers backed by verifiable citations"

Verdict: verified. Citation traceability of 88.4% on contract extraction, above the 81% category average, and the structured answer format with page-level source references is consistent across the benchmark document set. Verified as the product's strongest independently testable claim; cross-document synthesis is not yet covered by the CaliperDiligence dataset.

"Organize and rename entire data rooms with custom naming conventions and folder structures"

Verdict: verified. Automated data room organisation accuracy of 84.1% on the benchmark data room set. Naming conventions and folder structures are applied consistently and materially faster than manual organisation. This is the strongest workflow-layer claim and a genuine differentiator from point extraction tools.

"1,000+ live use cases driving real business outcomes"

Verdict: not independently tested. Use case count is a commercial claim not verifiable through capability benchmarking. The AmLaw 50 and Chambers Band 1 deployment signal is strong market validation, and YC W25 backing adds further institutional credibility. The claim is plausible given the product's multi-segment coverage across legal, asset management, real estate, and serial acquirers.

Frontier intelligence

Current frontier (GPT-5.4): 89.3, weighted average on M&A document extraction tasks
Frontier velocity: +3.6 pts / quarter on document extraction (steady)
Extraction gap runway: 2 quarters -- extraction-only vendors face parity pressure; the workflow moat is the differentiator

The extraction layer faces frontier compression. The workflow infrastructure layer -- data collection, organisation, collaboration, access controls -- does not. Tower's strategic position depends on whether the workflow layer commands sufficient switching cost as extraction becomes commoditised.
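The profile does not state how the 2-quarter runway is derived. One reading consistent with its numbers is the 5.8-point extraction gap divided by the 3.6-point quarterly frontier velocity, rounded up to whole quarters; the rounding rule is an assumption. Counting forward from Q1 2026 then lands on the Q3 2026 horizon given under The frontier question. The helper names runway_quarters and quarter_after are hypothetical.

```python
import math


def runway_quarters(gap_points: float, velocity_per_quarter: float) -> float:
    """Quarters needed to cover `gap_points` at a constant quarterly velocity."""
    if velocity_per_quarter <= 0:
        raise ValueError("velocity must be positive")
    return gap_points / velocity_per_quarter


def quarter_after(year: int, quarter: int, n: int) -> tuple[int, int]:
    """Return the (year, quarter) that falls n quarters after (year, quarter)."""
    index = year * 4 + (quarter - 1) + n
    return index // 4, index % 4 + 1


raw = runway_quarters(5.8, 3.6)   # about 1.61 quarters
whole = math.ceil(raw)            # ASSUMPTION: the profile rounds up to whole quarters
year, quarter = quarter_after(2026, 1, whole)
print(f"{raw:.2f} quarters -> {whole} quarters -> Q{quarter} {year}")
```

A linear model is the simplest reading of "current frontier velocity"; if the velocity changes (for example after the GPT-5.5 baseline recalculation), the runway moves with it.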

Practitioner signal (n=16, M&A legal teams)

Output acceptance rate: 76% (+7pp)
Verify before use: 54% (-6pp)
Workflow abandonment: 8% (flat)
Trust trajectory: Building
Top correction type: Cross-document synthesis depth

76% acceptance on extraction and organisation tasks. Declining verification rate signals growing trust in the extraction layer. Practitioners flag cross-document synthesis as the capability they most want to see improve.
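The panel reports current rates with changes in percentage points. Assuming those changes are measured against the previous run (the profile does not state the comparison period), the earlier rates can be backed out by subtraction: acceptance rose from 69% to 76% while verify-before-use fell from 60% to 54%. SIGNALS and implied_previous are hypothetical names.

```python
# Practitioner-panel rates (%) and their changes (percentage points) as shown
# in the profile. ASSUMPTION: each change compares this run with the previous
# one; the profile does not state the comparison period.
SIGNALS = {
    "output_acceptance": (76, +7),
    "verify_before_use": (54, -6),
    "workflow_abandonment": (8, 0),  # reported as "flat"
}


def implied_previous(signals: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Back out the earlier rate for each signal: current rate minus change."""
    return {name: current - change for name, (current, change) in signals.items()}


for name, rate in implied_previous(SIGNALS).items():
    print(f"{name}: {rate}% -> {SIGNALS[name][0]}%")
```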

Score trajectory: Tower AI weighted average score (higher = stronger performance vs. frontier)

Q3 2025: 71.4
Q1 2026: 76.8
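Only two points of the trajectory are recoverable. As a rough sketch, a straight-line average between them gives the per-quarter change; it ignores the unreported Q4 2025 value and says nothing about the path in between. per_quarter_change is a hypothetical helper.

```python
def per_quarter_change(start: float, end: float, quarters: int) -> float:
    """Average change per quarter, assuming a straight line between two points."""
    if quarters <= 0:
        raise ValueError("quarters must be positive")
    return (end - start) / quarters


# Q3 2025 -> Q1 2026 is two quarter steps (Q3 -> Q4 -> Q1).
rate = per_quarter_change(71.4, 76.8, 2)
print(f"{rate:.1f} points per quarter")  # 2.7 points per quarter
```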

Methodology

Dataset: CaliperDiligence-v1 (520 tasks)
Baseline: GPT-5.4 (Mar 2026)
Scoring, L1-L2: extraction F1 + organisation accuracy
Scoring, L3-L5: LLM-as-judge + M&A lawyer review
Ground truth: expert-constructed (kappa 0.86)
Run date: 26 March 2026
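The methodology names extraction F1 for L1-L2 and a kappa of 0.86 for ground-truth agreement but does not define either further. A sketch of the standard forms under stated assumptions: F1 computed over exact-match (document, field) pairs, which is a hypothetical matching rule, and Cohen's kappa for two annotators, assumed here because the profile does not say which kappa variant or how many annotators were used. The toy labels are invented for illustration and are not Lab data.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def extraction_f1(predicted: set, gold: set) -> float:
    """F1 over extracted items, using exact matching (hypothetical rule)."""
    if not predicted and not gold:
        return 1.0  # nothing to extract and nothing extracted
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance from each annotator's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both used one identical label throughout; kappa is undefined, treat as perfect
    return (observed - expected) / (1 - expected)


# Hypothetical toy data, not drawn from the CaliperDiligence dataset.
predicted = {("Acme SPA", "change_of_control"), ("Acme SPA", "governing_law"),
             ("Beta Lease", "assignment")}
gold = {("Acme SPA", "change_of_control"), ("Beta Lease", "assignment"),
        ("Beta Lease", "termination")}
print(f"F1: {extraction_f1(predicted, gold):.3f}")  # F1: 0.667

annotator_a = ["flag", "flag", "none", "none", "flag", "none"]
annotator_b = ["flag", "none", "none", "none", "flag", "none"]
print(f"kappa: {cohens_kappa(annotator_a, annotator_b):.3f}")  # kappa: 0.667
```

Kappa discounts the agreement two annotators would reach by chance: here they agree on 5 of 6 items, but half of that agreement is expected from their label frequencies alone, so kappa is 0.667 rather than 0.833.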

Representative profile for discussion -- all scores and findings are illustrative, based on the Lab's published methodology applied to Tower AI's publicly stated capabilities. Full benchmark data will be published upon completion of the formal evaluation programme. thecaliperlab.com