
Compare

Structurally compare two PDF documents and receive a detailed diff describing what changed. The comparison works at the structural level. It analyzes metadata, pages, objects, forms, annotations, security properties, and fonts rather than pixels or rendered content.


Endpoint

POST /api/compare
Content-Type: multipart/form-data
Authorization: Bearer <api-key>

Request

Provide two PDFs as multipart fields:

| Field | Type | Required | Description |
|---|---|---|---|
| file_a | file (binary) | Yes | First PDF (the "before" document) |
| file_b | file (binary) | Yes | Second PDF (the "after" document) |
| normalize_first | boolean | No (default: false) | Canonicalize both documents before comparing. This removes cosmetic differences caused by object ordering or cross-reference (xref) style. |
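If you are not using an SDK, you can assemble the request by hand. The following minimal sketch uses only Python's standard library to build, but not send, the multipart/form-data POST described above. The helper name `build_compare_request` and the placeholder filenames `a.pdf` and `b.pdf` are illustrative. The URL comes from the cURL example on this page. Sending `normalize_first` as the string `"true"` or `"false"` is an assumption based on that same example.

```python
import urllib.request
import uuid

# Base URL taken from the cURL example on this page.
COMPARE_URL = "https://api.pdfcanon.com/api/compare"


def build_compare_request(
    api_key: str,
    pdf_a: bytes,
    pdf_b: bytes,
    normalize_first: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for /api/compare."""
    boundary = uuid.uuid4().hex
    parts: list = []

    def add_file(field: str, filename: str, data: bytes) -> None:
        # One file part: part headers, blank line, raw PDF bytes, CRLF.
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            "Content-Type: application/pdf\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")

    add_file("file_a", "a.pdf", pdf_a)  # the "before" document
    add_file("file_b", "b.pdf", pdf_b)  # the "after" document

    # Assumption: the boolean is sent as "true"/"false", as in the cURL example.
    flag = "true" if normalize_first else "false"
    parts.append(
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="normalize_first"\r\n\r\n'
            f"{flag}\r\n"
        ).encode()
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary

    return urllib.request.Request(
        COMPARE_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON body.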
Quota usage

A plain compare counts as 1 normalization unit toward your monthly quota. With normalize_first=true, both documents are normalized before the comparison, and the operation counts as 2 units.
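The quota rule above makes it simple to estimate how many comparisons fit in a budget. This sketch assumes you already know how many units remain for the month. The function names are illustrative.

```python
def units_per_compare(normalize_first: bool) -> int:
    # Per the quota rule: 1 unit for a plain compare, 2 with normalize_first=true.
    return 2 if normalize_first else 1


def compares_remaining(units_left: int, normalize_first: bool) -> int:
    """How many more compare calls fit in the remaining monthly quota."""
    return units_left // units_per_compare(normalize_first)
```

For example, 100 remaining units allow 100 plain compares but only 50 with `normalize_first=true`.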


Response 200 OK

Returns a StructuralDiffReport object.

```json
{
  "summary": {
    "totalChanges": 7,
    "severity": "minor",
    "identical": false
  },
  "metadata": [
    { "field": "ModDate", "oldValue": "2024-01-10T09:00:00Z", "newValue": "2025-03-22T14:05:11Z" }
  ],
  "pages": {
    "added": [],
    "removed": [],
    "reordered": []
  },
  "objects": {
    "added": 3,
    "removed": 1,
    "totalA": 142,
    "totalB": 144
  },
  "forms": [
    { "fieldName": "Signature1", "action": "added", "oldValue": null, "newValue": null }
  ],
  "annotations": [],
  "security": [
    { "field": "SignatureCount", "oldValue": "0", "newValue": "1" }
  ],
  "fonts": []
}
```

Response Schema

StructuralDiffReport

| Field | Type | Description |
|---|---|---|
| summary | DiffSummary | High-level summary |
| metadata | MetadataDiffEntry[] | Document metadata field changes |
| pages | PagesDiff | Page additions, removals, and reorderings |
| objects | ObjectsDiff | Aggregate PDF object count deltas |
| forms | FormFieldDiff[] | AcroForm field changes |
| annotations | AnnotationDiff[] | Per-page annotation changes |
| security | SecurityDiffEntry[] | Security-relevant property changes |
| fonts | FontDiff[] | Font additions and removals |
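Clients that read the raw JSON instead of an SDK object can mirror the schema with Python `TypedDict`s. This sketch is derived from the tables in this section. The class names follow the schema names, but they are not part of any published package. `TypedDict` only informs static type checkers and performs no validation at runtime.

```python
import json
from typing import List, Literal, Optional, TypedDict

Action = Literal["added", "removed", "modified"]


class DiffSummary(TypedDict):
    totalChanges: int
    severity: Literal["none", "minor", "moderate", "major"]
    identical: bool


class MetadataDiffEntry(TypedDict):
    field: str
    oldValue: Optional[str]
    newValue: Optional[str]


class PageInfo(TypedDict):
    pageNumber: int
    contentHash: str


class PageReorder(TypedDict):
    oldPosition: int
    newPosition: int
    contentHash: str


class PagesDiff(TypedDict):
    added: List[PageInfo]
    removed: List[PageInfo]
    reordered: List[PageReorder]


class ObjectsDiff(TypedDict):
    added: int
    removed: int
    totalA: int
    totalB: int


class FormFieldDiff(TypedDict):
    fieldName: str
    action: Action
    oldValue: Optional[str]
    newValue: Optional[str]


class AnnotationDiff(TypedDict):
    page: int
    type: str
    action: Action


class SecurityDiffEntry(TypedDict):
    field: str
    oldValue: str
    newValue: str


class FontDiff(TypedDict):
    name: str
    action: Literal["added", "removed"]


class StructuralDiffReport(TypedDict):
    summary: DiffSummary
    metadata: List[MetadataDiffEntry]
    pages: PagesDiff
    objects: ObjectsDiff
    forms: List[FormFieldDiff]
    annotations: List[AnnotationDiff]
    security: List[SecurityDiffEntry]
    fonts: List[FontDiff]


def parse_report(raw: str) -> StructuralDiffReport:
    # json.loads returns plain dicts; the annotation is for type checkers only.
    return json.loads(raw)
```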

DiffSummary

| Field | Type | Description |
|---|---|---|
| totalChanges | integer | Total count of individual changes across all dimensions |
| severity | string | One of "none", "minor", "moderate", or "major" |
| identical | boolean | true if no structural differences were found |
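A common use of `severity` is gating a CI job. The following sketch assumes that the four values are ordered from least to most severe in the order the table lists them. The page does not state this ordering outright, and the helper name is illustrative.

```python
# Assumption: severities are ordered as listed in the DiffSummary table.
SEVERITY_ORDER = ("none", "minor", "moderate", "major")


def exceeds(severity: str, max_allowed: str) -> bool:
    """True if `severity` is strictly worse than `max_allowed`.

    Raises ValueError for a value outside the documented set.
    """
    return SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(max_allowed)
```

A pipeline could then fail with `sys.exit(1)` when `exceeds(report["summary"]["severity"], "minor")` is true.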

MetadataDiffEntry

A change to a single document information field (PDF /Info dictionary).

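| Field | Type | Description |
|---|---|---|
| field | string | Metadata key, e.g. "Title", "Author", "ModDate" |
| oldValue | string or null | Value in file_a; null if the field is absent from file_a (it was added) |
| newValue | string or null | Value in file_b; null if the field is absent from file_b (it was removed) |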
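The null conventions above are enough to tell whether an entry is an addition, a removal, or a change. This is a small illustrative helper. It treats two null values as invalid, because such an entry would not describe a change.

```python
from typing import Optional


def metadata_change_kind(old_value: Optional[str], new_value: Optional[str]) -> str:
    """Classify a MetadataDiffEntry as "added", "removed", or "changed"."""
    if old_value is None and new_value is None:
        raise ValueError("a reported change should not have both values null")
    if old_value is None:
        return "added"    # absent from file_a
    if new_value is None:
        return "removed"  # absent from file_b
    return "changed"
```

For the ModDate entry in the example response, this returns `"changed"`.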

PagesDiff

| Field | Type | Description |
|---|---|---|
| added | PageInfo[] | Pages present in file_b but not in file_a |
| removed | PageInfo[] | Pages present in file_a but not in file_b |
| reordered | PageReorder[] | Pages present in both documents but at different positions |

PageInfo

| Field | Type | Description |
|---|---|---|
| pageNumber | integer | 1-based page number |
| contentHash | string | SHA-256 fingerprint of the page content, truncated to 16 hex characters |
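A full SHA-256 digest is 64 hex characters, so a 16-character `contentHash` is a shortened form. The sketch below shows that format only. It assumes the service keeps the leading 16 characters. It also hashes arbitrary input bytes, because this page does not document which bytes of a page the service hashes, so the result will not reproduce the service's values.

```python
import hashlib


def page_fingerprint(page_content: bytes) -> str:
    # Hypothetical input: the bytes the service actually hashes are undocumented.
    # Assumption: the fingerprint is the first 16 hex chars of the SHA-256 digest.
    return hashlib.sha256(page_content).hexdigest()[:16]
```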

PageReorder

| Field | Type | Description |
|---|---|---|
| oldPosition | integer | 1-based page number in file_a |
| newPosition | integer | 1-based page number in file_b |
| contentHash | string | SHA-256 fingerprint of the page content |
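Page fingerprints are enough to derive added, removed, and reordered pages. The following is one possible approach and is not necessarily the service's algorithm. It assumes fingerprints are unique within each document. It also compares only the relative order of pages the two documents share, so a page that shifts because an earlier page was inserted or deleted is not reported as reordered. Moving a single page to the front flags every shared page, which is a simplification compared with a longest-common-subsequence approach.

```python
from typing import Dict, List, Tuple


def diff_pages(
    hashes_a: List[str], hashes_b: List[str]
) -> Tuple[List[int], List[int], List[Tuple[int, int]]]:
    """Return (added page numbers, removed page numbers, (old, new) reorders).

    Each input holds one fingerprint per page, in page order.
    """
    pos_a: Dict[str, int] = {h: i + 1 for i, h in enumerate(hashes_a)}  # 1-based
    pos_b: Dict[str, int] = {h: i + 1 for i, h in enumerate(hashes_b)}

    added = [pos_b[h] for h in hashes_b if h not in pos_a]
    removed = [pos_a[h] for h in hashes_a if h not in pos_b]

    # Compare the order of shared pages only, ignoring pure position shifts.
    common_a = [h for h in hashes_a if h in pos_b]
    common_b = [h for h in hashes_b if h in pos_a]
    rank_b = {h: i for i, h in enumerate(common_b)}
    reordered = [
        (pos_a[h], pos_b[h]) for i, h in enumerate(common_a) if rank_b[h] != i
    ]
    return added, removed, reordered
```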

ObjectsDiff

Approximate object-count delta based on the PDF cross-reference table.

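| Field | Type | Description |
|---|---|---|
| added | integer | Approximate number of objects added going from file_a to file_b |
| removed | integer | Approximate number of objects removed going from file_a to file_b |
| totalA | integer | Total object count in file_a |
| totalB | integer | Total object count in file_b |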
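In the example response, `added` (3) and `removed` (1) are separate counts, while the totals differ by 2 (142 to 144). The net change is therefore best taken from the totals. This illustrative helper reports both the net change and the total churn.

```python
def object_churn(objects: dict) -> dict:
    """Summarize an ObjectsDiff dict (keys as in the JSON response)."""
    net = objects["totalB"] - objects["totalA"]       # overall growth or shrinkage
    churn = objects["added"] + objects["removed"]     # objects touched in either direction
    return {"net": net, "churn": churn}
```

With the example values this gives a net change of 2 and a churn of 4.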

FormFieldDiff

A change to a single AcroForm field.

| Field | Type | Description |
|---|---|---|
| fieldName | string | PDF field name (/T entry) |
| action | string | One of "added", "removed", or "modified" |
| oldValue | string or null | Field value in file_a; null when added or unset |
| newValue | string or null | Field value in file_b; null when removed or unset |
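For review output, it often helps to list form-field changes by action. The following illustrative helper works on the raw JSON list.

```python
from collections import defaultdict
from typing import Dict, List


def group_form_changes(forms: List[dict]) -> Dict[str, List[str]]:
    """Map each action ("added", "removed", "modified") to the affected field names."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for change in forms:
        groups[change["action"]].append(change["fieldName"])
    return dict(groups)
```

For the example response, this returns `{"added": ["Signature1"]}`.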

AnnotationDiff

A per-page annotation change.

| Field | Type | Description |
|---|---|---|
| page | integer | 1-based page number |
| type | string | Annotation subtype, e.g. "Text", "Link", "Widget" |
| action | string | One of "added", "removed", or "modified" |
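Annotation changes are reported one entry per change, so a per-page rollup makes larger diffs easier to scan. This is an illustrative sketch that counts changes by (subtype, action) for each page.

```python
from collections import Counter
from typing import Dict, List


def annotations_by_page(annotations: List[dict]) -> Dict[int, Counter]:
    """Count annotation changes per page, keyed by (subtype, action)."""
    result: Dict[int, Counter] = {}
    for a in annotations:
        result.setdefault(a["page"], Counter())[(a["type"], a["action"])] += 1
    return result
```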

SecurityDiffEntry

A change to a security-relevant document property.

| Field | Type | Description |
|---|---|---|
| field | string | Property name, e.g. "Encrypted", "SignatureCount", "JavaScript", "OpenAction", "LaunchAction" |
| oldValue | string | Value in file_a |
| newValue | string | Value in file_b |
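Some security properties matter more than others in automated pipelines. The sketch below flags changes to a hypothetical watchlist built from the property names in the table. Because this page shows only one example of the value format (`"0"` to `"1"` for SignatureCount), the helper matches on field names and passes the values through unchanged.

```python
from typing import List

# Hypothetical policy: property changes that should block an automated pipeline.
WATCHED_FIELDS = {"Encrypted", "JavaScript", "OpenAction", "LaunchAction"}


def security_alerts(security: List[dict]) -> List[str]:
    """Return a readable line for each change to a watched property."""
    return [
        f'{e["field"]}: {e["oldValue"]} -> {e["newValue"]}'
        for e in security
        if e["field"] in WATCHED_FIELDS
    ]
```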

FontDiff

A font added or removed between the two documents.

| Field | Type | Description |
|---|---|---|
| name | string | Font base name, e.g. "Helvetica" or "ABCDEF+Arial" |
| action | string | "added" (in file_b but not file_a) or "removed" (in file_a but not file_b) |
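Names like "ABCDEF+Arial" carry a font-subset tag. Under the PDF specification, this tag is exactly six uppercase letters followed by "+". Re-subsetting a font on save can produce a "removed ABCDEF+Arial" entry alongside an "added GHIJKL+Arial" entry even though the font itself did not change. This page does not say whether the service already collapses such pairs. The following helper, with illustrative names, detects them client-side.

```python
import re
from typing import List

# PDF font-subset tag: exactly six uppercase letters followed by "+".
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def base_font_name(name: str) -> str:
    """Strip a subset tag, e.g. "ABCDEF+Arial" -> "Arial"."""
    return _SUBSET_TAG.sub("", name)


def resubset_only(fonts: List[dict]) -> List[str]:
    """Base names that appear as both removed and added (likely re-subsetting)."""
    removed = {base_font_name(f["name"]) for f in fonts if f["action"] == "removed"}
    added = {base_font_name(f["name"]) for f in fonts if f["action"] == "added"}
    return sorted(removed & added)
```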

Error Responses

| Status | Description |
|---|---|
| 400 Bad Request | file_a or file_b is missing from the request |
| 401 Unauthorized | Missing or invalid API key |
| 402 Payment Required | Monthly quota exceeded |
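When calling the endpoint without an SDK, you can map the documented status codes to readable errors. The following standard-library sketch sends an already-prepared `urllib.request.Request` carrying the multipart fields from the Request table. The function names are illustrative. Because this page does not document the error body format, the sketch reports only the status code.

```python
import json
import urllib.error
import urllib.request

# Hints for the status codes documented in the table above.
ERROR_HINTS = {
    400: "file_a or file_b is missing from the request",
    401: "API key is missing or invalid",
    402: "monthly quota exceeded",
}


def explain_status(status: int) -> str:
    return ERROR_HINTS.get(status, f"unexpected HTTP status {status}")


def send_compare(request: urllib.request.Request) -> dict:
    """Send a prepared /api/compare request and return the parsed JSON report."""
    try:
        with urllib.request.urlopen(request) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # Error body format is undocumented here, so only the status code is used.
        raise RuntimeError(
            f"compare failed ({err.code}): {explain_status(err.code)}"
        ) from err
```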

Code Examples

Node.js

```typescript
import { PDFCanonClient } from '@pdfcanon/sdk';
import * as fs from 'fs';

const client = new PDFCanonClient({ apiKey: process.env.PDFCANON_API_KEY! });

// Compare two PDFs (top-level await requires running as an ES module)
const report = await client.compare(
  'original.pdf',
  'revised.pdf',
  { normalizeFirst: true }
);

if (report.summary.identical) {
  console.log('Documents are structurally identical');
} else {
  console.log(`Severity: ${report.summary.severity}`);
  console.log(`Total changes: ${report.summary.totalChanges}`);

  // oldValue/newValue are null when a metadata field was added or removed
  for (const meta of report.metadata) {
    console.log(`Metadata change (${meta.field}): "${meta.oldValue}" → "${meta.newValue}"`);
  }

  if (report.pages.added.length > 0) {
    console.log(`Pages added: ${report.pages.added.map(p => p.pageNumber).join(', ')}`);
  }
}
```

Python (sync)

```python
import os
from pdfcanon import PDFCanonClient

client = PDFCanonClient(api_key=os.environ["PDFCANON_API_KEY"])

with open("original.pdf", "rb") as f_a, open("revised.pdf", "rb") as f_b:
    report = client.compare(
        f_a,
        f_b,
        file_name_a="original.pdf",
        file_name_b="revised.pdf",
        normalize_first=True,
    )

if report.summary.identical:
    print("Documents are structurally identical")
else:
    print(f"Severity: {report.summary.severity}")
    print(f"Total changes: {report.summary.total_changes}")

    for entry in report.metadata:
        print(f"Metadata change ({entry.field}): '{entry.old_value}' → '{entry.new_value}'")

    for font in report.fonts:
        print(f"Font {font.action}: {font.name}")
```

Python (async)

```python
import asyncio
import os

from pdfcanon import AsyncPDFCanonClient


async def main():
    client = AsyncPDFCanonClient(api_key=os.environ["PDFCANON_API_KEY"])

    with open("original.pdf", "rb") as f_a, open("revised.pdf", "rb") as f_b:
        report = await client.compare(
            f_a,
            f_b,
            file_name_a="original.pdf",
            file_name_b="revised.pdf",
        )

    print(f"Identical: {report.summary.identical}")
    print(f"Changes: {report.summary.total_changes}")


asyncio.run(main())
```

cURL

```bash
curl -X POST https://api.pdfcanon.com/api/compare \
  -H "Authorization: Bearer $PDFCANON_API_KEY" \
  -F "file_a=@original.pdf" \
  -F "file_b=@revised.pdf" \
  -F "normalize_first=true"
```