For AI, ML & Diagnostics Developers

The World's Largest Standardized Veterinary Imaging Dataset

A uniquely clean, expert-labeled collection of small-animal MRI, CT, and ultrasound studies — acquired on identical hardware across three high-volume centers and read to a single standard by board-certified radiologists. Built for training and validating diagnostic AI. Now available for licensing and partnership.

3T MRI: Identical Philips Scanners
128-Slice CT: All Sites
3 Centers: One Acquisition Standard
DACVR: Report on Every Study
Single Reader: All MRI Interpretations

Why this dataset is different from anything else available

Most medical imaging datasets are assembled after the fact from many machines, many sites, and many readers. That heterogeneity introduces noise that limits how well a model can learn — and it is expensive and slow to correct.

Sage Veterinary Imaging generates its data the opposite way. Three busy imaging centers run identical Philips 3T MRI, 128-slice CT, and Samsung ultrasound systems, with standardized patient positioning at every site. Every study is interpreted by a board-certified veterinary radiologist (DACVR), and all MRI studies are read by a single radiologist, Dr. Jaime Sage — producing the kind of label consistency that is normally impossible to buy at scale.

The result is a corpus that is standardized at the source, richly labeled, and continuously growing. For developers of veterinary diagnostic AI it is the missing ingredient; for multimodal foundation-model teams it is a high-quality, low-friction imaging domain that sidesteps the privacy constraints of human medical data — because animal imaging is not subject to HIPAA, and Sage owns the data outright.

What makes the data high-value for machine learning

🧲

Hardware homogeneity

Identical 3T MRI, 128-slice CT, and ultrasound systems across all three centers eliminate the cross-site domain shift that degrades multi-source datasets. Models train faster and generalize more reliably.

🎯

Single-reader MRI labels

Every MRI study is read by one board-certified (DACVR) radiologist, giving near-zero inter-observer variability. Achieving label consistency is one of the largest costs in medical AI; here it is built in.

📝

Paired image & report

Each study is linked to a finalized, structured radiologist report. This supports both supervised learning and vision-language (image-to-text) model training.

⚖️

Normals and abnormals

Curated normal anatomy — usually discarded elsewhere — sits alongside a broad spectrum of abnormal and diseased states, essential for screening and anomaly-detection models.

🔄

Continuously generated

Three active centers produce a steady prospective feed. Partners can subscribe to fresh data and even commission custom acquisition protocols.

🔓

Clean rights & no HIPAA

Animal data avoids human-privacy friction. Sage owns the data outright and de-identifies before release, enabling faster agreements and broader permitted uses.

Specifics of the corpus

The dataset is offered as de-identified DICOM studies paired with structured radiologist reports and study-level metadata. It can be filtered and licensed by modality, species, body region, and diagnosis.

Attribute: Detail
Modalities: 3T MRI (Philips), 128-slice CT, ultrasound & echocardiography (Samsung), image-guided biopsy
Species: Primarily canine and feline; additional small-animal species available
Body regions: Brain & spine (neurological), musculoskeletal/orthopedic, thorax, abdomen, head & neck
Labels: Finalized DACVR report per study (findings, impression, and diagnosis); structured metadata extractable
Class balance: Both normal anatomy and abnormal/diseased states across all regions
Format: De-identified DICOM + paired report text (plain text / JSON); custom export formats by arrangement
Provenance: Three centers, identical scanners, homogeneous positioning, single-standard reads
Growth: Continuously expanding; prospective feed available by subscription

Exact study counts by modality, species, region, and diagnosis are provided under NDA as part of the dataset catalog.
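To make the filtering by modality, species, body region, and diagnosis concrete, here is a minimal Python sketch of how a licensee might select a slice from study-level metadata. It is an illustration only: the `StudyRecord` fields, the `filter_studies` helper, and the label values are hypothetical and do not describe Sage's actual catalog schema, which is shared under NDA.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class StudyRecord:
    """One row of study-level metadata. Field names are hypothetical."""
    study_id: str
    modality: str      # e.g. "MRI", "CT", "US"
    species: str       # e.g. "canine", "feline"
    body_region: str   # e.g. "brain", "spine", "thorax"
    diagnosis: str     # e.g. "normal" or a label taken from the report


def filter_studies(
    records: Iterable[StudyRecord],
    modality: Optional[str] = None,
    species: Optional[str] = None,
    body_region: Optional[str] = None,
    diagnosis: Optional[str] = None,
) -> List[StudyRecord]:
    """Return the records that match every criterion that was supplied.

    Criteria left as None are ignored; matching is case-insensitive.
    """
    wanted = {
        "modality": modality,
        "species": species,
        "body_region": body_region,
        "diagnosis": diagnosis,
    }
    active = {key: value.lower() for key, value in wanted.items() if value is not None}
    return [
        record
        for record in records
        if all(getattr(record, key).lower() == value for key, value in active.items())
    ]
```

For example, `filter_studies(records, modality="MRI", species="canine", body_region="brain")` would return only canine brain MRI studies from whatever metadata a partner has loaded into `records`.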

What you can build with this data

The license defines permitted use; the data itself supports a wide range of development work, including:

Diagnostic classification — detect and categorize disease on MRI, CT, and ultrasound
Segmentation & quantification of anatomy, lesions, and organs
Vision-language models trained on image-report pairs for automated reporting
Anomaly & screening models built on curated normal baselines
Image reconstruction, denoising & acceleration on specific hardware
Foundation-model pre-training as a standardized imaging domain
Comparative-medicine research using dogs and cats as natural disease models
Benchmarking & validation against a sealed, expert-read holdout
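As one illustration of the vision-language use case above, the following Python sketch gathers image-report pairs into a JSON Lines file of the kind many training pipelines consume. The directory layout (one folder per study containing `*.dcm` files and a `report.json`) and the function names are assumptions made for this example, not the actual delivery format; the `findings`, `impression`, and `diagnosis` keys mirror the report contents described in the corpus table.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Iterator


def iter_pairs(root: Path) -> Iterator[Dict[str, object]]:
    """Yield one image-report pair per study folder under `root`.

    Assumed (hypothetical) layout: root/<study_id>/report.json plus *.dcm files.
    Studies missing either the report or the images are skipped.
    """
    for study_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        report_path = study_dir / "report.json"
        images = sorted(p.relative_to(root).as_posix() for p in study_dir.glob("*.dcm"))
        if not report_path.exists() or not images:
            continue
        report = json.loads(report_path.read_text(encoding="utf-8"))
        yield {
            "study_id": study_dir.name,
            "images": images,
            "findings": report.get("findings", ""),
            "impression": report.get("impression", ""),
            "diagnosis": report.get("diagnosis", ""),
        }


def write_jsonl(pairs: Iterable[Dict[str, object]], out_path: Path) -> int:
    """Write each pair as one JSON object per line; return the number written."""
    count = 0
    with out_path.open("w", encoding="utf-8") as handle:
        for pair in pairs:
            handle.write(json.dumps(pair) + "\n")
            count += 1
    return count
```

Calling `write_jsonl(iter_pairs(Path("delivery")), Path("pairs.jsonl"))` would produce one line per usable study; the sketch stores image paths rather than pixel data, so DICOM decoding is left to the training pipeline.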

Who we partner with

Veterinary AI

Diagnostic AI companies

Teams building radiograph, MRI, CT, or ultrasound interpretation tools who need clean cross-sectional data to expand beyond X-ray. Strong fit for field-of-use exclusivity.

Foundation Models

AI & foundation-model labs

Multimodal model developers seeking a high-quality, low-friction medical-imaging domain to supplement human corpora. Non-exclusive, large-volume licensing.

Equipment OEMs

Imaging manufacturers

MRI, CT, and ultrasound makers developing acquisition, reconstruction, and protocol-optimization models — on exactly the hardware this data was captured on.

Research

Pharma & comparative medicine

Pharmaceutical companies, CROs, and academic groups using companion animals as translational disease models for imaging biomarker development.

Evaluate the data before you commit

We provide a sample data room of de-identified showcase studies under NDA so your team can assess quality, labeling, and fit firsthand.

Ways to license the data

Data is licensed, never sold. Agreements are term-limited and can be sliced by modality, species, body region, and application — so scope maps precisely to your project.

Evaluation: Arranged privately

A bounded sample (typically 500–1,000 studies) for internal evaluation only, over a fixed term.

Use: assess data quality and fit. Terms negotiated privately and creditable toward a full license.

Non-Exclusive Training: Quoted by scope

A defined corpus slice licensed for model training, term-limited (typically 2–3 years), no resale or sublicensing.

Use: train and ship models; same data may be licensed to others.

Field-of-Use Exclusive: Premium

Exclusivity within a defined application (e.g. veterinary MRI diagnostics), with performance milestones to retain it.

Use: lock out competitors in your specific niche without acquiring the whole asset.

Data-Feed Subscription: Annual

Ongoing delivery of newly acquired studies and reports on a recurring schedule.

Use: keep models fresh; optionally commission custom acquisition protocols.

Validation & Benchmarking: Per engagement

Evaluation of your model against a sealed holdout set that is never licensed for training, scored by Sage radiologists.

Use: independent, expert-graded performance validation.

Build Together: Co-development & equity

We bring the data, the radiology expertise, and ongoing acquisition; a technical partner brings the engineering and modeling. Together we build a product and share equity and revenue in the outcome.

Use: ideal for a team that wants the dataset as the foundation of a jointly owned product rather than a one-time license. Includes expert annotation from board-certified (DACVR) radiologists and a structured co-development agreement.

Built-in protections. All studies are de-identified before release. Every agreement prohibits resale, sublicensing, and any attempt to re-identify patients or solicit Sage's referring clinics, and includes audit rights. Whether trained models persist beyond the license term is negotiated and priced explicitly.

Getting access

1. Introductory call & NDA

Tell us your use case and target application. We sign a mutual NDA and share the dataset catalog.

2. Sample data room

Review de-identified showcase studies with paired reports to confirm quality and fit.

3. Scope & agreement

We define modality, volume, exclusivity, and term, then execute the matching license.

4. Secure delivery

De-identified data is delivered through a secure channel, with an optional ongoing feed.

Dataset licensing questions

Common questions from AI, diagnostics, and research teams.

Three high-volume centers run identical Philips 3T MRI, 128-slice CT, and Samsung ultrasound systems with homogeneous patient positioning, so the data is free of the cross-site domain shift that degrades most medical imaging datasets. Every study is paired with a board-certified radiologist report, and all MRI studies are read by a single radiologist, giving near-zero inter-observer label variability. To our knowledge, high-field (3T) small-animal MRI at this volume does not exist anywhere else.

Yes. Every study is linked to a finalized, structured radiologist report with findings and diagnoses, supporting both supervised learning and vision-language (image-to-report) model training. Both normal anatomy and abnormal or diseased states are included.

Through evaluation licenses, non-exclusive commercial training licenses, field-of-use exclusive licenses, prospective data-feed subscriptions, validation/benchmarking engagements, and a "build together" co-development option in which a technical partner shares equity in a jointly owned product. Licenses are term-limited and can be sliced by modality, species, body region, and application.

This is animal imaging data, which is not subject to HIPAA. Sage owns the data outright and de-identifies all studies before release, allowing faster agreements and broader permitted uses than comparable human-medicine datasets.

In many cases, yes. Because the three centers actively scan, partners can subscribe to a prospective feed and, by arrangement, commission custom acquisition protocols suited to a specific model-development need.

That is negotiated and priced explicitly in each agreement. The raw data must be deleted at the end of the term, with certification; whether models trained during the term may continue in use is a separate, explicitly defined provision of the license.

Let's talk about your use case

Whether you're training a diagnostic model, building a foundation model, optimizing imaging hardware, or running comparative-medicine research, Sage's dataset is built to move your work forward. Reach out to start with an NDA and a look at the catalog.