The World's Largest Standardized Veterinary Imaging Dataset
A uniquely clean, expert-labeled collection of small-animal MRI, CT, and ultrasound studies — acquired on identical hardware across three high-volume centers and read to a single standard by board-certified radiologists. Built for training and validating diagnostic AI. Now available for licensing and partnership.
Why this dataset is different from anything else available
Most medical imaging datasets are assembled after the fact from many machines, many sites, and many readers. That heterogeneity introduces noise that limits how well a model can learn — and it is expensive and slow to correct.
Sage Veterinary Imaging generates its data the opposite way. Three busy imaging centers run identical Philips 3T MRI, 128-slice CT, and Samsung ultrasound systems, with standardized patient positioning at every site. Every study is interpreted by a board-certified veterinary radiologist (DACVR), and all MRI studies are read by a single radiologist, Dr. Jaime Sage — producing the kind of label consistency that is normally impossible to buy at scale.
The result is a corpus that is standardized at the source, richly labeled, and continuously growing. For developers of veterinary diagnostic AI it is the missing ingredient; for multimodal foundation-model teams it is a high-quality, low-friction imaging domain that sidesteps the privacy constraints of human medical data — because animal imaging is not subject to HIPAA, and Sage owns the data outright.
What makes the data high-value for machine learning
Hardware homogeneity
Identical 3T MRI, 128-slice CT, and ultrasound systems across all three centers eliminate the cross-site domain shift that degrades multi-source datasets. Models train faster and generalize more reliably.
Single-reader MRI labels
Every MRI is read by one board-certified radiologist (DACVR), which removes inter-observer variability from the MRI labels. Inconsistent labeling is a major cost driver in medical AI; here consistency is built in.
Paired image & report
Each study is linked to a finalized, structured radiologist report. This supports both supervised learning and vision-language (image-to-text) model training.
Normals and abnormals
Curated normal anatomy — usually discarded elsewhere — sits alongside a broad spectrum of abnormal and diseased states, essential for screening and anomaly-detection models.
Continuously generated
Three active centers produce a steady prospective feed. Partners can subscribe to fresh data and even commission custom acquisition protocols.
Clean rights & no HIPAA
Animal data avoids human-privacy friction. Sage owns the data outright and de-identifies before release, enabling faster agreements and broader permitted uses.
Specifics of the corpus
The dataset is offered as de-identified DICOM studies paired with structured radiologist reports and study-level metadata. It can be filtered and licensed by modality, species, body region, and diagnosis.
| Attribute | Detail |
|---|---|
| Modalities | 3T MRI (Philips), 128-slice CT, ultrasound & echocardiography (Samsung), image-guided biopsy |
| Species | Primarily canine and feline; additional small-animal species available |
| Body regions | Brain & spine (neurological), musculoskeletal/orthopedic, thorax, abdomen, head & neck |
| Labels | Finalized DACVR report per study: findings, impression, and diagnosis; structured metadata can be extracted from each report |
| Class balance | Both normal anatomy and abnormal/diseased states across all regions |
| Format | De-identified DICOM + paired report text (plain text / JSON); custom export formats by arrangement |
| Provenance | Three centers, identical scanners, homogeneous positioning, single-standard reads |
| Growth | Continuously expanding; prospective feed available by subscription |
Exact study counts by modality, species, region, and diagnosis are provided under NDA as part of the dataset catalog.
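As a rough illustration of how a licensed slice might be scoped by modality, species, body region, and diagnosis, the sketch below filters a list of study-level metadata records. The catalog schema is shared only under NDA, so every name here (`StudyRecord`, its fields, and `select_studies`) is a hypothetical placeholder rather than the actual delivery format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class StudyRecord:
    # Hypothetical study-level metadata; the real catalog schema is not public.
    study_id: str
    modality: str      # e.g. "MRI", "CT", "US"
    species: str       # e.g. "canine", "feline"
    body_region: str   # e.g. "brain", "spine", "thorax"
    diagnosis: str     # diagnosis text taken from the paired report
    report_text: str   # finalized radiologist report


def select_studies(
    records: Iterable[StudyRecord],
    modality: Optional[str] = None,
    species: Optional[str] = None,
    body_region: Optional[str] = None,
    diagnosis_contains: Optional[str] = None,
) -> List[StudyRecord]:
    """Return the records matching every filter that was supplied."""
    selected = []
    for rec in records:
        if modality is not None and rec.modality != modality:
            continue
        if species is not None and rec.species != species:
            continue
        if body_region is not None and rec.body_region != body_region:
            continue
        # Case-insensitive substring match on the diagnosis text.
        if (diagnosis_contains is not None
                and diagnosis_contains.lower() not in rec.diagnosis.lower()):
            continue
        selected.append(rec)
    return selected


# Made-up example records, for illustration only.
catalog = [
    StudyRecord("s1", "MRI", "canine", "brain", "meningioma", "..."),
    StudyRecord("s2", "MRI", "feline", "brain", "normal", "..."),
    StudyRecord("s3", "CT", "canine", "thorax", "pulmonary mass", "..."),
]

brain_mri = select_studies(catalog, modality="MRI", body_region="brain")
print([r.study_id for r in brain_mri])  # prints ['s1', 's2']
```

Keeping each filter optional mirrors the way a license scope can be narrowed along any combination of the four dimensions.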
What you can build with this data
The license defines permitted use; the data itself supports a wide range of development work.
Who we partner with
Diagnostic AI companies
Teams building radiograph, MRI, CT, or ultrasound interpretation tools who need clean cross-sectional data to expand beyond X-ray. Strong fit for field-of-use exclusivity.
AI & foundation-model labs
Multimodal model developers seeking a high-quality, low-friction medical-imaging domain to supplement human corpora. Non-exclusive, large-volume licensing.
Imaging manufacturers
MRI, CT, and ultrasound makers developing acquisition, reconstruction, and protocol-optimization models — on exactly the hardware this data was captured on.
Pharma & comparative medicine
Pharmaceutical companies, CROs, and academic groups using companion animals as translational disease models for imaging biomarker development.
Evaluate the data before you commit
We provide a sample data room of de-identified showcase studies under NDA so your team can assess quality, labeling, and fit firsthand.
Ways to license the data
Data is licensed, never sold. Agreements are term-limited and can be sliced by modality, species, body region, and application — so scope maps precisely to your project.
A bounded sample (typically 500–1,000 studies) for internal evaluation only, over a fixed term.
Use: assess data quality and fit. Terms negotiated privately and creditable toward a full license.
A defined corpus slice licensed for model training, term-limited (typically 2–3 years), no resale or sublicensing.
Use: train and ship models; same data may be licensed to others.
Exclusivity within a defined application (e.g. veterinary MRI diagnostics), with performance milestones to retain it.
Use: lock out competitors in your specific niche without acquiring the whole asset.
Ongoing delivery of newly acquired studies and reports on a recurring schedule.
Use: keep models fresh; optionally commission custom acquisition protocols.
Evaluation of your model against a sealed holdout set that is never licensed for training, scored by Sage radiologists.
Use: independent, expert-graded performance validation.
We bring the data, the radiology expertise, and ongoing acquisition; a technical partner brings the engineering and modeling. Together we build a product and share equity and revenue in the outcome.
Use: ideal for a team that wants the dataset as the foundation of a jointly owned product rather than a one-time license. Includes expert annotation from ACVR radiologists and a structured co-development agreement.
Built-in protections. All studies are de-identified before release. Every agreement prohibits resale, sublicensing, and any attempt to re-identify patients or solicit Sage's referring clinics, and includes audit rights. Whether trained models persist beyond the license term is negotiated and priced explicitly.
Getting access
Introductory call & NDA
Tell us your use case and target application. We sign a mutual NDA and share the dataset catalog.
Sample data room
Review de-identified showcase studies with paired reports to confirm quality and fit.
Scope & agreement
We define modality, volume, exclusivity, and term, then execute the matching license.
Secure delivery
De-identified data is delivered through a secure channel, with an optional ongoing feed.
Let's talk about your use case
Whether you're training a diagnostic model, building a foundation model, optimizing imaging hardware, or running comparative-medicine research, Sage's dataset is built to move your work forward. Reach out to start with an NDA and a look at the catalog.