OpenMedLLM-70B

📊 Benchmark Performance

GeneTuring (avg): 78.9%
ClinVar VUS: 82.4%
Gene-Disease Assoc.: 84.1%

OpenMedLLM-70B reports state-of-the-art results on the three benchmarks above, surpassing GPT-4o by 10.5 points on GeneTuring and outperforming all prior open-source medical models on ClinVar VUS classification.

🏥 About OpenMedLLM-70B

OpenMedLLM-70B is an open-source large language model built by DeepCog.ai specifically for medical data analysis, clinical decision support, and clinical reporting. It bridges the gap between raw medical data and clinically actionable insights using natural language.

Unlike general DNA sequence models (DNABERT, Evo2), which operate on raw nucleotide sequences, OpenMedLLM-70B is designed for clinical reasoning about medical data. It handles complex requests such as "What is the clinical significance of this BRCA1 variant?" or "Generate an ACMG classification report for this VCF."

Built on Llama-3 70B with continued domain-adaptive pre-training
Trained on ClinVar, NCBI dbSNP, OMIM, gnomAD, Ensembl, and 8M+ PubMed medical papers
Aligned with DPO using 2.4M expert-annotated clinical decision support preference pairs
Supports a 128K-token context, enough to process full clinical reports in one pass
Native VCF file input and structured JSON/ACMG report output

💻 Quick Start

python

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
tokenizer = AutoTokenizer.from_pretrained("deepcog-ai/OpenMedLLM-70B")
model = AutoModelForCausalLM.from_pretrained(
    "deepcog-ai/OpenMedLLM-70B",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Example 1: Variant interpretation
prompt = """Interpret the following variant and provide ACMG classification:
Gene: BRCA1
Variant: c.5266dupC (p.Gln1756ProfsTer74)
Allele frequency (gnomAD): 0.000004
ClinVar submissions: 142 pathogenic, 0 benign"""

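# With device_map="auto", send inputs to the model's device instead of hard-coding "cuda"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
output = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.1)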
print(tokenizer.decode(output[0], skip_special_tokens=True))
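
For sizing hardware before running the snippet above, a back-of-the-envelope estimate of weight memory can help. The sketch below assumes roughly 70 billion parameters (taken from the model name, not an exact count) stored at 2 bytes each, as with `torch.bfloat16`. It ignores activations and the KV cache, so real usage will be higher. The function name is illustrative.

```python
def estimate_weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / (1024 ** 3)


# Assumption: about 70e9 parameters, 2 bytes per parameter in bfloat16.
weights_gib = estimate_weight_memory_gib(70e9)
print(f"Approximate weight memory: {weights_gib:.0f} GiB")
```

Under these assumptions the weights alone exceed the memory of any single 80 GB accelerator, which is why the loading code relies on `device_map="auto"` to spread layers across the available devices.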

bash — CLI

# Install and run via CLI
pip install openmedllm

openmedllm download deepcog-ai/OpenMedLLM-70B
openmedllm interpret --vcf patient.vcf --output report.json
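
When the CLI is not an option, a VCF record can be turned into a text prompt by hand and passed to the Python snippet above. The following is a minimal sketch, not the package's own parser: `VcfRecord`, `parse_vcf_line`, and `build_prompt` are hypothetical names, the sample record is made up, and the prompt layout simply mirrors Example 1. It reads the eight fixed VCF columns, handles a single ALT allele only, and looks for the allele frequency under the INFO key `AF`.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class VcfRecord:
    chrom: str
    pos: int
    ref: str
    alt: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse one tab-separated VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated VCF columns")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    info_dict: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, sep, value = item.partition("=")
        # Flag-style INFO entries have no "=value" part.
        info_dict[key] = value if sep else True
    return VcfRecord(chrom, int(pos), ref, alt, info_dict)


def build_prompt(record: VcfRecord, gene: str) -> str:
    """Format a record as a prompt shaped like Example 1."""
    af = record.info.get("AF", "not provided")
    return (
        "Interpret the following variant and provide ACMG classification:\n"
        f"Gene: {gene}\n"
        f"Variant: {record.chrom}:{record.pos} {record.ref}>{record.alt}\n"
        f"Allele frequency (VCF INFO/AF): {af}"
    )


# Made-up record for illustration only; the gene name is supplied by the caller.
sample_line = "chr1\t12345\t.\tA\tG\t.\tPASS\tAF=0.000004;DB"
record = parse_vcf_line(sample_line)
print(build_prompt(record, gene="GENE1"))
```

The gene is passed in separately because the fixed VCF columns do not carry a gene name; in practice it would come from an annotation step.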

📚 Training Data

ClinVar

2M+ clinically annotated variants with pathogenicity classifications and submitter evidence

gnomAD

Population allele frequencies across 125,000 exomes and 15,000 genomes

OMIM

7,000+ disease entries with molecular mechanisms and inheritance patterns

PubMed Medical Literature

8M+ medical research abstracts and 240K full-text papers from 2000–2025

⚠️ Limitations and Safety

OpenMedLLM-70B is intended for research and clinical decision support only. It should not replace the judgment of a board-certified clinical geneticist.

Performance degrades for rare variants with <5 ClinVar submissions
Not validated for clinical decision support on somatic variants in oncology
Should be used with appropriate clinical informatics infrastructure
All outputs must be reviewed by a qualified clinician before patient use
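
The first two limitations lend themselves to an automated pre-check in a downstream pipeline. The sketch below is one hypothetical way to attach caution flags to a result before it reaches a reviewer. The function name, arguments, and flag wording are assumptions; only the threshold of 5 submissions comes from the list above. It supplements clinician review and does not replace it.

```python
from typing import List

MIN_CLINVAR_SUBMISSIONS = 5  # threshold taken from the limitations list


def caution_flags(clinvar_submissions: int, is_somatic: bool) -> List[str]:
    """Return caution notes to attach to a model output (hypothetical helper)."""
    if clinvar_submissions < 0:
        raise ValueError("clinvar_submissions must be non-negative")
    flags: List[str] = []
    if clinvar_submissions < MIN_CLINVAR_SUBMISSIONS:
        flags.append("Fewer than 5 ClinVar submissions: performance may be degraded.")
    if is_somatic:
        flags.append("Somatic variant: model not validated for this use.")
    return flags


# A rare germline variant gets one flag; a well-studied somatic variant gets another.
print(caution_flags(clinvar_submissions=2, is_somatic=False))
print(caution_flags(clinvar_submissions=142, is_somatic=True))
```

An empty list means neither documented limitation applies, not that the output is safe to use without review.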

📄 Citation

bibtex

@article{deepcog2026openmedllm,
  title   = {OpenMedLLM: An Open-Source LLM for Medical
             Data Analysis and Clinical Variant Interpretation},
  author  = {DeepCog AI Research Team},
  journal = {arXiv preprint},
  year    = {2026},
  url     = {https://openmedllm.org}
}