Beyond Public Data: How Proprietary ADMET Models Transform Drug Discovery

In the high-stakes world of pharmaceutical development, where a single drug can take 10-15 years and over $1 billion to bring to market, early prediction of a compound’s ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties can mean the difference between success and failure. While the industry has long relied on public datasets to train predictive models, there is growing interest in proprietary ADMET models trained on internal experimental data, which may offer improved accuracy and influence how medicinal chemists approach drug design.
The ADMET Prediction Revolution
ADMET properties determine whether a promising compound will ultimately succeed as a therapeutic. Poor absorption, unexpected toxicity, or rapid metabolism can derail even the most potent drug candidates. Traditionally, these properties were assessed late in the development process through expensive and time-consuming experimental assays. Machine learning models now enable the prediction of these characteristics in silico, allowing researchers to optimize compounds before synthesis and testing.
However, not all ADMET models are created equal. The quality and comprehensiveness of training data remain critical factors influencing model performance—and proprietary datasets provide certain advantages.
The Limitations of Public Training Data
Most commercially available ADMET prediction tools rely on public datasets compiled from scientific literature and open databases. While these resources provide valuable starting points, they suffer from several limitations:
- Publication Bias: Published data often features successful compounds and positive results, creating an incomplete picture of the chemical space and failure modes.
- Data Heterogeneity: Public datasets may combine results from different laboratories, assay conditions, and experimental protocols, introducing variability that can compromise model accuracy.
- Limited Scope: Public datasets may not sufficiently represent the specific chemical series or therapeutic areas of interest to individual pharmaceutical companies.
- Temporal Gaps: There can be significant delays between experimental work and publication, meaning public datasets may not reflect the most current understanding or methodologies.
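The data heterogeneity point above can be made concrete with a small consistency check. The sketch below is illustrative only: the compound identifiers, lab names, measured values, and the 0.3 log-unit tolerance are all invented assumptions, not figures from any real dataset. It groups pooled measurements by compound and flags those whose values from different sources disagree by more than the chosen tolerance.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical pooled records: (compound_id, source_lab, measured log solubility).
# All identifiers and values are invented for illustration.
RECORDS = [
    ("cmpd-1", "lab_A", -3.1),
    ("cmpd-1", "lab_B", -3.9),
    ("cmpd-1", "lab_C", -2.6),
    ("cmpd-2", "lab_A", -4.8),
    ("cmpd-2", "lab_B", -5.5),
    ("cmpd-3", "lab_A", -2.0),
]


def per_compound_spread(rows):
    """Group measurements by compound; return {compound: (n, mean, stdev)}."""
    grouped = defaultdict(list)
    for compound, _lab, value in rows:
        grouped[compound].append(value)
    return {c: (len(v), mean(v), pstdev(v)) for c, v in grouped.items()}


def flag_inconsistent(rows, max_stdev=0.3):
    """Return compounds whose repeated measurements disagree by more than max_stdev.

    max_stdev is an assumed tolerance in log units, not an established cutoff.
    """
    spread = per_compound_spread(rows)
    return sorted(
        c for c, (n, _mean, sd) in spread.items() if n > 1 and sd > max_stdev
    )


if __name__ == "__main__":
    for compound in flag_inconsistent(RECORDS):
        n, avg, sd = per_compound_spread(RECORDS)[compound]
        print(f"{compound}: n={n}, mean={avg:.2f}, stdev={sd:.2f}")
```

A check like this does not say which source is right; it only shows where pooled labels are noisy enough that a model trained on them would be fitting inter-lab variation as well as chemistry.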

The Power of Proprietary ADMET Training Data
Pharmaceutical companies often possess decades of internally generated experimental data, which can be leveraged for training ADMET models. These proprietary datasets offer several advantages:
Experimental Consistency and Quality Control
Internal datasets are typically developed using standardized protocols, consistent assay conditions, and rigorous quality control. When EMD Serono, the Healthcare business of Merck KGaA, Darmstadt, Germany, accumulated over 30 years of experimental ADMET data, they maintained consistent methodologies and documentation standards that ensure data reliability and reproducibility.
Comprehensive Chemical Space Coverage
Proprietary datasets capture the full range of compounds synthesized and tested within an organization, including failed experiments and negative results. This comprehensive coverage enables models to learn from both successes and failures.
Therapeutic Area Specialization
Companies often focus on specific therapeutic areas, accumulating deep expertise and extensive datasets within those chemical spaces. This allows for the development of highly accurate models tailored to specific compound classes and biological targets.
AIDDISON™: Proprietary ADMET Models in Action
The AIDDISON™ software platform is an example of how proprietary training data can transform ADMET prediction capabilities. Built on EMD Serono’s extensive experimental database, AIDDISON™ incorporates multiple sophisticated models that address key ADMET properties:
Absorption Models
Caco-2 Permeability: The platform’s Caco-2 permeability model predicts intestinal absorption using data from Caco-2 cell assays and is trained on a large set of compounds.
Distribution Models
Plasma Protein Binding (Fub): AIDDISON™ includes species-specific models for plasma protein binding, trained on thousands of compounds, to predict the unbound fraction available for pharmacological activity.
Metabolism and Excretion Models
Intrinsic Clearance (CLint): The platform features classification models for predicting hepatic clearance across species, supporting understanding of metabolic stability and potential drug-drug interactions.
Physicochemical Properties
Solubility Prediction: Solubility prediction models, trained on extensive datasets, help assess aqueous solubility, which is important for formulation development.
Toxicity Assessment
Hepatotoxicity (HepG2) and Cardiotoxicity (hERG): The platform includes models for predicting liver toxicity using HepG2 cell assays and for identifying potential cardiotoxicity through hERG inhibition predictions.
Integration into Medicinal Chemistry Workflows
The true value of proprietary ADMET models lies in their seamless integration into medicinal chemistry workflows. Rather than treating ADMET assessment as a late-stage checkpoint, these models enable:

Hit-to-Lead Optimization
During the hit-to-lead phase, medicinal chemists can use ADMET predictions to prioritize compounds with favorable drug-like properties. This early filtering saves significant time and resources by focusing synthetic efforts on the most promising candidates.
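As a minimal sketch of the early filtering described above, the example below applies hard cutoffs to a handful of predicted properties and ranks the survivors. Everything here is hypothetical: the `Prediction` fields, the compound identifiers, the predicted values, and the three thresholds are placeholders chosen for illustration and do not reflect the output format or recommended cutoffs of any particular platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical container for model outputs on one compound."""
    compound_id: str
    caco2_papp: float        # predicted Caco-2 permeability, 1e-6 cm/s
    fraction_unbound: float  # predicted unbound fraction in plasma, 0..1
    herg_risk: float         # predicted probability of hERG inhibition, 0..1


# Assumed project-specific cutoffs; real thresholds depend on the program.
MIN_PAPP = 5.0
MIN_FRACTION_UNBOUND = 0.01
MAX_HERG_RISK = 0.5


def passes_filters(p: Prediction) -> bool:
    """True if the compound clears every hard cutoff."""
    return (
        p.caco2_papp >= MIN_PAPP
        and p.fraction_unbound >= MIN_FRACTION_UNBOUND
        and p.herg_risk <= MAX_HERG_RISK
    )


def prioritize(predictions: list[Prediction]) -> list[str]:
    """Keep compounds that pass all filters, lowest predicted hERG risk first."""
    kept = [p for p in predictions if passes_filters(p)]
    return [p.compound_id for p in sorted(kept, key=lambda p: p.herg_risk)]


# Invented example hits.
HITS = [
    Prediction("hit-001", caco2_papp=12.0, fraction_unbound=0.08, herg_risk=0.20),
    Prediction("hit-002", caco2_papp=2.1, fraction_unbound=0.15, herg_risk=0.10),
    Prediction("hit-003", caco2_papp=8.4, fraction_unbound=0.03, herg_risk=0.70),
    Prediction("hit-004", caco2_papp=6.3, fraction_unbound=0.05, herg_risk=0.12),
]

if __name__ == "__main__":
    print(prioritize(HITS))
```

Hard cutoffs are easy to explain but discard compounds that narrowly miss a single threshold, so teams often treat the result as a triage list for discussion, not a final decision.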
Lead Optimization Strategy
In lead optimization, ADMET models guide structure-activity relationship (SAR) studies by predicting how structural modifications will impact key properties. Chemists can explore larger chemical spaces virtually, identifying optimal substitutions and modifications before synthesis.
Multi-Parameter Optimization
Modern drug discovery requires balancing multiple competing objectives: potency, selectivity, ADMET properties, and synthetic accessibility. Proprietary ADMET models provide the accurate predictions necessary for effective multi-parameter optimization approaches.
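One common way to balance competing objectives is a desirability score: each predicted property is mapped to a value between 0 and 1, and the values are combined with a weighted geometric mean so that a very poor result on any one property pulls the overall score down. The sketch below assumes this approach purely for illustration; the source does not specify which multi-parameter optimization method is used, and the property names, ramp endpoints, and weights are hypothetical.

```python
import math


def ramp(value: float, low: float, high: float) -> float:
    """Linear desirability: 0 at `low`, 1 at `high`, clipped to [0, 1].

    Pass low > high for properties where smaller values are better.
    """
    if low == high:
        raise ValueError("low and high must differ")
    score = (value - low) / (high - low)
    return min(1.0, max(0.0, score))


def mpo_score(desirabilities: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted geometric mean of desirabilities; any zero gives a score of 0."""
    total_weight = sum(weights[name] for name in desirabilities)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    if any(d == 0.0 for d in desirabilities.values()):
        return 0.0
    log_sum = sum(weights[name] * math.log(d) for name, d in desirabilities.items())
    return math.exp(log_sum / total_weight)


if __name__ == "__main__":
    # Invented predictions for one candidate; endpoints and weights are assumptions.
    candidate = {"potency_pic50": 7.5, "caco2_papp": 10.0, "herg_risk": 0.5}
    desirabilities = {
        "potency_pic50": ramp(candidate["potency_pic50"], low=6.0, high=8.0),
        "caco2_papp": ramp(candidate["caco2_papp"], low=1.0, high=10.0),
        "herg_risk": ramp(candidate["herg_risk"], low=0.8, high=0.2),
    }
    weights = {"potency_pic50": 2.0, "caco2_papp": 1.0, "herg_risk": 1.0}
    print(f"MPO score: {mpo_score(desirabilities, weights):.3f}")
```

The geometric mean is a deliberate design choice in this sketch: unlike an arithmetic mean, it prevents strong potency from fully compensating for an unacceptable toxicity prediction.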
Cross-Species Translation
Species-specific ADMET models enable better translation from preclinical models to human predictions, improving the reliability of early development decisions and reducing late-stage failures.
The Competitive Advantage of Proprietary Models
Organizations leveraging proprietary ADMET models gain several competitive advantages:
- Higher Prediction Accuracy: Models trained on high-quality, consistent internal data typically outperform those based on heterogeneous public datasets.
- Reduced Development Timelines: More accurate early-stage predictions enable faster decision-making and reduce the number of synthesis-test cycles required.
- Lower Development Costs: By identifying problematic compounds earlier, proprietary models help avoid expensive late-stage failures.
- Strategic Differentiation: Unique predictive capabilities can guide research into chemical spaces that competitors might overlook or avoid.
The Future of ADMET Modeling
As the pharmaceutical industry continues to embrace AI-driven approaches, several trends are shaping the future of ADMET modeling:
Automated Experimentation
High-throughput automated platforms are generating ADMET data at larger scales, providing richer training datasets for machine learning models. This automation also ensures greater experimental consistency and reproducibility.
Multi-Modal Learning
Future ADMET models will integrate diverse data types—chemical structures, biological assays, omics data, and clinical outcomes—to provide more comprehensive predictions.

Explainable AI
As regulatory agencies increasingly scrutinize AI-driven decisions, explainable AI approaches will become essential for understanding and validating ADMET predictions.
Collaborative Modeling
Industry consortiums are exploring ways to share training data while preserving competitive advantages, potentially combining the benefits of proprietary datasets with broader chemical space coverage.
Transforming Drug Discovery Through Better Predictions
The evolution from public to proprietary ADMET models reflects a shift in how pharmaceutical companies approach drug discovery. By leveraging internal experimental data, organizations can develop predictive models that are both accurate and relevant to their specific research objectives.
As the industry continues to face pressure to reduce development timelines and costs while improving success rates, proprietary ADMET models represent one approach to making more informed decisions earlier in the discovery process. The adoption of such models is expected to grow, with continued innovation and collaboration playing key roles in maximizing their impact.
In this era of AI-driven drug discovery, high-quality data, whether public or proprietary, remains a critical foundation for competitive advantage and improved patient outcomes.