How do I prep ligands?

mansour ansari
Mar 16

Updated: Mar 16

https://video.wixstatic.com/video/ca23fd_3eaabf35bc27449aae44e1dca3160dd2/1080p/mp4/file.mp4

QuantumCURE Pro Protein Prep to Dock and Score

Think about what a typical researcher faces without my system: they're juggling RDKit, Meeko, OpenBabel, and AutoDock Vina as separate tools, writing their own glue scripts, managing file conversions on local machines, debugging PDBQT formatting issues, and doing all of that before they even get to look at a docking score. That's hours of prep work per screening run, and it's error-prone. Most academic labs, especially at R2 institutions, don't have dedicated computational chemists on staff to manage that pipeline.

What I've done is collapse that entire workflow into a single cloud-native system where a researcher submits a SMILES string and receives docking results. The receptor is pre-staged, the ligand prep is automated in memory, and none of it requires the user to touch a command line or install local dependencies. That's not just convenience; it's what makes virtual screening accessible to labs that previously couldn't afford or staff it.

And here's what makes it especially defensible: the automation isn't a thin wrapper. I’ve made real engineering decisions: the two-stage separation, the in-memory ligand pipeline via Meeko that avoids disk I/O, and the cloud worker architecture on Google Cloud Run. Those choices directly affect speed, reproducibility, and scalability. That's infrastructure, not a script, and it took months of work to get it producing reproducible results.

This process is one of the clearest ways to articulate what QuantumCURE Pro™ actually delivers. The AI and quantum layers are the differentiators that get attention, but this automated preparation pipeline is the foundation that makes everything above it trustworthy. Without chemically sane inputs, nothing downstream matters. I've solved that problem, and target customers currently have no clean alternative.

What “Preparation” Really Means in Structure-Based Virtual Screening. How My Python System Handles It.

If you search for ligand preparation, a must-do step in drug discovery, you'll find hundreds of results on protein and ligand preparation methods and a few hundred YouTube videos dedicated to the task. It is a long process that needs specialized attention. I have automated it in my system, and this post explains the methods I used. Automation takes the pain out of ligand prep and lets the researcher focus on docking results, all in the cloud. The key point is that structure-based virtual screening does not begin at docking.

In biochemistry, a ligand is any molecule, or even an individual atom or ion, that binds reversibly to a protein. Before a ligand is ever scored against a protein target, both sides of the interaction must be prepared properly. That step is often overlooked by people who imagine docking as simply “drop a molecule into a protein and get a score.” In reality, preparation is part of the science.

Protein crystal structures from X-ray data are not automatically ready for docking. They often require cleanup and standardization, such as hydrogen addition, bond and atom consistency checks, clash reduction, removal of unnecessary waters or co-crystallized artifacts, and conversion into a docking-compatible format.
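To make one of those cleanup steps concrete, here is a minimal sketch of water and artifact removal working directly on PDB text. It is an illustration under stated assumptions, not the receptor workflow QuantumCURE Pro™ uses: the residue names in `WATER_NAMES` and `DEFAULT_ARTIFACTS` and the function `clean_pdb_lines` are hypothetical choices, and the sketch relies only on the fixed-column PDB layout, where the residue name occupies columns 18 to 20 of ATOM and HETATM records.

```python
"""Minimal PDB cleanup sketch: drop waters and selected HETATM residues."""

# Hypothetical defaults: water residue names plus a few common
# crystallization additives. A real workflow would choose these per target.
WATER_NAMES = {"HOH", "WAT", "DOD"}
DEFAULT_ARTIFACTS = {"SO4", "GOL", "EDO", "PEG"}


def clean_pdb_lines(lines, drop=WATER_NAMES | DEFAULT_ARTIFACTS, keep_het=()):
    """Return PDB lines without waters/artifacts; ATOM records are always kept.

    The residue name sits in columns 18-20 (1-based), i.e. line[17:20].
    HETATM residues named in keep_het (e.g. an essential cofactor) survive.
    """
    cleaned = []
    for line in lines:
        record = line[:6].strip()
        # ANISOU records carry the same residue name as their HETATM atom.
        if record in ("HETATM", "ANISOU"):
            resname = line[17:20].strip()
            if resname in drop and resname not in keep_het:
                continue
        cleaned.append(line)
    return cleaned
```

Whether to keep a particular water or cofactor is a per-target scientific decision, which is why the sketch exposes `keep_het`. It also does not add hydrogens, resolve clashes, or update CONECT records; those steps need dedicated preparation tools.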

Ligands also need careful preparation, which brings us to SMILES. SMILES (Simplified Molecular Input Line Entry System) is a standardized notation that represents molecular structures as short ASCII strings. A SMILES string carries no 3D coordinates, so on its own it is not enough for docking and scoring. The system must generate a valid 3D geometry, assign bond orders correctly, estimate a chemically reasonable protonation state, and convert the molecule into a format suitable for docking. My Python code now automates that process in the cloud.

In my Python pipeline for QuantumCURE Pro™, this process is split into two stages:

Protein preparation is generally handled upstream, before runtime. The receptor is prepared in advance, converted to docking-ready PDBQT format, and stored for reuse. PDBQT is an extension of the PDB format used by AutoDock that adds atomic partial charges (the "Q"), AutoDock atom types (the "T"), and, for ligands, the torsional degrees of freedom. Torsional degrees of freedom are the ligand's rotatable bonds: the internal flexibility the docking search can explore while fitting the ligand into the binding site. Preparing the receptor upstream makes sense because it usually remains fixed across many ligand evaluations, so it is better to prepare it once, carefully, than to repeat the work for every docking request.

Ligand preparation happens dynamically inside the Python worker. When a user submits a compound, the system validates the SMILES, generates a 3D structure, applies the necessary chemistry handling for docking, and converts the ligand to PDBQT before launching AutoDock Vina. In practice, the Python stack uses cheminformatics tools such as RDKit, with format and conversion support via Meeko and OpenBabel where needed.
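Both stages end in the same place: PDBQT files that Vina reads. The sketch below, a hypothetical helper named `summarize_pdbqt`, shows what those files carry by tallying AutoDock atom types, summing partial charges, and counting torsion-tree BRANCH records and the TORSDOF line. It assumes the common layout in which the partial charge and atom type are the last two fields of each ATOM/HETATM line; it is a quick sanity check, not a full PDBQT parser.

```python
"""Sketch: summarize a PDBQT string (atom types, charges, torsions)."""
from collections import Counter


def summarize_pdbqt(text):
    """Return atom-type counts, net partial charge, and torsion information.

    Assumes the partial charge and AutoDock atom type are the last two
    whitespace-separated fields of ATOM/HETATM lines.
    """
    types = Counter()
    net_charge = 0.0
    branches = 0
    torsdof = None
    for line in text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            fields = line.split()
            types[fields[-1]] += 1           # AutoDock atom type, e.g. C, OA, HD
            net_charge += float(fields[-2])  # partial charge
        elif line.startswith("BRANCH"):      # does not match ENDBRANCH
            branches += 1                    # one BRANCH per rotatable bond in the tree
        elif line.startswith("TORSDOF"):
            torsdof = int(line.split()[1])
    return {
        "atom_types": dict(types),
        "net_charge": round(net_charge, 3),
        "branches": branches,
        "torsdof": torsdof,
    }
```

A check like this can flag a ligand that came out with no atoms, an unexpected net charge, or no rotatable bonds before any docking time is spent.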

RDKit is an open-source cheminformatics and machine learning toolkit designed for working with chemical information. It is primarily written in C++ for performance, with Python 3.x bindings via Boost.Python, and additional wrappers for Java, C#, JavaScript, and CFFI. The library is released under the BSD-3-Clause license, making it business-friendly for both academic and commercial use.

It provides core data structures and algorithms for chemical informatics, enabling tasks such as:

• 2D and 3D molecular operations (drawing, conformer generation, geometry optimization)

• Descriptor and fingerprint generation for machine learning models

• Substructure and similarity searches

• Integration with PostgreSQL via a molecular database cartridge

• KNIME nodes for workflow-based cheminformatics. KNIME nodes are the building blocks of workflows in the KNIME Analytics Platform, each performing a specific task such as data reading, transformation, analysis, or visualization.

• Community-contributed tools in the Contrib folder. A "contrib" folder is a common convention in open-source projects (the Python framework Django uses one, for example) for community contributions that are not part of the core project but may still be useful to users.

Meeko and OpenBabel are both tools used in molecular modeling and computational chemistry. Meeko is a Python package developed by the Forli group at Scripps Research that prepares ligands directly from SMILES or other molecule formats for docking with AutoDock 4 or Vina, without writing any files to disk. This enables efficient docking pipelines: output poses can be kept in memory as RDKit objects, so RDKit post-processing can happen before anything is saved to file.

OpenBabel is a free, open-source version of the Babel chemistry file translation program. It serves as a chemical expert system, widely used in fields such as cheminformatics, molecular modeling, and computational chemistry. OpenBabel provides both a comprehensive library and command-line utilities, making it a versatile tool for researchers, developers, and professionals.

Both tools are widely used to prepare molecular data for drug discovery and molecular docking. OpenBabel, like RDKit, also supports substructure searching and molecular fingerprint calculations, which enable similarity analysis, dataset clustering, and efficient organization of chemical libraries.
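Fingerprint similarity usually comes down to the Tanimoto coefficient: the number of shared on-bits divided by the number of distinct on-bits across both molecules. In a real pipeline RDKit or OpenBabel would compute the fingerprints; in this minimal sketch they are stand-in sets of bit indices so it runs with the standard library alone, and the names `tanimoto` and `rank_by_similarity` are hypothetical.

```python
"""Sketch: Tanimoto similarity on fingerprint on-bit sets."""


def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two sets of on-bit indices.

    Two empty fingerprints return 0.0 here; that is a convention choice.
    """
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(query_bits, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For a query `{1, 5, 9, 13}` against a compound with bits `{1, 5, 9, 12}`, three bits are shared out of five distinct bits, giving a similarity of 0.6.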

So the short answer is this:

My system already automates much of the ligand-preparation workflow in Python, while receptor preparation is handled as a controlled pre-processing step outside the live docking loop. That separation is intentional. It improves speed, consistency, and cloud scalability.
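The two-stage separation can be sketched as a worker that loads the pre-staged receptor once and prepares only the ligand per request. This is a hypothetical illustration, not the QuantumCURE Pro™ code: `RECEPTOR_PATH`, `load_receptor`, and `handle_request` are invented names, and `prepare_ligand` and `dock` are placeholders for the RDKit/Meeko preparation and the Vina run.

```python
"""Sketch: prepare the receptor once per worker, ligands per request."""
from functools import lru_cache
from pathlib import Path

# Hypothetical location of the pre-staged, docking-ready receptor.
RECEPTOR_PATH = "receptors/target_prepared.pdbqt"


@lru_cache(maxsize=None)
def load_receptor(path):
    """Read the pre-prepared receptor PDBQT once; later calls hit the cache."""
    return Path(path).read_text()


def handle_request(smiles, prepare_ligand, dock, receptor_path=RECEPTOR_PATH):
    """Per-request work: only the ligand is prepared; the receptor is reused.

    prepare_ligand(smiles) -> ligand PDBQT text (placeholder for RDKit/Meeko)
    dock(receptor_text, ligand_text) -> result (placeholder for Vina)
    """
    receptor = load_receptor(receptor_path)
    ligand_pdbqt = prepare_ligand(smiles)
    return dock(receptor, ligand_pdbqt)
```

Because a Cloud Run container instance can serve many requests while it stays warm, a process-level cache like this means the receptor file is read once per instance rather than once per docking job.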

A lot of people talk about AI in drug discovery. But long before AI can interpret anything useful, the molecular inputs have to be chemically sane, geometrically valid, and docking-ready. That unglamorous preparation layer is one of the places where real screening infrastructure begins. When people hear “virtual screening,” they often imagine a simple process: send a molecule to a server, run docking, get a score.

The reality is very different.

Before AutoDock Vina can even begin scoring a ligand against a protein, a series of preparatory steps must be performed. Ligands must be converted from 2D representations (SMILES) into valid 3D geometries, bond orders must be interpreted correctly, protonation states considered, and the molecule must ultimately be converted into the PDBQT format required by Vina.
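Protonation is where pH enters the picture. A first-order estimate comes from the Henderson-Hasselbalch equation, sketched below as a generic textbook calculation rather than a description of how this pipeline assigns protonation states. The function names are hypothetical, the pKa values in the note that follows are illustrative approximations, and production tools also handle molecules with several interacting ionizable groups, which this sketch does not.

```python
"""Sketch: Henderson-Hasselbalch estimate of ionization at a given pH."""


def fraction_protonated(pka, ph=7.4):
    """Fraction of an ionizable group in its protonated form.

    Henderson-Hasselbalch gives [deprotonated]/[protonated] = 10**(pH - pKa),
    so the protonated fraction is 1 / (1 + 10**(pH - pKa)). The same
    expression applies to acids (HA <-> A-) and bases (BH+ <-> B).
    """
    return 1.0 / (1.0 + 10 ** (ph - pka))


def dominant_form(pka, ph=7.4, kind="acid"):
    """Label the majority species for an acidic or basic group."""
    if kind not in ("acid", "base"):
        raise ValueError("kind must be 'acid' or 'base'")
    protonated = fraction_protonated(pka, ph) >= 0.5
    if kind == "acid":
        return "neutral (HA)" if protonated else "anion (A-)"
    return "cation (BH+)" if protonated else "neutral (B)"
```

For example, `dominant_form(4.76, kind="acid")` (roughly acetic acid) returns `"anion (A-)"` at pH 7.4, while `dominant_form(10.6, kind="base")` (roughly a primary aliphatic amine) returns `"cation (BH+)"`. A carboxylate or a protonated amine interacts with a binding site very differently from its neutral form, which is why this step affects docking scores.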

That is why my cloud system relies on a stack that includes RDKit, Meeko, and OpenBabel, all running together in the Python worker.

Each tool plays a role:

• RDKit – generates and optimizes the 3D molecular geometry from SMILES

• Meeko – prepares ligands specifically for AutoDock-style docking

• OpenBabel – handles additional format conversions and chemical interpretation when needed

Getting this stack to run reliably in a cloud container was not trivial; it took several weeks and hundreds of deployment attempts to stabilize the environment. Scientific libraries like RDKit have complex dependencies, compiled components, and strict version requirements. When these are combined with container environments and cloud execution frameworks, small configuration mistakes can break the entire pipeline.
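One defensive pattern for that fragility is to check the environment when the container starts, so a broken build fails immediately instead of in the middle of a user's request. The sketch below is an assumption about how such a check could look, not the author's deployment code; `REQUIRED_MODULES`, `REQUIRED_EXECUTABLES`, and both functions are hypothetical, and the check only confirms that modules are importable and executables are on `PATH`, not that their versions match.

```python
"""Sketch: fail-fast startup check for a docking worker container."""
import importlib.util
import shutil

# Hypothetical requirement lists for a worker like the one described above.
REQUIRED_MODULES = ("rdkit", "meeko", "openbabel")
REQUIRED_EXECUTABLES = ("vina", "obabel")


def missing_dependencies(modules=(), executables=()):
    """Return the top-level Python modules and command-line tools not found.

    importlib.util.find_spec locates a module without importing it;
    shutil.which searches PATH for an executable.
    """
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    missing += [exe for exe in executables if shutil.which(exe) is None]
    return missing


def assert_environment():
    """Raise at startup instead of failing mid-request."""
    missing = missing_dependencies(REQUIRED_MODULES, REQUIRED_EXECUTABLES)
    if missing:
        raise RuntimeError("missing dependencies: " + ", ".join(missing))
```

Calling `assert_environment()` once when the worker process starts turns a silent misconfiguration into an immediate, readable error.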

But once the system finally works, the result is powerful: a cloud worker capable of receiving a molecule, building a chemically valid 3D structure, preparing it for docking, and automatically running a real physics-based scoring engine.
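The last step of that chain, running the scoring engine, can be sketched as building a Vina command line and reading the scores back. The flags used (`--receptor`, `--ligand`, `--out`, `--center_x/y/z`, `--size_x/y/z`, `--exhaustiveness`, `--num_modes`, `--seed`) are standard AutoDock Vina options, and Vina writes each pose's score to a `REMARK VINA RESULT:` line in the output PDBQT. The function names, file names, and box values are hypothetical, and the search-box center and size have to come from the prepared receptor's binding site.

```python
"""Sketch: build an AutoDock Vina command line and parse its scores."""
import re

# Matches lines such as "REMARK VINA RESULT:    -7.5      0.000      0.000"
VINA_RESULT = re.compile(r"^REMARK VINA RESULT:\s+(\S+)\s+(\S+)\s+(\S+)")


def vina_command(receptor, ligand, out, center, size,
                 exhaustiveness=8, num_modes=9, seed=None):
    """Return an argv list for the Vina CLI (e.g. for subprocess.run)."""
    cmd = ["vina", "--receptor", receptor, "--ligand", ligand, "--out", out]
    for axis, c, s in zip("xyz", center, size):
        cmd += [f"--center_{axis}", str(c), f"--size_{axis}", str(s)]
    cmd += ["--exhaustiveness", str(exhaustiveness), "--num_modes", str(num_modes)]
    if seed is not None:
        # A fixed seed helps keep repeat runs comparable.
        cmd += ["--seed", str(seed)]
    return cmd


def parse_vina_scores(pdbqt_text):
    """Return (affinity_kcal_mol, rmsd_lb, rmsd_ub) for each docked pose."""
    scores = []
    for line in pdbqt_text.splitlines():
        match = VINA_RESULT.match(line)
        if match:
            scores.append(tuple(float(v) for v in match.groups()))
    return scores
```

Each tuple holds the predicted affinity in kcal/mol (more negative means stronger predicted binding) followed by the RMSD lower and upper bounds relative to the best pose. In a worker, `subprocess.run(vina_command(...), check=True)` would execute the command before the output file is parsed.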

In other words, what appears to be a simple docking request actually hides an entire cheminformatics preparation pipeline under the hood. And that infrastructure is what a real drug discovery platform is built on.

In the next post, I will cover in detail how the process continues beyond preparation, docking, and scoring.

QuantumCURE Pro™ Server Node: AI-accessible Computational Cancer Drug Discovery Infrastructure.
