Can’t you just use AI to duplicate your cancer drug discovery platform?
- Mansour Ansari

- 6 days ago
- 4 min read

I get this question a lot lately, especially now that “AI coding” looks like magic on Twitter, Facebook, or Instagram: prompt → app → shipped. And yes, AI can help. I use it. It can write functions, refactor code, generate UI scaffolds, explain libraries, and speed up research. But if someone thinks they can duplicate a real cancer drug discovery system by “vibe coding” a few prompts into existence, they’re confusing typing code with building an engine.
This isn’t a mainstream CRUD app where the hardest parts are login, billing, and UI polish. A drug discovery platform is an orchestration problem across physics, chemistry, cloud infrastructure, data provenance, data science, several databases, reproducibility, and scientific interpretation. And every piece has sharp edges: integrating and updating them is where the real obstacles live. Let me explain:
1) Cloud deployment isn’t “upload to AWS or Google and done.”
To run large-scale docking, you’re not deploying a website; you’re deploying a production compute pipeline:
Container builds that actually reproduce the same results everywhere (Linux libs, RDKit, Open Babel, PDB tools, Vina binaries, GPU/CPU differences).
Cloud Run / batch workers / job queues / retries / timeouts / concurrency tuning.
Storage architecture for proteins, ligands, results, logs, and provenance.
IAM permissions that don’t accidentally expose your buckets to the world.
Cost control so a bug doesn’t turn into a thousand-dollar surprise overnight.
An AI can generate Terraform or Dockerfiles. But it won’t sit with you through the painful reality of dependency conflicts, cold starts, memory limits, broken builds, and the “it worked locally” failures that scientific toolchains produce.
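One lightweight defense against those “it worked locally” failures is to fingerprint the toolchain alongside every result, so mismatched environments are caught instead of silently compared. This is a minimal sketch; the tool names and pinned versions are hypothetical, not the platform’s actual manifest:

```python
import hashlib
import json
import platform

def environment_fingerprint(tool_versions: dict) -> str:
    """Hash the toolchain versions plus the host platform. If two runs
    carry different fingerprints, their results are not comparable."""
    payload = {
        "platform": platform.platform(),
        "python": platform.python_version(),
        "tools": dict(sorted(tool_versions.items())),
    }
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

# Hypothetical pinned versions recorded with a run's results
fp = environment_fingerprint(
    {"rdkit": "2023.09.5", "vina": "1.2.5", "openbabel": "3.1.1"}
)
```

Storing the fingerprint next to each result file is cheap, and it turns “why are these scores different?” from a guessing game into a diff.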
2) QRNG extraction is not a buzzword; it’s an engineering subsystem
“Add quantum randomness” sounds easy until you actually do it.
Pulling entropy from real QRNG hardware means dealing with device interfaces, throughput limits, buffering, SDKs, encoding, health checks, error modes, automation, and data integrity. Then you still have to answer: where does that entropy go in the pipeline?
Are you seeding docking?
Sampling conformers?
Driving Monte Carlo steps?
Tagging result lineage (“this lead was QRNG-seeded”)?
If the entropy injection isn’t traceable and repeatable, it becomes decoration. If it is traceable, you need file formats, metadata, provenance, versioning, and validation rules.
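To make “traceable and repeatable” concrete, here is one possible shape for an entropy record that ties a docking seed back to the raw draw. `os.urandom` stands in for a real QRNG device read, and the field names are assumptions for illustration, not the platform’s actual schema:

```python
import hashlib
import os
import time

def seeded_run_record(entropy: bytes, run_id: str) -> dict:
    """Derive a 64-bit docking seed from raw entropy bytes, keeping
    enough metadata to prove the seed's lineage later."""
    digest = hashlib.sha256(entropy).digest()
    seed = int.from_bytes(digest[:8], "big")
    return {
        "run_id": run_id,
        "seed": seed,
        "entropy_sha256": digest.hex(),  # traces back to the raw draw
        "entropy_source": "qrng",        # vs. an "os_fallback" tag
        "drawn_at": time.time(),
    }

entropy = os.urandom(32)  # stand-in for a QRNG hardware read
record = seeded_run_record(entropy, "run-0001")
```

The point is the lineage fields, not the hashing: any downstream result can now answer “was this lead QRNG-seeded, and from which draw?”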
3) Quantum glyph design isn’t “make a cool icon.”
A glyph system is effectively a symbolic compression layer, a language, built from the behavior of the search process. That’s not just art. It’s mapping:
trajectory patterns → topology/motifs
collapse signatures → symbolic tags
run metadata → consistent encoding
visual identity → stable grammar
And once you invent the glyphs, you now own the hardest responsibility: making them consistent, interpretable, and usable across runs, proteins, targets, and entropy sources. AI can generate images. But designing a symbolic system that stays meaningful after 100,000 runs is a whole different job.
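Purely as an illustration of what “stable grammar” means mechanically (this is a toy, not the actual glyph system described here), an encoder that maps run metadata to a deterministic symbolic tag might look like:

```python
import hashlib

# Toy alphabet standing in for a real glyph grammar
GLYPHS = "◇◆○●△▲□■"

def glyph_tag(run_metadata: str, length: int = 4) -> str:
    """Map run metadata to a short, stable symbolic tag: identical
    metadata always yields the identical tag across reruns."""
    digest = hashlib.sha256(run_metadata.encode()).digest()
    return "".join(GLYPHS[b % len(GLYPHS)] for b in digest[:length])

tag = glyph_tag("target=EGFR;seed=42;entropy=qrng")
```

Determinism is the easy half; the hard half, as noted above, is keeping the symbols *interpretable* after 100,000 runs, which no hash function gives you.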
4) Integration is the real monster
Most people underestimate this: the hard part isn’t docking, QRNG, or the UI. It’s stitching them into a single system without breaking scientific meaning.
You end up building:
standardized file formats (JSON schemas, validation, forward compatibility)
result normalization across tools and versions
workflow coordination across services and languages
reproducibility controls (seed logging, version pinning, provenance)
error handling for scientific failures (bad PDBs, missing residues, protonation issues, ligand parsing failures)
Mainstream apps don’t have “protein structure failed to protonate correctly so the binding site is wrong and your ‘best compound’ is fiction.”
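A sketch of what the first item, schema validation, can look like in practice; the required field names here are hypothetical, not the platform’s real schema:

```python
import json

# Hypothetical provenance contract every result record must satisfy
REQUIRED = {
    "schema_version": str,
    "seed": int,
    "vina_version": str,
    "receptor_sha256": str,
    "poses": list,
}

def validate_result(raw: str) -> dict:
    """Reject result records that are missing provenance fields or
    carry wrong types, before they poison downstream analysis."""
    record = json.loads(raw)
    for field, expected in REQUIRED.items():
        if field not in record:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(record[field], expected):
            raise ValueError(f"{field} must be {expected.__name__}")
    return record
```

Failing loudly at the boundary is the whole design choice: a record without a seed or a pinned tool version is not a result, it’s a liability.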
5) Docking nuances: it’s easy to get results, hard to get truth
AutoDock Vina will give you a number. That doesn’t mean you have a lead.
Real docking work means understanding:
exhaustiveness tradeoffs and pose diversity
scoring function limitations and false confidence
protonation/tautomer states
receptor prep, binding site definitions, flexible residues
RMSD clustering, rescoring, consensus scoring
decoys, controls, and sanity checks
AI can produce code that runs docking. It can’t magically guarantee the docking output is scientifically meaningful. That takes human judgment, chemistry knowledge, and validation discipline.
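As one small example of the RMSD-clustering step, here is a greedy leader-clustering sketch over pose coordinates. Real pose RMSD also has to handle atom mapping and molecular symmetry, which this deliberately skips:

```python
import math

def rmsd(a, b):
    """Root-mean-square deviation between two equal-length lists of
    (x, y, z) coordinates, assuming identical atom ordering."""
    total = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(a, b)
    )
    return math.sqrt(total / len(a))

def cluster_poses(poses, cutoff=2.0):
    """Greedy leader clustering: a pose joins the first cluster whose
    representative is within `cutoff` angstroms, else starts a new one."""
    clusters = []  # list of (representative, members)
    for pose in poses:
        for rep, members in clusters:
            if rmsd(pose, rep) <= cutoff:
                members.append(pose)
                break
        else:
            clusters.append((pose, [pose]))
    return clusters
```

Even this toy version makes the judgment calls visible: the cutoff, the cluster representative, and the atom-ordering assumption all change what counts as “the same pose.”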
6) Cheminformatics is a full discipline (and it bites)
You don’t “just parse SMILES.”
You need to handle:
stereochemistry, tautomers, salts
conformer generation rules
filtering libraries (PAINS, reactive groups, toxicity alerts)
synthetic accessibility vs novelty
similarity clustering and scaffold analysis
Cheminformatics is where “it runs” becomes “it lies.” A sloppy pipeline can look productive while quietly producing junk.
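To show how shallow the naive approach is, here is a deliberately crude sketch: string tricks that *look* like salt stripping and alert filtering but are nowhere near correct chemistry. Real pipelines use RDKit’s SaltRemover and SMARTS-based filter catalogs; the alert list below is a toy:

```python
# Toy substring blocklist standing in for real PAINS SMARTS patterns.
# Plain substring matching on SMILES is NOT substructure search.
TOY_ALERTS = ["N=N", "C(=O)C(=O)"]  # azo, 1,2-dicarbonyl (illustrative)

def strip_salt(smiles: str) -> str:
    """Crude salt strip: keep the longest dot-separated fragment.
    Breaks on anything beyond simple counterion cases."""
    return max(smiles.split("."), key=len)

def has_alert(smiles: str) -> bool:
    """Crude alert check by substring; misses ring-closure and
    branch-order variants a real SMARTS match would catch."""
    return any(alert in smiles for alert in TOY_ALERTS)
```

This code runs, returns answers, and would quietly produce junk at scale, which is exactly the “it runs” vs. “it lies” gap.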
7) IC50, OMICS, and SKALA aren’t plug-ins, they’re additional worlds
People casually say: “Add IC50 prediction.” “Add omics.” “Use SKALA.” That’s not a checkbox list. Each layer is a new research-grade subsystem:
IC50 prediction requires data assumptions, models, confidence intervals, calibration, and awareness of domain shift.
OMICS integration means data harmonization, identifiers, batch effects, cohort selection, and biological interpretation.
SKALA-style electron distribution insights (or any advanced AI chemistry layer) adds new dependencies, new data, new failure modes, and new validation requirements.
So the “UI that covers everything” isn’t a pretty dashboard. It’s a careful interface to complex decisions, where one wrong default can mislead users.
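One of the few genuinely simple pieces in the IC50 layer is the unit convention itself: potencies are usually compared on the logarithmic pIC50 scale, where pIC50 = −log10(IC50 in molar). A one-liner, but getting the units wrong silently shifts every comparison:

```python
import math

def pic50(ic50_nM: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in M).
    100 nM corresponds to a pIC50 of about 7.0."""
    return -math.log10(ic50_nM * 1e-9)
```

Everything else in that list (calibration, confidence intervals, domain shift) has no one-liner, which is the point.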
8) D-Wave libraries and quantum integrations aren’t copy/paste either
Quantum tooling isn’t like installing a React library.
You’re dealing with:
embedding constraints
QUBO/Ising formulation choices
sampling behavior and bias
latency, cost, batching, queueing
file formats that preserve the meaning of a quantum run log
And if your “quantum piece” isn’t scientifically connected to the rest of the pipeline, it becomes marketing. Making it real means careful design and traceability.
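For scale: a QUBO is just a dictionary of quadratic coefficients over binary variables, and for toy sizes you can brute-force the minimum classically, which is useful for verifying what a sampler *should* return. A self-contained sketch (no D-Wave SDK assumed):

```python
from itertools import product

def solve_qubo_brute(Q: dict, n: int) -> tuple:
    """Exhaustively minimize x^T Q x over binary vectors. Only viable
    for tiny n (2^n candidates), but exact."""
    def energy(x):
        return sum(
            Q.get((i, j), 0.0) * x[i] * x[j]
            for i in range(n) for j in range(n)
        )
    return min(product((0, 1), repeat=n), key=energy)

# Toy QUBO: reward x0 alone, penalize x0 and x1 together
Q = {(0, 0): -1.0, (1, 1): 0.5, (0, 1): 2.0}
best = solve_qubo_brute(Q, 2)  # (1, 0)
```

The formulation choices in the list above (how a chemistry question becomes those coefficients, and how embedding distorts them) are where the real difficulty lives, not in the dictionary.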
The bottom line
AI can absolutely accelerate development. But it doesn’t remove the work; it shifts it.
Instead of spending all day writing code, you spend your time doing the harder thing: architecting, validating, integrating, debugging edge cases, and defending scientific meaning.
So when someone asks: “How hard would it be to duplicate your project with AI?”
My answer is:
If your goal is a demo that looks like drug discovery, AI can get you something fast. If your goal is a system that’s scalable, reproducible, scientifically defensible, and integrates cloud compute + docking + QRNG + quantum logs + glyph semantics + cheminformatics + IC50 + omics… that’s not vibe coding. That’s engineering a new machine.
And that’s why I’m building it. Because the real value isn’t “an app.” It’s the integration of hard components into a pipeline that actually works.


