Where Do Recent Small Molecule Clinical Candidates Come From?

Modern hit-finding technologies are incredible. Dean Brown and Jonas Bostrom at AstraZeneca have a very nice review out summarizing the hit-finding strategies for 66 clinical candidates published from 2016-2017. Some highlights include a clinical candidate from DNA-encoded libraries at GSK, the discovery of an RNA-binding drug candidate by PTC/Roche, and fragment-based programs with candidates 100,000x more potent than the original hits.

Lead generation has come a long way from the old days of scouring the world for natural products in soil, reefs, or tree bark. Significant advances in computing, robotics, molecular biology, and chemistry have greatly facilitated the identification of novel chemical matter for drug targets.

Of the 66 clinical candidates published in J. Med. Chem. from 2016-2017, only one (<2%) is natural product-based in the classic sense. While 42% of compounds were derived from known starting points, the remainder originated from hits identified by random screening (29%), structure-based drug design (14%), directed screening (8%), fragment-based lead generation (6%), and DNA-encoded library screening (1%).

Percentage of candidates published in J. Med. Chem. from 2016-2017 that emerged from each lead generation strategy.
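As a quick sanity check on the breakdown above, the rounded percentages can be converted back into approximate candidate counts. This is only a sketch: the percentages are the ones quoted in the text, but the counts it derives are back-calculated estimates, not figures taken from the review, and the helper name `approx_count` is made up for illustration.

```python
# Shares quoted in the text above (percent of the 66 candidates).
TOTAL_CANDIDATES = 66
shares = {
    "known starting points": 42,
    "random screening": 29,
    "structure-based drug design": 14,
    "directed screening": 8,
    "fragment-based lead generation": 6,
    "DNA-encoded library screening": 1,
}


def approx_count(percent, total=TOTAL_CANDIDATES):
    """Approximate number of candidates implied by a rounded percentage."""
    return round(percent / 100 * total)


# The quoted shares should cover the whole cohort.
total_percent = sum(shares.values())

# Back-calculated (estimated) counts per strategy; not data from the review.
counts = {name: approx_count(p) for name, p in shares.items()}

# Share of the cohort represented by exactly one candidate.
single_candidate_share = 100 / TOTAL_CANDIDATES

for name, n in counts.items():
    print(f"{name}: ~{n} candidates")
print(f"one candidate = {single_candidate_share:.1f}% of the cohort")
```

With 66 candidates in total, a single candidate corresponds to about 1.5% of the cohort, so any category containing one compound will show up as a 1-2% slice.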

Random Screening (29%)

In the era of classical pharmacology, it wasn’t uncommon for isolated compounds to go from phenotypic screens into the clinic without much modification or mechanistic understanding. While this strategy brought us the golden age of antibiotics, today’s indications and target-based drug discovery paradigm require more nuance than killing “bugs in a dish,” and random screening has evolved with the needs of our time.  Modern chemical libraries are more diverse, assays have proliferated with genetic engineering technologies, and better computing tools have made handling all the data possible on a PC.

One of the most impressive examples from this 2016-2017 J. Med. Chem. cohort is the SMN2-splicing modulator program run by PTC Therapeutics and Roche. (Naryshkin, N. A. et al. Science, 2014, 345, 688-693.) The original hit (below) was identified by PTC Therapeutics using a library of >200,000 compounds and a luciferase-reporter assay for desired SMN2 gene splicing. (Ratni, H. et al. J. Med. Chem. 2016, 59, 6086-6100.)

Though the first hit had activity only at concentrations >32 µM, its optimization led to the much more potent clinical candidates RG7800 (Woll, M. G. et al. J. Med. Chem. 2016, 59, 6070-6085.) and risdiplam (Ratni, H. et al. J. Med. Chem. 2018, 61, 6501-6517.) for treatment of spinal muscular atrophy. Recently, these compounds have been shown to bind to SMN2 pre-mRNA directly [(a) Sivaramakrishnan, M. et al. Nature Communications, 2017, 8, 1476. (b) Wang, J. et al. PNAS, 2018, doi: 10.1073/PNAS.1800260115], and are rare examples of RNA-binding clinical candidates. (Warner, K. D., Hajdin, C. E., Weeks, K. M. Nat. Rev. Drug Discov. 2018, 17, 547-558.)

(Left) A random screen for splice-correcting activity resulted in very weak but optimizable hits. (Right) NMR-guided model of the binding location of SMN2 splicing modifiers. The compounds bind to SMN2 pre-mRNA (green) in a complex with U1-C protein (orange) and U1 snRNA-derived oligonucleotide (blue). Protons strongly affected by compound binding are shown in red. The molecules’ binding site within the RNA is represented by the pink hexagon.  Image reproduced from Sivaramakrishnan et al., Nature Communications, 2017, 8, 1476.

Directed Screening (8%)

The significant amount of target-specific drug discovery experience accumulated by our industry has enabled a shift from random high-throughput screening to more focused, directed screening based on knowledge of the target class.

Kinase programs are the best example of this approach, since ATP-competitive pharmacophore models (e.g. Type I, Type 1.5, Type II) are well established for the various conformations of kinases (e.g. DFG-in, DFG-out). As an example, Sanofi’s MET inhibitor program (Ugolini, A. et al. J. Med. Chem. 2016, 59, 7066-7074.) started by screening a kinase-focused inhibitor library in an unphosphorylated MET assay, which led to the identification of a benzimidazole hit (below, left). (Phosphorylated MET can potentially bias the assay against binders of inactive conformations such as DFG-out; see Zhao, Z. et al. ACS Chem. Biol. 2014, 9, 1230-1241.)

The initial hit was relatively unselective against other kinases such as CDK9, so an internal benzimidazole library of ~2000 compounds from various other programs was subsequently screened in a phosphorylated MET assay to identify a more selective second scaffold. This second scaffold, which originated from an old anthelmintic program, was ultimately advanced to clinical candidate SAR125844. The final compound was confirmed by X-ray crystallography to retain the Type I binding mode of the original screen hit, despite the intermediate use of a non-kinase-directed benzimidazole library.

Sanofi’s MET inhibitor program started with a kinase-focused screen, resulting in identification of the left benzimidazole. A follow-up screen of benzimidazoles resulted in the identification of the center compound. Optimization of this second lead led to the identification of SAR125844.

DNA-Encoded Libraries (1%)

Surprisingly, only one candidate from the group of 66, a RIPK1 inhibitor from GSK (Harris, P. A. et al. J. Med. Chem. 2017, 60, 1247-1261.), originated from DNA-encoded library (DEL) screening. (Goodnow, R. A. Jr., Dumelin, C. E., Keefe, A. D. Nat. Rev. Drug Discov. 2017, 16, 131-147.)

My impression was that DEL had seen more recent success, but we may just need more time to see today’s hits manifest in clinical candidates, especially since DEL is often employed for newer, less biologically validated targets.

The GSK team first approached the RIP1 kinase target using a kinase-focused library, which identified a number of type II (DFG-out) inhibitors. The optimization of these type II leads was challenging due to high lipophilicities, poor solubility, and a range of off-target activities. A separate random screen of the GSK compound collection identified GSK’963, a small, highly potent compound which was limited by poor oral exposure in rodents. The last approach GSK took was to screen RIPK1 against its in-house DNA-encoded libraries, which they had previously used with success on their soluble epoxide hydrolase program. (Belyanskaya, S. L. et al. ChemBioChem, 2017, 18, 837-842.) The DEL campaign identified a remarkably potent and selective benzoxazepinone (GSK’481) out of a library containing approx. 7.7 billion chemical warheads. (Harris, P. A. et al. J. Med. Chem. 2016, 59, 2163-2178.)

Random screening and directed screening failed to identify starting points of the same quality as that identified in a massive DNA-encoded library.  A small modification of the screening hit led to the RIPK1 clinical candidate.

During the optimization campaign, the GSK team hypothesized that GSK’481 bound to the hinge region based on a homology model.  Later on, they were surprised to find that the compound actually binds in a Type III-like manner, making no interaction with the hinge at all!  Remarkably, optimized compound GSK2982772 has >10,000-fold selectivity against all 339 kinases tested.

(Left) GSK2982772 binds to the kinase ATP-binding site in a Type III-like manner, occupying an allosteric pocket while making no interaction with the hinge region of RIPK1. (Right) GSK2982772 is extremely selective for RIPK1, with <50% inhibition of any other kinase tested at 10 µM inhibitor. Images reproduced from Harris, P. A. et al. J. Med. Chem. 2017, 60, 1247-1261.

Structure-Based Drug Design (14%)

While structure-based drug design is commonly used during hit-to-lead and lead optimization, in this context it refers to the generation of the initial hits, either through virtual screening of chemical matter against a protein structure or through analysis of structural data leading to de novo rational design of a new motif.

A great example from this 2016-2017 cohort comes from the identification of the spiro-oxindole scaffold for MDM2-p53 inhibitors more than a decade ago (Ding, K. et al. J. Am. Chem. Soc. 2005, 127, 10130-10131.), which led to the development of Ascentage’s APG-115 and aided discovery of Roche’s 3rd-generation MDM2/p53 compound, idasanutlin. (Ding, Q. et al. J. Med. Chem. 2013, 56, 5979-5983.)

A crystal structure of MDM2 complexed to p53 served as the starting point for the design effort. Trp23 in p53 is a residue which is buried deep in the MDM2/p53 protein-protein interface. Because the Trp23 side chain is an indole, a substructure search for indoles and oxindoles within natural products was performed, which yielded several compounds such as alstonisine that contain a spiro-oxindole motif. Modeling studies showed that these natural products fit poorly within the MDM2 cleft, but the core spiro-oxindole motif appeared to be a good template for elaboration. Addition of appropriate groups to this core structure based on the structure of the MDM2/p53 interface near Trp23 led to lead compounds with double-digit nM inhibitory activity.

Example of a de novo structure-based drug design process leading to spiro-oxindole MDM2/p53 inhibitors.

Fragment-Based Lead Generation (6%)

Fragment-based discovery generally involves screening small sets (~thousands) of much smaller (<200 Da) molecules, which typically bind with low (mM) affinity. (Erlanson, D. A. et al. Nat. Rev. Drug Discov. 2016, 15, 605-619.)

These weakly binding starting points require more sensitive biophysical assays such as surface plasmon resonance (SPR), NMR for small (<30 kDa) isotopically-labeled proteins, X-ray crystallography, or thermal shift assays. The four remarkable fragment-to-candidate pairs below [(a) Pfizer: Lee, K. L. et al. J. Med. Chem. 2017, 60, 5521-5542. (b) Constellation: Albrecht, B. K. et al. J. Med. Chem. 2016, 59, 1330-1339. (c) AbbVie: Wang, L. et al. J. Med. Chem. 2017, 60, 3828-3850 and McDaniel, K. F. et al. J. Med. Chem. 2017, 60, 8369-8384. (d) Merck: Scott, J. D. et al. J. Med. Chem. 2016, 59, 10435-10450.] illustrate how complex clinical candidates can emerge from very small starting points. In some cases, the final candidates are more than 100,000 times as potent as the fragment hit!

Examples of hit-to-clinical pairs from fragment-based drug discovery programs published in 2016-2017. Each program was able to take very small, weakly binding fragments and generate a highly potent candidate molecule with excellent drug properties.
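To put a 100,000-fold potency gain in perspective, here is a minimal sketch of the arithmetic. The two affinity values are hypothetical round numbers (a 1 mM fragment and a 10 nM candidate), not data from any of the programs above, and the free-energy conversion uses the standard relation ΔΔG = RT ln(fold) at an assumed 25 °C.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal / (mol * K)


def fold_improvement(hit_affinity_molar, candidate_affinity_molar):
    """How many times more potent the candidate is than the hit."""
    return hit_affinity_molar / candidate_affinity_molar


def binding_energy_gain_kcal(fold, temperature_k=298.15):
    """Binding free energy gained for a given fold improvement in affinity."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(fold)


# Hypothetical, illustrative values only.
fragment_hit = 1e-3   # 1 mM, typical order of magnitude for a fragment hit
candidate = 1e-8      # 10 nM

fold = fold_improvement(fragment_hit, candidate)
log_units = math.log10(fold)           # orders of magnitude gained
energy = binding_energy_gain_kcal(fold)

print(f"{fold:,.0f}-fold, {log_units:.1f} log units, ~{energy:.1f} kcal/mol")
```

Under these assumptions, going from millimolar to 10 nM is five orders of magnitude, or roughly 6.8 kcal/mol of additional binding free energy that has to be found during optimization.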

Known Starting Points (42%)

Finally, just because a starting point is already known doesn’t mean the overall program is easier than a program that starts with a screen. Many of the programs with known starting points are back-up programs to address issues that were near-impossible to solve the first time around.  

Some great examples of candidates diverging from their known starting points are Pfizer’s non-systemic pan-JAK inhibitor, PF-06263276, (Jones, P. et al. J. Med. Chem. 2017, 60, 767-786.) and BMS’s reversible BTK inhibitor, BMS-986142, below. [(a) Watterson, S. H. et al. J. Med. Chem. 2016, 59, 9173-9200. (b) Liu, Q. et al. Bioorg. Med. Chem. Lett. 2015, 25, 4265-4269.] Sure, tofacitinib was a known starting point, but it’s hard to see how this helped just by glancing at PF-06263276. Both programs were guided in part by structure-based drug design, a testament to how advances in computational chemistry and protein sciences have made dramatic scaffold leaps more tractable.

Two hit-to-clinical pairs in which the final compound bears very little resemblance to the initial known chemical matter.

The difference between “known-to-candidate” and “screen-to-candidate” pairs was assessed quantitatively by the authors. They found that the Tanimoto maximum common substructure (MCSS) value, a measure of structural similarity between pairs of molecules, was generally low for the 66 hit-to-candidate pairs, regardless of whether the program initiated with a known compound or a screening hit.

Candidates which emerged from known starting points and candidates which emerged from starting points identified in a random screen had similar levels of “dissimilarity” between the starting points and their candidates.
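For readers unfamiliar with the metric, here is a minimal sketch of one common way to define a Tanimoto score on a maximum common substructure: the number of atoms shared in the MCSS divided by the number of atoms in the union of the two molecules. This is an assumption about the general form of the metric; the review's exact definition (for example, whether atoms or bonds are counted) may differ, and the atom counts below are hypothetical.

```python
def tanimoto_mcss(n_atoms_a, n_atoms_b, n_atoms_common):
    """Tanimoto similarity from heavy-atom counts.

    n_atoms_a, n_atoms_b: heavy atoms in each molecule.
    n_atoms_common: heavy atoms in their maximum common substructure.
    Returns a value between 0 (nothing shared) and 1 (identical).
    """
    if n_atoms_common < 0 or n_atoms_common > min(n_atoms_a, n_atoms_b):
        raise ValueError("common substructure cannot exceed either molecule")
    union = n_atoms_a + n_atoms_b - n_atoms_common
    if union == 0:
        raise ValueError("both molecules are empty")
    return n_atoms_common / union


# Hypothetical example: a 12-heavy-atom hit, a 35-heavy-atom candidate,
# and 10 heavy atoms of the hit preserved in the candidate.
score = tanimoto_mcss(12, 35, 10)

# Two identical molecules score exactly 1.
identical = tanimoto_mcss(20, 20, 20)

print(f"hit vs. candidate: {score:.2f}")
print(f"identical molecules: {identical:.2f}")
```

Note that even when most of a small hit survives into a much larger candidate, the score stays low because the union grows with everything added during optimization, which is one reason hit-to-candidate pairs tend to look dissimilar by this measure.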


It’s impressive to see how many powerful options there are today for generating starting points, and how good we have gotten at building on starting points regardless of whether they are known compounds or millimolar fragment hits. It’s the collective accomplishment of many fields, including chemistry, biology, physics, and computer science, and hundreds of organizations. I can’t wait to see what’s new when this type of review is written again in twenty years. A big thanks again to the AstraZeneca group for assembling this useful compilation.
