Alphunt

Idea in Artificial Intelligence

Introduction

I am building an AI research scientist platform for therapeutics that unifies diverse, science-grounded data sources under one data model and exposes a tool set for retrieving and analyzing that data and running ML algorithms on it. An LLM tool-based agent system handles diverse user queries and autonomously develops and executes research plans. Our wedge product targets the peptide-based rational design of protein degraders (i.e., PROTACs) optimized against cancer drug resistance and for blood-brain barrier (BBB) permeability.


Problem

Global pharmaceutical and biotech companies, patient-centric organizations, and research institutions in the public and private sectors will pay for a turnkey machine learning system that generates novel therapies reliably, predictably, and instantaneously for the areas of greatest unmet medical need. We focus on peptide-based PROTAC therapeutics for cancer and CNS disorders, where acquired drug resistance and BBB permeability, respectively, present the greatest challenges.


Opportunity

We are developing an AI research scientist grounded in experimental data, with an API-first design that unifies diverse, science-grounded data sources under one data model. The system provides tools for retrieving and analyzing the data and for running machine learning algorithms on it, all exposed through a custom API built over TDC-2. With the TDC-2 framework as the backbone for data and ML tooling and the GPT API as the LLM engine, our cross-domain platform can handle a wide range of user queries and autonomously develop and execute research plans. The LLM agent framework also incorporates tools such as Stanford DSPy, LangChain, and Hugging Face, which makes it adaptable to any domain with rich, exposed datasets.
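To make the tool-based agent idea concrete, here is a minimal sketch of a tool registry and a plan executor in plain Python. It is an illustration under stated assumptions, not our implementation: the names `Tool`, `ToolRegistry`, `execute_plan`, `retrieve`, and `count` are hypothetical, the records are placeholder values, and the plan is written by hand where an LLM would normally produce it. No TDC-2, GPT, DSPy, or LangChain calls are used.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    """A named callable the agent may invoke (hypothetical structure)."""
    name: str
    description: str
    run: Callable[..., Any]


class ToolRegistry:
    """Holds the tools exposed to the agent and dispatches calls by name."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def describe(self) -> List[str]:
        # Text an LLM could be shown so it knows which tools exist.
        return [f"{t.name}: {t.description}" for t in self._tools.values()]

    def call(self, name: str, **kwargs: Any) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].run(**kwargs)


# Placeholder records standing in for a unified data source (not real data).
TOY_RECORDS = [
    {"id": "example_a", "score": 0.9},
    {"id": "example_b", "score": 0.4},
    {"id": "example_c", "score": 0.7},
]


def retrieve(min_score: float) -> List[dict]:
    """Hypothetical retrieval tool: filter records by a score threshold."""
    return [r for r in TOY_RECORDS if r["score"] >= min_score]


def count(records: List[dict]) -> int:
    """Hypothetical analysis tool: count the records it is given."""
    return len(records)


def execute_plan(registry: ToolRegistry, plan: List[dict]) -> Any:
    """Run plan steps in order; the string "$prev" is replaced by the
    previous step's result so that steps can be chained."""
    prev: Any = None
    for step in plan:
        args = {
            key: (prev if isinstance(value, str) and value == "$prev" else value)
            for key, value in step["args"].items()
        }
        prev = registry.call(step["tool"], **args)
    return prev


registry = ToolRegistry()
registry.register(Tool("retrieve", "Return records with score >= min_score", retrieve))
registry.register(Tool("count", "Count a list of records", count))

# A hand-written plan; in an agent setting the LLM would emit these steps.
plan = [
    {"tool": "retrieve", "args": {"min_score": 0.5}},
    {"tool": "count", "args": {"records": "$prev"}},
]
result = execute_plan(registry, plan)
```

Keeping tools behind a single `call` interface means new data sources or ML routines can be added by registering another `Tool`, without changing the executor.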

Alejandro Velez-Arce is the Chief Architect of TDC-2 and lead author of the TDC-2 paper.

In addition to the general research framework, we have built dedicated components for more in-depth research functionality, starting with our Target Discovery Engine.

Our target prioritization strategy uses geometric deep learning to generate protein embeddings from extensive single-cell RNA-seq and protein-protein interaction datasets (PINNACLE). We then predict the viability of these targets as candidates for clinical trials, tailored to specific cell and tissue contexts. To further refine our target list, we integrate a reverse translation framework, HINDsight (Heterotypic Interaction and Node Detection), which leverages disease-specific clinical data to predict treatment response from bulk RNA-seq and identifies key receptor-ligand interactions as target candidates. A critical element of this framework is a cell-cell interaction network in which different immune cell types are connected to one another and each edge is associated with patient outcomes. As a proof of concept, we found that CD47-SIRPα interactions between malignant B cells and macrophages are negatively correlated with treatment response in more than 1,700 lymphoma patients. CD47-SIRPα is a well-known "don't eat me" signal that allows cancer cells to evade phagocytosis by macrophages. Our research strategy not only capitalizes on large-scale single-cell datasets but also elucidates the mechanisms of action associated with target candidates. Ultimately, we believe that a data-driven approach grounded in human clinical data, combined with state-of-the-art AI and comprehensive single-cell datasets, will improve translatability and de-risk drug development.
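The idea of a cell-cell interaction network whose edges carry an outcome association can be sketched as follows. This is a simplified stand-in, not the HINDsight method: it assumes each edge already has a per-patient interaction score, uses a plain Pearson correlation against a binary response label as the association measure, and runs on synthetic placeholder values with generic names. The identifiers `Edge`, `pearson`, and `rank_edges` are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple

# An edge is (sender cell type, receiver cell type, ligand-receptor pair).
Edge = Tuple[str, str, str]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_edges(
    scores: Dict[Edge, List[float]], response: List[int]
) -> List[Tuple[Edge, float]]:
    """Attach an outcome correlation to every edge and sort ascending, so the
    edges most negatively associated with response come first."""
    outcome = [float(r) for r in response]
    ranked = [(edge, pearson(values, outcome)) for edge, values in scores.items()]
    return sorted(ranked, key=lambda item: item[1])


# Synthetic placeholder inputs (not patient data): 1 = responder, 0 = non-responder.
toy_response = [1, 1, 0, 0]
toy_scores: Dict[Edge, List[float]] = {
    ("type_A", "type_B", "ligand_1-receptor_1"): [0.1, 0.2, 0.8, 0.9],
    ("type_A", "type_C", "ligand_2-receptor_2"): [0.7, 0.9, 0.2, 0.1],
}

ranked = rank_edges(toy_scores, toy_response)
for edge, r in ranked:
    print(edge, round(r, 3))
```

In this toy setup, the edge whose score is higher in non-responders sorts first, which is the pattern a candidate immune-evasion interaction would be expected to show.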