Yale Research – Atharva Tyagi

Yale School of Medicine

Chen Lab (Mentor: Ardavan Abiri)

Overview

At the Yale School of Medicine, I worked in the Sidi Chen Lab under Ardavan Abiri (PhD student) on a computational immunology project called EnVax. The goal of EnVax was to model thymic negative selection to predict the antigenicity of proteins, allowing researchers to determine whether a given sequence may trigger an autoimmune response. The project aimed to create a machine learning system that could screen therapeutic proteins and vaccine candidates for autoimmunity risk before clinical testing. My work combined transcriptomic analysis, single-cell data integration, and computational modeling to construct a predictive map of self-tolerance in the human thymus.

Modeling Thymic Negative Selection to Predict Protein Antigenicity

Research Focus

Thymic negative selection is the process that eliminates developing T cells that react strongly to self-antigens. This mechanism helps prevent autoimmune disease by ensuring that strongly self-reactive T cells do not mature. The autoimmune regulator (AIRE) drives the expression of thousands of tissue-restricted antigens in medullary thymic epithelial cells, enabling thymocytes to encounter a broad set of self-antigens.

Our model used this biological foundation to build a computational equivalent. We integrated expression data from the human thymus to establish a database of self-proteins and developed a learning system that could compare new sequences to this reference set. The model was designed to predict whether a protein would be tolerated or eliminated based on its resemblance to self-antigens encountered during thymic selection.
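The idea of comparing a new sequence to a self-protein reference set can be illustrated with a very simple baseline. The sketch below is not the EnVax implementation; it assumes a plain k-mer overlap score, and the function names (kmers, build_self_index, self_similarity) and the default peptide length of 9 are hypothetical choices made for illustration only.

```python
from typing import Iterable, Set


def kmers(seq: str, k: int = 9) -> Set[str]:
    """Return the set of all overlapping k-length peptides in a sequence."""
    seq = seq.upper()
    # range() is empty when the sequence is shorter than k, giving an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_self_index(self_proteins: Iterable[str], k: int = 9) -> Set[str]:
    """Collect every k-mer found in the reference set of self-proteins."""
    index: Set[str] = set()
    for protein in self_proteins:
        index |= kmers(protein, k)
    return index


def self_similarity(query: str, self_index: Set[str], k: int = 9) -> float:
    """Fraction of the query's k-mers that also occur in the self index.

    Assumption: a higher fraction is read as 'more self-like'. This is a
    toy proxy for tolerance, not a validated predictor.
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        return 0.0
    return len(query_kmers & self_index) / len(query_kmers)


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    index = build_self_index(["ACDEFG"], k=3)
    print(self_similarity("ACDKLM", index, k=3))
```

A learned model would replace the exact-match overlap with a representation that tolerates near matches, but the input and output have the same shape: a query sequence goes in and a self-likeness score comes out.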

My Contributions

Collected and processed thymus gene expression data from CZ CELLxGENE Discover, GEO DataSets, and bulk RNA-sequencing repositories.
Used Seurat v5 to structure and visualize thymic expression data across major cell types.
Helped design the initial EnVax pipeline to predict antigenicity by comparing user-input protein sequences to thymus-expressed proteins.
Assisted in developing deep learning algorithms capable of identifying sequence motifs associated with immune tolerance or autoimmunity.
Proposed applications of the model in drug and vaccine development, focusing on improving the safety and reliability of biologic therapeutics.
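As a rough sketch of how a "thymus-expressed" reference set might be derived from processed expression data, the example below filters genes by their mean expression per cell type. It is an illustration under stated assumptions, not the pipeline we built: the function name, the input layout (gene mapped to cell type mapped to mean expression), the threshold values, and the toy numbers are all hypothetical.

```python
from typing import Dict, List


def thymus_expressed_genes(
    mean_expr: Dict[str, Dict[str, float]],
    min_expr: float = 1.0,
    min_cell_types: int = 1,
) -> List[str]:
    """Select genes whose mean expression reaches min_expr in at least
    min_cell_types thymic cell types.

    mean_expr maps gene symbol -> {cell type -> mean expression}.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    selected = []
    for gene, by_cell_type in mean_expr.items():
        n_above = sum(1 for value in by_cell_type.values() if value >= min_expr)
        if n_above >= min_cell_types:
            selected.append(gene)
    return sorted(selected)


if __name__ == "__main__":
    # Toy values for illustration only; not measured data.
    toy = {
        "INS": {"mTEC": 2.5, "cTEC": 0.0},
        "AIRE": {"mTEC": 5.0, "cTEC": 0.2},
        "GENE_X": {"mTEC": 0.1, "cTEC": 0.3},
    }
    print(thymus_expressed_genes(toy))
```

The proteins encoded by the selected genes would then form the reference set that user-input sequences are compared against.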

Outcomes and Impact

The EnVax project created one of the first harmonized thymic expression datasets suitable for large-scale antigenicity modeling. It provided a foundation for predicting immune tolerance through computational methods and contributed to the Chen Lab’s broader research in computational vaccine design and autoimmunity modeling.

My work advanced the data-processing and machine learning components of EnVax, strengthening its predictive accuracy and translational relevance.

The project was later discontinued following disruptions in funding, but the methodologies and datasets we developed can still serve as a foundation for future efforts in computational immunology.

Reflection

Working at Yale showed me that computation is not separate from biology but an instrument to understand it more deeply. Every dataset represented a living process, and every model was a hypothesis about how the immune system distinguishes self from non-self. Collaborating with researchers who operated at the intersection of data science and immunology taught me how rigorous analysis can reveal principles that guide therapy and prevention. This experience strengthened my goal to merge biology, computation, and translational thinking to design safer, smarter, and more precise approaches to medicine.