About me

I’m a 4th-year PhD student in the Computer Science department at UC Riverside, where I’m fortunate to be advised by Prof. Nael Abu-Ghazaleh and Prof. Yue Dong.

My research focuses on the intersection of Generative AI and trustworthiness, particularly Large Language and Multimodal Language Models (LLMs/MLLMs) and AI Agents such as Computer-Use Agents (CUAs), with an emphasis on Alignment, Robustness, Safety, Ethics, Fairness, Bias, and Security/Privacy. Additionally, I’m deeply interested in advancing Multimodal Understanding, Reasoning, and Retrieval, as well as Expert Specialization, Personalization, and Multilingual MLLMs. My work also explores novel Evaluation methods, Reward Modeling, and Post-Training Algorithms (e.g., Machine Unlearning and Reinforcement Learning-based methods) for adaptive, steerable, and contextually aligned AI agents. I have also been working on integrating AR/VR and Mixed Reality (MR) with AI Agents.

I’ll be direct: the Dark Side of AI 😈 has always captivated me. I enjoy probing models from an adversarial perspective (Dopamine Rush 🌊🧨), because exposing alignment gaps is the fastest path to building safer, more robust systems.

In Summer 2025, I rejoined Microsoft as a Research Intern with AI Frontiers and the AI Red Team, working with Besmira Nushi, Roman Lutz, and Vibhav Vineet on the Safety and Trustworthiness of Computer-Use Agents (CUAs), introducing the “Blind Goal-Directedness” phenomenon.

In Summer 2024, I also had an incredible experience as a Research Intern at Microsoft Research, working with Javier Hernandez and Jina Suh. There, I had to pause my adversarial mindset and be the “good guy” 😈→😇, developing evaluation methods to measure empathy and user satisfaction in LLM chatbots, and training context-specific expert adapters to dynamically steer empathy based on user needs.

News ⬇️

  • Oct 2025: I will serve as a reviewer for ICLR 2026.
  • Sep 2025: Wrapped up my Microsoft internship with "Just Do It!? Computer-Use Agents Exhibit Blind Goal-Directedness". Check it out 👀! We also made the top daily papers on Hugging Face 🤗 [ArXiv] [Top Daily Papers 🤗] [YouTube]
  • Summer 2025: I will return to Microsoft Research for my second internship! (Super excited 💥👨🏻‍💻!)
  • May 2025: 🎖🔥 Our paper "Layer-wise Alignment: Examining Safety Alignment Across Image Encoder Layers in Vision Language Models" was accepted for a Spotlight presentation (top 2.6% of 12,107 submissions) at ICML 2025! [OpenReview] [News]
  • Oct 2024: I will serve as a reviewer for ICLR 2025.
  • Sep 2024: Our paper (co-first authored) on "Textual Unlearning" to address "Cross-Modality Safety Alignment" was accepted at EMNLP 2024 Findings - see y'all in Florida 🐊🌊🌴 [Paper]
  • Sep 2024: I successfully concluded my internship at Microsoft Research; the best experience I could have imagined, and I'm thankful to my whole team! Stay tuned for the research paper and the models (cooking... 👨🏻‍🍳🍳🔥)
  • Sep 2024: My work was cited in the "International Scientific Report on the Safety of Advanced AI". [Report]
  • Aug 2024: 👨🏻‍🎓 We gave a 3-hour tutorial on "AI Safety and Adversarial Attacks" at ACL 2024. [Material] [Paper]
  • July 2024: I gave a talk on AI Safety and AR/VR Security, with implications for Human-Computer Interaction, at MSR. [Slides]
  • July 2024: I presented my work on "Unlearning" and "Cross-Modality Safety Alignment" at the McGill NLP group. [Site]
  • Summer 2024: I will be doing an internship at Microsoft Research! (Thrilled 💥👨🏻‍💻)
  • June 2024: I'm honored to serve as a reviewer for NextGenAISafety 2024 at ICML! [ICML2024]
  • June 2024: πŸ…πŸ† I won the "Outstanding Teaching Award" of the CS department of UCR! (Grateful πŸ€—) [Award]
  • Mar 2024: My work on Cross-Modal Vulnerability Alignment in Vision-Language Models was accepted for presentation at the SuperAGI Leap Summit 2024! [Video] [SuperAGI]
  • Mar 2024: Our paper "That Doesn't Go There: Attacks on Shared State in Multi-User Augmented Reality Applications" has been accepted to USENIX Security 2024! [paper]
  • Feb 2024: Gave a lightning talk on my AI Safety work at Cohere For AI! [Slides]
  • Jan 2024: 🎖🔥 Our paper "Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models" was accepted for a Spotlight presentation (top 5% of 7,262 submissions) at ICLR 2024! [OpenReview] [SlidesLive-Video] [YoutubeAInews]
  • Nov 2023: 🏆 Our paper "Jailbreak in Pieces: Compositional Adversarial Attacks on Multi-Modal Language Models" won the "Best Paper Award" at SoCal NLP 2023! [paper] [Award] [News1] [News2] [News3]
  • Sep 2023: Our paper "Vulnerabilities of Large Language Models to Adversarial Attacks" has been accepted as a tutorial at ACL 2024! [paper]
  • Jul 2023: Yay! My first first-author paper is out :D! "Plug and Pray: Exploiting off-the-shelf components of Multi-Modal Models" [paper]
  • Apr 2023: I will be serving as the moderator and evaluator of student presentations at UGRS 2023! [paper]

Education

Ph.D. in Computer Science at the University of California, Riverside (Sep 2022 - Present)

B.Sc. in Electrical Engineering at Sharif University of Technology (2017-2022)

Ranked 68th among 150,000 participants in Iran's Nationwide University Entrance Exam (Konkur)