Yue Yu

School of CSE, Georgia Institute of Technology

Taken in Anchorage, Alaska

Room E1317, CODA Building

756 W Peachtree St NW, Atlanta, GA 30308

Hello! I am a final-year Ph.D. student in the School of Computational Science and Engineering at the Georgia Institute of Technology. I work mainly at the intersection of Large Language Models and Data-centric AI.

Before joining Georgia Tech, I obtained my bachelor’s degree (with honors) from the Department of Electronic Engineering at Tsinghua University in 2019, where I worked on spatio-temporal data mining under the supervision of Dr. Yong Li.

Feel free to drop me an email (yueyu at gatech dot edu) if you have any questions about my research or would like to discuss NLP in general.


Education

Georgia Institute of Technology (2019 - Present)
Ph.D. in Computational Science and Engineering
GPA: 4.00/4.00
Thesis Topic: Towards Efficiently and Effectively Harnessing Large Pre-trained Models via Data-centric Lens.
Advisor: Prof. Chao Zhang

Tsinghua University (2015 - 2019)
B.Eng. in Electronic Engineering
GPA: 3.87/4.00 (Outstanding Graduate)
Research Focus: Spatio-temporal Data Mining [WWW 2019, UbiComp 2020], Recommender Systems [UbiComp 2019].
Advisor: Prof. Yong Li


Industrial Experience

Meta (May 2024 - Aug 2024)
Research Intern, GenAI (Llama Post-training Team)
Host: Rui Hou, Manager: Melanie Kambadur
Topic: Self-Critiquing Reward Models [Preprint].

NVIDIA (Jan 2024 - May 2024)
Research Intern, Applied Deep Learning Research Group
Host: Wei Ping, Manager: Mohammad Shoeybi
Topic: LLM Instruction Fine-tuning for Zero-shot Retrieval-Augmented Generation [NeurIPS 2024].

Google Research (May 2023 - Aug 2023)
Research Intern, News Understanding Group
Host: Jiaming Shen, Manager: Jialu Liu
Topic: LLM In-context Learning with Rationales [ACL 2024].

Microsoft Research (May 2021 - Aug 2021)
Research Intern, Productivity and Intelligence Group
Mentor: Chenyan Xiong, Manager: Arnold Overwijk
Topic: Zero-shot Dense Text Retrieval [EMNLP 2022].

IQVIA (May 2020 - Aug 2020)
Research Intern, Analytics Center of Excellence
Mentor: Cao (Danica) Xiao
Topic: Knowledge-enhanced Drug Interaction Prediction [Bioinformatics 2021].

News

Sep 25, 2024 Two papers are accepted to NeurIPS 2024 and three papers are accepted to EMNLP 2024. Congratulations to my collaborators!
May 16, 2024 6 papers are accepted to ACL 2024 (4 Main Conf, 2 Findings).
Oct 25, 2023 Honored to receive the NeurIPS 2023 Scholar Award!
Sep 22, 2023 3 papers are accepted to NeurIPS 2023. Thanks to my collaborators!
May 16, 2023 Check out the recent publications: 2 first-author papers are accepted to ACL 2023 (1 Main Conf, 1 Findings), and 3 co-authored papers are accepted to KDD 2023. Thanks and congratulations to my collaborators!

Selected Publications

  1. RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs
    Yue Yu, Wei Ping, Zihan Liu, Boxin Wang, Jiaxuan You, Chao Zhang, Mohammad Shoeybi, and Bryan Catanzaro
    Proceedings of NeurIPS, 2024.
  2. Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
    Yue Yu*, Yuchen Zhuang*, Jieyu Zhang*, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, and Chao Zhang
    Proceedings of NeurIPS (D&B Track), 2023.
  3. Cold-Start Data Selection for Better Few-shot Language Model Fine-tuning: A Prompt-based Uncertainty Propagation Approach
    Yue Yu, Rongzhi Zhang, Ran Xu, Jieyu Zhang, Jiaming Shen, and Chao Zhang
    Proceedings of ACL, 2023.
  4. COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning
    Yue Yu, Chenyan Xiong, Si Sun, Chao Zhang, and Arnold Overwijk
    Proceedings of EMNLP, 2022. (Oral)
  5. AcTune: Uncertainty-Based Active Self-Training for Active Fine-Tuning of Pretrained Language Models
    Yue Yu, Lingkai Kong, Jieyu Zhang, Rongzhi Zhang, and Chao Zhang
    Proceedings of NAACL, 2022. (Oral)