Yue Yu

School of CSE, Georgia Institute of Technology

Taken in Anchorage, Alaska

Room E1317, CODA Building

756 W Peachtree St NW, Atlanta, GA 30308

Hello World! I am a fifth-year PhD student at the School of Computational Science and Engineering, Georgia Institute of Technology. I mainly work at the intersection of Large Language Models and Data-centric AI.

Before joining Georgia Tech, I obtained my bachelor’s degree (with honors) from the Department of Electronic Engineering, Tsinghua University in 2019, where I also worked on spatio-temporal data mining under the supervision of Prof. Yong Li.

Feel free to drop me an email (yueyu at gatech dot edu) if you have any questions about my research or would like to discuss NLP in general.

I am actively seeking industrial R&D opportunities beginning after Summer 2024, and I am happy to discuss potential openings!


Education

Georgia Institute of Technology (2019 - Present)
Ph.D. in Computational Science and Engineering
GPA: 4.00/4.00
Thesis Topic: Towards Efficiently and Effectively Harnessing Large Pre-trained Models via Data-centric Lens.
Advisor: Prof. Chao Zhang

Tsinghua University (2015 - 2019)
B.Eng. in Electronic Engineering
GPA: 3.87/4.00 (Outstanding Graduate)
Research Focus: Spatio-temporal Data Mining [WWW 2019, UbiComp 2020], Recommender Systems [UbiComp 2019].
Advisor: Prof. Yong Li


Industrial Experience

NVIDIA (Jan 2024 - May 2024)
Research Intern, Applied Deep Learning Research Group
Host: Wei Ping, Manager: Mohammad Shoeybi

Google Research (May 2023 - Aug 2023)
Research Intern, News Understanding Group
Host: Jiaming Shen, Manager: Jialu Liu
Topic: Large Language Model In-context Learning.

Microsoft Research (May 2021 - Aug 2021)
Research Intern, Productivity and Intelligence Group
Mentor: Chenyan Xiong, Manager: Arnold Overwijk
Topic: Zero-shot Dense Text Retrieval [EMNLP 2022].

IQVIA (May 2020 - Aug 2020)
Research Intern, Analytics Center of Excellence
Mentor: Cao (Danica) Xiao
Topic: Knowledge-enhanced Drug Interaction Prediction [Bioinformatics 2021].

News

Oct 25, 2023 Honored to receive the NeurIPS 2023 Scholar award!
Sep 22, 2023 Three papers have been accepted to NeurIPS 2023. Thanks to my collaborators!
May 16, 2023 Check out the recent publications: 2 first-author papers have been accepted to ACL 2023 (1 Main Conf, 1 Findings), and 3 coauthored papers have been accepted to KDD 2023. Thanks and congratulations to my collaborators!
Apr 6, 2023 One short paper on weakly supervised scientific document classification has been accepted to SIGIR 2023.
Nov 28, 2022 Our paper Counterfactual and Factual Reasoning over Hypergraphs for Interpretable Clinical Predictions on EHR has been selected as a best paper (one of 2 in total) at Machine Learning for Health 2022.

Selected Publications

  1. Large Language Model as Attributed Training Data Generator: A Tale of Diversity and Bias
    Yue Yu*, Yuchen Zhuang*, Jieyu Zhang*, Yu Meng, Alexander Ratner, Ranjay Krishna, Jiaming Shen, and Chao Zhang
    Proceedings of NeurIPS (D&B Track), 2023.
  2. ToolQA: A Dataset for LLM Question Answering with External Tools
    Yuchen Zhuang*, Yue Yu*, Kuan Wang*, Haotian Sun, and Chao Zhang
    Proceedings of NeurIPS (D&B Track), 2023.
  3. Cold-Start Data Selection for Better Few-shot Language Model Fine-tuning: A Prompt-based Uncertainty Propagation Approach
    Yue Yu, Rongzhi Zhang, Ran Xu, Jieyu Zhang, Jiaming Shen, and Chao Zhang
    Proceedings of ACL, 2023.
  4. COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning
    Yue Yu, Chenyan Xiong, Si Sun, Chao Zhang, and Arnold Overwijk
    Proceedings of EMNLP, 2022. (Oral)
  5. AcTune: Uncertainty-Based Active Self-Training for Active Fine-Tuning of Pretrained Language Models
    Yue Yu, Lingkai Kong, Jieyu Zhang, Rongzhi Zhang, and Chao Zhang
    Proceedings of NAACL, 2022. (Oral)