I am an undergraduate student (Class of 2022) at the School of Computer Science and Technology, Taiyuan University of Technology (TYUT), supervised by Associate Professor Yongfei Wu. I am currently a member of the Intelligent Medicine and Biometric Research Laboratory (IMBR) at TYUT. My research primarily focuses on medical image analysis, including weakly supervised classification and segmentation of pathological and endoscopic images.

Beyond my core research in medical imaging, I am also interested in large language models and their interdisciplinary applications, multimodal learning, vision foundation models, and weakly supervised learning. I am eager to explore new ideas and quick to learn, and I look forward to growing with my future advisor through collaborative research and continuous exploration.

If you are interested in my research or in potential collaborations, feel free to contact me via the links below!

[Email]/[Github]

🔥 News

📝 Publications

BSPC
DSAGL Figure

DSAGL: Dual-Stream Attention-Guided Learning for Weakly Supervised Whole Slide Image Classification
Biomedical Signal Processing and Control (SCI Q2), Under Review

Daoxi Cao, Hangbei Cheng, Yijin Li, Ruolin Zhou, Xuehan Zhang, Xinyi Li, Binwei Li, Xuancheng Gu, Jianan Zhang, Xueyu Liu, Yongfei Wu


Highlights

  • We propose DSAGL, a novel weakly supervised classification framework that integrates a dual‑stream structure and a teacher–student mechanism to jointly enhance instance‑level and bag‑level performance.
  • An alternating training strategy is introduced to improve semantic consistency and enable effective collaboration between the teacher and student branches.
  • We design a lightweight encoder (VSSMamba) and a scale‑aware attention module (FASA) to balance efficient long‑range modeling with a focus on diagnostically critical regions.
  • DSAGL consistently outperforms representative MIL‑based methods on both synthetic and real‑world pathological datasets at the instance and bag levels.

JVCI
FALMIL Figure

DGMCN: Depth-Guided Multi-modal Collaboration Network for Robust Polyp Segmentation in Endoscopic Images
Journal of Visual Communication and Image Representation (CCF-C), With Editor

Xuehan Zhang, Hangbei Cheng, Tengfei Xu, Xinyi Li, Daoxi Cao, Xiaorong Dong, Xueyu Liu, Yongfei Wu


Highlights

  • A Depth-Guided Multi-modal Collaborative segmentation Network (DGMCN) is proposed for complex endoscopic scenarios. This approach pioneers the integration of monocular depth estimation with an encoder-decoder architecture, introducing a structural modality to compensate for the limitations of RGB images in boundary identification while explicitly modeling the three-dimensional deformation characteristics of mucosal surfaces.
  • A cross-modal feature fusion module with global-local collaborative pathways and a multi-scale pyramid module are designed, enabling joint modeling of spatial structure and textural appearance features.
  • State-of-the-art performance is achieved on three public polyp segmentation datasets, with significantly improved segmentation stability and generalization in scenarios involving complex deformations, blurred boundaries, and low-contrast conditions.

🔬 Research Projects

  • Multidimensional OCT-Based Macular Lesion Recognition and Prediction System
    Aug 2024 – Jun 2025
    Core Contributor, Shanxi Provincial Innovation and Entrepreneurship Training
    Applied the Mamba model to multidimensional OCT images for automatic detection and segmentation of macular lesions in elderly patients.

  • Dual-Stream Attention-Guided Learning for Weakly Supervised Whole-Slide Image Classification
    Mar 2025 – Jun 2025
    Core Contributor
    Proposed a dual-stream teacher–student architecture for weakly supervised WSI classification.

  • Depth-Guided Multi-modal Collaboration Network for Robust Polyp Segmentation in Endoscopic Images
    Jan 2025 – Apr 2025
    Core Contributor
    Built an encoder–decoder framework with depth guidance to overcome mucosal deformation challenges.

  • Deep Learning–Based Semantic Communication System for Image Transmission
    Aug 2024 – Mar 2025
    Core Contributor, Taiyuan University of Technology Innovation Training
    Designed a Transformer-based semantic communication pipeline for image compression and transmission.

🎖 Honors and Awards

  • 2025.02 The 16th Lanqiao Cup National Software and IT Professional Talent Competition – Special Track — National Level – First Prize
  • 2023.06 The 25th National College English Competition — National Level – Third Prize
  • 2023.03 The 32nd National Undergraduate Mathematical Modeling Contest — Provincial Level – Second Prize
  • 2025.06 The 16th Lanqiao Cup National Software and IT Professional Talent Competition – Design Track — Provincial Level – Second Prize
  • 2023.12 The 15th National College Mathematical Competition — Provincial Level – Third Prize
  • 2024.04 The 15th Lanqiao Cup National Software and IT Professional Talent Competition – Programming Track — Provincial Level – Third Prize
  • 2024.06 Shanxi Construction Investment Education Award — Outstanding Student Award
  • 2023.06 Taiyuan University of Technology “Qing’ou Award” — Outstanding Talent Award (Top 2%)
  • School-level “Academic Research Outstanding Individual” Certificate — Once
  • School-level “Academic Excellence Outstanding Individual” Certificate — Three times
  • School-level “Second-Class Outstanding Student Scholarship” — Once
  • School-level “Third-Class Outstanding Student Scholarship” — Three times

💻 Patents and Software Works

  • Design Patent: Intelligent Diagnostic Robot Based on Multidimensional OCT Images
    Inventor: First Inventor
    Granted on: June 24, 2025
    Patent No.: ZL 2024 3 0682609.6

  • Utility Patent (Pending): Lesion Identification Device
    Co-inventor: Third Inventor
    Filed on: March 2025

  • Software Copyright (Pending): Elderly Macular Lesion Recognition and Prediction System Based on Multidimensional OCT
    Co-author: Third Author
    Filed on: June 2025

  • Utility Patent (Pending): Multifocal Frequency-Domain OCT Adaptive Focusing Device
    Co-inventor: Third Inventor
    Filed on: October 2024

📖 Education

  • 2019.06 - present, B.Eng. in Computer Science and Technology, College of Computer Science and Technology (College of Big Data), Taiyuan University of Technology, China

🧠 Skills

  • Proficient in Python, familiar with the PyTorch framework
  • Experienced in Linux server operation
  • Skilled in using programming tools such as PyCharm and VS Code
  • Familiar with AI tools including ChatGPT, DeepSeek, Cursor, and v0
  • Proficient in Word, PowerPoint, Visio, Excel, and basic video editing software
  • Strong learning ability, good teamwork and communication skills

🧩 Personal Interests

  • Exploring and experimenting with AI Agents
  • Video creation and editing
  • Algorithm design and research
  • Playing the guitar
  • Singing
  • Table tennis
  • Board games