I am an undergraduate student (Class of 2022) at the School of Computer Science and Technology, Taiyuan University of Technology (TYUT), supervised by Associate Professor Yongfei Wu. I am currently a member of the Intelligent Medicine and Biometric Research Laboratory (IMBR) at TYUT. My research primarily focuses on medical image analysis, including weakly supervised classification and segmentation of pathological and endoscopic images.

In addition to my core research in medical imaging, I am deeply interested in emerging areas such as large language models and their interdisciplinary applications, multimodal learning, vision foundation models, and weakly supervised learning. I am always eager to explore new ideas, with a strong willingness and ability to learn, and I look forward to growing together with my future advisor through collaborative research and continuous exploration.

If you are interested in my research or potential collaborations, feel free to contact me via the provided details!

🔥 News

2024.08: 🎉🎉 The Undergraduate Innovation and Entrepreneurship Training Program project in which I played a key role was recognized as a provincial-level project.
2024.02: I joined the Intelligent Medicine and Biometric Research Laboratory (IMBR) as an undergraduate member.
2023.12: Competitively selected for the Excellent Engineer Education and Training Program, joining the dedicated Excellent Engineer Class designed for top-performing students.
2023.06: 🎉🎉 I was honored with the Taiyuan University of Technology “Qing’ou Award” (Outstanding Talent Award, Top 2%).

📝 Publications

BSPC

DSAGL: Dual-Stream Attention-Guided Learning for Weakly Supervised Whole Slide Image Classification
Biomedical Signal Processing and Control (SCI Q2), Under Review

Daoxi Cao, Hangbei Cheng, Yijin Li, Ruolin Zhou, Xuehan Zhang, Xinyi Li, Binwei Li, Xuancheng Gu, Jianan Zhang, Xueyu Liu, Yongfei Wu

[Paper] [Code]

Highlights

We propose DSAGL, a novel weakly supervised classification framework that integrates a dual‑stream structure and a teacher–student mechanism to jointly enhance instance‑level and bag‑level performance.
An alternating training strategy is introduced to improve semantic consistency and enable effective collaboration between the teacher and student branches.
We design a lightweight encoder (VSSMamba) and a scale‑aware attention module (FASA) to balance efficient long‑range modeling and focus on diagnostically critical regions.
DSAGL consistently outperforms representative MIL‑based methods on both synthetic and real‑world pathological datasets at the instance and bag levels.

JVCI

DGMCN: Depth-Guided Multi-modal Collaboration Network for Robust Polyp Segmentation in Endoscopic Images
Journal of Visual Communication and Image Representation (CCF-C), With Editor

Xuehan Zhang, Hangbei Cheng, Tengfei Xu, Xinyi Li, Daoxi Cao, Xiaorong Dong, Xueyu Liu, Yongfei Wu

[Paper] [Code]

Highlights

A Depth-Guided Multi-modal Collaborative segmentation Network (DGMCN) is proposed for complex endoscopic scenarios. This approach pioneers the integration of monocular depth estimation with an encoder-decoder architecture, introducing a structural modality to compensate for the inadequacies of RGB images in boundary identification while explicitly modeling the three-dimensional deformation characteristics of mucosal surfaces.
A cross-modal feature fusion module incorporating global-local collaborative pathways and a multi-scale pyramid module are designed, enabling joint modeling of spatial structures and textural appearance features.
State-of-the-art performance has been achieved on three public polyp segmentation datasets, significantly enhancing the segmentation stability and generalization capability in scenarios involving complex deformations, blurred boundaries, and low-contrast conditions.

🔬 Research Projects

Multidimensional OCT-Based Macular Lesion Recognition and Prediction System
Aug 2024 – Jun 2025
Core Contributor, Shanxi Provincial Innovation and Entrepreneurship Training
Applied the Mamba model to multidimensional OCT images for automatic detection and segmentation of elderly macular lesions.
Dual-Stream Attention-Guided Learning for Weakly Supervised Whole-Slide Image Classification
Mar 2025 – Jun 2025
Core Contributor
Proposed a dual-stream teacher–student architecture for weakly supervised WSI classification.
Depth-Guided Multi-modal Collaboration Network for Robust Polyp Segmentation in Endoscopic Images
Jan 2025 – Apr 2025
Core Contributor
Built an encoder–decoder framework with depth guidance to overcome mucosal deformation challenges.
Deep Learning–Based Semantic Communication System for Image Transmission
Aug 2024 – Mar 2025
Core Contributor, Taiyuan University of Technology Innovation Training
Designed a Transformer-based semantic communication pipeline for image compression and transmission.

🎖 Honors and Awards

2025.02 The 16th Lanqiao Cup National Software and IT Professional Talent Competition – Special Track — National Level – First Prize
2023.06 The 25th National College English Competition — National Level – Third Prize
2023.03 The 32nd National Undergraduate Mathematical Modeling Contest — Provincial Level – Second Prize
2025.06 The 16th Lanqiao Cup National Software and IT Professional Talent Competition – Design Track — Provincial Level – Second Prize
2023.12 The 15th National College Mathematical Competition — Provincial Level – Third Prize
2024.04 The 15th Lanqiao Cup National Software and IT Professional Talent Competition – Programming Track — Provincial Level – Third Prize
2024.06 Shanxi Construction Investment Education Award — Outstanding Student Award
2023.06 Taiyuan University of Technology “Qing’ou Award” — Outstanding Talent Award (Top 2%)
School-level “Academic Research Outstanding Individual” Certificate — Once
School-level “Academic Excellence Outstanding Individual” Certificate — Three times
School-level “Second-Class Outstanding Student Scholarship” — Once
School-level “Third-Class Outstanding Student Scholarship” — Three times

💻 Patents and Software Works

Design Patent: Intelligent Diagnostic Robot Based on Multidimensional OCT Images
Inventor: First Inventor
Granted on: June 24, 2025
Patent No.: ZL 2024 3 0682609.6
Utility Patent (Pending): Lesion Identification Device
Co-inventor: Third Inventor
Filed on: March 2025
Software Copyright (Pending): Elderly Macular Lesion Recognition and Prediction System Based on Multidimensional OCT
Co-author: Third Author
Filed on: June 2025
Utility Patent (Pending): Multifocal Frequency-Domain OCT Adaptive Focusing Device
Co-inventor: Third Inventor
Filed on: October 2024

📖 Education

2019.06 - present, B.Eng., Major in Computer Science and Technology, College of Computer Science and Technology (College of Big Data), Taiyuan University of Technology, China

🧠 Skills

Proficient in Python, familiar with the PyTorch framework
Experienced in Linux server operation
Skilled in using programming tools such as PyCharm and VS Code
Familiar with AI tools including ChatGPT, DeepSeek, Cursor, and v0
Proficient in Word, PowerPoint, Visio, Excel, and basic video editing software
Strong learning ability, good teamwork and communication skills

🧩 Personal Interests

Exploring and experimenting with AI Agents
Video creation and editing
Interested in algorithm design and research
Playing the guitar
Singing
Table tennis
Enjoying board games

Daoxi Cao (曹道熙)