nssmd

zimo nssmd

nssmd/README.md

Hi, I'm Zimo Wen

I am an undergraduate researcher in Computer Science at Shanghai Jiao Tong University, where I am part of the Zhiyuan Honors Program and the MVIG Lab.

I work on:

embodied intelligence
multimodal models
VLA systems
multimodal evaluation and research tooling

Current focus

Building embodied systems with wearable gripper fingertips, force-aware sensing, teleoperation, and VLA-style training.
Working on multimodal generation, unified understanding, and world-model-adjacent systems.
Building benchmarks and evaluation pipelines for multimodal models.

Selected work

Selected papers

UniG2U: Benchmarking When Generation Helps Understanding in Multimodal Unified Models
DANet: A RAG-inspired Dual Attention Model for Few-shot Time Series Prediction
Tri-MARF: A Tri-Modal Multi-Agent Responsive Framework for Comprehensive 3D Object Annotation
VL-R1-X: Incentivizing Diverse Multimodal Reasoning via Cross-modality Guidance
StepRouter: From Effort Priors to Utility Posteriors

Links

GitHub: @nssmd
Email: 2581235653@sjtu.edu.cn
Project page: UniG2U

Pinned

EvolvingLMMs-Lab/lmms-engine (Public)

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Python · 774 stars · 35 forks

Physical-Intelligence/openpi (Public)

Python · 11.8k stars · 1.9k forks

EvolvingLMMs-Lab/lmms-eval (Public)

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

Python · 4.1k stars · 584 forks

AlenjandroWang/ASVR (Public)

Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better

Python · 190 stars · 18 forks

waltstephen/ArgusBot (Public)

ArgusBot: A 24/7 supervisor Agent for Codex CLI and Claude Code CLI that keeps agents running, reviewing, and planning until the job is actually done.

Python · 301 stars · 28 forks