Tencent’s tech team has optimized DeepSeek’s open-source DeepEP communication framework, boosting its performance across different network environments, according to the Chinese AI startup. Testing showed a 100% improvement on RoCE networks and a 30% gain on InfiniBand (IB), offering more efficient solutions for AI model training. On GitHub, DeepSeek acknowledged that the Chinese tech giant’s contribution had led to a “huge speedup.” DeepEP is a communication library tailored for mixture-of-experts (MoE) models and expert parallelism (EP), supporting high-throughput, low-latency GPU kernels and low-precision computing, including FP8. Tencent’s Starlink Networking team identified two main bottlenecks: underutilized dual-port NIC bandwidth and CPU control latency. Targeted optimizations addressing both produced the RoCE and IB gains. The enhanced framework is now fully open-source and has been deployed in training Tencent’s Hunyuan large model, demonstrating strong versatility within environments built on Tencent’s Starlink and H20 servers, Chinese tech media outlet iThome reported. [iThome, in Chinese]