Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd., a DeepSeek affiliate, today filed a patent for a new web data collection system designed to improve efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on websites. It assesses downloaded content to predict the quality of undiscovered links, prioritizing high-value data and reducing redundant downloads. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems like ChatGPT. Existing techniques struggle with incomplete link retrieval, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek’s proposed system aims to solve these issues by optimizing data allocation and maintaining metadata accuracy. [iThome, in Chinese]
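The general idea of ranking undiscovered links by predicted quality can be illustrated with a small priority queue. The sketch below is not taken from the patent: the `Frontier` class, the `predict_link_score` heuristic, and the example URLs are all hypothetical, chosen only to show how a crawler might download the most promising links first and skip duplicates.

```python
import heapq
from urllib.parse import urldefrag


class Frontier:
    """Hypothetical crawl frontier: undiscovered links ordered by predicted quality."""

    def __init__(self):
        self._heap = []     # entries are (-score, insertion_order, url)
        self._seen = set()  # URLs already queued, so each is fetched at most once
        self._count = 0     # tie-breaker that keeps insertion order for equal scores

    def add(self, url, score):
        """Queue a link; return False if it was already queued."""
        url, _fragment = urldefrag(url)  # drop "#fragment" so duplicates collapse
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so negate the score to pop the best link first
        heapq.heappush(self._heap, (-score, self._count, url))
        self._count += 1
        return True

    def pop(self):
        """Return the (url, score) pair with the highest predicted score."""
        neg_score, _order, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def predict_link_score(parent_quality, anchor_text):
    """Assumed heuristic, for illustration only: a link inherits the quality
    score of the page it was found on, halved when its anchor text is empty."""
    return parent_quality * (1.0 if anchor_text.strip() else 0.5)


frontier = Frontier()
frontier.add("https://example.com/a", predict_link_score(0.9, "Research notes"))
frontier.add("https://example.com/b", predict_link_score(0.4, ""))
frontier.add("https://example.com/a#top", 0.99)  # same page as /a, so it is ignored
best_url, best_score = frontier.pop()            # the link predicted to be most valuable
```

A real system would presumably replace the heuristic with a model trained on downloaded content and add per-site rate limits to keep traffic impact low; neither is shown here.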