DeepSeek-affiliated Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd. today filed a patent for a new web data collection system designed to improve efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on websites. The system assesses downloaded content to predict the quality of undiscovered links, so that high-value data is prioritized and redundant downloads are reduced. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems such as ChatGPT. Existing techniques struggle with incomplete link retrieval, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek’s proposed system aims to address these issues by optimizing data allocation and keeping metadata accurate. [iThome, in Chinese]
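The filing is described here only at a high level. As a rough illustration of the general idea (scoring a downloaded page and letting that score set the download priority of the links found on it), below is a minimal Python sketch of a best-first crawl frontier. It is not DeepSeek's patented method: the names `PriorityFrontier` and `enqueue_outlinks` are hypothetical, the quality scores are supplied by the caller instead of by a real model, and the per-host download cap is a crude stand-in for limiting the traffic a crawler sends to one website.

```python
from __future__ import annotations

import heapq
from urllib.parse import urlparse


class PriorityFrontier:
    """Best-first crawl frontier (hypothetical sketch, not the patented design).

    Links are popped in order of predicted quality, each URL is queued at most
    once, and each host gets a fixed download budget (an illustrative assumption).
    """

    def __init__(self, max_per_host: int = 100) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._seen: set[str] = set()              # skip URLs already queued (no redundant downloads)
        self._host_counts: dict[str, int] = {}    # downloads handed out per host
        self._max_per_host = max_per_host
        self._counter = 0                         # tie-breaker: equal scores pop in insertion order

    def add(self, url: str, predicted_quality: float) -> None:
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so negate the score to pop the best link first.
        heapq.heappush(self._heap, (-predicted_quality, self._counter, url))
        self._counter += 1

    def pop(self) -> str | None:
        """Return the highest-scoring URL whose host still has budget, or None."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            host = urlparse(url).netloc
            if self._host_counts.get(host, 0) >= self._max_per_host:
                continue  # this site's budget is used up; drop the link
            self._host_counts[host] = self._host_counts.get(host, 0) + 1
            return url
        return None


def enqueue_outlinks(frontier: PriorityFrontier, page_quality: float, outlinks: list[str]) -> None:
    """Assumed rule: each undiscovered link inherits the quality score of the page linking to it."""
    for link in outlinks:
        frontier.add(link, page_quality)


if __name__ == "__main__":
    frontier = PriorityFrontier(max_per_host=2)
    frontier.add("https://a.example/", 0.5)
    frontier.add("https://b.example/", 0.2)
    first = frontier.pop()
    # Pretend a quality model rated the downloaded page 0.9; its links inherit that score.
    enqueue_outlinks(frontier, 0.9, ["https://a.example/x", "https://a.example/y", "https://a.example/"])
    rest = [frontier.pop() for _ in range(3)]
    print(first, rest)
    # -> https://a.example/ ['https://a.example/x', 'https://b.example/', None]
```

In the demo, the repeated link to `https://a.example/` is ignored because it was already queued, and `https://a.example/y` is dropped once `a.example` reaches its budget of two downloads, which is why the lower-scored `https://b.example/` is fetched next. A production crawler would replace the inherited score with a learned predictor and the hard cap with politeness delays and robots.txt handling.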