DeepSeek has released a new paper, with co-founder Liang Wenfeng credited as a contributor, detailing how its latest large language model DeepSeek-V3 achieves efficient training and inference using only 2,048 H800 GPUs – significantly fewer than the tens of thousands typically required. The team attributes this efficiency to four key innovations: memory optimization through multi-head latent attention (MLA), computational savings via a Mixture-of-Experts (MoE) design with FP8 precision, communication improvements using a multi-plane network topology, and faster inference through multi-token prediction (MTP). With MLA, KV cache memory usage is cut to just 70KB per token, as little as one seventh of what competing models require. The MoE architecture activates only 37 billion of the model’s 671 billion parameters per forward pass, reducing training costs by roughly 90% compared to dense models of similar scale. FP8 training further halves compute and memory usage, with minimal accuracy tradeoff. Beyond the model, the paper also outlines five future directions for AI hardware design, advocating tighter integration between software and hardware to address memory, compute, and networking bottlenecks. [36Kr, in Chinese]
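The saving from sparse MoE activation can be illustrated with a toy top-k router: only the selected experts' weights take part in each token's forward pass, so the active parameter count is a small fraction of the total, much as 37 billion relates to 671 billion. The sketch below is a minimal illustration only, with made-up dimensions and a simplified ReLU feed-forward expert; it is not DeepSeek's implementation.

```python
# Toy Mixture-of-Experts routing sketch (illustrative; hypothetical dimensions).
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256        # toy model and feed-forward widths (assumed)
num_experts, top_k = 16, 2     # route each token to 2 of 16 experts

# Each expert is a small feed-forward block: W_in (d_model x d_ff), W_out (d_ff x d_model).
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(num_experts)
]
router = rng.standard_normal((d_model, num_experts)) * 0.02

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                          # indices of the k highest-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the chosen experts only
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0.0) @ w_out)         # weighted ReLU feed-forward expert
    return out

token = rng.standard_normal(d_model)
_ = moe_forward(token)

# Only top_k experts' parameters are touched per token.
params_per_expert = 2 * d_model * d_ff
total_params = num_experts * params_per_expert
active_params = top_k * params_per_expert
print(f"active expert params per token: {active_params}/{total_params} "
      f"({active_params / total_params:.1%})")
```

In this toy setup 2 of 16 experts fire per token, so only 12.5% of expert parameters are active; DeepSeek-V3's reported ratio of 37B active out of 671B total corresponds to an even sparser activation of roughly 5.5%.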