DeepSeek has released a new paper, with co-founder Liang Wenfeng credited as a contributor, detailing how its latest large language model DeepSeek-V3 achieves efficient training and inference using only 2,048 H800 GPUs – significantly fewer than the tens of thousands typically required. The team attributes this efficiency to four key innovations: memory optimization through multi-head latent attention (MLA), computational savings via a Mixture-of-Experts (MoE) design with FP8 precision, communication improvements using a multi-plane network topology, and faster inference through multi-token prediction (MTP). With MLA, KV cache memory usage is cut to just 70KB per token, as little as one seventh of what competing models require. The MoE architecture activates only 37 billion of the model’s 671 billion parameters per forward pass, reducing training costs by roughly 90% compared to dense models of similar scale. FP8 training further halves compute and memory usage, with minimal accuracy tradeoff. Beyond the model, the paper also outlines five future directions for AI hardware design, advocating tighter integration between software and hardware to address memory, compute, and networking bottlenecks. [36Kr, in Chinese]
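The saving from sparse MoE activation can be illustrated with a toy top-k router: only the selected experts' weights take part in each token's forward pass, so the active parameter count is a small fraction of the total, much as 37 billion relates to 671 billion. The sketch below is a minimal illustration only, with made-up dimensions and a simplified ReLU feed-forward expert; it is not DeepSeek's implementation.

```python
# Toy Mixture-of-Experts routing sketch (illustrative; hypothetical dimensions).
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256        # toy model and feed-forward widths (assumed)
num_experts, top_k = 16, 2     # route each token to 2 of 16 experts

# Each expert is a small feed-forward block: W_in (d_model x d_ff), W_out (d_ff x d_model).
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(num_experts)
]
router = rng.standard_normal((d_model, num_experts)) * 0.02

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]                          # indices of the k highest-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the chosen experts only
    out = np.zeros_like(x)
    for w, idx in zip(weights, top):
        w_in, w_out = experts[idx]
        out += w * (np.maximum(x @ w_in, 0.0) @ w_out)         # weighted ReLU feed-forward expert
    return out

token = rng.standard_normal(d_model)
_ = moe_forward(token)

# Only top_k experts' parameters are touched per token.
params_per_expert = 2 * d_model * d_ff
total_params = num_experts * params_per_expert
active_params = top_k * params_per_expert
print(f"active expert params per token: {active_params}/{total_params} "
      f"({active_params / total_params:.1%})")
```

In this toy setup 2 of 16 experts fire per token, so only 12.5% of expert parameters are active; DeepSeek-V3's reported ratio of 37B active out of 671B total corresponds to an even sparser activation of roughly 5.5%.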