Google's TurboQuant Compresses AI Memory by 6x With No Accuracy Loss, Triggering a Selloff in Memory Chip Stocks
Google Research unveils TurboQuant, a training-free compression algorithm that reduces LLM key-value cache memory by 6x and boosts inference throughput by up to 8x, sending SK Hynix and Samsung shares down 5 to 6 percent.
4 min read · 4 sources
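To make the headline concrete: shrinking the key-value (KV) cache means storing attention keys and values at low bit-width instead of fp16. The sketch below is a generic per-channel symmetric quantizer, not TurboQuant's actual algorithm (which Google has not fully detailed here); the function names and the 4-bit setting are illustrative assumptions.

```python
import numpy as np

def quantize_kv(x: np.ndarray, bits: int = 4):
    """Per-channel symmetric quantization of a KV-cache block.

    NOTE: illustrative sketch only, not TurboQuant's method.
    x: float array of shape (tokens, channels).
    Returns int8 codes (holding `bits`-bit values) plus per-channel scales.
    """
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for 4-bit
    scale = np.abs(x).max(axis=0) / qmax       # one scale per channel
    scale = np.where(scale == 0, 1.0, scale)   # guard against all-zero channels
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate fp values from codes and scales."""
    return q.astype(np.float32) * scale

# Demo: quantize a synthetic KV block and check reconstruction error.
rng = np.random.default_rng(0)
kv = rng.standard_normal((128, 64)).astype(np.float32)
q, s = quantize_kv(kv)
recon = dequantize_kv(q, s)
err = np.abs(kv - recon).max()
```

Going from 16-bit floats to 4-bit codes alone gives roughly a 4x memory reduction; reaching the reported 6x would require more aggressive or mixed-precision schemes than this simple sketch.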