# Global Technology

# Memory – How to Play the New AI Bottleneck

Memory sits in a capacity-constrained cycle with unusually long order visibility driven by AI inference. For 2026, the risk is execution and transition, not demand. A steeper pricing climb and favourable conditions likely persist through 2027. Multiples have expanded, but we think stock calls can still work with much higher earnings upside from here.

Inference becomes a memory challenge, not just compute. Memory access increasingly determines the performance of longer-text, image/video and Agentic AI workflows, with far more robust memory requirements than prior AI models to support context, autonomy and continuous learning. These systems require superior server DRAM and enterprise NAND to function effectively.

Memory cycle – a steeper pricing climb. Memory pricing power is shifting at lightning speed. We expect a steeper upcycle with rapid gains in DRAM, HBM, NAND, and legacy memory. Innovation and architectural redesign continue to improve memory efficiency, enabling AI systems to deliver better latency and cost profiles and materially enhancing user experience. This lowers the economic barrier to adoption and unlocks a significantly larger AI TAM, even as aggregate memory demand continues to scale with broader deployment. Our analysis suggests that text-only AI inference alone could account for 35% of 2026 global memory supply for DRAM and 92% for NAND.

What's changed? The key debate is now shifting to whether supply can catch up and where the true choke point sits, and therefore whether tightness and pricing power persist or normalize. Near-term price expectations matter less, in our view, but our channel checks indicate potential upside to already aggressive 70%+ QoQ hikes for both DRAM and NAND. Inventory levels continue to fall across the supply chain. Capex acceleration is inevitable, focused on DRAM, and we do expect more meaningful greenfield expansions from 2027. The supply-demand gap for legacy memory is widening further for DDR4/3, NOR and SLC/MLC NAND, and we raise Price Targets across the board.

Bottlenecks are the winners – buy memory and semicap, especially EUV. We prefer higher pricing power in DRAM (Samsung, SK hynix, MU), legacy memory (Winbond), HDD (WDC), and capex benefits via SPE (ASML) and packaging (DISCO), vs. downstream hardware and consumer-facing margin pressure. Negative factors to watch include the impact of demand destruction for system/hardware device vendors and challenging YoY growth comparisons from 2H26 for memory.
# TECHNOLOGY – EUROPEAN SEMICONDUCTORS

# Europe Industry View: In-Line

# How to Play the AI Bottleneck?

Historic memory shortage – a precursor to a historic semiconductor production footprint. With DRAM prices now surpassing metals as a benchmark of scarcity, the memory sector enters a period of extended capacity constraint. Memory makers, along with advanced logic foundry manufacturers, must find ways to secure and manage supply chains for rapidly growing AI infrastructure consumption and ensure expansion in wafer manufacturing. The bottlenecks in the semiconductor industry become the winners in stock performance, and the knock-on effect on adjacent parts of semiconductors tends to be underestimated – the key bottleneck to AI has shifted from CoWoS and HBM to commodity DRAM and NAND. Next could be semiconductor equipment – in particular EUV (ASML Holding NV: Stronger Set-Up for 2027; Raise PT to €1,400). Hence our global top 10 picks to play the memory bottleneck:

- DRAM – Samsung, MU; we also like SK hynix
- Legacy memory – Winbond
- Storage – WDC
- Advanced packaging – DISCO
- Semicap – AMAT, ASMI
- EUV – ASML

The nature of the correlation. The semicap (semiconductor capital equipment) and logic foundry cycles are closely and intrinsically correlated to the DRAM cycle, with the memory market acting as a primary driver of the broader semiconductor industry's cyclical nature. In the short run, there is no sign of an imminent peak in memory pricing and profitability – add the recent strong TSMC capex guide and leading-edge foundry tightness, and industry capex is set to accelerate towards all-time highs by 2027-28. Semicap companies generally experience the upturn (and downturn) in profitability one to two quarters ahead of memory makers, as equipment orders are placed well in advance of production.
However, stocks inflect around the same time, as illustrated in Exhibit 1. The share performance lag is significant and a material catch-up is more probable.

Exhibit 1: ASML vs. DRAM YoY performance – significant laggard. Source: FactSet, Morgan Stanley Research. Note: Three DRAM companies = Samsung Electronics, Micron and SK Hynix

Exhibit 2: ASML orders well below peak today. Source: Company Data, Morgan Stanley Research

Exhibit 3: Semicap exposure landscape. Source: Applied Materials, Morgan Stanley Research

# EUV Lessons from History – Time to Play Chess, Not Checkers

ASML – Party Like It's 2010. The period between 2010-12 was extremely capacity-constrained in EUV lithography, with very limited production volumes from ASML as the sole viable EUV tool supplier. As EUV tools were not yet production-worthy, they required co-development, risk-sharing and guaranteed demand to justify continued investment. In other words, access to EUV tools was determined less by price and more by risk tolerance and strategic commitment. We see a similar potential set-up with EUV capacity increasingly constrained into 2027-28e. What is different this time is that both SK hynix and Micron are using six layers of EUV for future 1c/1γ DRAM, with plenty of cash on the balance sheet competing for tools against Samsung and advanced logic foundries.

Samsung hoarding EUV tools. Samsung Electronics positioned itself as a lead EUV partner by placing early tool reservations, engaging in deep engineering collaboration and expressing willingness to absorb potential yield risks. Samsung Electronics ended up absorbing a disproportionate share of early EUV tool availability, effectively 'crowding out' other logic players like TSMC and Intel. As a result, TSMC and Intel were forced to rely longer on advanced DUV multi-patterning tools. This episode preceded and catalyzed a shift from transactional equipment purchasing to strategic co-ownership of critical supply-chain assets.

Exhibit 4: South Korea Semiconductor Equipment Imports. Source: Korea Customs, Morgan Stanley Research

Strategic assets. In 2012, ASML announced a customer co-investment program sized at €1.38bn in long-term R&D funding alongside minority stake purchases, with Intel (15%), Samsung (3%) and TSMC (5%) participating and ASML remaining independent, with no control rights held by any customer. Although the announcement of this partnership generated a muted market reaction, the episode highlighted a structural feature of the semiconductor supply chain – control over bottleneck capital equipment can temporarily reshape competitive dynamics, even without formal exclusivity. Today, we see similar discussions around advanced packaging capacity, HBM supply chains and foundry co-investment models.

Memory players' current engagement with ASML's EUV (High-NA EUV tools) takes the form of acquisition and strategic partnership. ASML has been building a substantial research and support campus in Hwaseong, South Korea, designed to strengthen collaboration with both Samsung and SK hynix.

- Samsung is planning to purchase multiple ASML High-NA EUV tools, with plans to install them for 2nm foundry production (not memory). This represents a lead deployment among major players rather than a passive position.

- In 2025, SK hynix installed an ASML Twinscan EXE:5200B High-NA EUV system at its M16 fab in Icheon, South Korea. This marks the first High-NA EUV tool deployed for memory production outside R&D usage.
As the first to integrate a High-NA EUV system in a mass-production context, SK hynix is implicitly partnering with ASML on both deployment and development of new lithography capabilities.

Exhibit 5: ASML outperformed the MSCI 2010-2012. Source: FactSet, Morgan Stanley Research

Exhibit 6: ASML NTM PE 2010-12. Source: FactSet, Morgan Stanley Research

# Memory – why the sudden bottleneck?

Memory becomes a significant bottleneck for AI development. Key-Value (KV) Cache is emerging as the primary memory scaling constraint in transformer inference. As context lengths and concurrency rise, KV Cache memory grows linearly, saturating high-bandwidth memory well before compute limits are reached. This makes inference increasingly memory-bound and underpins the industry's push toward architectural and software-level memory efficiency.

AI inference is fundamentally different from LLM training and is becoming increasingly difficult to scale. Recent breakthroughs and emerging usage patterns make inference more memory-intensive, often requiring additional high-bandwidth memory rather than less. Current trends that are increasing the memory requirements of inference and amplifying the KV Cache problem include:

- Context – Memory in Agentic AI is deeply intertwined with context; without context, even the most sophisticated conversation becomes meaningless (tendency to hallucinate). Context constantly retrieves prior shared history to make sense of the present. It increases compute and, even more so, memory demand, as the amount of information the model can look at when generating a quality answer rises meaningfully.

- Reasoning generates a long sequence of thinking before the final answer, similar to how people solve a problem step by step. This significantly increases latency, and the long sequence of thought tokens strains memory.

- Multimodal workloads (image, audio and video generation) involve larger data types that consume far more memory than text generation.

- Mixture of Experts (MoE) expands memory usage by invoking multiple experts selectively (for example, China's DeepSeek-V3 has 256 experts) rather than a single dense feed-forward block, which allows model size to grow significantly for higher quality relative to a modest increase in training cost.

Exhibit 7: AI Inference Prefill vs. Decoding stages. Source: Morgan Stanley Research

The AI hardware race has pivoted from compute horsepower towards less glamorous memory. Compute determines AI, but memory now determines how far and how fast it can scale. As AI systems move further from training towards inference, the physics of performance changes: AI inference workloads, especially Agentic AI, are increasingly memory-bound rather than compute-bound. Memory lives at different layers in an AI Agent system (short-term working memory, stored general facts, long-term memory span, pre-trained external knowledge base, tool outputs, user history). As models grow in size and context windows expand, the challenge is less about how fast chips can compute and more about how quickly they can feed data to those processors.

A step change in AI is underway in 2026... The AI industry is shifting from generative AI to Agentic AI: 2025 was the year reasoning was mastered and Agentic AI introduced, while 2026 will be about moving AI from experimentation to core infrastructure and enterprise agent adoption. These agents are now more reliable, have stronger memory, hallucinate less, and continuous learning has begun.
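To put the KV Cache arithmetic above in concrete terms, the minimal sketch below computes the per-token KV footprint using the model assumptions in Exhibit 22 (Appendix): 96 layers, 8 KV heads, 128 head dimension and FP16 precision. The context lengths and the HBM comparison in the follow-up are our own illustrative assumptions, not figures from this report.

```python
# Illustrative sketch (our own, not the report's model code): per-token KV-cache footprint
# using the transformer assumptions in Exhibit 22 (Appendix). Context lengths are illustrative.

LAYERS         = 96    # L
KV_HEADS       = 8     # Hkv (GQA)
HEAD_DIM       = 128   # Dhead
BYTES_PER_ELEM = 2     # FP16

# Both K and V are cached, hence the leading factor of 2.
KV_BYTES_PER_TOKEN = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
assert KV_BYTES_PER_TOKEN == 393_216   # matches "KV bytes / token" in Exhibit 22

GIB = 2**30
for context_tokens in (2_000, 32_000, 128_000):
    kv_gib = context_tokens * KV_BYTES_PER_TOKEN / GIB
    print(f"{context_tokens:>7,} tokens of context -> {kv_gib:5.1f} GiB of KV cache per sequence")
```

At these rates (roughly 0.7 GiB at 2,000 tokens and close to 47 GiB at 128,000 tokens per sequence), an accelerator with on the order of 100-200GB of HBM (an illustrative figure, not a specific product) can hold only a handful of long-context sequences alongside weights, which is why context length and concurrency, rather than FLOPs, become the binding constraint.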
We are in the process of fusing frontier models with customized open-source systems running on enterprise servers. The next end market will be much bigger – Physical AI, when we move intelligence from the cloud into industrial AI and humanoids.

... with a dramatic impact for memory. Agentic AI drives massive demand for DRAM and NAND by requiring significantly higher memory capacity and performance to support its core functions of context, autonomy, planning and continuous learning. We are shifting from reactive single-task models to proactive, autonomous, and continuously learning systems that require significant and reliable memory resources to function effectively. This shift has led DRAM suppliers to prioritize the production of high-end, AI-specific memory, driving up overall DRAM demand and prices.

Exhibit 8: Agentic AI – Memory tiers illustration. Source: Towards Data Science, Morgan Stanley Research

# What is the long-term bull case for memory?

We're still at the beginning, not the end. Despite recent progress, ChatGPT is only three years old, operational 1GW data centers don't yet exist, and experts see "no walls in sight" for pre-training. AI models continue to exhibit substantial headroom for improvement through both increased compute and efficiency gains, as demonstrated by recent breakthroughs such as DeepSeek V4. Beyond traditional scaling, we are uncovering new optimization levers across training and inference, including dynamic reasoning depth (how much a model 'thinks' before responding). At the same time, the growing adoption of vision and multimodal AI models is structurally increasing memory requirements, as these workloads process high-dimensional inputs and maintain larger intermediate representations in fast memory.

Exhibit 9: Number of AI models released. Source: Our World in Data, Morgan Stanley Research

The AI build-out keeps hitting new infrastructure limits. Memory is facing the largest global scaling of any tech wave in history. Memory is quickly becoming a critical bottleneck for AI agents, and the next wave of progress with AI agents will come not from better reasoning but from better context handling. An AI assistant that remembers everything is more useful than a bigger model that remembers little. This means more memory layers stand to unlock far more value than reasoning improvements.

Memory and context management are increasingly the bottleneck. The agent's source code is just the orchestration layer, whereas the heavy lifting happens in how memory gets ingested, organized, and retrieved. Memory systems are quickly becoming the hidden complexity behind agents. If code used to be the bottleneck, memory might be the new one. LLMs remember just enough context to sustain a conversation, but not a lifetime of them. For many users, that's fine, but for anyone who imagines AI as a permanent cognitive partner or agent, it's a structural frustration.

- Storage cost: Keeping billions of users' full conversation histories, indexed and instantly retrievable, would explode storage needs and retrieval latency.

- Processing cost: Even if the data is stored, every query would need more compute to search, rank and contextualize. That translates directly into higher per-prompt costs.

- Hardware cost: A typical AI server uses 8x more memory than traditional servers, and with each generation that number climbs even higher.
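To illustrate the storage-cost point above, a rough, hedged back-of-the-envelope sketch follows: it compares one peak day of raw conversation text with the same day's tokens retained as full FP16 KV state. The 58.3T peak-day token volume and the per-token KV size come from Exhibit 22 (Appendix); the roughly 4 bytes per token of plain text is our own illustrative assumption.

```python
# Rough, illustrative arithmetic for the "storage cost" bullet above. The 58.3T peak-day
# token volume and FP16 KV size come from Exhibit 22; ~4 bytes per token of raw text is
# our own illustrative assumption.

PEAK_DAY_TOKENS      = 58.3e12
TEXT_BYTES_PER_TOKEN = 4           # assumption: plain-text log of the conversation
KV_BYTES_PER_TOKEN   = 393_216     # FP16 KV state per token (Exhibit 22)

raw_text_tb = PEAK_DAY_TOKENS * TEXT_BYTES_PER_TOKEN / 1e12
full_kv_eb  = PEAK_DAY_TOKENS * KV_BYTES_PER_TOKEN / 1e18

print(f"One day of raw conversation text   : ~{raw_text_tb:,.0f} TB")   # ~233 TB
print(f"One day of fully retained KV state : ~{full_kv_eb:,.1f} EB")    # ~22.9 EB
print(f"KV vs. plain-text blow-up          : ~{KV_BYTES_PER_TOKEN / TEXT_BYTES_PER_TOKEN:,.0f}x")
```

The roughly five-orders-of-magnitude gap between logging text and retaining reusable KV state is why retention and offload policies sit at the centre of the memory-tiering designs discussed below.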
Innovation and higher efficiency. DeepSeek's recent research paper, 'Conditional Memory via Scalable Lookup: A New Axis of Sparsity for LLMs', demonstrates a pathway to scaling model capacity beyond HBM limits by decoupling reasoning from knowledge storage. In this architecture, latency-critical reasoning remains resident in HBM, while large, less frequently accessed knowledge memory is off-loaded to CXL-attached DDR5. This effectively introduces memory as a new axis of sparsity, enabling meaningful model scaling even as HBM capacity remains constrained.

More NAND – NVIDIA Inference Context Memory Storage Platform. At CES, Nvidia showcased its inference context memory storage platform, which serves as a KV Cache for inferencing. The platform is powered by the NVIDIA BlueField-4 DPU (Data Processing Unit) and inserts another tier of eSSD storage, which manages the off-loading and sharing of KV Cache data. In a typical configuration, this infrastructure allows an additional 16TB of high-speed SSD storage (presumably NVMe) to be directly associated with each Rubin GPU, functioning as an extension of the system's memory hierarchy to handle extremely large context lengths. With the DPU moving from BlueField-3 to BlueField-4, DRAM content is also upgraded from 32GB of DDR5 to 128GB of LPDDR5X.

Exhibit 10: NVIDIA Inference Context Memory Storage Platform. Source: NVIDIA CES 2026 Keynote

# Sizing the Inference TAM

It is almost impossible to calculate the exact demand on memory given the changing dynamics of user growth, applications, and technology innovation. What investors are primarily trying to understand is how much incremental TAM can be added as KV Cache expands. Currently, most of the hot KV Cache is stored in HBM/DRAM, which means not only that it is expensive but also that memory duration is short. Allowing KV Cache to be offloaded to eSSD makes longer context and longer memory duration possible, which can in turn significantly enhance current applications and create a better inference setup for Agentic AI development and penetration.

In Exhibit 11, we calculate the tiered memory usage of a ChatGPT-like model with the key assumptions below (full assumptions in Appendix: Memory Usage Tier Breakdown Assumptions):

- We assumed 800 million weekly active users (peak QPS of 300,000 req/s)
- Input tokens/request: 2,000 tokens
- We assumed live KV is split 50%/50% between HBM and DRAM, and reused KV Cache is split 40% on DRAM vs. 60% on NAND (KV Cache in FP16 precision)
- We assumed text-only applications; image/video demand is not considered in this model
- We stick to current common industry practice for the rest of our assumptions to the extent possible

In conclusion, for a 200,000-GPU cluster that runs such a model, HBM usage is around 200PB, DRAM 4EB, NAND 42EB on warm data/KV Cache offload, and datalake demand is around 260EB. If we assume there are three such models globally, total AI inference demand will account for 17%, 35% and 92% of 2026 global memory supply for HBM, DRAM and NAND, respectively.
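As a rough cross-check on the tiering in Exhibit 11 below, the sketch reproduces the live and reused KV rows from the stated assumptions (50%/50% live KV split across HBM and DRAM, 40%/60% reused KV split across DRAM and rack SSD, 50% of peak-day KV retained for 24 hours, FP16 precision). The roughly 1,024-token effective live window is our own assumption, chosen to be consistent with the hot-KV range cited in Exhibit 13, not a figure disclosed in the report.

```python
# Minimal sketch of the tiered KV sizing behind Exhibit 11, using the assumptions in
# Exhibit 22 (Appendix). The 1,024-token "effective live window" is our own assumption,
# consistent with the hot-KV range cited in Exhibit 13, not a disclosed input.

KV_BYTES_PER_TOKEN = 393_216          # FP16, 96 layers, 8 KV heads, 128 head dim
ACTIVE_SEQUENCES   = 1_800_000        # prefill + decode sequences in flight
LIVE_WINDOW_TOKENS = 1_024            # assumption (Exhibit 13: hot KV ~ last 256-1,024 tokens)
PEAK_DAY_TOKENS    = 58.3e12          # peak-day tokens processed (given)
RETAINED_SHARE     = 0.50             # 24h reuse cache retains ~50% of the day's KV
DRAM_SHARE, SSD_SHARE = 0.40, 0.60    # reused KV split between DRAM and rack SSD

TIB, PIB = 2**40, 2**50

# Live (in-flight) KV, split 50%/50% between HBM and host DRAM.
live_kv_bytes = ACTIVE_SEQUENCES * LIVE_WINDOW_TOKENS * KV_BYTES_PER_TOKEN
print(f"Live KV per tier (HBM / DRAM): {live_kv_bytes * 0.5 / TIB:,.0f} TiB each")  # ~330 TiB

# 24-hour KV reuse cache, 40% resident in DRAM and 60% offloaded to rack SSD.
reuse_kv_bytes = PEAK_DAY_TOKENS * RETAINED_SHARE * KV_BYTES_PER_TOKEN
print(f"Reuse KV in DRAM : {reuse_kv_bytes * DRAM_SHARE / PIB:,.0f} PiB")  # ~4,072 PiB (~4 EiB)
print(f"Reuse KV on SSD  : {reuse_kv_bytes * SSD_SHARE / PIB:,.0f} PiB")   # ~6,108 PiB (~6 EiB)
```

Under these assumptions the sketch lands on the same order of magnitude as Exhibit 11: a few hundred TiB of live KV per tier and roughly 4 EiB of DRAM plus 6 EiB of rack SSD for the 24-hour reuse cache per model.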
Exhibit 11: Tiered Memory / Storage Usage Breakdown

<table><tr><td>Tier</td><td>Component</td><td>Total (TiB)</td><td>Total (PiB)</td><td>Notes</td></tr><tr><td>HBM</td><td>Live KV (hot)</td><td>330</td><td>0</td><td>50% of live KV</td></tr><tr><td>DRAM</td><td>Live KV (warm)</td><td>330</td><td>0</td><td>50% of live KV</td></tr><tr><td>Rack SSD</td><td>Live KV (offload)</td><td>-</td><td>-</td><td></td></tr><tr><td>DRAM</td><td>KV reuse cache (24h, 50%)</td><td>4,169,941</td><td>4,072</td><td>Time-integrated retained KV; 40% in DRAM, 60% in Rack SSD</td></tr><tr><td>Rack SSD</td><td>KV reuse cache (24h, 50%)</td><td>6,254,911</td><td>6,108</td><td>Time-integrated retained KV; 40% in DRAM, 60% in Rack SSD</td></tr><tr><td>HBM</td><td>GPU overheads</td><td>18,750</td><td>18</td><td>Workspace + runtime + side models</td></tr><tr><td>HBM</td><td>Weights on active experts (top-2/128)</td><td>186,731</td><td>182</td><td></td></tr><tr><td>DRAM</td><td>Host buffers/state</td><td>0</td><td>0</td><td>Per-sequence host state</td></tr><tr><td>Rack SSD</td><td>Local Rack SSD caches/logs</td><td>36,379,788</td><td>35,527</td><td>200 TB/node (decimal)</td></tr><tr><td>HBM</td><td>Total</td><td>226,291</td><td>226</td><td></td></tr><tr><td>DRAM</td><td>Total</td><td>4,585,261</td><td>4,585</td><td></td></tr><tr><td>Rack SSD</td><td>Total</td><td>46,877,348</td><td>46,877</td><td></td></tr><tr><td>Datalake</td><td>RAG + logs + caches (central)</td><td>293,556,000</td><td>293,556</td><td></td></tr></table>

Source: Morgan Stanley Research estimates. Note: detailed assumptions in Appendix

Extending context length and increasing the KV Cache have the most obvious incremental uplift on DRAM and Rack SSD (assuming the effective live window caps HBM usage, while datalake demand is driven more by model size, RAG and other factors). Increasing input tokens from 2,000 to 5,000 per request, holding other assumptions unchanged, the incremental KV Cache raises DRAM demand by around 2EB and Rack SSD by 3EB per model.
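One way to roughly reproduce that sensitivity (and the DRAM column of Exhibit 12 below) is sketched here; it assumes the daily request count implied by Exhibit 22 stays fixed and that only the incremental input tokens per request add retained, reusable KV. This is our reading of the methodology rather than a formula stated in the report.

```python
# Sketch of the input-token sensitivity shown in Exhibit 12, under our assumption that the
# daily request count implied by Exhibit 22 stays fixed and only the extra input tokens per
# request add retained (reused) KV. Splits and retention follow the report's assumptions.

KV_BYTES_PER_TOKEN = 393_216
PEAK_DAY_TOKENS    = 58.3e12
TOKENS_PER_REQUEST = 6_667                                   # 2,000 input + 4,000 RAG + 667 output
REQUESTS_PER_DAY   = PEAK_DAY_TOKENS / TOKENS_PER_REQUEST    # ~8.7bn requests/day

RETAINED_SHARE = 0.50                 # 24h reuse cache
DRAM_SHARE, SSD_SHARE = 0.40, 0.60    # reused KV split between DRAM and rack SSD
EB = 1e18

def incremental_reuse_kv_eb(new_input_tokens: int, base_input_tokens: int = 2_000):
    """Incremental retained KV (in EB) from lengthening the input prompt per request."""
    extra_tokens_per_day = (new_input_tokens - base_input_tokens) * REQUESTS_PER_DAY
    retained_bytes = extra_tokens_per_day * RETAINED_SHARE * KV_BYTES_PER_TOKEN
    return retained_bytes * DRAM_SHARE / EB, retained_bytes * SSD_SHARE / EB

dram_eb, ssd_eb = incremental_reuse_kv_eb(5_000)
print(f"2,000 -> 5,000 input tokens: +{dram_eb:.1f} EB DRAM, +{ssd_eb:.1f} EB rack SSD per model")
# ~ +2 EB DRAM and +3 EB rack SSD, in line with the uplift described above.
```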
Exhibit 12: Sensitivity test on input tokens / request per model

<table><tr><td>Input tokens / request</td><td>HBM TOTAL (PB)</td><td>DRAM TOTAL (PB)</td><td>Rack SSD TOTAL (EB)</td><td>Datalake TOTAL (EB)</td></tr><tr><td>2,000</td><td>226</td><td>4,585</td><td>47</td><td>294</td></tr><tr><td>5,000</td><td>226</td><td>6,648</td><td>50</td><td>294</td></tr><tr><td>10,000</td><td>226</td><td>10,087</td><td>55</td><td>294</td></tr><tr><td>20,000</td><td>226</td><td>16,964</td><td>65</td><td>294</td></tr><tr><td>50,000</td><td>226</td><td>37,597</td><td>96</td><td>294</td></tr><tr><td>100,000</td><td>226</td><td>71,983</td><td>148</td><td>294</td></tr><tr><td>200,000</td><td>226</td><td>140,757</td><td>251</td><td>294</td></tr></table>

Source: Morgan Stanley Research estimates

Exhibit 13: Inference-stage storage by memory tier

<table><tr><td>Storage tier</td><td>What lives here in practice</td></tr><tr><td>HBM (GPU memory)</td><td>• Model weights (active shards) • Hot KV cache (recent tokens only, typically last 256-1,024 tokens) • Temporary activations / workspaces (attention scratch, GEMM buffers, logits, NCCL buffers) • Runtime metadata (KV block tables, schedulers, CUDA graphs) • Occasionally small side models (safety filters, routers)</td></tr><tr><td>Host DRAM (CPU memory)</td><td>• Warm KV cache (paged / offloaded) • Evicted KV blocks from HBM (majority of KV at scale) • Request/session state (token queues, schedulers, batching metadata) • Prompt text + tokenized inputs • Optional CPU-resident weights (rare for high-throughput chat)</td></tr><tr><td>Local / in-rack NAND (NVMe / SSD)</td><td>• Cold KV spill (emergency or batch/offline inference only) • KV checkpoints for long jobs • Local caches (prompt cache, embedding cache) • Node-level logs and short-term telemetry buffers</td></tr><tr><td>Datalake (shared storage)</td><td>• Model storage (checkpoints, sharded weights, multiple variants) • KV spill for offline / recovery (not latency-sensitive) • RAG corpora and indices • Logs, telemetry, traces • Long-lived caches (prompt reuse, embeddings, evaluation artifacts)</td></tr></table>

Source: Morgan Stanley Research

# What's changed?

Price hikes everywhere. Given recent buyer-seller negotiations, we see DRAM pricing momentum remaining exceptionally strong into 1Q26, with upside risk to prior forecasts. The global DRAM market has entered a phase of dramatic price inflation, driven primarily by the three major suppliers reallocating capacity toward high-margin server DRAM and HBM to meet robust AI inference and infrastructure demand from major CSPs. This shift has created a severe 'capacity crowding-out effect' for PC, mobile, and consumer DRAM, resulting in a firmly entrenched high-price, low-volume seller's market. With US CSPs willing to absorb substantial price increases and actively signing long-term contracts to secure supply, we see customers offering 50-85% QoQ price hikes on DRAM for more allocation in 1Q26.

- On DRAM, we see 1Q26 price increases of 80-85% QoQ for mainstream DDR5 PC DRAM and 60-65% QoQ for server DRAM. General server demand has been revised upward again, driven by accelerated replacement cycles at CSPs against a backdrop of already tight DRAM supply. US CSP customers are increasingly securing volumes through long-term supply agreements, while OEMs with weaker bargaining power face acute supply constraints and reduced allocation.
Smartphone and PC customers are forced to accept the higher prices despite lingering concerns over shipment growth.

- On HBM, prices are moving higher on upside orders for H200. We see some ASIC customers also adding orders for HBM3E 12hi, which could result in a better-than-feared pricing adjustment (-10% YoY vs. -30% previously). HBM4 qualification is still ongoing, and the final result is delayed to March, with mass production expected in late 1Q/early 2Q. Samsung is accelerating its TSV capacity expansion (from 170kwpm to 220kwpm by the end of 2026). SK hynix slightly revised up from 180kwpm to 200kwpm, while Micron stays flat at 90kwpm.

- On NAND, our channel checks indicate the pricing range could be around 50-80% QoQ depending on the customer, vs. consensus of a 55-60% QoQ increase. QLC eSSD supply remains the tightest and could see 70-80% QoQ price hikes, while some suppliers who hiked less in 4Q25 are now asking to double prices in 1Q to catch up with the industry trend. Our channel checks indicate that some suppliers are in talks for LTA down payments, which points to potential greenfield capex expansion from 2027 onwards, but this should not be a concern for overall supply-demand dynamics as the new capacity will take at least 1-1.5 years to fully ramp.

- On legacy memory, we believe the supply-demand gap could widen further for DDR4/3, NOR and SLC/MLC NAND. TrendForce forecasts DDR4 pricing to increase as much as 93-98% QoQ in 1Q26. Some legacy memory vendors could even share in HBM growth. Winbond is our Top Pick, and we believe Nanya Tech, Macronix, Longsys, AP Memory, GigaDevice and PSMC are all positioned to benefit from the current setup.

- On module makers, we expect volume shipments to be limited in 4Q/1Q, but gross margins should improve to the 30%+ level for Chinese module makers due to favourable pricing. The pressure from rising inventory costs should be alleviated this time, as higher wafer prices can be passed through 100% to customers.
Exhibit 14: TrendForce Pricing Forecast

<table><tr><td rowspan="2"></td><td>Mar-25</td><td>Jun-25</td><td>Sep-25</td><td>Dec-25</td><td>Jan-26</td><td>Jan-26</td><td>Oct-25</td><td>Oct-25</td></tr><tr><td>1Q25</td><td>2Q25</td><td>3Q25</td><td>4Q25</td><td>1Q26E</td><td>2Q26E</td><td>3Q26E</td><td>4Q26E</td></tr><tr><td>PC DRAM</td><td>DDR4: down 13-18%; DDR5: down 10-15%; Blended ASP: down 10-15%</td><td>DDR4: up 13-18%; DDR5: up 3-8%; Blended ASP: up 3-8%</td><td>DDR4: up 38-43%; DDR5: up 3-8%; Blended ASP: up 3-8%</td><td>DDR4: up 43-49%; DDR5: up 38-43%; Blended ASP: up 38-43%</td><td>DDR4: up 93-98%; DDR5: up 80-85%; Blended ASP: up 80-85%</td><td>DDR4: up 28-33%; DDR5: up 20-25%; Blended ASP: up 20-25%</td><td>up 3-8%</td><td>up 0-5%</td></tr><tr><td>Server DRAM</td><td>DDR4: down 10-15%; DDR5: down 3-8%; Blended ASP: down 5-10%</td><td>DDR4: up 18-23%; DDR5: up 3-8%; Blended ASP: up 3-8%</td><td>DDR4: up 28-33%; DDR5: up 3-8%; Blended ASP: up 3-8%</td><td>DDR4: up 60-65%; DDR5: up 53-58%; Blended ASP: up 53-58%</td><td>DDR4: up 65-70%; DDR5: up 60-65%; Blended ASP: up 60-65%</td><td>DDR4: up 10-15%; DDR5: up 10-15%; Blended ASP: up 10-15%</td><td>up 3-8%</td><td>up 0-5%</td></tr><tr><td>Mobile DRAM</td><td>LPDDR4X: down 8-13%; LPDDR5(X): down 3-8%</td><td>LPDDR4X: up 0-5%; LPDDR5(X): up 3-8%</td><td>LPDDR4X: up 38-43%; LPDDR5(X): up 10-15%</td><td>LPDDR4X: up 48-53%; LPDDR5(X): up 43-48%</td><td>LPDDR4X: up 53-68%; LPDDR5(X): up 53-58%</td><td>LPDDR4X: up 8-13%; LPDDR5(X): up 13-18%</td><td>LPDDR4X: mostly flat; LPDDR5(X): up 5-10%</td><td>LPDDR4X: mostly flat; LPDDR5(X): up 3-8%</td></tr><tr><td>Graphics DRAM</td><td>GDDR6: down 8-13%; GDDR7: down 0-5%</td><td>GDDR6: mostly flat; GDDR7: down 0-5%</td><td>GDDR6: up 38-33%; GDDR7: up 5-10%</td><td>GDDR6: up 25-30%; GDDR7: up 30-35%</td><td>GDDR6: up 45-50%; GDDR7: up 45-50%</td><td>GDDR6: up 0-5%; GDDR7: up 8-13%</td><td>GDDR6: up 0-5%; GDDR7: up 3-8%</td><td>GDDR6: mostly flat; GDDR7: up 0-5%</td></tr><tr><td>Consumer DRAM</td><td>DDR3: down 3-8%; DDR4: down 10-15%</td><td>DDR3: mostly flat; DDR4: up 18-23%</td><td>DDR3: up 50-60%; DDR4: up 65-60%</td><td>DDR3: up 55-60%; DDR4: up 45-50%</td><td>DDR3: up 45-50%; DDR4: up 45-50%</td><td>DDR3: up 3-8%; DDR4: up 3-8%</td><td>DDR3: up 0-5%; DDR4: up 0-5%</td><td>DDR3: mostly flat; DDR4: mostly flat</td></tr><tr><td>Total DRAM</td><td>Conventional DRAM: up 5-10%; HBM Blended: up 5-10% (HBM Penetration: 8%)</td><td>Conventional DRAM: up 10-15%; HBM Blended: up 5-10% (HBM Penetration: 10%)</td><td>Conventional DRAM: up 10-15%; HBM Blended: up 15-20% (HBM Penetration: 10%)</td><td>Conventional DRAM: up 45-50%; HBM Blended: up 50-55% (HBM Penetration: 11%)</td><td>Conventional DRAM: up 55-60%</td><td>Conventional DRAM: up 13-18%</td><td>Conventional DRAM: up 3-8%</td><td>Conventional DRAM: up 0-5%</td></tr><tr><td>eMMC</td><td>1Q25</td><td>2Q25</td><td>3Q25</td><td>4Q25</td><td>1Q26E</td><td>2Q26E</td><td>3Q26E</td><td>4Q26E</td></tr><tr><td>UFS</td><td>down 15-20%</td><td>up 5-10%</td><td>Mostly flat</td><td>up 20-25%</td><td>up 55-60%</td><td>up 20-25%</td><td>up 5-10%</td><td>up 0-5%</td></tr><tr><td>Enterprise SSD</td><td>down 18-23%</td><td>mostly flat</td><td>up 3-8%</td><td>up 25-30%</td><td>up 53-58%</td><td>up 15-20%</td><td>up 5-10%</td><td>up 3-8%</td></tr><tr><td>Client SSD</td><td>down 18-23%</td><td>up 5-10%</td><td>up 5-10%</td><td>up 20-25%</td><td>up 68-73%</td><td>up 20-25%</td><td>up 5-10%</td><td>up 0-5%</td></tr><tr><td>3D NAND Wafers (TLC &amp; QLC)</td><td>down 8-13%</td><td>up 15-20%</td><td>up 8-13%</td><td>up 145-150%</td><td>up 50-55%</td><td>up 15-20%</td><td>up 3-8%</td><td>up 0-5%</td></tr><tr><td>Total NAND Flash</td><td>down 15-20%</td><td>up 3-8%</td><td>up 3-8%</td><td>up 33-38%</td><td>up 55-60%</td><td>up 18-23%</td><td>up 5-10%</td><td>up 0-5%</td></tr></table>

Source: TrendForce, Morgan Stanley Research

Exhibit 15: NAND Spot and Contract Pricing. Source: TrendForce, Morgan Stanley Research

Exhibit 16: DRAM Spot and Contract Pricing. Source: TrendForce, Morgan Stanley Research

End-market deep dive – five surprises. 1) Server demand stands out, with potentially 40-50% server unit growth from top hyperscalers in 2026 – server DRAM demand is equivalent to about 70% of this year's DRAM market size, and about 100% if including demand from inventory buffers. 2) A demand inflection is taking place for China H200 HBM, with potentially much more in 2027. 3) We see upside orders for HBM3e from both top ASIC customers and GPU makers. 4) DDR4 demand has picked up meaningfully since April 2025, with price hikes likely of 65-70% in 1Q26 as supply is reallocated to other parts of the DRAM market. 5) Inventory is now significantly below normal levels for DRAM producers.

# Where are we in the cycle?

DRAM stocks offer better relative value than the semiconductor group and remain among the cheapest in semis. Memory remains the largest and fastest-growing market in semiconductors in 2026, accelerating to well above 40% YoY growth. Despite the strong rally, we still see a positive risk-reward balance across memory equities, underpinned by a sharp return of pricing inflation and lagging consensus earnings forecasts, which leave the group well placed to close the valuation gap with AI semiconductors more broadly.

Cycle in context. Let's pretend it's 2H26, look ahead a few months and ask ourselves: what does a memory cyclical inflection on the way down look like? It is a combination of 1) a major expectation reset by management – maximum bullishness; 2) capex finally being raised, often by more than 30-40%; and, most importantly, 3) three things – an inflection at trough inventory, peak YoY pricing (the rate of change plateauing) and stock price action. We don't have any of that so far.

This is what a peak in the cycle looks like... We are not ticking any of the 3 preconditions for a peak in memory stocks:

- Trough inventory – it does not need to bottom today, but we don't have line of sight on when it does. We see inventory declining further in the coming quarters, driven by strong demand and customer panic behaviour causing stockpiling.

- The YoY pricing peak is still distant and not inflecting next quarter. Contract is no longer leading spot this time (due to HBM weakness and the commodity rally), and we still see YoY pricing accelerating into 2Q. Even though visibility into 2H26 is limited now, our forecast shows that another 20% QoQ hike in 3Q26 can extend the acceleration, but in 4Q26 at least a 70% QoQ price hike would be needed to keep the trend going, given the high base in 4Q25.

- We have not reached the point where good news is bad news. Stocks have still outperformed on good news over the past two weeks – AI newsflow was overwhelming and there was no sell-the-news reaction to MU's strong results last month.

Exhibit 17: DRAM contract price YoY accelerating into 1H26. Source: FactSet, TrendForce estimates, Morgan Stanley Research

Inventory depleting quickly. DRAM inventories have come down to well below normal at the producer level, which should support bit shipment growth in 2026 as customers become increasingly concerned about supply.
Restocking of inventory and customer anxiety over supply availability next year could add to the current supply-demand shortage.

Exhibit 18: DRAM inventory level. Source: TrendForce, Morgan Stanley Research

Exhibit 19: NAND inventory level. Source: TrendForce, Morgan Stanley Research

# Stock implication

What will work in 2026? Explaining our framework. Top-down, the environment should continue to favour the AI computing supply chain; however, it is worth remembering that this sub-group has only been in an uptrend since April 8, 2025 (outperforming the MSCI World IT Index by +60%). This outperformance has been driven by multiple expansion as well as average earnings growth of 42% in 2025. We believe earnings growth expectations will keep climbing, driven by the strength of commodity pricing and HBM into 2027, giving investors greater reason to add to positions in DRAM companies.

Our stock selection process is bottom-up. Outside of AI, our stock preference is based on several criteria, including earnings forecasts, end-market momentum, margin expectations, and risk-adjusted bull-bear skew. We seek out companies where margin expectations are actually deliverable and de-rating risks are manageable.

- US stock picks. We prefer DRAM over NAND, having recently elevated MU to a Top Pick. DRAM has the most to gain from AI tailwinds relative to current valuations, but those AI tailwinds are still strong for NAND as well. We expect a very strong 2026 for Micron's fundamentals, with better clarity on HBM returns in 2027. In HDD, we prefer WDC and Seagate.

- EU semicap. Our Top Pick, ASML, has significant memory exposure and is likely to benefit as the memory cycle improves. EUV system demand is strengthened by DRAM technology transitions, which support the upward trend in lithography intensity. For ASMI, we have assumed DRAM growth broadly in line with our WFE forecast of c.25% this year, but overall exposure is limited to 15-20%. This can go up toward '28-29, but the driver is technical (4F2) rather than cyclical. VAT Group has long been viewed as a proxy for memory/NAND, but this has been somewhat offset by large exposure to China foundry.

- In Korea, SK hynix is a key beneficiary of a much steeper climb in commodity memory and HBM pricing, making it fundamentally more attractive with a path to significantly improved profitability. We think this potential is not fully discounted in the price at the current 5.3x 2026E P/E, which implies little value for HBM. Samsung remains our Top Pick on relative risk-reward and as a laggard turnaround play in both HBM and foundry.

- KIOXIA in Japan should see an idiosyncratic re-rating opportunity driven by the company's BiCS-8, which combines planar shrinkage, new architecture (CBA) and multi-level cell technology to achieve higher memory density with a low capex burden. KIOXIA began sample shipments of its 2Tb QLC product in July 2024, and the company stated in Nov 2025 that, for customer qualification of eSSDs equipped with the 2Tb QLC chip, it plans to ship the 122TB model within 2025 as scheduled and the 245TB model in early 2026. Cost reductions from the transition to BiCS-8 are expected to accelerate from the Jun-Q 2026. Given that the share price remains under pressure due to December-quarter guidance falling short of consensus and the recent stock sale by BCPE Pangea Cayman, we view this as an attractive entry point. Regarding capex, we forecast KIOXIA's gross capex in the range of Y350-400bn in FY3/27, compared to Y280bn in FY3/26.
FY3/27 capex would still be significantly lower than the Y510bn spent in FY3/23.

- Greater China memory. We see Winbond (OW, Top Pick) and GigaDevice (OW) as beneficiaries of both legacy DRAM and NOR; other legacy DRAM beneficiaries are Nanya Tech (OW), PSMC (OW), and AP Memory (OW), which also benefits from a stronger IPD outlook. Macronix (OW) is a beneficiary of NOR and MLC NAND. We continue to like Longsys and Phison ahead of strong 1Q results, and we see concerns over margin erosion from higher costs being alleviated this time, given stronger bargaining power amid the shortage. We also highlight how the complexity and customization of HBM base die will likely be a meaningful growth driver for IC design service providers and foundries.

- US semicap. We see two very strong years of WFE growth in 2026-27 and model YoY growth across all end markets. Our top OW within US SPE is AMAT, given its exposure to the most visible growth drivers – namely DRAM greenfield wafer additions and TSMC's 3nm build – while trading at a discount to peers. We also upgrade KLA to OW on expectations of inevitable upward revisions to foundry capex amid looming 3nm capacity constraints, with its memory upside underappreciated as rising process-control intensity across DRAM and NAND drives incremental growth. While we are EW on LRCX, we maintain a positive bias toward it, supported by strong WFE growth visibility across end markets.

- Japan semicap. We recommend Advantest, DISCO, Tokyo Seimitsu, and Micronics Japan on the following key drivers: 1) rapid growth in HBM for generative AI; 2) high-bandwidth memory for Edge AI applications; and 3) large potential growth in the flash memory market from the adoption of PUC.

- Korea semicap. As we move into 2026, the key variable for the semiconductor value chain is shifting decisively from headline capex cycles to actual utilization at advanced logic nodes. We believe new fab openings at Samsung Electronics (P4) and SK hynix (M15X) may still act as key catalysts for memory, but after an extended period of under-absorption at foundry, wafer starts at sub-5nm nodes are beginning to normalize, supported by AI-driven logic demand and internal anchor volumes. We continue to prefer Wonik IPS as a key beneficiary of Samsung's DRAM capex cycle (1c nm tech migration and P4 ramps).
Exhibit 20: Memory Risk Reward. Source: Morgan Stanley Research estimates

Exhibit 21: How to play the memory strength – most vs. least favoured

<table><tr><td>Company Name</td><td>Ticker</td><td>Thesis</td><td>Share Price (Local Curr)</td><td>Price Target (Local Curr)</td><td>Upside (%)</td><td>PB 2026E</td><td>PE 2026E</td><td>EPS 2025E</td></tr><tr><td colspan="9">Most favoured</td></tr><tr><td colspan="9">Memory Suppliers</td></tr><tr><td>Samsung Electronics</td><td>005930.KS</td><td>Better commodity cycle driven by AI + HBM market share gain optionality</td><td>143,900</td><td>170,000</td><td>18</td><td>1.9</td><td>6.4, 6.044</td><td></td></tr><tr><td>SK hynix</td><td>000660.KS</td><td>Better commodity cycle driven by AI</td><td>749,000</td><td>840,000</td><td>12.20</td><td>2.4</td><td>4.7, 56.976</td><td></td></tr><tr><td>SanDisk Corporation</td><td>SNDK.O</td><td>NAND supercycle supported by AI inference demand + potential NL SSD upside</td><td>387,811</td><td>273</td><td>-29.60</td><td>5.2</td><td>26.12.1</td><td>2.74</td></tr><tr><td>KIOXIA Holdings</td><td>285A.T</td><td>NAND supercycle supported by AI inference demand + potential NL SSD upside</td><td>13,630</td><td>14,000</td><td>2.70</td><td>4.1</td><td>9.2520</td><td></td></tr><tr><td>Micron Technology Inc.</td><td>MU.O</td><td></td><td>333.35</td><td>350</td><td>5.00</td><td>4.1</td><td>9.6</td><td>8.29</td></tr><tr><td colspan="9">Hard Disk Drive Producers</td></tr><tr><td>Western Digital</td><td>WDC.O</td><td>A rising AI tide lifts all boats for both HDD and eSSD</td><td>215</td><td>228</td><td>6.00</td><td>10.6</td><td>27.495</td><td></td></tr><tr><td>Seagate Technology</td><td>STX.O</td><td>A rising AI tide lifts all boats for both HDD and eSSD</td><td>312.28</td><td>337</td><td>7.90</td><td>575.7</td><td>26.88.1</td><td></td></tr><tr><td colspan="9">SPE</td></tr><tr><td>ASML Holding NV</td><td>ASML.AS</td><td>ASML benefits from increased EUV layer count</td><td>1,149</td><td>1,400</td><td>21.80</td><td>15.6</td><td>37.125.2</td><td></td></tr><tr><td>Wonik IPS Co Ltd</td><td>240810.KQ</td><td>Strong rebound on the back of Samsung 1c DRAM capex ramp-up</td><td>77,800</td><td>72,000</td><td>-7.50</td><td>3.6</td><td>30.91.866</td><td></td></tr><tr><td>Advantest</td><td>6857.T</td><td>Rapid growth in high-bandwidth memory products</td><td>22,490</td><td>26,100</td><td>16.10</td><td>14.5</td><td>30.6408</td><td></td></tr><tr><td>Micronics Japan</td><td>6871.T</td><td>Rapid growth in high-bandwidth memory products</td><td>8,460</td><td>9,500</td><td>12.30</td><td>4.6</td><td>20.7238</td><td></td></tr><tr><td>DISCO</td><td>6146.T</td><td>Rapid growth in high-bandwidth memory products</td><td>59,100</td><td>73,500</td><td>24.40</td><td>8.7</td><td>13.9726</td><td></td></tr><tr><td>Tokyo Seimitsu</td><td>7729.T</td><td>Rapid growth in high-bandwidth memory products</td><td>12,335</td><td>14,600</td><td>18.40</td><td>2.4</td><td>17.8540</td><td></td></tr><tr><td>Applied Materials Inc.</td><td>AMAT.O</td><td>Exposed to the most visible growth drivers (DRAM greenfield wafer additions and TSMC's 3nm build)</td><td>301.89</td><td>273</td><td>-9.60</td><td>10.4</td><td>30.99.42</td><td></td></tr><tr><td colspan="9">Module/Component Makers</td></tr><tr><td>Shenzhen Longsys Electronics Co Ltd</td><td>301308.SZ</td><td>Favourable consumer NAND pricing + enterprise business expansion</td><td>311.5</td><td>325</td><td>4.30</td><td>10.7</td><td>29.83.67</td><td></td></tr><tr><td>Phison Electronics Corp</td><td>8299.TWO</td><td>Favourable consumer NAND pricing</td><td>1,795</td><td>1,900</td><td>-44.30</td><td>6.4</td><td>38.823.28</td><td></td></tr><tr><td>Silicon Motion</td><td>SIMO.O</td><td>Pure NAND controller play with growing opportunities in eSSD</td><td>111.74</td><td>1,050</td><td>-6.00</td><td>3.8</td><td>213.42</td><td></td></tr><tr><td colspan="9">Specialty Memory</td></tr><tr><td>Winbond Electronics Corp</td><td>2344.TW</td><td>Pricing upside on DDR4/3, NOR, SLC NAND; and LT opportunities in CUBE</td><td>99.2</td><td>88</td><td>-11.30</td><td>3.6</td><td>16.81.16</td><td></td></tr><tr><td>AP Memory Technology Corp</td><td>6531.TW</td><td>Strong outlook for IPD, and better pricing for DDR3; VHM to drive long-term growth</td><td>470</td><td>475</td><td>1.10</td><td>6.4</td><td>30.82.86</td><td></td></tr><tr><td>GigaDevice Semiconductor (Beijing) Inc</td><td>603986.SS</td><td>Better DDR4/3 and NOR flash pricing; MCU to gain share and drive growth; W/OW opportunities</td><td>254.96</td><td>255</td><td>0.00</td><td>8</td><td>49.6218</td><td></td></tr><tr><td>Nanya Technology Corp.</td><td>2408.TW</td><td>Better DDR4/3 pricing; potential DOLS order win</td><td>237.5</td><td>198</td><td>-16.60</td><td>3.4</td><td>15.60.9</td><td></td></tr><tr><td>Powerchip Semiconductor Manufacturing Co.</td><td>6770.TW</td><td>Better DDR4/3 pricing; potentially benefiting from hybrid bonding capacity</td><td>51.3</td><td>41</td><td>-20.10</td><td>2.6</td><td>-1.88</td><td></td></tr><tr><td>Macronix International Co., Ltd</td><td>2337.TW</td><td>Major beneficiary of NOR and MLC NAND price hikes; turning profitable in 2026</td><td>61.9</td><td>48</td><td>-22.50</td><td>2.9</td><td>66.92.16</td><td></td></tr><tr><td colspan="9">Least favoured</td></tr><tr><td colspan="9">PC/Server OEM</td></tr><tr><td>Acer Inc.</td><td>2353.TW</td><td></td><td>27.1</td><td>20</td><td>-26.20</td><td>1.1</td><td>18.61.39</td><td></td></tr><tr><td>Dell Technologies Inc.</td><td>DELL.N</td><td></td><td>118,699</td><td>113</td><td>-4.80</td><td>11.9</td><td>8.18.2</td><td></td></tr><tr><td>HP Inc.</td><td>HPQ.N</td><td></td><td>20.77</td><td>20</td><td>-3.70</td><td>65.4</td><td>7.33.12</td><td></td></tr><tr><td>Hewlett Packard Enterprise</td><td>HPE.N</td><td></td><td>22.09</td><td>25</td><td>13.20</td><td>1.2</td><td>10.11.94</td><td></td></tr><tr><td colspan="9">PC peripheral</td></tr><tr><td>Logitech International SA</td><td>LOGI.O</td><td>Memory inflation is amplifying macro headwinds rather than offsetting them</td><td>96.74</td><td>107</td><td>10.60</td><td>6.3</td><td>17.64.84</td><td></td></tr><tr><td colspan="9">RF</td></tr><tr><td>Qorvo Inc.</td><td>QRVO.O</td><td>Memory headwinds are viewed as an industry-wide challenge as elevated costs risk suppressing downstream hardware demand</td><td>81.95</td><td>110</td><td>34.20</td><td>2.1</td><td>12.95.76</td><td></td></tr><tr><td>Skyworks Solutions Inc</td><td>SWKS.O</td><td>Memory headwinds are viewed as an industry-wide challenge as elevated costs risk suppressing downstream hardware demand</td><td>59.88</td><td>89</td><td>48.70</td><td>1.7</td><td>13.35.93</td><td></td></tr><tr><td colspan="9">Foundry</td></tr><tr><td>GlobalFoundries Inc</td><td>GFS.O</td><td>Memory headwinds are viewed as an industry-wide challenge as elevated costs risk suppressing downstream hardware demand</td><td>41.35</td><td>35</td><td>-15.40</td><td>1.9</td><td>23.41.66</td><td></td></tr></table>

Source: FactSet, Morgan Stanley Research estimates (E)
# Appendix: Memory Usage Tier Breakdown Assumptions

Exhibit 22: Assumptions table

<table><tr><td>Category</td><td>Assumption</td><td>Value</td><td>Units / Notes</td></tr><tr><td>Demand</td><td>Peak QPS</td><td>300,000</td><td>req/s</td></tr><tr><td>Demand</td><td>Input tokens / request</td><td>2,000</td><td>tokens</td></tr><tr><td>Demand</td><td>RAG tokens / request</td><td>4,000</td><td>RAG/Input 2:1 (in practice 2:1 - 10:1)</td></tr><tr><td>Demand</td><td>Output tokens / request</td><td>667</td><td>Input/output 3:1</td></tr><tr><td>Demand</td><td>Total tokens / request</td><td>6,667</td><td>tokens</td></tr><tr><td>Demand</td><td>Peak-day tokens processed</td><td>58,300,000,000,000</td><td>tokens/day (given)</td></tr><tr><td>Demand</td><td>Peak-day tokens processed (from QPS)</td><td>172,800,000,000,000</td><td>tokens/day (QPS × tokens/req × 86400)</td></tr><tr><td>Demand</td><td>Active prefill sequences</td><td>300,000</td><td>Peak QPS: 300,000 req/s; TTFT: 1.0 s</td></tr><tr><td>Demand</td><td>Active decode sequences</td><td>1,500,000</td><td>Decode speed: 60 tokens/sec/request</td></tr><tr><td>Demand</td><td>Active total sequences</td><td>1,800,000</td><td>given</td></tr><tr><td>KV</td><td>KV precision (bytes/elem)</td><td>2</td><td>FP16 KV => 2 bytes</td></tr><tr><td>KV</td><td>Layers (L)</td><td>96.00</td><td></td></tr><tr><td>KV</td><td>KV heads (Hkv)</td><td>8.00</td><td>GQA</td></tr><tr><td>KV</td><td>Head dim (Dhead)</td><td>128.00</td><td></td></tr><tr><td>KV</td><td>KV bytes / token</td><td>393,216</td><td></td></tr></table>