HANA Memory Management & OOM Analysis: Heap, Column Store & Real Fixes – Day 6 Guide | Part 51 | FreeLearning365

 

HANA Memory Management & OOM Analysis: Heap, Column Store & Real Fixes – Day 6 Guide | Part 51 | FreeLearning365

🔥 DAY06 [SAP HANA Database & Performance] {Part-51}

Welcome to the day where you truly understand what’s happening inside HANA’s RAM. Over the last five days, you’ve built procedures, tuned queries, and modeled data like a pro. But none of that matters if the database runs out of memory and crashes. Today we’re diving into the dark, often misunderstood world of HANA memory management: how the heap allocator works, how the column store consumes memory, and what to do when the dreaded “Out of Memory” (OOM) kills your production system. This is the survival knowledge that separates a reactive Basis admin from a proactive performance engineer.

We’ll decode the numbers you see in HANA Cockpit, explain why a 128 GB system can OOM with only 80 GB allocated, and give you a step‑by‑step method to analyze and fix memory issues. By the end, you’ll be able to read an OOM dump, identify the culprit, and prevent future incidents – skills that directly save companies tens of thousands in avoided outages.


🧠 The HANA Heap Allocator – The Gatekeeper of RAM

HANA manages its dynamic memory through a heap allocator, a layer that sits between the operating system and the internal memory consumers (column store, row store, SQL execution, caches). Understanding this allocator is key because when memory runs out, it’s the allocator that throws the OOM error, not the OS.

🗂️ Shared Memory vs Process‑Local Heap

HANA’s memory landscape is split into two main areas:

  • Shared Memory: Used by the row store, table metadata, and some internal structures. This memory is pre‑allocated at startup and cannot grow dynamically. It’s sized by parameters like row_engine_memory.
  • Heap Memory: The dynamic, general‑purpose area. Column store tables, intermediate query results, SQLScript variables, and caches all live here. This is where OOMs happen.

The heap allocator obtains memory from the OS in chunks and then sub‑allocates it to internal components. It also handles garbage collection and defragmentation. When the total heap memory approaches the global_allocation_limit, the allocator starts rejecting allocation requests, triggering an OOM error.

📊 Key System Views for Heap Analysis

-- Overall heap usage SELECT * FROM M_HEAP_MEMORY; -- Memory by component SELECT COMPONENT, EXCLUSIVE_SIZE_IN_GB, MALLOC_SIZE_IN_GB FROM M_HEAP_MEMORY WHERE CATEGORY = 'All'; -- Memory per service (index server) SELECT * FROM M_SERVICE_MEMORY;

In M_HEAP_MEMORY, the column “EXCLUSIVE_SIZE” is crucial: it shows memory that is exclusively owned by a component and cannot be shared. “MALLOC_SIZE” is the total memory requested from the OS allocator. A large difference between the two often indicates memory fragmentation.

🧠 Real‑life insight: On a system with 128 GB RAM, the global allocation limit might be set to 112 GB. HANA will never use all 128 GB; it leaves some for the OS and other services. If the heap reaches 112 GB, even if physical RAM is free, the allocator refuses further allocations. You must tune the limit, not just add RAM.

📚 Column Store Memory Architecture – Where Your Data Lives

The column store is by far the largest memory consumer. Understanding its internal memory breakdown helps you predict usage and size tables correctly.

🔹 Main Store and Delta Store Memory

As you learned in Day 1, each column‑store table consists of a Main Store (compressed, read‑optimized) and a Delta Store (write‑optimized, row‑oriented). Both consume heap memory, but in very different ways:

  • Main Store Memory: Depends on the uncompressed size of columns, compression ratio, and dictionary sizes. Typically 5‑10x smaller than original raw data. You can query M_CS_TABLES to see RAW_SIZE (uncompressed size) vs MEMORY_SIZE_IN_MAIN (actual memory used).
  • Delta Store Memory: Each row in the delta store is stored uncompressed (or lightly compressed), so memory grows linearly with the number of unmerged rows. A large delta can suddenly spike memory usage.
-- Check per‑table memory details SELECT TABLE_NAME, TO_DECIMAL(RAW_SIZE /1024/1024/1024,10,2) AS RAW_GB, TO_DECIMAL(MEMORY_SIZE_IN_MAIN /1024/1024/1024,10,2) AS MAIN_GB, TO_DECIMAL(MEMORY_SIZE_IN_DELTA /1024/1024/1024,10,2) AS DELTA_GB, DELTA_RECORD_COUNT FROM M_CS_TABLES WHERE SCHEMA_NAME = 'SAPSR3' AND TABLE_NAME = 'ACDOCA';

🔹 Other Hidden Memory Consumers

  • MVCC (Multi‑Version Concurrency Control): Each update creates a new version. Old versions are kept until they are no longer needed by any active transaction. This can temporarily double the memory of updated rows. Seen via M_CS_TABLES.MEMORY_SIZE_IN_MVCC.
  • Intermediate Results: When a query executes, it may create temporary structures (hash tables for joins, aggregation buffers) that allocate heap memory. The statement_memory_limit caps these per‑statement.
  • Plan Cache: Compiled execution plans for SQL statements are cached to avoid re‑compilation. This uses a modest amount of memory but can grow with thousands of unique queries.

🛡️ Memory Parameters – The Invisible Fences

HANA’s memory behavior is governed by several key parameters. Misconfiguring them is the leading cause of OOM. Let’s demystify the most important ones.

global_allocation_limit

This is the maximum total memory HANA can allocate from the OS. It’s usually set to a value slightly below physical RAM (e.g., 90% for dedicated appliances). If HANA’s total allocation reaches this limit, no new allocations are allowed – and any query asking for more memory gets an OOM. Check it with:

SELECT SECTION, KEY, VALUE FROM M_INIFILE_CONTENTS WHERE FILE_NAME='global.ini' AND KEY='global_allocation_limit';

statement_memory_limit

Limits memory per individual SQL statement. It prevents a single query from starving all other processes. It’s expressed in bytes (e.g., 1073741824 = 1 GB). If a statement needs more, it aborts with an OOM error before consuming system‑wide resources. Tune this based on your workload.

ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'system') SET ('memorymanager', 'statement_memory_limit') = '2147483648' WITH RECONFIGURE;

column_engine_memory_limit (available in later HANA 2.0 SPS)

This restricts the total memory used by the column store engine, including main and delta stores. It helps prevent the column store from taking 100% of the heap and leaving nothing for other operations.

Workload Classes and Memory Quotas

In HANA 2.0, you can assign memory budgets to workload classes. For example, critical OLTP queries get a higher guarantee, while batch analytics get a hard limit. This is advanced tuning but extremely powerful in mixed S/4HANA environments.

✅ Best Practice: Set global_allocation_limit to 85‑90% of physical RAM on a dedicated HANA server. For multi‑tenant, allocate per tenant. Never set it to 100%; the OS needs memory for networking, file caches, and HANA’s own non‑heap processes.

💀 OOM Analysis – Decoding the Death Certificate

When HANA hits a memory limit, it generates an OOM dump – a snapshot of memory state at the time of failure. This dump is written to the trace directory and contains the information you need to identify the cause. The dump file name looks like indexserver_oom_20260520_143022.trc.

🧐 Step‑by‑Step OOM Diagnosis

  1. Locate the OOM dump: In the HANA trace directory (/usr/sap/{SID}/HDB{instance}/trace) find the latest *.oom.trc file. Download it to your local machine.
  2. Open with HANA Studio: In Studio, use “Open OOM Dump” from the menu (or double‑click the file). You’ll see a graphical breakdown.
  3. Check the “Allocator Summary”: It shows how much memory each allocator (Heap, Shared, etc.) was using. Often the heap is near the global limit.
  4. Identify the top memory consumer: The dump includes a list of the largest allocations at the time of OOM. Look for a component with a huge chunk – often “ColumnStore”, “SqlScript”, or “IntermediateResults”.
  5. Find the offending statement: The dump often includes the last SQL statement that caused the allocation failure. Note the connection ID and the full SQL text.
  6. Correlate with M_EXPENSIVE_STATEMENTS: Use the timestamp and connection ID to find the statement in the expensive statement trace. You’ll see its duration, memory, and CPU usage.

🧪 Hands‑on Lab: Simulate an OOM and Analyze the Dump (Safe Sandbox Only!)

We’ll intentionally trigger an OOM by setting a very low statement_memory_limit and running a memory‑hungry query. This is for demonstration purposes in a non‑production system.

-- 1. Set a tiny limit (e.g., 10 MB) ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'system') SET ('memorymanager', 'statement_memory_limit') = '10485760' WITH RECONFIGURE; -- 2. Run a query that needs more than 10 MB SELECT * FROM ACDOCA CROSS JOIN ACDOCA; -- This will quickly exhaust 10 MB -- 3. The statement should fail with an OOM error. Check trace directory for the dump.

After the error, reset the limit to a normal value. Then analyze the dump to see the failed allocation size and statement. This exercise builds intuition for production situations.

📊 Real‑Life OOM Scenario: The Delta Merge Avalanche

A manufacturing company experienced OOM every Monday morning. Investigation revealed that over the weekend, a large number of invoices were posted via IDocs, filling the delta store of ACDOCA. On Monday, the scheduled merge could not complete because the delta merge itself needed extra memory to rewrite the main store, pushing the total column store memory over the limit. The fix: stagger the IDoc processing to spread delta growth, increase the global allocation limit slightly, and schedule merges during off‑peak hours. The OOM vanished.


📈 Monitoring Memory Proactively – Cockpit and SQL Queries

The best OOM is the one that never happens. Use these tools to stay ahead of memory pressure.

HANA Cockpit Memory Overview

Navigate to the tenant → “Memory” card. You’ll see a pie chart of memory usage by component: Column Store, Row Store, Heap Allocator, etc. Set alerts at 80% and 90% usage. The “Memory History” view shows trends over days/weeks, so you can spot slow leaks or weekend growth.

Custom Proactive Queries

-- Current memory utilization in GB SELECT TO_DECIMAL(SUM(EXCLUSIVE_SIZE)/1024/1024/1024,10,2) AS HEAP_USED_GB, TO_DECIMAL(SUM(EXCLUSIVE_SIZE)/NULLIF(MAX(TOTAL_HEAP_SIZE),0)*100,10,2) AS USED_PERCENT FROM M_HEAP_MEMORY; -- Top 10 tables by total memory SELECT TOP 10 SCHEMA_NAME, TABLE_NAME, TO_DECIMAL((MEMORY_SIZE_IN_MAIN + MEMORY_SIZE_IN_DELTA) / 1024/1024/1024,10,2) AS TOTAL_GB FROM M_CS_TABLES ORDER BY TOTAL_GB DESC;

Schedule these queries via a simple SQLScript job and feed them into a monitoring dashboard. Being proactive is always cheaper than reactive.


🔁 Best Practices, Pros & Cons of HANA Memory Management

✅ Best Practices

  • Size memory correctly: Use SAP Quick Sizer and real‑world data samples. Always include a buffer for growth.
  • Set global_allocation_limit to 85‑90% of physical RAM.
  • Implement statement_memory_limit to prevent a single query from killing the system. Start with 2‑4 GB per statement for analytical workloads.
  • Regularly monitor delta merges: A growing delta is a ticking memory bomb. Automate delta merge alerts.
  • Partition large tables (Day 5) to reduce memory scan footprint and improve cache efficiency.
  • Use workload classes to isolate critical OLTP from heavy reporting.
  • Keep HANA version current: Memory allocator improvements in SPS07 and later HANA 2.0 versions significantly reduce OOM risks.

⚠️ Common Pitfalls

  • Ignoring delta size: Some teams focus only on main store and are shocked when delta consumes 30% of memory.
  • Over‑allocating memory to row store: S/4HANA uses row store only for small control tables. Over‑sizing row_engine_memory wastes heap that column store could use.
  • Not monitoring MVCC memory: Long‑running transactions (like a forgotten open query) can hold old row versions, slowly bloating memory.
  • Setting statement_memory_limit too high: Then a single bad report can exhaust the entire heap.

🔄 Alternatives & Cloud Considerations

  • In SAP HANA Cloud, you don’t set global_allocation_limit directly; you choose an instance size. The service automatically manages memory parameters. However, you still need to understand statement limits and delta merge implications.
  • On‑premise, some companies use an additional monitoring tool like Solution Manager or a third‑party APM to predict memory trends.

📌 What You’ve Gained Today – Memory Confidence

You now understand the heap allocator, column store memory breakdown, and the exact steps to diagnose an OOM. You can read system views, set limits, and proactively protect your S/4HANA system from memory‑related outages. Tomorrow, we’ll extend into predictive analytics: HANA PAL for regression and clustering directly in SQL. You’ll see how to apply machine learning without moving data out of the database.

🔥 Action Item: Log into your sandbox HANA and run the proactive memory queries. Check your global_allocation_limit and statement_memory_limit. Review the delta store size of ACDOCA. If delta is > 10% of total table size, plan a merge. Document your findings – you’re now the memory guardian.

Post a Comment

0 Comments