
What is a Union?

A union is a special data type in C and C++ that allows you to store different data types in the same memory location. Unlike a structure (struct), where each field has its own memory location, all fields in a union share the same memory space.

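A union is at least as large as its largest member; the compiler may add trailing padding so that the size is a multiple of the union's alignment. Only one member holds a meaningful value at any given time, because writing to one member overwrites the bytes it shares with the others.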

Basic Union Syntax

Here’s a simple example of a union:

union MyUnion {
    int intValue;       // 4 bytes
    float floatValue;   // 4 bytes
    char charValue;     // 1 byte
};

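On a typical platform where int and float are both 4 bytes, sizeof(union MyUnion) is 4 (the size of the largest member). All three fields start at the same address, and charValue overlaps the first byte of the other two.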
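The size and address claims above can be checked rather than taken on faith. The following is a minimal sketch, assuming a C11 compiler; members_share_address is a hypothetical helper name introduced only for this illustration:

```c
#include <assert.h>

union MyUnion {
    int intValue;
    float floatValue;
    char charValue;
};

/* C11 compile-time checks: the union must be able to hold each member. */
static_assert(sizeof(union MyUnion) >= sizeof(int), "union too small for int");
static_assert(sizeof(union MyUnion) >= sizeof(float), "union too small for float");

/* Hypothetical helper: returns 1 if every member starts at the union's own address. */
static int members_share_address(void) {
    union MyUnion u;
    return (void *)&u == (void *)&u.intValue
        && (void *)&u == (void *)&u.floatValue
        && (void *)&u == (void *)&u.charValue;
}
```

The checks use >= rather than == because the language only guarantees that the union is large enough for its largest member, not that it is exactly that size.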

Using a Union

You can access union members like structure members, but remember that they share memory:

union MyUnion u;

// Set the integer value
u.intValue = 0x12345678;

// Reading intValue returns 0x12345678
printf("intValue: 0x%08X\n", u.intValue);

// Reading charValue returns the lowest byte (0x78 on little-endian)
printf("charValue: 0x%02X\n", u.charValue);

// Setting floatValue overwrites the integer
u.floatValue = 3.14f;

// Now intValue holds the bit pattern of 3.14f (0x4048F5C3 with IEEE 754 floats), not 0x12345678
printf("intValue: 0x%08X\n", u.intValue);
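To make the "bit pattern of a float" idea concrete, here is a small sketch that reads a float's bits through a union and, for comparison, through memcpy. It assumes a 32-bit IEEE 754 float; union FloatBits and both helper functions are hypothetical names, not part of the original example:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Hypothetical union used only for this illustration. */
union FloatBits {
    float f;
    uint32_t bits;
};

/* Reads the float's bytes back as an integer via the union (permitted in C). */
static uint32_t float_bits_union(float value) {
    union FloatBits fb;
    fb.f = value;
    return fb.bits;
}

/* Same result via memcpy, which is also valid in C++. */
static uint32_t float_bits_memcpy(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Both helpers return the same integer for the same input; the memcpy form is the safer choice in code that must also compile as C++.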

Unions vs Structures

The key difference between unions and structures is memory layout:

struct MyStruct {
    int intValue;     // Offset 0, 4 bytes
    float floatValue; // Offset 4, 4 bytes
    char charValue;   // Offset 8, 1 byte
};  // Total size: 12 bytes (with padding)

union MyUnion {
    int intValue;     // Offset 0, 4 bytes
    float floatValue; // Offset 0, 4 bytes
    char charValue;   // Offset 0, 1 byte
};  // Total size: 4 bytes

In the struct, each field has its own memory. In the union, all fields overlap at offset 0.
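The offsets in the comments above can be verified with offsetof. This is a sketch under the assumption of a C11 compiler; the two helper functions are hypothetical and exist only to make the comparison testable:

```c
#include <assert.h>
#include <stddef.h>

struct MyStruct {
    int intValue;
    float floatValue;
    char charValue;
};

union MyUnion {
    int intValue;
    float floatValue;
    char charValue;
};

/* Returns 1 if each struct field starts after the previous one ends. */
static int struct_offsets_increase(void) {
    return offsetof(struct MyStruct, intValue) == 0
        && offsetof(struct MyStruct, floatValue) >= sizeof(int)
        && offsetof(struct MyStruct, charValue)
               >= offsetof(struct MyStruct, floatValue) + sizeof(float);
}

/* Returns 1 if every union member sits at offset 0. */
static int union_offsets_are_zero(void) {
    return offsetof(union MyUnion, intValue) == 0
        && offsetof(union MyUnion, floatValue) == 0
        && offsetof(union MyUnion, charValue) == 0;
}
```

The exact struct offsets (0, 4, 8) and the 12-byte total depend on the platform's type sizes and padding rules, which is why the sketch checks ordering rather than fixed numbers.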

Why Use Unions?

Unions serve several important purposes:

  • Memory Efficiency: When only one variant is needed at a time, unions save memory compared to structures.
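  • Type Punning: Unions allow reinterpreting the same bits as different types (useful for low-level operations). This is permitted in C; in C++, reading a member other than the one last written is undefined behavior, so memcpy or std::bit_cast is used there instead.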
  • Bit Field Access: Unions can provide both whole-value and bit-level access to the same data.
  • Variant Data: Storing different types of data depending on context, often paired with an enum to track which variant is active.

Common Pattern: Tagged Unions

A common pattern is to pair a union with an enum to track which member is currently valid:

enum DataType {
    TYPE_INT,
    TYPE_FLOAT,
    TYPE_STRING
};

struct Data {
    enum DataType type;
    union {
        int intValue;
        float floatValue;
        char* stringValue;
    } value;
};

// Usage
struct Data d;
d.type = TYPE_INT;
d.value.intValue = 42;

// Check type before accessing
if (d.type == TYPE_INT) {
    printf("Integer: %d\n", d.value.intValue);
}
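In practice the tag check is usually centralized in a switch so that every access goes through it. The following sketch shows one way to do that; data_from_int, data_from_float, and data_format are hypothetical helpers introduced for illustration, not part of the original example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum DataType { TYPE_INT, TYPE_FLOAT, TYPE_STRING };

struct Data {
    enum DataType type;
    union {
        int intValue;
        float floatValue;
        char *stringValue;
    } value;
};

/* Constructors set the tag and the matching member together. */
static struct Data data_from_int(int v) {
    struct Data d;
    d.type = TYPE_INT;
    d.value.intValue = v;
    return d;
}

static struct Data data_from_float(float v) {
    struct Data d;
    d.type = TYPE_FLOAT;
    d.value.floatValue = v;
    return d;
}

/* Writes a description into buf; returns the snprintf result, or -1 for an unknown tag. */
static int data_format(const struct Data *d, char *buf, size_t n) {
    assert(d != NULL && buf != NULL);
    switch (d->type) {
    case TYPE_INT:
        return snprintf(buf, n, "Integer: %d", d->value.intValue);
    case TYPE_FLOAT:
        return snprintf(buf, n, "Float: %.2f", (double)d->value.floatValue);
    case TYPE_STRING:
        return snprintf(buf, n, "String: %s", d->value.stringValue);
    }
    return -1;
}
```

Setting the tag and the value in a single constructor keeps the two from drifting apart, which is the most common source of bugs with tagged unions.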

Unions for Bit Manipulation

Unions are particularly useful for accessing the same data at different granularities:

union Color {
    unsigned int rgba;      // 32-bit color value
    struct {
        unsigned char r;    // Red channel
        unsigned char g;    // Green channel
        unsigned char b;    // Blue channel
        unsigned char a;    // Alpha channel
    } channels;
};

// Usage
union Color c;
c.rgba = 0xFF00FF80;

// Access individual channels (values shown for little-endian, where r is the lowest byte)
printf("Red: %d\n", c.channels.r);    // 128 (0x80)
printf("Alpha: %d\n", c.channels.a);  // 255 (0xFF)
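Because the union's result depends on byte order, it can help to compare it with a shift-based version that behaves the same everywhere. This sketch assumes a 32-bit color value with red in bits 0-7 and alpha in bits 24-31 (the layout the union produces on little-endian machines); all function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

union Color {
    uint32_t rgba;
    struct {
        uint8_t r, g, b, a;
    } channels;
};

static_assert(sizeof(union Color) == sizeof(uint32_t), "channels must overlay rgba exactly");

/* Returns 1 if the lowest-addressed byte holds the least significant bits. */
static int is_little_endian(void) {
    union Color c;
    c.rgba = 1u;
    return c.channels.r == 1;
}

/* Union-based access: the result depends on the machine's byte order. */
static uint8_t red_via_union(uint32_t rgba) {
    union Color c;
    c.rgba = rgba;
    return c.channels.r;
}

/* Shift-based access: the same result on every byte order. */
static uint8_t red_via_shift(uint32_t rgba) {
    return (uint8_t)(rgba & 0xFFu);
}

static uint8_t alpha_via_shift(uint32_t rgba) {
    return (uint8_t)((rgba >> 24) & 0xFFu);
}
```

On a little-endian machine the two red accessors agree; on a big-endian machine red_via_union would return the top byte instead.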

Unions in Windows Heap Structures

Windows heap internals extensively use unions for memory efficiency and flexible field access. For example, a heap entry might have different interpretations depending on whether it’s allocated or free. The following definition is a simplified illustration of that idea:

typedef struct _HEAP_ENTRY {
    union {
        struct {
            unsigned short Size;
            unsigned char Flags;
            unsigned char SmallTagIndex;
        };
        unsigned int CompactHeader;
    };

    union {
        // When allocated: number of unused bytes in the block
        unsigned short UnusedBytes;

        // When free: link to next free entry
        void* NextFreeEntry;
    };
} HEAP_ENTRY;

In this example:

  • The first union allows accessing the header as individual fields or as a 32-bit integer for atomic operations.
  • The second union reuses the same memory for different purposes depending on whether the block is allocated or free.
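The first point can be sketched in isolation: three small fields are written individually, then copied or compared in one 32-bit operation. SketchHeader, header_copy, and header_equal are hypothetical names for this illustration and do not reproduce the real Windows layout; the sketch assumes C11 anonymous structs and unions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header: three fields overlaid by one 32-bit word. */
typedef struct SketchHeader {
    union {
        struct {
            uint16_t Size;
            uint8_t Flags;
            uint8_t SmallTagIndex;
        };
        uint32_t CompactHeader;
    };
} SketchHeader;

static_assert(sizeof(SketchHeader) == sizeof(uint32_t), "fields must fit in one 32-bit word");

/* Copies all three fields with a single 32-bit assignment. */
static SketchHeader header_copy(const SketchHeader *src) {
    SketchHeader dst;
    dst.CompactHeader = src->CompactHeader;
    return dst;
}

/* Compares all three fields with a single 32-bit comparison. */
static int header_equal(const SketchHeader *a, const SketchHeader *b) {
    return a->CompactHeader == b->CompactHeader;
}
```

A single-word view like this is also what makes it possible to hand the header to an atomic compare-and-swap primitive, although the sketch does not use one.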

Anonymous Unions

Modern C (C11) and C++ allow anonymous unions within structures:

struct HeapBlock {
    unsigned int header;

    union {                    // Anonymous union
        unsigned short size;
        void* nextFree;
    };  // No name for the union itself
};

// Usage - access members directly
struct HeapBlock block;
block.size = 16;         // Not block.unionName.size
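Even though the anonymous union has no name, its members still overlap. A minimal sketch, assuming a C11 compiler, that checks this with offsetof; block_size_roundtrip is a hypothetical helper added for illustration:

```c
#include <assert.h>
#include <stddef.h>

struct HeapBlock {
    unsigned int header;

    union {                    /* Anonymous union */
        unsigned short size;
        void *nextFree;
    };
};

/* Both members of the anonymous union start at the same offset inside the struct. */
static_assert(offsetof(struct HeapBlock, size) == offsetof(struct HeapBlock, nextFree),
              "anonymous union members must overlap");

/* Writes size through the enclosing struct and reads it back. */
static unsigned short block_size_roundtrip(unsigned short n) {
    struct HeapBlock block;
    block.header = 0;
    block.size = n;
    return block.size;
}
```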

Unions with Bit Fields

Unions work well with bit fields for compact encoding:

union Flags {
    unsigned int allFlags;
    struct {
        unsigned int flag1 : 1;
        unsigned int flag2 : 1;
        unsigned int count : 6;
        unsigned int type  : 4;
        unsigned int reserved : 20;
    } bits;
};

// Set entire value at once
union Flags f;
f.allFlags = 0;

// Or manipulate individual bits
f.bits.flag1 = 1;
f.bits.count = 15;
f.bits.type = 3;
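How bit fields are laid out within allFlags is implementation-defined, so the numeric value of allFlags after these assignments can differ between compilers. When a fixed encoding matters, explicit masks and shifts are a common alternative. The sketch below assumes a hypothetical layout (flag1 in bit 0, flag2 in bit 1, count in bits 2-7, type in bits 8-11); pack_flags, unpack_count, and flags_via_union are illustrative names:

```c
#include <assert.h>
#include <stdint.h>

union Flags {
    unsigned int allFlags;
    struct {
        unsigned int flag1 : 1;
        unsigned int flag2 : 1;
        unsigned int count : 6;
        unsigned int type  : 4;
        unsigned int reserved : 20;
    } bits;
};

static_assert(sizeof(union Flags) >= sizeof(unsigned int), "union must hold allFlags");

/* Mask-and-shift encoding with an explicitly chosen, compiler-independent layout. */
static uint32_t pack_flags(unsigned flag1, unsigned flag2, unsigned count, unsigned type) {
    return (uint32_t)(flag1 & 0x1u)
         | (uint32_t)(flag2 & 0x1u) << 1
         | (uint32_t)(count & 0x3Fu) << 2
         | (uint32_t)(type & 0xFu) << 8;
}

/* Extracts the 6-bit count from a value produced by pack_flags. */
static unsigned unpack_count(uint32_t packed) {
    return (unsigned)((packed >> 2) & 0x3Fu);
}

/* Bit-field version: convenient, but the resulting integer is compiler-specific. */
static unsigned int flags_via_union(unsigned flag1, unsigned count, unsigned type) {
    union Flags f;
    f.allFlags = 0;
    f.bits.flag1 = flag1;
    f.bits.count = count;
    f.bits.type = type;
    return f.allFlags;
}
```

The bit-field form is fine for data that never leaves the process; the mask form is the safer choice for values that are stored, transmitted, or compared across builds.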

Important Considerations

When working with unions, keep these points in mind:

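  • Only One Valid Member: Only the last member written holds a meaningful value. In C, reading a different member reinterprets the stored bytes as that member's type (type punning); in C++, it is undefined behavior, apart from narrow exceptions such as structs that share a common initial sequence.
  • Initialization: A brace initializer without a designator initializes the first member. C99 and later, and C++20, support designated initializers for choosing another member (for example, { .floatValue = 1.0f }).
  • Alignment: The union is aligned to the most restrictive alignment requirement of its members.
  • Padding: The union may have trailing padding so that its size is a multiple of its alignment. After writing a member smaller than the union, the remaining bytes have unspecified values.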
  • Endianness: Byte-order matters when accessing multi-byte values as bytes. Little-endian (x86/x64) and big-endian systems differ.
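Several of these points can be observed directly. The sketch below uses a hypothetical union Padded whose largest member is 5 bytes, so on most platforms the compiler rounds the size up to satisfy the alignment of int. It assumes C11 (for static_assert and _Alignof); make_number and trailing_padding are illustrative helpers:

```c
#include <assert.h>
#include <stddef.h>

enum { BYTES_LEN = 5 };

/* Hypothetical union: the largest member is 5 bytes, the strictest alignment is int's. */
union Padded {
    char bytes[BYTES_LEN];
    int number;
};

static_assert(sizeof(union Padded) >= BYTES_LEN, "union must hold its largest member");
static_assert(sizeof(union Padded) % _Alignof(union Padded) == 0,
              "size is a multiple of the alignment");
static_assert(_Alignof(union Padded) >= _Alignof(int),
              "union is at least as strictly aligned as int");

/* Designated initializer (C99 and later): initializes a member other than the first. */
static union Padded make_number(int n) {
    union Padded p = { .number = n };
    return p;
}

/* Number of bytes in the union beyond its largest member. */
static size_t trailing_padding(void) {
    return sizeof(union Padded) - BYTES_LEN;
}
```

With a 4-byte int, sizeof(union Padded) is typically 8 and trailing_padding() returns 3, but neither number is guaranteed by the language.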

Why Windows Heap Uses Unions

The Windows heap structures use unions extensively because:

  • Space Efficiency: Heap metadata must be minimal to reduce overhead. Unions allow different fields for different states without wasting space.
  • Performance: Accessing the same data as different types (like a 32-bit integer or four 8-bit fields) enables efficient manipulation and atomic operations.
  • State-Dependent Fields: Free and allocated blocks need different information, and a union lets both layouts share the same bytes, with the block's state determining which one is valid.
  • Encoding/Decoding: Unions simplify encoding complex structures into compact formats and decoding them back.