What is a Union?
A union is a special data type in C and C++ that allows you to store different data types in the same memory location. Unlike a structure (struct), where each field has its own memory location, all fields in a union share the same memory space.
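A union is at least as large as its largest member; the compiler may add trailing padding so that the size is a multiple of the union's alignment. Only one member holds a meaningful value at any given time, because writing to one member overwrites the bytes used by the others.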
Basic Union Syntax
Here’s a simple example of a union:
union MyUnion {
    int intValue;     // 4 bytes
    float floatValue; // 4 bytes
    char charValue;   // 1 byte
};
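On platforms where int and float are 4 bytes each (the common case), the size of MyUnion is 4 bytes, the size of its largest member. All three fields start at the same address; charValue uses only the first of those 4 bytes.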
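The size rule can be checked at compile time. The following is a minimal sketch, assuming a C11 compiler; `union Padded` and `padded_trailing_bytes` are hypothetical names introduced here to show that a union can be larger than its largest member when alignment requires it.

```c
#include <assert.h>
#include <stddef.h>

union MyUnion {
    int intValue;
    float floatValue;
    char charValue;
};

/* Hypothetical union whose largest member is 5 bytes, which is usually
   not a multiple of int's alignment. */
union Padded {
    char bytes[5];
    int number;
};

/* Guaranteed by the language: a union can hold any of its members. */
static_assert(sizeof(union MyUnion) >= sizeof(int), "must fit the int");
static_assert(sizeof(union MyUnion) >= sizeof(float), "must fit the float");
static_assert(sizeof(union Padded) >= 5, "must fit the 5-byte array");

/* Number of bytes by which union Padded exceeds its largest member
   (3 on typical platforms where int is 4 bytes and 4-byte aligned). */
size_t padded_trailing_bytes(void)
{
    return sizeof(union Padded) - sizeof(char[5]);
}
```

Calling `padded_trailing_bytes()` from any function reports how much padding the compiler added on the current platform; the exact value is implementation-defined.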
Using a Union
You can access union members like structure members, but remember that they share memory:
union MyUnion u;
// Set the integer value
u.intValue = 0x12345678;
// Reading intValue returns 0x12345678
printf("intValue: 0x%08X\n", (unsigned int)u.intValue);
// Reading charValue returns the first byte in memory (0x78 on little-endian)
printf("charValue: 0x%02X\n", (unsigned char)u.charValue);
// Setting floatValue overwrites the integer
u.floatValue = 3.14f;
// Now intValue holds the bit pattern of 3.14f (0x4048F5C3 with IEEE 754 floats)
printf("intValue: 0x%08X\n", (unsigned int)u.intValue);
Unions vs Structures
The key difference between unions and structures is memory layout:
struct MyStruct {
    int intValue;     // Offset 0, 4 bytes
    float floatValue; // Offset 4, 4 bytes
    char charValue;   // Offset 8, 1 byte
}; // Total size: 12 bytes (with padding)

union MyUnion {
    int intValue;     // Offset 0, 4 bytes
    float floatValue; // Offset 0, 4 bytes
    char charValue;   // Offset 0, 1 byte
}; // Total size: 4 bytes
In the struct, each field has its own memory. In the union, all fields overlap at offset 0.
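The offsets in the comments above can be verified with `offsetof`. This is a small sketch assuming a C11 compiler; `bytes_saved_by_union` is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct MyStruct {
    int intValue;
    float floatValue;
    char charValue;
};

union MyUnion {
    int intValue;
    float floatValue;
    char charValue;
};

/* Every union member starts at offset 0. */
static_assert(offsetof(union MyUnion, intValue) == 0, "union member at 0");
static_assert(offsetof(union MyUnion, floatValue) == 0, "union member at 0");
static_assert(offsetof(union MyUnion, charValue) == 0, "union member at 0");

/* Struct members are laid out in declaration order at increasing offsets. */
static_assert(offsetof(struct MyStruct, floatValue) >= sizeof(int),
              "floatValue follows intValue");
static_assert(offsetof(struct MyStruct, charValue) >
              offsetof(struct MyStruct, floatValue),
              "charValue follows floatValue");

/* Bytes saved by overlapping the members instead of laying them out
   one after another. */
size_t bytes_saved_by_union(void)
{
    return sizeof(struct MyStruct) - sizeof(union MyUnion);
}
```

With 4-byte int and float, `bytes_saved_by_union()` would typically return 8, matching the 12-byte and 4-byte totals noted in the comments.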
Why Use Unions?
Unions serve several important purposes:
- Memory Efficiency: When only one variant is needed at a time, unions save memory compared to structures.
- Type Punning: Unions allow reinterpreting the same bits as different types (useful for low-level operations).
- Bit Field Access: Unions can provide both whole-value and bit-level access to the same data.
- Variant Data: Storing different types of data depending on context, often paired with an enum to track which variant is active.
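As an illustration of the type punning item, here is a minimal sketch that reads the bits of a float as an integer. It assumes a 32-bit IEEE 754 float; the function names are hypothetical. Reading a member other than the one last written is defined in C but undefined in C++, so a `memcpy` version that is valid in both languages is shown alongside.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

/* Type punning through a union: write one member, read the other. */
uint32_t float_bits_union(float f)
{
    union {
        float f;
        uint32_t u;
    } pun;
    pun.f = f;
    return pun.u;
}

/* The same reinterpretation with memcpy, also valid in C++. */
uint32_t float_bits_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

On an IEEE 754 platform, `float_bits_union(1.0f)` yields 0x3F800000, and both functions return the same value for any input.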
Common Pattern: Tagged Unions
A common pattern is to pair a union with an enum to track which member is currently valid:
enum DataType {
    TYPE_INT,
    TYPE_FLOAT,
    TYPE_STRING
};

struct Data {
    enum DataType type;
    union {
        int intValue;
        float floatValue;
        char* stringValue;
    } value;
};

// Usage
struct Data d;
d.type = TYPE_INT;
d.value.intValue = 42;

// Check type before accessing
if (d.type == TYPE_INT) {
    printf("Integer: %d\n", d.value.intValue);
}
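In practice the tag check is usually centralized in a `switch` so that every variant is handled in one place. The sketch below assumes the `struct Data` layout shown above (with `stringValue` made `const char *` so string literals can be stored safely); `make_int`, `make_float`, and `data_format` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>

enum DataType { TYPE_INT, TYPE_FLOAT, TYPE_STRING };

struct Data {
    enum DataType type;
    union {
        int intValue;
        float floatValue;
        const char *stringValue;
    } value;
};

/* Constructors set the tag and the matching member together, so the
   two can never get out of sync. */
struct Data make_int(int v)
{
    struct Data d;
    d.type = TYPE_INT;
    d.value.intValue = v;
    return d;
}

struct Data make_float(float v)
{
    struct Data d;
    d.type = TYPE_FLOAT;
    d.value.floatValue = v;
    return d;
}

/* Writes a description of d into buf and returns the snprintf result,
   or -1 if the tag is not a known variant. */
int data_format(const struct Data *d, char *buf, size_t n)
{
    assert(d != NULL && buf != NULL);
    switch (d->type) {
    case TYPE_INT:
        return snprintf(buf, n, "Integer: %d", d->value.intValue);
    case TYPE_FLOAT:
        return snprintf(buf, n, "Float: %.2f", d->value.floatValue);
    case TYPE_STRING:
        return snprintf(buf, n, "String: %s", d->value.stringValue);
    }
    return -1;
}
```

A caller would build a value with `make_int(42)` and pass its address to `data_format` along with a buffer; only the member selected by the tag is ever read.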
Unions for Bit Manipulation
Unions are particularly useful for accessing the same data at different granularities:
union Color {
    unsigned int rgba; // 32-bit color value
    struct {
        unsigned char r; // Red channel
        unsigned char g; // Green channel
        unsigned char b; // Blue channel
        unsigned char a; // Alpha channel
    } channels;
};

// Usage
union Color c;
c.rgba = 0xFF00FF80;

// Access individual channels (values shown for a little-endian machine)
printf("Red: %d\n", c.channels.r);   // 128 (0x80)
printf("Alpha: %d\n", c.channels.a); // 255 (0xFF)
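Because the union view depends on byte order, code that must behave the same on every platform often extracts channels with shifts and masks instead. The following is a sketch under the assumption that the color is laid out as 0xAABBGGRR, which matches what the union produces on little-endian machines; the `color_*` function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same shape as the union in the text, with fixed-width types. */
union Color {
    uint32_t rgba;
    struct {
        uint8_t r, g, b, a;
    } channels;
};

static_assert(sizeof(union Color) == 4, "channels must not be padded");

/* Endianness-independent accessors for the assumed 0xAABBGGRR layout. */
uint8_t color_r(uint32_t rgba) { return (uint8_t)(rgba & 0xFFu); }
uint8_t color_g(uint32_t rgba) { return (uint8_t)((rgba >> 8) & 0xFFu); }
uint8_t color_b(uint32_t rgba) { return (uint8_t)((rgba >> 16) & 0xFFu); }
uint8_t color_a(uint32_t rgba) { return (uint8_t)((rgba >> 24) & 0xFFu); }

/* Packs four channels back into a single 32-bit value. */
uint32_t color_pack(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint32_t)r | ((uint32_t)g << 8) |
           ((uint32_t)b << 16) | ((uint32_t)a << 24);
}
```

For the value 0xFF00FF80 used above, `color_r` returns 0x80 and `color_a` returns 0xFF on any platform, whereas `c.channels.r` gives 0x80 only on little-endian hardware.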
Unions in Windows Heap Structures
Windows heap internals extensively use unions for memory efficiency and flexible field access. For example, a heap entry might have different interpretations depending on whether it’s allocated or free:
typedef struct _HEAP_ENTRY {
    union {
        struct {
            unsigned short Size;
            unsigned char Flags;
            unsigned char SmallTagIndex;
        };
        unsigned int CompactHeader;
    };
    union {
        // When allocated: number of unused bytes in the block
        unsigned short UnusedBytes;
        // When free: link to next free entry
        void* NextFreeEntry;
    };
} HEAP_ENTRY;
In this simplified example:
- The first union allows accessing the header as individual fields or as a 32-bit integer for atomic operations.
- The second union reuses the same memory for different purposes depending on whether the block is allocated or free.
Anonymous Unions
Modern C (C11) and C++ allow anonymous unions within structures:
struct HeapBlock {
    unsigned int header;
    union { // Anonymous union
        unsigned short size;
        void* nextFree;
    }; // No name for the union itself
};

// Usage - access members directly
struct HeapBlock block;
block.size = 16; // Not block.unionName.size
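Members of an anonymous union are accessed as if they belonged to the enclosing struct, but they still overlap. A short sketch, assuming a C11 compiler, that checks this with `offsetof`; `block_mark_free` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

struct HeapBlock {
    unsigned int header;
    union {
        unsigned short size;
        void *nextFree;
    };
};

/* Both members are named directly on the struct and share one offset. */
static_assert(offsetof(struct HeapBlock, size) ==
              offsetof(struct HeapBlock, nextFree),
              "anonymous union members still overlap");

/* Storing the link reuses the bytes that previously held size. */
void block_mark_free(struct HeapBlock *block, void *next)
{
    assert(block != NULL);
    block->nextFree = next;
}
```

After `block_mark_free` runs, `block.size` no longer holds the earlier value, since `nextFree` was written over the same storage.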
Unions with Bit Fields
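Unions work well with bit fields for compact encoding. Note that the order in which the compiler allocates bit fields within a storage unit is implementation-defined, so the mapping between allFlags and the individual fields is not portable: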
union Flags {
    unsigned int allFlags;
    struct {
        unsigned int flag1 : 1;
        unsigned int flag2 : 1;
        unsigned int count : 6;
        unsigned int type : 4;
        unsigned int reserved : 20;
    } bits;
};

// Set entire value at once
union Flags f;
f.allFlags = 0;

// Or manipulate individual bits
f.bits.flag1 = 1;
f.bits.count = 15;
f.bits.type = 3;
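When the exact bit positions matter (for example in a file format or a wire protocol), explicit masks and shifts give a layout that does not depend on the compiler. The sketch below assumes flag1 in bit 0, flag2 in bit 1, count in bits 2 to 7, and type in bits 8 to 11; that assignment and all macro and function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG1_MASK  0x00000001u
#define FLAG2_MASK  0x00000002u
#define COUNT_SHIFT 2
#define COUNT_MASK  (0x3Fu << COUNT_SHIFT) /* 6 bits */
#define TYPE_SHIFT  8
#define TYPE_MASK   (0xFu << TYPE_SHIFT)   /* 4 bits */

static_assert((COUNT_MASK & TYPE_MASK) == 0, "fields must not overlap");
static_assert(((FLAG1_MASK | FLAG2_MASK) & COUNT_MASK) == 0,
              "flags must not overlap count");

/* Returns flags with the 6-bit count field replaced by count. */
uint32_t flags_set_count(uint32_t flags, unsigned count)
{
    assert(count <= 0x3Fu);
    return (flags & ~COUNT_MASK) | ((uint32_t)count << COUNT_SHIFT);
}

/* Returns flags with the 4-bit type field replaced by type. */
uint32_t flags_set_type(uint32_t flags, unsigned type)
{
    assert(type <= 0xFu);
    return (flags & ~TYPE_MASK) | ((uint32_t)type << TYPE_SHIFT);
}

unsigned flags_get_count(uint32_t flags)
{
    return (flags & COUNT_MASK) >> COUNT_SHIFT;
}

unsigned flags_get_type(uint32_t flags)
{
    return (flags & TYPE_MASK) >> TYPE_SHIFT;
}
```

Starting from 0, setting `FLAG1_MASK`, a count of 15, and a type of 3 produces 0x33D under this layout, regardless of how the compiler would have ordered the equivalent bit fields.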
Important Considerations
When working with unions, keep these points in mind:
- Only One Valid Member: Only the last member written holds a meaningful value. In C, reading a different member reinterprets the stored bytes as that member's type (type punning); in C++, it is undefined behavior.
- Initialization: A plain brace initializer sets the first member only. C99 and later (and C++20) also support designated initializers, which can initialize any one member.
- Alignment: The union is aligned to the most restrictive alignment requirement of its members.
- Padding: A union may have trailing padding so that its size is a multiple of its alignment, and members smaller than the union occupy only part of its storage.
- Endianness: Byte-order matters when accessing multi-byte values as bytes. Little-endian (x86/x64) and big-endian systems differ.
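Several of these points can be demonstrated in a few lines. The following is a minimal sketch assuming a C11 compiler; `union Value`, `initialized_member`, and `is_little_endian` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

union Value {
    int i;
    double d;
    char c;
};

/* Alignment: the union is at least as strictly aligned as each member. */
static_assert(_Alignof(union Value) >= _Alignof(double), "fits double");
static_assert(_Alignof(union Value) >= _Alignof(int), "fits int");

/* Initialization: a designated initializer (C99) selects a member other
   than the first. */
double initialized_member(void)
{
    union Value v = { .d = 2.5 };
    return v.d;
}

/* Endianness: returns 1 if the least significant byte of a 32-bit word
   is stored at the lowest address, 0 otherwise. */
int is_little_endian(void)
{
    union {
        uint32_t word;
        uint8_t bytes[4];
    } probe = { .word = 1u };
    return probe.bytes[0] == 1;
}
```

On x86 and x64, `is_little_endian()` returns 1; the result is what determines which channel or byte a union view like the earlier examples will expose first.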
Why Windows Heap Uses Unions
The Windows heap structures use unions extensively because:
- Space Efficiency: Heap metadata must be minimal to reduce overhead. Unions allow different fields for different states without wasting space.
- Performance: Accessing the same data as different types (like a 32-bit integer or four 8-bit fields) enables efficient manipulation and atomic operations.
- State-Dependent Fields: Free and allocated blocks need different information, and a union lets both layouts share the same bytes, with the block's state determining which one is valid.
- Encoding/Decoding: Unions simplify encoding complex structures into compact formats and decoding them back.
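The space-efficiency and state-dependent points can be seen in a generic fixed-size pool allocator, where a free block stores the free-list link in the same bytes that hold user data once it is allocated. This is an illustrative sketch only, not how the Windows heap is implemented; `Pool`, `PoolBlock`, and the `pool_*` functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCKS 4
#define BLOCK_BYTES 32

/* While free, a block holds the link to the next free block.
   While allocated, the same bytes hold the caller's data. */
typedef union PoolBlock {
    union PoolBlock *nextFree;
    unsigned char payload[BLOCK_BYTES];
} PoolBlock;

static_assert(sizeof(PoolBlock) == BLOCK_BYTES,
              "the free-list link adds no per-block overhead");

typedef struct Pool {
    PoolBlock blocks[POOL_BLOCKS];
    PoolBlock *freeList;
} Pool;

/* Threads every block onto the free list. */
void pool_init(Pool *p)
{
    assert(p != NULL);
    p->freeList = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        p->blocks[i].nextFree = p->freeList;
        p->freeList = &p->blocks[i];
    }
}

/* Pops a block off the free list, or returns NULL if none remain. */
void *pool_alloc(Pool *p)
{
    assert(p != NULL);
    PoolBlock *b = p->freeList;
    if (b == NULL) {
        return NULL;
    }
    p->freeList = b->nextFree;
    return b->payload;
}

/* Pushes a block back; its payload bytes are reused for the link. */
void pool_free(Pool *p, void *ptr)
{
    assert(p != NULL && ptr != NULL);
    PoolBlock *b = (PoolBlock *)ptr;
    b->nextFree = p->freeList;
    p->freeList = b;
}
```

After `pool_init`, each `pool_alloc` call hands out one 32-byte block until the pool is exhausted, and `pool_free` makes a block available again without any separate bookkeeping memory.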