std::mem::transmute the unsafe magic

One great problem that kept me from fully enjoying Rust was the inability to use OOP. Now, before you take me to the gallows, I want you to know that I fully support composition over inheritance, and the concept of trait is one of the primary reasons why I love Rust. However, there are certain situations where OOP is irreplaceable. In my case, it’s the DOM tree structure.

The DOM tree structure was designed in the OOP paradigm. Meaning it relies heavily on inheritance: HTMLElement inherits Element inherits Node.


Extracted from MDN

A straightforward way to implement this behaviour is to use nested structs:

struct HTMLElement {
  element: Element,
  tag_name: String
}

struct Element {
  node: Node,
}

struct Node {
  data: ...
}

This is fine, as long as the data access flows from top to bottom, meaning you can access a property of Node from HTMLElement, but not the other way around. For example, if you have an Element that is also an HTMLElement, there’s no way for you to downcast the Element down to HTMLElement struct to access the tag_name property.

You could argue that this is the Rust “way” and that I should find a way to only access data from the outer struct to the inner struct, but that would certainly reduce the flexibility of the code. Repressing developer experience for the compiler’s happiness is like a slow burning fire that will one day spring out of control. The developer should write the code, not the other way around.

But here we’re pushing the limit of safe Rust. There’s no other choice but to introduce unsafe into the mix, which in truth, is only unsafe for the novice. The great master of the language should draw his/her power from both sides of Rust-Safe and Unsafe; Order and Chaos.

Transmute in Rust is a function that treats a value of one type as another type that you desire, ignoring the type system entirely. In other words, it’s typecasting, a very unsafe type casting, but incredibly powerful. For example, you can cast an array of four u8 into an u32 since four u8 sittings next to each other in memory is a 32-bit memory segment which could be interpreted as a u32.

let a = [0u8, 1u8, 0u8, 0u8];
let b = std::mem::transmute::<[u8; 4], u32>(a);
println!("{}", b); // 256

You probably see where I’m going with this. Even though the DOM tree is a nested structure, the memory layout is still linear. The nested structure only exists in the type system of Rust, so in memory, the address of the HTMLElement is the same as the address of Element and Node (as long as the element and node field is the first field in each struct).

This enables us to use transmute to cast between those types and access the data from the inner struct out to the containing struct.

struct HTMLElement {
  element: Element,
}

struct Element {
  node: Node,
}

struct Node {
  data: String
}

fn main() {

    let html = HTMLElement { element: Element { node: Node { data: String::from("data") } } };

    println!("node data: {}", html.element.node.data);

    let node: Node = unsafe { std::mem::transmute(html) };

    println!("node data: {}", node.data);

    let element: Element = unsafe { std::mem::transmute(node) };

    println!("node data: {}", element.node.data);
}

By default, structs have the Default/Rust representation, which doesn’t guarantee the memory layout due to memory alignment optimization. So the #[repr(C)] attribute is required for your structs to use the C representation and preserve the memory layout.

#[repr(C)]
struct HTMLElement {
  element: Element,
}

#[repr(C)]
struct Element {
  node: Node,
}

#[repr(C)]
struct Node {
  data: String
}

The DOM tree is now infinitely more powerful and flexible. You can even design a convenient API for type casting:

// Not tested yet, don't trust me.
trait Castable {
  fn cast<T>(self) -> T {
    unsafe { std::mem::transmute(self) }
  }
}

let html = HTMLElement::new();
let node: Node = html.cast::<Node>();
let element: Element = node.cast::<Element>();

So unsafe isn’t that unsafe after all, is it? :)

Other cool resources