Rust std study series: Vec

Rust
Author

Ehsan M. Kermani

Published

March 10, 2019

The upcoming series of blog posts will contain my study of Rust standard library. I’ve partially written some parts of the series in scattered places and want to gather them in one place for better and easier access. I intend to update whenever I discover something interesting/important to remember.

I’m referring to implementations inRust stable v1.33.0.

This post covers some details of std::vec that I found important.

Vec

Vec<T> is a dynamic array which only grows and never shrinks automatically.

Acontiguousgrowable arraytype withheap-allocatedcontents.

Rust std doc

Notice the difference between its counterpartstack-allocated fixed-sizearray [T; N] (where at this time N needs to be a specified non-negative integer. Constant generic hopefully will come soon).

Ok! let’s dig in to it. Vec<T> contains(pointer, capacity, length).

Vec pointer

  1. The pointer willnever be null, so it enjoysnull-pointer-optimization.
  2. The pointer maynotactually point to allocated memory for example inVec::new(), vec![] or Vec::with_capacity(0).

Vec capacity and length

  1. The capacity of a vector is the amount of space allocated for any future elements that will be added onto the vector.
  2. The length is the number of actual elements pushed/inserted in the vector.
  3. Vec allocates memory iff mem::size_of::<T>() * capacity() > 0. So it doesnotallocate for a Zero-Sized-Type (ZST) even with positive capacity.
  4. When length matches the capacity, Vec will (re)allocate by a certain growth factor. This makes insertionamortized\(O(1)\). Right now the growth factor is 2. However, comparing to other languages such as C++, Java, etc. it doesn’t seem to be optimal given any global first-fit allocator. Heuristically,1.5or a number a bit less than golden ratio is considered optimal. Here’s the related issue that’s currently open. I found it interesting to dig in!
  5. How about shrink factor? for example, if we pop half of the elements out, would a quarter of the memory be freed? No, actually! That’s if poping all the elements the capacity won’t change leaving a hole on the heap. Therefore, pop (from back) is \(O(1)\) andnotamortized \(O(1).\) If you need to free up some memory, use shrink_to_fit.
  6. If need to use Vec for FFI or as a memory-backed collection, be sure to deallocate memory with from_raw_parts and then drop explicitly.
  7. If used in FFI and need to pass as pointer, for safety remember to call shrink_to_fit or truncate by the length prior to passing the (as_mut_ptr() or as_ptr()) to not pass uninitialized memory buffer.
  8. The order of elements is always guaranteed to be the same if coerced into slice.

Here’s the stripped down definition

struct Vec<T> {
    buf: RawVec<T>,
    len: usize,
}

// Default Global allocator
struct RawVec<T, A: Alloc = Global> {
    ptr: Unique<T>,
    cap: usize,
    a: A,
}

#[repr(transparent)]
struct Unique<T: ?Sized> {
    pointer: *const T,
    _marker: PhantomData<T>,
}

Unique

  1. #[repr(transparent)] enforces that Unique<T>type representation is the same as *const T.
  2. Unique<T> is the covariant version of *mut T (wrt T) and hasstrongersemantic than NonNull<T>.
  3. Unlike *mut T, the pointer mustalways be non-null.
  4. In fact, Box<T> wraps Unique<T> i.e. struct Box<T: ?Sized>(Unique<T>).
  5. It can be accessed in nightly through #![feature(ptr_internals)] and core::ptr::Unique.
  6. If Tis Send/Sync then Unique<T> is Send/Sync.
  7. The presence of PhantomData<T> marker is only important forrustc dropckto understand that we logically own a T causing the main difference between Unique<T> and NonNull<T> where it’s defined as
// NonNull<T> doesn't own the referent whereas Unique<T> does
#[repr(transparent)]
struct NonNull<T: ?Sized> {
    pointer: *const T,
}