Deep Dive into the Implementation of std::vector in libstdc++

Extracting Readable Source Code

To analyze the implementation of std::vector without the noise of external headers, we can use a simple trick with GCC's preprocessor. The following code is compiled to generate a flat source file.

// main.cpp
#include <vector>
int main() {
  std::vector<int> v;
  v.emplace_back(1);
}

Run the preprocessor and redirect output:

g++ -E main.cpp -std=c++11 > vector_source.cpp

In an editor like VSCode, use the regex #.*\n to remove compiler directives. This leaves a clean, navigable source file corresponding to libstdc++ (based on GCC 4.8.5 logic).

Understanding the Allocator Traits

A custom allocator must satisfy specific requirements to work with STL containers. The essential operations include:

  • allocate: Reserve raw memory.
  • deallocate: Release raw memory.
  • max_size: Return the maximum theoretical number of objects the allocator can allocate.
  • construct: Initialize an object in allocated memory (often via placement new).
  • destroy: Call the destructor of an object.

Standard Allocator Implementation

The default std::allocator delegates memory operations to global new and delete.

pointer allocate(size_type __n, const void * = 0) {
  if (__n > this->max_size())
    std::__throw_bad_alloc();
  return static_cast<_Tp *>(::operator new(__n * sizeof(_Tp)));
}

void deallocate(pointer __p, size_type) { ::operator delete(__p); }

The maximum size is calculated based on the process address space width:

size_type max_size() const throw() { return size_t(-1) / sizeof(_Tp); }

Object lifetime management uses placement new and explicit destructor calls:

void construct(pointer __p, const _Tp &__val) {
  ::new ((void *)__p) _Tp(__val);
}

void destroy(pointer __p) { __p->~_Tp(); }

std::vector Architecture

std::vector is a sequence container with dynamic size. In libstdc++, it is defined as:

template <typename _Tp, typename _Alloc = std::allocator<_Tp>>
class vector : protected _Vector_base<_Tp, _Alloc> {};

It uses protected inheritance from _Vector_base, which handles memory management, separating it from the interface logic.

The _Vector_base and Impl Structure

The base class contains a nested struct _Vector_impl which inherits from the allocator type and holds the three critical pointers defining the vector's state:

struct _Vector_impl : public _Tp_alloc_type {
  pointer _M_start;        // Beginning of the data
  pointer _M_finish;       // End of initialized elements (size)
  pointer _M_end_of_storage; // End of allocated capacity

  _Vector_impl() : _Tp_alloc_type(), _M_start(0), _M_finish(0), _M_end_of_storage(0) {}
  
  void _M_swap_data(_Vector_impl &__x) {
    std::swap(_M_start, __x._M_start);
    std::swap(_M_finish, __x._M_finish);
    std::swap(_M_end_of_storage, __x._M_end_of_storage);
  }
};

The base class handles the actual memory lifecycle:

_Vector_base(size_t __n) : _M_impl() { _M_create_storage(__n); }

void _M_create_storage(size_t __n) {
  this->_M_impl._M_start = this->_M_allocate(__n);
  this->_M_impl._M_finish = this->_M_impl._M_start;
  this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
}

Constructors

Key constructors initialize the base and populate data:

// Default constructor
explicit vector(const allocator_type &__a) : _Base(__a) {}

// Initializer list constructor
vector(initializer_list<value_type> __l, const allocator_type &__a = allocator_type())
    : _Base(__a) {
  _M_range_initialize(__l.begin(), __l.end(), random_access_iterator_tag());
}

Core Method Implementations

Size and Capacity

These methods simply calculate the distance between the internal pointers:

size_type size() const {
  return size_type(this->_M_impl._M_finish - this->_M_impl._M_start);
}

size_type capacity() const {
  return size_type(this->_M_impl._M_end_of_storage - this->_M_impl._M_start);
}

Appending Elements (push_back / emplace_back)

When capacity allows, construction happens in-place at the finish pointer. If full, a reallocation occurs.

void push_back(const value_type &__x) {
  if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) {
    _Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x);
    ++this->_M_impl._M_finish;
  } else {
    _M_emplace_back_aux(__x);
  }
}

The reallocation logic allocates a larger buffer (typically doubling), moves existing elements, constructs the new one, and destroys the old buffer.

template <typename... _Args>
void vector<_Tp, _Alloc>::_M_emplace_back_aux(_Args && ...__args) {
  const size_type __len = _M_check_len(size_type(1), "vector::_M_emplace_back_aux");
  pointer __new_start(this->_M_allocate(__len));
  
  // Construct the new element immediately after the old data space
  _Alloc_traits::construct(this->_M_impl, __new_start + size(), std::forward<_Args>(__args)...);
  
  // Move old elements to new storage
  pointer __new_finish = std::__uninitialized_move_if_noexcept_a(
      this->_M_impl._M_start, this->_M_impl._M_finish, __new_start, _M_get_Tp_allocator());
  ++__new_finish;

  // Cleanup old storage
  std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish, _M_get_Tp_allocator());
  _M_deallocate(this->_M_impl._M_start, this->_M_impl._M_end_of_storage - this->_M_impl._M_start);

  // Update pointers
  this->_M_impl._M_start = __new_start;
  this->_M_impl._M_finish = __new_finish;
  this->_M_impl._M_end_of_storage = __new_start + __len;
}

The growth strategy in _M_check_len usually follows a factor of 2 (or similar geometric growth):

size_type _M_check_len(size_type __n, const char *__s) const {
  if (max_size() - size() < __n) __throw_length_error((__s));
  const size_type __len = size() + std::max(size(), __n);
  return (__len < size() || __len > max_size()) ? max_size() : __len;
}

Resizing (resize)

resize adjusts the number of elements. If shrinking, it destroys elements at the end. If growing and capacity is sufficient, it default-cosntructs new elements. If capacity is exceeded, it performs a reallocation similar to push_back.

void resize(size_type __new_size) {
  if (__new_size > size())
    _M_default_append(__new_size - size());
  else if (__new_size < size())
    _M_erase_at_end(this->_M_impl._M_start + __new_size);
}

Memory Management (reserve / shrink_to_fit)

reserve forces a capacity increase if the requetsed size exceeds current capacity.

void vector<_Tp, _Alloc>::reserve(size_type __n) {
  if (__n > this->max_size()) __throw_length_error(("vector::reserve"));
  if (this->capacity() < __n) {
    // Allocate new, move data, deallocate old
    pointer __tmp = _M_allocate_and_copy(...);
    std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish, _M_get_Tp_allocator());
    _M_deallocate(...);
    this->_M_impl._M_start = __tmp;
    this->_M_impl._M_finish = __tmp + __old_size;
    this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
  }
}

shrink_to_fit attempts to reduce capacity to match size by creating a temporary vector with the exact size and swapping:

std::vector<int> v;
v.push_back(1); // size=1, capacity=1
v.push_back(1); // size=2, capacity=2
v.push_back(1); // size=3, capacity=4

// Force shrink
std::vector<int>(v.begin(), v.end()).swap(v); // size=3, capacity=3

Time Complexity Analysis

Complexity Method Explanation
O(1) size(), capacity() Pointer arithmetic
O(1) amortized push_back() Geometric reallocation
O(n) insert() Requires shifting elements
O(n) clear() Destructor calls for all elements
O(n) reserve() Copying elements to new storage

Proof of Amortized O(1) for push_back

Assuming a growth factor of 2, let's analyze n insertions. Let c_i be the cost of the i-th insertion.

  • If no reallocation: c_i = 1.
  • If reallocation occurs (when i-1 is a power of 2): c_i = i.

The total cost sum is:

$$ \sum_{i=1}^n c_i \le n + \sum_{j=0}^{\lfloor \lg n \rfloor} 2^j \le n + 2n = 3n $$

Thus, the amortized cost per operation is bounded by 3, resulting in O(1).

Tags: std::vector libstdc++ C++ STL internals allocator

Posted on Tue, 26 May 2026 19:42:03 +0000 by warmwind