Extracting Readable Source Code
To analyze the implementation of std::vector without the noise of external headers, we can use a simple trick with GCC's preprocessor. The following code is compiled to generate a flat source file.
// main.cpp
#include <vector>
int main() {
std::vector<int> v;
v.emplace_back(1);
}
Run the preprocessor and redirect output:
g++ -E main.cpp -std=c++11 > vector_source.cpp
In an editor like VSCode, use the regex #.*\n to remove compiler directives. This leaves a clean, navigable source file corresponding to libstdc++ (based on GCC 4.8.5 logic).
Understanding the Allocator Traits
A custom allocator must satisfy specific requirements to work with STL containers. The essential operations include:
- allocate: Reserve raw memory.
- deallocate: Release raw memory.
- max_size: Return the maximum theoretical number of objects the allocator can allocate.
- construct: Initialize an object in allocated memory (often via placement new).
- destroy: Call the destructor of an object.
Standard Allocator Implementation
The default std::allocator delegates memory operations to global new and delete.
pointer allocate(size_type __n, const void * = 0) {
if (__n > this->max_size())
std::__throw_bad_alloc();
return static_cast<_Tp *>(::operator new(__n * sizeof(_Tp)));
}
void deallocate(pointer __p, size_type) { ::operator delete(__p); }
The maximum size is calculated based on the process address space width:
size_type max_size() const throw() { return size_t(-1) / sizeof(_Tp); }
Object lifetime management uses placement new and explicit destructor calls:
void construct(pointer __p, const _Tp &__val) {
::new ((void *)__p) _Tp(__val);
}
void destroy(pointer __p) { __p->~_Tp(); }
std::vector Architecture
std::vector is a sequence container with dynamic size. In libstdc++, it is defined as:
template <typename _Tp, typename _Alloc = std::allocator<_Tp>>
class vector : protected _Vector_base<_Tp, _Alloc> {};
It uses protected inheritance from _Vector_base, which handles memory management, separating it from the interface logic.
The _Vector_base and Impl Structure
The base class contains a nested struct _Vector_impl which inherits from the allocator type and holds the three critical pointers defining the vector's state:
struct _Vector_impl : public _Tp_alloc_type {
pointer _M_start; // Beginning of the data
pointer _M_finish; // End of initialized elements (size)
pointer _M_end_of_storage; // End of allocated capacity
_Vector_impl() : _Tp_alloc_type(), _M_start(0), _M_finish(0), _M_end_of_storage(0) {}
void _M_swap_data(_Vector_impl &__x) {
std::swap(_M_start, __x._M_start);
std::swap(_M_finish, __x._M_finish);
std::swap(_M_end_of_storage, __x._M_end_of_storage);
}
};
The base class handles the actual memory lifecycle:
_Vector_base(size_t __n) : _M_impl() { _M_create_storage(__n); }
void _M_create_storage(size_t __n) {
this->_M_impl._M_start = this->_M_allocate(__n);
this->_M_impl._M_finish = this->_M_impl._M_start;
this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
}
Constructors
Key constructors initialize the base and populate data:
// Default constructor
explicit vector(const allocator_type &__a) : _Base(__a) {}
// Initializer list constructor
vector(initializer_list<value_type> __l, const allocator_type &__a = allocator_type())
: _Base(__a) {
_M_range_initialize(__l.begin(), __l.end(), random_access_iterator_tag());
}
Core Method Implementations
Size and Capacity
These methods simply calculate the distance between the internal pointers:
size_type size() const {
return size_type(this->_M_impl._M_finish - this->_M_impl._M_start);
}
size_type capacity() const {
return size_type(this->_M_impl._M_end_of_storage - this->_M_impl._M_start);
}
Appending Elements (push_back / emplace_back)
When capacity allows, construction happens in-place at the finish pointer. If full, a reallocation occurs.
void push_back(const value_type &__x) {
if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage) {
_Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish, __x);
++this->_M_impl._M_finish;
} else {
_M_emplace_back_aux(__x);
}
}
The reallocation logic allocates a larger buffer (typically doubling), moves existing elements, constructs the new one, and destroys the old buffer.
template <typename... _Args>
void vector<_Tp, _Alloc>::_M_emplace_back_aux(_Args && ...__args) {
const size_type __len = _M_check_len(size_type(1), "vector::_M_emplace_back_aux");
pointer __new_start(this->_M_allocate(__len));
// Construct the new element immediately after the old data space
_Alloc_traits::construct(this->_M_impl, __new_start + size(), std::forward<_Args>(__args)...);
// Move old elements to new storage
pointer __new_finish = std::__uninitialized_move_if_noexcept_a(
this->_M_impl._M_start, this->_M_impl._M_finish, __new_start, _M_get_Tp_allocator());
++__new_finish;
// Cleanup old storage
std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish, _M_get_Tp_allocator());
_M_deallocate(this->_M_impl._M_start, this->_M_impl._M_end_of_storage - this->_M_impl._M_start);
// Update pointers
this->_M_impl._M_start = __new_start;
this->_M_impl._M_finish = __new_finish;
this->_M_impl._M_end_of_storage = __new_start + __len;
}
The growth strategy in _M_check_len usually follows a factor of 2 (or similar geometric growth):
size_type _M_check_len(size_type __n, const char *__s) const {
if (max_size() - size() < __n) __throw_length_error((__s));
const size_type __len = size() + std::max(size(), __n);
return (__len < size() || __len > max_size()) ? max_size() : __len;
}
Resizing (resize)
resize adjusts the number of elements. If shrinking, it destroys elements at the end. If growing and capacity is sufficient, it default-cosntructs new elements. If capacity is exceeded, it performs a reallocation similar to push_back.
void resize(size_type __new_size) {
if (__new_size > size())
_M_default_append(__new_size - size());
else if (__new_size < size())
_M_erase_at_end(this->_M_impl._M_start + __new_size);
}
Memory Management (reserve / shrink_to_fit)
reserve forces a capacity increase if the requetsed size exceeds current capacity.
void vector<_Tp, _Alloc>::reserve(size_type __n) {
if (__n > this->max_size()) __throw_length_error(("vector::reserve"));
if (this->capacity() < __n) {
// Allocate new, move data, deallocate old
pointer __tmp = _M_allocate_and_copy(...);
std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish, _M_get_Tp_allocator());
_M_deallocate(...);
this->_M_impl._M_start = __tmp;
this->_M_impl._M_finish = __tmp + __old_size;
this->_M_impl._M_end_of_storage = this->_M_impl._M_start + __n;
}
}
shrink_to_fit attempts to reduce capacity to match size by creating a temporary vector with the exact size and swapping:
std::vector<int> v;
v.push_back(1); // size=1, capacity=1
v.push_back(1); // size=2, capacity=2
v.push_back(1); // size=3, capacity=4
// Force shrink
std::vector<int>(v.begin(), v.end()).swap(v); // size=3, capacity=3
Time Complexity Analysis
| Complexity | Method | Explanation |
|---|---|---|
| O(1) | size(), capacity() | Pointer arithmetic |
| O(1) amortized | push_back() | Geometric reallocation |
| O(n) | insert() | Requires shifting elements |
| O(n) | clear() | Destructor calls for all elements |
| O(n) | reserve() | Copying elements to new storage |
Proof of Amortized O(1) for push_back
Assuming a growth factor of 2, let's analyze n insertions. Let c_i be the cost of the i-th insertion.
- If no reallocation:
c_i = 1. - If reallocation occurs (when
i-1is a power of 2):c_i = i.
The total cost sum is:
$$ \sum_{i=1}^n c_i \le n + \sum_{j=0}^{\lfloor \lg n \rfloor} 2^j \le n + 2n = 3n $$
Thus, the amortized cost per operation is bounded by 3, resulting in O(1).