Coordinate Systems and Transformations
A coordinate system provides a framework for describing spatial information. Typically, a local frame is established with the observer at the origin, allowing descriptions like "three meters ahead." However, in a shared environment, multiple observers lead to conflicting descriptions (e.g., "3m ahead" vs. "3m behind"). To resolve this, a unified World Coordinate System is required to map local observations to a single, consistent global position.
Frame Definition
Geometrically, a frame consists of an origin and a basis (a set of three vectors). Orthonormal bases are standard due to their mathematical convenience. In a frame with origin $\mathbf{p}$ and basis $\{\mathbf{u}, \mathbf{v}, \mathbf{w}\}$, the coordinates $(u, v, w)$ describe the point:
$$ \mathbf{p} + u \mathbf{u} + v\mathbf{v} + w\mathbf{w} $$
In computer graphics, these vectors must themselves be represented relative to a canonical system, usually the World coordinates. For 2D right-handed systems, we typically use origin $\mathbf{o}$ and basis vectors $\mathbf{x}$ (right) and $\mathbf{y}$ (up).
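As a minimal sketch of the frame formula above (plain C++ arrays rather than Eigen, so the snippet stands alone; all names are illustrative):

```cpp
#include <array>
#include <cassert>

// Reconstruct a world-space point from frame coordinates (u, v, w),
// given the frame's origin p and basis vectors {u_vec, v_vec, w_vec}:
// p + u * u_vec + v * v_vec + w * w_vec.
using Vec3 = std::array<float, 3>;

Vec3 frame_to_world(const Vec3& p, const Vec3& u_vec, const Vec3& v_vec,
                    const Vec3& w_vec, float u, float v, float w)
{
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        out[i] = p[i] + u * u_vec[i] + v * v_vec[i] + w * w_vec[i];
    return out;
}
```

With the world axes themselves as the basis, the result is simply the origin offset by $(u, v, w)$.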
Vector Properties
A vector is defined solely by magnitude and direction; its definition is independent of its starting point. In the world coordinate system, a vector $\mathbf{a}$ remains $\mathbf{a}$ regardless of where it is translated, provided its direction and length are preserved.
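This translation invariance can be checked numerically. The sketch below (plain arrays, illustrative names) forms the vector between two points before and after translating both points by the same offset; the result is identical:

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;

// Vector from point a to point b.
Vec3 diff(const Vec3& b, const Vec3& a) {
    return {b[0] - a[0], b[1] - a[1], b[2] - a[2]};
}

// Translate a point by offset t.
Vec3 translate(const Vec3& p, const Vec3& t) {
    return {p[0] + t[0], p[1] + t[1], p[2] + t[2]};
}
```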
Camera Implementation (Local to World)
In the assignment, the camera coordinate system has its origin at position $\mathbf{e}$. Its basis vectors align with the world axes, so the camera looks down the negative Z-axis.
Eigen::Vector3f camera_pos = {0, 0, 5};

Eigen::Matrix4f compute_view_transform(const Eigen::Vector3f& pos)
{
    Eigen::Matrix4f view_matrix = Eigen::Matrix4f::Identity();
    Eigen::Matrix4f translation;
    // Translate by -pos so the camera ends up at the world origin.
    translation << 1, 0, 0, -pos[0],
                   0, 1, 0, -pos[1],
                   0, 0, 1, -pos[2],
                   0, 0, 0, 1;
    view_matrix = translation * view_matrix;
    return view_matrix;
}
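A quick sanity check of this transform, re-implemented here with plain arrays so the snippet stands alone (names are illustrative, not the assignment's actual code): applying the view translation to the camera position itself should land exactly on the origin.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;

// Build the translation part of the view transform: identity with -e in
// the last column, mirroring the Eigen version above.
Mat4 view_translation(float ex, float ey, float ez) {
    Mat4 m{};                                      // zero-initialized
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0f;    // identity diagonal
    m[0][3] = -ex; m[1][3] = -ey; m[2][3] = -ez;   // translate by -e
    return m;
}

// 4x4 matrix times homogeneous column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}
```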
Perspective Projection and Flipping Issues
The rotation matrix logic involves decomposing a vector into components parallel and perpendicular to the rotation axis. The parallel component remains unchanged, while the perpendicular component rotates.
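The decomposition can be sketched directly (plain arrays, illustrative names): split $\mathbf{v}$ into $\mathbf{v}_{\parallel} = (\mathbf{k} \cdot \mathbf{v})\mathbf{k}$ and $\mathbf{v}_{\perp} = \mathbf{v} - \mathbf{v}_{\parallel}$, keep the parallel part, and rotate the perpendicular part in the plane spanned by $\mathbf{v}_{\perp}$ and $\mathbf{k} \times \mathbf{v}_{\perp}$.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<float, 3>;

float dot(const Vec3& a, const Vec3& b) {
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]};
}

// Rotate v around the unit axis k by angle theta:
// v_rot = v_par + cos(theta) * v_perp + sin(theta) * (k x v_perp).
Vec3 rotate(const Vec3& v, const Vec3& k, float theta) {
    float d = dot(k, v);
    Vec3 v_par  = {d*k[0], d*k[1], d*k[2]};            // unchanged component
    Vec3 v_perp = {v[0]-v_par[0], v[1]-v_par[1], v[2]-v_par[2]};
    Vec3 kxv    = cross(k, v_perp);                    // in-plane orthogonal dir
    float c = std::cos(theta), s = std::sin(theta);
    return {v_par[0] + c*v_perp[0] + s*kxv[0],
            v_par[1] + c*v_perp[1] + s*kxv[1],
            v_par[2] + c*v_perp[2] + s*kxv[2]};
}
```

Expanding this expression in matrix form yields the familiar Rodrigues rotation matrix.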
Analyzing the Inverted Triangle
Assuming the following:
- The projection matrix parameters (FOV, aspect ratio) are correct.
- The World Coordinate system has X/Y on the ground plane and Z pointing up (Right-handed).
- The view matrix is implemented correctly, looking towards $-Z$.
- The triangle lies flat on the ground (parallel to X/Y) at a depth of -2m.
With the camera at $z=5$, the triangle is at $z=-2$ in world space, translating to $z=-7$ in camera space.
The Problem: Standard perspective projection implements "foreshortening" where $y' = (n/z) \cdot y$.
- Instructor's Impl: Assuming $n$ and $f$ (near/far) are negative (e.g., $-0.1, -50$), $(n/z)$ is positive. The image scales but does not flip vertically. The transformed $z' = n + f - (nf/z)$ remains negative.
- Code's Impl: The provided code uses positive $zNear=0.1$ and $zFar=50$. Since $z$ is negative ($-7$), the term $(n/z)$ becomes negative. This scales the image and flips the Y-axis.
Furthermore, the standard projection remaps the negative camera-space Z values into a positive range for depth testing. However, the code lacks explicit clipping against the viewing frustum in homogeneous coordinates; it proceeds directly to the perspective divide (vert /= vert.w()) and the viewport transformation. The viewport transform maps Y coordinates assuming a top-left origin (0,0), whereas intuitive display expects a bottom-left origin.
Solution: Change the near and far planes to negative values (e.g., $-0.1, -50$) to match the right-handed coordinate system looking down $-Z$.
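The sign analysis above can be reproduced with two small helpers (illustrative names), plugging in the text's values: camera-space depth $z = -7$ and near/far planes of $\pm 0.1$ and $\pm 50$.

```cpp
#include <cassert>

// Foreshortening factor applied to x and y: n / z.
float scale_factor(float n, float z) { return n / z; }

// Depth after the perspective "squash": z' = n + f - n*f/z.
float depth_after_squash(float n, float f, float z) {
    return n + f - n * f / z;
}
```

With negative near/far the factor is positive (no flip) and the transformed depth stays negative; with positive zNear the factor is negative, which is exactly the vertical flip observed.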
Viewport Transformation and Buffer Writes
OpenCV Mat displays images with the origin at the top-left. To render correctly for human viewing (bottom-left origin), we must flip the Y-coordinate when writing to the framebuffer.
Original Flawed Logic:
void rst::rasterizer::set_pixel(const Eigen::Vector3f& point, const Eigen::Vector3f& color)
{
    // Maps (0,0) to the top-left, but the index calculation can run out of bounds.
    if (point.x() < 0 || point.x() >= width ||
        point.y() < 0 || point.y() >= height) return;
    // Bug: if point.y() is 0, height - 0 = height, causing out-of-bounds access.
    auto idx = (height - point.y()) * width + point.x();
    frame_buf[idx] = color;
}
If point.y() is 0, the index becomes height * width, which lies outside the valid range $[0, height \times width)$. Since std::vector's operator[] performs no bounds checking, the write is undefined behavior: it may not fail immediately, but it corrupts memory.
Corrected Logic:
void rst::rasterizer::set_pixel(const Eigen::Vector3f& point, const Eigen::Vector3f& color)
{
    if (point.x() < 0 || point.x() >= width ||
        point.y() < 0 || point.y() >= height) return;
    // Subtract 1 so the index stays within [0, width * height - 1].
    auto idx = (height - 1 - point.y()) * width + point.x();
    frame_buf[idx] = color;
}
Note that the viewport transformation also maps the Z-buffer depth from the projected range to the normalized device coordinates (NDC) range for depth testing.
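A standalone sketch of such a viewport transform (plain C++; all names are assumptions rather than the assignment's actual code): NDC x and y in $[-1, 1]$ map to pixel coordinates in $[0, width] \times [0, height]$, and NDC z maps to a $[0, 1]$ depth range for the z-buffer.

```cpp
#include <cassert>

struct Screen { float x, y, z; };

// Map normalized device coordinates to screen space.
Screen viewport(float ndc_x, float ndc_y, float ndc_z, int width, int height) {
    Screen s;
    s.x = 0.5f * width  * (ndc_x + 1.0f);  // [-1,1] -> [0,width]
    s.y = 0.5f * height * (ndc_y + 1.0f);  // [-1,1] -> [0,height]
    s.z = 0.5f * (ndc_z + 1.0f);           // [-1,1] -> [0,1] depth
    return s;
}
```

The center of NDC space $(0, 0)$ lands at the center of the screen, and the near-plane depth maps to 0.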