BRAM Overview
BRAM, short for Block RAM, is a dedicated on-chip memory resource embedded in Xilinx FPGAs, optimized for high-throughput data and instruction storage. Unlike distributed RAM constructed from lookup tables (LUTs), BRAM delivers larger storage capacity with minimal logic resource overhead, making it suitable for implementing a wide range of memory functions in digital designs.
Block Memory Generator (BMG) IP Core Features
Xilinx's BMG IP core abstracts the underlying BRAM primitives, so users can configure custom memories without instantiating primitives by hand. Supported memory types are single-port RAM, simple dual-port RAM, true dual-port RAM, single-port ROM, and dual-port ROM. In dual-port configurations the two ports operate independently, each with its own clock, operating mode, optional output registers, and optional pins (such as a port enable). The exception is simple dual-port RAM, which does not support per-port operating mode selection.
Key BMG IP core features include:
- Memory packing algorithms: The IP maps user-defined memory specifications to BRAM primitives using one of three optimization strategies:
  - Minimum area algorithm: Uses the smallest possible number of BRAM primitives to reduce overall memory resource utilization.
  - Low power algorithm: Activates the minimum number of BRAM primitives during each read or write operation to reduce dynamic power consumption.
  - Fixed primitive algorithm: Builds the entire memory from a single type of BRAM primitive to ensure predictable timing behavior.
- Per-port operating modes: Each port can be independently configured for WRITE FIRST, READ FIRST, or NO CHANGE mode.
- Flexible port aspect ratios: The data width of port A can differ from port B by a factor of 1, 2, 4, 8, 16, or 32, supporting cross-width data access requirements.
- Byte write enable: Supports byte-granular write operations, applicable when memory width is a multiple of 8 bits (no parity) or 9 bits (with per-byte parity).
- Optional pipeline stages: Optional registers can be enabled at the BRAM primitive output and at the core output to improve timing in high-frequency designs. Additional pipeline stages inside the output multiplexer are available only when the core output register is enabled.
BMG Port Operating Modes
WRITE FIRST Mode
In WRITE FIRST mode, data being written to memory is also driven onto the data output. When the write enable WEA is high, the input data is written to the target address on the rising clock edge, and the same newly written data appears on DOUTA after that edge. When WEA is low, the data stored at the target address appears on DOUTA one clock cycle after the address is presented.
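The WRITE FIRST behavior can be captured in a few lines of behavioral SystemVerilog. The sketch below is not the BMG netlist; it is a hypothetical single-port model (module name `ram_write_first` and parameters `DW`/`AW` are illustrative, with the 9-bit × 256 size assumed from the examples later in this article) that shows the write data being forwarded to DOUTA on the same edge that stores it.

```systemverilog
// Behavioral sketch of one WRITE_FIRST port; not the BMG implementation.
// Module and parameter names are hypothetical.
module ram_write_first #(
    parameter int DW = 9,   // data width
    parameter int AW = 8    // address width (2**AW words)
) (
    input  logic          clka,
    input  logic          wea,
    input  logic [AW-1:0] addra,
    input  logic [DW-1:0] dina,
    output logic [DW-1:0] douta
);
    logic [DW-1:0] mem [2**AW];

    always_ff @(posedge clka) begin
        if (wea) begin
            mem[addra] <= dina;
            douta      <= dina;        // the data being written is forwarded to DOUTA
        end else begin
            douta      <= mem[addra];  // ordinary synchronous read
        end
    end
endmodule
```

The explicit forwarding branch is what distinguishes this mode: a read issued immediately after a write to the same address sees the same value either way, but during the write itself DOUTA already shows the new data.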
READ FIRST Mode
In READ FIRST mode, a write first outputs the data previously stored at the write address, while the new input data is committed to memory. When WEA is high, the original contents of the target address appear on DOUTA one clock cycle later, and the new data replaces them in storage. Reads behave exactly as in WRITE FIRST mode, with data available one clock cycle after the address is presented.
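A READ FIRST port falls out naturally from nonblocking assignments: the read of `mem[addra]` in the same clocked block still sees the old contents because the write has not yet taken effect. This is a behavioral sketch under the same assumptions as a generic 9-bit × 256 RAM; `ram_read_first` is a hypothetical name, not a BMG output.

```systemverilog
// Behavioral sketch of one READ_FIRST port; not the BMG implementation.
// Module and parameter names are hypothetical.
module ram_read_first #(
    parameter int DW = 9,
    parameter int AW = 8
) (
    input  logic          clka,
    input  logic          wea,
    input  logic [AW-1:0] addra,
    input  logic [DW-1:0] dina,
    output logic [DW-1:0] douta
);
    logic [DW-1:0] mem [2**AW];

    always_ff @(posedge clka) begin
        if (wea)
            mem[addra] <= dina;   // commit the new data
        douta <= mem[addra];      // nonblocking: still reads the OLD contents
    end
endmodule
```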
NO CHANGE Mode
In NO CHANGE mode, the data output holds its previous value during writes on the same port. When WEA is high, the input data is written to the target address but DOUTA remains unchanged, still holding the result of the most recent read. Reads return the data at the target address after one clock cycle, as in the other operating modes.
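In a behavioral model, NO CHANGE simply means that only reads update the output register. The sketch below assumes the same generic port list as a 9-bit × 256 RAM; `ram_no_change` is a hypothetical name used for illustration.

```systemverilog
// Behavioral sketch of one NO_CHANGE port; not the BMG implementation.
// Module and parameter names are hypothetical.
module ram_no_change #(
    parameter int DW = 9,
    parameter int AW = 8
) (
    input  logic          clka,
    input  logic          wea,
    input  logic [AW-1:0] addra,
    input  logic [DW-1:0] dina,
    output logic [DW-1:0] douta
);
    logic [DW-1:0] mem [2**AW];

    always_ff @(posedge clka) begin
        if (wea)
            mem[addra] <= dina;   // write only; DOUTA keeps its last read value
        else
            douta <= mem[addra];  // only reads update DOUTA
    end
endmodule
```

Because DOUTA does not toggle during writes, this mode avoids unnecessary output switching, which is one reason it is often chosen when write bursts do not need read-back.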
Asymmetric Port Width Configuration
The BMG IP supports port aspect ratios from 1:32 to 32:1, meaning the port A data width can be up to 32 times that of port B, and vice versa. For example, a true dual-port RAM configured as 32 bits wide × 2048 deep on port A appears as 8 bits wide × 8192 deep on port B (both views hold 64 Kib). Port A's address bus addra is therefore 11 bits wide (2^11 = 2048), while port B's address bus addrb is 13 bits wide (2^13 = 8192). The 32-bit word at port A address A0 is split into four 8-bit segments at port B addresses B0 = 4 × A0 (lowest 8 bits, [7:0]), B1 = B0 + 1, B2 = B0 + 2, and B3 = B0 + 3 (highest 8 bits, [31:24]).
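The address mapping can be made concrete with a small behavioral model of the 32-bit/8-bit example. To keep it short, this sketch simplifies the true dual-port case to a write-only 32-bit port A and a read-only 8-bit port B; the module name `asym_sdp_sketch` is hypothetical. Storage is modeled at the narrow (port B) granularity, and byte k of a port A word lands at port B address `{addra, k}`, which equals 4 × addra + k.

```systemverilog
// Behavioral sketch of a 32-bit (port A) / 8-bit (port B) aspect ratio.
// Hypothetical module; simplified to A = write-only, B = read-only.
module asym_sdp_sketch (
    input  logic        clka,
    input  logic        wea,
    input  logic [10:0] addra,   // 2048 x 32-bit words
    input  logic [31:0] dina,
    input  logic        clkb,
    input  logic [12:0] addrb,   // 8192 x 8-bit bytes
    output logic [7:0]  doutb
);
    logic [7:0] mem [8192];      // stored at port B granularity

    // Byte k of the word goes to port B address {addra, k} = 4*addra + k,
    // so dina[7:0] lands at B0 = 4*A0 and dina[31:24] at B3 = 4*A0 + 3.
    always_ff @(posedge clka)
        if (wea)
            for (int k = 0; k < 4; k++)
                mem[{addra, 2'(k)}] <= dina[8*k +: 8];

    // Port B: one byte per read, one clkb cycle of latency.
    always_ff @(posedge clkb)
        doutb <= mem[addrb];
endmodule
```

Writing 32'hDDCCBBAA at port A address 5 and then reading port B addresses 20 through 23 returns 8'hAA, 8'hBB, 8'hCC, and 8'hDD in that order.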
Byte Write Functionality
Byte write support gives individual write control over each byte of a wider memory word. With an 8-bit byte size there are no parity bits, and the total memory width must be a multiple of 8; with a 9-bit byte size each byte carries an extra parity bit, and the total width must be a multiple of 9. The byte write enable has one bit per byte. For example, a 32-bit wide memory with 8-bit bytes has a 4-bit WEA: setting WEA[3:0] to 4'b0011 writes only the lower two bytes, DINA[15:0], to the target address, while bits [31:16] of the stored word remain unchanged.
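A behavioral sketch of the 8-bit byte case shows how each WEA bit gates one byte lane. The module name `bytewrite_ram_sketch` and its parameters are hypothetical, and READ FIRST output behavior is assumed for simplicity.

```systemverilog
// Behavioral sketch of a byte-write RAM with 8-bit bytes (no parity).
// Hypothetical names; READ_FIRST output behavior assumed.
module bytewrite_ram_sketch #(
    parameter int NB = 4,   // bytes per word (32-bit word by default)
    parameter int AW = 8    // address width
) (
    input  logic            clka,
    input  logic [NB-1:0]   wea,     // one enable bit per byte
    input  logic [AW-1:0]   addra,
    input  logic [8*NB-1:0] dina,
    output logic [8*NB-1:0] douta
);
    logic [8*NB-1:0] mem [2**AW];

    always_ff @(posedge clka) begin
        for (int b = 0; b < NB; b++)
            if (wea[b])
                mem[addra][8*b +: 8] <= dina[8*b +: 8];  // update byte b only
        douta <= mem[addra];
    end
endmodule
```

Writing 32'h11223344 with WEA = 4'b1111 and then 32'hAABBCCDD with WEA = 4'b0011 to the same address leaves 32'h1122CCDD stored: only the lower two bytes are replaced.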
Optional Output Registers
Optional output registers raise the core's maximum operating frequency by reducing the clock-to-output delay. Two independent register stages can be enabled per port: one at the output of the underlying BRAM primitive and one at the top-level core output. The primitive output register hides the delay of the primitive's internal read path, while the core output register isolates the delay of the output multiplexer. Each enabled stage adds one clock cycle of read latency: with no output registers, read data is available one clock cycle after the address is presented; with one stage enabled, latency is two clock cycles; with both stages enabled, it is three clock cycles.
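The latency rule (1 + one cycle per enabled stage) can be checked with a behavioral model. In this sketch the parameters `PRIM_REG` and `CORE_REG` are hypothetical stand-ins for the two BMG register options, and the memory is preloaded with `mem[i] = i` purely so that the output value identifies which address it came from.

```systemverilog
// Behavioral sketch of read latency vs. optional output registers.
// Hypothetical names; contents preloaded as mem[i] = i for illustration.
module bram_read_latency_sketch #(
    parameter int DW       = 9,
    parameter int AW       = 8,
    parameter bit PRIM_REG = 1'b0,   // primitive output register enabled?
    parameter bit CORE_REG = 1'b0    // core output register enabled?
) (
    input  logic          clka,
    input  logic [AW-1:0] addra,
    output logic [DW-1:0] douta
);
    logic [DW-1:0] mem [2**AW];
    logic [DW-1:0] ram_q, prim_q, core_q;

    initial
        for (int i = 0; i < 2**AW; i++)
            mem[i] = DW'(i);

    always_ff @(posedge clka) begin
        ram_q  <= mem[addra];                  // synchronous BRAM read: 1 cycle
        prim_q <= ram_q;                       // primitive register: +1 cycle
        core_q <= PRIM_REG ? prim_q : ram_q;   // core register: +1 cycle
    end

    // Select the last enabled stage: latency = 1 + PRIM_REG + CORE_REG.
    always_comb begin
        if (CORE_REG)      douta = core_q;
        else if (PRIM_REG) douta = prim_q;
        else               douta = ram_q;
    end
endmodule
```

With both parameters at 0 an address presented before edge 1 appears after edge 1; with both at 1 it appears after edge 3, matching the one-, two-, and three-cycle figures above.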
BMG IP Implementation and Simulation
Single-Port ROM Simulation
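A single-port ROM provides read access to the entire memory space through one port. In this example the ROM is 256 words deep (8-bit address) and 9 bits wide, and it is pre-initialized from a COE file containing the sequential values 1 to 256; the 9-bit width is needed to hold the value 256.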
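```systemverilog
`timescale 1ns / 1ps
// Sweeps the ROM address 0..255 (wrapping) and reads the contents
// through the BMG instance sp_rom_ip.
module sp_rom_test(
    input logic rd_clk,
    input logic rst_n
);
    logic [7:0] rom_rd_addr;    // 256-entry ROM -> 8-bit address
    logic [8:0] rom_data_out;   // 9-bit data, wide enough for the value 256

    // Address counter: increments every rd_clk cycle and wraps from 255 to 0.
    always_ff @(posedge rd_clk or negedge rst_n) begin
        if (!rst_n) begin
            rom_rd_addr <= 8'h0;
        end else begin
            rom_rd_addr <= rom_rd_addr + 8'h1;
        end
    end

    sp_rom_ip sp_rom_inst (
        .clka (rd_clk),
        .addra(rom_rd_addr),
        .douta(rom_data_out)
    );
endmodule
```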
With the optional output registers disabled, simulation shows the address being sampled on a rising edge of rd_clk and the corresponding data appearing on rom_data_out after the next rising edge, i.e. one clock cycle later.
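For simulating sp_rom_test without the Vivado-generated IP, a behavioral stand-in with the same port list can be compiled in its place. This is a hypothetical model, not the BMG simulation netlist: it assumes the 256 × 9 configuration with output registers disabled, and it fills the array with a loop instead of reading the COE file. Because it reuses the module name sp_rom_ip, it must be compiled instead of the generated IP, never alongside it.

```systemverilog
// Hypothetical behavioral stand-in for the generated sp_rom_ip
// (256 x 9 bits, no output registers). Compile INSTEAD of the IP.
module sp_rom_ip (
    input  logic       clka,
    input  logic [7:0] addra,
    output logic [8:0] douta
);
    logic [8:0] mem [256];

    // Same contents as the COE file described above: address N holds N + 1.
    initial
        for (int i = 0; i < 256; i++)
            mem[i] = 9'(i + 1);

    // One-cycle read latency: address sampled on a rising edge, data after it.
    always_ff @(posedge clka)
        douta <= mem[addra];
endmodule
```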
Dual-Port ROM Simulation
Dual-port ROM supports independent read access to the memory space via two separate ports, which can operate in asynchronous clock domains.
```systemverilog
`timescale 1ns / 1ps
// Each port sweeps the full address range in its own clock domain.
module dp_rom_test(
    input logic porta_clk,
    input logic portb_clk,
    input logic rst_n
);
    logic [7:0] porta_rd_addr;
    logic [8:0] porta_data_out;
    logic [7:0] portb_rd_addr;
    logic [8:0] portb_data_out;

    // Port A address counter (porta_clk domain), wraps 255 -> 0.
    always_ff @(posedge porta_clk or negedge rst_n) begin
        if (!rst_n) begin
            porta_rd_addr <= 8'h0;
        end else begin
            porta_rd_addr <= porta_rd_addr + 8'h1;
        end
    end

    // Port B address counter (portb_clk domain), wraps 255 -> 0.
    always_ff @(posedge portb_clk or negedge rst_n) begin
        if (!rst_n) begin
            portb_rd_addr <= 8'h0;
        end else begin
            portb_rd_addr <= portb_rd_addr + 8'h1;
        end
    end

    dp_rom_ip dp_rom_inst (
        .clka (porta_clk),
        .addra(porta_rd_addr),
        .douta(porta_data_out),
        .clkb (portb_clk),
        .addrb(portb_rd_addr),
        .doutb(portb_data_out)
    );
endmodule
```
Each port's read operation follows the same timing as the single-port ROM, with data available one clock cycle after address assertion, independent of the other port's operation.
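The independence of the two ports can be seen in a behavioral stand-in for dp_rom_ip: both ports read one shared array, each from its own clocked process. This is a hypothetical model, not the BMG netlist; it assumes the same 256 × 9 size as the testbench and, since the dual-port ROM contents are not specified here, reuses the 1 to 256 pattern of the single-port example. Compile it instead of the generated IP, not alongside it.

```systemverilog
// Hypothetical behavioral stand-in for dp_rom_ip (256 x 9 bits,
// no output registers). Contents assumed: address N holds N + 1.
module dp_rom_ip (
    input  logic       clka,
    input  logic [7:0] addra,
    output logic [8:0] douta,
    input  logic       clkb,
    input  logic [7:0] addrb,
    output logic [8:0] doutb
);
    logic [8:0] mem [256];

    initial
        for (int i = 0; i < 256; i++)
            mem[i] = 9'(i + 1);

    always_ff @(posedge clka) douta <= mem[addra];   // port A, clka domain
    always_ff @(posedge clkb) doutb <= mem[addrb];   // port B, clkb domain
endmodule
```

Since a ROM is never written, the two read processes cannot conflict, which is why the ports can run from unrelated clocks without any collision rules.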
Single-Port RAM Simulation
Single-port RAM supports both read and write access through a single port: when the write enable wea is high the port performs a write, and when it is low the port performs a read.
```systemverilog
`timescale 1ns / 1ps
module sp_ram_test(
    input logic sys_clk,
    input logic rst_n
);
    logic [7:0] ram_addr;
    logic [8:0] wr_data;
    logic [8:0] rd_data;
    logic       wr_en;

    // Write phase: write value N to address N for N = 0..200 (wr_en = 1).
    // Read phase: after address 200, wr_en drops to 0 and the address
    // sweeps 0..200 repeatedly, reading back the stored values.
    always_ff @(posedge sys_clk or negedge rst_n) begin
        if (!rst_n) begin
            ram_addr <= 8'h0;
            wr_en    <= 1'b1;
            wr_data  <= 9'h0;
        end else if (ram_addr == 8'd200) begin
            ram_addr <= 8'h0;
            wr_en    <= 1'b0;
        end else begin
            ram_addr <= ram_addr + 8'h1;
            wr_data  <= wr_data + 9'h1;
        end
    end

    sp_ram_ip sp_ram_inst (
        .clka (sys_clk),
        .wea  (wr_en),
        .addra(ram_addr),
        .dina (wr_data),
        .douta(rd_data)
    );
endmodule
```
While wr_en is high, the testbench writes the sequential values 0 to 200 to addresses 0 to 200. After the write sequence completes, wr_en is deasserted to start read operations, and the stored data appears on rd_data one clock cycle after each address is presented.
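Because the testbench stores value N at address N, the read-back can be made self-checking instead of relying on waveform inspection. The hypothetical checker below remembers the address of each read and, one cycle later, compares rd_data against it (assuming one cycle of read latency, i.e. no output registers). It uses `!==` so that X values are also flagged.

```systemverilog
// Hypothetical self-checking monitor for sp_ram_test's read phase.
// Assumes stored data == address and a one-cycle read latency.
module sp_ram_readback_check (
    input  logic       clk,
    input  logic       rst_n,
    input  logic       wr_en,
    input  logic [7:0] addr,
    input  logic [8:0] rd_data,
    output logic       mismatch    // sticky error flag
);
    logic       rd_pending;   // a read was issued on the previous edge
    logic [7:0] rd_addr_q;    // address of that read

    always_ff @(posedge clk or negedge rst_n) begin
        if (!rst_n) begin
            rd_pending <= 1'b0;
            rd_addr_q  <= '0;
            mismatch   <= 1'b0;
        end else begin
            rd_pending <= !wr_en;
            rd_addr_q  <= addr;
            // The data for the previous read is now on rd_data.
            if (rd_pending && rd_data !== {1'b0, rd_addr_q})
                mismatch <= 1'b1;
        end
    end
endmodule
```

Inside sp_ram_test it could be connected as `sp_ram_readback_check chk (.clk(sys_clk), .rst_n, .wr_en, .addr(ram_addr), .rd_data, .mismatch());` and `mismatch` watched in the waveform or asserted at the end of simulation.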
Simple Dual-Port RAM Simulation
Simple dual-port RAM separates read and write operations to dedicated ports: port A supports only write operations, while port B supports only read operations.
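```systemverilog
`timescale 1ns / 1ps
module sdp_ram_test(
    input logic wr_clk,
    input logic rd_clk,
    input logic rst_n
);
    logic [7:0] wr_port_addr;
    logic [8:0] wr_data;
    logic       wr_en;
    logic [7:0] rd_port_addr;
    logic [8:0] rd_data;

    // Port A (write side, wr_clk domain): write value N to address N for
    // N = 0..200, then deassert wr_en.
    always_ff @(posedge wr_clk or negedge rst_n) begin
        if (!rst_n) begin
            wr_port_addr <= 8'h0;
            wr_en        <= 1'b1;
            wr_data      <= 9'h0;
        end else if (wr_port_addr == 8'd200) begin
            wr_port_addr <= 8'h0;
            wr_en        <= 1'b0;
        end else begin
            wr_port_addr <= wr_port_addr + 8'h1;
            wr_data      <= wr_data + 9'h1;
        end
    end

    // Port B (read side, rd_clk domain): the read address is generated from
    // rd_clk so that addrb is synchronous to clkb. It sweeps 0..200
    // continuously; reads return the stored values once port A has finished.
    always_ff @(posedge rd_clk or negedge rst_n) begin
        if (!rst_n)
            rd_port_addr <= 8'h0;
        else if (rd_port_addr == 8'd200)
            rd_port_addr <= 8'h0;
        else
            rd_port_addr <= rd_port_addr + 8'h1;
    end

    sdp_ram_ip sdp_ram_inst (
        .clka (wr_clk),
        .wea  (wr_en),
        .addra(wr_port_addr),
        .dina (wr_data),
        .clkb (rd_clk),
        .addrb(rd_port_addr),
        .doutb(rd_data)
    );
endmodule
```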
Port A first writes the values 0 to 200 to addresses 0 to 200. Once the write sequence completes, wr_en is deasserted and port B reads back the stored data, with each value available one rd_clk cycle after its read address is presented on port B.
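A behavioral stand-in for sdp_ram_ip makes the port split explicit: one process writes on clka, another reads on clkb. This is a hypothetical model for simulation without the generated IP (256 × 9, no output registers, sizes inferred from the testbench); compile it instead of the IP, not alongside it.

```systemverilog
// Hypothetical behavioral stand-in for sdp_ram_ip (256 x 9 bits,
// no output registers). Compile INSTEAD of the generated IP.
module sdp_ram_ip (
    input  logic       clka,
    input  logic       wea,
    input  logic [7:0] addra,
    input  logic [8:0] dina,
    input  logic       clkb,
    input  logic [7:0] addrb,
    output logic [8:0] doutb
);
    logic [8:0] mem [256];

    always_ff @(posedge clka)          // port A: write-only, clka domain
        if (wea) mem[addra] <= dina;

    always_ff @(posedge clkb)          // port B: read-only, clkb domain
        doutb <= mem[addrb];
endmodule
```

A read of an address in the same instant it is being written from the other clock domain is a collision; the model returns whichever value the simulator orders first, whereas real hardware gives no such guarantee.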
True Dual-Port RAM Simulation
True dual-port RAM supports independent read and write operations on both ports, which can operate in asynchronous clock domains.
```systemverilog
`timescale 1ns / 1ps
// Each port writes its own half of the 256-word memory (port A: 0-127,
// port B: 128-255), so the two ports never write the same address at the
// same time, then reads that half back.
module tdp_ram_test(
    input logic porta_clk,
    input logic portb_clk,
    input logic rst_n
);
    logic [7:0] porta_addr;
    logic [8:0] porta_wr_data;
    logic       porta_wr_en;
    logic [8:0] porta_rd_data;
    logic [7:0] portb_addr;
    logic [8:0] portb_wr_data;
    logic       portb_wr_en;
    logic [8:0] portb_rd_data;

    // Port A: write value N to address N for N = 0..127, then read 0..127.
    always_ff @(posedge porta_clk or negedge rst_n) begin
        if (!rst_n) begin
            porta_addr    <= 8'd0;
            porta_wr_en   <= 1'b1;
            porta_wr_data <= 9'd0;
        end else if (porta_addr == 8'd127) begin
            porta_addr  <= 8'd0;
            porta_wr_en <= 1'b0;
        end else begin
            porta_addr    <= porta_addr + 8'd1;
            porta_wr_data <= porta_wr_data + 9'd1;
        end
    end

    // Port B: write value N to address N for N = 128..255, then read 128..255.
    always_ff @(posedge portb_clk or negedge rst_n) begin
        if (!rst_n) begin
            portb_addr    <= 8'd128;
            portb_wr_en   <= 1'b1;
            portb_wr_data <= 9'd128;
        end else if (portb_addr == 8'd255) begin
            portb_addr  <= 8'd128;
            portb_wr_en <= 1'b0;
        end else begin
            portb_addr    <= portb_addr + 8'd1;
            portb_wr_data <= portb_wr_data + 9'd1;
        end
    end

    tdp_ram_ip tdp_ram_inst (
        .clka (porta_clk),
        .wea  (porta_wr_en),
        .addra(porta_addr),
        .dina (porta_wr_data),
        .douta(porta_rd_data),
        .clkb (portb_clk),
        .web  (portb_wr_en),
        .addrb(portb_addr),
        .dinb (portb_wr_data),
        .doutb(portb_rd_data)
    );
endmodule
```
Each port follows the same timing as the single-port RAM, independent of the other port's activity. If both ports write the same address at the same time, the stored data is undefined (a write-write collision), so designs must prevent this condition.
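During simulation, a simple monitor can flag write-write collisions. This hypothetical module only makes sense when both ports share one clock: with independent clocks, whether two writes collide depends on the relative timing of the two edges, which a single-clock check cannot see. It covers only the write-write case described above, not read-during-write interactions.

```systemverilog
// Hypothetical write-write collision monitor for a true dual-port RAM.
// Assumes both ports are driven by the same clock.
module tdp_collision_monitor #(
    parameter int AW = 8
) (
    input  logic          clk,
    input  logic          wea,
    input  logic          web,
    input  logic [AW-1:0] addra,
    input  logic [AW-1:0] addrb,
    output logic          ww_collision
);
    // Both ports writing the same address in the same cycle.
    assign ww_collision = wea && web && (addra == addrb);

    always @(posedge clk)
        if (ww_collision)
            $error("write-write collision at address %0d", addra);
endmodule
```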