Theoretical Framework of the DUET Algorithm
The Degenerate Unmixing Estimation Technique (DUET) is a popular method for Blind Source Separation (BSS) when the number of sources exceeds the number of sensors (the underdetermined case). It relies on the assumption of W-disjoint orthogonality, which posits that the time-frequency representations of multiple sources do not overlap significantly. In a two-channel setup, the mixing model can be expressed as:
x1(t) = Σ s_i(t)
x2(t) = Σ a_i * s_i(t - d_i)
Where a_i represents the relative attenuation and d_i represents the relative time delay of the i-th source. By transforming the signals into the Short-Time Fourier Transform (STFT) domain, we can estimate these parameters at each time-frequency point.
Step-by-Step MATLAB Implementation
1. Signal Transformation and Preprocessing
First, we convert the time-domain mixture into the time-frequency domain using STFT. We utilize a Hamming window and define specific FFT lengths to balance frequency resolution and temporal localization.
% Load stereo mixture
[input_sig, sampling_rate] = audioread('mixture_signal.wav');
ch1 = input_sig(:, 1);
ch2 = input_sig(:, 2);
% STFT Configuration
w_size = 1024;
hop_size = 512;
n_fft = 2048;
[S1, F_axis, T_axis] = stft(ch1, sampling_rate, 'Window', hamming(w_size), 'OverlapLength', w_size - hop_size, 'FFTLength', n_fft);
[S2, ~, ~] = stft(ch2, sampling_rate, 'Window', hamming(w_size), 'OverlapLength', w_size - hop_size, 'FFTLength', n_fft);
2. Estimating Attenuation and Delay Parameters
At each time-frequency point (t, f), we calculate the complex ratio of the two channels. The magnitude of this ratio provides the attenuation, while the phase provides the delay informasion.
% Compute the complex ratio map
tf_ratio = S2 ./ (S1 + eps);
% Relative attenuation (alpha)
attenuation_map = abs(tf_ratio);
% Relative delay (delta)
% Frequency values are reshaped for vectorized calculation
freq_mat = repmat(F_axis, 1, size(S1, 2));
delay_map = -angle(tf_ratio) ./ (2 * pi * freq_mat + eps);
3. Two-Dimensional Histogarm and Cluster Identification
By plotting a 2D histogram of the attenuation and delay values, we can identify clusters that correspond to individual sources. Each peak in the histogram represents a source's spatial signature.
% Define histogram bins
alpha_bins = linspace(0, 2, 100);
delta_bins = linspace(-2e-3, 2e-3, 100);
% Generate 2D weighted histogram
[counts, x_edges, y_edges] = histcounts2(attenuation_map(:), delay_map(:), alpha_bins, delta_bins);
% Smooth the histogram to improve peak detection
smoothed_hist = imgaussfilt(counts, 1.2);
% Identify peaks (coordinates of source signatures)
source_peaks = imregionalmax(smoothed_hist);
[peak_indices_x, peak_indices_y] = find(source_peaks);
4. Masking and Signal Reconstruction
Once the source centers are identified, we create binary or soft masks to isolate the time-frequency components of each source. These masks are then applied to the original STFT of the mixture, followed by an Inverse STFT (ISTFT) to recover the time-domain signals.
num_sources = length(peak_indices_x);
recovered_signals = zeros(length(ch1), num_sources);
for k = 1:num_sources
% Calculate distance from the k-th cluster center
target_alpha = x_edges(peak_indices_x(k));
target_delta = y_edges(peak_indices_y(k));
% Masking logic based on proximity
dist_map = abs(attenuation_map - target_alpha) + abs(delay_map - target_delta) * 1000;
source_mask = dist_map < 0.2; % Threshold for selection
% Extract and Inverse Transform
source_stft = S1 .* source_mask;
recovered_signals(:, k) = istft(source_stft, 'Window', hamming(w_size), 'OverlapLength', w_size - hop_size, 'FFTLength', n_fft);
end
Optimization Strategies
To improve the performance of the DUET algorithm, several refinements can be implemented:
- Weighting the Histogram: Instead of simple counts, weight the histogram by the energy of the time-frequency points (
|S1| * |S2|) to prioritize high-energy components. - Phase Unwrapping: In high-frequency regions, the phase may wrap around, leading to aliasing in delay estimation. Using phase unwrapping techniques or focusing on lower frequency bins can mitigate this.
- Clustering Algorithms: Instead of simple peak finding, using K-means or Gaussian Mixture Models (GMM) on the
(alpha, delta)pairs can yield more robust source localization.
Comprehensive Implementation Function
function out_sources = perform_duet(y1, y2, fs)
% Parameters
n_win = 512;
n_fft = 1024;
% Spectral analysis
[X1, F, T] = stft(y1, fs, 'Window', hann(n_win), 'FFTLength', n_fft);
[X2, ~, ~] = stft(y2, fs, 'Window', hann(n_win), 'FFTLength', n_fft);
% Parametric map extraction
H_ratio = X2 ./ (X1 + eps);
alpha_map = abs(H_ratio);
f_grid = repmat(F, 1, width(X1));
delta_map = -angle(H_ratio) ./ (2 * pi * f_grid + eps);
% Clustering via Energy-Weighted Histogram
weight = abs(X1) .* abs(X2);
% [Logic for weighted histogram and peak detection goes here...]
% Synthesis
% [Logic for ISTFT reconstruction goes here...]
out_sources = []; % Placeholder for actual signal matrix
end