
AI-Based Beamforming for mmWave and THz Systems: From Classical to Neural Approaches

Detailed technical look at AI-driven beamforming for millimeter wave and terahertz massive MIMO systems—from hybrid beamforming architectures to deep learning methods, RIS-aided systems, and near-field beamforming for 6G ultra-massive MIMO.

6 min read

Introduction

Millimeter wave (mmWave, 24-100 GHz) and terahertz (THz, 0.1-10 THz) frequencies are essential for 5G/6G to deliver multi-gigabit data rates. However, these high frequencies suffer from severe path loss—free-space path loss increases with the square of frequency:

L_{\text{free-space}} = \left(\frac{4\pi d f}{c}\right)^2

where d is distance, f is frequency, and c is the speed of light. At 100 GHz, free-space path loss is roughly 30 dB worse than at 3 GHz for the same distance (20 log10(100/3) ≈ 30.5 dB). Additionally, mmWave/THz signals are absorbed by oxygen (60 GHz) and water vapor (183 GHz, 325 GHz), and are blocked by walls and even human bodies.
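That quadratic frequency dependence is easy to check numerically. A minimal sketch (the gap between bands is just 20·log10 of the frequency ratio, independent of distance):

```python
import math

def fspl_db(d_m: float, f_hz: float) -> float:
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    c = 3e8  # speed of light (m/s)
    return 20 * math.log10(4 * math.pi * d_m * f_hz / c)

loss_3ghz = fspl_db(100, 3e9)      # ≈ 82.0 dB at 100 m
loss_100ghz = fspl_db(100, 100e9)  # ≈ 112.4 dB at 100 m
delta = loss_100ghz - loss_3ghz    # ≈ 30.5 dB = 20*log10(100/3)
```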

The solution: Beamforming. By focusing transmitted power into narrow beams toward intended receivers, we can:

  • Overcome path loss (20-30 dB beamforming gain)
  • Increase SNR and extend range
  • Reduce interference to other users
  • Enable spatial multiplexing (serving multiple users simultaneously)

Traditional beamforming methods—zero-forcing (ZF), minimum mean square error (MMSE), maximum ratio transmission (MRT)—face critical challenges at mmWave/THz:

  1. Computational complexity: O(N³) matrix inversions are infeasible for 256-1024 antenna arrays
  2. Hybrid architectures: Limited RF chains require joint analog/digital optimization
  3. Near-field effects: At THz with ultra-massive MIMO (UM-MIMO), far-field approximation breaks down
  4. High mobility: 500+ km/h vehicles at 6G frequencies require ultra-fast beam tracking
  5. RIS integration: Reconfigurable Intelligent Surfaces add another optimization dimension

Deep learning has emerged as a transformative solution. Recent 2025 research demonstrates:

  • 10-20× speedup in hybrid precoder design vs. iterative algorithms
  • Near-optimal performance matching model-based methods
  • Robustness to imperfect CSI and hardware impairments
  • Adaptive beam tracking with 12.2% RMSE reduction at 500 km/h

This post provides a complete technical treatment: classical beamforming fundamentals, hybrid analog-digital architectures, state-of-the-art deep learning approaches (supervised, unsupervised, reinforcement learning), RIS-aided beamforming, near-field considerations for THz UM-MIMO, and production deployment strategies.

Prerequisites: Linear algebra (matrix operations, SVD), wireless communication (MIMO, OFDM, channel models), deep learning basics.

Part I: Classical Beamforming Foundations

Digital Beamforming: The Ideal Case

In fully-digital beamforming, each antenna has a dedicated RF chain (mixer, ADC/DAC), enabling arbitrary complex weights.

System Model (downlink):

\mathbf{y} = \mathbf{H} \mathbf{W} \mathbf{s} + \mathbf{n}

Where:

  • y ∈ ℂ^(K × 1): received signal (K users)
  • H ∈ ℂ^(K × N_t): channel matrix
  • W ∈ ℂ^(N_t × K): digital precoder (beamforming matrix)
  • s ∈ ℂ^(K × 1): data symbols
  • n ~ CN(0, σ²I): noise

Beamforming Goal: Design W to maximize the sum rate subject to a power constraint:

\max_{\mathbf{W}} \sum_{k=1}^K \log_2\left(1 + \frac{|\mathbf{h}_k^H \mathbf{w}_k|^2}{\sum_{j \neq k} |\mathbf{h}_k^H \mathbf{w}_j|^2 + \sigma^2}\right)

\text{s.t.} \quad \|\mathbf{W}\|_F^2 \leq P_{\text{total}}

Classical Solutions:

1. Zero-Forcing (ZF):

Eliminates inter-user interference:

\mathbf{W}_{\text{ZF}} = \mathbf{H}^H (\mathbf{H} \mathbf{H}^H)^{-1} \mathbf{D}

where D is a diagonal power-allocation matrix.

  • Pros: Simple, zero interference
  • Cons: Noise amplification, requires N_t ≥ K, O(K³) complexity

2. Maximum Ratio Transmission (MRT):

Maximizes received signal power:

\mathbf{W}_{\text{MRT}} = \mathbf{H}^H \mathbf{D}

  • Pros: Simplest, optimal for single user
  • Cons: High interference for multi-user

3. MMSE (Regularized ZF):

Balances signal power and interference:

\mathbf{W}_{\text{MMSE}} = \mathbf{H}^H (\mathbf{H} \mathbf{H}^H + \alpha \mathbf{I})^{-1} \mathbf{D}

where α = σ²/P is the regularization parameter.

  • Pros: Better than ZF at low SNR (limits noise amplification)
  • Cons: Still requires an O(K³) matrix inversion
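Each of the three closed forms above is a couple of NumPy lines. A minimal sketch, assuming unit power allocation (D = I) and a Frobenius-norm power constraint:

```python
import numpy as np

def classical_precoders(H: np.ndarray, noise_var: float, P: float):
    """Return MRT, ZF, and MMSE precoders for channel H (K x N_t), with D = I."""
    K = H.shape[0]
    W_mrt = H.conj().T                                 # maximize received power
    W_zf = H.conj().T @ np.linalg.inv(H @ H.conj().T)  # null inter-user interference
    alpha = noise_var / P                              # MMSE regularization
    W_mmse = H.conj().T @ np.linalg.inv(H @ H.conj().T + alpha * np.eye(K))
    # Scale each precoder to meet ||W||_F^2 = P
    scale = lambda W: W * np.sqrt(P) / np.linalg.norm(W, 'fro')
    return scale(W_mrt), scale(W_zf), scale(W_mmse)

rng = np.random.default_rng(0)
K, N_t = 4, 16
H = (rng.standard_normal((K, N_t)) + 1j * rng.standard_normal((K, N_t))) / np.sqrt(2)
W_mrt, W_zf, W_mmse = classical_precoders(H, noise_var=0.1, P=1.0)
# ZF zeroes the off-diagonal (inter-user) terms of H @ W_zf
off_diag = H @ W_zf - np.diag(np.diag(H @ W_zf))
```

Note that H @ W_zf is diagonal by construction, which is exactly the "zero interference" property, while W_mmse interpolates between ZF (alpha → 0) and MRT (alpha large).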

The mmWave Challenge: Why Hybrid Beamforming?

At mmWave frequencies with 64-256 antennas, fully-digital beamforming is impractical:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│              DIGITAL vs HYBRID BEAMFORMING                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  FULLY DIGITAL (Ideal but Impractical at mmWave):                      │
│  ────────────────────────────────────────────────                        │
│                                                                          │
│  ┌────┐   ┌─────┐   ┌─────┐   ┌──────┐                                │
│  │ BB │──▶│ DAC │──▶│ Mix │──▶│  PA  │──▶ Antenna 1                   │
│  └────┘   └─────┘   └─────┘   └──────┘                                │
│  ┌────┐   ┌─────┐   ┌─────┐   ┌──────┐                                │
│  │ BB │──▶│ DAC │──▶│ Mix │──▶│  PA  │──▶ Antenna 2                   │
│  └────┘   └─────┘   └─────┘   └──────┘                                │
│    ⋮         ⋮         ⋮         ⋮                                      │
│  ┌────┐   ┌─────┐   ┌─────┐   ┌──────┐                                │
│  │ BB │──▶│ DAC │──▶│ Mix │──▶│  PA  │──▶ Antenna N_t                 │
│  └────┘   └─────┘   └─────┘   └──────┘                                │
│                                                                          │
│  • One RF chain per antenna                                             │
│  • Full control, arbitrary beamforming                                  │
│  • Cost: $10K-50K per RF chain at mmWave                               │
│  • Power: 1-5W per chain                                                │
│  • For 256 antennas: $2.5M+, 1.3kW!                                    │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  HYBRID (Practical Solution):                                           │
│  ──────────────────────────                                              │
│                                                                          │
│                      ┌─────────────────────┐                            │
│                      │ Analog Beamformer   │                            │
│                      │ (Phase Shifters)    │                            │
│                      │   N_t × N_RF       │                            │
│                      └──────────┬──────────┘                            │
│                                 │                                        │
│                     ┌───────────┼───────────┐                           │
│                     │           │           │                           │
│                     ▼           ▼           ▼                           │
│                  Ant 1 ... Ant N_t/N_RF  ...                           │
│                                                                          │
│  ┌────────────┐   ┌─────┐   ┌─────┐   ┌──────┐                        │
│  │  Digital   │──▶│ DAC │──▶│ Mix │──▶│ Analog│──▶ Antennas 1-64      │
│  │ Precoder   │   └─────┘   └─────┘   │Precoder│                      │
│  │            │   ┌─────┐   ┌─────┐   │       │                        │
│  │ (Baseband) │──▶│ DAC │──▶│ Mix │──▶│  (RF) │──▶ Antennas 65-128    │
│  │            │   └─────┘   └─────┘   │       │                        │
│  │   F_BB     │     ⋮         ⋮       │ F_RF  │         ⋮              │
│  │ N_RF × K   │   ┌─────┐   ┌─────┐   │       │                        │
│  │            │──▶│ DAC │──▶│ Mix │──▶│       │──▶ Antennas 193-256    │
│  └────────────┘   └─────┘   └─────┘   └──────┘                        │
│                                                                          │
│  • N_RF << N_t RF chains (e.g., 4-16 chains for 256 antennas)         │
│  • Analog beamformer: phase shifters (constant magnitude)              │
│  • Digital precoder: full flexibility within N_RF streams              │
│  • Cost: 10-20× lower than fully digital                               │
│  • Power: 5-10× lower                                                   │
│                                                                          │
│  EFFECTIVE PRECODER:                                                     │
│  W = F_RF × F_BB                                                        │
│                                                                          │
│  Challenge: Joint optimization of F_RF and F_BB with constraint        │
│  that F_RF has constant-magnitude entries (phase-only control)          │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Hybrid Beamforming Problem Formulation

Objective: Approximate the optimal digital precoder W_opt with the hybrid structure:

\min_{\mathbf{F}_{\text{RF}}, \mathbf{F}_{\text{BB}}} \|\mathbf{W}_{\text{opt}} - \mathbf{F}_{\text{RF}} \mathbf{F}_{\text{BB}}\|_F^2

Constraints:

  1. |F_RF(i,j)| = 1/√N_t (constant magnitude for phase shifters)
  2. ‖F_RF F_BB‖_F² ≤ P_total (power constraint)

This is a non-convex, NP-hard problem due to the constant-magnitude constraint.
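One saving grace is that the constant-magnitude constraint has a cheap Euclidean projection: keep each entry's phase and fix its magnitude. Alternating-minimization style approaches exploit exactly this step; a minimal sketch of the projection (not tied to any specific paper):

```python
import numpy as np

def project_constant_modulus(F: np.ndarray) -> np.ndarray:
    """Entrywise-nearest matrix with all magnitudes equal to 1/sqrt(N_t)."""
    N_t = F.shape[0]
    return np.exp(1j * np.angle(F)) / np.sqrt(N_t)

rng = np.random.default_rng(1)
F_unconstrained = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
F_RF = project_constant_modulus(F_unconstrained)
# Every entry now satisfies |F_RF(i, j)| = 1/sqrt(8), as required
```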

Classical Solution: Orthogonal Matching Pursuit (OMP):

Iteratively select analog beamforming vectors:

Python
import numpy as np

def omp_hybrid_precoding(H, K, N_RF, N_t, P_total=1.0):
    """OMP algorithm for hybrid precoding.

    Assumes get_array_response_matrix(N_t) returns a codebook of candidate
    steering vectors (N_t x N_c), e.g. the columns of a DFT matrix.
    """
    # Step 1: Optimal unconstrained precoder = first K right singular vectors
    U, S, Vh = np.linalg.svd(H)
    W_opt = Vh[:K, :].conj().T  # (N_t, K)

    # Step 2: Initialize
    F_RF = []
    residual = W_opt

    # Step 3: Greedily select RF beamforming vectors
    A = get_array_response_matrix(N_t)  # DFT codebook (N_t, N_c)
    for _ in range(N_RF):
        # Codebook column most correlated with the current residual
        correlations = A.conj().T @ residual
        best_idx = np.argmax(np.abs(correlations).sum(axis=1))
        F_RF.append(A[:, best_idx])

        # Residual: W_opt minus its projection onto the span of selected beams
        F = np.array(F_RF).T  # (N_t, n_selected)
        residual = W_opt - F @ np.linalg.pinv(F) @ W_opt

    F_RF = np.array(F_RF).T  # (N_t, N_RF)

    # Step 4: Least-squares digital precoder
    F_BB = np.linalg.pinv(F_RF) @ W_opt  # (N_RF, K)

    # Step 5: Normalize to satisfy the power constraint
    F_BB = F_BB * np.sqrt(P_total) / np.linalg.norm(F_RF @ F_BB, 'fro')

    return F_RF, F_BB

Limitations:

  • Iterative and slow (O(N_t² N_RF K) per iteration)
  • Sensitive to codebook design
  • Suboptimal due to greedy selection

Part II: Deep Learning for Hybrid Beamforming

Architecture 1: Supervised Learning

Idea: Learn the mapping H → {F_RF, F_BB} from labeled data.

Two-Stage Deep Learning Approach (2022-2025 state-of-the-art):

Code
┌─────────────────────────────────────────────────────────────────────────┐
│              TWO-STAGE DL HYBRID PRECODING                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  STAGE 1: Analog Precoder Design (Classification)                      │
│  ───────────────────────────────────────────────                        │
│                                                                          │
│  Input: Channel H ∈ ℂ^(K × N_t)                                        │
│         │                                                                │
│         ▼                                                                │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │  Feature Extraction (CNN/MLP)                                │       │
│  │  • Convert H to [Re(H), Im(H)]                              │       │
│  │  • Extract spatial patterns                                  │       │
│  └──────────────────┬───────────────────────────────────────────┘       │
│                     │                                                    │
│                     ▼                                                    │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │  Beam Selection (Softmax Classifier)                        │       │
│  │  • Select N_RF beams from codebook of size N_c              │       │
│  │  • Output: N_RF indices from codebook                       │       │
│  └──────────────────┬───────────────────────────────────────────┘       │
│                     │                                                    │
│                     ▼                                                    │
│  F_RF = [a(θ₁), a(θ₂), ..., a(θ_{N_RF})]                              │
│         where a(θ) = steering vector at angle θ                        │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  STAGE 2: Digital Precoder Design (Regression)                         │
│  ───────────────────────────────────────────                            │
│                                                                          │
│  Input: H, F_RF                                                         │
│         │                                                                │
│         ▼                                                                │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │  Equivalent Channel                                          │       │
│  │  H_eff = H × F_RF  ∈ ℂ^(K × N_RF)                          │       │
│  └──────────────────┬───────────────────────────────────────────┘       │
│                     │                                                    │
│                     ▼                                                    │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │  Neural Network (MLP)                                        │       │
│  │  • Input: [Re(H_eff), Im(H_eff)] flattened                 │       │
│  │  • Hidden layers with ReLU                                   │       │
│  │  • Output: [Re(F_BB), Im(F_BB)] flattened                  │       │
│  └──────────────────┬───────────────────────────────────────────┘       │
│                     │                                                    │
│                     ▼                                                    │
│  F_BB ∈ ℂ^(N_RF × K)                                                   │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  Final Precoder: W = F_RF × F_BB                                       │
│                                                                          │
│  ADVANTAGES:                                                             │
│  • 10-20× faster than iterative OMP                                    │
│  • Near-optimal spectral efficiency                                     │
│  • Handles imperfect CSI gracefully                                     │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

PyTorch Implementation:

Python
import torch
import torch.nn as nn

class AnalogPrecoderNet(nn.Module):
    """Stage 1: Learn to select analog beamforming vectors"""
    def __init__(self, K, N_t, N_RF, codebook_size):
        super().__init__()
        input_dim = 2 * K * N_t  # Real and imag parts of H

        self.network = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, codebook_size)
        )

        # Select top N_RF beams
        self.N_RF = N_RF

    def forward(self, H):
        """
        Args:
            H: Channel matrix (batch, K, N_t, 2)  [2 = real, imag]
        Returns:
            beam_indices: Selected beam indices (batch, N_RF)
        """
        batch_size = H.shape[0]
        H_flat = H.view(batch_size, -1)

        # Predict beam scores
        scores = self.network(H_flat)  # (batch, codebook_size)

        # Select top N_RF beams (topk is non-differentiable, so this stage
        # is trained as a classifier, per the softmax design above)
        _, beam_indices = torch.topk(scores, self.N_RF, dim=1)

        return beam_indices

class DigitalPrecoderNet(nn.Module):
    """Stage 2: Learn digital precoder given analog precoder"""
    def __init__(self, K, N_RF):
        super().__init__()
        input_dim = 2 * K * N_RF  # Real and imag of H_eff
        output_dim = 2 * N_RF * K  # Real and imag of F_BB

        self.network = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, output_dim)
        )

        self.K = K
        self.N_RF = N_RF

    def forward(self, H_eff):
        """
        Args:
            H_eff: Effective channel H * F_RF (batch, K, N_RF, 2)
        Returns:
            F_BB: Digital precoder (batch, N_RF, K, 2)
        """
        batch_size = H_eff.shape[0]
        H_eff_flat = H_eff.view(batch_size, -1)

        # Predict F_BB
        F_BB_flat = self.network(H_eff_flat)
        F_BB = F_BB_flat.view(batch_size, self.N_RF, self.K, 2)

        return F_BB

class HybridPrecodingNetwork(nn.Module):
    """Complete two-stage hybrid precoding network"""
    def __init__(self, K, N_t, N_RF, codebook_size, codebook):
        super().__init__()
        self.analog_net = AnalogPrecoderNet(K, N_t, N_RF, codebook_size)
        self.digital_net = DigitalPrecoderNet(K, N_RF)
        self.codebook = codebook  # (N_t, codebook_size, 2)
        self.N_RF = N_RF

    def forward(self, H):
        """
        Args:
            H: Channel (batch, K, N_t, 2)
        Returns:
            F_RF: Analog precoder (batch, N_t, N_RF, 2)
            F_BB: Digital precoder (batch, N_RF, K, 2)
        """
        batch_size = H.shape[0]

        # Stage 1: Select analog beams
        beam_indices = self.analog_net(H)  # (batch, N_RF)

        # Construct F_RF from codebook
        F_RF = self.codebook[:, beam_indices, :]  # (N_t, batch, N_RF, 2)
        F_RF = F_RF.permute(1, 0, 2, 3)  # (batch, N_t, N_RF, 2)

        # Compute effective channel
        H_complex = torch.complex(H[..., 0], H[..., 1])  # (batch, K, N_t)
        F_RF_complex = torch.complex(F_RF[..., 0], F_RF[..., 1])  # (batch, N_t, N_RF)
        H_eff_complex = torch.matmul(H_complex, F_RF_complex)  # (batch, K, N_RF)
        H_eff = torch.stack([H_eff_complex.real, H_eff_complex.imag], dim=-1)

        # Stage 2: Design digital precoder
        F_BB = self.digital_net(H_eff)

        return F_RF, F_BB

# Loss function: Spectral efficiency
def spectral_efficiency_loss(H, F_RF, F_BB, noise_var):
    """Compute negative sum rate (to minimize)"""
    batch_size = H.shape[0]
    K = H.shape[1]

    # Convert to complex
    H_c = torch.complex(H[..., 0], H[..., 1])
    F_RF_c = torch.complex(F_RF[..., 0], F_RF[..., 1])
    F_BB_c = torch.complex(F_BB[..., 0], F_BB[..., 1])

    # Effective precoder
    W = torch.matmul(F_RF_c, F_BB_c)  # (batch, N_t, K)

    # Received signal: y_k = h_k^H w_k s_k + interference + noise
    # SINR for user k
    sum_rate = 0
    for k in range(K):
        h_k = H_c[:, k, :]  # (batch, N_t)
        w_k = W[:, :, k]    # (batch, N_t)

        # Signal power
        signal = torch.abs(torch.sum(h_k.conj() * w_k, dim=1)) ** 2  # (batch,)

        # Interference power
        interference = 0
        for j in range(K):
            if j != k:
                w_j = W[:, :, j]
                interference += torch.abs(torch.sum(h_k.conj() * w_j, dim=1)) ** 2

        # SINR
        sinr = signal / (interference + noise_var)

        # Rate (bits/s/Hz)
        rate = torch.log2(1 + sinr)
        sum_rate += rate

    # Return negative (for minimization)
    return -torch.mean(sum_rate)

Training:

Python
import numpy as np

# Generate training data
def generate_channel(batch_size, K, N_t):
    """Generate Rayleigh fading channels"""
    h_real = torch.randn(batch_size, K, N_t)
    h_imag = torch.randn(batch_size, K, N_t)
    H = torch.stack([h_real, h_imag], dim=-1) / np.sqrt(2)
    return H

# Training loop
model = HybridPrecodingNetwork(K=16, N_t=64, N_RF=4, codebook_size=128, codebook=dft_codebook)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(num_epochs):
    H_batch = generate_channel(batch_size=32, K=16, N_t=64)

    F_RF, F_BB = model(H_batch)
    loss = spectral_efficiency_loss(H_batch, F_RF, F_BB, noise_var=0.01)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    print(f"Epoch {epoch}, Loss: {-loss.item():.3f} bits/s/Hz")

Architecture 2: Unsupervised Learning for Near-Field Beamforming

Challenge: Labeled data (optimal F_RF, F_BB pairs) is expensive to obtain, and for THz UM-MIMO, near-field effects make classical far-field methods inaccurate.

Solution: Unsupervised learning directly optimizes spectral efficiency without labels.

Architecture (2024 breakthrough for extremely large-scale MIMO):

Python
class UnsupervisedBeamformingNet(nn.Module):
    """Unsupervised DL for near-field beamforming

    Based on: 'Near-field Beamforming for Extremely Large-scale MIMO
    Based on Unsupervised Deep Learning' (2024)
    """
    def __init__(self, N_t, K):
        super().__init__()

        # Encoder: H → latent representation
        self.encoder = nn.Sequential(
            nn.Linear(2 * K * N_t, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU()
        )

        # Decoder: latent → beamforming weights
        self.decoder = nn.Sequential(
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 2 * N_t * K),
            nn.Tanh()  # Bounded output
        )

        self.N_t = N_t
        self.K = K

    def forward(self, H):
        """
        Args:
            H: Near-field channel (batch, K, N_t, 2)
        Returns:
            W: Beamforming matrix (batch, N_t, K, 2)
        """
        batch_size = H.shape[0]

        # Encode
        H_flat = H.view(batch_size, -1)
        latent = self.encoder(H_flat)

        # Decode
        W_flat = self.decoder(latent)
        W = W_flat.view(batch_size, self.N_t, self.K, 2)

        # Power normalization
        W = self.power_normalize(W)

        return W

    def power_normalize(self, W, P_total=1.0):
        """Enforce power constraint"""
        W_complex = torch.complex(W[..., 0], W[..., 1])
        power = torch.sum(torch.abs(W_complex) ** 2, dim=[1, 2], keepdim=True)
        W_complex = W_complex * torch.sqrt(P_total / power)
        return torch.stack([W_complex.real, W_complex.imag], dim=-1)

# Training with unsupervised loss (sum rate)
def train_unsupervised(model, num_epochs, batch_size, K, N_t):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(num_epochs):
        # Generate random channels (assumes a near-field channel generator
        # analogous to generate_channel above)
        H = generate_near_field_channel(batch_size, K, N_t)

        # Forward pass
        W = model(H)

        # Compute sum rate (unsupervised objective)
        sum_rate = compute_sum_rate(H, W, noise_var=0.01)

        # Maximize sum rate = minimize negative sum rate
        loss = -sum_rate

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if epoch % 100 == 0:
            print(f"Epoch {epoch}, Sum Rate: {sum_rate.item():.3f} bits/s/Hz")

def compute_sum_rate(H, W, noise_var):
    """Compute achievable sum rate"""
    H_c = torch.complex(H[..., 0], H[..., 1])  # (batch, K, N_t)
    W_c = torch.complex(W[..., 0], W[..., 1])  # (batch, N_t, K)

    batch_size, K = H_c.shape[0], H_c.shape[1]
    sum_rate = 0

    for k in range(K):
        h_k = H_c[:, k, :]  # (batch, N_t)
        w_k = W_c[:, :, k]  # (batch, N_t)

        # Signal
        signal_power = torch.abs(torch.sum(h_k.conj() * w_k, dim=1)) ** 2

        # Interference
        interference_power = 0
        for j in range(K):
            if j != k:
                w_j = W_c[:, :, j]
                interference_power += torch.abs(torch.sum(h_k.conj() * w_j, dim=1)) ** 2

        # SINR
        sinr = signal_power / (interference_power + noise_var)
        rate = torch.log2(1 + sinr)
        sum_rate += rate

    return torch.mean(sum_rate)

Advantages:

  • No labeled data needed
  • Directly optimizes end objective (sum rate)
  • Handles near-field effects automatically

Part III: RIS-Aided Beamforming

Reconfigurable Intelligent Surfaces (RIS)

RIS are planar surfaces with many passive reflecting elements that can be programmed to shape the electromagnetic environment.

Key Benefits for 6G:

  • Coverage extension: Overcome blockages, serve non-line-of-sight users
  • Energy efficiency: Passive elements (no amplification), low power
  • Beamforming gain: Additional degrees of freedom
  • Cost-effective: Much cheaper than deploying more base stations

System Model with RIS:

\mathbf{y} = (\mathbf{H}_{\text{dir}} + \mathbf{H}_2 \boldsymbol{\Theta} \mathbf{H}_1) \mathbf{W} \mathbf{s} + \mathbf{n}

Where:

  • H_dir: Direct BS-to-user channel
  • H_1: BS-to-RIS channel (M × N_t, M = number of RIS elements)
  • H_2: RIS-to-user channel (K × M)
  • Θ = diag(β_1 e^{jφ_1}, …, β_M e^{jφ_M}): RIS reflection coefficients
  • W: BS beamforming matrix
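For any fixed phase configuration, the cascaded link collapses into a single effective channel H_eff = H_dir + H_2 Θ H_1, which can then be handed to any precoder from Part I. A minimal sketch, assuming unit reflection amplitudes (β_m = 1):

```python
import numpy as np

def effective_channel(H_dir, H1, H2, phases):
    """H_eff = H_dir + H2 @ diag(e^{j*phi}) @ H1 (unit-amplitude RIS elements)."""
    Theta = np.diag(np.exp(1j * phases))  # M x M diagonal reflection matrix
    return H_dir + H2 @ Theta @ H1

K, N_t, M = 2, 8, 32
rng = np.random.default_rng(2)
crandn = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
H_dir, H1, H2 = crandn(K, N_t), crandn(M, N_t), crandn(K, M)
H_eff = effective_channel(H_dir, H1, H2, phases=rng.uniform(0, 2 * np.pi, M))
# H_eff has the same K x N_t shape as the direct channel
```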

Joint Optimization Problem:

\max_{\mathbf{W}, \boldsymbol{\Theta}} \sum_{k=1}^K R_k(\mathbf{W}, \boldsymbol{\Theta})

Subject to:

  • Power constraint on W\mathbf{W}
  • |β_m| ≤ 1, φ_m ∈ [0, 2π) for each RIS element

This is extremely non-convex and challenging to solve with classical methods.

Deep Learning for RIS-Aided Beamforming

2025 State-of-the-Art: Deep learning for RIS-aided THz massive MIMO with hybrid-field channels.

Python
class RISAidedBeamformingNet(nn.Module):
    """Joint BS beamforming and RIS phase shift optimization

    Based on: 'Deep Learning–Based Channel Extrapolation and Multiuser
    Beamforming for RIS-aided Terahertz Massive MIMO Systems' (2025)
    """
    def __init__(self, N_t, M, K):
        super().__init__()
        self.N_t = N_t  # BS antennas
        self.M = M      # RIS elements
        self.K = K      # Users

        # Network for BS beamformer
        self.bs_beamformer = nn.Sequential(
            nn.Linear(2 * (K * N_t + K * M + M * N_t), 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 2 * N_t * K)
        )

        # Network for RIS phase shifts
        self.ris_controller = nn.Sequential(
            nn.Linear(2 * (K * M + M * N_t), 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, M)  # Phase shifts (real-valued, will map to [0, 2π))
        )

    def forward(self, H_dir, H1, H2):
        """
        Args:
            H_dir: Direct channel (batch, K, N_t, 2)
            H1: BS-to-RIS (batch, M, N_t, 2)
            H2: RIS-to-user (batch, K, M, 2)
        Returns:
            W: BS beamformer (batch, N_t, K, 2)
            Theta: RIS phase shifts (batch, M)
        """
        batch_size = H_dir.shape[0]

        # Concatenate all channel info
        h_all = torch.cat([
            H_dir.view(batch_size, -1),
            H1.view(batch_size, -1),
            H2.view(batch_size, -1)
        ], dim=1)

        # Predict BS beamformer
        W_flat = self.bs_beamformer(h_all)
        W = W_flat.view(batch_size, self.N_t, self.K, 2)
        W = self.power_normalize(W)

        # Predict RIS phase shifts
        h_ris = torch.cat([H1.view(batch_size, -1), H2.view(batch_size, -1)], dim=1)
        phase_logits = self.ris_controller(h_ris)
        Theta = 2 * torch.pi * torch.sigmoid(phase_logits)  # Map to [0, 2π)

        return W, Theta

    def power_normalize(self, W, P=1.0):
        W_c = torch.complex(W[..., 0], W[..., 1])
        power = torch.sum(torch.abs(W_c) ** 2, dim=[1, 2], keepdim=True)
        W_c = W_c * torch.sqrt(P / (power + 1e-8))
        return torch.stack([W_c.real, W_c.imag], dim=-1)

Performance (2025 results):

  • Low pilot overhead: Robust channel estimation and beamforming
  • Hybrid-field: Handles both near-field and far-field users
  • Gains: 10-15 dB improvement over no-RIS baseline

Part IV: Adaptive Beam Tracking for High Mobility

Challenge: At 500+ km/h (high-speed trains, UAVs), channels change faster than beam training can complete.

2025 Solution: Predefined-time adaptive neural network beam training achieves 12.2% RMSE reduction.

Python
class AdaptiveBeamTracker(nn.Module):
    """Adaptive neural network for fast beam tracking

    Based on: 'Improving Signal Coverage in Millimeter‐Wave Massive MIMO
    via Efficient Predefined‐Time Adaptive Neural Network–Based Beam Training'
    (2025)
    """
    def __init__(self, N_beams, history_len=5):
        super().__init__()

        # LSTM for temporal tracking
        self.lstm = nn.LSTM(
            input_size=N_beams,  # Beam power measurements
            hidden_size=128,
            num_layers=2,
            batch_first=True
        )

        # Prediction head
        self.predictor = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, N_beams)
        )

        self.history_len = history_len

    def forward(self, beam_measurements_seq):
        """
        Args:
            beam_measurements_seq: Sequence of beam power measurements
                                   (batch, seq_len, N_beams)
        Returns:
            predicted_best_beam: Index of predicted best beam (batch,)
        """
        # LSTM encoding
        lstm_out, _ = self.lstm(beam_measurements_seq)

        # Predict beam scores for next time step
        beam_scores = self.predictor(lstm_out[:, -1, :])

        # Select best beam
        predicted_best_beam = torch.argmax(beam_scores, dim=1)

        return predicted_best_beam

# Training
def train_beam_tracker(model, dataloader, num_epochs):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(num_epochs):
        for history, next_best_beam in dataloader:
            # history: (batch, seq_len, N_beams)
            # next_best_beam: (batch,) ground truth

            # forward() returns argmax indices, so tap the LSTM and predictor
            # directly to get differentiable logits for the loss
            lstm_out, _ = model.lstm(history)
            predicted_beam_logits = model.predictor(lstm_out[:, -1, :])
            loss = criterion(predicted_beam_logits, next_best_beam)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

Results (500 km/h mobility):

  • 12.2% RMSE reduction vs. conventional methods
  • Real-time responsiveness: <10ms prediction latency
  • Robustness: Maintains performance in dynamic environments



Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
