
AI-Based Beamforming for mmWave and THz Systems: From Classical to Neural Approaches

Detailed technical look at AI-driven beamforming for millimeter wave and terahertz massive MIMO systems—from hybrid beamforming architectures to deep learning methods, RIS-aided systems, and near-field beamforming for 6G ultra-massive MIMO.

6 min read

Introduction

Millimeter wave (mmWave, 24-100 GHz) and terahertz (THz, 0.1-10 THz) frequencies are essential for 5G/6G to deliver multi-gigabit data rates. However, these high frequencies suffer from severe path loss—free-space path loss increases with the square of frequency:

L_{\text{free-space}} = \left(\frac{4\pi d f}{c}\right)^2

where d is distance, f is frequency, and c is the speed of light. At 100 GHz, free-space path loss is roughly 30 dB worse than at 3 GHz for the same distance (20 log10(100/3) ≈ 30.5 dB). Additionally, mmWave/THz signals are absorbed by oxygen (60 GHz) and water vapor (183 GHz, 325 GHz), and are blocked by walls and even human bodies.
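That quadratic frequency dependence is easy to check numerically. A minimal sketch (the gap between bands is just 20·log10 of the frequency ratio, independent of distance):

```python
import math

def fspl_db(d_m: float, f_hz: float) -> float:
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    c = 3e8  # speed of light (m/s)
    return 20 * math.log10(4 * math.pi * d_m * f_hz / c)

loss_3ghz = fspl_db(100, 3e9)      # ≈ 82.0 dB at 100 m
loss_100ghz = fspl_db(100, 100e9)  # ≈ 112.4 dB at 100 m
delta = loss_100ghz - loss_3ghz    # ≈ 30.5 dB = 20*log10(100/3)
```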

The solution: Beamforming. By focusing transmitted power into narrow beams toward intended receivers, we can:

  • Overcome path loss (20-30 dB beamforming gain)
  • Increase SNR and extend range
  • Reduce interference to other users
  • Enable spatial multiplexing (serving multiple users simultaneously)

Traditional beamforming methods—zero-forcing (ZF), minimum mean square error (MMSE), maximum ratio transmission (MRT)—face critical challenges at mmWave/THz:

  1. Computational complexity: O(N³) matrix inversions are infeasible for 256-1024 antenna arrays
  2. Hybrid architectures: Limited RF chains require joint analog/digital optimization
  3. Near-field effects: At THz with ultra-massive MIMO (UM-MIMO), far-field approximation breaks down
  4. High mobility: 500+ km/h vehicles at 6G frequencies require ultra-fast beam tracking
  5. RIS integration: Reconfigurable Intelligent Surfaces add another optimization dimension

Deep learning has emerged as a transformative solution. Recent 2025 research demonstrates:

  • 10-20× speedup in hybrid precoder design vs. iterative algorithms
  • Near-optimal performance matching model-based methods
  • Robustness to imperfect CSI and hardware impairments
  • Adaptive beam tracking with 12.2% RMSE reduction at 500 km/h

This post provides a complete technical treatment: classical beamforming fundamentals, hybrid analog-digital architectures, state-of-the-art deep learning approaches (supervised, unsupervised, reinforcement learning), RIS-aided beamforming, near-field considerations for THz UM-MIMO, and production deployment strategies.

Prerequisites: Linear algebra (matrix operations, SVD), wireless communication (MIMO, OFDM, channel models), deep learning basics.

Part I: Classical Beamforming Foundations

Digital Beamforming: The Ideal Case

In fully-digital beamforming, each antenna has a dedicated RF chain (mixer, ADC/DAC), enabling arbitrary complex weights.

System Model (downlink):

\mathbf{y} = \mathbf{H} \mathbf{W} \mathbf{s} + \mathbf{n}

Where:

  • y ∈ ℂ^(K × 1): received signal (K users)
  • H ∈ ℂ^(K × N_t): channel matrix
  • W ∈ ℂ^(N_t × K): digital precoder (beamforming matrix)
  • s ∈ ℂ^(K × 1): data symbols
  • n ~ CN(0, σ²I): noise

Beamforming Goal: Design W to maximize the sum rate subject to a power constraint:

\max_{\mathbf{W}} \sum_{k=1}^K \log_2\left(1 + \frac{|\mathbf{h}_k^H \mathbf{w}_k|^2}{\sum_{j \neq k} |\mathbf{h}_k^H \mathbf{w}_j|^2 + \sigma^2}\right)

\text{s.t.} \quad \|\mathbf{W}\|_F^2 \leq P_{\text{total}}

Classical Solutions:

1. Zero-Forcing (ZF):

Eliminates inter-user interference:

\mathbf{W}_{\text{ZF}} = \mathbf{H}^H (\mathbf{H} \mathbf{H}^H)^{-1} \mathbf{D}

where D is a diagonal power-allocation matrix.

  • Pros: Simple, zero interference
  • Cons: Noise amplification, requires N_t ≥ K, O(K³) complexity

2. Maximum Ratio Transmission (MRT):

Maximizes received signal power:

\mathbf{W}_{\text{MRT}} = \mathbf{H}^H \mathbf{D}

  • Pros: Simplest, optimal for single user
  • Cons: High interference for multi-user

3. MMSE (Regularized ZF):

Balances signal power and interference:

\mathbf{W}_{\text{MMSE}} = \mathbf{H}^H (\mathbf{H} \mathbf{H}^H + \alpha \mathbf{I})^{-1} \mathbf{D}

where α = σ²/P is the regularization parameter.

  • Pros: Better than ZF at low SNR (limits noise amplification)
  • Cons: Still requires an O(K³) matrix inversion
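Each of the three closed forms above is a couple of NumPy lines. A minimal sketch, assuming unit power allocation (D = I) and a Frobenius-norm power constraint:

```python
import numpy as np

def classical_precoders(H: np.ndarray, noise_var: float, P: float):
    """Return MRT, ZF, and MMSE precoders for channel H (K x N_t), with D = I."""
    K = H.shape[0]
    W_mrt = H.conj().T                                 # maximize received power
    W_zf = H.conj().T @ np.linalg.inv(H @ H.conj().T)  # null inter-user interference
    alpha = noise_var / P                              # MMSE regularization
    W_mmse = H.conj().T @ np.linalg.inv(H @ H.conj().T + alpha * np.eye(K))
    # Scale each precoder to meet ||W||_F^2 = P
    scale = lambda W: W * np.sqrt(P) / np.linalg.norm(W, 'fro')
    return scale(W_mrt), scale(W_zf), scale(W_mmse)

rng = np.random.default_rng(0)
K, N_t = 4, 16
H = (rng.standard_normal((K, N_t)) + 1j * rng.standard_normal((K, N_t))) / np.sqrt(2)
W_mrt, W_zf, W_mmse = classical_precoders(H, noise_var=0.1, P=1.0)
# ZF zeroes the off-diagonal (inter-user) terms of H @ W_zf
off_diag = H @ W_zf - np.diag(np.diag(H @ W_zf))
```

Note that H @ W_zf is diagonal by construction, which is exactly the "zero interference" property, while W_mmse interpolates between ZF (alpha → 0) and MRT (alpha large).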

The mmWave Challenge: Why Hybrid Beamforming?

At mmWave frequencies with 64-256 antennas, fully-digital beamforming is impractical:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│              DIGITAL vs HYBRID BEAMFORMING                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  FULLY DIGITAL (Ideal but Impractical at mmWave):                      │
│  ────────────────────────────────────────────────                        │
│                                                                          │
│  ┌────┐   ┌─────┐   ┌─────┐   ┌──────┐                                │
│  │ BB │──▶│ DAC │──▶│ Mix │──▶│  PA  │──▶ Antenna 1                   │
│  └────┘   └─────┘   └─────┘   └──────┘                                │
│  ┌────┐   ┌─────┐   ┌─────┐   ┌──────┐                                │
│  │ BB │──▶│ DAC │──▶│ Mix │──▶│  PA  │──▶ Antenna 2                   │
│  └────┘   └─────┘   └─────┘   └──────┘                                │
│    ⋮         ⋮         ⋮         ⋮                                      │
│  ┌────┐   ┌─────┐   ┌─────┐   ┌──────┐                                │
│  │ BB │──▶│ DAC │──▶│ Mix │──▶│  PA  │──▶ Antenna N_t                 │
│  └────┘   └─────┘   └─────┘   └──────┘                                │
│                                                                          │
│  • One RF chain per antenna                                             │
│  • Full control, arbitrary beamforming                                  │
│  • Cost: $10K-50K per RF chain at mmWave                               │
│  • Power: 1-5W per chain                                                │
│  • For 256 antennas: $2.5M+, 1.3kW!                                    │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  HYBRID (Practical Solution):                                           │
│  ──────────────────────────                                              │
│                                                                          │
│                      ┌─────────────────────┐                            │
│                      │ Analog Beamformer   │                            │
│                      │ (Phase Shifters)    │                            │
│                      │   N_t × N_RF       │                            │
│                      └──────────┬──────────┘                            │
│                                 │                                        │
│                     ┌───────────┼───────────┐                           │
│                     │           │           │                           │
│                     ▼           ▼           ▼                           │
│                  Ant 1 ... Ant N_t/N_RF  ...                           │
│                                                                          │
│  ┌────────────┐   ┌─────┐   ┌─────┐   ┌──────┐                        │
│  │  Digital   │──▶│ DAC │──▶│ Mix │──▶│ Analog│──▶ Antennas 1-64      │
│  │ Precoder   │   └─────┘   └─────┘   │Precoder│                      │
│  │            │   ┌─────┐   ┌─────┐   │       │                        │
│  │ (Baseband) │──▶│ DAC │──▶│ Mix │──▶│  (RF) │──▶ Antennas 65-128    │
│  │            │   └─────┘   └─────┘   │       │                        │
│  │   F_BB     │     ⋮         ⋮       │ F_RF  │         ⋮              │
│  │ N_RF × K   │   ┌─────┐   ┌─────┐   │       │                        │
│  │            │──▶│ DAC │──▶│ Mix │──▶│       │──▶ Antennas 193-256    │
│  └────────────┘   └─────┘   └─────┘   └──────┘                        │
│                                                                          │
│  • N_RF << N_t RF chains (e.g., 4-16 chains for 256 antennas)         │
│  • Analog beamformer: phase shifters (constant magnitude)              │
│  • Digital precoder: full flexibility within N_RF streams              │
│  • Cost: 10-20× lower than fully digital                               │
│  • Power: 5-10× lower                                                   │
│                                                                          │
│  EFFECTIVE PRECODER:                                                     │
│  W = F_RF × F_BB                                                        │
│                                                                          │
│  Challenge: Joint optimization of F_RF and F_BB with constraint        │
│  that F_RF has constant-magnitude entries (phase-only control)          │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Hybrid Beamforming Problem Formulation

Objective: Approximate the optimal digital precoder W_opt with the hybrid structure:

\min_{\mathbf{F}_{\text{RF}}, \mathbf{F}_{\text{BB}}} \|\mathbf{W}_{\text{opt}} - \mathbf{F}_{\text{RF}} \mathbf{F}_{\text{BB}}\|_F^2

Constraints:

  1. |F_RF(i,j)| = 1/√N_t (constant magnitude for phase shifters)
  2. ‖F_RF F_BB‖_F² ≤ P_total (power constraint)

This is a non-convex, NP-hard problem due to the constant-magnitude constraint.
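One saving grace is that the constant-magnitude constraint has a cheap Euclidean projection: keep each entry's phase and fix its magnitude. Alternating-minimization style approaches exploit exactly this step; a minimal sketch of the projection (not tied to any specific paper):

```python
import numpy as np

def project_constant_modulus(F: np.ndarray) -> np.ndarray:
    """Entrywise-nearest matrix with all magnitudes equal to 1/sqrt(N_t)."""
    N_t = F.shape[0]
    return np.exp(1j * np.angle(F)) / np.sqrt(N_t)

rng = np.random.default_rng(1)
F_unconstrained = rng.standard_normal((8, 4)) + 1j * rng.standard_normal((8, 4))
F_RF = project_constant_modulus(F_unconstrained)
# Every entry now satisfies |F_RF(i, j)| = 1/sqrt(8), as required
```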

Classical Solution: Orthogonal Matching Pursuit (OMP):

Iteratively select analog beamforming vectors:

Python
import numpy as np

def omp_hybrid_precoding(H, K, N_RF, N_t, P_total=1.0):
    """OMP algorithm for hybrid precoding.

    Assumes get_array_response_matrix(N_t) returns a codebook of candidate
    steering vectors (N_t x N_c), e.g. the columns of a DFT matrix.
    """
    # Step 1: Optimal unconstrained precoder = first K right singular vectors
    U, S, Vh = np.linalg.svd(H)
    W_opt = Vh[:K, :].conj().T  # (N_t, K)

    # Step 2: Initialize
    F_RF = []
    residual = W_opt

    # Step 3: Greedily select RF beamforming vectors
    A = get_array_response_matrix(N_t)  # DFT codebook (N_t, N_c)
    for _ in range(N_RF):
        # Codebook column most correlated with the current residual
        correlations = A.conj().T @ residual
        best_idx = np.argmax(np.abs(correlations).sum(axis=1))
        F_RF.append(A[:, best_idx])

        # Residual: W_opt minus its projection onto the span of selected beams
        F = np.array(F_RF).T  # (N_t, n_selected)
        residual = W_opt - F @ np.linalg.pinv(F) @ W_opt

    F_RF = np.array(F_RF).T  # (N_t, N_RF)

    # Step 4: Least-squares digital precoder
    F_BB = np.linalg.pinv(F_RF) @ W_opt  # (N_RF, K)

    # Step 5: Normalize to satisfy the power constraint
    F_BB = F_BB * np.sqrt(P_total) / np.linalg.norm(F_RF @ F_BB, 'fro')

    return F_RF, F_BB

Limitations:

  • Iterative and slow (O(N_t² N_RF K) per iteration)
  • Sensitive to codebook design
  • Suboptimal due to greedy selection

Part II: Deep Learning for Hybrid Beamforming

Architecture 1: Supervised Learning

Idea: Learn the mapping H → {F_RF, F_BB} from labeled data.

Two-Stage Deep Learning Approach (2022-2025 state-of-the-art):

Code
┌─────────────────────────────────────────────────────────────────────────┐
│              TWO-STAGE DL HYBRID PRECODING                               │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  STAGE 1: Analog Precoder Design (Classification)                      │
│  ───────────────────────────────────────────────                        │
│                                                                          │
│  Input: Channel H ∈ ℂ^(K × N_t)                                        │
│         │                                                                │
│         ▼                                                                │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │  Feature Extraction (CNN/MLP)                                │       │
│  │  • Convert H to [Re(H), Im(H)]                              │       │
│  │  • Extract spatial patterns                                  │       │
│  └──────────────────┬───────────────────────────────────────────┘       │
│                     │                                                    │
│                     ▼                                                    │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │  Beam Selection (Softmax Classifier)                        │       │
│  │  • Select N_RF beams from codebook of size N_c              │       │
│  │  • Output: N_RF indices from codebook                       │       │
│  └──────────────────┬───────────────────────────────────────────┘       │
│                     │                                                    │
│                     ▼                                                    │
│  F_RF = [a(θ₁), a(θ₂), ..., a(θ_{N_RF})]                              │
│         where a(θ) = steering vector at angle θ                        │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  STAGE 2: Digital Precoder Design (Regression)                         │
│  ───────────────────────────────────────────                            │
│                                                                          │
│  Input: H, F_RF                                                         │
│         │                                                                │
│         ▼                                                                │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │  Equivalent Channel                                          │       │
│  │  H_eff = H × F_RF  ∈ ℂ^(K × N_RF)                          │       │
│  └──────────────────┬───────────────────────────────────────────┘       │
│                     │                                                    │
│                     ▼                                                    │
│  ┌──────────────────────────────────────────────────────────────┐       │
│  │  Neural Network (MLP)                                        │       │
│  │  • Input: [Re(H_eff), Im(H_eff)] flattened                 │       │
│  │  • Hidden layers with ReLU                                   │       │
│  │  • Output: [Re(F_BB), Im(F_BB)] flattened                  │       │
│  └──────────────────┬───────────────────────────────────────────┘       │
│                     │                                                    │
│                     ▼                                                    │
│  F_BB ∈ ℂ^(N_RF × K)                                                   │
│                                                                          │
│  ─────────────────────────────────────────────────────────────────────  │
│                                                                          │
│  Final Precoder: W = F_RF × F_BB                                       │
│                                                                          │
│  ADVANTAGES:                                                             │
│  • 10-20× faster than iterative OMP                                    │
│  • Near-optimal spectral efficiency                                     │
│  • Handles imperfect CSI gracefully                                     │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

PyTorch Implementation:

Python
import torch
import torch.nn as nn

class AnalogPrecoderNet(nn.Module):
    """Stage 1: Learn to select analog beamforming vectors"""
    def __init__(self, K, N_t, N_RF, codebook_size):
        super().__init__()
        input_dim = 2 * K * N_t  # Real and imag parts of H

        self.network = nn.Sequential(
            nn.Linear(input_dim, 512),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, codebook_size)
        )

        # Select top N_RF beams
        self.N_RF = N_RF

    def forward(self, H):
        """
        Args:
            H: Channel matrix (batch, K, N_t, 2)  [2 = real, imag]
        Returns:
            beam_indices: Selected beam indices (batch, N_RF)
        """
        batch_size = H.shape[0]
        H_flat = H.view(batch_size, -1)

        # Predict beam scores
        scores = self.network(H_flat)  # (batch, codebook_size)

        # Select top N_RF beams (topk is non-differentiable, so this stage
        # is trained as a classifier, per the softmax design above)
        _, beam_indices = torch.topk(scores, self.N_RF, dim=1)

        return beam_indices

class DigitalPrecoderNet(nn.Module):
    """Stage 2: Learn digital precoder given analog precoder"""
    def __init__(self, K, N_RF):
        super().__init__()
        input_dim = 2 * K * N_RF  # Real and imag of H_eff
        output_dim = 2 * N_RF * K  # Real and imag of F_BB

        self.network = nn.Sequential(
            nn.Linear(input_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(128, output_dim)
        )

        self.K = K
        self.N_RF = N_RF

    def forward(self, H_eff):
        """
        Args:
            H_eff: Effective channel H * F_RF (batch, K, N_RF, 2)
        Returns:
            F_BB: Digital precoder (batch, N_RF, K, 2)
        """
        batch_size = H_eff.shape[0]
        H_eff_flat = H_eff.view(batch_size, -1)

        # Predict F_BB
        F_BB_flat = self.network(H_eff_flat)
        F_BB = F_BB_flat.view(batch_size, self.N_RF, self.K, 2)

        return F_BB

class HybridPrecodingNetwork(nn.Module):
    """Complete two-stage hybrid precoding network"""
    def __init__(self, K, N_t, N_RF, codebook_size, codebook):
        super().__init__()
        self.analog_net = AnalogPrecoderNet(K, N_t, N_RF, codebook_size)
        self.digital_net = DigitalPrecoderNet(K, N_RF)
        self.codebook = codebook  # (N_t, codebook_size, 2)
        self.N_RF = N_RF

    def forward(self, H):
        """
        Args:
            H: Channel (batch, K, N_t, 2)
        Returns:
            F_RF: Analog precoder (batch, N_t, N_RF, 2)
            F_BB: Digital precoder (batch, N_RF, K, 2)
        """
        batch_size = H.shape[0]

        # Stage 1: Select analog beams
        beam_indices = self.analog_net(H)  # (batch, N_RF)

        # Construct F_RF from codebook
        F_RF = self.codebook[:, beam_indices, :]  # (N_t, batch, N_RF, 2)
        F_RF = F_RF.permute(1, 0, 2, 3)  # (batch, N_t, N_RF, 2)

        # Compute effective channel
        H_complex = torch.complex(H[..., 0], H[..., 1])  # (batch, K, N_t)
        F_RF_complex = torch.complex(F_RF[..., 0], F_RF[..., 1])  # (batch, N_t, N_RF)
        H_eff_complex = torch.matmul(H_complex, F_RF_complex)  # (batch, K, N_RF)
        H_eff = torch.stack([H_eff_complex.real, H_eff_complex.imag], dim=-1)

        # Stage 2: Design digital precoder
        F_BB = self.digital_net(H_eff)

        return F_RF, F_BB

# Loss function: Spectral efficiency
def spectral_efficiency_loss(H, F_RF, F_BB, noise_var):
    """Compute negative sum rate (to minimize)"""
    batch_size = H.shape[0]
    K = H.shape[1]

    # Convert to complex
    H_c = torch.complex(H[..., 0], H[..., 1])
    F_RF_c = torch.complex(F_RF[..., 0], F_RF[..., 1])
    F_BB_c = torch.complex(F_BB[..., 0], F_BB[..., 1])

    # Effective precoder
    W = torch.matmul(F_RF_c, F_BB_c)  # (batch, N_t, K)

    # Received signal: y_k = h_k^H w_k s_k + interference + noise
    # SINR for user k
    sum_rate = 0
    for k in range(K):
        h_k = H_c[:, k, :]  # (batch, N_t)
        w_k = W[:, :, k]    # (batch, N_t)

        # Signal power
        signal = torch.abs(torch.sum(h_k.conj() * w_k, dim=1)) ** 2  # (batch,)

        # Interference power
        interference = 0
        for j in range(K):
            if j != k:
                w_j = W[:, :, j]
                interference += torch.abs(torch.sum(h_k.conj() * w_j, dim=1)) ** 2

        # SINR
        sinr = signal / (interference + noise_var)

        # Rate (bits/s/Hz)
        rate = torch.log2(1 + sinr)
        sum_rate += rate

    # Return negative (for minimization)
    return -torch.mean(sum_rate)

Training:

Python
import numpy as np

# Generate training data
def generate_channel(batch_size, K, N_t):
    """Generate Rayleigh fading channels"""
    h_real = torch.randn(batch_size, K, N_t)
    h_imag = torch.randn(batch_size, K, N_t)
    H = torch.stack([h_real, h_imag], dim=-1) / np.sqrt(2)
    return H

# Training loop
model = HybridPrecodingNetwork(K=16, N_t=64, N_RF=4, codebook_size=128, codebook=dft_codebook)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(num_epochs):
    H_batch = generate_channel(batch_size=32, K=16, N_t=64)

    F_RF, F_BB = model(H_batch)
    loss = spectral_efficiency_loss(H_batch, F_RF, F_BB, noise_var=0.01)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    print(f"Epoch {epoch}, Loss: {-loss.item():.3f} bits/s/Hz")

Architecture 2: Unsupervised Learning for Near-Field Beamforming

Challenge: Labeled data (optimal F_RF, F_BB pairs) is expensive to obtain, and for THz UM-MIMO, near-field effects make classical far-field methods inaccurate.

Solution: Unsupervised learning directly optimizes spectral efficiency without labels.

Architecture (2024 breakthrough for extremely large-scale MIMO):

Python
class UnsupervisedBeamformingNet(nn.Module):
    """Unsupervised DL for near-field beamforming

    Based on: 'Near-field Beamforming for Extremely Large-scale MIMO
    Based on Unsupervised Deep Learning' (2024)
    """
    def __init__(self, N_t, K):
        super().__init__()

        # Encoder: H → latent representation
        self.encoder = nn.Sequential(
            nn.Linear(2 * K * N_t, 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU()
        )

        # Decoder: latent → beamforming weights
        self.decoder = nn.Sequential(
            nn.Linear(256, 512),
            nn.ReLU(),
            nn.Linear(512, 2 * N_t * K),
            nn.Tanh()  # Bounded output
        )

        self.N_t = N_t
        self.K = K

    def forward(self, H):
        """
        Args:
            H: Near-field channel (batch, K, N_t, 2)
        Returns:
            W: Beamforming matrix (batch, N_t, K, 2)
        """
        batch_size = H.shape[0]

        # Encode
        H_flat = H.view(batch_size, -1)
        latent = self.encoder(H_flat)

        # Decode
        W_flat = self.decoder(latent)
        W = W_flat.view(batch_size, self.N_t, self.K, 2)

        # Power normalization
        W = self.power_normalize(W)

        return W

    def power_normalize(self, W, P_total=1.0):
        """Enforce power constraint"""
        W_complex = torch.complex(W[..., 0], W[..., 1])
        power = torch.sum(torch.abs(W_complex) ** 2, dim=[1, 2], keepdim=True)
        W_complex = W_complex * torch.sqrt(P_total / power)
        return torch.stack([W_complex.real, W_complex.imag], dim=-1)

# Training with unsupervised loss (sum rate)
def train_unsupervised(model, num_epochs, batch_size, K, N_t):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(num_epochs):
        # Generate random channels (assumes a near-field channel generator
        # analogous to generate_channel above)
        H = generate_near_field_channel(batch_size, K, N_t)

        # Forward pass
        W = model(H)

        # Compute sum rate (unsupervised objective)
        sum_rate = compute_sum_rate(H, W, noise_var=0.01)

        # Maximize sum rate = minimize negative sum rate
        loss = -sum_rate

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if epoch % 100 == 0:
            print(f"Epoch {epoch}, Sum Rate: {sum_rate.item():.3f} bits/s/Hz")

def compute_sum_rate(H, W, noise_var):
    """Compute achievable sum rate"""
    H_c = torch.complex(H[..., 0], H[..., 1])  # (batch, K, N_t)
    W_c = torch.complex(W[..., 0], W[..., 1])  # (batch, N_t, K)

    batch_size, K = H_c.shape[0], H_c.shape[1]
    sum_rate = 0

    for k in range(K):
        h_k = H_c[:, k, :]  # (batch, N_t)
        w_k = W_c[:, :, k]  # (batch, N_t)

        # Signal
        signal_power = torch.abs(torch.sum(h_k.conj() * w_k, dim=1)) ** 2

        # Interference
        interference_power = 0
        for j in range(K):
            if j != k:
                w_j = W_c[:, :, j]
                interference_power += torch.abs(torch.sum(h_k.conj() * w_j, dim=1)) ** 2

        # SINR
        sinr = signal_power / (interference_power + noise_var)
        rate = torch.log2(1 + sinr)
        sum_rate += rate

    return torch.mean(sum_rate)

Advantages:

  • No labeled data needed
  • Directly optimizes end objective (sum rate)
  • Handles near-field effects automatically

Part III: RIS-Aided Beamforming

Reconfigurable Intelligent Surfaces (RIS)

RIS are planar surfaces with many passive reflecting elements that can be programmed to shape the electromagnetic environment.

Key Benefits for 6G:

  • Coverage extension: Overcome blockages, serve non-line-of-sight users
  • Energy efficiency: Passive elements (no amplification), low power
  • Beamforming gain: Additional degrees of freedom
  • Cost-effective: Much cheaper than deploying more base stations

System Model with RIS:

\mathbf{y} = (\mathbf{H}_{\text{dir}} + \mathbf{H}_2 \boldsymbol{\Theta} \mathbf{H}_1) \mathbf{W} \mathbf{s} + \mathbf{n}

Where:

  • H_dir: Direct BS-to-user channel
  • H_1: BS-to-RIS channel (M × N_t, M = number of RIS elements)
  • H_2: RIS-to-user channel (K × M)
  • Θ = diag(β_1 e^{jφ_1}, …, β_M e^{jφ_M}): RIS reflection coefficients
  • W: BS beamforming matrix
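For any fixed phase configuration, the cascaded link collapses into a single effective channel H_eff = H_dir + H_2 Θ H_1, which can then be handed to any precoder from Part I. A minimal sketch, assuming unit reflection amplitudes (β_m = 1):

```python
import numpy as np

def effective_channel(H_dir, H1, H2, phases):
    """H_eff = H_dir + H2 @ diag(e^{j*phi}) @ H1 (unit-amplitude RIS elements)."""
    Theta = np.diag(np.exp(1j * phases))  # M x M diagonal reflection matrix
    return H_dir + H2 @ Theta @ H1

K, N_t, M = 2, 8, 32
rng = np.random.default_rng(2)
crandn = lambda *s: (rng.standard_normal(s) + 1j * rng.standard_normal(s)) / np.sqrt(2)
H_dir, H1, H2 = crandn(K, N_t), crandn(M, N_t), crandn(K, M)
H_eff = effective_channel(H_dir, H1, H2, phases=rng.uniform(0, 2 * np.pi, M))
# H_eff has the same K x N_t shape as the direct channel
```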

Joint Optimization Problem:

\max_{\mathbf{W}, \boldsymbol{\Theta}} \sum_{k=1}^K R_k(\mathbf{W}, \boldsymbol{\Theta})

Subject to:

  • Power constraint on W\mathbf{W}
  • |β_m| ≤ 1, φ_m ∈ [0, 2π) for each RIS element

This is extremely non-convex and challenging to solve with classical methods.

Deep Learning for RIS-Aided Beamforming

2025 State-of-the-Art: Deep learning for RIS-aided THz massive MIMO with hybrid-field channels.

Python
class RISAidedBeamformingNet(nn.Module):
    """Joint BS beamforming and RIS phase shift optimization

    Based on: 'Deep Learning–Based Channel Extrapolation and Multiuser
    Beamforming for RIS-aided Terahertz Massive MIMO Systems' (2025)
    """
    def __init__(self, N_t, M, K):
        super().__init__()
        self.N_t = N_t  # BS antennas
        self.M = M      # RIS elements
        self.K = K      # Users

        # Network for BS beamformer
        self.bs_beamformer = nn.Sequential(
            nn.Linear(2 * (K * N_t + K * M + M * N_t), 512),
            nn.ReLU(),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Linear(256, 2 * N_t * K)
        )

        # Network for RIS phase shifts
        self.ris_controller = nn.Sequential(
            nn.Linear(2 * (K * M + M * N_t), 256),
            nn.ReLU(),
            nn.Linear(256, 128),
            nn.ReLU(),
            nn.Linear(128, M)  # Phase shifts (real-valued, will map to [0, 2π))
        )

    def forward(self, H_dir, H1, H2):
        """
        Args:
            H_dir: Direct channel (batch, K, N_t, 2)
            H1: BS-to-RIS (batch, M, N_t, 2)
            H2: RIS-to-user (batch, K, M, 2)
        Returns:
            W: BS beamformer (batch, N_t, K, 2)
            Theta: RIS phase shifts (batch, M)
        """
        batch_size = H_dir.shape[0]

        # Concatenate all channel info
        h_all = torch.cat([
            H_dir.view(batch_size, -1),
            H1.view(batch_size, -1),
            H2.view(batch_size, -1)
        ], dim=1)

        # Predict BS beamformer
        W_flat = self.bs_beamformer(h_all)
        W = W_flat.view(batch_size, self.N_t, self.K, 2)
        W = self.power_normalize(W)

        # Predict RIS phase shifts
        h_ris = torch.cat([H1.view(batch_size, -1), H2.view(batch_size, -1)], dim=1)
        phase_logits = self.ris_controller(h_ris)
        Theta = 2 * torch.pi * torch.sigmoid(phase_logits)  # Map to [0, 2π)

        return W, Theta

    def power_normalize(self, W, P=1.0):
        W_c = torch.complex(W[..., 0], W[..., 1])
        power = torch.sum(torch.abs(W_c) ** 2, dim=[1, 2], keepdim=True)
        W_c = W_c * torch.sqrt(P / (power + 1e-8))
        return torch.stack([W_c.real, W_c.imag], dim=-1)

Performance (2025 results):

  • Low pilot overhead: Robust channel estimation and beamforming
  • Hybrid-field: Handles both near-field and far-field users
  • Gains: 10-15 dB improvement over no-RIS baseline

Part IV: Adaptive Beam Tracking for High Mobility

Challenge: At 500+ km/h (high-speed trains, UAVs), channels change faster than beam training can complete.

2025 Solution: Predefined-time adaptive neural network beam training achieves 12.2% RMSE reduction.

Python
class AdaptiveBeamTracker(nn.Module):
    """Adaptive neural network for fast beam tracking

    Based on: 'Improving Signal Coverage in Millimeter‐Wave Massive MIMO
    via Efficient Predefined‐Time Adaptive Neural Network–Based Beam Training'
    (2025)
    """
    def __init__(self, N_beams, history_len=5):
        super().__init__()

        # LSTM for temporal tracking
        self.lstm = nn.LSTM(
            input_size=N_beams,  # Beam power measurements
            hidden_size=128,
            num_layers=2,
            batch_first=True
        )

        # Prediction head
        self.predictor = nn.Sequential(
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, N_beams)
        )

        self.history_len = history_len

    def forward(self, beam_measurements_seq):
        """
        Args:
            beam_measurements_seq: Sequence of beam power measurements
                                   (batch, seq_len, N_beams)
        Returns:
            predicted_best_beam: Index of predicted best beam (batch,)
        """
        # LSTM encoding
        lstm_out, _ = self.lstm(beam_measurements_seq)

        # Predict beam scores for next time step
        beam_scores = self.predictor(lstm_out[:, -1, :])

        # Select best beam
        predicted_best_beam = torch.argmax(beam_scores, dim=1)

        return predicted_best_beam

# Training
def train_beam_tracker(model, dataloader, num_epochs):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(num_epochs):
        for history, next_best_beam in dataloader:
            # history: (batch, seq_len, N_beams)
            # next_best_beam: (batch,) ground truth

            # forward() returns argmax indices, so tap the LSTM and predictor
            # directly to get differentiable logits for the loss
            lstm_out, _ = model.lstm(history)
            predicted_beam_logits = model.predictor(lstm_out[:, -1, :])
            loss = criterion(predicted_beam_logits, next_best_beam)

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        print(f"Epoch {epoch}, Loss: {loss.item():.4f}")

Results (500 km/h mobility):

  • 12.2% RMSE reduction vs. conventional methods
  • Real-time responsiveness: <10ms prediction latency
  • Robustness: Maintains performance in dynamic environments



Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
