A hybrid Transformer architecture that replaces standard Feed-Forward Networks (FFNs) with Bi-directional Pseudoinverse Learners (Bi-PIL). This enables gradient-free training for FFN layers while ...
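
To make "gradient-free training for FFN layers" concrete, here is a minimal NumPy sketch of a pseudoinverse-style fit for a single FFN block. It is an illustration under stated assumptions, not this project's implementation: the names `fit_pil_ffn` and `pil_ffn_forward`, the fixed random input projection, the ReLU activation, and the `ridge` term are all hypothetical choices. It covers only a one-directional closed-form solve and does not attempt the bi-directional part of Bi-PIL.

```python
import numpy as np


def fit_pil_ffn(X, Y, hidden_dim=64, ridge=1e-3, seed=0):
    """Fit a two-layer FFN without gradients (hypothetical helper).

    Assumption: the input projection W_in is random and kept fixed; only the
    output projection W_out is solved for, via ridge-regularized least squares.
    """
    rng = np.random.default_rng(seed)
    d_in = X.shape[1]
    # Fixed random input projection, scaled to keep activations well-behaved.
    W_in = rng.standard_normal((d_in, hidden_dim)) / np.sqrt(d_in)
    H = np.maximum(X @ W_in, 0.0)  # ReLU hidden activations
    # Closed-form solve of (H^T H + ridge * I) W_out = H^T Y.
    # This is the regularized counterpart of W_out = pinv(H) @ Y.
    A = H.T @ H + ridge * np.eye(hidden_dim)
    W_out = np.linalg.solve(A, H.T @ Y)
    return W_in, W_out


def pil_ffn_forward(X, W_in, W_out):
    """Apply the fitted FFN: ReLU(X W_in) W_out."""
    return np.maximum(X @ W_in, 0.0) @ W_out


# Toy usage: fit token features X to synthetic targets Y in a single solve.
rng = np.random.default_rng(1)
X = rng.standard_normal((256, 16))
Y = np.tanh(X @ rng.standard_normal((16, 16)))

W_in, W_out = fit_pil_ffn(X, Y)
mse = float(np.mean((pil_ffn_forward(X, W_in, W_out) - Y) ** 2))
baseline = float(np.mean(Y ** 2))  # error of always predicting zero
```

The output weights come from one linear solve rather than from backpropagation, which is the sense in which such a layer can be trained without gradients. The ridge term is there only to keep the solve well conditioned when the hidden activations are nearly collinear.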