TRAM: Bridging Trust Regions and Sharpness Aware Minimization

ICLR 2025 Conference Submission501 Authors

24 Sept 2024 (modified: 24 Sept 2024); ICLR 2025 Conference Submission; Readers: Everyone; License: CC BY 4.0

Keywords: sharpness-aware minimization, sam, trust region, optimization, cross-lingual transfer, language modeling

Abstract: Sharpness-aware minimization (SAM) has been reported to improve domain generalization by reducing loss-surface curvature in parameter space. However, generalization during _fine-tuning_ often depends more on the transferability of _representations_ in function space. Trust-region (TR) methods target this goal by regularizing representation curvature, which reduces catastrophic forgetting of pre-trained, task-agnostic information while the model acquires task-specific skills. We unify these strategies, seeking low curvature in both parameter space and function space to improve out-of-domain (OOD) generalization. We propose **Trust Region Aware Minimization** (TRAM), a SAM algorithm that fine-tunes for low parameter sharpness together with smooth, informative representations that preserve pre-trained structure. TRAM uses a trust-region bound to inform the SAM adversarial neighborhood, so that the search for flatter minima is aware of function-space curvature. We empirically validate TRAM on vision (cross-dataset adaptation) and text (OOD language modeling, zero-shot cross-lingual transfer) tasks where robust domain transfer and representation generality are critical. TRAM outperforms SAM- and TR-based optimization on all tasks, notably surpassing competing methods on hard transfer between _anticorrelated_ domains. TRAM establishes a new standard for fine-tuning domain-generalizable models, with minimal additional computation over previous sharpness-aware methods.
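The abstract does not spell out the TRAM update rule, so the following is only a minimal sketch of the two ingredients it names, under loudly stated assumptions. `sam_step` is the standard SAM update (Foret et al., 2021): perturb the weights by eps = rho * g / ||g||, then descend from the original weights using the gradient taken at w + eps. `output_kl` and `tr_aware_sam_step` are hypothetical helpers, not the authors' method: they measure a function-space distance as the KL divergence between a toy logistic-regression model's outputs at w and at a Gaussian-perturbed copy of w, and use that distance (capped at `rho_max`) as the SAM radius. The toy model, perturbing weights rather than representations, the noise scale `sigma`, and the capping rule are all illustrative assumptions.

```python
import numpy as np


def predict_proba(w, X):
    """Sigmoid outputs of a toy logistic-regression model (illustrative stand-in for a network)."""
    return 1.0 / (1.0 + np.exp(-(X @ w)))


def loss_and_grad(w, X, y):
    """Mean binary cross-entropy and its gradient with respect to w."""
    p = np.clip(predict_proba(w, X), 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    grad = X.T @ (p - y) / len(y)
    return loss, grad


def sam_step(w, X, y, lr, rho):
    """Standard SAM update: move to w + eps inside an L2 ball of radius rho, then descend."""
    _, g = loss_and_grad(w, X, y)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # first-order worst-case perturbation
    _, g_adv = loss_and_grad(w + eps, X, y)      # gradient at the perturbed weights
    return w - lr * g_adv                        # update is applied to the original weights


def output_kl(w, X, sigma, rng):
    """HYPOTHETICAL function-space distance: mean Bernoulli KL(p || q) between
    outputs at w and at w plus Gaussian noise of scale sigma."""
    p = np.clip(predict_proba(w, X), 1e-12, 1 - 1e-12)
    w_noisy = w + sigma * rng.standard_normal(w.shape)
    q = np.clip(predict_proba(w_noisy, X), 1e-12, 1 - 1e-12)
    kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
    return max(0.0, float(np.mean(kl)))


def tr_aware_sam_step(w, X, y, lr, rho_max, sigma, rng):
    """HYPOTHETICAL trust-region-informed SAM step (not the paper's exact rule):
    the SAM radius is set to the function-space distance, capped at rho_max."""
    rho = min(rho_max, output_kl(w, X, sigma, rng))
    return sam_step(w, X, y, lr, rho), rho


# Usage on synthetic, linearly separable data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)
w = np.zeros(3)
for _ in range(200):
    w, rho = tr_aware_sam_step(w, X, y, lr=0.5, rho_max=0.05, sigma=0.1, rng=rng)
final_loss, _ = loss_and_grad(w, X, y)
print(f"final loss {final_loss:.4f}, last radius {rho:.4f}")
```

Setting `rho=0` in `sam_step` recovers a plain gradient-descent step, which makes a quick sanity check. Capping the radius keeps the adversarial neighborhood bounded when the model's outputs are highly sensitive to perturbation.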

Submission Number: 501