Keywords: sharpness-aware minimization, sam, trust region, optimization, cross-lingual transfer, language modeling
Abstract: Sharpness-aware minimization (SAM) has been reported to improve domain generalization by
reducing the loss surface curvature in the parameter space. However,
generalization during _fine-tuning_ is often more dependent on the
transferability of _representations_ in the function space. Trust-region
methods (TR) target this goal by regularizing representation curvature to reduce
catastrophic forgetting of pre-trained task-agnostic information while adopting
task-specific skills. We consider unifying these strategies for low curvature in
both parameter space and function space to improve out-of-domain (OOD)
generalization. We propose **Trust Region Aware Minimization** (TRAM), a
SAM algorithm that fine-tunes for low parameter sharpness and for smooth, informative
representations that preserve pre-trained structure. TRAM uses a trust region bound
to inform the SAM adversarial neighborhood, introducing an awareness of function
curvature within optimization for flatter minima. We empirically validate TRAM
in vision (cross-dataset adaptation) and text (OOD language modeling, zero-shot
cross-lingual transfer) tasks where robust domain transfer and representation
generality are critical. TRAM outperforms SAM- and TR-based optimization across
all tasks, notably surpassing competing methods for hard transfer between
_anticorrelated_ domains. TRAM establishes a novel standard in
fine-tuning for domain-generalizable models with minimal additional computation
over previous sharpness-aware methods.
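
The idea of letting a trust region bound set the SAM adversarial neighborhood can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how the bound is computed, so the sketch assumes, purely for illustration, that the neighborhood radius is the L2 distance between current and reference (pre-trained) representations. The names `sam_step`, `trust_region_radius`, `tram_like_step`, and `represent` are hypothetical.

```python
import math


def sam_step(params, grad_fn, lr, rho):
    """One standard SAM update on a list of floats.

    Ascend to the approximate worst-case point inside an L2 ball of
    radius rho, then descend from the original point using the
    gradient evaluated at that perturbed point.
    """
    g = grad_fn(params)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12  # avoid divide-by-zero
    perturbed = [p + rho * gi / norm for p, gi in zip(params, g)]
    g_adv = grad_fn(perturbed)
    return [p - lr * gi for p, gi in zip(params, g_adv)]


def trust_region_radius(features_now, features_ref):
    """ASSUMPTION: measure the trust region as the L2 distance between
    current and reference representations. The paper may use a
    different divergence."""
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(features_now, features_ref))
    )


def tram_like_step(params, ref_params, grad_fn, represent, lr):
    """SAM step whose radius comes from a function-space distance
    instead of a fixed hyperparameter (illustrative only)."""
    rho = trust_region_radius(represent(params), represent(ref_params))
    return sam_step(params, grad_fn, lr, rho)


def loss(w):
    # Toy anisotropic quadratic loss.
    return w[0] ** 2 + 10.0 * w[1] ** 2


def grad(w):
    # Analytic gradient of the toy loss.
    return [2.0 * w[0], 20.0 * w[1]]


def identity(w):
    # Stand-in "representation": the parameters themselves.
    return list(w)


w_ref = [0.0, 0.0]   # stands in for pre-trained parameters
w = [3.0, 4.0]       # current fine-tuned parameters
w_next = tram_like_step(w, w_ref, grad, identity, lr=0.01)
```

In this sketch the radius shrinks to zero when the current representations match the reference ones, in which case the update reduces to a plain gradient step. Like SAM, each update costs two gradient evaluations, plus whatever is needed to compute the representation distance.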
Submission Number: 501