In this paper, we present novel strategies for separating stationary and transient components in audio signals. They exploit the basic observation that stationary components are sparse in frequency and persistent over time, whereas transients are sparse in time and persistent across frequency. We use a multi-resolution STFT approach, which allows us to define structured shrinkage operators that tune into the characteristic spectrotemporal shapes of the stationary and transient signal layers. Structure is incorporated by considering the energy of time-frequency neighbourhoods or modulation spectrum regions instead of individual STFT coefficients, and the shrinkage operators are employed in a dual-layered Iterated Shrinkage/Thresholding Algorithm (ISTA) framework. We further propose a novel iterative scheme, Iterative Cross-Shrinkage (ICS). In experiments using artificial test signals, ICS clearly outperforms the dual-layered ISTA and yields particularly good results in conjunction with a dynamic update of the shrinkage thresholds. Applied to recordings of acoustic musical instruments, the novel algorithms provide perceptually convincing separation of transients.
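To illustrate shrinkage based on neighbourhood energy rather than on individual coefficients, the following is a minimal sketch, not the operators defined in this paper. It assumes a group-lasso-style soft threshold applied to the root of the summed squared magnitudes in a rectangular time-frequency neighbourhood. The function name `neighbourhood_shrink` and the parameters `threshold`, `extent_time` and `extent_freq` are hypothetical and chosen for illustration only.

```python
import numpy as np


def neighbourhood_shrink(coeffs, threshold, extent_time=1, extent_freq=1):
    """Soft-threshold STFT coefficients using neighbourhood energy.

    Illustrative assumption, not the paper's operator: each coefficient is
    scaled by max(1 - threshold / sqrt(E), 0), where E is the summed squared
    magnitude over a rectangle of (2*extent_freq + 1) bins by
    (2*extent_time + 1) frames centred on that coefficient.

    coeffs: complex array of shape (frequency bins, time frames).
    """
    power = np.abs(coeffs) ** 2
    n_freq, n_time = power.shape

    # Zero-pad so that neighbourhoods at the borders are well defined.
    padded = np.pad(
        power,
        ((extent_freq, extent_freq), (extent_time, extent_time)),
        mode="constant",
    )

    # Accumulate the energy of every shifted copy inside the rectangle.
    energy = np.zeros_like(power)
    for df in range(2 * extent_freq + 1):
        for dt in range(2 * extent_time + 1):
            energy += padded[df:df + n_freq, dt:dt + n_time]

    norm = np.sqrt(energy)
    # Guard against division by zero where the neighbourhood is empty.
    gain = np.maximum(1.0 - threshold / np.maximum(norm, 1e-12), 0.0)
    return coeffs * gain
```

Under these assumptions, a neighbourhood extended along time (`extent_freq=0`, larger `extent_time`) favours components that persist over time, while a neighbourhood extended along frequency (`extent_time=0`, larger `extent_freq`) favours components that persist across frequency. An isolated coefficient has little neighbourhood energy in either case and is suppressed.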