In essence, humans are good at identifying the essential set of features worth considering when making a judgment. Neural networks usually lack that ability, but it can be added with Attention modules.
An Attention module draws attention to relevant activations during training. It helps distill the incoming information down to what is essential for the problem at hand and gives better generalization.
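To make the idea concrete, here is a minimal sketch of a spatial attention gate in PyTorch. It is only an illustration of the general mechanism (a learned per-voxel weight map that rescales feature activations), not the exact module used in the paper; the class name and layer choices are my own.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Minimal spatial attention gate: learns a per-voxel weight map
    and rescales the incoming feature maps with it."""
    def __init__(self, channels):
        super().__init__()
        # 1x1x1 convolution squeezes all channels into a single attention map
        self.score = nn.Conv3d(channels, 1, kernel_size=1)

    def forward(self, x):
        # attention weights in (0, 1), one per spatial location
        weights = torch.sigmoid(self.score(x))
        # suppress background activations, emphasize the "important" regions
        return x * weights

# usage: rescale an intermediate feature volume of a 3D segmentation network
features = torch.randn(2, 16, 32, 32, 32)   # (batch, channels, depth, height, width)
attended = SpatialAttention(16)(features)
print(attended.shape)                        # torch.Size([2, 16, 32, 32, 32])
```

The key point is that the weight map is learned end to end, so the network itself decides which regions deserve attention.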
I want to cover an exciting paper that takes fault detection to the next level using an Attention module. The paper "Attention-Based 3D Seismic Fault Segmentation Training by a Few 2D Slice Labels" (YiMin Dou et al., 2021, arXiv:2105.03857) addresses two of the significant problems in AI fault detection:
1) Massive training data requirements. The authors propose λ-BCE and λ-smooth L1 losses, which allow training the network with only a few slices of interpreted data (see the masked-loss sketch after this list).
2) Noisy predictions. The Attention module helps reduce the influence of the background and steers the network's "attention" toward the faulted regions. The results are convincing; without this module, the network produces much noisier predictions.
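On the first point, the core idea behind a "λ" loss is to compute the loss only over the voxels that actually carry labels, so the few interpreted 2D slices drive the training of the full 3D network. The sketch below shows how such a masked BCE might look; it is my reading of the idea under that assumption, and the function name, mask layout, and slice indices are illustrative rather than the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def masked_bce_loss(pred, target, label_mask, eps=1e-8):
    """Illustrative masked ("λ") BCE: the loss is averaged only over voxels
    that belong to the few labeled 2D slices, so the unlabeled parts of the
    3D volume contribute nothing to the gradient."""
    per_voxel = F.binary_cross_entropy(pred, target, reduction="none")
    return (per_voxel * label_mask).sum() / (label_mask.sum() + eps)

# usage: a 3D volume where only two inline slices (indices 5 and 20) are labeled
pred = torch.rand(1, 1, 32, 32, 32)        # network output after sigmoid
target = torch.zeros(1, 1, 32, 32, 32)     # fault / no-fault labels
label_mask = torch.zeros(1, 1, 32, 32, 32)
label_mask[:, :, [5, 20]] = 1.0            # mark the interpreted slices
loss = masked_bce_loss(pred, target, label_mask)
```

The λ-smooth L1 term would follow the same masking pattern, just with a smooth L1 distance instead of binary cross-entropy.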