The "black box" is a dangerous term used by skeptics

A neural network is capable of learning to approximate almost any function given enough data. For example, given the properties of a well (x: location, skin factor, etc.), predict its performance (Y) within the field. Training the NN then means learning to approximate the function F(x) = Y. This approximation can be arbitrarily complex; however, the model will be specific to this reservoir, with its biases and dataset contaminations, and we never derive the relation by hand.
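A minimal sketch of this idea: a toy one-hidden-layer network trained by gradient descent to approximate a made-up F. The feature names and the target function are illustrative assumptions, not field data.

```python
import math, random

random.seed(0)

# Synthetic "well" data: x = (location, skin factor), target Y = F(x).
# F here is invented purely for illustration.
def F(x):
    return 3.0 * x[0] + 0.5 * x[1]

X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(100)]
Y = [F(x) for x in X]

# One hidden layer of tanh units; the network learns a usable fit
# without anyone deriving the relation by hand.
H = 4
W1 = [[random.gauss(0, 0.5) for _ in range(2)] for _ in range(H)]
W2 = [random.gauss(0, 0.5) for _ in range(H)]

def predict(x):
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(w2 * hi for w2, hi in zip(W2, h))

def loss():
    return sum((predict(x) - t) ** 2 for x, t in zip(X, Y)) / len(X)

loss_before = loss()
lr = 0.05
for _ in range(500):
    for x, t in zip(X, Y):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
        err = sum(w2 * hi for w2, hi in zip(W2, h)) - t
        for j in range(H):
            # Backpropagate the error through unit j, then update.
            grad_h = err * W2[j] * (1 - h[j] ** 2)
            W2[j] -= lr * err * h[j]
            for i in range(2):
                W1[j][i] -= lr * grad_h * x[i]
loss_after = loss()
print(loss_before, loss_after)
```

After training, the fit is good; but the result is just the numbers in W1 and W2, from which the underlying relation F is not readable.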
What is the contribution of the skin factor, or of location, to production? A NN is treated as a black-box solution because we cannot derive such answers from the learned approximation, nor can we somehow extract the function from it. A trained NN is just a collection of weights interconnected according to some architecture.
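One common workaround is to probe the black box from the outside. The sketch below illustrates permutation importance: shuffle one feature and measure how much the error grows. The model and feature names are hypothetical stand-ins for a trained NN we can only call, not inspect.

```python
import random

random.seed(0)

# Hypothetical trained black-box model: callable, but not inspectable.
# It secretly depends strongly on location (x[0]) and weakly on the
# skin factor (x[1]); these names are illustrative assumptions.
def black_box_predict(x):
    return 3.0 * x[0] + 0.5 * x[1]

# Synthetic well data: [location, skin_factor] -> production.
X = [[random.uniform(0, 1), random.uniform(0, 1)] for _ in range(200)]
y = [black_box_predict(x) for x in X]

def mse(model, X, y):
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(X)

def permutation_importance(model, X, y, feature):
    """Shuffle one feature's column; the error increase estimates how
    much the model relies on that feature."""
    base = mse(model, X, y)
    column = [row[feature] for row in X]
    random.shuffle(column)
    X_perm = [row[:feature] + [v] + row[feature + 1:]
              for row, v in zip(X, column)]
    return mse(model, X_perm, y) - base

imp_location = permutation_importance(black_box_predict, X, y, 0)
imp_skin = permutation_importance(black_box_predict, X, y, 1)
print(imp_location, imp_skin)
```

Shuffling location hurts the predictions far more than shuffling the skin factor, so we recover a rough answer to "what contributes most" without ever opening the box.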
Scientists have developed several so-called explainer techniques to understand the internal logic of NNs, making it easier to learn how an input is transformed into an output. There are two kinds of explanatory models:
Ante-hoc, or interpretable, models are methods that can be explained based on their internal structure, such as regression, Naïve Bayes, random forests, and decision trees/graphs.
Post-hoc methods try to explain the outcome, for example by highlighting the part of an image responsible for the final decision. Such an explanation can be applied to various NNs to explain their outputs, but it does not tell much about the model itself. Methods include visualization, gradients, inverting DNNs, and decomposition [arXiv:1911.12116v1].
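As a toy illustration of the gradient-based post-hoc family, the sketch below computes an input saliency for a tiny fixed-weight network: how strongly the output reacts to each input. The weights are made up, and finite differences stand in for backpropagated gradients.

```python
import math

# A tiny fixed-weight network standing in for a trained NN;
# the weights below are invented for illustration only.
# Two inputs -> two hidden tanh units -> one output.
W1 = [[1.5, -0.2], [0.3, 0.8]]   # hidden-layer weights
W2 = [2.0, -1.0]                 # output weights

def forward(x):
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(w * h for w, h in zip(W2, hidden))

def saliency(x, eps=1e-5):
    """Finite-difference gradient of the output w.r.t. each input:
    a minimal stand-in for gradient-based post-hoc explanations."""
    grads = []
    for i in range(len(x)):
        hi, lo = x[:], x[:]
        hi[i] += eps
        lo[i] -= eps
        grads.append((forward(hi) - forward(lo)) / (2 * eps))
    return [abs(g) for g in grads]

s = saliency([0.5, -0.1])
print(s)
```

For an image classifier the same idea, applied per pixel, yields a saliency map that highlights the regions driving the final decision, while still saying nothing about why the weights are what they are.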
We try to make sense of NNs with the prejudgments and standards we apply to written mathematical derivations. The explanatory models mentioned above try to uncover the hidden meaning of NN decisions. We need to accept that "black box" does not mean a wrong solution; it means we need to experiment and find other ways to establish its validity. Otherwise, we will reject every development from this field, hindering innovation and growth on problems we do not yet have solutions to.

Link for the LinkedIn post