You are what you read; here are mine.
Notes taken from *Demystifying Entropy*, *Demystifying Cross-Entropy*, and *Demystifying KL Divergence*.
Shannon defined entropy as the smallest possible average size of a lossless encoding of the messages sent from a source to a destination.
In general, to express N different values in bits, we need $\log_2 N$ bits, and no more than that. For example, distinguishing 8 values takes $\log_2 8 = 3$ bits.
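A quick sketch of this in Python (my own illustration, not from the source articles):

```python
import math

# Bits needed to distinguish n different values.
for n in [2, 8, 256]:
    bits = math.log2(n)
    print(f"{n} values -> {bits:g} bits")  # 2 -> 1, 8 -> 3, 256 -> 8
```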
If a message type occurs 1 out of N times, i.e. with probability $P = 1/N$, the formula above gives the minimum encoding size for that message: $\log_2 N = \log_2 \frac{1}{P} = -\log_2 P$ bits.
Weighting each message's size by its probability to get the average size, we obtain the entropy:

$$H(P) = -\sum_i P(x_i) \log_2 P(x_i)$$
If entropy is high, the average encoding size is large, which means each message tends to carry more information. This is why high entropy is associated with disorder, uncertainty, surprise, unpredictability, and amount of information.
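To make this concrete, here is a minimal sketch (my own, assuming discrete distributions given as lists of probabilities) that computes entropy and compares a uniform source with a highly predictable one:

```python
import math

def entropy(probs):
    """Average encoding size in bits: H(P) = -sum_i P(x_i) * log2(P(x_i))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # every message equally likely
skewed  = [0.97, 0.01, 0.01, 0.01]  # one message dominates

print(entropy(uniform))  # 2.0 bits: maximum uncertainty for 4 messages
print(entropy(skewed))   # ~0.24 bits: highly predictable, little information
```

The uniform source needs the full 2 bits per message on average, while the skewed source, being mostly predictable, can in principle be encoded with far fewer bits on average.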