●This means that as $N \to \infty$ there are only about $2^{N H(X)}$ different result sequences that are likely to occur, and each of them has approximately the same a-priori probability, $2^{-N H(X)}$
●Therefore, we need only $N H(X)$ bits to encode any result sequence that is likely to occur
●This means that, on average, we need only $H(X)$ bits per result to encode it
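The counting argument can be checked numerically. The following is a minimal sketch, assuming a binary memoryless source with P(X=1) = p. The source model, the tolerance `eps`, and the function names `binary_entropy` and `typical_set_stats` are illustrative assumptions, not part of the slides. It counts the sequences of length N whose per-result surprisal lies within `eps` of H(X), and compares the base-2 logarithm of that count, divided by N, with H(X).

```python
import math


def binary_entropy(p):
    """Entropy H(X) in bits of a binary source with P(X=1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def typical_set_stats(p, n, eps):
    """Return (count, prob) for the 'likely' sequences of length n.

    A sequence counts as likely (typical) here if its per-result
    surprisal, -log2 P(sequence) / n, is within eps of H(X).
    Assumes 0 < p < 1.
    """
    h = binary_entropy(p)
    count = 0
    prob = 0.0
    for k in range(n + 1):  # k = number of ones in the sequence
        # log2-probability of any single sequence containing k ones
        log2_p_seq = k * math.log2(p) + (n - k) * math.log2(1 - p)
        if abs(-log2_p_seq / n - h) <= eps:
            n_seqs = math.comb(n, k)  # sequences with exactly k ones
            count += n_seqs
            # add n_seqs * 2**log2_p_seq, computed in log space to avoid underflow
            prob += math.exp(math.log(n_seqs) + log2_p_seq * math.log(2))
    return count, prob


p, eps = 0.2, 0.05
print(f"H(X) = {binary_entropy(p):.4f} bits per result")
for n in (100, 1000, 5000):
    count, prob = typical_set_stats(p, n, eps)
    print(f"N={n}: log2(count)/N = {math.log2(count) / n:.4f}, "
          f"P(likely set) = {prob:.4f}")
```

As N grows, the total probability of the likely set should approach 1, while log2(count)/N stays within `eps` of H(X), well below the 1 bit per result needed to index all 2^N sequences.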