*Bounty: 50*

*Bounty: 50*

Highest entropy for sequences made from a

finite alphabet of 4 symbols can achieve, is `Theoretical_Shannon_entropy = log2(4)`

My question is can we use this information to determine the length $l$ of the data i.e, number of elements in a sequence made of these 4 symbols?

**Details:**

Consider a source that emits codewords consisiting of adjacent symbols of length $l$. The sequence length is $N > l$. If the source is binary ($n=2$ symbols), then I have $N(w) = n^l$ possible words of length $l$. Each word is associated with a probability and I need to estimate the probability numerically since it is unknown. Let, $n=4$ and the alphabet set is $A = {1,2,3,4}$. Let the symbol sequence be $b =[2, 3, 4, 1, 2, 2,1, 3,2, 1, 2, 4,ldots]’ $ and $N$ denotes the number of elements in the sequence.

A block of size $l$ is defined as a segment of $l$ consecutive elements of the symbol sequence or in other words a concatenation of several symbols. If $w$ is a symbol sequence of size $l$, then $N(w)$ denotes the number of blocks of $b$ which are identical to $w$.

$p(w)$ is the probability that a block from $b$ is identical to a symbol sequence $w$ of size $l$ i.e.

$$p(w) = frac{N(w)}{n-|w|+1}$$

Let, a symbol sequence of length $N = 10$ be $b = {1,3,4,1,2,1,1,3,4,2}$, here the alphabet set is $mathcal{A} = {1,2,3,4}$ and the number of symbols $|mathcal{A}| = n = 4$. Each symbol can be represented by 4 bits.

- How can I determine the optimal length, $l le N$ of the sequence $b$ using the concept of MAximum entropy principle? In the paper, it appears that the optimal block size is the one for which block entropy is highest. Why is this considered? Basically, I want to find the block length $l$ of each subword of $b$.

Below is an implementation to show entropy calculation for for a data sequence `b = randi([1 n],1,N)`

; where `N= 100`

length of the sequence, $mathcal{A} = {1,2,3,4}$, $|mathcal{A}|$= n = 4. The entropy of the series is

`H_N = 1.9823`

for `N=100`

I then create subwords using different sized windows : `Window = [1,2,4,6,8,10,12]`

This gives Shannon’s entropy for each sub-word (block) as

`H_w = 1.98231466696056 3.22738663199666 5.04414103372749 6.03280771350301 6.38862117669943 6.46383859624266 6.45326152085405`

Out of these entropy values for each block the maximum entropy value is, `maxEntropy = 6.4638`

for block of size `blksze = 12`

.

`LargestEntropy_Theory =log2(n) = log2(4)= 2`

this is the theoretical value which is less than the maxEntropy for the block size 12.

Plot on left side (a) shows the effect of block length on Shannon’s block entropy. The maximum value of block entropy is reached at block length = 12 and corresponding value is 6.4638 which is greater than $log2(4)$.

Plot on right side (b) shows the effect of block length on Shannon’e entropy rate. From this plot, the `maxEntropyRate = 1.9912`

for `blksze2 = 1`

which means considering all the data points, `N`

. What do these mean and which one to select?

- Can entropy of the whole sequence be less than the block size?

What do I infer from these values and how can I apply maximum entropy principle?

```
clear all
N= 100; %total length of the sequence
n=4; % number of unique symbols (alphabets)
b = randi([1 n],1,N); % creating the sequence
p_1 = sum(b==1)/length(b); %calculating probability
p_2 = sum(b==2)/length(b);
p_3 = sum(b==3)/length(b);
p_4 = sum(b==4)/length(b);
p = [p_1,p_2,p_3,p_4];
H_N = -sum(p(p>0).*log2(p(p>0))) % entropyfor the whole sequence for N =100
Window = [1,2,4,6,8,10,12]; %this is the array of different block size
Base=2;
ShEntropy = zeros(1,length(Window));
for NWindows=1:length(Window)
blk_size = Window(NWindows);
ShEntropy(NWindows) =BlockEntropy(Series,blk_size,Base );% this is H_w
store_entropy(NWindows,:) = [ShEntropy(NWindows),blk_size] ;
EntropyRate(NWindows) = ShEntropy(NWindows)/blk_size ;
store_EntropyRate(NWindows,:) = [EntropyRate(NWindows),blk_size] ;
end
temp1 = sortrows(store_entropy,1);
maxEntropy = temp1(end,1)
blksze1 = temp1(end,2)
figure(1)
subplot(1,2,1)
plot(Window(1:end), ShEntropy(1:end));
subplot(1,2,2)
temp2 = sortrows(store_EntropyRate,1);
maxEntropyRate = temp2(end,1)
blksze2 = temp2(end,2)
plot(Window(1:end), EntropyRate(1:end));
LargestEntropy_Theory =log2(n)
function ShEntropy =BlockEntropy(Series,Window,Base )
n=length(Series);
D=zeros(n,Window); % Pre Allocate Memory
for k=1:Window; D(:,k)=circshift(Series,1-k);end
D=D(1:end-Window+1,:); % Truncate Last Part
%
% Repace each Row with a "SYMBOL"
% in this Case a Number ...............
[K l]=size(D);
for k=1:K; MyData(k)=polyval(D(k,:),Base);end
clear D
UniqueMyData = unique(MyData);
nUniqueMyData = length(UniqueMyData);
FreqMyData = zeros(nUniqueMyData,1); % Initialization
for i = 1:nUniqueMyData
FreqMyData(i) = ....
sum(double(MyData == UniqueMyData(i)));
end
% Calculate sample class probabilities
P = FreqMyData / sum(FreqMyData);
% Calculate entropy in base 2
ShEntropy= -sum(P .* log2(P)); % entropy of each block, H_n
end
```