#StackBounty: Help in applying Maximum Entropy Principle in determining block length

Bounty: 50

The highest entropy a sequence made from a finite alphabet of 4 symbols can achieve is $\log_2(4) = 2$ bits per symbol. My question is: can we use this information to determine the block length $l$, i.e., the number of elements in each subword of a sequence made of these 4 symbols?

Details:

Consider a source that emits codewords consisting of $l$ adjacent symbols. The sequence length is $N > l$. If the source is binary ($n = 2$ symbols), then there are $n^l$ possible words of length $l$ (e.g. $2^3 = 8$ words of length 3). Each word is associated with a probability, which is unknown and must be estimated numerically. Let $n = 4$ and the alphabet set be $\mathcal{A} = \{1,2,3,4\}$. Let the symbol sequence be $b = [2, 3, 4, 1, 2, 2, 1, 3, 2, 1, 2, 4, \ldots]'$, and let $N$ denote the number of elements in the sequence.

A block of size $l$ is defined as a segment of $l$ consecutive elements of the symbol sequence; in other words, a concatenation of $l$ symbols. If $w$ is a symbol sequence of size $l$, then $N(w)$ denotes the number of blocks of $b$ that are identical to $w$.

$p(w)$ is the probability that a block from $b$ is identical to a symbol sequence $w$ of size $l$, i.e.
$$p(w) = \frac{N(w)}{N-|w|+1},$$
where $N - |w| + 1$ is the number of (overlapping) blocks of length $|w| = l$ in $b$.

Let a symbol sequence of length $N = 10$ be $b = \{1,3,4,1,2,1,1,3,4,2\}$; here the alphabet set is $\mathcal{A} = \{1,2,3,4\}$ and the number of symbols is $|\mathcal{A}| = n = 4$. Each symbol can be represented by $\log_2 4 = 2$ bits.
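As a minimal sketch of these definitions (the variable names here are my own, not from the paper), $N(w)$ and $p(w)$ can be estimated for one candidate word $w$ like this:

b = [1 3 4 1 2 1 1 3 4 2];     % example sequence, N = 10
w = [1 3];                     % candidate word, l = |w| = 2
l = length(w);
nBlocks = length(b) - l + 1;   % number of overlapping blocks, N - |w| + 1
Nw = 0;                        % count blocks identical to w
for k = 1:nBlocks
    if isequal(b(k:k+l-1), w)
        Nw = Nw + 1;
    end
end
p_w = Nw / nBlocks             % here N(w) = 2, so p(w) = 2/9

Repeating this over all observed words of length $l$ gives the empirical block distribution whose Shannon entropy is computed in the code below.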

  • How can I determine the optimal length $l \le N$ for the sequence $b$ using the maximum entropy principle? In the paper, it appears that the optimal block size is the one for which the block entropy is highest. Why is this the criterion? Basically, I want to find the block length $l$ of each subword of $b$.

Below is an implementation showing the entropy calculation for a data sequence b = randi([1 n],1,N);, where N = 100 is the length of the sequence, $\mathcal{A} = \{1,2,3,4\}$, and $|\mathcal{A}| = n = 4$. The entropy of the full series is
H_N = 1.9823 for N = 100.

I then create subwords using different window sizes, Window = [1,2,4,6,8,10,12]. This gives Shannon's entropy for each subword (block) size:

  • l = 1:  H_w = 1.98231466696056
  • l = 2:  H_w = 3.22738663199666
  • l = 4:  H_w = 5.04414103372749
  • l = 6:  H_w = 6.03280771350301
  • l = 8:  H_w = 6.38862117669943
  • l = 10: H_w = 6.46383859624266
  • l = 12: H_w = 6.45326152085405

Out of these entropy values, the maximum is maxEntropy = 6.4638, attained for a block of size blksze = 10.
The theoretical value LargestEntropy_Theory = log2(n) = log2(4) = 2 is less than this maximum block entropy.
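A bookkeeping note that may help interpret these numbers (my own addition, not from the paper): block entropy is measured in bits per block, and for a finite sequence of length $N$ it is bounded by

$$H(l) \le \min\bigl(l \log_2 n,\; \log_2(N - l + 1)\bigr),$$

since a block of length $l$ carries at most $l \log_2 n$ bits and at most $N - l + 1$ distinct blocks can be observed in the sequence. For $N = 100$ and $l = 10$ this ceiling is $\log_2 91 \approx 6.51$, close to the observed 6.4638; so the per-symbol bound $\log_2 4 = 2$ is not violated, it applies to $H(l)/l$ rather than to $H(l)$.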

[Figure: (a) Shannon block entropy vs. block length; (b) entropy rate vs. block length]
The plot on the left (a) shows the effect of block length on Shannon's block entropy. The maximum block entropy is reached at block length 10, with value 6.4638, which is greater than $\log_2(4)$.

The plot on the right (b) shows the effect of block length on Shannon's entropy rate. From this plot, maxEntropyRate = 1.9823 for blksze2 = 1, i.e. blocks of a single symbol, so all $N$ data points contribute. What do these two curves mean, and which one should I use to select $l$?
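For reference, the entropy rate used here normalizes block entropy by block length:

$$h(l) = \frac{H(l)}{l}, \qquad h(1) = \frac{1.9823}{1} = 1.9823 \ \text{bits/symbol}, \qquad h(10) = \frac{6.4638}{10} \approx 0.646 \ \text{bits/symbol},$$

so the two plots are on different scales: bits per block in (a) versus bits per symbol in (b).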

  • Can the entropy of the whole sequence be less than the entropy of a block?

What should I infer from these values, and how can I apply the maximum entropy principle here?

clear all
N = 100;                       % total length of the sequence
n = 4;                         % number of unique symbols (alphabet size)
b = randi([1 n],1,N);          % create the sequence
p_1 = sum(b==1)/length(b);     % empirical symbol probabilities
p_2 = sum(b==2)/length(b);
p_3 = sum(b==3)/length(b);
p_4 = sum(b==4)/length(b);

p = [p_1,p_2,p_3,p_4];
H_N = -sum(p(p>0).*log2(p(p>0)))   % entropy of the whole sequence for N = 100

Window = [1,2,4,6,8,10,12];    % array of candidate block sizes
Base = 2;

ShEntropy   = zeros(1,length(Window));
EntropyRate = zeros(1,length(Window));
store_entropy     = zeros(length(Window),2);
store_EntropyRate = zeros(length(Window),2);
for NWindows = 1:length(Window)
    blk_size = Window(NWindows);
    ShEntropy(NWindows) = BlockEntropy(b, blk_size, Base);   % block entropy H_w (the sequence is b)
    store_entropy(NWindows,:) = [ShEntropy(NWindows), blk_size];
    EntropyRate(NWindows) = ShEntropy(NWindows)/blk_size;    % entropy per symbol
    store_EntropyRate(NWindows,:) = [EntropyRate(NWindows), blk_size];
end
temp1 = sortrows(store_entropy,1);
maxEntropy = temp1(end,1)
blksze1 = temp1(end,2)
figure(1)
subplot(1,2,1)
plot(Window, ShEntropy);       % (a) block entropy vs. block length
subplot(1,2,2)
temp2 = sortrows(store_EntropyRate,1);
maxEntropyRate = temp2(end,1)
blksze2 = temp2(end,2)
plot(Window, EntropyRate);     % (b) entropy rate vs. block length
LargestEntropy_Theory = log2(n)
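% --- Added sketch (not from the original post): finite-sample ceiling ---
% A block of length l carries at most l*log2(n) bits, and at most N-l+1
% distinct blocks can be observed, so the measurable block entropy obeys
% H(l) <= min(l*log2(n), log2(N-l+1)). Evaluating this ceiling on the same
% windows shows why ShEntropy saturates near log2(N-l+1):
UpperBound = min(Window.*log2(n), log2(N - Window + 1))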


function ShEntropy = BlockEntropy(Series, Window, Base)

    n = length(Series);
    D = zeros(n, Window);                 % pre-allocate memory
    for k = 1:Window
        D(:,k) = circshift(Series, 1-k);  % column k holds the sequence shifted by k-1
    end
    D = D(1:end-Window+1, :);             % truncate wrapped-around rows at the end

    % Replace each row (block) with a single number by treating the row as
    % polynomial coefficients evaluated at Base
    K = size(D,1);
    MyData = zeros(1,K);                  % pre-allocate
    for k = 1:K
        MyData(k) = polyval(D(k,:), Base);
    end
    clear D

    UniqueMyData = unique(MyData);
    nUniqueMyData = length(UniqueMyData);
    FreqMyData = zeros(nUniqueMyData,1);  % initialization
    for i = 1:nUniqueMyData
        FreqMyData(i) = sum(MyData == UniqueMyData(i));
    end
    % Calculate sample block probabilities
    P = FreqMyData / sum(FreqMyData);
    % Calculate entropy in base 2
    ShEntropy = -sum(P .* log2(P));       % block entropy H_w in bits
end
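One caveat worth noting: the polyval encoding above can collide when Base is not larger than the largest symbol value. With $\mathcal{A} = \{1,2,3,4\}$ and Base = 2, for example, the blocks [1 3] and [2 1] both map to $1 \cdot 2 + 3 = 2 \cdot 2 + 1 = 5$, so distinct blocks get merged and the block entropy is underestimated. A minimal collision-free variant (my own sketch, not from the original post) counts rows directly instead of encoding them:

function H = BlockEntropyRows(Series, Window)
    % Build the (N-Window+1) x Window matrix of overlapping blocks
    N = length(Series);
    D = zeros(N-Window+1, Window);
    for k = 1:Window
        D(:,k) = Series(k : N-Window+k);  % column k holds symbol k of every block
    end
    % Count identical rows without mapping them to numbers
    [~, ~, idx] = unique(D, 'rows');
    P = accumarray(idx, 1) / size(D,1);   % empirical block probabilities
    H = -sum(P .* log2(P));               % Shannon block entropy in bits
end

Alternatively, keeping the polyval encoding but setting Base = n + 1 (here Base = 5) also guarantees that distinct blocks receive distinct codes, since every symbol is then a valid base-(n+1) digit.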

