#StackBounty: #python #numpy #vectorization Vectorizing calculation in matrix with interdependent values

Bounty: 100

I am tracking multiple discrete time series at multiple temporal resolutions, resulting in an S×R×B array, where S is the number of time series, R is the number of resolutions, and B is the buffer size, i.e. how many values each series remembers. Each series is discrete and uses a limited range of natural numbers to represent its values. I will call these “symbols” here.

For each series I want to calculate how often any of the previous measurement’s symbols directly precedes any of the current measurement’s symbols, over all measurements. I have solved this with a for-loop as seen below, but would like to vectorize it for obvious reasons.
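To make the quantity concrete, here is a minimal sketch for a single pair of series at one resolution. The arrays `a` and `b` are made-up toy buffers, not real data, and the buffer is assumed to wrap around circularly, as in the modulo indexing of the loop further down.

```python
import numpy as np

# Hypothetical toy buffers for two series at one resolution.
a = np.array([0, 1, 0, 2, 0, 1])  # series whose current symbol we look at
b = np.array([1, 3, 2, 3, 2, 2])  # series whose preceding symbol we look at

# np.roll(b, 1)[t] == b[(t - 1) % len(b)], i.e. the circular predecessor.
prev_b = np.roll(b, 1)

# Count positions t where a[t] == 0 and b[t - 1] == 3.
count = np.count_nonzero((a == 0) & (prev_b == 3))
print(count)  # 2
```

Symbol 0 occurs in `a` at positions 0, 2 and 4; symbol 3 directly precedes it in `b` at positions 2 and 4, hence the count of 2.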

I’m not sure whether my way of structuring the data is efficient, so I’m open to suggestions there. In particular, I think the ratios matrix could be structured differently.

Thanks in advance!

import itertools

import numpy as np

buffer = 10
resolutions = 5
num_series = 10
vocab_size = 10
# -1 marks a slot that has not been filled yet
data = np.full((num_series, resolutions, buffer), -1, dtype=np.int16)

<...fill data with data...>

# in this example: calculate for series 0 with symbol 0, series 1
# with symbol 1, etc.
indices = []
indices.append( zip(range(num_series), range(vocab_size)) )
indices.append( zip(range(num_series), range(vocab_size)) )
indices.append( range(resolutions) )

# This is huge! :/
# dimensions: 
#   series and value for which we calculate, 
#   series and value which precedes that measurement, 
#   resolution
# zeros rather than empty: the loop below skips entries whose ratio is 0
ratios = np.zeros((num_series, vocab_size, num_series, vocab_size, resolutions))

for idx in itertools.product(*indices):
    s0,v0 = idx[0]  # the series and symbol for which we calculate
    s1,v1 = idx[1]  # the series and symbol which should precede the one above
    res = idx[2]

    # Find the positions where s0==v0
    found0 = np.where(data[s0, res, :] == v0)[0]
    if found0.size == 0:
        continue

    # Check how often s1==v1 right before s0==v0
    candidates = (s1, res, (found0 - 1 + buffer) % buffer)
    found01 = np.count_nonzero(data[candidates] == v1)
    if found01 == 0:
        continue

    # total01 = number of positions where both s0 and s1 are defined (i.e. >= 0)
    total01 = np.count_nonzero((data[s0, res, :] >= 0) & (data[s1, res, :] >= 0))
    ratio = (float(found01) / total01) if total01 > 0 else 0.0
    ratios[s0, v0, s1, v1, res] = ratio
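For reference, one possible direction for vectorizing the loop above is to one-hot encode the symbols and let `np.einsum` do the counting. This is only a sketch under assumptions: the function name `transition_ratios` is made up, the buffer is treated as circular (as in the modulo indexing above), `-1` marks unset slots, and the denominator counts positions where both series are defined. It fills in all series/symbol pairs rather than only the ones iterated over in the example.

```python
import numpy as np

def transition_ratios(data, vocab_size):
    """Sketch: data is an int array of shape (S, R, B), -1 = unset.

    Returns an array of shape (S, V, S, V, R) laid out like `ratios` above.
    """
    symbols = np.arange(vocab_size)
    # onehot[s, r, b, v] is True where data[s, r, b] == v (never for -1)
    onehot = data[..., None] == symbols
    # Shift along the buffer axis so index b holds the value at b - 1 (circular)
    prev_onehot = np.roll(onehot, 1, axis=2)

    cur = onehot.astype(np.float64)
    prev = prev_onehot.astype(np.float64)
    # counts[s0, v0, s1, v1, r] = sum over b of cur[s0, r, b, v0] * prev[s1, r, b, v1]
    counts = np.einsum('irbv,jrbw->ivjwr', cur, prev)

    # totals[s0, s1, r] = number of positions where both series are defined
    defined = (data >= 0).astype(np.float64)
    totals = np.einsum('irb,jrb->ijr', defined, defined)
    totals = totals[:, None, :, None, :]  # broadcast over both symbol axes

    ratios = np.zeros_like(counts)
    np.divide(counts, totals, out=ratios, where=totals > 0)
    return ratios
```

Calling `transition_ratios(data, vocab_size)` returns the full five-dimensional array in one go. The intermediate one-hot arrays hold S·R·B·V entries each and the result holds S·V·S·V·R entries, so memory, not speed, is likely to become the limit for large vocabularies.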

