# #StackBounty: #python #numpy #vectorization Vectorizing calculation in matrix with interdependent values

### Bounty: 100

I am tracking multiple discrete time series at multiple temporal resolutions, resulting in an S×R×B array, where S is the number of time series, R is the number of different resolutions, and B is the buffer size, i.e. how many values each series remembers. Each series is discrete and uses a limited range of natural numbers to represent its values. I will call these “symbols” here.
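To make the layout concrete, here is a minimal sketch of such an array. The sizes and the `UNDEFINED` name are made up for illustration; the only thing taken from the description is the (series, resolution, buffer) axis order and the use of a negative value for slots that have not been written yet.

```python
import numpy as np

num_series, resolutions, buffer = 3, 2, 4   # S, R, B (illustrative sizes)
UNDEFINED = -1                               # hypothetical marker for empty slots

# One row of B symbols per (series, resolution) pair.
data = np.full((num_series, resolutions, buffer), UNDEFINED, dtype=np.int16)

# Write three symbols into series 0 at resolution 1.
data[0, 1, :3] = [2, 0, 2]

print(data.shape)            # (3, 2, 4)
print(data[0, 1].tolist())   # [2, 0, 2, -1]
```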

For each series I want to calculate how often any of the previous measurement’s symbols directly precedes any of the current measurement’s symbols, over all measurements. I have solved this with a for-loop as seen below, but would like to vectorize it for obvious reasons.
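For a single pair of series at one resolution, the quantity can be sketched as follows. The function name `precede_ratio` and the sample values are assumptions for illustration. `np.roll` shifts the second series by one slot, so the last slot counts as preceding slot 0, which assumes the buffer is circular.

```python
import numpy as np

def precede_ratio(a, b, v0, v1):
    """Share of slots where b held v1 directly before a held v0.

    a, b: 1-D integer arrays of equal length; negative means undefined.
    """
    # np.roll(b, 1)[i] == b[i - 1], with wrap-around at the start.
    hits = np.count_nonzero((a == v0) & (np.roll(b, 1) == v1))
    # Normalise by the number of slots where both series are defined.
    total = np.count_nonzero((a >= 0) & (b >= 0))
    return hits / total if total > 0 else 0.0

a = np.array([1, 0, 1, 0, -1])
b = np.array([2, 2, 0, 2, -1])
print(precede_ratio(a, b, v0=0, v1=2))  # 0.25
```

Here `a` holds symbol 0 at slots 1 and 3, and `b` holds symbol 2 right before only the first of those, so there is one hit out of four slots where both series are defined.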

I’m not sure whether my way of structuring the data is efficient, so I’m open to suggestions there. In particular, I think the ratios matrix could be organized differently.

```python
import itertools

import numpy as np

buffer = 10
resolutions = 5
num_series = 10
vocab_size = 10
data = np.full((num_series, resolutions, buffer), -1, dtype=np.int16)

# <...fill data with data...>

# in this example: calculate for series 0 with symbol 0, series 1
# with symbol 1, etc.
indices = []
indices.append(zip(range(num_series), range(vocab_size)))
indices.append(zip(range(num_series), range(vocab_size)))
indices.append(range(resolutions))

# This is huge! :/
# dimensions:
#   series and value for which we calculate,
#   series and value which precedes that measurement,
#   resolution
# zeros rather than empty, so combinations skipped below read as 0.0
ratios = np.zeros((num_series, vocab_size, num_series, vocab_size, resolutions))

for (s0, v0), (s1, v1), res in itertools.product(*indices):
    # (s0, v0): the series and symbol for which we calculate
    # (s1, v1): the series and symbol which should precede the one above

    # Find the positions where s0==v0
    found0 = np.where(data[s0, res, :] == v0)[0]
    if found0.size == 0:
        continue

    # Check how often s1==v1 right before s0==v0 (the buffer wraps around)
    candidates = (s1, res, (found0 - 1 + buffer) % buffer)
    found01 = np.count_nonzero(data[candidates] == v1)
    if found01 == 0:
        continue

    # total01 = number of positions where both s0 and s1 are defined (i.e. >=0)
    total01 = np.count_nonzero((data[s0, res, :] >= 0) & (data[s1, res, :] >= 0))
    ratio = (found01 / total01) if total01 > 0 else 0.0
    ratios[s0, v0, s1, v1, res] = ratio
```
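One possible way to vectorize the loop above is to one-hot encode the symbols and let `np.einsum` sum over the buffer axis. This is only a sketch under a few assumptions: the function name `transition_ratios` is made up, the buffer is treated as circular (as in the `% buffer` step of the loop), and the result is filled in for every (series, symbol) combination rather than only the pairs the loop visits. The one-hot arrays take S×V×R×B integers each, which may matter for large vocabularies.

```python
import numpy as np

def transition_ratios(data, vocab_size):
    """data: int array of shape (S, R, B); negative entries mean undefined."""
    symbols = np.arange(vocab_size)[None, :, None, None]
    # cur[s, v, r, b] is 1 where series s holds symbol v at slot b.
    cur = (data[:, None, :, :] == symbols).astype(np.int64)
    # prev[s, v, r, b] is 1 where series s holds symbol v at slot b - 1 (circular).
    prev = np.roll(cur, 1, axis=3)
    # counts[s0, v0, s1, v1, r]: how often (s1, v1) sits directly before (s0, v0).
    counts = np.einsum('avrb,cwrb->avcwr', cur, prev)
    # total[s0, 1, s1, 1, r]: slots where both series are defined.
    valid = (data >= 0).astype(np.int64)
    total = np.einsum('arb,crb->acr', valid, valid)[:, None, :, None, :]
    # Guard the division so that pairs with no shared defined slot give 0.0.
    return np.where(total > 0, counts / np.maximum(total, 1), 0.0)

# Random stand-in data; -1 marks undefined slots.
rng = np.random.default_rng(0)
data = rng.integers(-1, 10, size=(10, 5, 10)).astype(np.int16)
ratios = transition_ratios(data, vocab_size=10)
print(ratios.shape)  # (10, 10, 10, 10, 5)
```

If only matching pairs such as "series 0 with symbol 0" are needed, the full five-dimensional result can be indexed afterwards, or the one-hot step can be restricted to those symbols to save memory.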
