I have some satellite data which looks like the following (scatter plot):
I now want to bin this data into a regular grid over time and latitude, with each bin equal to the mean of all the data points that fall within it. I have been experimenting with scipy.stats.binned_statistic_2d and am baffled by the results I am getting.
First, if I pass the “count” statistic into the scipy binning function, it appears to work correctly (minimal code and plot below).
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# id1 is the actual data; I have tried it both with and without this mask, with the same effect
id1 = np.ma.masked_where(id1 == 0, id1)

# grid spacing for x (time) and y (latitude)
x_range = np.arange(0, 24.25, .25)
y_range = np.arange(-13, 14, 1)
xbins, ybins = len(x_range), len(y_range)  # number of bins in each dimension

# idtime and idlat are the locations of each id1 value in time and latitude
H, xedges, yedges, binnumber = stats.binned_statistic_2d(
    idtime, idlat, values=id1, statistic='count', bins=[xbins, ybins])
H = np.ma.masked_where(H == 0, H)  # mask bins that received no data

XX, YY = np.meshgrid(xedges, yedges)
fig = plt.figure(figsize=(13, 7))
ax1 = plt.subplot(111)
plot1 = ax1.pcolormesh(XX, YY, H.T)
```
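One thing worth checking in the code above: `bins=[xbins, ybins]` passes integers, which asks `binned_statistic_2d` for that many equal-width bins spanning each coordinate's min to max, so the edges will generally not line up with `x_range` and `y_range`. A minimal sketch that passes the edge arrays themselves instead, assuming synthetic stand-ins for `idtime`, `idlat` and `id1` (the real data isn't shown):

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins for idtime, idlat and id1 (the real satellite data is not shown)
rng = np.random.default_rng(0)
idtime = rng.uniform(0, 24, 5000)
idlat = rng.uniform(-13, 13, 5000)
id1 = rng.uniform(612, 2237026, 5000)  # value range taken from the question

# Bin edges at the intended spacing: 0.25 in time, 1 degree in latitude
x_edges = np.arange(0, 24.25, 0.25)  # 97 edges -> 96 bins
y_edges = np.arange(-13, 14, 1)      # 27 edges -> 26 bins

# Passing arrays of edges (not bin counts) pins the grid to exactly these edges
H, xedges, yedges, binnumber = stats.binned_statistic_2d(
    idtime, idlat, values=id1, statistic='count', bins=[x_edges, y_edges])

print(H.shape)  # -> (96, 26): one row per time bin, one column per latitude bin
```

Note that N edges give N - 1 bins, so using `len(x_range)` as a bin count also gives one more bin than the edge array implies.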
Now if I change the statistic to 'mean', np.mean, np.ma.mean, etc., this is the plot I get, which seems only to distinguish places where there is data from places where there is none:
This is despite the minimum and maximum values of this data being 612 and 2237026, respectively. I have written some code that does the binning manually, but it isn't pretty and takes forever (and I haven't fully accounted for edge effects, so running into errors and then fixing them is also taking forever).
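For reference, a vectorized manual version doesn't need a loop over bins: NumPy's `np.histogram2d` can compute a per-bin sum (via its `weights` argument) and a per-bin count, and their ratio is the bin mean. This is a swapped-in technique, not the manual code described above, sketched on hypothetical random data and cross-checked against `binned_statistic_2d`:

```python
import numpy as np
from scipy import stats

# Hypothetical test data, shaped like the Edit2 example
rng = np.random.default_rng(1)
x = rng.random(1000)
y = rng.random(1000)
z = np.arange(1000, dtype=float)

edges = np.linspace(0, 1, 21)  # 20 equal-width bins on [0, 1] in each dimension

# Per-bin sum of z and per-bin number of points
sums, _, _ = np.histogram2d(x, y, bins=[edges, edges], weights=z)
counts, _, _ = np.histogram2d(x, y, bins=[edges, edges])

# Mean = sum / count; empty bins would divide by zero, so they are left as NaN
mean = np.full_like(sums, np.nan)
np.divide(sums, counts, out=mean, where=counts > 0)

# Cross-check against scipy using the same edges, so the bins line up
ref, _, _, _ = stats.binned_statistic_2d(x, y, z, statistic='mean', bins=[edges, edges])
print(np.allclose(mean, ref, equal_nan=True))
```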
I would love some advice to get this to work. Thanks!
Edit: I just noticed that I get a runtime warning after running the script, and I can't find any information about it online; a Google search for the warning returns zero results. The warning occurs for every statistic option except count.
```
RuntimeWarning: invalid value encountered in less
  cbook._putmask(xa, xa < 0.0, -1)
```
Edit2: I am attaching some code below that reproduces my problem. It works for the count statistic but not for mean or any other statistic, and it produces the same runtime warning as before.
```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

x = np.random.rand(1000)
y = np.random.rand(1000)
z = np.arange(1000)

H, xedges, yedges, binnumber = stats.binned_statistic_2d(
    x, y, values=z, statistic='count', bins=[20, 20])
H2, xedges2, yedges2, binnumber2 = stats.binned_statistic_2d(
    x, y, values=z, statistic='mean', bins=[20, 20])

XX, YY = np.meshgrid(xedges, yedges)
XX2, YY2 = np.meshgrid(xedges2, yedges2)

fig = plt.figure(figsize=(13, 7))
ax1 = plt.subplot(111)
plot1 = ax1.pcolormesh(XX, YY, H.T)
cbar = plt.colorbar(plot1, ax=ax1, pad=.015, aspect=10)
plt.show()

fig2 = plt.figure(figsize=(13, 7))
ax2 = plt.subplot(111)
plot2 = ax2.pcolormesh(XX2, YY2, H2.T)
cbar = plt.colorbar(plot2, ax=ax2, pad=.015, aspect=10)
plt.show()
```
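A quick check on the Edit2 test data suggests why only `count` behaves: `count` stores 0 in bins with no points, whereas `mean` has nothing to average there and stores NaN. The RuntimeWarning above appears to come from Matplotlib's colormap step comparing those NaNs (`xa < 0.0`). A minimal sketch, using a seeded generator in place of `np.random.rand` so it is reproducible:

```python
import numpy as np
from scipy import stats

# Same kind of random test data as in Edit2, but seeded
rng = np.random.default_rng(2)
x = rng.random(1000)
y = rng.random(1000)
z = np.arange(1000)

H, xedges, yedges, _ = stats.binned_statistic_2d(x, y, values=z, statistic='count', bins=[20, 20])
H2, _, _, _ = stats.binned_statistic_2d(x, y, values=z, statistic='mean', bins=[20, 20])

# 'count' reports 0 for empty bins, while 'mean' stores NaN in those same bins
empty = H == 0
print(empty.sum(), np.isnan(H2).sum())  # the two numbers match
print(np.nanmin(H2), np.nanmax(H2))     # a finite range once the NaNs are ignored
```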
Edit 3: User8153 identified the problem: for the mean statistic, bins that contain no data points are filled with NaN, so the solution was to mask the array returned by scipy.stats wherever NaNs occur. I used np.ma.masked_invalid() to do this. Plots of my original data and the test data for the mean statistic are below.
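The fix from Edit 3, sketched on the same kind of test data: wrap the scipy result in `np.ma.masked_invalid` before handing it to `pcolormesh`, so NaN bins are left blank and the color limits come from the real bin means. The `Agg` backend line is an assumption so the sketch runs without a display; remove it to get an interactive `plt.show()` window.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive plotting
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.random(1000)
y = rng.random(1000)
z = np.arange(1000)

H2, xedges, yedges, _ = stats.binned_statistic_2d(x, y, values=z, statistic='mean', bins=[20, 20])

# Mask the NaN (empty) bins so they are not fed into the colormap
H2m = np.ma.masked_invalid(H2)

XX, YY = np.meshgrid(xedges, yedges)
fig, ax = plt.subplots(figsize=(13, 7))
mesh = ax.pcolormesh(XX, YY, H2m.T)  # .T because H2 is indexed [x_bin, y_bin]
fig.colorbar(mesh, ax=ax, pad=.015, aspect=10)
```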