#StackBounty: #pgfplots #pgfplotstable Filtering input data as it is read

Bounty: 50

One thing that is quite tedious when producing pgfplots figures is preparing the raw data for pgf, in particular shrinking the data set so that it stays under TeX's memory cap.

Generate some dummy data in R:

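    library(data.table)  # fwrite() comes from data.table
    nPoints <- 10^6
    # column 0: row number, column 1: running sum of uniform(0, 1) draws
    df <- data.frame(seq(nPoints), cumsum(runif(nPoints, 0, 1)))
    fwrite(x = df, file = "data.dat", sep = " ", col.names = FALSE)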

Plotting data.dat directly results in a capacity exceeded error on my machine:

    TeX capacity exceeded, sorry [pool size=6177416].

I thought it would be possible to filter the input data directly with something like:

    \addplot+[only marks] table [
                x index={0},
                y expr={ifthenelse(mod(\coordindex, 10000) == 0, \thisrowno{1}, NaN)},
                unbounded coords=jump,
             ] {data.dat};
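A possible alternative, sketched under assumptions rather than offered as a confirmed fix: pgfplots has a built-in coordinate filter, each nth point, that discards coordinates as they are processed instead of turning them into NaN jumps. It needs no y expr, so x index and y index can read the two columns directly. The standalone class and compat=1.18 are assumptions; adjust them to your installation. pgfplots still has to scan every line of data.dat, and I have not verified that this alone stays under the pool size limit on every setup.

```latex
\documentclass{standalone}
\usepackage{pgfplots}
\pgfplotsset{compat=1.18} % assumption: set this to your installed version
\begin{document}
\begin{tikzpicture}
  \begin{axis}
    % each nth point=10000 keeps one coordinate out of every 10000 and
    % drops the others through an x filter, instead of plotting them as NaN.
    % filter discard warning=false avoids one log note per dropped point;
    % unbounded coords=discard is stated explicitly in case your version
    % of each nth point does not already set it.
    \addplot+[
      only marks,
      each nth point=10000,
      unbounded coords=discard,
      filter discard warning=false,
    ] table [x index=0, y index=1] {data.dat};
  \end{axis}
\end{tikzpicture}
\end{document}
```

If this still runs out of memory, compiling with lualatex instead of pdflatex is a commonly suggested fallback, since LuaTeX allocates most of its memory dynamically.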

However, this loads the whole data file and then displays only the selected points, instead of loading only the selected points. The pgfplotstable package offers \pgfplotstabletypeset[every nth row={integer}[shift]{options}], which looks useful. But it isn't clear what the options should be in order to delete rows from the data that was read, or whether the typesetting happens while the data is being read or afterwards.
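On the pgfplotstable side, every nth row only installs options (for example a rule before the row) on every nth row; it does not remove any rows. The key documented for dropping rows while typesetting is row predicate/.code, which receives the zero-based row index as #1 and calls \pgfplotstableuserowfalse for rows that should be skipped. The sketch below keeps rows 0, 10000, 20000, and so on. As far as I can tell, the predicate is evaluated during typesetting, after \pgfplotstabletypeset has loaded the file, so it is unlikely to help with the memory limit for the 10^6-row file; it only illustrates which option actually deletes rows.

```latex
\documentclass{article}
\usepackage{pgfplotstable}
\begin{document}
\pgfplotstabletypeset[
  header=false, % data.dat was written without column names
  row predicate/.code={%
    % #1 is the zero-based row index. Plain TeX integer arithmetic is used
    % because basic pgfmath numbers overflow above roughly 16383.
    % #1 - 10000*(#1/10000) is zero exactly when #1 is a multiple of 10000
    % (\numexpr rounds the division, but the zero test is still exact).
    \ifnum\numexpr #1-10000*(#1/10000)\relax=0
    \else
      \pgfplotstableuserowfalse % drop this row
    \fi
  },
]{data.dat}
\end{document}
```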

Is it possible to read only selected lines of a file with pgfplotstable and if so, how?
