dfthin <- df[ unique(c(which(df$iter %% 500 == 0), nrow(df))), ]
or
dfthin <- subset(df, (iter %% 500 == 0) | (seq.int(nrow(df)) == nrow(df)))
N.B. You should avoid using the name "df" for your variables, because it is also the name of a built-in function (the F distribution density in the stats package) that your variable masks. Others may be confused, and eventually you may want to use that function yourself. One solution is to use DF for your variables; another is to use more descriptive names.
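Both forms above keep the final row of the whole data frame. If the last row within each level of n is wanted, as the question asks, one possible sketch uses duplicated() with fromLast = TRUE. The data below are made up for illustration (DF, is_last, DFthin and the Runtime values are not from the original post), and the rows are assumed to be sorted by iter within each n:

```r
# Toy stand-in for the poster's data: two problem sizes, one snapshot every
# 10 iterations. All values here are invented for illustration only.
DF <- data.frame(
  n    = factor(rep(c("1000", "2000"), times = c(123, 78))),
  iter = c(seq(10, 1230, by = 10), seq(10, 780, by = 10))
)
DF$Runtime <- cumsum(rep(0.01, nrow(DF)))  # placeholder column

# TRUE on the final row of each level of n: scanning from the end, the first
# time a level is seen it is not yet a duplicate.
is_last <- !duplicated(DF$n, fromLast = TRUE)

# Keep every 500th iteration plus the last snapshot of each n.
DFthin <- DF[DF$iter %% 500 == 0 | is_last, ]
print(DFthin)
```

Because the two conditions are combined into one logical vector, a last row that also happens to be a multiple of 500 is kept only once.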
---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
Giovanni Azua <bravegag at gmail.com> wrote:
Hello,
I bumped into the following funny use-case. I have too much data for a
given plot. I have the following data frame df:
'data.frame': 5015 obs. of 5 variables:
 $ n          : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ iter       : int 10 20 30 40 50 60 70 80 90 100 ...
 $ Error      : num 1.05e-02 1.24e-03 3.67e-04 1.08e-04 4.05e-05 ...
 $ Duality_Gap: num 20080 3789 855 443 321 ...
 $ Runtime    : num 0.00536 0.01353 0.01462 0.01571 0.01681 ...
But if I plot e.g. Runtime vs log(Duality_Gap), I have too many
observations because I took a snapshot every 10 iterations rather than,
say, every 500, and the plot looks very cluttered. So I would like to
trim the data frame to only those records for which iter is a multiple
of 500, and so I do this:
df <- subset(df, iter %% 500 == 0)
This gives me almost exactly what I need, except that the last and most
important Duality_Gap observations are of course gone due to the
filtering. I would like to change the subset condition to be
iter %% 500 == 0 _or_ the record is the last one per n (n is my problem
size and the category in this case). How can I do that?
I thought of adding a new Boolean column "last" that flags whether a
given row is the last element of its category, but this seems a bit too
complicated. Is there a simpler condition that can be used with the
subset command?
TIA,
Best regards,
Giovanni