Message-ID: <2372070191e24ba4a49378c4a69bd8f0@srbc.net>
Date: 2019-08-21T22:54:02Z
From: Shank, Matthew
Subject: Calculating percentile rank of sample dataset compared to reference dataset in R
In-Reply-To: <a8f5e88c964740febae92e3d384348b7@srbc.net>

Hello R-sig-ecology mailing list,



I'm working on a multivariate water quality index in which the concentration of parameter i at site j is normalized by calculating its percentile rank against a much larger reference dataset.



As an example, I have generated a sample dataset of water quality parameters (df_sample) and a much larger reference dataset (df_ref). I'd like to calculate the percentile rank of each parameter at each site relative to the matching column of the reference dataset.



Example data are below. A solution that avoids for loops would be preferred.





# Generate sample data
df_sample <- data.frame(site = letters[1:10], iron = runif(10, min = 0, max = 1), nitrate = runif(10, min = 0, max = 10))
df_sample

# Generate reference dataset
df_ref <- data.frame(iron = seq(0, 1, length.out = 1000), nitrate = seq(0, 10, length.out = 1000))
df_ref

# Goal: calculate the percentile rank of iron and nitrate at all sites (a:j),
# based on the identically named columns in df_ref, and add each result as a
# new column in df_sample
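
One way this might be done without explicit for loops is base R's ecdf(). This is a sketch that assumes "percentile rank" means 100 times the proportion of reference values less than or equal to the sample value. The seed, the params vector, and the "_pctile" column suffix are illustrative choices, not part of the example data above.

```r
# Assumption: percentile rank = 100 * share of reference values <= sample value.
set.seed(42)  # hypothetical seed, only so the sketch is reproducible

# Same example data as in the post
df_sample <- data.frame(site = letters[1:10],
                        iron = runif(10, min = 0, max = 1),
                        nitrate = runif(10, min = 0, max = 10))
df_ref <- data.frame(iron = seq(0, 1, length.out = 1000),
                     nitrate = seq(0, 10, length.out = 1000))

# Parameters present in both data frames (illustrative helper vector)
params <- c("iron", "nitrate")

# ecdf(ref) returns a step function built from the reference column;
# applying it to the sample column gives, for each site, the proportion of
# reference values <= that site's value. Map() pairs the columns by position.
pct <- Map(function(x, ref) 100 * ecdf(ref)(x),
           df_sample[params], df_ref[params])

# Attach the results as new columns, e.g. iron_pctile and nitrate_pctile
names(pct) <- paste0(params, "_pctile")
df_sample <- cbind(df_sample, as.data.frame(pct))

df_sample
```

With this definition, sample values below the reference minimum get 0 and values at or above the reference maximum get 100. A different percentile convention (for example, strictly less than) would need a different formula.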



Many thanks,
Matt

