Restricted Simulation from GPD & Normal Distributions
Hi R-users,

(Fixing some typos in my previous mail.)

I have data on one equity-related variable X, denoted by x1, x2, x3, ..., x1000, which have been ordered so that x1 < x2 < ... < x1000. I have identified the lower and upper 5th percentiles, i.e. x50 and x950 respectively. Based on some analysis, I have inferred that three different density functions fit the three parts of the data reasonably well:

- f1 fits the data for all x < x50 (50 observations)
- f2 fits the data for all x50 < x < x950 (900 observations)
- f3 fits the data for all x > x950 (50 observations)

The idea is to simulate 50 new observations from f1 *restricted to (-infinity, x50]*, 900 new observations from f2 *restricted to (x50, x950]*, and 50 new observations from f3 *restricted to (x950, infinity)*. So the total number of observations in the simulated data is 1000, as before.

For the example I am working with, f1 and f3 are GPD (Generalized Pareto Distribution), while f2 is Normal with some parameters.

I want to write a function which will take as inputs:

- the entire data (of size 1000)
- the cut-off points x50 and x950
- the 3 distributions (along with their parameters)
- the number of data points from each of the 3 segments (50, 900, 50 in this example)

Note that f1, f2 and f3 need to be properly restricted to the corresponding intervals (marked in bold above). The function should output the simulated data with the original sample size (here, 1000).

I'll really appreciate any help writing this function. If anything else is required, please let me know.
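One way to sample from a distribution restricted to an interval (a, b] is inverse-CDF sampling: if F is its CDF, draw U uniformly on (F(a), F(b)) and return F^{-1}(U). The result has exactly the density f restricted to (a, b] and renormalised, with no rejection step. Whether an endpoint is open or closed makes no difference for continuous distributions. Below is a minimal sketch of the requested function using this technique, written in Python with only the standard library rather than in R. All names (`simulate_spliced`, `lower_gpd`, `upper_gpd`, `normal`, `sample_truncated`) are hypothetical. It assumes the upper-tail GPD is parameterised as X = mu + Y and the lower-tail GPD as the mirror image X = mu - Y, with Y ~ GPD(scale sigma, shape xi). That mirrored form is a common convention for lower-tail fits, but it may not match how f1 was actually fitted.

```python
import math
import random
from statistics import NormalDist


def gpd_cdf(y, sigma, xi):
    """CDF of a GPD exceedance y >= 0 (scale sigma > 0, shape xi)."""
    if y <= 0:
        return 0.0
    if xi == 0:
        return -math.expm1(-y / sigma)  # exponential case: 1 - exp(-y/sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0:  # past the finite upper endpoint that exists when xi < 0
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


def gpd_isf(q, sigma, xi):
    """Inverse survival function: the y with P(Y > y) = q, for 0 < q <= 1."""
    if xi == 0:
        return -sigma * math.log(q)
    return sigma / xi * (q ** (-xi) - 1.0)


def upper_gpd(mu, sigma, xi):
    """Upper tail, assumed X = mu + Y; returns a (cdf, quantile) pair."""
    return (lambda x: gpd_cdf(x - mu, sigma, xi),
            lambda p: mu + gpd_isf(1.0 - p, sigma, xi))


def lower_gpd(mu, sigma, xi):
    """Lower tail, assumed mirrored X = mu - Y; returns a (cdf, quantile) pair."""
    return (lambda x: 1.0 - gpd_cdf(mu - x, sigma, xi),
            lambda p: mu - gpd_isf(p, sigma, xi))


def normal(mean, sd):
    """Normal body; returns a (cdf, quantile) pair."""
    d = NormalDist(mean, sd)
    return (d.cdf, d.inv_cdf)


def sample_truncated(dist, lower, upper, n, rng):
    """Inverse-CDF draws from dist = (cdf, quantile) restricted to (lower, upper]."""
    cdf, ppf = dist
    lo = 0.0 if lower == -math.inf else cdf(lower)
    hi = 1.0 if upper == math.inf else cdf(upper)
    if hi <= lo:
        raise ValueError("interval has zero probability under this distribution")
    out = []
    while len(out) < n:
        u = rng.uniform(lo, hi)
        if 0.0 < u < 1.0:  # quantiles at exactly 0 or 1 may be infinite
            out.append(ppf(u))
    return out


def simulate_spliced(data, x50, x950, f1, f2, f3, sizes=None, seed=None):
    """Draw from f1 on (-inf, x50], f2 on (x50, x950] and f3 on (x950, inf)."""
    if sizes is None:  # default: reuse the segment counts of the original data
        sizes = (sum(x <= x50 for x in data),
                 sum(x50 < x <= x950 for x in data),
                 sum(x > x950 for x in data))
    rng = random.Random(seed)
    return (sample_truncated(f1, -math.inf, x50, sizes[0], rng)
            + sample_truncated(f2, x50, x950, sizes[1], rng)
            + sample_truncated(f3, x950, math.inf, sizes[2], rng))
```

With the data sorted, x50 and x950 are `data[49]` and `data[949]`. The default segment counts under the half-open intervals are then exactly 50, 900 and 50, so the output has the original sample size. In R, the same steps map onto `runif()` combined with `pnorm()`/`qnorm()` for the Normal part. For the GPD parts, the p/q functions from an add-on package such as evd (`pgpd()`/`qgpd()`) could be used, after checking how that package parameterises the location, scale and shape.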
On Tue, Dec 27, 2016 at 3:46 PM, Preetam Pal <lordpreetam at gmail.com> wrote:
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division, C.V.Raman Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.