I have written a program in C that takes two xy datasets, aligns them based on shared features, transforms them into equal-sized histograms, transforms the histograms into cumulative distribution functions (via GSL) and finally performs a KS test. I want to validate my program's results and figured I would use R, but I am stuck at the histograms (I have 2 histogram objects with the same number of bins, weighted by the Y value) and am not sure how to transform these into CDFs to perform a ks.test. I looked at the ecdf function but got lost. I realize this is most likely a very basic question, but I am just not that familiar with R :( Thanks in advance. -- View this message in context: http://r.789695.n4.nabble.com/Attempting-to-confirm-a-program-i-wrote-in-C-normalize-2-datasets-transform-into-histogram-transform-tp4656704.html Sent from the R help mailing list archive at Nabble.com.
Attempting to confirm a program i wrote in C (normalize 2 datasets, transform into histogram, transform into CDF, perform KS test)
3 messages · Jeff Newmiller, Tarskin
It is generally expected that the questioner pose a specific example so the respondent can have some assurance they are answering the actual question. In this case please provide a test data set, intermediate results, and the final result generated by your C program. It would also show that you made a reasonable effort if you showed what steps you tried in R and where you think they went wrong.

http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
1 day later
The C program takes 2 mzML files, decodes and uncompresses the binary strings corresponding to the X and Y data, examines spectral (xy data) similarity and combines both datasets into a new one. Finally, after all similar spectra have been merged, it writes everything back into 1 new mzML file. I am assuming that people do not want to receive several complete mzML files; I will, however, include 2 spectra from different sources that were extracted from the full mzML files I have been using for testing.

IgG2_G1F.data <http://r.789695.n4.nabble.com/file/n4656833/IgG2_G1F.data>
IgG2_G1F.data <http://r.789695.n4.nabble.com/file/n4656833/IgG2_G1F.data>

The steps I performed in R:

experimental <- read.table(<first data set>, header=TRUE)
predicted <- read.table(<second data set>, header=TRUE)
hist_e <- hist(rep(experimental[,1], experimental[,2]), breaks=seq(0,2500,1))
hist_p <- hist(rep(predicted[,1], predicted[,2]), breaks=seq(0,2500,1))

Up to here it does what I expect; my own C program would now transform the GSL histogram object into a GSL cdf object.

ecdf(hist_p) seemed the logical choice to me; however, it complains about an error in rank, and my attempts to figure out what this means have not gotten me far. I would appreciate any pointer as to why this doesn't do what I'm expecting.
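A likely explanation for the error, offered as a sketch rather than a definitive answer: ecdf() expects a numeric vector of observations, whereas hist() returns a list-like "histogram" object, so the sorting and ranking inside ecdf() cannot handle it. One way to get a binned CDF is to take the cumulative sum of the histogram's counts and normalize it. The example below uses small made-up data frames (the x and y values are hypothetical stand-ins for the attached spectra) and assumes the y values are non-negative whole numbers, because rep() truncates fractional repeat counts.

```r
# Hypothetical stand-ins for the two xy data sets (not the real spectra).
experimental <- data.frame(x = c(100.2, 250.7, 250.9, 1200.4), y = c(5, 10, 3, 2))
predicted    <- data.frame(x = c(100.6, 251.1, 1200.1, 2000.5), y = c(4, 12, 3, 1))

# Same binning as in the post: unit-width bins from 0 to 2500.
breaks <- seq(0, 2500, 1)

# Weighted histograms via rep(), as in the post; plot = FALSE skips drawing.
hist_e <- hist(rep(experimental$x, experimental$y), breaks = breaks, plot = FALSE)
hist_p <- hist(rep(predicted$x, predicted$y), breaks = breaks, plot = FALSE)

# Binned CDF: cumulative counts divided by the total count.
cdf_e <- cumsum(hist_e$counts) / sum(hist_e$counts)
cdf_p <- cumsum(hist_p$counts) / sum(hist_p$counts)

# Two-sample KS statistic on the binned data: largest vertical gap
# between the two CDFs.
D <- max(abs(cdf_e - cdf_p))

# Cross-check with ks.test() on samples rebuilt from the bin midpoints.
# Ties are unavoidable with binned data, so the warning about the
# p-value is suppressed here; only the statistic is compared.
ks <- suppressWarnings(
  ks.test(rep(hist_e$mids, hist_e$counts), rep(hist_p$mids, hist_p$counts))
)

print(D)
print(unname(ks$statistic))
```

If an ecdf object is wanted instead, ecdf(rep(experimental$x, experimental$y)) should work, since that argument is a plain numeric vector. Note that ks.test() on the unbinned values can give a slightly different statistic than the binned version, so the binned comparison is probably the closer match to what the C program computes.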