simple generation of artificial data with defined features

Greg Snow · 2008-08-25T15:26:48Z

> -----Original Message-----
> From: drflxms [mailto:drflxms at googlemail.com]
> Sent: Saturday, August 23, 2008 6:47 AM
> To: Greg Snow
> Cc: r-help at r-project.org
> Subject: Re: Re: [R] simple generation of artificial data
> with defined features
>
> Hello Mr. Greg Snow!
>
> Thank you very much for your prompt answer.
>
> I don't think that the election data is the right data to
> demonstrate Kappa, you need subjects that are classified by 2
> or more different raters/methods. The election

Greg Snow

Mon, Aug 25, 2008 8:26 AM

Ok, rethinking it in these terms is fine (just a transpose of mine), but you still have the same problem of having only 1 election.  Analyzing data with only one data point (generally 0 degrees of freedom) gives you little, if any, information.

Let's look at your doctors finding the stenosis and start with the simpler case of just 2 doctors.  If you show them only 1 video and ask the question once, then the 2 doctors will agree either 100% of the time or 0% of the time.  Is either of those numbers meaningful?  If we add more doctors, we will still have either 100% agreement or 0% agreement with only 1 observation.

With 1 election, what can you say about the agreement?  If you have info on multiple elections (or maybe other candidates within the same election), then you can measure the agreement using kappa-style scores, but I don't think that any version of kappa is designed to work for 1 observation.  Hence my suggestion of looking for different data to help understand the function.
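To make the one-observation problem concrete, here is a minimal sketch in Python.  It assumes plain Cohen's kappa for two raters; the thread does not say which kappa function is being used, so `cohens_kappa` below is a hypothetical helper written for illustration, not the function from the help page.

```python
from collections import Counter


def cohens_kappa(ratings1, ratings2):
    """Cohen's kappa for two raters (hypothetical helper, for illustration).

    Returns NaN when chance agreement is 1, because kappa is then 0/0.
    """
    n = len(ratings1)
    if n == 0 or n != len(ratings2):
        raise ValueError("need two non-empty rating lists of equal length")
    # Observed proportion of subjects on which the raters agree
    p_obs = sum(a == b for a, b in zip(ratings1, ratings2)) / n
    # Agreement expected by chance, from each rater's marginal counts
    c1, c2 = Counter(ratings1), Counter(ratings2)
    p_chance = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    if p_chance == 1.0:
        return float("nan")
    return (p_obs - p_chance) / (1 - p_chance)


# One video, two doctors: the only possible outcomes
kappa_agree = cohens_kappa(["stenosis"], ["stenosis"])
kappa_disagree = cohens_kappa(["stenosis"], ["no stenosis"])
print("1 observation, doctors agree:   ", kappa_agree)
print("1 observation, doctors disagree:", kappa_disagree)
```

Under these assumptions, a single observation leaves kappa either undefined (the doctors agree, so observed and chance agreement are both 1) or exactly 0 (they disagree, so both are 0).  Neither value says anything about how well the raters agree in general.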

Well, the help file for the function you are using shows one sample data set.  You can also look in the references cited on that same help page; those could lead you to other understandable datasets.

I find that when I am trying to understand something, simulated datasets help me: that way I know the "truth" and can see how the statistic changes for different "truths".  You can keep the story in terms of elections to keep it understandable to the audience, but then simulate data representing multiple elections/offices/etc. with different degrees of relationship.  I would start with pure randomness/independence (easy to simulate, any agreement is due to chance), then go to pure dependence (if they voted one way for the 1st election/candidate, they always voted the same way for the rest), then look at different levels in between (generate the 1st vote randomly, then give the 2nd vote a 90% probability of being the same and a 10% probability of being chosen randomly from the remaining options), and do this for different levels of dependence.  This should help with your understanding of how the kappa value represents the agreement.
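The simulation described above could be sketched roughly as follows, again in Python.  Everything specific here is an assumption made for illustration: the three candidates, the 5000 simulated voters, the seed, the `p_same` values, and the use of plain Cohen's kappa via the hypothetical helper `cohens_kappa`.

```python
import random
from collections import Counter

CANDIDATES = ["A", "B", "C"]  # hypothetical candidates, not from the thread


def cohens_kappa(votes1, votes2):
    """Cohen's kappa for two equally long sequences of categorical votes."""
    n = len(votes1)
    p_obs = sum(a == b for a, b in zip(votes1, votes2)) / n
    c1, c2 = Counter(votes1), Counter(votes2)
    p_chance = sum(c1[k] * c2[k] for k in CANDIDATES) / n ** 2
    return (p_obs - p_chance) / (1 - p_chance)


def simulate(n, p_same, rng):
    """First vote is uniform; the second repeats it with probability p_same,
    otherwise it is drawn uniformly from the remaining candidates."""
    first = [rng.choice(CANDIDATES) for _ in range(n)]
    second = []
    for vote in first:
        if rng.random() < p_same:
            second.append(vote)
        else:
            second.append(rng.choice([c for c in CANDIDATES if c != vote]))
    return first, second


rng = random.Random(2008)  # fixed seed so the run is repeatable
n = 5000                   # number of simulated voters (assumed)

# Pure independence: the two votes are drawn separately
indep_1 = [rng.choice(CANDIDATES) for _ in range(n)]
indep_2 = [rng.choice(CANDIDATES) for _ in range(n)]
results = {"independent": cohens_kappa(indep_1, indep_2)}

# Increasing dependence, up to pure dependence at p_same = 1.0
for p_same in (0.5, 0.9, 1.0):
    results[f"p_same={p_same}"] = cohens_kappa(*simulate(n, p_same, rng))

for label, kappa in results.items():
    print(f"{label:>12}: kappa = {kappa:.3f}")
```

With three equally likely candidates, chance agreement is about 1/3, so kappa should land near (p_same - 1/3) / (2/3): close to 0 under independence, roughly 0.85 at 90% dependence, and 1 under pure dependence.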

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111
