An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110801/3090415a/attachment.pl>
Reorganize(stack data) a dataframe inducing names
3 messages · jim holtman, Francesca PANCOTTO
Try this: had to add extra names to your data since it was not clear how it was organized. Next time use 'dput' to enclose data.
x <- read.table(textConnection(" index time key date values
+ 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 + 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45 + 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22 + 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 + 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40 + 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 + 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48 + 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 + 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 + 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 + 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42 + 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 + 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42 + 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43 + 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 + 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40 + 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43") + , header = TRUE + , as.is = TRUE + )
closeAllConnections() x
index time key date values 1 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 2 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45 3 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22 4 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 5 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40 6 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 7 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48 8 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 9 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 10 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 11 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42 12 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 13 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42 14 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43 15 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 16 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40 17 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43
# get index of first occurance of 'key' column indx <- !duplicated(x$key) x[indx,]
index time key date values 1 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 4 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 6 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 8 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 9 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 10 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 12 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 15 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50
On Mon, Aug 1, 2011 at 11:13 AM, Francesca <francesca.pancotto at gmail.com> wrote:
Dear Contributors thanks for any help you can provide. I searched the threads but I could not find any query that satisfied my needs. This is my database: ?index time ? ? ? ? values 13732 ?27965 DATA.Q211.SUM.Index ? ?04/08/11 ? ? ? ? 1.42 13733 ?27974 DATA.Q211.SUM.Index ? ?05/10/11 ? ? ? ? 1.45 13734 ?27984 DATA.Q211.SUM.Index ? ?06/01/11 ? ? ? ? 1.22 13746 ?28615 DATA.Q211.TDS.Index ? ?04/07/11 ? ? ? ? 1.35 13747 ?28624 DATA.Q211.TDS.Index ? ?05/20/11 ? ? ? ? 1.40 13754 ?29262 DATA.Q211.UBS.Index ? ?05/02/11 ? ? ? ? 1.30 13755 ?29272 DATA.Q211.UBS.Index ? ?05/03/11 ? ? ? ? 1.48 13761 ?29915 DATA.Q211.UCM.Index ? ?04/28/11 ? ? ? ? 1.43 13768 ?30565 DATA.Q211.VDE.Index ? ?05/02/11 ? ? ? ? 1.48 13775 ?31215 DATA.Q211.WF.Index ? ? 04/14/11 ? ? ? ? 1.44 13776 ?31225 DATA.Q211.WF.Index ? ? 05/12/11 ? ? ? ? 1.42 13789 ?31865 DATA.Q211.WPC.Index ? ?04/01/11 ? ? ? ? 1.40 13790 ?31875 DATA.Q211.WPC.Index ? ?04/08/11 ? ? ? ? 1.42 13791 ?31883 DATA.Q211.WPC.Index ? ?05/10/11 ? ? ? ? 1.43 13804 ?32515 DATA.Q211.XTB.Index ? ?04/29/11 ? ? ? ? 1.50 13805 ?32525 DATA.Q211.XTB.Index ? ?05/30/11 ? ? ? ? 1.40 13806 ?32532 DATA.Q211.XTB.Index ? ?06/28/11 ? ? ? ? 1.43 I need to select only the rows of this database that correspond to each of the first occurrences of the string represented in column index. In the example shown I would like to obtain a new data.frame which is index time ? ? ? ? values 13732 ?27965 DATA.Q211.SUM.Index ? ?04/08/11 ? ? ? ? 1.42 13746 ?28615 DATA.Q211.TDS.Index ? ?04/07/11 ? ? ? ? 1.35 13754 ?29262 DATA.Q211.UBS.Index ? ?05/02/11 ? ? ? ? 1.30 13761 ?29915 DATA.Q211.UCM.Index ? ?04/28/11 ? ? ? ? 1.43 13768 ?30565 DATA.Q211.VDE.Index ? ?05/02/11 ? ? ? ? 1.48 13775 ?31215 DATA.Q211.WF.Index ? ?04/14/11 ? ? ? ? 1.44 13789 ?31865 DATA.Q211.WPC.Index ? ?04/01/11 ? ? ? ? 1.40 13804 ?32515 DATA.Q211.XTB.Index ? ?04/29/11 ? ? ? ? 1.50 As you can see, it is not the whole string to change, rather a substring that is part of it. I want to select only the first values related to the row that presents for the first time the different part of the string(substring). I know how to select rows according to a substring condition on the index column, but I cannot use it here because the substring changes and moreover the number of occurrences per substring is variable. Thank you for any help you can provide. Francesca ? ? ? ?[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Jim Holtman Data Munger Guru What is the problem that you are trying to solve?
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110802/aeb31731/attachment.pl>