
Variance of multiple non-contiguous time periods?

10 messages · Jim Lemon, PIKAL Petr, CJ Davies +1 more

#
On 30/10/14 21:33, Jim Lemon wrote:
If I understand, you mean to calculate deviations for each individual
'chunk' of each transition & then aggregate the results? This is what
I'd been thinking about, but is there a sensible manner within R to
achieve this, or is it something for which it would be easier to
preprocess the data in an external tool? Is there some way to subset the
data such that I can work over just contiguous 'chunks'?

Regards,
CJ Davies
#
On Mon, 3 Nov 2014 12:45:03 PM CJ Davies wrote:
Exactly. If there is some combination of existing variables that can be 
combined to make a set of unique values for each "chunk", you can 
calculate the deviations within each "chunk", then average the squared 
deviations for each type of "chunk", weighting by the duration of the 
"chunks" so that you don't bias the pooled variance toward the longer 
"chunks".
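
A minimal sketch of that calculation, assuming the rows are evenly spaced in time (so a chunk's duration is proportional to its row count) and that a chunk identifier already exists. The data frame and the column names `value`, `chunk` and `type` are hypothetical, not taken from the log file.

```r
# Hypothetical data: 'value' is the measurement, 'chunk' labels each
# contiguous chunk, 'type' is the kind of chunk (e.g. which transition).
d <- data.frame(
  value = c(1, 2, 3, 10, 12, 5, 7, 9, 11),
  chunk = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  type  = c("transition 1", "transition 1", "transition 1",
            "real", "real",
            "transition 1", "transition 1", "transition 1", "transition 1"),
  stringsAsFactors = FALSE
)

# Squared deviation of each row from the mean of its own chunk.
# ave() with its default FUN returns the per-chunk mean for every row.
d$sqdev <- (d$value - ave(d$value, d$chunk))^2

# Averaging over all rows of a type weights each chunk by its row count.
# This divides by the number of rows, not by rows minus chunks.
pooled <- tapply(d$sqdev, d$type, mean)
pooled
```

If the rows are not evenly spaced, the row count would need to be replaced by an explicit duration weight.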

Jim
#
On 04/11/14 09:11, Jim Lemon wrote:
I am stumped for a way of automating this process though. Each line of 
log data looks like this;

2406	55.4	(-11.2, 1.0, -0.9)	(-4.1, 1.0, 0.0)	7.077912	0.9203392	(0.0, 
0.7, -0.1, 0.7)	8.129684	89.41537	-8.212769	(0.0, 0.7, -0.1, 0.7) 
8.129684	89.41537	351.7872	1	0	0	False	0.15	3	37.76761	True	False	0 
transition 1

Where the last variable defines which transition is currently active. 
However to separate these data into 'chunks' would involve making a 
comparison between each line of data & the preceding line of data to 
determine whether it is part of the same contiguous 'chunk'. Is this 
something that would be better achieved using external preprocessing 
written in a language I am more familiar with, as I haven't the foggiest 
how I would approach this within R?

Regards,
CJ Davies
#
Hi
First you need to import it into R, which could be tricky judging by the line above.
Some values will probably need to be processed with regular expressions.

If I understand correctly, the number after "transition" is a signal that identifies contiguous chunks. If that is true, then

?rle is a function that can compute the lengths of the chunks.
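
A small sketch of how rle() could be applied here. The vector `env` is a hypothetical stand-in for the last field of each log line; the values are made up.

```r
# Hypothetical last field of eight consecutive log lines.
env <- c("real", "real", "transition 1", "transition 1", "transition 1",
         "real", "transition 2", "transition 2")

# rle() collapses consecutive repeats into run lengths and run values.
runs <- rle(env)
runs$lengths
runs$values

# One integer id per run, repeated over the rows belonging to that run.
chunk <- rep(seq_along(runs$lengths), times = runs$lengths)
chunk
```

Note that the two separate runs of "real" get different ids, which is what distinguishes non-contiguous chunks of the same type.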

Cheers
Petr
#
On 04/11/14 16:13, PIKAL Petr wrote:
Importing into R wasn't an issue; some of the fields contain spaces & 
symbols, but all the fields are tab separated so I can simply use;

foo <- read.csv("bar",header=T,sep="\t")
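
For anyone wanting to try this without the log file, here is a self-contained sketch using read.delim() on an inline string. The column names and the two rows are hypothetical; only the tab separation and the presence of fields containing spaces and brackets are taken from the description above.

```r
# Hypothetical two-row, tab-separated extract with a header line.
tsv <- paste0(
  "time\tposition\tenvironment\n",
  "2406\t(-11.2, 1.0, -0.9)\ttransition 1\n",
  "2407\t(-11.1, 1.0, -0.9)\treal\n"
)

# read.delim() defaults to sep = "\t"; the arguments are spelled out here.
# TRUE is written in full because T is an ordinary variable that can be
# reassigned, whereas TRUE is a reserved word.
foo <- read.delim(text = tsv, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE)
str(foo)
```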

I've just written a hacky bit of Java that gives me the lines of each 
'chunk' as a separate list & I think I'll then calculate these 
particular values using Java's Math class rather than trying to come up 
with a sensible way to import these 'chunks' back into R. When it comes 
to string/list manipulation like this I think my knowledge in Java & 
lack of knowledge in R makes the former the better option!

Regards,
CJ Davies
#
On Nov 4, 2014, at 8:35 AM, CJ Davies wrote:
If you had offered the output of dput(head(foo, 20) ) and explained what defined a "chunk-defining transition", it would have been fairly easy to show you how to use cumsum in an ave() call to construct a grouping variable.
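
One possible reading of that suggestion, as a sketch only: flag the rows where the value changes, take cumsum() of the flags to get a grouping variable, then pass that grouping to ave(). The data frame and the numeric column `x` are hypothetical; `environment` is assumed to be the column that defines the chunks.

```r
# Hypothetical stand-in for 'foo'.
foo <- data.frame(
  x = c(4, 6, 1, 3, 5, 9, 2, 4),
  environment = c("real", "real", "transition 1", "transition 1",
                  "transition 1", "real", "transition 2", "transition 2"),
  stringsAsFactors = FALSE
)
n <- nrow(foo)

# TRUE wherever a row's environment differs from the previous row's.
# The first row has no predecessor, so it is marked FALSE.
changed <- c(FALSE, foo$environment[-1] != foo$environment[-n])

# Running count of changes: constant within a chunk, stepping up by one
# at each change, so it serves as a grouping variable.
foo$chunk <- 1 + cumsum(changed)

# ave() returns, for every row, the mean of x within that row's chunk.
foo$chunk_mean <- ave(foo$x, foo$chunk)
foo
```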
David Winsemius
Alameda, CA, USA
#
On 04/11/14 17:02, David Winsemius wrote:
Here is an example 100 lines of the input --> http://paste2.org/2LZVGP5K

The final value on each line, under the header "environment", is always 
one of ["real", "transition 1", "transition 2", "transition 3", 
"transition 4"]. A 'chunk-defining transition' is when this value changes.

If there is a way to do this in R in a more elegant fashion than my 
hacky Java, then I would be glad to learn.

Regards,
CJ Davies
#
On Nov 4, 2014, at 9:16 AM, CJ Davies wrote:
That pasted material does not appear to preserve the tabs. Input with your suggested code "does not work" in the sense that it brings in an object like this.
trying URL 'http://paste2.org/2LZVGP5K'
Content type 'text/html; charset=UTF-8' length unknown
opened URL
.......... .......... ........
downloaded 28 Kb
'data.frame':	2829 obs. of  1 variable:
 $ X..DOCTYPE.html.: Factor w/ 669 levels "","          ",..: 106 104 219 233 220 222 221 215 217 79 ...

I SAY AGAIN:

Need ; output of dput(head(foo, 100) )
David Winsemius
Alameda, CA, USA
#
On 04/11/14 17:42, David Winsemius wrote:
That was a pastebin URI, so what you downloaded was HTML instead of raw
text. This is the raw text;

http://cjdavies.org/foo

Regards,
CJ Davies
#
On Nov 4, 2014, at 3:41 PM, CJ Davies wrote:
Well, it was text but it had no tabs. On this mailing list, HTML is considered evil.
FALSE  TRUE 
  503   106
1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26 
 20   1   6   1   1   1   4   1   1   2  16   1   7   4  14   2   6   1   2   4   1   4   2   8   6   2 
 27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52 
  2   1   7   1   1   2   2   2   6  10   3   1  12   3   1  10  18   6   1   6  14   4   1  19  13  10 
 53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78 
  6   2  10  14   3   2   1   2   1   1   1  15   4   2   2   6  21   5   1  16   5   3   1   2  21   3 
 79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 
  1   2   3   4   4   3   5   1   9   1   3   3   7   2   5   6   6   5  13   1   1   8   1   2   2   3 
105 106 107 
  6   9  70 

So now you have a chunking index and can use `by`, `ave`, or `for()` loops.
That was displayed as if it had tabs, and after correcting the error of using T for TRUE it did succeed.
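
As an illustration of using such a chunking index, here is a sketch with tapply() and by(). The data frame, the column `x` and the chunk ids are hypothetical; the commands that produced the tables above are not shown in this archive, so this is not a reconstruction of them.

```r
# Hypothetical data with a chunk index already attached.
foo <- data.frame(
  x     = c(4, 6, 1, 3, 5, 2, 4),
  chunk = c(1, 1, 2, 2, 2, 3, 3)
)

# Rows per chunk, analogous to the table of chunk lengths above.
chunk_len <- table(foo$chunk)

# Sample variance of x within each chunk.
# A chunk with a single row would give NA here.
chunk_var <- tapply(foo$x, foo$chunk, var)

# by() hands each chunk's rows to the function as a data frame,
# which is convenient when several columns are needed at once.
chunk_summary <- by(foo, foo$chunk,
                    function(d) c(n = nrow(d), var = var(d$x)))

chunk_len
chunk_var
```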
David Winsemius
Alameda, CA, USA