
date in plot, can't add regression line

4 messages · Norbert Skalski, Nordlund, Dan (DSHS/RDA), John Kane +1 more

#
Hello all,

I have been using R for about 3 weeks and I am frustrated by a problem.  I have read "R in a Nutshell" and scoured the internet for help, but either I am not understanding the examples or I am missing something completely basic.  Here is the problem:

I want to plot data that contains dates on the x axis.  Then I want to fit a line to the data.  I have been unable to do it.

This is an example of the data (in a dataframe called "tradeflavorbyday"), 40 lines of it (I'm sorry it's not in a runnable form, not sure how to get that from R) :
     tradeflavor  timestamp   x
1              1 2009-01-22   1
2              2 2009-01-22   1
3              1 2009-01-23   1
4              1 2009-01-27  54
5              1 2009-01-28 105
6              2 2009-01-28   2
7             16 2009-01-28   2
8              1 2009-01-29  71
9             16 2009-01-29   2
10             1 2009-01-30  42
11             1 2009-02-02  19
12            16 2009-02-02   2
13             1 2009-02-03  36
14             4 2009-02-03   2
15             8 2009-02-03   3
16             1 2009-02-04  73
17             8 2009-02-04  12
18            16 2009-02-04   7
19             1 2009-02-05  53
20             8 2009-02-05   6
21            16 2009-02-05   9
22             1 2009-02-06  38
23             4 2009-02-06   6
24             8 2009-02-06   2
25            16 2009-02-06   3
26             1 2009-02-09  42
27             2 2009-02-09   2
28             4 2009-02-09   1
29             8 2009-02-09   2
30             1 2009-02-10  87
31             4 2009-02-10   2
32             8 2009-02-10   4
33            16 2009-02-10   3
34             1 2009-02-11  55
35             2 2009-02-11   6
36             4 2009-02-11   4
37             8 2009-02-11   2
38            16 2009-02-11   8
39             1 2009-02-12 153
40             2 2009-02-12   6


The plot displays the x column on the y axis and the date on the x axis, grouped by the tradeflavor column.
class() of the timestamp column gives:
[1] "POSIXlt" "POSIXt"

So in this case I want to plot tradeflavor 1 (method 1):

xdates <- tradeflavorbyday$timestamp[tradeflavorbyday$tradeflavor == 1]
ydata <- tradeflavorbyday$x[tradeflavorbyday$tradeflavor == 1]

plot(xdates, ydata, col="black", xlab="Dates", ylab="Count")

Up to here it works great.

Now an abline through lm:

xylm <- lm(ydata~xdates)   # <- this fails, can't do dates (error below)
abline(xylm, col="black")
Error in model.frame.default(formula = ydata ~ xdates, drop.unused.levels = TRUE) : 
  invalid type (list) for variable 'xdates'



So I try this instead (method 2):
xdata <- 1:length(tradeflavorbyday$timestamp[tradeflavorbyday$tradeflavor == 1])
ydata <- tradeflavorbyday$x[tradeflavorbyday$tradeflavor == 1]

xylm <- lm(ydata~xdata)   # <- now this works, great
abline(xylm, col="black")

The problem now is that I can't get the dates onto the x axis.  I have tried turning off the axis using xaxt="n" and replotting using the axis.POSIXct() call, but it does not want to display the dates:

dateseq = seq(xdates[1], xdates[length(xdates)], by="month")
axis.POSIXct(1, at=dateseq, format="%Y\n%b")




I have tried combining both approaches by plotting dates and trying to fit the line using method 2:
xdates <- tradeflavorbyday$timestamp[tradeflavorbyday$tradeflavor == 1]
xdata <- 1:length(tradeflavorbyday$timestamp[tradeflavorbyday$tradeflavor == 1])
ydata <- tradeflavorbyday$x[tradeflavorbyday$tradeflavor == 1]

plot(xdates, ydata, col="black", xlab="Dates", ylab="Count", xaxt="n")
dateseq = seq(xdates[1], xdates[length(xdates)], by="month")
axis.POSIXct(1, at=dateseq, format="%Y\n%b")

xylm <- lm(ydata~xdata)  # <- works
abline(xylm, col="black")  # <- does nothing

In this case the calls to lm and abline "work" but nothing is drawn.  Confused, I plugged in the coefficients manually (I have complete data, so they will be different from the example data I pasted):
Call:
lm(formula = ydata ~ xdata)

Coefficients:
(Intercept)        xdata  
    6.11491     -0.02577  

abline(6.11491, -0.02577)  # <- call worked, but nothing shown

Just by chance I added many 0s to flatten out the slope:

abline(6.11491, -0.0000000002577)  # <- call worked and a horizontal line appeared?????

So I took off a 0:

abline(6.11491, -0.000000002577)  # <- the line moved significantly down

So I took off another 0:

abline(6.11491, -0.00000002577)   # <- line disappeared

I guess the slope causes it to go vertical and disappear off the graph.

I have no idea how to solve my issue.  If anyone can see my basic idiotic error please point it out, or maybe you have another suggestion, I will gladly try it.

Thanks for your help!!
#
You might try converting timestamp as follows

xdates <- as.POSIXct(tradeflavorbyday$timestamp[tradeflavorbyday$tradeflavor == 1])

Your original code should now work.
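
A minimal sketch of why the conversion matters, assuming the timestamp column really is POSIXlt as the class() output in the question shows. The three dates and counts are taken from the posted sample; the names lt, ct, y and fit are only illustrative. A POSIXlt object stores each date-time as a list of components, which is what the "invalid type (list)" error refers to, while POSIXct stores a single number of seconds since 1970-01-01.

```r
# Three dates from the posted data, stored as POSIXlt (a list of components)
lt <- as.POSIXlt(c("2009-01-22", "2009-01-23", "2009-01-27"), tz = "UTC")
typeof(lt)            # "list": this is what lm() complains about

# The same dates as POSIXct: one number per date (seconds since 1970-01-01)
ct <- as.POSIXct(lt)
typeof(ct)            # "double"
as.numeric(ct[2] - ct[1], units = "secs")  # 86400, the seconds in one day

# lm() accepts the POSIXct version as an ordinary numeric predictor
y <- c(1, 1, 54)
fit <- lm(y ~ ct)
coef(fit)
```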


Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
#
First of all, a practical way to supply data is to use the function dput()

Just do dput(mydata) and copy and paste the results into your email.  The reader can copy and paste into R and have an identical data set.
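
As a rough illustration of that round trip, here is a sketch using a three-row stand-in for the real data (the names small, txt and rebuilt are made up for the example). Normally you would just paste the printed dput() output into the email; capture.output() is used here only so the example can check itself.

```r
# A tiny data frame standing in for the real one (first three posted rows)
small <- data.frame(tradeflavor = c(1L, 2L, 1L),
                    timestamp = as.Date(c("2009-01-22", "2009-01-22", "2009-01-23")),
                    x = c(1L, 1L, 1L))

# dput() prints R code that rebuilds the object; capture that code as text
txt <- capture.output(dput(small))

# Evaluating the text recreates the data frame, column classes included
rebuilt <- eval(parse(text = txt))
all.equal(small, rebuilt)
class(rebuilt$timestamp)
```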

I am not sure I have followed exactly what you are doing, but here is something that may approach what you want, done using the ggplot2 package.  Do install.packages("ggplot2") if you do not have it.

Anyway here is roughly your data set in the dput format

mydata  <-  structure(list(tradeflavor = c(1L, 2L, 1L, 1L, 1L, 2L, 16L, 1L, 
16L, 1L, 1L, 16L, 1L, 4L, 8L, 1L, 8L, 16L, 1L, 8L, 16L, 1L, 4L, 
8L, 16L, 1L, 2L, 4L, 8L, 1L, 4L, 8L, 16L, 1L, 2L, 4L, 8L, 16L, 
1L, 2L), timestamp = structure(c(14266, 14266, 14267, 14271, 
14272, 14272, 14272, 14273, 14273, 14274, 14277, 14277, 14278, 
14278, 14278, 14279, 14279, 14279, 14280, 14280, 14280, 14281, 
14281, 14281, 14281, 14284, 14284, 14284, 14284, 14285, 14285, 
14285, 14285, 14286, 14286, 14286, 14286, 14286, 14287, 14287
), class = "Date"), x = c(1L, 1L, 1L, 54L, 105L, 2L, 2L, 71L, 
2L, 42L, 19L, 2L, 36L, 2L, 3L, 73L, 12L, 7L, 53L, 6L, 9L, 38L, 
6L, 2L, 3L, 42L, 2L, 1L, 2L, 87L, 2L, 4L, 3L, 55L, 6L, 4L, 2L, 
8L, 153L, 6L)), .Names = c("tradeflavor", "timestamp", "x"), row.names = c(NA, 
-40L), class = "data.frame")

#=====================================
library(ggplot2)

# first subset

m1data  <-  subset(mydata, tradeflavor == 1)

# plot for tradeflavor = 1
p1   <-  ggplot(m1data , aes( timestamp, x)) + geom_point()   +
           geom_smooth(method = lm, se = FALSE)
p1

m2data  <-  subset(mydata, tradeflavor == 2)

p2   <-  ggplot(m2data , aes( timestamp, x )) + geom_point()   +
           geom_smooth(method = lm, se = FALSE)
p2

# plot a grid of results
pgrid  <-  ggplot(mydata , aes( timestamp, x)) + geom_point()   +
           geom_smooth(method = lm, se = FALSE) + facet_grid(tradeflavor ~ .)
pgrid

# Have fun with R.




John Kane
Kingston ON Canada
#
Hello,

Inline.
On 28-08-2012 18:23, Nordlund, Dan (DSHS/RDA) wrote:
It does, I've just tried it.

Also, regarding the OP's statement "(I'm sorry it's not in a runnable 
form, not sure how to get that from R)":

# It's easy to read in the data
tfday <- read.table(text="
    tradeflavor  timestamp   x
1              1 2009-01-22   1
2              2 2009-01-22   1
3              1 2009-01-23   1
[...etc...]
39             1 2009-02-12 153
40             2 2009-02-12   6
", header=TRUE, stringsAsFactors=FALSE)

# But it's better to paste the output of dput().
dput(tfday)
structure(list(tradeflavor = c(1L, 2L, 1L, 1L, 1L, 2L, 16L, 1L,
16L, 1L, 1L, 16L, 1L, 4L, 8L, 1L, 8L, 16L, 1L, 8L, 16L, 1L, 4L,
8L, 16L, 1L, 2L, 4L, 8L, 1L, 4L, 8L, 16L, 1L, 2L, 4L, 8L, 16L,
1L, 2L), timestamp = structure(c(1232582400, 1232582400, 1232668800,
1233014400, 1233100800, 1233100800, 1233100800, 1233187200, 1233187200,
1233273600, 1233532800, 1233532800, 1233619200, 1233619200, 1233619200,
1233705600, 1233705600, 1233705600, 1233792000, 1233792000, 1233792000,
1233878400, 1233878400, 1233878400, 1233878400, 1234137600, 1234137600,
1234137600, 1234137600, 1234224000, 1234224000, 1234224000, 1234224000,
1234310400, 1234310400, 1234310400, 1234310400, 1234310400, 1234396800,
1234396800), class = c("POSIXct", "POSIXt"), tzone = ""), x = c(1L,
1L, 1L, 54L, 105L, 2L, 2L, 71L, 2L, 42L, 19L, 2L, 36L, 2L, 3L,
73L, 12L, 7L, 53L, 6L, 9L, 38L, 6L, 2L, 3L, 42L, 2L, 1L, 2L,
87L, 2L, 4L, 3L, 55L, 6L, 4L, 2L, 8L, 153L, 6L)), .Names = c("tradeflavor",
"timestamp", "x"), row.names = c("1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17",
"18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28",
"29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39",
"40"), class = "data.frame")

# Now all we need to do is copy and paste this into an R session:

tfday <- structure(...etc...)

# Finally, for the sake of completeness, the rest of the code.
tfday$timestamp <- as.POSIXct(tfday$timestamp)
inx <- tfday$tradeflavor == 1 # do this once
xdates <- tfday$timestamp[inx]
ydata <- tfday$x[inx]

plot(xdates, ydata)
model <- lm(ydata ~ xdates)
abline(model)
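
A note on units, as a sketch rather than part of the fix itself: with POSIXct on the x axis the plot and the fit both work in seconds since 1970-01-01, so the slope is a change in count per second. That is probably why coefficients fitted to 1, 2, 3, ... drew nothing on the date plot. The data below are the tradeflavor == 1 rows of the posted sample; fit_secs, fit_days and days are illustrative names, and tz = "UTC" is an assumption.

```r
# tradeflavor == 1 rows from the posted sample
xdates <- as.POSIXct(c("2009-01-22", "2009-01-23", "2009-01-27", "2009-01-28",
                       "2009-01-29", "2009-01-30", "2009-02-02", "2009-02-03",
                       "2009-02-04", "2009-02-05", "2009-02-06", "2009-02-09",
                       "2009-02-10", "2009-02-11", "2009-02-12"), tz = "UTC")
ydata <- c(1, 1, 54, 105, 71, 42, 19, 36, 73, 53, 38, 42, 87, 55, 153)

# Same regression on two x scales: seconds and days since 1970-01-01
fit_secs <- lm(ydata ~ xdates)
days <- as.numeric(as.Date(xdates))
fit_days <- lm(ydata ~ days)

# Same line, different units: slope per second times 86400 is slope per day
coef(fit_secs)[2] * 86400
coef(fit_days)[2]

# The x axis of plot(xdates, ydata) spans values around 1.23e9, so an
# intercept and slope fitted to x = 1, 2, 3, ... land far outside the plot
range(as.numeric(xdates))
```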

Hope this helps,

Rui Barradas