Hello all,
I have been using R for about 3 weeks and I am frustrated by a problem. I have read R in a Nutshell and scoured the internet for help, but either I am not understanding the examples or I am missing something completely basic. Here is the problem:
I want to plot data that contains dates on the x axis. Then I want to fit a line to the data. I have been unable to do it.
This is an example of the data (in a data frame called "tradeflavorbyday"), 40 lines of it (I'm sorry it's not in a runnable form; I'm not sure how to get that from R):
tradeflavor timestamp x
1 1 2009-01-22 1
2 2 2009-01-22 1
3 1 2009-01-23 1
4 1 2009-01-27 54
5 1 2009-01-28 105
6 2 2009-01-28 2
7 16 2009-01-28 2
8 1 2009-01-29 71
9 16 2009-01-29 2
10 1 2009-01-30 42
11 1 2009-02-02 19
12 16 2009-02-02 2
13 1 2009-02-03 36
14 4 2009-02-03 2
15 8 2009-02-03 3
16 1 2009-02-04 73
17 8 2009-02-04 12
18 16 2009-02-04 7
19 1 2009-02-05 53
20 8 2009-02-05 6
21 16 2009-02-05 9
22 1 2009-02-06 38
23 4 2009-02-06 6
24 8 2009-02-06 2
25 16 2009-02-06 3
26 1 2009-02-09 42
27 2 2009-02-09 2
28 4 2009-02-09 1
29 8 2009-02-09 2
30 1 2009-02-10 87
31 4 2009-02-10 2
32 8 2009-02-10 4
33 16 2009-02-10 3
34 1 2009-02-11 55
35 2 2009-02-11 6
36 4 2009-02-11 4
37 8 2009-02-11 2
38 16 2009-02-11 8
39 1 2009-02-12 153
40 2 2009-02-12 6
The plot should display the x column on the y axis and the date on the x axis, grouped by the tradeflavor column.
The timestamp column:
class(tradeflavorbyday$timestamp)
[1] "POSIXlt" "POSIXt"
So in this case I want to plot tradeflavor 1 (method 1):
xdates <- tradeflavorbyday$timestamp[tradeflavorbyday$tradeflavor == 1]
ydata <- tradeflavorbyday$x[tradeflavorbyday$tradeflavor == 1]
plot(xdates, ydata, col="black", xlab="Dates", ylab="Count")
Up to here it works great.
Now an abline through lm:
xylm <- lm(ydata~xdates)   # <- this fails, can't do dates (error below)
abline(xylm, col="black")
lm(ydata~xdates)
Error in model.frame.default(formula = ydata ~ xdates, drop.unused.levels = TRUE) :
invalid type (list) for variable 'xdates'
So I try this instead (method 2):
xdata <- 1:length(tradeflavorbyday$timestamp[tradeflavorbyday$tradeflavor == 1])
ydata <- tradeflavorbyday$x[tradeflavorbyday$tradeflavor == 1]
xylm <- lm(ydata~xdata)   # <- now this works, great
abline(xylm, col="black")
The problem now is that I can't get the dates onto the x axis. I have tried turning off the axis using xaxt="n" and replotting it using the axis.POSIXct() call, but it does not display the dates:
dateseq = seq(xdates[1], xdates[length(xdates)], by="month")
axis.POSIXct(1, at=dateseq, format="%Y\n%b")
I have tried combining both approaches by plotting dates and trying to fit the line using method 2:
xdates <- tradeflavorbyday$timestamp[tradeflavorbyday$tradeflavor == 1]
xdata <- 1:length(tradeflavorbyday$timestamp[tradeflavorbyday$tradeflavor == 1])
ydata <- tradeflavorbyday$x[tradeflavorbyday$tradeflavor == 1]
plot(xdates, ydata, col="black", xlab="Dates", ylab="Count", xaxt="n")
dateseq = seq(xdates[1], xdates[length(xdates)], by="month")
axis.POSIXct(1, at=dateseq, format="%Y\n%b")
xylm <- lm(ydata~xdata)   # <- works
abline(xylm, col="black")   # <- does nothing
In this case the calls to lm and abline "work" but nothing is drawn. Confused, I plugged in the coefficients manually (I have the complete data, so they differ from the example data I pasted):
lm(ydata~xdata)
Call:
lm(formula = ydata ~ xdata)
Coefficients:
(Intercept) xdata
6.11491 -0.02577
abline(6.11491, -0.02577)   # <- call worked, but nothing shown
Just by chance I added many zeros to flatten out the slope:
abline(6.11491, -0.0000000002577)   # <- call worked and a horizontal line appeared?????
So I took off a 0:
abline(6.11491, -0.000000002577)   # <- the line moved significantly down
So I took off another 0:
abline(6.11491, -0.00000002577)   # <- line disappeared
I guess the slope causes it to go vertical and disappear off the graph.
I have no idea how to solve my issue. If anyone can see my basic idiotic error please point it out, or maybe you have another suggestion, I will gladly try it.
Thanks for your help!!
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Norbert Skalski
Sent: Tuesday, August 28, 2012 9:49 AM
To: r-help at r-project.org
Subject: [R] date in plot, can't add regression line
You might try converting timestamp to POSIXct as follows:
xdates <- as.POSIXct(tradeflavorbyday$timestamp[tradeflavorbyday$tradeflavor == 1])
Your original code should now work.
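A minimal sketch of why the conversion matters, assuming a tiny made-up data set (the names lt, ct and y are illustrative and not from the original post). A POSIXlt value is stored as a list of date components, which is the "list" that lm() complains about, whereas POSIXct is a plain numeric vector of seconds since 1970-01-01:

```r
# Illustrative values only; 'lt', 'ct' and 'y' are made-up names.
lt <- as.POSIXlt(c("2009-01-22", "2009-01-23", "2009-01-27", "2009-01-28"),
                 tz = "UTC")
y  <- c(1, 1, 54, 105)

typeof(lt)             # a list of components (sec, min, hour, mday, ...)
ct <- as.POSIXct(lt)   # the same instants as one numeric vector
typeof(ct)

# Expected to fail with "invalid type (list)", as in the question;
# try() keeps the script running either way.
fit_lt <- try(lm(y ~ lt), silent = TRUE)

# The POSIXct version is an ordinary numeric predictor.
fit_ct <- lm(y ~ ct)
coef(fit_ct)           # the slope is in units of y per second
```

Because the predictor is in seconds, the fitted slope looks tiny; multiplying it by 86400 gives the change per day.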
Hope this is helpful,
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
First of all, a practical way to supply data is to use the function dput().
Just do dput(mydata) and copy and paste the results into your email. The reader can copy and paste into R and have an identical data set.
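As a rough illustration of that round trip, here is a sketch using a three-row made-up data frame (the name small and its values are assumptions for the example only). It prints the dput() text and then rebuilds the object from that text, which is what a reader does when pasting it into R:

```r
# 'small' is a made-up data frame used only for illustration.
small <- data.frame(
  tradeflavor = c(1L, 2L, 1L),
  timestamp   = as.Date(c("2009-01-22", "2009-01-22", "2009-01-23")),
  x           = c(1L, 1L, 1L)
)

dput(small)   # prints a structure(...) call that can be pasted into an email

# Simulate the reader's side: take the printed text and evaluate it.
txt  <- paste(capture.output(dput(small)), collapse = "\n")
copy <- eval(parse(text = txt))

all.equal(small, copy)   # compares the rebuilt object with the original
```

Column classes such as Date survive the trip, which is the main advantage over pasting the printed table.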
I am not sure I have followed exactly what you are doing, but here is something that may approach what you want, done using the ggplot2 package. Do install.packages("ggplot2") if you do not have it.
Anyway here is roughly your data set in the dput format
mydata <- structure(list(tradeflavor = c(1L, 2L, 1L, 1L, 1L, 2L, 16L, 1L,
16L, 1L, 1L, 16L, 1L, 4L, 8L, 1L, 8L, 16L, 1L, 8L, 16L, 1L, 4L,
8L, 16L, 1L, 2L, 4L, 8L, 1L, 4L, 8L, 16L, 1L, 2L, 4L, 8L, 16L,
1L, 2L), timestamp = structure(c(14266, 14266, 14267, 14271,
14272, 14272, 14272, 14273, 14273, 14274, 14277, 14277, 14278,
14278, 14278, 14279, 14279, 14279, 14280, 14280, 14280, 14281,
14281, 14281, 14281, 14284, 14284, 14284, 14284, 14285, 14285,
14285, 14285, 14286, 14286, 14286, 14286, 14286, 14287, 14287
), class = "Date"), x = c(1L, 1L, 1L, 54L, 105L, 2L, 2L, 71L,
2L, 42L, 19L, 2L, 36L, 2L, 3L, 73L, 12L, 7L, 53L, 6L, 9L, 38L,
6L, 2L, 3L, 42L, 2L, 1L, 2L, 87L, 2L, 4L, 3L, 55L, 6L, 4L, 2L,
8L, 153L, 6L)), .Names = c("tradeflavor", "timestamp", "x"), row.names = c(NA,
-40L), class = "data.frame")
#=====================================
library(ggplot2)
# first subset
m1data <- subset(mydata, tradeflavor == 1)
# plot for tradeflavor = 1
p1 <- ggplot(m1data , aes( timestamp, x)) + geom_point() +
geom_smooth(method = lm, se = FALSE)
p1
m2data <- subset(mydata, tradeflavor == 2)
p2 <- ggplot(m2data , aes( timestamp, x )) + geom_point() +
geom_smooth(method = lm, se = FALSE)
p2
# plot a grid of results
pgrid <- ggplot(mydata , aes( timestamp, x)) + geom_point() +
geom_smooth(method = lm, se = FALSE) + facet_grid(tradeflavor ~ .)
pgrid
# Have fun with R.
John Kane
Kingston ON Canada
-----Original Message-----
From: norbert.skalski at ronin-capital.com
Sent: Tue, 28 Aug 2012 11:48:32 -0500
To: r-help at r-project.org
Subject: [R] date in plot, can't add regression line
Hello,
Inline.
On 28-08-2012 18:23, Nordlund, Dan (DSHS/RDA) wrote:
You might try converting timestamp as follows
xdates <- as.POSIXct(tradeflavorbyday$timestamp[tradeflavorbyday$tradeflavor == 1])
Your original code should now work.
It does, I've just tried it.
Also, regarding the OP's statement "(I'm sorry it's not in a runnable
form, not sure how to get that from R)":
# It's easy to read in the data
tfday <- read.table(text="
tradeflavor timestamp x
1 1 2009-01-22 1
2 2 2009-01-22 1
3 1 2009-01-23 1
[...etc...]
39 1 2009-02-12 153
40 2 2009-02-12 6
", header=TRUE, stringsAsFactors=FALSE)
# But it's better to paste the output of dput().
dput(tfday)
structure(list(tradeflavor = c(1L, 2L, 1L, 1L, 1L, 2L, 16L, 1L,
16L, 1L, 1L, 16L, 1L, 4L, 8L, 1L, 8L, 16L, 1L, 8L, 16L, 1L, 4L,
8L, 16L, 1L, 2L, 4L, 8L, 1L, 4L, 8L, 16L, 1L, 2L, 4L, 8L, 16L,
1L, 2L), timestamp = structure(c(1232582400, 1232582400, 1232668800,
1233014400, 1233100800, 1233100800, 1233100800, 1233187200, 1233187200,
1233273600, 1233532800, 1233532800, 1233619200, 1233619200, 1233619200,
1233705600, 1233705600, 1233705600, 1233792000, 1233792000, 1233792000,
1233878400, 1233878400, 1233878400, 1233878400, 1234137600, 1234137600,
1234137600, 1234137600, 1234224000, 1234224000, 1234224000, 1234224000,
1234310400, 1234310400, 1234310400, 1234310400, 1234310400, 1234396800,
1234396800), class = c("POSIXct", "POSIXt"), tzone = ""), x = c(1L,
1L, 1L, 54L, 105L, 2L, 2L, 71L, 2L, 42L, 19L, 2L, 36L, 2L, 3L,
73L, 12L, 7L, 53L, 6L, 9L, 38L, 6L, 2L, 3L, 42L, 2L, 1L, 2L,
87L, 2L, 4L, 3L, 55L, 6L, 4L, 2L, 8L, 153L, 6L)), .Names = c("tradeflavor",
"timestamp", "x"), row.names = c("1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17",
"18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28",
"29", "30", "31", "32", "33", "34", "35", "36", "37", "38", "39",
"40"), class = "data.frame")
# Now all we need to do is copy and paste this into an R session:
tfday <- structure(...etc...)
# Finally, for the sake of completeness, the rest of the code.
tfday$timestamp <- as.POSIXct(tfday$timestamp)
inx <- tfday$tradeflavor == 1 # do this once
xdates <- tfday$timestamp[inx]
ydata <- tfday$x[inx]
plot(xdates, ydata)
model <- lm(ydata ~ xdates)
abline(model)
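# The hand-entered abline() calls in the question can probably be explained
# by units rather than by the line "going vertical". A plot of POSIXct dates
# has its x axis in seconds since 1970-01-01, while method 2 fitted the line
# against the row index 1, 2, 3, ... The sketch below is plain arithmetic
# under one assumption: that the left edge of the plot is near 2009-01-22.
# The names x_plot, intercept, slopes and y_at_plot are illustrative.

```r
# Assumption: the x axis starts near 2009-01-22 (about 1.23e9 seconds).
x_plot <- as.numeric(as.POSIXct("2009-01-22", tz = "UTC"))

intercept <- 6.11491   # from the lm() output quoted in the question
slopes <- c(-0.02577, -0.00000002577, -0.000000002577, -0.0000000002577)

# Height of each hand-entered line at the left edge of the plot.
y_at_plot <- intercept + slopes * x_plot
data.frame(slope = slopes, y_at_plot = y_at_plot)
```

# With the fitted slope the line sits roughly 3e7 below the data, so it is
# drawn far off-screen. The three flattened slopes give heights of about
# -26, 3 and 6, which matches a line that is absent, then low, then nearly
# horizontal. Fitting against the POSIXct values, as above, puts the line
# and the axis in the same units.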
Hope this helps,
Rui Barradas