IP-Address

IP addresses are very (very!) difficult to parse and sort correctly 
because there are all sorts of supported formats.  Try to use something 
like PostgreSQL instead: it is already implemented there.  But if you 
are sure all your data is of the n.n.n.n form, then something along the 
lines of the following should basically work (I have chosen some more 
interesting IP addresses for this):

a <- data.frame(cbind(id=c(138,138,138,138),
                      rank=c(29746,29746,29746,29746),
                      color=c("yellow","red","blue","red"),
                      status=c("no","yes","yes","no"),
                      ip=c("162.131.58.26","2.131.58.16","2.2.58.10","162.131.58.17")))
a
#    id  rank  color status            ip
# 1 138 29746 yellow     no 162.131.58.26
# 2 138 29746    red    yes   2.131.58.16
# 3 138 29746   blue    yes     2.2.58.10
# 4 138 29746    red     no 162.131.58.17
x <- matrix(unlist(lapply(strsplit(as.character(a$ip), ".", fixed=TRUE),
                          as.integer)),
            ncol=4, byrow=TRUE)
a[order(x[,1],x[,2],x[,3],x[,4]),]
#    id  rank  color status            ip
# 3 138 29746   blue    yes     2.2.58.10
# 2 138 29746    red    yes   2.131.58.16
# 4 138 29746    red     no 162.131.58.17
# 1 138 29746 yellow     no 162.131.58.26

Getting rid of the conversions including the matrix(unlist) combo is 
left as an exercise (it's too hot here....)

Allan.
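
Since the approach above only holds if every address really is of the n.n.n.n form, it may be worth checking that first. A minimal sketch in base R, assuming plain decimal octets in the range 0..255; the helper name is_dotted_quad is made up for illustration and is not from the thread:

```r
# Hypothetical helper (not from the thread): TRUE for strings that look like
# four dot-separated decimal octets, each in 0..255.
is_dotted_quad <- function(ip) {
  ip <- as.character(ip)
  # Shape check: exactly four groups of 1 to 3 digits separated by dots.
  ok <- grepl("^[0-9]{1,3}(\\.[0-9]{1,3}){3}$", ip)
  # Range check, applied only to the strings that passed the shape check.
  ok[ok] <- vapply(strsplit(ip[ok], ".", fixed = TRUE),
                   function(p) all(as.integer(p) <= 255L),
                   logical(1))
  ok
}

checked <- is_dotted_quad(c("162.131.58.26", "2.2.58.10",
                            "300.1.1.1", "1.2.3", "::1"))
checked
# [1]  TRUE  TRUE FALSE FALSE FALSE
```

Rows that fail the check can then be inspected or dropped before sorting.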
Hi,

Is there any way to sort a table by a column of IP addresses?

table:

id  rank  color  status ip
138 29746 yellow no     162.131.58.26
138 29746 red    yes    162.131.58.16
138 29746 blue   yes    162.131.58.10
138 29746 red    no     162.131.58.17
138 29746 yellow no     162.131.58.14
138 29746 red    no     162.131.58.13
138 29746 yellow no     162.132.58.15
139 29746 green  no     162.252.20.69
140 29746 red    yes    162.254.20.71
141 29746 yellow no     163.253.7.153
142 31804 green  yes    163.253.20.114
144 32360 black  yes    161.138.45.226
....

Unfortunately, order doesn't work as I want.

I found a half solution from John:

mysort <- function(x){
  sort.helper <- function(x){
    prefix <- strsplit(x, "[0-9]")
    prefix <- sapply(prefix, "[", 1)
    prefix[is.na(prefix)] <- ""
    suffix <- strsplit(x, "[^0-9]")
    suffix <- as.numeric(sapply(suffix, "[", 2))
    suffix[is.na(suffix)] <- -Inf
    remainder <- sub("[^0-9]+", "", x)
    remainder <- sub("[0-9]+", "", remainder)
    if (all (remainder == "")) list(prefix, suffix)
    else c(list(prefix, suffix), Recall(remainder))
  }
  ord <- do.call("order", sort.helper(x))
  x[ord]
}

mysort(data$ip) returns only the sorted IP addresses. How can I sort the whole table (id, rank, color, status, ip) by that column?

Thank you in advance.

eddie
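
The reason order() does not do what is wanted here is that the addresses are compared as strings, character by character. A small sketch with made-up addresses (the variable names are only for illustration):

```r
ip <- c("162.131.58.26", "2.131.58.16", "10.0.0.1")

# Character comparison goes left to right, so "10..." and "162..." sort
# before "2..." (method = "radix" uses C-locale collation).
by_string <- sort(ip, method = "radix")
by_string
# [1] "10.0.0.1"      "162.131.58.26" "2.131.58.16"

# Comparing the first octet as a number gives the intended order.
first_octet <- as.integer(sub("\\..*$", "", ip))
by_number <- ip[order(first_octet)]
by_number
# [1] "2.131.58.16"   "10.0.0.1"      "162.131.58.26"
```

The same index returned by order() can be used to reorder the rows of a whole data frame, not just the ip vector.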

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Here's one way:

con <- textConnection(as.character(a$ip))
o <- do.call(order,read.table(con,sep="."))
close(con)
a[o,]
O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
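
In more recent versions of R, read.table() also accepts a text= argument, so the explicit connection can probably be skipped. A sketch of the same idea under that assumption, with a small made-up data frame standing in for a:

```r
# Stand-in data (assumption: ip is a character column of n.n.n.n addresses).
a <- data.frame(id = 1:4,
                ip = c("162.131.58.26", "2.131.58.16",
                       "2.2.58.10", "162.131.58.17"),
                stringsAsFactors = FALSE)

# Parse the addresses into four integer columns, one per octet.
octets <- read.table(text = a$ip, sep = ".", colClasses = "integer")

# order() takes the columns as successive sort keys.
o <- do.call(order, octets)
sorted <- a[o, ]
sorted$ip
# [1] "2.2.58.10"     "2.131.58.16"   "162.131.58.17" "162.131.58.26"
```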
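normalizedip <- function(ipstring){
  ipsepstring <- strsplit(ipstring,"\\.")[[1]]
  # paste() returns the padded string; cat() would only print it
  paste(sapply(ipsepstring,function(x)
       sprintf("%03i",as.numeric(x))),collapse=".")
}

normalizedip("1.2.3.55")
yields
 "001.002.003.055"
and therefore should allow you to sort in the correct order, because
zero-padded strings of equal length sort as text the same way the
numbers do.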

Erich Neuwirth, University of Vienna
Faculty of Computer Science
Computer Supported Didactics Working Group
Visit our SunSITE at http://sunsite.univie.ac.at
Phone: +43-1-4277-39464 Fax: +43-1-4277-39459
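
To apply the zero-padding idea to a whole column, a vectorised variant may be handy. This is only a sketch in base R; the name pad_ip is hypothetical and the input is assumed to be clean n.n.n.n strings:

```r
# Hypothetical vectorised variant of the zero-padding idea (base R only).
pad_ip <- function(ip) {
  parts <- strsplit(as.character(ip), ".", fixed = TRUE)
  # Pad every octet to three digits and glue the four parts back together.
  vapply(parts,
         function(p) paste(sprintf("%03d", as.integer(p)), collapse = "."),
         character(1))
}

ips <- c("162.131.58.26", "2.131.58.16", "2.2.58.10", "162.131.58.17")
padded <- pad_ip(ips)
padded
# [1] "162.131.058.026" "002.131.058.016" "002.002.058.010" "162.131.058.017"

# Sorting the padded strings as text now matches the numeric order.
sorted_ips <- ips[order(padded, method = "radix")]
sorted_ips
# [1] "2.2.58.10"     "2.131.58.16"   "162.131.58.17" "162.131.58.26"
```

For a data frame DF with an ip column, DF[order(pad_ip(DF$ip)), ] would reorder all columns together.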

here's another:

    library(gsubfn)
    a[order(gsubfn(
        '[0-9]+',
        ~ sprintf('%03d', as.integer(x)),
        as.character(a$ip))),]

vQ
Here is yet another way:

library(gtools)
DF[mixedorder(DF$ip), ]

Hi VQ,

Thank you. It works like a charm. But I think Peter's code is faster. What is the difference?

i think peter's code is more r-elegant, though less generic.  here's a
quick test, with not so surprising results.  gsubfn is implemented in r,
not c, and it is painfully slow in this test. i also added gabor's
suggestion.

    library(gsubfn)
    library(gtools)
    library(rbenchmark)

    n = 1000
    df = data.frame(
       a=rnorm(n),
       b = rnorm(n),
       c = rnorm(n),
       ip = replicate(n, paste(sample(255, 4), collapse='.'), simplify=TRUE))
    benchmark(columns=c('test', 'elapsed'), replications=10, order=NULL,
       peda={
          connection = textConnection(as.character(df$ip))
          o = do.call(order, read.table(connection, sep='.'))
          close(connection)
          df[o, ] },
       waku=df[order(gsubfn(perl=TRUE,
          '[0-9]+',
          ~ sprintf('%03d', as.integer(x)),
          as.character(df$ip))), ],
       gagr=df[mixedorder(df$ip), ] )

    # peda 0.070
    # waku 7.070
    # gagr 4.710

vQ
library(gsubfn)
library(gtools)
library(rbenchmark)

n <- 10000
df <- data.frame(
  a = rnorm(n),
  b = rnorm(n),
  c = rnorm(n),
  ip = replicate(n, paste(sample(255, 4), collapse='.'), simplify=TRUE)
)

res <- benchmark(columns=c('test', 'elapsed'), replications=10, order=NULL,
  peda = {
    connection <- textConnection(as.character(df$ip))
    o <- do.call(order, read.table(connection, sep='.'))
    close(connection)
    df[o, ]
  },

  peda2 = {
    connection <- textConnection(as.character(df$ip))
    dfT <- read.table(connection, sep='.', colClasses=rep("integer", 4),
                      quote="", na.strings=NULL, blank.lines.skip=FALSE)
    close(connection)
    o <- do.call(order, dfT)
    df[o, ]
  },

  hb = {
    ip <- strsplit(as.character(df$ip), split=".", fixed=TRUE)
    ip <- unlist(ip, use.names=FALSE)
    ip <- as.integer(ip)
    dim(ip) <- c(4, nrow(df))
    ip <- 256^3*ip[1,] + 256^2*ip[2,] + 256*ip[3,] + ip[4,]
    o <- order(ip)
    df[o, ]
  },

  hb2 = {
    ip <- strsplit(as.character(df$ip), split=".", fixed=TRUE)
    ip <- unlist(ip, use.names=FALSE)
    ip <- as.integer(ip)
    dim(ip) <- c(4, nrow(df))
    o <- sort.list(ip[4,], method="radix", na.last=TRUE)
    for (kk in 3:1) {
      o <- o[sort.list(ip[kk,o], method="radix", na.last=TRUE)]
    }
    df[o, ]
  }
)

print(res)

   test elapsed
1  peda    4.12
2 peda2    4.08
3    hb    0.28
4   hb2    0.25
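
The hb variant above works because four octets can be packed into a single base-256 number, so one call to order() on one numeric key is enough. A small worked sketch of that packing; the function name ip_to_number is made up here, and clean n.n.n.n input is assumed:

```r
# Hypothetical helper: map "a.b.c.d" to 256^3*a + 256^2*b + 256*c + d.
ip_to_number <- function(ip) {
  parts <- strsplit(as.character(ip), ".", fixed = TRUE)
  # One column per address, one row per octet.
  m <- matrix(as.numeric(unlist(parts, use.names = FALSE)), nrow = 4)
  # The result is kept as double: the largest value, 4294967295, does not
  # fit into R's 32-bit integer type (.Machine$integer.max is 2147483647).
  as.vector(c(256^3, 256^2, 256, 1) %*% m)
}

keys <- ip_to_number(c("0.0.1.0", "1.2.3.4", "255.255.255.255"))
keys
# [1]        256   16909060 4294967295
```

For example, 1.2.3.4 becomes 16777216 + 131072 + 768 + 4 = 16909060.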


wow! :)

vQ

Not really, just the old saying that any piece of code can be made
twice as fast (which often holds true recursively). /Henrik
