Extracting File Basename without Extension

Dear all,

The basename() function returns the extension also:
myfile <- "path1/path2/myoutput.txt"
basename(myfile)
[1] "myoutput.txt"

Is there any other function where it just returns
plain base:

"myoutput"

i.e. without 'txt'

- Gundala Viswanath
Jakarta - Indonesia
You can use 'sub' to get rid of the extensions:
sub("^([^.]*).*", "\\1", 'filename.extension')
[1] "filename"
sub("^([^.]*).*", "\\1", 'filename.extension.and.more')
[1] "filename"
sub("^([^.]*).*", "\\1", 'filename without extension')
[1] "filename without extension"
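
Note that the pattern above cuts at the first dot, so a name containing several dots loses more than its extension. A minimal sketch comparing it with a pattern anchored at the last dot (the variant suggested later in this thread). The second file name is borrowed from a later reply, and tools::file_path_sans_ext() is a helper from current R versions that none of the original replies mention:

```r
files <- c("path1/path2/myoutput.txt", "path4/path5/my.out.put.xls")
b <- basename(files)

# Cut at the first dot: everything after it is dropped
first_dot <- sub("^([^.]*).*", "\\1", b)

# Cut at the last dot: only the final extension is dropped
last_dot <- sub("\\.[^.]*$", "", b)

# Helper from the 'tools' package that ships with R
sans_ext <- tools::file_path_sans_ext(b)

print(first_dot)  # "myoutput" "my"
print(last_dot)   # "myoutput" "my.out.put"
print(sans_ext)   # "myoutput" "my.out.put"
```

Which behaviour is wanted depends on whether something like "my.out.put" counts as part of the name or as a chain of extensions.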

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
G'day all,

On Fri, 9 Jan 2009 08:12:18 -0200
"Henrique Dallazuanna" <wwwhsd at gmail.com> wrote:

Try this also:

substr(basename(myfile), 1, nchar(basename(myfile)) - 4)
Or, in case that the extension has more than three letters or "myfile"
is a vector of names:

R> myfile <- "path1/path2/myoutput.txt"
R> sapply(strsplit(basename(myfile),"\\."), function(x) paste(x[1:(length(x)-1)], collapse="."))
[1] "myoutput"
R> myfile2 <- c(myfile, "path2/path3/myoutput.temp")
R> sapply(strsplit(basename(myfile2),"\\."), function(x) paste(x[1:(length(x)-1)], collapse="."))
[1] "myoutput" "myoutput"
R> myfile3 <- c(myfile2, "path4/path5/my.out.put.xls")
R> sapply(strsplit(basename(myfile3),"\\."), function(x) paste(x[1:(length(x)-1)], collapse="."))
[1] "myoutput"   "myoutput"   "my.out.put"

HTH.

Cheers,

	Berwin
=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919       
National University of Singapore     
6 Science Drive 2, Blk S16, Level 7          e-mail: statba at nus.edu.sg
Singapore 117546                    http://www.stat.nus.edu.sg/~statba

or have sub do the job for you:

filenames.ext = c("foo.bar", basename("foo/bar/hello.dolly"))
(filenames.noext = sub("[.][^.]*$", "", filenames.ext, perl=TRUE))

vQ
We can omit perl = TRUE here.

or maybe not, depending on the actual task:

names = replicate(10000, paste(sample(c(letters, "."), 100,
replace=TRUE), collapse=""))
system.time(replicate(10, sub("[.][^.]*$", "", names, perl=TRUE)))
system.time(replicate(10, sub("[.][^.]*$", "", names)))

vQ
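
The two timing calls above only differ in the regex engine used. A small sketch, assuming a reduced sample size and a fixed seed (both chosen here for illustration, not taken from the post), that first checks the two engines return the same result and then times them. The variable is called random_names here to avoid masking base R's names():

```r
set.seed(1)  # illustrative seed so the sample is reproducible

# 1000 random "file names" of 100 characters drawn from letters and dots
random_names <- replicate(
  1000,
  paste(sample(c(letters, "."), 100, replace = TRUE), collapse = "")
)

with_perl    <- sub("[.][^.]*$", "", random_names, perl = TRUE)
without_perl <- sub("[.][^.]*$", "", random_names)

# Both engines should strip the same trailing extension
print(identical(with_perl, without_perl))

# Timings depend on the machine and R version, so none are quoted here
print(system.time(for (i in 1:10) sub("[.][^.]*$", "", random_names, perl = TRUE)))
print(system.time(for (i in 1:10) sub("[.][^.]*$", "", random_names)))
```

If the results are identical, the choice of perl = TRUE is purely a performance question for this pattern.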
G'day Wacek,

Apparently also a possibility, I guess it can be made to work with the
original example and my extensions.

Though, it seems to require the knowledge of perl, or at least perl's
regular expression. 

Cheers,

	Berwin
g'afternoon berwin,
Apparently also a possibility, I guess it can be made to work with the
original example and my extensions.

i guess it does work with the original example and your extensions.
Though, it seems to require the knowledge of perl, or at least perl's
regular expression. 

oh my, sorry.  it's so bad to go an inch out of the cosy world of r. 
but, as gabor pointed, 'perl=TRUE' is inessential here, so you actually
need to know just (very basic) regular expressions, with no 'perl'
implied.  having learnt this simple regex syntax you can avoid the need
for looking up strsplit and paste in tfm, so i'd consider it worthwhile.

vQ
G'day Wacek,

On Fri, 09 Jan 2009 14:22:19 +0100

Apparently also a possibility, I guess it can be made to work with
the original example and my extensions.

i guess it does work with the original example and your extensions.
And I thought that you would have known for sure.....
Though, it seems to require the knowledge of perl, or at least
perl's regular expression.    
oh my, sorry.  it's so bad to go an inch out of the cosy world of r. 
Well, if that's how you feel, don't do it.

I regularly use other languages besides R.  Mostly C and Fortran,
occasionally Python.  But I never found time to learn Perl or Java or
awk or C++ or....; some people do not have the time to learn all
languages under the sun. Also, if one concentrates on a few, one can
learn them really well.
but, as gabor pointed, 'perl=TRUE' is inessential here,
I thought that your answer to Gabor indicated that, depending on the
context, perl=TRUE was essential; though I must admit that I did not
run that code.
so you actually need to know just (very basic) regular expressions,
with no 'perl' implied.  having learnt this simple regex syntax you
can avoid the need for looking up strsplit and paste in tfm, so i'd
consider it worthwhile.
As people say, YMMV, I do not need to look up strsplit and/or paste;
but I would have to look up the regular expression syntax or
finally memorise it; something I did not consider worthwhile so far.

Cheers,

	Berwin

R> myfile3 <- c(myfile2, "path4/path5/my.out.put.xls")
R> sapply(strsplit(basename(myfile3),"\\."), function(x) paste(x[1:(length(x)-1)], collapse="."))
[1] "myoutput"   "myoutput"   "my.out.put"
using fixed = TRUE and not escaping '.' is slightly more efficient.
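
A minimal sketch of that suggestion, reusing the file names from the strsplit example. The helper name strip_last is made up here for readability. It only checks that splitting on a literal "." with fixed = TRUE gives the same result as the escaped regex, and makes no claim about how large the speed difference is:

```r
myfile3 <- c("path1/path2/myoutput.txt",
             "path2/path3/myoutput.temp",
             "path4/path5/my.out.put.xls")

# Drop the last dot-separated piece and glue the rest back together
strip_last <- function(parts) paste(parts[1:(length(parts) - 1)], collapse = ".")

# Split on a regex that matches a literal dot
by_regex <- sapply(strsplit(basename(myfile3), "\\."), strip_last)

# Split on the literal string "." without any regex interpretation
by_fixed <- sapply(strsplit(basename(myfile3), ".", fixed = TRUE), strip_last)

print(by_fixed)                       # "myoutput" "myoutput" "my.out.put"
print(identical(by_regex, by_fixed))  # TRUE
```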
or have sub do the job for you:

filenames.ext = c("foo.bar", basename("foo/bar/hello.dolly"))
(filenames.noext = sub("[.][^.]*$", "", filenames.ext, perl=TRUE))
Apparently also a possibility, I guess it can be made to work with the
original example and my extensions.

Though, it seems to require the knowledge of perl, or at least perl's
regular expression.
Actually, that's a valid regex in any of the variants offered.  A more 
conventional writing of it is the second of
f <- 'foo.bar.R'
sub("[.][^.]*$", "", f)
[1] "foo.bar"
sub("\\.[^.]*$", "", f)
[1] "foo.bar"

It is the last that is used at various points in R's own code 
(although sometimes with restrictions on what an 'extension' is, e.g.

sub("\\.[[:alnum:]]*$", "", f)

appears in the current SHLIB code.)
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
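
To illustrate what "restrictions on what an 'extension' is" means in practice, here is a small sketch comparing the two patterns quoted above. The second and third file names are invented for this example:

```r
f <- c("foo.bar.R", "report.final-v2", "backup.txt~")

# Anything after the last dot counts as an extension
any_ext <- sub("\\.[^.]*$", "", f)

# Only a purely alphanumeric tail counts as an extension
alnum_ext <- sub("\\.[[:alnum:]]*$", "", f)

print(any_ext)    # "foo.bar" "report" "backup"
print(alnum_ext)  # "foo.bar" "report.final-v2" "backup.txt~"
```

The stricter pattern leaves a name alone when the part after the last dot contains characters such as "-" or "~".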
G'day Wacek,

On Fri, 09 Jan 2009 14:22:19 +0100
Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:

Apparently also a possibility, I guess it can be made to work with
the original example and my extensions.

i guess it does work with the original example and your extensions.

And I thought that you would have known for sure.....

i thought i did until you made that comment, which made me think you've
just discovered it doesn't.

Though, it seems to require the knowledge of perl, or at least
perl's regular expression.    

oh my, sorry.  it's so bad to go an inch out of the cosy world of r. 

Well, if that's how you feel, don't do it.

quite the opposite.
I regularly use other languages besides R.  Mostly C and Fortran,
occasionally Python.  But I never found time to learn Perl or Java or
awk or C++ or....; some people do not have the time to learn all
languages under the sun. Also, if one concentrates on a few, one can
learn them really well.

i think i did not suggest the original poster to learn perl.

many responses on this list involve regular expressions, and regexes are
so ubiquitous in code that has to do with parsing and processing text,
be it filenames or loads of data, that a user of r may well want to
learn a bit of this stuff in addition to the details about how real
numbers are represented below the surface.

you may want to keep saying 'use r where applicable instead of worse
tools', and i'd like to keep saying 'use regexes where they're
applicable instead of worse tools'.  same philosophy.

but, as gabor pointed, 'perl=TRUE' is inessential here,

I thought that your answer to Gabor indicated that, depending on the
context, perl=TRUE was essential; though I must admit that I did not
run that code.  

i see i may have expressed that wrongly.  it should have been "but maybe
you'd want to keep it", meaning this is not essential for the final
result, but may improve the runtime.

so you actually need to know just (very basic) regular expressions,
with no 'perl' implied.  having learnt this simple regex syntax you
can avoid the need for looking up strsplit and paste in tfm, so i'd
consider it worthwhile.

As people say, YMMV, I do not need to look up strsplit and/or paste;
but I would have to look up the regular expression syntax or
finally memorise it; something I did not consider worthwhile so far.

well, the regex syntax is fairly standard, though variations exist among
languages.  once you learn it, or rather the ideas behind the syntax,
you're well equipped for quite a range of tasks.  on the other hand, the
details of strsplit and paste are pretty r-specific, and you don't gain
much by remembering them (except for freeing yourself from having to
read tfm again).

i have seen quite a bunch of programs written by scientists who spent
over one hundred lines of code on just parsing command line arguments; 
i wish they knew regexes exist (and better, getopt-like modules too). 
if you're doing serious programming without knowing regexes, you're
rather lucky.

vQ
Actually, that's a valid regex in any of the variants offered.  A more
conventional writing of it is the second of

f <- 'foo.bar.R'
sub("[.][^.]*$", "", f)
[1] "foo.bar"
sub("\\.[^.]*$", "", f)
[1] "foo.bar"

more conventional in r, perhaps.  it's not portable, due to the 'escape
the escape to have an escape' feature of r when it comes to regexes; in
perl, for example, /\\.[^.]*$/ would hardly do the job.

vQ
G'day Wacek,

On Fri, 09 Jan 2009 15:19:46 +0100

i think i did not suggest the original poster to learn perl.
As I see it, you didn't suggest anything to the original poster, at
least not directly.  But, since these days you have to be subscribed
to r-help to post IIRC, it is probably reasonable to assume that the
original poster saw your posting.
many responses on this list involve regular expressions, and regexes
are so ubiquitous in code that has to do with parsing and processing
text, be it filenames or loads of data, that a user of r may well
want to learn a bit of this stuff in addition to the details about
how real numbers are represented below the surface.
You should really get rid of that chip on your shoulder.
you may want to keep saying 'use r where applicable instead of worse
tools', 
Now I am getting really confused, isn't your suggested solution not
using R too?  So why would I say something like this?  I suggest you get
rid of the chip on the other shoulder too... :)
well, the regex syntax is fairly standard, though variations exist
among languages.  
Exactly, and the existence of these variations which can trip one up
and require one to rtfm in anything but the simplest situations,
that's why regexp are not what jumps first to my mind.
i have seen quite a bunch of programs written by scientists who spent
over one hundred lines of code on just parsing command line
arguments; i wish they knew regexes exist (and better, getopt-like
modules too). if you're doing serious programming without knowing
regexes, you're rather lucky.
Probably depends on what one calls serious.

Best wishes,

	Berwin
I'm curious about something: does "file extension" have a standard 
definition?  Most (all?  I haven't tried them all) of the solutions 
presented in this thread would return an empty string for the "plain 
base" if given the filename ".bashrc".

Windows (where file extensions really mean something), though reluctant 
to create such a file, appears to agree that the extension is bashrc, 
even though to me it appears clear that that file has no extension.

Duncan Murdoch
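
The ".bashrc" observation can be checked directly against three of the approaches posted earlier in the thread. This is only a sketch of that check, not an exhaustive test of every reply:

```r
f <- ".bashrc"

# Cut at the first dot
first_dot <- sub("^([^.]*).*", "\\1", f)

# Cut at the last dot
last_dot <- sub("[.][^.]*$", "", f)

# Split on dots and drop the last piece
split_paste <- sapply(strsplit(f, "\\."),
                      function(x) paste(x[1:(length(x) - 1)], collapse = "."))

print(c(first_dot, last_dot, split_paste))  # "" "" ""
```

All three treat the whole name as an extension and leave nothing behind.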
Hi,
[mailto:r-help-bounces at r-project.org] On Behalf Of Henrique 
Dallazuanna

Try this also:

substr(basename(myfile), 1, nchar(basename(myfile)) - 4)

This, of course, assumes that the extensions are always 3 characters.
Sometimes there might be more ("index.html"), sometimes fewer
("shellscript.sh").
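
A quick sketch of what goes wrong with the fixed-width approach, using the two example names above next to a three-letter extension, and compared with the last-dot regex posted earlier in the thread:

```r
files <- c("myoutput.txt", "index.html", "shellscript.sh")

# Always drop the last four characters
fixed_width <- substr(files, 1, nchar(files) - 4)

# Drop everything from the last dot onwards
last_dot <- sub("\\.[^.]*$", "", files)

print(fixed_width)  # "myoutput" "index." "shellscrip"
print(last_dot)     # "myoutput" "index" "shellscript"
```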

Although my solution is not as compact as the others (I wish I was
proficient in 'mastering regular expressions'), I'd like to provide my
little code-snippet which does not require any regular expressions (but
expects a . in the filename).

######################
x1 <- "roland.txt"
x2 <- "roland.html"
x3 <- "roland.sh"

no.extension <- function(astring) {
  if (substr(astring, nchar(astring), nchar(astring))==".") {
    return(substr(astring, 1, nchar(astring)-1))
  } else {
    no.extension(substr(astring, 1, nchar(astring)-1))
  }
}

no.extension(x1)
no.extension(x2)
no.extension(x3)
######################

Hope this helps a bit,
Roland
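
The function above recurses one character at a time and does not stop if the name contains no dot. A hedged sketch of a regex-free alternative that handles that case and works on a vector of names. The name strip_ext_noregex is hypothetical, and returning a dot-less name unchanged is a choice made here, not something from the original post:

```r
strip_ext_noregex <- function(x) {
  vapply(x, function(s) {
    # Split the string into single characters (no regex involved)
    chars <- strsplit(s, "", fixed = TRUE)[[1]]
    dots <- which(chars == ".")
    # No dot at all: return the name unchanged
    if (length(dots) == 0) return(s)
    # Keep everything before the last dot
    substr(s, 1, max(dots) - 1)
  }, character(1), USE.NAMES = FALSE)
}

res <- strip_ext_noregex(c("roland.txt", "roland.html", "roland.sh",
                           "README", "my.out.put.xls"))
print(res)  # "roland" "roland" "roland" "README" "my.out.put"
```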

P.S. Any suggestions how to become more proficient with regular
expressions? The O'Reilly book ("Mastering...")? Whenever I tried
anything more complicated than basic usage (things like ^ $ * . ) in R,
I was way faster to write a new function (like above) instead of finding
a regex solution.

By the way: it might be still possible to *write* regular expressions,
but what about code re-use? Are there people who can easily *read*
complicated regular expressions?

----------
This mail has been sent through the MPI for Demographic Research.  Should you receive a mail that is apparently from a MPI user without this text displayed, then the address has most likely been faked. If you are uncertain about the validity of this message, please check the mail header or ask your system administrator for assistance.
This, of course, assumes that the extensions are always 3 characters.
The regex solutions posted did not make this assumption.
P.S. Any suggestions how to become more proficient with regular
expressions? The O'Reilly book ("Mastering...")? Whenever I tried
anything more complicated than basic usage (things like ^ $ * . ) in R,
I was way faster to write a new function (like above) instead of finding
a regex solution.
See the links in the Links box at:
http://gsubfn.googlecode.com
Windows (where file extensions really mean something), though reluctant
to create such a file, appears to agree that the extension is bashrc,
even though to me it appears clear that that file has no extension.
I'm not sure what is clear about it, but the GNU utility agrees with you:
basename abc/.exe .exe
.exe
basename abc/1.exe .exe
1

Anyone want to contribute code for an optional suffix= argument for R's
basename()?
O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
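
As a rough idea of what such an argument could look like, here is a sketch written as a separate wrapper rather than a change to basename() itself. The function name basename_suffix is hypothetical, the behaviour is modelled only on the two GNU basename calls shown above, and endsWith() assumes R 3.3.0 or later:

```r
# Hypothetical wrapper, not part of base R
basename_suffix <- function(path, suffix = "") {
  b <- basename(path)
  # Strip the suffix only if it is non-empty, the name ends with it,
  # and the name is longer than the suffix (so ".exe" stays ".exe")
  strip <- nzchar(suffix) & endsWith(b, suffix) & nchar(b) > nchar(suffix)
  ifelse(strip, substr(b, 1, nchar(b) - nchar(suffix)), b)
}

print(basename_suffix("abc/.exe", ".exe"))   # ".exe"
print(basename_suffix("abc/1.exe", ".exe"))  # "1"
print(basename_suffix(c("a/x.txt", "b/y.csv"), ".txt"))  # "x" "y.csv"
```

Unlike the regex solutions, this requires the caller to know the suffix in advance, which is the same trade-off the GNU utility makes.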
Duncan,

That is going to be highly OS and even OS version specific. More
information here:

  http://en.wikipedia.org/wiki/Filename
  http://en.wikipedia.org/wiki/Filename_extension

There are relevant standard extensions for standard file formats:

  http://en.wikipedia.org/wiki/List_of_file_formats

but that does not guarantee that user created filenames will adhere to
them, especially for text files.

As you note, filenames beginning with a '.' will be common on
Unixen/Linuxen as otherwise normally hidden system/config files. Such
files would actually create problems if attempted to be opened on
Windows with certain applications and I have even seen problems with
such files when using SMB under Linux to access files on a server.

HTH,

Marc Schwartz
Duncan Murdoch wrote:

I'm curious about something: does "file extension" have a standard
definition?  Most (all?  I haven't tried them all) of the solutions
presented in this thread would return an empty string for the "plain
base" if given the filename ".bashrc".

right;  there's a straightforward fix to my solution that accounts for
cases such as '.bashrc':

names = c("foo.bar", ".zee")
sub("(.+)[.][^.]+$", "\\1", names)

you could also use a lookbehind if possible (not in r, afaik).

vQ
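
A small sketch of how the fixed pattern behaves on a few more names (the last two are added here for illustration). On the lookbehind remark: as far as I know, R's PCRE engine (perl = TRUE) does accept fixed-length lookbehind, so the second call is offered as an assumption to verify against ?regex on your own installation:

```r
files <- c("foo.bar", ".zee", ".bashrc", "my.out.put.xls")

# Require at least one character before the final dot
with_group <- sub("(.+)[.][^.]+$", "\\1", files)

# Same idea with a lookbehind: the dot must be preceded by some character
with_lookbehind <- sub("(?<=.)[.][^.]+$", "", files, perl = TRUE)

print(with_group)       # "foo" ".zee" ".bashrc" "my.out.put"
print(with_lookbehind)  # "foo" ".zee" ".bashrc" "my.out.put"
```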
P.S. Any suggestions how to become more proficient with regular
expressions? The O'Reilly book ("Mastering...")? Whenever I tried
anything more complicated than basic usage (things like ^ $ * . ) in R,
I was way faster to write a new function (like above) instead of finding
a regex solution.

the book you mention is good.
you may also consider http://www.regular-expressions.info/

regexes are usually well explained with lots of examples in perl books.
By the way: it might be still possible to *write* regular expressions,
but what about code re-use? Are there people who can easily *read*
complicated regular expressions?

in some cases it is possible to write regular expressions in a way that
facilitates reading them by a human.  in perl, for example, you can use
so-called readable regexes:

/
   (.+)    # match and remember at least one arbitrary character
   [.]     # match a dot
   [^.]+ # match at least one non-dot character
   $  # end of string anchor
/x;

you can also use within regex comments:

/(.+)(?# one or more chars)[.](?# a dot)[^.]+(?# one or more
non-dots)$(?# end of string)/

nothing of the sorts in r, however.

vQ
R supports that if you begin the regular expression with (?x) and
use perl = TRUE.  See ?regexp
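
A minimal sketch of that, carrying the commented Perl regex from earlier in the thread over to R. The variable name readable is made up for this example:

```r
# (?x) turns on extended mode: whitespace in the pattern is ignored
# and '#' starts a comment that runs to the end of the line
readable <- "(?x)
  (.+)    # match and remember at least one arbitrary character
  [.]     # match a dot
  [^.]+   # match at least one non-dot character
  $       # end of string anchor
"

res <- sub(readable, "\\1", c("foo.bar", ".zee"), perl = TRUE)
print(res)  # "foo" ".zee"
```

Without perl = TRUE the (?x) prefix is not interpreted this way, so the flag is essential here.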
names = c("foo.bar", ".zee")
sub("(.+)[.][^.]+$", "\\1", names)

you could also use a lookbehind if possible (not in r, afaik).

or:

sub(".*[.]", ".", names)

[1] ".bar" ".zee"

it was "foo" that was desired...

vQ