
Extracting File Basename without Extension

26 messages · Gundala Viswanath, jim holtman, Henrique Dallazuanna +8 more


#
Dear all,

The basename() function also returns the extension:
[1] "myoutput.txt"


Is there any other function that returns just the
plain base name:

"myoutput"

i.e. without 'txt'

- Gundala Viswanath
Jakarta - Indonesia
#
You can use 'sub' to get rid of the extensions:
[1] "filename"
[1] "filename"
[1] "filename without extension"
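jim's commands did not survive in this copy of the thread, only their output. A minimal sketch of the idea, assuming the pattern "\\.[^.]*$" (not necessarily the one he used) and made-up file names:

```r
# Sketch only: the pattern removes the last "." and everything after it.
# Names without any dot are passed through unchanged.
files <- c("filename.txt", "filename.R", "filename without extension")
sub("\\.[^.]*$", "", files)
```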
On Thu, Jan 8, 2009 at 9:10 PM, Gundala Viswanath <gundalav at gmail.com> wrote:

  
    
#
G'day all,

On Fri, 9 Jan 2009 08:12:18 -0200
"Henrique Dallazuanna" <wwwhsd at gmail.com> wrote:

            
Or, in case the extension has more than three letters, or "myfile"
is a vector of names:

R> myfile <- "path1/path2/myoutput.txt"
R> sapply(strsplit(basename(myfile),"\\."), function(x) paste(x[1:(length(x)-1)], collapse="."))
[1] "myoutput"
R> myfile2 <- c(myfile, "path2/path3/myoutput.temp")
R> sapply(strsplit(basename(myfile2),"\\."), function(x) paste(x[1:(length(x)-1)], collapse="."))
[1] "myoutput" "myoutput"
R> myfile3 <- c(myfile2, "path4/path5/my.out.put.xls")
R> sapply(strsplit(basename(myfile3),"\\."), function(x) paste(x[1:(length(x)-1)], collapse="."))
[1] "myoutput"   "myoutput"   "my.out.put"
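Berwin's one-liner can be wrapped as a reusable function; `strip_ext_split()` is a hypothetical name. vapply() is used instead of sapply() so the result is always a character vector. Note that a name with no dot comes back unchanged only because 1:0 is c(1, 0) and the zero index is silently dropped:

```r
# Hypothetical wrapper around the strsplit()/paste() approach:
# split each base name on dots, then re-join all pieces except the last.
strip_ext_split <- function(paths) {
  pieces <- strsplit(basename(paths), "\\.")
  vapply(pieces,
         function(x) paste(x[1:(length(x) - 1)], collapse = "."),
         character(1))
}
strip_ext_split(c("path1/path2/myoutput.txt", "path4/path5/my.out.put.xls", "README"))
```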

HTH.

Cheers,

	Berwin
=========================== Full address =============================
Berwin A Turlach                            Tel.: +65 6516 4416 (secr)
Dept of Statistics and Applied Probability        +65 6516 6650 (self)
Faculty of Science                          FAX : +65 6872 3919       
National University of Singapore     
6 Science Drive 2, Blk S16, Level 7          e-mail: statba at nus.edu.sg
Singapore 117546                    http://www.stat.nus.edu.sg/~statba
#
Berwin A Turlach wrote:
or have sub do the job for you:

filenames.ext = c("foo.bar", basename("foo/bar/hello.dolly"))
(filenames.noext = sub("[.][^.]*$", "", filenames.ext, perl=TRUE))
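For comparison with Berwin's strsplit() results, the same pattern applied to his three example paths (perl = TRUE dropped, which does not change the result for these inputs):

```r
files <- c("path1/path2/myoutput.txt", "path2/path3/myoutput.temp",
           "path4/path5/my.out.put.xls")
# "[.]" is a literal dot; "[^.]*$" is any run of non-dots up to the end,
# so only the last extension is removed
sub("[.][^.]*$", "", basename(files))
```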



vQ
#
On Fri, Jan 9, 2009 at 6:52 AM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
We can omit perl = TRUE here.
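A quick check of that claim on a few sample names (made up for illustration): the default extended-regex engine and PCRE give the same result for this pattern.

```r
x <- c("foo.bar", "hello.dolly", "my.out.put.xls", "noext", ".bashrc")
p <- "[.][^.]*$"
# compare the default engine with PCRE (perl = TRUE) on the same inputs
identical(sub(p, "", x), sub(p, "", x, perl = TRUE))
```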
#
Gabor Grothendieck wrote:
or maybe not, depending on the actual task:

names = replicate(10000, paste(sample(c(letters, "."), 100,
replace=TRUE), collapse=""))
system.time(replicate(10, sub("[.][^.]*$", "", names, perl=TRUE)))
system.time(replicate(10, sub("[.][^.]*$", "", names)))

vQ
#
G'day Wacek,

On Fri, 09 Jan 2009 12:52:46 +0100
Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:

            
Apparently also a possibility; I guess it can be made to work with the
original example and my extensions.

Though it seems to require knowledge of Perl, or at least of Perl's
regular expressions.

Cheers,

	Berwin
#
Berwin A Turlach wrote:
g'afternoon berwin,
i guess it does work with the original example and your extensions.
oh my, sorry.  it's so bad to go an inch out of the cosy world of r. 
but, as gabor pointed, 'perl=TRUE' is inessential here, so you actually
need to know just (very basic) regular expressions, with no 'perl'
implied.  having learnt this simple regex syntax you can avoid the need
for looking up strsplit and paste in tfm, so i'd consider it worthwhile.

vQ
#
G'day Wacek,

On Fri, 09 Jan 2009 14:22:19 +0100
Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:

            
And I thought that you would have known for sure.....
Well, if that's how you feel, don't do it.

I regularly use other languages besides R.  Mostly C and Fortran,
occasionally Python.  But I never found time to learn Perl or Java or
awk or C++ or....; some people do not have the time to learn all
languages under the sun. Also, if one concentrates on a few, one can
learn them really well.
I thought that your answer to Gabor indicated that, depending on the
context, perl=TRUE was essential; though I must admit that I did not
run that code.
As people say, YMMV: I do not need to look up strsplit and/or paste,
but I would have to look up the regular expression syntax, or finally
memorise it; something I did not consider worthwhile so far.

Cheers,

	Berwin
#
On Fri, 9 Jan 2009, Berwin A Turlach wrote:

            
Using fixed = TRUE and not escaping '.' is slightly more efficient.
Actually, that's a valid regex in any of the variants offered.  A more 
conventional way of writing it is the second of:
[1] "foo.bar"
[1] "foo.bar"

It is the last that is used at various points in R's own code 
(although sometimes with restrictions on what an 'extension' is, e.g.

sub("\\.[[:alnum:]]*$", "", f)

appears in the current SHLIB code.)
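The commands behind the two outputs above were not preserved. A sketch under that caveat, with made-up file names, comparing the fixed = TRUE split, the two spellings of the dot pattern, the SHLIB-style pattern, and tools::file_path_sans_ext() from R's own tools package (not mentioned in the thread):

```r
f <- c("path4/path5/my.out.put.xls", "archive.tar.gz", "backup.old-1", ".bashrc")
b <- basename(f)
# fixed = TRUE: split on a literal ".", no escaping needed
strsplit(b[1], ".", fixed = TRUE)[[1]]
# "[.]" and "\\." both denote a single literal dot
identical(sub("[.][^.]*$", "", b), sub("\\.[^.]*$", "", b))
# SHLIB-style: only an alphanumeric extension is removed, so "backup.old-1" survives
sub("\\.[[:alnum:]]*$", "", b)
# file_path_sans_ext() also needs a non-dot before the dot, so ".bashrc" survives
tools::file_path_sans_ext(b)
```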
#
Berwin A Turlach wrote:
i thought i did until you made that comment, which made me think you've
just discovered it doesn't.
quite the opposite.
i don't think i suggested that the original poster learn perl.

many responses on this list involve regular expressions, and regexes are
so ubiquitous in code that has to do with parsing and processing text,
be it filenames or loads of data, that a user of r may well want to
learn a bit of this stuff in addition to the details about how real
numbers are represented below the surface.

you may want to keep saying 'use r where applicable instead of worse
tools', and i'd like to keep saying 'use regexes where they're
applicable instead of worse tools'.  same philosophy.
i see i may have expressed that wrongly.  it should have been "but maybe
you'd want to keep it", meaning this is not essential for the final
result, but may improve the runtime.
well, the regex syntax is fairly standard, though variations exist among
languages.  once you learn it, or rather the ideas behind the syntax,
you're well equipped for quite a range of tasks.  on the other hand, the
details of strsplit and paste are pretty r-specific, and you don't gain
much by remembering them (except for freeing yourself from having to
read tfm again).

i have seen quite a bunch of programs written by scientists who spent
over one hundred lines of code on just parsing command line arguments; 
i wish they knew regexes exist (and better, getopt-like modules too). 
if you're doing serious programming without knowing regexes, you're
rather lucky.

vQ
#
Prof Brian Ripley wrote:
more conventional in r, perhaps.  it's not portable, due to the 'escape
the escape to have an escape' feature of r when it comes to regexes; in
perl, for example, /\\.[^.]*$/ would hardly do the job.
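To make the "escape the escape" point concrete: the R string "\\." holds two characters, and the regex engine only ever sees \. (the raw-string form assumes R 4.0.0 or later):

```r
pat <- "\\.[^.]*$"    # R string literal: the backslash itself must be escaped
nchar("\\.")          # 2: a backslash followed by a dot
cat(pat, "\n")        # shows the regex as the engine receives it
# From R 4.0.0 on, a raw string avoids the doubled backslash:
identical(pat, r"(\.[^.]*$)")
```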

vQ
#
G'day Wacek,

On Fri, 09 Jan 2009 15:19:46 +0100
Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:

            
As I see it, you didn't suggest anything to the original poster, at
least not directly.  But, since these days you have to be subscribed
to r-help to post IIRC, it is probably reasonable to assume that the
original poster saw your posting.
You should really get rid of that chip on your shoulder.
Now I am getting really confused: isn't your suggested solution using
R too?  So why would I say something like this?  I suggest you get
rid of the chip on the other shoulder too... :)
Exactly, and the existence of these variations, which can trip one up
and force one to rtfm in anything but the simplest situations, is
why regexps are not what first jumps to my mind.
Probably depends on what one calls serious.

Best wishes,

	Berwin
#
On 1/8/2009 9:10 PM, Gundala Viswanath wrote:
I'm curious about something: does "file extension" have a standard 
definition?  Most (all?  I haven't tried them all) of the solutions 
presented in this thread would return an empty string for the "plain 
base" if given the filename ".bashrc".

Windows (where file extensions really mean something), though reluctant 
to create such a file, appears to agree that the extension is bashrc, 
even though to me it appears clear that that file has no extension.
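Duncan's point is easy to reproduce with two of the approaches posted above; both treat "bashrc" as the extension and return an empty base name:

```r
dotfile <- ".bashrc"
# the sub() pattern from earlier in the thread
sub("[.][^.]*$", "", dotfile)
# the strsplit()/paste() approach: splitting gives c("", "bashrc")
x <- strsplit(dotfile, "\\.")[[1]]
paste(x[1:(length(x) - 1)], collapse = ".")
```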

Duncan Murdoch
#
Hi,
This, of course, assumes that the extensions are always 3 characters.
Sometimes there might be more ("index.html"), sometimes less
("shellscript.sh").
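The message Roland is replying to is not preserved here. Assuming it dropped a fixed number of characters, for example with substr(x, 1, nchar(x) - 4) (an assumed form, not the original code), the failure looks like this, next to a pattern that does not depend on the extension length:

```r
x <- c("myoutput.txt", "index.html", "shellscript.sh")
# fixed-width: only correct for three-letter extensions
substr(x, 1, nchar(x) - 4)
# pattern-based: removes whatever follows the last dot
sub("\\.[^.]*$", "", x)
```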

Although my solution is not as compact as the others (I wish I was
proficient in 'mastering regular expressions'), I'd like to provide my
little code-snippet which does not require any regular expressions (but
expects a . in the filename).

######################
x1 <- "roland.txt"
x2 <- "roland.html"
x3 <- "roland.sh"

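# Recursively drop the last character until the dropped character is a '.'
# (assumes the name contains at least one '.', as noted above)
no.extension <- function(astring) {
  if (substr(astring, nchar(astring), nchar(astring))==".") {
    # last character is the dot: return everything before it
    return(substr(astring, 1, nchar(astring)-1))
  } else {
    # otherwise strip the last character and try again
    no.extension(substr(astring, 1, nchar(astring)-1))
  }
}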

no.extension(x1)
no.extension(x2)
no.extension(x3)
######################
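A sketch of the same regex-free idea without recursion; `strip_ext_noregex()` is a hypothetical name. Unlike the recursive version, which keeps recursing until R stops with an error when the name has no '.', it returns such names unchanged, and it accepts a vector:

```r
strip_ext_noregex <- function(paths) {
  vapply(paths, function(s) {
    n <- nchar(s)
    # split the string into single characters without any regex
    chars <- substring(s, seq_len(n), seq_len(n))
    dots <- which(chars == ".")
    if (length(dots) == 0L) return(s)   # no dot: nothing to strip
    substr(s, 1L, max(dots) - 1L)       # keep everything before the last dot
  }, character(1), USE.NAMES = FALSE)
}
strip_ext_noregex(c("roland.txt", "roland.html", "roland.sh", "my.out.put.xls", "README"))
```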

Hope this helps a bit,
Roland

P.S. Any suggestions how to become more proficient with regular
expressions? The O'Reilly book ("Mastering...")? Whenever I tried
anything more complicated than basic usage (things like ^ $ * . ) in R,
I was way faster to write a new function (like above) instead of finding
a regex solution.

By the way: it might still be possible to *write* regular expressions,
but what about code re-use? Are there people who can easily *read*
complicated regular expressions?

----------
#
On Fri, Jan 9, 2009 at 10:23 AM, Rau, Roland <Rau at demogr.mpg.de> wrote:
The regex solutions posted did not make this assumption.
See the links in the Links box at:
http://gsubfn.googlecode.com
#
Duncan Murdoch wrote:
I'm not sure what is clear about it, but the GNU utility agrees with you:
.exe
1

Anyone want to contribute code for an optional suffix= argument for R's
basename()?
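A sketch of what such an argument might do, written as a separate hypothetical helper `basename_suffix()` rather than a change to base R. It mimics the GNU behaviour shown above: the suffix is dropped only if the name ends with it and is not identical to it.

```r
# Hypothetical helper, not part of base R
basename_suffix <- function(path, suffix = "") {
  b <- basename(path)
  if (!nzchar(suffix)) return(b)
  n <- nchar(b)
  k <- nchar(suffix)
  # strip only if the name ends with suffix and is longer than it
  ends <- n > k & substr(b, n - k + 1L, n) == suffix
  b[ends] <- substr(b[ends], 1L, n[ends] - k)
  b
}
basename_suffix(c("path1/path2/myoutput.txt", ".exe", "dir/data.csv"), ".txt")
basename_suffix(".exe", ".exe")   # the whole name: left alone, as GNU basename does
```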
#
on 01/09/2009 09:00 AM Duncan Murdoch wrote:
Duncan,

That is going to be highly OS and even OS version specific. More
information here:

  http://en.wikipedia.org/wiki/Filename
  http://en.wikipedia.org/wiki/Filename_extension

There are relevant standard extensions for standard file formats:

  http://en.wikipedia.org/wiki/List_of_file_formats

but that does not guarantee that user created filenames will adhere to
them, especially for text files.

As you note, filenames beginning with a '.' will be common on
Unixen/Linuxen, where they are normally hidden system/config files. Such
files can actually cause problems when opened on Windows with certain
applications, and I have even seen problems with such files when using
SMB under Linux to access files on a server.

HTH,

Marc Schwartz
#
right;  there's a straightforward fix to my solution that accounts for
cases such as '.bashrc':

names = c("foo.bar", ".zee")
sub("(.+)[.][^.]+$", "\\1", names)

you could also use a lookbehind if possible (not in r, afaik).
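The fix on a few names, plus a lookbehind variant: with perl = TRUE, R hands the pattern to PCRE, which does support lookbehind. The name `fnames` is used instead of `names` to avoid masking the base function.

```r
fnames <- c("foo.bar", ".zee", "my.out.put.xls")
# require at least one character before the final dot
sub("(.+)[.][^.]+$", "\\1", fnames)
# same effect with a lookbehind (PCRE only, hence perl = TRUE)
sub("(?<=.)[.][^.]+$", "", fnames, perl = TRUE)
```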

vQ
#
Rau, Roland wrote:
the book you mention is good.
you may also consider http://www.regular-expressions.info/

regexes are usually well explained with lots of examples in perl books.
in some cases it is possible to write regular expressions in a way that
facilitates reading them by a human.  in perl, for example, you can use
so-called readable regexes:

/
   (.+)    # match and remember at least one arbitrary character
   [.]     # match a dot
   [^.]+   # match at least one non-dot character
   $       # end of string anchor
/x;

you can also use within regex comments:

/(.+)(?# one or more chars)[.](?# a dot)[^.]+(?# one or more non-dots)$(?# end of string)/


nothing of the sort in r, however.

vQ
#
On Fri, Jan 9, 2009 at 4:20 PM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
or:
[1] ".bar" ".zee"
#
On Fri, Jan 9, 2009 at 4:28 PM, Wacek Kusnierczyk
<Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
R supports that if you begin the regular expression with (?x) and
use perl = TRUE.  See ?regexp
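A sketch of that, reusing the commented Perl pattern from earlier in the thread with free-spacing mode switched on by (?x); the inline (?#...) comment form also works under perl = TRUE:

```r
fnames <- c("foo.bar", ".zee")
pat <- "(?x)     # free-spacing mode: whitespace is ignored, '#' starts a comment
  (.+)           # match and remember at least one arbitrary character
  [.]            # match a dot
  [^.]+          # match at least one non-dot character
  $              # end of string anchor"
sub(pat, "\\1", fnames, perl = TRUE)
# inline comments need no special mode
sub("(.+)(?# stem)[.](?# a dot)[^.]+(?# extension)$", "\\1", fnames, perl = TRUE)
```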
1 day later
#
Gabor Grothendieck wrote:
it was "foo" that was desired...

vQ