A simple string alienation problem

4 messages · Soumyadip Bhattacharyya, Eric Berger, PIKAL Petr +1 more

#
Dear Eric,
I am sending this from Gmail, as you suggested. I hope everyone can
now see this email. I have also attached the first 50 rows of
FIght.csv.
Output: I will try to do market basket analysis on this data to find
rules, as a learning exercise. Once I have the data in transactional
format, I can run the algorithm and keep learning. This little
problem has become a barrier in my path. I can alienate the string in
Excel, but I wanted to do it in R. After some research I tried this:
x <- substr(x, 1, nchar(x) - 1)
but I wasn't successful, and I tried many other things. The data is
still not coming out in transactional format.
Hence I have now reached out to the experts. Many thanks.

Hello Dear R Community,

I would like to ask for a little help, please. I have a dataset in a
CSV file, which I have read into R as follows:

      V1
tropical fruit"
whole milk"
pip fruit"
other vegetables"
whole milk"
rolls/buns"

The issue is that the data set in the CSV file also appears with the
quotation marks, and I can't get rid of them. I want to do it in R.
The quotes only appear at the end of the string. The dataset has many
rows; this is just a sample. My intention is to get rid of the quotes
and then separate the strings at the "/", i.e. rolls/buns should
become rolls in one column and buns in another.

I know this is something very simple that I am missing, but could you
please show me how to do this? If someone could throw some light on
it, please. I read the data in with a simple read.csv statement:
Output:
'data.frame':   38765 obs. of  1 variable:
 $ V1: chr  "tropical fruit\"" "whole milk\"" "pip fruit\"" "other vegetables\"" ...

Many Thanks in advance for your help.
Kind Regards,
Sam.
#
Hi Sam,
My code below adds new columns to your data frame so you have the original
columns in order to compare.
(Also this could help in case there are a few rows that don't work in the
full set.)
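
A minimal sketch of what such code might look like, not the code originally posted. It assumes the data frame is called `df` and has the single character column `V1` shown in the question; the new column names `clean`, `item1` and `item2` are hypothetical.

```r
# Small stand-in for the real data (assumption: one character column V1)
df <- data.frame(V1 = c("tropical fruit\"", "whole milk\"", "rolls/buns\""),
                 stringsAsFactors = FALSE)

# Strip a trailing double quote; V1 itself is left untouched for comparison
df$clean <- sub("\"$", "", df$V1)

# Split on "/" and keep the first and second piece (NA when there is no second)
parts <- strsplit(df$clean, "/", fixed = TRUE)
df$item1 <- sapply(parts, `[`, 1)
df$item2 <- sapply(parts, `[`, 2)
```

Keeping `V1` next to the new columns makes it easy to spot rows where the cleaning did not behave as expected, for example rows with more than one "/".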
HTH,
Eric




On Tue, Apr 14, 2020 at 1:55 PM Soumyadip Bhattacharyya <s.b.sam2801 at gmail.com> wrote:
#
Hi

The attachment did not go through; only a limited set of attachment types is allowed, see the Posting Guide.

I am not sure R is the best tool for removing a few characters. If " is at the end of all your strings, e.g.
structure(list(V1 = c("adfvadfg\"", "sdfasd\"", "vafdv\"", "hjk/tiuk\""
)), class = "data.frame", row.names = c(NA, -4L))
a combination of sapply, apply and substr could remove the trailing ".
     V1        
[1,] "adfvadfg"
[2,] "sdfasd"  
[3,] "vafdv"   
[4,] "hjk/tiuk"
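
One call that gives a result of the shape shown above, as a sketch only: the object name `test` is an assumption, and this is not necessarily the code that was originally run. Note that it drops the last character unconditionally, so it is only safe when every string really does end in ".

```r
# Example data, rebuilt from the structure() output above
test <- structure(list(V1 = c("adfvadfg\"", "sdfasd\"", "vafdv\"", "hjk/tiuk\"")),
                  class = "data.frame", row.names = c(NA, -4L))

# sapply() visits each column of the data frame; substr() keeps everything
# except the last character of each string. The result is a character matrix.
trimmed <- sapply(test, function(x) substr(x, 1, nchar(x) - 1))
trimmed
```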
Splitting on / is simpler, but the result is a list:
$adfvadfg
[1] "adfvadfg"

$sdfasd
[1] "sdfasd"

$vafdv
[1] "vafdv"

$`hjk/tiuk`
[1] "hjk"  "tiuk"

How to change this to a data frame you can find out yourself; I believe it has been covered several times on Stack Exchange. A plain as.data.frame is not the best option.
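
One possible way to do that conversion in base R, as a sketch: pad every element of the list to the same length with NA, then bind the pieces row-wise. The names `parts`, `padded`, `items` and the column names `item1`, `item2` are made up for this illustration.

```r
# The already-trimmed strings from the example above
cleaned <- c("adfvadfg", "sdfasd", "vafdv", "hjk/tiuk")

# strsplit() returns a list whose elements have different lengths
parts <- strsplit(cleaned, "/", fixed = TRUE)

# Pad each element with NA up to the longest length found
n <- max(lengths(parts))
padded <- lapply(parts, function(p) c(p, rep(NA_character_, n - length(p))))

# Bind the equal-length vectors into a matrix, then convert to a data frame
items <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(items) <- paste0("item", seq_len(n))
items
```

Calling as.data.frame directly on the ragged list fails because the elements differ in length, which is why the padding step comes first.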

Cheers
Petr
#
I'm very confused by the phrase "string alienation".
You mention two problems:
(1) remove " from a string
      sub('"', '', vector.of.strings)
      will do that.  See
      ?grep
      for details.
(2) split a string at occurrences of /
      strsplit(vector.of.strings, "/")
      will do that.  It gives you a list of vectors of strings.  See
      ?strsplit
      for details.
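
A short sketch of both calls on a made-up vector (the third string is invented purely to show the difference between sub and gsub): sub replaces only the first match in each string, which is enough when there is a single trailing quote, while gsub replaces every match.

```r
# Made-up example strings; the last one contains two quotes
vector.of.strings <- c("whole milk\"", "rolls/buns\"", "a\"b\"")

# sub() removes only the first quote found in each string
first.only <- sub('"', '', vector.of.strings)

# gsub() removes every quote
all.quotes <- gsub('"', '', vector.of.strings)

# strsplit() gives a list with one character vector per input string
pieces <- strsplit(all.quotes, "/")
```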