Dataframe Manipulation
Hello Ulrik,
Can you please explain what this code does and how it works? I'm not able
to understand it. If you can explain it, I can reuse it in future with a
little bit of modification.
Thanks
data_help <-
  data_help %>%
  mutate(Purchase_ID = 1:n()) %>%
  group_by(Purchase_ID) %>%
  do(split_items(.))

cat_help %>% gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")
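One way to see what the pipeline above does is to walk through the same steps on a tiny data set. The following is only a base-R sketch of the same logic, not the dplyr/tidyr code from the thread: the toy data are made up, the names long_items, long_cats, joined and wide are hypothetical, and the column name Items_purchased_on_Receipts is taken from the split_items() helper in the quoted reply below.

```r
# Made-up data; the real tables come from data_help.csv and cat_help.csv.
data_help <- data.frame(
  Items_purchased_on_Receipts = c("apple,banana,milk", "fries,soap,butter"),
  stringsAsFactors = FALSE
)
cat_help <- data.frame(
  Fruits = c("apple", "banana", "mango"),
  Dairy  = c("milk", "butter", NA),
  Snacks = c("fries", NA, NA),
  stringsAsFactors = FALSE
)

# Step 1: number the receipts and split each receipt into one row per item.
# This is the job of mutate(Purchase_ID = 1:n()) plus do(split_items(.)).
parts <- strsplit(data_help$Items_purchased_on_Receipts, ",", fixed = TRUE)
long_items <- data.frame(
  Purchase_ID = rep(seq_along(parts), lengths(parts)),
  Item        = unlist(parts),
  stringsAsFactors = FALSE
)

# Step 2: turn the wide category table into (Item, Foo) pairs and drop the
# empty cells. This is the job of gather("Foo", "Item") plus filter().
long_cats <- stack(cat_help)
names(long_cats) <- c("Item", "Foo")
long_cats$Foo <- as.character(long_cats$Foo)
long_cats <- long_cats[!is.na(long_cats$Item), ]

# Step 3: attach the purchases to the categories, keeping every category
# item (left_join(by = "Item")). Items without a category, such as "soap",
# are dropped; category items never bought get Purchase_ID = NA.
joined <- merge(long_cats, long_items, by = "Item", all.x = TRUE)

# Steps 4 and 5: paste the items of each receipt and category together and
# lay the result out with one row per receipt and one column per category
# (summarise(paste(...)) plus spread()). Unlike the dplyr version, tapply()
# silently skips the rows where Purchase_ID is NA.
wide <- tapply(joined$Item, list(joined$Purchase_ID, joined$Foo),
               paste, collapse = ", ")
print(wide)
```

The result is a character matrix rather than a data frame; cells for categories with no purchase on a receipt are NA, which is the layout asked for in the original question.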
On 31 August 2017 at 13:17, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
Hi Hemant,
the solution is really quite similar, and the logic is identical:
library(readr)
library(dplyr)
library(stringr)
library(tidyr)

data_help <- read_csv("data_help.csv")
cat_help <- read_csv("cat_help.csv")

# Helper function to split the Items and create a data_frame
split_items <- function(items){
  x <- items$Items_purchased_on_Receipts %>%
    str_split(pattern = ",") %>%
    unlist(use.names = FALSE)
  data_frame(Item = x, Purchase_ID = items$Purchase_ID)
}

data_help <-
  data_help %>%
  mutate(Purchase_ID = 1:n()) %>%
  group_by(Purchase_ID) %>%
  do(split_items(.))

cat_help %>% gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")
HTH
Ulrik
On Wed, 30 Aug 2017 at 13:22 Hemant Sain <hemantsain55 at gmail.com> wrote:
By using these two tables we have to create a third table in this format,
where the categories are on top and the transactions are in the rows.
On 30 August 2017 at 16:42, Hemant Sain <hemantsain55 at gmail.com> wrote:
Hello Ulrik,
Can you please check this code once again on the following data set? It
doesn't give me the same output as the previous demo data set, because the
splitting is done on the basis of the quantity, and in the real data set the
quantity is missing. So please use the following data set and help me out.
Please consider this mail my final email; I won't bother you again, but it's
about my job, so please help me.
Note: the file I'm attaching is very confidential.
On 30 August 2017 at 15:02, Ulrik Stervbo <ulrik.stervbo at gmail.com> wrote:
Hi Hemant,
Does this help you along?
table_1 <- textConnection("Item_1;Item_2;Item_3
1KG banana;300ML milk;1kg sugar
2Large Corona_Beer;2pack Fries;
2 Lux_Soap;1kg sugar;")
table_1 <- read.csv(table_1, sep = ";", na.strings = "",
                    stringsAsFactors = FALSE, check.names = FALSE)

table_2 <- textConnection("Toiletries;Fruits;Beverages;Snacks;Vegetables;Clothings;Dairy Products
Soap;banana;Corona_Beer;King Burger;Pumpkin;Adidas Sport Tshirt XL;milk
Shampoo;Mango;Red Label Whisky;Fries;Potato;Nike Shorts Black L;Butter
Showergel;Oranges;grey Cocktail;cheese pizza;Tomato;Puma Jersy red M;sugar
Lux_Soap;;2 Large corona Beer;;Cheese;Toothpaste")
table_2 <- read.csv(table_2, sep = ";", na.strings = "",
                    stringsAsFactors = FALSE, check.names = FALSE)

library(tidyr)
library(dplyr)

table_2 <- gather(table_2, "Category", "Item")
table_1 <- gather(table_1, "Foo", "Item") %>%
  filter(!is.na(Item))
table_1 <- separate(table_1, col = "Item", into = c("Quantity", "Item"),
                    sep = " ")

table_3 <- left_join(table_1, table_2, by = "Item") %>%
  mutate(Item = paste(Quantity, Item)) %>%
  select(-Quantity)

table_3 %>%
  group_by(Foo, Category) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Category", value = "Item")
You need to figure out how to handle words written with different cases
and how to get the quantity in a universal way. For the code above, I
corrected these things by hand in the example data.
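As a rough illustration of those two points, the sketch below splits a leading quantity from the item name with a regular expression and builds a lower-case key for matching. It is only one possible approach under stated assumptions: the vector raw is made up from the example data above, the names pattern, quantity, item and key are hypothetical, and the pattern assumes the quantity always starts with digits and is separated from the name by white space.

```r
# Made-up items, written the way they appear in the example tables.
raw <- c("1KG banana", "300ML milk", "2Large Corona_Beer", "2 Lux_Soap",
         "Fries")

# A leading quantity is taken to be digits plus an optional unit glued to
# them, followed by white space and then the item name.
pattern <- "^([0-9]+[A-Za-z]*)[[:space:]]+(.*)$"
has_qty <- grepl(pattern, raw)

# Entries without a leading quantity get NA as quantity and keep their text.
quantity <- ifelse(has_qty, sub(pattern, "\\1", raw), NA_character_)
item     <- ifelse(has_qty, sub(pattern, "\\2", raw), raw)

# Match on a normalised key (lower case, trimmed) so that "Fries" and
# "fries" are treated as the same item; keep `item` for display.
key <- tolower(trimws(item))

print(data.frame(raw, quantity, item, key, stringsAsFactors = FALSE))
```

A unit separated from the number by a space, as in "2 Large corona Beer", is not recognised by this pattern: "Large" ends up in the item name, so such entries would still need manual cleaning or a list of known units.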
HTH
Ulrik
On Wed, 30 Aug 2017 at 10:16 Hemant Sain <hemantsain55 at gmail.com> wrote:
Hey PIKAL,
It's not homework, and that is not the real data set either. I have signed
an NDA with my company, so I can't share the original data file. I'm working
on a market basket analysis task, but I'm not able to convert my existing
data table to the appropriate format so that I can apply the Apriori
algorithm using R. It is very important for me to get this done, because I'm
an intern, and if I don't get it done they will not hire me as a full-time
employee. I tried everything by myself but was not able to get it done. Your
precious 10-15 can save my upcoming years, so please, if you can, help me
through this. I want another data set based on the first two data sets I
have mentioned.
Thanks
On 30 August 2017 at 12:49, PIKAL Petr <petr.pikal at precheza.cz> wrote:
Hi
It seems to me like homework, and there is a no-homework policy on this help
list. What do you want to do with your table 3? It seems futile to me.
Anyway, some combination of melt, merge, cast and regular expressions could
be employed for such a task, but it could be rather tricky. Be aware that
"Suger" does not match "sugar" (I am surprised that sugar is a dairy
product), and that you mix uppercase and lowercase letters, which could also
be problematic when matching words.
Cheers
Petr
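The matching problems mentioned above can be checked before any reshaping is attempted. The following base-R sketch uses made-up vectors (purchased, categories) and hypothetical result names; it lists the items that fail to match exactly, those that still fail after lower-casing, and uses adist() (generalised edit distance, from the utils package) only to suggest a candidate for manual review.

```r
# Made-up item names from the transactions and from the category table.
purchased  <- c("banana", "Suger", "Milk", "Fries")
categories <- c("banana", "sugar", "milk", "Fries")

# Exact matching is case-sensitive, so both the typo and the
# capitalised name fail to match.
exact_miss <- purchased[!purchased %in% categories]

# Lower-casing both sides fixes the case problem but not the typo.
case_miss <- purchased[!tolower(purchased) %in% tolower(categories)]

# For each remaining miss, look up the category entry with the smallest
# edit distance. This is a suggestion to check by hand, not a safe
# automatic correction.
d <- adist(case_miss, categories, ignore.case = TRUE)
suggestion <- categories[apply(d, 1, which.min)]

print(exact_miss)
print(case_miss)
print(suggestion)
```

Whether a suggested replacement is accepted should remain a manual decision, since two different products can easily be only one or two letters apart.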
-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Hemant Sain
Sent: Wednesday, August 30, 2017 8:28 AM
To: r-help at r-project.org
Subject: [R] Dataframe Manipulation

I want to do a market basket analysis and I'm trying to create a data set
for that. I have two tables. One table contains the daily transactions of
products, in which each row shows the items purchased by a customer. The
second table contains the parent groups under which those products fall; for
example, under the fruit category there are several fruits like mango,
banana, apple etc.
I want to create a third table in which the parent groups, which can be
extracted from Table 2, are the headers, and the rows represent the
transactions of products with their names. If there is no transaction for a
parent category, the cell should be filled with NA.
Please help me with R or C/C++ code (R would be preferred). I'm attaching
all three tables for better reference: I have the first two tables and I
want to get a table like Table 3. The tables are explained in the attached
doc.
--
hemantsain.com
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.