Dear All, I am in need of generating artificial data for machine learning classification and regression analysis. What I am looking for is something similar to Python sklearn.datasets.make_classification and sklearn.datasets.make_regression: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html I have searched CRAN for something similar, but found nothing. Could someone please help me with this? Thanks in advance, Paul
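For readers who have not used the scikit-learn helper mentioned above, here is a minimal sketch of the general idea behind make_regression-style data: Gaussian features, a linear signal carried by a few informative features, and additive Gaussian noise. This is not scikit-learn's implementation. The function name `make_regression_sketch` and all of its parameters are hypothetical and chosen only for illustration.

```python
import random


def make_regression_sketch(n_samples=100, n_features=5, n_informative=2,
                           noise=1.0, seed=0):
    """Hypothetical helper: linear signal on a few features plus Gaussian noise."""
    rng = random.Random(seed)
    # Only the first n_informative coefficients are non-zero (an assumption of
    # this sketch); the remaining features are pure noise columns.
    coef = [rng.uniform(1.0, 10.0) if j < n_informative else 0.0
            for j in range(n_features)]
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        signal = sum(c * x for c, x in zip(coef, row))
        X.append(row)
        y.append(signal + rng.gauss(0.0, noise))
    return X, y, coef


# Example: 200 rows, 4 features, of which 2 drive the response.
X, y, coef = make_regression_sketch(n_samples=200, n_features=4,
                                    n_informative=2, noise=0.5, seed=42)
```

Returning the true coefficients alongside the data can be handy for assignments, since students can compare their fitted model against the known generating process.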
Looking for package for data generation for classification and regression
9 messages · Tom Woolman, Sarah Goslee, Ranjan Maitra +1 more
Hi Paul. Have you considered just going onto Kaggle and GitHub and searching for some of the many freely available real datasets that are posted there? I'm seeing a lot of productivity these days with research focused on data generation, and not just on creating algorithms and predictive models. Which is a good thing for us ;) One of the research papers I'm working on now is based on mining a dataset I discovered on Kaggle a few months back and trying to create a novel solution for that. Proper credit will of course be provided in the citation references for the data provider. Thanks, Tom
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Sounds interesting, Tom! Thanks! I am trying to find datasets for creating assignments for students in a machine learning course. Paul
On Thu, Mar 3, 2022 at 9:04 PM Tom Woolman <twoolman at ontargettek.com> wrote:
Hi Paul. Have you considered just going onto Kaggle and GitHub and searching for some of the many freely available real datasets that are posted there? [...]
Hi Paul, If you aren't committed to creating your own, the cluster.datasets package might be of interest. I've also used http://cs.joensuu.fi/sipu/datasets/ quite often. Sarah
On Thu, Mar 3, 2022 at 4:20 PM Paul Smith <phhs80 at gmail.com> wrote:
Sounds interesting, Tom! Thanks! I am trying to find datasets for creating assignments for students in a machine learning course. Paul
Sarah Goslee (she/her) http://www.sarahgoslee.com
Thanks, Sarah! Your answer is quite helpful! Paul
On Thu, Mar 3, 2022 at 10:43 PM Sarah Goslee <sarah.goslee at gmail.com> wrote:
Hi Paul, If you aren't committed to creating your own, the cluster.datasets package might be of interest. I've also used http://cs.joensuu.fi/sipu/datasets/ quite often. Sarah
On Thu Mar03'22 09:00:08PM, Paul Smith wrote:
From: Paul Smith <phhs80 at gmail.com>
Date: Thu, 3 Mar 2022 21:00:08 +0000
To: "r-help at r-project.org" <r-help at r-project.org>
Subject: [R] Looking for package for data generation for classification and regression
I am in need of generating artificial data for machine learning classification and regression analysis. What I am looking for is something similar to Python sklearn.datasets.make_classification and sklearn.datasets.make_regression. [...]
Not sure if this helps, but at least for classification and clustering, there is the MixSim package on CRAN which provides classification datasets according to an overall overlap measure. Hope this helps! Best wishes, Ranjan
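To illustrate what "classification datasets with a controllable degree of overlap" can look like, here is a minimal sketch that draws each class from a unit-variance Gaussian and spaces the class centres by a chosen distance. This is not MixSim's method, which works from an overlap measure between mixture components. The function name `make_classification_sketch` and the `class_sep` parameter are hypothetical and serve only as a crude stand-in for that idea: a smaller `class_sep` gives more overlapping classes.

```python
import random


def make_classification_sketch(n_samples=100, n_features=2, n_classes=2,
                               class_sep=2.0, seed=0):
    """Hypothetical helper: one unit-variance Gaussian blob per class."""
    rng = random.Random(seed)
    # Class k is centred at (k * class_sep, ..., k * class_sep); this layout is
    # an assumption of the sketch, not something taken from MixSim or sklearn.
    centres = [[k * class_sep] * n_features for k in range(n_classes)]
    X, y = [], []
    for i in range(n_samples):
        label = i % n_classes  # cycle through labels to keep classes balanced
        X.append([rng.gauss(m, 1.0) for m in centres[label]])
        y.append(label)
    return X, y


# Well-separated classes; lowering class_sep makes the problem harder.
X, y = make_classification_sketch(n_samples=200, n_features=2,
                                  n_classes=2, class_sep=5.0, seed=1)
```

For course assignments, generating the same task at several `class_sep` values gives a simple way to show how classifier accuracy degrades as classes overlap more.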
On Fri, Mar 4, 2022 at 8:07 AM Ranjan Maitra <mlmaitra at gmx.com> wrote:
Not sure if this helps, but at least for classification and clustering, there is the MixSim package on CRAN which provides classification datasets according to an overall overlap measure.
Thanks, Ranjan, that is also quite helpful, since clustering is also a topic of the course! Paul
On Fri Mar04'22 10:41:24AM, Paul Smith wrote:
Thanks, Ranjan, that is also quite helpful, since clustering is also a topic of the course! Paul
The Clustering Algorithms Referee Package (CARP) uses the same codebase as MixSim but is more general: https://jmlr.org/papers/v12/melnykov11a.html Unfortunately, it is written in C, so it may not help. It is on mloss.org at https://mloss.org/software/view/248/ but perhaps should also be moved to GitHub. Best wishes, Ranjan
On Fri, Mar 4, 2022 at 5:03 PM Ranjan Maitra <mlmaitra at gmx.com> wrote:
The Clustering Algorithms Referee Package (CARP) uses the same codebase but is more general. https://jmlr.org/papers/v12/melnykov11a.html Unfortunately, it is written in C, so may not help. It is on www.mloss.org at: https://mloss.org/software/view/248/ but perhaps should also be moved to github.
That is quite interesting, Ranjan! I hope you will have that on GitHub as an R package ready for installation. Best wishes, Paul