Looking for package for data generation for classification and regression
On Fri Mar04'22 10:41:24AM, Paul Smith wrote:
From: Paul Smith <phhs80 at gmail.com>
Date: Fri, 4 Mar 2022 10:41:24 +0000
To: Ranjan Maitra <mlmaitra at gmx.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Subject: Re: [R] Looking for package for data generation for classification and regression

On Fri, Mar 4, 2022 at 8:07 AM Ranjan Maitra <mlmaitra at gmx.com> wrote:
I am in need of generating artificial data for machine learning classification and regression analysis. What I am looking for is something similar to Python sklearn.datasets.make_classification and sklearn.datasets.make_regression:

https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html

I have searched CRAN for something similar, but found nothing. Could someone please help me with this?
Not sure if this helps, but at least for classification and clustering, there is the MixSim package on CRAN which provides classification datasets according to an overall overlap measure.
Thanks, Ranjan, that is also quite helpful, since clustering is also a topic of the course!

Paul
The Clustering Algorithms Referee Package (CARP) uses the same codebase but is more general.

https://jmlr.org/papers/v12/melnykov11a.html

Unfortunately, it is written in C, so it may not help. It is on www.mloss.org at:

https://mloss.org/software/view/248/

but perhaps it should also be moved to GitHub.

Best wishes,
Ranjan