Message-ID: <YiJGVgbJ+uj/fhaW@iastate.edu>
Date: 2022-03-04T17:03:18Z
From: Ranjan Maitra
Subject: Looking for package for data generation for classification and regression
In-Reply-To: <CALS=5moiamk9tzDdrJC05mqA-aiNOw67p6wniyuc8fOhGvZv9A@mail.gmail.com>
On Fri Mar04'22 10:41:24AM, Paul Smith wrote:
>
> On Fri, Mar 4, 2022 at 8:07 AM Ranjan Maitra <mlmaitra at gmx.com> wrote:
> >
> > > I am in need of generating artificial data for machine learning
> > > classification and regression analysis. What I am looking for is
> > > something similar to Python sklearn.datasets.make_classification and
> > > sklearn.datasets.make_regression:
> > >
> > > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
> > >
> > > https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html
> > >
> > > I have searched CRAN for something similar, but found nothing. Could
> > > someone please help me with this?
> >
> > Not sure if this helps, but at least for classification and clustering, there is the MixSim package on CRAN which provides classification datasets according to an overall overlap measure.
>
> Thanks, Ranjan, that is also quite helpful, since clustering is also a
> topic of the course!
>
> Paul
>
The Clustering Algorithms Referee Package (CARP) uses the same codebase as MixSim but is more general:
https://jmlr.org/papers/v12/melnykov11a.html
Unfortunately, it is written in C, so it may not help.
It is available on mloss.org at:
https://mloss.org/software/view/248/
but perhaps it should also be moved to GitHub.
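
For the MixSim route mentioned earlier in the thread, a minimal sketch of generating a labelled dataset from R might look like the following. The seed, the number of classes (K = 3), the dimension (p = 2), the average overlap (BarOmega = 0.05), and the sample size (n = 500) are arbitrary values chosen for illustration, not recommendations.

```r
# Sketch only: assumes the MixSim package is installed from CRAN.
library(MixSim)

set.seed(1234)  # arbitrary seed, for reproducibility of the illustration

# Draw the parameters of a 3-component Gaussian mixture in 2 dimensions
# with a requested average pairwise overlap of 0.05 (illustrative values).
Q <- MixSim(BarOmega = 0.05, K = 3, p = 2)

# Simulate 500 observations from that mixture.
# A$X holds the features, A$id the true class labels.
A <- simdataset(n = 500, Pi = Q$Pi, Mu = Q$Mu, S = Q$S)

# Combine into a data frame for use with a classifier.
dat <- data.frame(A$X, class = factor(A$id))
table(dat$class)
```

Lowering BarOmega should give better-separated classes, and raising it gives a harder problem. Note that this covers classification and clustering only, not regression.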
Best wishes,
Ranjan