
test cases (data) for data based modeling

1 message · Immanuel

Hello all,

I'm working mostly with machine learning code in R and looking for a structured
way to check if my code is working properly.

For example, if I train a classifier on some data, how do I know whether
good or bad results are due to the data and not just a programming error
that I introduced somewhere?

results are too good: I might have used some part of the test data for training
results are too bad: could be for any reason
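
For the first case, a minimal base R sketch of a leakage check, assuming the
train/test split is kept as row indices into one data frame. The helpers
split_is_disjoint() and shares_rows() are hypothetical names made up for this
illustration, not part of any package.

```r
# Sketch only: split_is_disjoint() and shares_rows() are hypothetical helpers.

# TRUE when no row index appears in both the training and the test set.
split_is_disjoint <- function(train_idx, test_idx) {
  length(intersect(train_idx, test_idx)) == 0
}

# TRUE when at least one identical row occurs in both data frames.
shares_rows <- function(train, test) {
  any(duplicated(rbind(unique(train), unique(test))))
}

set.seed(42)
d <- data.frame(id = 1:100, x = rnorm(100))   # made-up example data
test_idx  <- sample(nrow(d), size = 20)
train_idx <- setdiff(seq_len(nrow(d)), test_idx)

# A clean split passes the index check.
clean_ok <- split_is_disjoint(train_idx, test_idx)

# A deliberately leaky split (one test row added to training) is caught
# both by the index check and by the row comparison.
leaky_idx   <- c(train_idx, test_idx[1])
leak_found  <- !split_is_disjoint(leaky_idx, test_idx)
rows_shared <- shares_rows(d[leaky_idx, ], d[test_idx, ])
```

Calling stopifnot(split_is_disjoint(train_idx, test_idx)) right before
training would turn this kind of mistake into an immediate error.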

I know that I can in principle generate data containing no information
at all, or pure information, to benchmark my code, but is there a more
elaborate or easier way to do that?
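
A minimal base R sketch of that benchmark idea, under stated assumptions:
make_data(), fit_centroid(), predict_centroid() and holdout_accuracy() are
hypothetical helpers, and the nearest-class-mean classifier is only a
stand-in for whatever model is really being tested. On pure-noise data the
hold-out accuracy should sit near 0.5; on cleanly separated data it should
be close to 1.

```r
# Sketch only: all function names below are hypothetical.
set.seed(1)

# Two-class data: either x carries the label (classes far apart)
# or x is pure noise that is independent of the label.
make_data <- function(n, informative) {
  y <- rbinom(n, 1, 0.5)
  x <- if (informative) rnorm(n, mean = 10 * y) else rnorm(n)
  data.frame(x = x, y = y)
}

# Stand-in classifier: remember the mean of x for each class.
fit_centroid <- function(train) {
  c(m0 = mean(train$x[train$y == 0]), m1 = mean(train$x[train$y == 1]))
}

# Predict class 1 when x is closer to the class-1 mean than to the class-0 mean.
predict_centroid <- function(model, x) {
  as.integer(abs(x - model[["m1"]]) < abs(x - model[["m0"]]))
}

# Train on the first half of the rows, score on the second half.
holdout_accuracy <- function(d) {
  half  <- nrow(d) %/% 2
  train <- d[seq_len(half), ]
  test  <- d[(half + 1):nrow(d), ]
  model <- fit_centroid(train)
  mean(predict_centroid(model, test$x) == test$y)
}

acc_noise  <- holdout_accuracy(make_data(2000, informative = FALSE))
acc_signal <- holdout_accuracy(make_data(2000, informative = TRUE))
```

If acc_noise comes out far above 0.5, test information is probably leaking
into training; if acc_signal is far below 1, the pipeline itself is suspect.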

I guess what I'm basically looking for is some kind of unit testing
framework to generate test data for machine learning tasks. I read about
the package RUnit but don't really know how to proceed from there.
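
One possible starting point, as a sketch rather than a recommended setup:
small test functions that run a model on hand-built data where the right
answer is known. The code uses RUnit's checkTrue() when the package is
installed and a base R stand-in otherwise. fit_threshold() and
predict_threshold() are hypothetical functions under test, and the test.*
naming follows what I understand to be RUnit's default convention for test
functions.

```r
# Sketch only: fit_threshold() and predict_threshold() are hypothetical.

# Use RUnit::checkTrue if RUnit is available, else a base R stand-in.
check_true <- if (requireNamespace("RUnit", quietly = TRUE)) {
  RUnit::checkTrue
} else {
  function(expr, msg = "") {
    stopifnot(isTRUE(expr))
    invisible(TRUE)
  }
}

# Function under test: decision threshold halfway between the class means.
fit_threshold <- function(x, y) {
  (mean(x[y == 0]) + mean(x[y == 1])) / 2
}

predict_threshold <- function(threshold, x) {
  as.integer(x > threshold)
}

# Known-answer test: class means are 1 and 11, so the threshold must be 6.
test.threshold_known_answer <- function() {
  x <- c(0, 1, 2, 10, 11, 12)
  y <- c(0, 0, 0, 1, 1, 1)
  th <- fit_threshold(x, y)
  check_true(isTRUE(all.equal(th, 6)))
  check_true(identical(predict_threshold(th, c(-5, 5, 7, 50)),
                       c(0L, 0L, 1L, 1L)))
  invisible(TRUE)
}

# Invariant test: one prediction per input, and only valid labels.
test.predictions_are_valid_labels <- function() {
  set.seed(7)
  x <- rnorm(50)
  pred <- predict_threshold(0, x)
  check_true(length(pred) == length(x))
  check_true(all(pred %in% c(0L, 1L)))
  invisible(TRUE)
}

results <- c(test.threshold_known_answer(),
             test.predictions_are_valid_labels())
```

Tests like these do not say whether a model is good, only whether the code
does what it claims on inputs small enough to verify by hand.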

Any ideas?
How do you test your data analysis code?

best regards,
Immanuel