
What is this averaging function called? Does R have a built-in function for it?

3 messages · madr, Tal Galili

#
I know little about statistics and wrote this function from intuition.
Since the algorithm is so basic, I wonder what its proper name is and
whether it is built into R.

Here is some PHP code to illustrate what the function does. It uses a few
helper functions I wrote myself, but their meaning should be obvious:

# read the CSV file and interchange rows with columns to get two arrays
$csv = aic(getcsv(file_get_contents("out.csv")));
# sort those two arrays (held in one bigger array) numerically
array_multisort($csv[0],SORT_NUMERIC,$csv[1],SORT_NUMERIC);

# build a second array keyed by the unique x values (column 0); every y value
# (column 1) is appended under its x so that a mean (arithmetic, geometric or
# harmonic) can be computed later
foreach ($csv[0] as $k=>$x) {
	$sum[$x][] = $csv[1][$k];
}

# keep the x values in a separate array for later use
$x = array_keys($sum);
$rang = $sum = array_values($sum);

# The key feature: to smooth the line, the function gathers up to (in this
# case) 500 values above and below the given point, if they exist. The search
# in each direction stops when it runs off the end of the array, or when adding
# the next group of values would bring the count to 500 or more. Without this
# precaution a large spike in the data would also affect the points near it.
foreach ($rang as $k=>&$v) {
	if (!($k % 100)) echo $k.' ';
	$up = $down = array();
	$walk = 0;
	while (true) {
		++$walk;
		if (isset($sum[$k-$walk]) and count($v)+count($up)+count($sum[$k-$walk])<500)
			$up = array_merge($up,$sum[$k-$walk]);
		else break;
	}
	$walk = 0;
	while (true) {
		++$walk;
		if (isset($sum[$k+$walk]) and count($v)+count($down)+count($sum[$k+$walk])<500)
			$down = array_merge($down,$sum[$k+$walk]);
		else break;
	}

	$rang[$k] = array_merge($up,$rang[$k],$down);
	# after gathering the data for the given point, take a mean (here arithmetic)
	$rang[$k] = array_sum($rang[$k])/count($rang[$k]);
}
# now the x values are added back and the flipped array is ready to be
# written to a file
$csv = aic(array($x,$rang));

# In PHP this is awfully slow, but I like it because it is sensitive to the
# density of the data and does not go off in strange directions when the
# data density becomes very low.
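
The helpers getcsv() and aic() are not shown in the post. Going only by the comments above (parse the file, then interchange rows with columns), a minimal sketch of stand-ins could look like the following. The names parse_csv_sketch() and transpose_sketch() are hypothetical, and the parser assumes simple, unquoted, comma-separated values; the real helpers may well differ.

```php
<?php
// Hypothetical stand-in for getcsv(): split simple, unquoted,
// comma-separated text into an array of rows. Values are kept as
// strings so they can later be used as array keys without being
// truncated to integers.
function parse_csv_sketch(string $text): array {
    $rows = [];
    foreach (preg_split('/\R/', trim($text)) as $line) {
        if ($line === '') continue;   // skip blank lines
        $rows[] = array_map('trim', explode(',', $line));
    }
    return $rows;
}

// Hypothetical stand-in for aic(): interchange rows and columns.
function transpose_sketch(array $rows): array {
    $cols = [];
    foreach ($rows as $r => $row) {
        foreach ($row as $c => $value) {
            $cols[$c][$r] = $value;
        }
    }
    return $cols;
}

// Three (x, y) rows become one x array and one y array.
$csv = transpose_sketch(parse_csv_sketch("3,10\n1,20\n3,30\n"));
// $csv[0] is the x column, $csv[1] is the y column.
```

With stand-ins like these, $csv[0] and $csv[1] have the shape that the array_multisort() call above expects.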
1 day later
#
my input is from csv file:

fname= 'test'
csvdata = read.table(file=paste(fname,'.csv',sep=''),header=FALSE)
x = csvdata$V1
y = csvdata$V2

I know this group is not about PHP, but I managed to make the function
above a lot faster, and I still cannot use R well enough to recreate it
in that language. The R function that most resembles it is smooth.spline,
but that function seems to go off in strange directions when the data
density becomes low.

But I think I would get the desired behaviour from smooth.spline if, for
the x axis, I passed a simple 1, 2, 3, ... sequence.

Here is the revised PHP code:

function smooth($in,$smooth=500) {
	if (count(current($in))!=2) exit('wrong array');
	timer(); # timer() and the constant n used below are my own helpers (not shown)

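	# for every unique x value keep the sum of its y values ($y) and the
	# number of points ($z), instead of the full list of values
	foreach($in as &$v) {
		$v[0] = (string)$v[0];
		if (!isset($y[$v[0]])) {
			$y[$v[0]] = 0;
			$z[$v[0]] = 0;
		}
		$y[$v[0]] += $v[1];
		++$z[$v[0]];
	}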
	unset($in);
	ksort($y, SORT_NUMERIC);
	ksort($z, SORT_NUMERIC);
	$x = array_keys($z);
	$y = array_values($y);
	$z = array_values($z);
	$count = count($z);
	echo n.$count.' : ';
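	for ($k=0;$k<$count;++$k) {
		if (!($k % 1000)) echo $k.' ';
		# $u/$d: sums of y gathered below/above position $k
		# $usum/$dsum: number of points gathered below/above
		$u = $d = 0;
		$usum = $dsum = 0;
		$walk = 0;
		# walk towards smaller x while the point budget allows it
		while (true) {
			++$walk;
			if (isset($z[$k-$walk]) and $z[$k]+$z[$k-$walk]+$usum<$smooth) {
				$usum += $z[$k-$walk];
				$u += $y[$k-$walk];
			}
			else break;
		}
		$walk = 0;
		# walk towards larger x in the same way
		while (true) {
			++$walk;
			if (isset($z[$k+$walk]) and $z[$k]+$z[$k+$walk]+$dsum<$smooth) {
				$dsum += $z[$k+$walk];
				$d += $y[$k+$walk];
			}
			else break;
		}
		# arithmetic mean over the point itself and everything gathered
		$out[$k] = ($y[$k]+$u+$d)/($z[$k]+$usum+$dsum);
	}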
	echo ' : '.timer().n;
	return array($x,$out);
}
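
To check the arithmetic by hand, here is a small self-contained sketch of the same idea (per-x sums and counts, then a walk in each direction under a point budget). It is not the function above: smooth_sketch() is a hypothetical name, the progress output and timer() are left out, and the budget of 4 is chosen only so that the example stays tiny.

```php
<?php
// Sketch: average each unique x together with as many neighbouring
// x groups as fit under a budget of $limit points per direction.
function smooth_sketch(array $pairs, int $limit): array {
    $sumY = [];
    $cnt = [];
    foreach ($pairs as [$x, $y]) {
        $key = (string)$x;                       // x value used as array key
        $sumY[$key] = ($sumY[$key] ?? 0) + $y;   // sum of y for this x
        $cnt[$key] = ($cnt[$key] ?? 0) + 1;      // number of points at this x
    }
    ksort($sumY, SORT_NUMERIC);
    ksort($cnt, SORT_NUMERIC);
    $xs = array_keys($cnt);
    $sumY = array_values($sumY);
    $cnt = array_values($cnt);

    $out = [];
    $groups = count($cnt);
    for ($k = 0; $k < $groups; ++$k) {
        $total = $sumY[$k];
        $used = $cnt[$k];
        foreach ([-1, 1] as $dir) {              // below, then above
            $side = 0;                           // points gathered on this side
            for ($j = $k + $dir;
                 isset($cnt[$j]) && $cnt[$k] + $cnt[$j] + $side < $limit;
                 $j += $dir) {
                $side += $cnt[$j];
                $total += $sumY[$j];
            }
            $used += $side;
        }
        $out[$k] = $total / $used;               // arithmetic mean
    }
    return [$xs, $out];
}

// Two points share x = 2; x = 10 is far away from the rest.
[$xs, $out] = smooth_sketch([[1, 30], [2, 20], [2, 40], [3, 60], [10, 100]], 4);
// $xs  is [1, 2, 3, 10]
// $out is [30, 37.5, 55, 80]
```

In this example the isolated point at x = 10 is averaged with its nearest neighbour in sort order (x = 3), no matter how far away that neighbour is. The window is defined by a number of points, not by a distance along x, which is why the result follows the data density.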