field significance test

5 messages · Jim Lemon, ani jaya

#
Dear r-list members,

I want to plot a histogram that shows the number of stations that have
a significant statistic (positive or negative), based on the value
itself and its p-value. df3 holds the test statistic values (columns
are stations and rows are the results from the resampled matrix
(repetition/bootstrap)), and df4 holds the corresponding p-values.

#the value
dput(head(df3,10))
structure(c(0.569535339474781, 1.02925697755861, 1.08125714350978,
0.50589479161552, -0.695827095264809, 0.455608022735733, 1.2552019505074,
0.981335144120386, 1.63020923423253, -0.424613279862939, 0.429207234903993,
1.99059339634301, -1.25731480224036, 0.64293796635093, 0.0189774621961392,
0.1163965630274, -1.41756397958877, 1.58945674395921, -1.2551489541395,
-2.84122761058959, -0.72446669544026, -0.719331298629362, -0.164045813998067,
0.444120153507258, -0.0845757313567553, -0.27732982718919, -0.166982066770785,
-0.193859909749249, 0.277426534878283, -0.0430460496295642, -0.0741475736028902,
-0.017026178205196, 0.732589091697401, 0.332813962514037, -0.0860983232517636,
0.155930932436498, -0.438635444604027, 0.046881008364722, -0.704876076807635,
-0.945506782070735, 0.662399207637722, -0.860903464600488, 1.06638547921749,
-0.462184163508299, 0.442447468362937, 0.145655792120232, 0.696309974316211,
1.84692085953474, 0.00841868461519582, -1.04408256815264, -0.548599461573869,
1.22352273108675, 0.0191993545723452, 1.26090162037733, 0.192106046362172,
-1.02864978106213, -0.0712068006002629, -0.674610175422543, -0.658383381010154,
-1.52779151484935, 0.479809528798632, -0.112078644619679, -0.19482661081522,
-0.192179943664117, -0.246553759113406, -0.563554156777087, -1.0236492805268,
0.0289772842372375, -0.274878506644853, 0.95578159001869, -0.27550722692588,
-0.66586322268903, 1.24703690613745, -0.00368775734780707, -0.0766884108214613,
-1.41610325144406, 0.518897523428314, -2.12289477996499, 0.968369305561191,
0.0766656793804207, 0.470712743077857, 0.241711948576043, 0.0636131491007723,
-1.13735866614159, 0.625015831730259, -0.234696421716696, 0.358555918256736,
-0.651761882852838, -0.236796663592383, 0.0421395303375618, 0.574747610964774,
-0.730646230622174, -0.20839489662388, -1.4832025994155, -0.366841536561336,
0.621868015281511, 0.945609952617796, 0.297055307072896, 0.737974050847397,
1.49862070675738), .Dim = c(10L, 10L))

#the p-value
dput(head(df4,10))
structure(c(0.560903574193679, 0.358019718822816, 0.320136568444488,
0.721538652049639, 0.419898899237915, 0.511481779449553, 0.208829636238898,
0.535905791761543, 0.252523383923989, 0.721538652049639, 0.487651926831611,
0.0281856103410957, 0.138370395238992, 0.639104270712721, 0.98503410973661,
0.955123383216192, 0.358019718822816, 0.138370395238992, 0.252523383923989,
0.0373292396736942, 0.302215769747998, 0.302215769747998, 0.807343273858921,
0.560903574193679, 0.955123383216192, 0.836526366120417, 0.807343273858921,
0.807343273858921, 0.693640621783759, 0.895532903167044, 0.895532903167044,
0.98503410973661, 0.159470497055087, 0.560903574193679, 0.925275729900227,
0.865936215436343, 0.441845502530452, 0.98503410973661, 0.358019718822816,
0.170893484254114, 0.586452625432322, 0.268412562734209, 0.102689728987727,
0.511481779449553, 0.666151798537229, 0.925275729900227, 0.358019718822816,
0.0581501553999165, 0.98503410973661, 0.170893484254114, 0.586452625432322,
0.464434476654839, 0.98503410973661, 0.252523383923989, 0.925275729900227,
0.377977518007105, 0.98503410973661, 0.586452625432322,
0.666151798537229, 0.284975267823252, 0.560903574193679,
0.721538652049639, 0.778425914188847,
0.836526366120417, 0.778425914188847, 0.511481779449553, 0.087825095630195,
0.98503410973661, 0.693640621783759, 0.208829636238898, 0.807343273858921,
0.222740206090239, 0.222740206090239, 0.98503410973661, 0.925275729900227,
0.0373292396736942, 0.586452625432322, 0.00322938266821475, 0.222740206090239,
0.865936215436343, 0.338738311334395, 0.639104270712721, 0.895532903167044,
0.0533495868962313, 0.268412562734209, 0.721538652049639, 0.721538652049639,
0.195559652706897, 0.778425914188847, 0.880692897134707, 0.398606385377039,
0.398606385377039, 0.693640621783759, 0.102689728987727, 0.666151798537229,
0.252523383923989, 0.358019718822816, 0.778425914188847, 0.284975267823252,
0.0633043080023749), .Dim = c(10L, 10L))

#find the positive significant station
df5<-df3
df5[df4>0.05|df5<0]<-NA
df5[df5>0]<-1
pos<-as.numeric(rowSums(df5, na.rm=T))
hist(pos)

#find the negative significant station
df6<-df3
df6[df4>0.05|df5>0]<-NA
df6[df6<0]<-1
neg<-as.numeric(rowSums(df6, na.rm=T))
hist(neg)

But the code above is not correct, because the number of zero-station
rows (rows in which no significant station is detected) should be the
same in both histograms. The problem arises when a row contains
significant positive and negative values at the same time. Is there
any way to combine the positive and negative significant counts and
plot a single histogram? Or should the zero-station rows be counted
separately first?

Any lead is really appreciated. Thank you.

Ani Jaya
#
Hi Ani,
I would create these two matrices:

# matrix of logicals for positive stat values
posvalue<-df3 > 0
# matrix of logicals for significance
sigstat<-df4 < 0.05

Then you can identify the positive/negative and significant values:

which(posvalue & sigstat)
[1] 12
which(!posvalue & sigstat)
[1] 20 76 78

and as you note, column 2 has 2 significant results, one statistical
value positive and the other negative.
I'm not sure what sort of histogram you want, perhaps all ten columns
with groups of ten bars for each column (very messy and sparse). Maybe
a bit more info will enlighten me.
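
The linear indices above can be mapped back to row (repetition) and
column (station) positions. A minimal sketch, assuming the 10 x 10
sample from the original post; on the real posvalue and sigstat
matrices, which(..., arr.ind=TRUE) should give the same positions
directly. The column names "rep" and "station" are labels chosen here
for illustration:

```r
# Linear indices reported by which() above, for a 10 x 10 matrix
idx <- arrayInd(c(12, 20, 76, 78), .dim = c(10, 10))
# Rows are repetitions, columns are stations (labels are illustrative)
colnames(idx) <- c("rep", "station")
idx
```

Indices 12 and 20 both fall in column 2, and indices 76 and 78 both
fall in column 8.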

Jim
On Mon, Sep 6, 2021 at 4:37 PM ani jaya <gaaauul at gmail.com> wrote:
#
Hello Jim, thank you for your response. What I am trying to achieve is
like this:

#calculate the positive significant station for every row based on p-value
df5<-df3
#remove the insignificant one or negative statistic value
df5[df4>0.05|df5<0]<-NA
#change the positive value to be +1 so I can row sum later
df5[df5>0]<-1
#row sum to see the total significant station (column) for each row
pos<-as.data.frame(rowSums(df5, na.rm=T))
#get the frequency of each significant number (row that have only
#1,2,3,.. significant station)
poss<-as.data.frame(table(pos))
#create the series based on frequency
posss<-as.numeric(rep(poss$pos[-1],poss$Freq[-1]))-1

#calculate the negative significant station for every row based on p-value
df6<-df3
df6[df4>0.05|df5>0]<-NA
df6[df6<0]<-1
neg<-as.data.frame(rowSums(df6, na.rm=T))
negg<-as.data.frame(table(neg))
neggg<-(as.numeric(rep(negg$neg[-1],negg$Freq[-1]))-1)*-1

#to see the 0 significant station, row that have no significant station
ne<-sum(pos==0&neg==0)



After that, I want to combine posss, neggg, and ne into a single-column
data frame, but I have not succeeded yet. Then I want to plot the
histogram to see the distribution of significant stations.
Any lead is appreciated. Thank you.
Ani Jaya
#
and yes I can sleep well now. Thank you, Jim.

ne<-rep(0,ne)
total<-c(neggg,posss,ne)
hist(total)
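
For comparison, a minimal sketch of how the same kind of vector could
be assembled from per-row counts, without the table()/rep() step. The
matrices stat and pval are small made-up stand-ins for df3 and df4
(4 repetitions x 3 stations), not the real data, and the names n_pos
and n_neg are introduced here for illustration:

```r
# Made-up stand-ins for df3 (statistics) and df4 (p-values)
stat <- matrix(c( 2.1, -0.3,  2.5,
                 -2.4,  0.2, -0.1,
                  0.5,  0.4, -0.6,
                  2.2, -2.6,  0.3), nrow = 4, byrow = TRUE)
pval <- matrix(c(0.03, 0.70, 0.01,
                 0.02, 0.80, 0.90,
                 0.60, 0.65, 0.55,
                 0.04, 0.01, 0.75), nrow = 4, byrow = TRUE)

# Number of significant positive / negative stations in each row
n_pos <- rowSums(stat > 0 & pval < 0.05)
n_neg <- rowSums(stat < 0 & pval < 0.05)

# Negative counts get a minus sign, positive counts stay positive,
# and rows with no significant station at all contribute a zero
total <- c(-n_neg[n_neg > 0],
           n_pos[n_pos > 0],
           rep(0, sum(n_pos == 0 & n_neg == 0)))
table(total)
```

A row with both positive and negative significant stations contributes
one entry to each side, so total can be longer than the number of rows.
For integer counts like these, something like
hist(total, breaks=seq(min(total)-0.5, max(total)+0.5, by=1)) keeps each
count in its own bar.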

Best,
Ani Jaya
On Tue, Sep 7, 2021 at 9:38 AM ani jaya <gaaauul at gmail.com> wrote:
#
Hi Ani,
I think you are going to a lot of trouble to get a fairly simple result.

# matrix of logicals for positive and significant stat values
possig<-df3 > 0 & df4 < 0.05
# now negative stat values
negsig<-df3 < 0 & df4 < 0.05
# very clunky plots of column counts
barplot(colSums(possig),
 names.arg=paste0("S",1:10),
 main="Positive significant")
barplot(colSums(negsig),
 names.arg=paste0("S",1:10),
 main="Negative significant")
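
The two bar plots could also be drawn as one grouped plot. A sketch
using small made-up matrices stat and pval in place of df3 and df4
(assumed names and values, not from the thread):

```r
# Made-up stand-ins for df3 (statistics) and df4 (p-values):
# 4 repetitions (rows) x 3 stations (columns)
stat <- matrix(c( 2.1, -0.3,  2.5,
                 -2.4,  0.2, -0.1,
                  0.5,  0.4, -0.6,
                  2.2, -2.6,  0.3), nrow = 4, byrow = TRUE)
pval <- matrix(c(0.03, 0.70, 0.01,
                 0.02, 0.80, 0.90,
                 0.60, 0.65, 0.55,
                 0.04, 0.01, 0.75), nrow = 4, byrow = TRUE)

possig <- stat > 0 & pval < 0.05
negsig <- stat < 0 & pval < 0.05

# One row of counts per sign, one column per station
counts <- rbind(Positive = colSums(possig),
                Negative = colSums(negsig))

# beside=TRUE draws the two counts for each station side by side
barplot(counts, beside = TRUE,
        names.arg = paste0("S", seq_len(ncol(counts))),
        legend.text = TRUE,
        main = "Significant statistics per station")
```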

You said something about displaying the values of the statistics. As
the positive and negative values are mutually exclusive, you may want
to do something like this:

allsig<-possig | negsig
allsig[!allsig]<-NA
plot(1:10,1:10,type="n",xlab="Station",ylab="Rep",
 main="Significant statistical values")
text(rep(1:10,each=10),rep(10:1,10),round(df3*allsig,2))

giving you a matrix-like plot of the stat values. You could also add
the p-values.
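
One way the p-values might be added to such a plot is to build the
labels from both matrices and draw only the significant cells. This is
a sketch under assumptions: stat and pval are small made-up stand-ins
for df3 and df4, and the label format is arbitrary.

```r
# Made-up stand-ins for df3 (statistics) and df4 (p-values):
# 4 repetitions (rows) x 3 stations (columns)
stat <- matrix(c( 2.1, -0.3,  2.5,
                 -2.4,  0.2, -0.1,
                  0.5,  0.4, -0.6,
                  2.2, -2.6,  0.3), nrow = 4, byrow = TRUE)
pval <- matrix(c(0.03, 0.70, 0.01,
                 0.02, 0.80, 0.90,
                 0.60, 0.65, 0.55,
                 0.04, 0.01, 0.75), nrow = 4, byrow = TRUE)

sig <- pval < 0.05
# Row/column positions of the significant cells (column-major order)
cells <- which(sig, arr.ind = TRUE)
# Statistic followed by its p-value, in the same order as cells
labels <- sprintf("%.2f (p=%.3f)", stat[sig], pval[sig])

# Empty plot; the reversed ylim puts repetition 1 at the top
plot(0, 0, type = "n",
     xlim = c(0.5, ncol(stat) + 0.5),
     ylim = c(nrow(stat) + 0.5, 0.5),
     xlab = "Station", ylab = "Rep",
     main = "Significant statistical values")
text(cells[, "col"], cells[, "row"], labels)
```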

Jim
On Tue, Sep 7, 2021 at 10:51 AM ani jaya <gaaauul at gmail.com> wrote: