Why is an extreme result opposite to what's predicted never statistically significant in a one-tailed test?


If you predict a result in one direction and it comes out extreme in the other direction, doesn't that by itself suggest that the opposite of your hypothesis is true, to a statistically significant degree, since the result is extreme and therefore, by definition, beyond what random chance would produce?

For instance, if one predicts that taking a certain pill will make people healthier (a one-directional prediction) and it turns out that they get much worse, why isn't the extreme result in the other direction sufficient for statistical significance in the experiment?

 
I'm going to attempt to answer this, but I'm still an undergraduate student so I would wait on an answer from a professor or someone with a lot more experience in stats.

As I understand it, the difference between a one-tailed and two-tailed test is how you distribute your alpha value on the two ends of the normal curve (in your example, much healthier or much worse). So in a two-tailed test with an alpha of .05, you are actually splitting that .05 between the ends. Therefore, each end has .025 area of significance.

In contrast, in a one-tailed test you choose the end to which that entire alpha goes. By doing this, you increase your ability to find significance in the predicted direction, though some argue this also increases the chance of making a Type I error. And, as you described, there is the chance that your predicted direction is simply wrong. To have the same power to detect an effect in the opposite direction that a one-tailed test gives you in the predicted direction, you would have to run a two-tailed test with an alpha of .10 (putting .05 in each tail), which is obviously not acceptable. Or you could just say, "I predicted it would go in that direction all along," which I believe would be unethical, but I'm not sure. I hope what I said is correct and answers your question.
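A quick numerical illustration of that alpha split (a minimal sketch in Python with scipy, which is an assumption here, since nothing in the thread depends on any particular software):

```python
from scipy import stats

alpha = 0.05

# Two-tailed: the .05 is split, so .025 sits in each tail of the standard normal
two_tailed_cut = stats.norm.ppf(1 - alpha / 2)   # ~1.96

# One-tailed: the entire .05 sits in the single predicted tail
one_tailed_cut = stats.norm.ppf(1 - alpha)       # ~1.64

print(f"two-tailed cutoff: +/-{two_tailed_cut:.2f}")
print(f"one-tailed cutoff:    {one_tailed_cut:.2f} (no critical region in the other tail)")
```

The lower one-tailed cutoff is the "increased ability to find significance"; the price is that the opposite tail has no critical region at all, which is exactly the situation the original question describes.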




 
Is it because I have not set a numeric boundary in both directions (only one) that opposite results are left open to alternative explanations besides the pill? That would not be the case if I had actually used a two-tailed distribution in which the opposite threshold was passed.
 
I think your results would be open to alternative explanations no matter which test you used. If you predicted a particular direction and you got significant results in the opposite direction, then there is something wrong. Was the literature used to come to that prediction not strong enough to justify predicting a particular direction? Is there a confound? If you think your results could go either way, then you could state your hypothesis/predictions as "Group A taking the pill will have a significantly different outcome compared to group B taking a placebo." You don't state which way it will go, because the literature doesn't justify a particular direction. Pragmatically speaking, I think it's better to almost always use a two-tailed test even if you predict a particular direction. That way your chances of making a Type I error are smaller than with a one-tailed test.


 
This feels like a homework question. Is this a homework question?

It is not a homework question. I will not need to know the answer to this particular question on any exam either, but I want to grasp it anyway. The literature does not explain why results opposite to what was predicted cannot be statistically significant in a one-tailed test.
 
I'm sure there is a really thorough and formalized philosophy-of-science explanation for why, but just looking at it 'off the cuff':

If your empirical test/observation results in an extreme score (or average score) on some index on the completely OPPOSITE end of the normal curve than your theory predicted, science says you don't get credit for that :) and it's back to the drawing board for you and your theory.

Some famous authors have cogently criticized/contrasted empirical psychology with other (perhaps superior) sciences that are able to make 'point predictions' from their laws/hypotheses rather than just pointing in a direction and saying, 'we predict that the score will be...um...like way in THAT direction.'
 
If your empirical test/observation results in an extreme score (or average score) on some index on the completely OPPOSITE end of the normal curve than your theory predicted, science says you don't get credit for that :)

I didn't ask why one doesn't get credit, I asked why it's statistically irrelevant. For example: if I hypothesize that a certain pill will make people happy and it turns out the pill makes people severely depressed with a correlation of 1, why can it still not be statistically significant in a one-tailed test?
 
I didn't ask why one doesn't get credit, I asked why it's statistically irrelevant. For example: if I hypothesize that a certain pill will make people happy, and it turns out the pill makes people severely depressed with a correlation of 1, why can it still not be statistically significant in a one-tailed test?

It runs counter to the logical empiricist philosophy that underlies the scientific enterprise, and statistical tests (e.g., a t-test, or testing the significance of a correlation coefficient) are only 'significant' (e.g., p < 0.05) in relation to a specific, theoretically driven hypothesis (e.g., that compound X will 'make people happy' or have antidepressant properties).

If your theoretical reasons cause you to predict that the pill will 'make people happy' and, in actuality, the pill 'makes people sad,' then the specific inferential statistical technique (e.g., t-test) and associated p-value are irrelevant from a scientific point of view. Some further reading on 'logical positivism' or 'logical empiricism' and its application to scientific psychology might be informative. Regarding the significance-testing enterprise, a favorite author of mine when I was in graduate school was Jacob Cohen. Good luck!!!
 
So the crux of the matter is that a one-tailed test has no probability for an opposite result?
 
So the crux of the matter is that a one-tailed test has no probability for an opposite result?

In essence - yes. This is why one-tailed tests are very rarely used in practice (and when they are - it almost invariably seems to be questionable research). That is what you "sacrifice" for the sake of a more liberal Type I error cutoff.

The underlying statistical theory behind why that is gets very complicated and is far too much to explain here. If it makes you feel better, these are all pretty much arbitrary cutoffs that have been decided by convention and built into statistical packages. None of these are absolute laws. If you wanted to, and were very mathematically/statistically savvy, you could certainly design a statistical test that "erred" in the hypothesized direction, giving you a more liberal cutoff there, but that would still pick up an extraordinarily significant effect in the opposite direction. The one-tailed t-test as typically implemented just doesn't do it. A properly designed Bayesian prior on an analogue of the t-test could achieve that pretty easily.
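To make that last point concrete, here is a minimal sketch of the kind of lopsided decision rule being described (Python with scipy assumed; the degrees of freedom, the .04/.01 alpha split, and the test values are all hypothetical, and this is an illustration of the idea rather than any standard packaged test):

```python
from scipy import stats

df = 30             # hypothetical degrees of freedom
alpha_pred = 0.04   # most of the alpha budget in the predicted (upper) tail
alpha_opp = 0.01    # a sliver reserved for an extreme opposite result

upper_cut = stats.t.ppf(1 - alpha_pred, df)  # reject if t >= upper_cut
lower_cut = stats.t.ppf(alpha_opp, df)       # reject if t <= lower_cut

def decide(t):
    if t >= upper_cut:
        return "significant in the predicted direction"
    if t <= lower_cut:
        return "significant in the OPPOSITE direction"
    return "not significant"

print(decide(2.0))    # clears the liberal predicted-direction cutoff
print(decide(-6.0))   # extreme opposite result; a plain one-tailed test ignores this
```

Total Type I error here is still .05; it is just allocated asymmetrically instead of sitting entirely in one tail.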
 
Well, that one turned out to be fairly logical. My struggles are still in grasping certain rules, some of which appear paradoxical. For instance, in analysis of variance: why doesn't each within-group variance have any bearing on the null hypothesis? The final ratio comparing the groups to each other is what decides it, so how could it not matter which particular variance each group has?

Should I simply ignore the steps where the information doesn't make sense to me, or is it crucial to understand why, not simply how, something is calculated? The actual operations are the easiest part. You just feed them into SPSS.
 
I didn't ask why one doesn't get credit, I asked why it's statistically irrelevant.
Besides people trying to justify it in this thread, who's telling you that it's statistically irrelevant?
 
We have no idea who you are or what you are doing. If you are a biostatistics professor, yes it is crucial to understand why. If you are working on an AA in psychology at your local community college with no plans to go further, it's not.

Some of this is not about grasping rules, it's about incorrect information. ANOVA does account for variance "within" groups too. It's part of the total variance and is embedded in your F statistic.
 
Well, that one turned out to be fairly logical. My struggles are still in grasping certain rules, some of which appear paradoxical. For instance, in analysis of variance: why doesn't each within-group variance have any bearing on the null hypothesis? The final ratio comparing the groups to each other is what decides it, so how could it not matter which particular variance each group has?

Should I simply ignore the steps where the information doesn't make sense to me, or is it crucial to understand why, not simply how, something is calculated? The actual operations are the easiest part. You just feed them into SPSS.

The within group variance doesn't bear on the NULL hypothesis, because that's just a prediction of no difference. No data technically has any bearing on the hypothesis. If you're wanting to understand, then some clarity of language would be helpful. And of course Ollie is right, the within group variance goes into the denominator. ANOVA is called "analysis of variance" because it examines the between group variance over the within-group variance--it's a ratio of variances. The idea is that the differences between groups are BIGGER than the variability within the groups.

Getting a good stats book and trying to work through some of these examples by hand might help you if you want to understand. Doing an ANOVA by hand is a bitch, but you do see the within-group variance that way!!
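For anyone who wants to try the by-hand route, here is a minimal sketch (made-up data; Python with numpy/scipy assumed) that builds the F ratio from its pieces and checks it against a packaged routine:

```python
import numpy as np
from scipy import stats

# Three small hypothetical groups
groups = [np.array([4., 5., 6., 5.]),
          np.array([7., 8., 6., 9.]),
          np.array([5., 6., 7., 6.])]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: how far each score sits from its own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)       # between-group variance estimate
ms_within = ss_within / (n_total - k)   # within-group variance estimate
F = ms_between / ms_within

print(F, stats.f_oneway(*groups).statistic)  # the two values should match
```

Seeing ss_within sit in the denominator makes it obvious that the within-group variability is used; it just isn't what the null hypothesis is about.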
 
The within group variance doesn't bear on the NULL hypothesis, because that's just a prediction of no difference. No data technically has any bearing on the hypothesis. If you're wanting to understand, then some clarity of language would be helpful. And of course Ollie is right, the within group variance goes into the denominator. ANOVA is called "analysis of variance" because it examines the between group variance over the within-group variance--it's a ratio of variances. The idea is that the differences between groups are BIGGER than the variability within the groups.

Getting a good stats book and trying to work through some of these examples by hand might help you if you want to understand. Doing an ANOVA by hand is a bitch, but you do see the within-group variance that way!!

My confusion arises from a statistics book: "One of the most important things to remember about this within-group estimate is that it is not affected by whether the null hypothesis is true. This estimate comes out the same whether the means of the populations are all the same (the null hypothesis is true) or the means of the populations are not all the same (the null hypothesis is false). This estimate comes out the same because it focuses only on the variation inside each population."


Statistics for psychology (2014).
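One way to see what the book is claiming (a minimal sketch with simulated data; Python with numpy assumed): shift the group means apart without touching the spread inside each group, and the within-group estimate does not change:

```python
import numpy as np

rng = np.random.default_rng(0)
# Three groups with identical spread; only the means will differ between scenarios
noise = [rng.normal(0, 2, size=20) for _ in range(3)]

def ms_within(groups):
    k = len(groups)
    n = sum(len(g) for g in groups)
    ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return ss_w / (n - k)

null_true = [g + 100 for g in noise]                          # all means equal
null_false = [g + m for g, m in zip(noise, (90, 100, 110))]   # means differ

print(ms_within(null_true))   # same number...
print(ms_within(null_false))  # ...whether the null is true or false
```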
 
The idea is that the differences between groups are BIGGER than the variability within the groups.

A variance difference between groups entails that groups X, Y, and Z's respective variances are all different. Thus the specific variance within each group is relevant, by my reasoning. It would have to be of a certain kind.
 
There is no such thing as specific types of variance within groups (well at least not as you seem to mean). Variance is a mathematical calculation with an exact mathematical definition. The book is simply indicating that the groups themselves can vary a lot or a little - what matters is whether there are differences between them (and the proportional difference between vs within). I agree it may be helpful to hand calculate things (and do so slowly - really paying attention to each step). There is nothing magical about any of this - it's just math and the logic is quite neat and simple. I love teaching ANOVA/regression because it really is much more accessible at that level. Once you get to the advanced stuff the math becomes completely impossible to do by hand even for serious mathematicians and it becomes more difficult to step through things and really peek under the hood.

That said, this is starting to get a little too close to homework help for my comfort. My advice is to talk to your stats professor and consider hiring a tutor if need be. It is good to think about these issues, but your reasoning seems misguided in some places and lacking sufficient information in others. They will be better equipped (and hopefully willing) to walk you through it than we can be here.

Stats is an area many students struggle with, especially those who aren't used to thinking mathematically.
 
There is no such thing as specific types of variance within groups.

I mean the numerical variance within groups. That number could hardly be irrelevant to the final conclusion. I am misreading something, but I don't know what.
 
A variance difference between groups entails that groups X, Y, and Z's respective variances are all different. Thus the specific variance within each group is relevant, by my reasoning. It would have to be of a certain kind.

Nope, it doesn't. Variance between groups is basically asking "are the means different from one another." Variance within groups is about all of the variability within each group. When you think about variance within groups, think about what the frequency distribution graph would look like for each of the conditions (shape). The means could be the same and the shapes all different. The means could be different and the shapes the same (this is the ideal case for ANOVA, as if the shapes are too different, one of the assumptions of ANOVA is violated).

The null and alternative hypotheses are really about the BETWEEN group variability (mean differences). But the actual calculation includes the within group variability too.
 
Let's take the example of IQ between groups. Group 1 is neurotypicals, Group 2 is anti-socials, Group 3 is schizophrenics. In analysis of variance we already know each group's mean IQ; thus the only determinative factor for whether there are true differences in mean IQ is how variance manifests in each group. Thus the data on within-group variance are key to between-group differences in variance.
 
Let's take the example of IQ between groups. Group 1 is neurotypicals, Group 2 is anti-socials, Group 3 is schizophrenics. In analysis of variance we already know each group's mean IQ; thus the only determinative factor for whether there are true differences in mean IQ is how variance manifests in each group. Thus the data on within-group variance are key to between-group differences in variance.

As was explained above... only to the extent that it factors into the denominator. The easiest way to understand it is this: imagine all three groups have exactly the same mean, but differ in standard deviation. Would you really expect these to be significantly different? That's all the passage you quoted is saying.
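That thought experiment is easy to run (a minimal sketch with simulated data; Python with numpy/scipy assumed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Three groups with very different spreads, each recentered to the SAME mean
a = rng.normal(0, 1, 50);  a += 100 - a.mean()
b = rng.normal(0, 15, 50); b += 100 - b.mean()
c = rng.normal(0, 5, 50);  c += 100 - c.mean()

F, p = stats.f_oneway(a, b, c)
print(F, p)   # F is essentially 0: identical means leave nothing "between" groups
```

(As noted a few posts up, spreads this unequal also strain ANOVA's homogeneity-of-variance assumption; the point here is only that ANOVA tests means, not spreads.)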
 
You can also think of this in terms of effect sizes, with the understanding that a general formula for calculating a mean-difference effect size (ES) is:
ES = (Mean1 - Mean2) / SD

A difference between the groups alone is not sufficient to determine an effect. You also have to consider the degree of error in true-score estimation within the groups. If we have M1 = 10 and M2 = 20, then we might expect differences (between group). However, if the SDs of each are so broad (within group) that they substantially overlap, say an SD of 15 for each, then how certain are you that those two groups actually differ? The point of a large effect (or, in the case of ANOVA, a significant F-test) is that the between-group differences are larger than can be attributed to within-group variance. This suggests that those groups actually differ according to their grouping variable.
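Plugging the numbers from this post into that formula (plain Python; the second, tighter SD is a made-up contrast):

```python
m1, m2, sd = 10.0, 20.0, 15.0

es = (m2 - m1) / sd
print(es)               # ~0.67: the 10-point gap is less than one SD wide

# The same raw gap against a tighter spread is far more convincing:
print((m2 - m1) / 2.0)  # 5.0: the group distributions barely overlap at all
```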
 
But in the case of a two-group comparison, men and women, it has been established that they differ in variance. There are more gifted men than women, but also a substantial number of dumber men than women. Yet their means are reportedly the same. Is it because this particular variance evens out with the dummies and the gifted people? That is to say, do they cancel each other out among men? Or is it political correctness in play?
 
But in the case of a two-group comparison, men and women, it has been established that they differ in variance. There are more gifted men than women, but also a substantial number of dumber men than women. Yet their means are reportedly the same. Is it because this particular variance evens out with the dummies and the gifted people? Or is it political correctness in play?
I'm not touching that topical football except to say that your conceptualization of that topic is inaccurate.

As to the statistics question at play, means aren't variances. If the means are equal, reporting them as the same is correct. The question is whether the difference in error variability is sufficient to discuss or attend to. That is less a question for a strict mean-testing ANOVA and more an issue of measurement invariance, which covers a wider array of types of comparisons. The long and the short of it is that if the means are the same between groups, I'm attributing the variance differences more to sampling error than to true between-group differences. Also, in your example it seems like you are confusing variance with numbers of individuals. Greater variance is not a function of sample size so much as of sample composition.

So, again. If (M1 - M2) / SD is the formula for understanding between-group over within-group variability...

If M1 - M2 = 0, then no matter what the within-group variability is, you will not find differences in the mean structures, which is what ANOVA is looking for, because 0 divided by any number is 0. It's a slightly different formula than the MSbetween / MSwithin ratio used in ANOVA, but the idea is the same. If SSbetween is equal to 0, then F is going to be non-publishable (aka, not statistically significant).
 
But in the case of a two-group comparison, men and women, it has been established that they differ in variance. There are more gifted men than women, but also a substantial number of dumber men than women. Yet their means are reportedly the same. Is it because this particular variance evens out with the dummies and the gifted people? Or is it political correctness in play?

An ANOVA would not reveal differences if the means are the same. Variance does not "even out" (I'm not even sure what that might mean) - there just aren't significant differences in the average values (what ANOVA is testing). If we wanted to directly examine variability itself within each group...that variability might differ. That is a separate question and not what ANOVA is testing.

The rest has nothing to do with statistics. I will guarantee you that your F values don't care about political correctness though.

I'll reiterate my point above again though. All of this seems to boil down to fundamental misunderstandings of basic material. A professor or tutor is going to be much better poised to address these issues, since I suspect there are more than just what we are seeing here. Are you enrolled in a statistics class right now?
 
A professor or tutor is going to be much better poised to address these issues, since I suspect there are more than just what we are seeing here.

Clearly not, since I am here despite attending the classes. We are fed information and told that math is only used as a tool by psychologists, which I understand. It would be nice to understand some of it beyond basic operations, though. I don't find the statistics book very pedagogical at all.
 
For example, the "one-tailed limitation" is something I deduced despite the book's cryptic writing. Even though there is no probability distribution for opposite effects, it still isn't far-fetched to envision a result in which statistical significance could occur in the opposite direction, simply out of consistency.
 
For example, the "one-tailed limitation" is something I deduced despite the book's cryptic writing. Even though there is no probability distribution for opposite effects, it still isn't far-fetched to envision a result in which statistical significance could occur in the opposite direction, simply out of consistency.

Statistical significance ONLY OCCURS BECAUSE OF DECISIONS MADE at the front end!!!! If you ahead of time decide to do a one-tailed test, that means you're determining the area in which statistical significance occurs. It's on one side. A mean could be 6 SDs in the tails on the other side, and it wouldn't be statistically significant because that's not the choice you made for determining the "critical region." So yes, of course a result could occur in the opposite direction than anticipated, this happens all the time. And it's why a two-tailed test is usually used, because we CHOOSE to allow for significant effects on both sides. [It's also why the use of one-tailed tests feels fishy, because it feels like the decision was made after the fact rather than a priori.]
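That after-the-fact choice is easy to expose by simulation (a minimal sketch; Python with numpy/scipy assumed): run a one-tailed test in whichever direction the data happen to point, and under a true null you reject about twice as often as the nominal .05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, reps = 0.05, 30, 20_000
false_alarms = 0

for _ in range(reps):
    # The null is true: both samples come from the same population
    x, y = rng.normal(0, 1, n), rng.normal(0, 1, n)
    _, p_two = stats.ttest_ind(x, y)
    p_one = p_two / 2       # one-tailed p in whichever direction the data lean
    if p_one < alpha:       # tail chosen AFTER seeing the data
        false_alarms += 1

print(false_alarms / reps)  # ~0.10, double the nominal 0.05
```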

Also, I think Ollie was suggesting a professor or tutor who you could talk to in the real world, because this stuff is often easier to explain with drawings and examples and such. The books or lectures aren't enough sometimes, I totally hear you. So go have a conversation about it!!!
 
Let's take the example of IQ between groups. Group 1 is neurotypicals, Group 2 is anti-socials, Group 3 is schizophrenics. In analysis of variance we already know each group's mean IQ; thus the only determinative factor for whether there are true differences in mean IQ is how variance manifests in each group. Thus the data on within-group variance are key to between-group differences in variance.

Think about it this way.

1. In your example, you're talking about an independent ANOVA. So we have individual differences being a factor in both the numerator and the denominator, because each treatment has a totally different group of people. In repeated measures, the same people are re-tested multiple times.
2. Think about what a true difference in IQ means as it relates to a significance test. What does it mean? If you get a significant result, it means that it is unlikely you got this variation between groups by chance. There is likely a real difference.
3. In ANOVA, you figure out the total variance, and then you figure out the variance within each treatment (so what is the variance in the schizophrenics group, what is it among the anti-socials, and then the neurotypicals). Then you can subtract the within variance from the total variance, and the leftover is the variance between groups (see the sketch after this list).
4. Now consider what it means if we have variance (a difference) between groups. If you have that data in front of you, it's not uncommon to look at it and, with just an eye test, say "these look very different from group to group"... or "they don't look different." Let's say that by just looking at it, you thought there was a big difference from group to group! But what could put that in doubt? The only thing that can put it in doubt is variability being high within the different groups. If you have 10 anti-social people and their IQ scores are all over the place, that suggests that individual differences are substantial and being anti-social has very little to do with IQ. That high variability within groups would give you a non-significant result and tell you there is no real difference between these groups.
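Step 3 in the list above rests on the additive identity SStotal = SSbetween + SSwithin, which is easy to verify directly (a minimal sketch with made-up IQ-like scores; Python with numpy assumed):

```python
import numpy as np

# Made-up IQ-like scores for the three hypothetical groups
groups = [np.array([98., 102., 100., 101.]),   # "neurotypical"
          np.array([95., 99., 97., 94.]),      # "anti-social"
          np.array([92., 96., 90., 93.])]      # "schizophrenic"

scores = np.concatenate(groups)
grand = scores.mean()

ss_total = ((scores - grand) ** 2).sum()
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)

print(ss_total, ss_within + ss_between)   # equal, up to floating-point rounding
```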
 
I'm not touching that topical football except to say that your conceptualization of that topic is inaccurate.

Really?

Some studies have concluded that there is larger variability in male scores compared to female scores, which results in more males than females at the top and bottom of the IQ distribution.[17][18] Additionally, there are differences in the capacity of males and females in performing certain tasks, such as the rotation of objects in space, often categorized as spatial ability.

The most culturally neutral IQ measures are spatial tests, and women are consistently outperformed by men there. But I wouldn't claim women are fundamentally less intelligent. I attracted a Mensa girl a few years ago and have known a few others too. They are in a minority, though.
 
Key word: some studies. I can find a bevy of literature that runs contrary, such as the Scottish university cohort. It's way more complicated than what seems to be your understanding of the topic. If you want, break the question out into a new topic and see if anyone wants to engage. There are plenty of us on here who could teach series of courses on assessment and intelligence and who may be down for some didactic time.
 
Key word: some studies.

I am referring to studies of spatial reasoning tests between the sexes, since that is the only measure of raw intelligence. Verbal tests measure education level, not intelligence. Take that out of the equation and you will find that neurotypical men have higher means than neurotypical women. There are more low-functioning men due to the prevalence of autism, which affects more men than women, for whatever reason.
 
Again, that there are more of them is not to say that they are fundamentally different. Just as there being more evil men than women does not mean that the male sex is fundamentally evil.
 
I am referring to studies of spatial reasoning tests between the sexes, since that is the only measure of raw intelligence. Verbal tests measure education level, not intelligence. Take that out of the equation and you will find that neurotypical men have higher means than neurotypical women. There are more low-functioning men due to the prevalence of autism, which affects more men than women, for whatever reason.
SD is smaller for women when it comes to IQ... there is greater variation in men (more geniuses but also more ******s).
 
Where did you get that idea?

Because they are novel problem-solving skills. Verbal tests are knowledge-based, and women are known to read more books than men early in life. Spatial tests do not require any prior knowledge.
 
SD is smaller for women when it comes to IQ... there is greater variation in men (more geniuses but also more ******s).

Because of autism (men outnumber women there by a large margin, and autism is usually harmful to overall IQ).

Among neurotypicals, the average would be higher for men on culture-fair tests.
 
:whistle:

Why do I get the feeling you came to make a point and not ask a question? Point of fact: you can't say "if I ignored fact X, then speculation Y would be true."

I was explaining why that is. 40% of individuals under the old autism label fall in the intellectual disability range (IQ below 70). I think neurotypical individuals are more accurate representations of the sexes.
 
OP, you came on here to ask a question about statistics and have delved into telling the forum about sex differences in IQ... claims which lack depth and nuance. Do as you will, it's a free internet, but recognize that you're making some strong statements toward people who've been in this field a lot longer than you. You're getting some snarky responses because you're doing some explaining that doesn't need to be done. Keep reading and asking questions, and try to stay open to answers that aren't consistent with your previous conceptions.
 
OP, you came on here to ask a question about statistics and have delved into telling the forum about sex differences in IQ... claims which lack depth and nuance. Do as you will, it's a free internet, but recognize that you're making some strong statements toward people who've been in this field a lot longer than you. You're getting some snarky responses because you're doing some explaining that doesn't need to be done. Keep reading and asking questions, and try to stay open to answers that aren't consistent with your previous conceptions.

Argument from authority won't cut it with me. Nor is it relevant in this case either.
 
OP, you came on here to ask a question about statistics and have delved into telling the forum about sex differences in IQ... claims which lack depth and nuance.

There is no nuance. Women as a group have lower fluid intelligence than men. Fluid intelligence is the capacity to reason and solve novel problems independent of any knowledge from the past. No argument from authority will save you from that fact.
 
This will be my last post in this thread. I wasn't trying to argue my own authority here, rather pointing out that this board is full of neuropsychologists who give and interpret tests for a living, and that there is always more nuance than appears in textbooks.

That said, if you want to look up a dictionary definition of something, try "mansplaining." (Note: I have no idea whether you are male or female, but that's kinda what's going on here.)
 
Again, that there are more of them is not to say that they are fundamentally different. Just as there being more evil men than women does not mean that the male sex is fundamentally evil.

I feel like everyone missed this gem amongst his obtuse understanding of intelligence.
 
Some indicators of intelligence are the ability to integrate new information into one's existing knowledge and the ability to adjust one's ideas when there is evidence that one has misunderstood or misinterpreted something.
 