CHEERS Trial and Hardcore statistical discussion


Palex80



Do.
Not.
Show.
This.
Trial.
To.
Your.
Medical.
Oncologist.



Do.
Not.
Show.
This.
Trial.
To.
Your.
Medical.
Oncologist.

My med oncs are strong believers that IO can replace anything, including surgery/radiation for brain mets, so I'm way ahead of the trend line.
 


Do.
Not.
Show.
This.
Trial.
To.
Your.
Medical.
Oncologist.

Not sure that I ever bought abscopal effect as an actionable thing. (I'm sure it's real but maybe just exceptional.) However, how do you have 12/45 patients in the experimental arm remaining at risk at 20 months compared to 4/51 in the standard arm and not have a real effect going on (or at least have curves looking different)? Are they censoring the patients in some weird way, or was there a stagger in enrollment for the different arms? Were they expecting real magic in such a small trial (like a 30% survival benefit)? Is median survival really the right endpoint vs something like 2-year survival?
 
Not sure that I ever bought abscopal effect as an actionable thing. (I'm sure it's real but maybe just exceptional.) However, how do you have 12/45 patients in the experimental arm remaining at risk at 20 months compared to 4/51 in the standard arm and not have a real effect going on? Are they censoring the patients in some weird way, or was there a stagger in enrollment for the different arms?
Seriously. They see a negative trial. I see a 500% increase in OS at 2 years.
 
I disagree… most would see this as a positive trial 😂!
But this is a radiation trial. You have to adjust your standards. If it were a drug, it'd be a multi-billion dollar per year boon to some pharma company.

Radiation? This just proves it sucks.
 
But this is a radiation trial. You have to adjust your standards. If it were a drug, it'd be a multi-billion dollar per year boon to some pharma company.

Radiation? This just proves it sucks.
I agree. I'm just saying the idea now is to take our modality out of the equation, whether it's from other fields or from our own.
 
Seriously. They see a negative trial. I see a 500% increase in OS at 2 years.

I hate small trials. I mean, plug some numbers into a sample size calculator. Let's assume something very bold, like SBRT in this setting providing a 20% absolute survival benefit (from 20% to 40% surviving at 2 years). You would need a sample size of 162 to reduce your risk of Type 1 error to 5% with 80% power (i.e., a 20% chance of missing a true signal).

A post-hoc power calc on something like this gives power of ~57%. That means a 43% likelihood of missing even a crazy-big effect.

Oh well. Not like the COMET stats were all that great either. The worst part is that people are always congratulated on their work, and negative small trials are very damaging (as are positive ones).

I understand that it is hard to do a trial.
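
For anyone who wants to check that 162 figure, here is a minimal sketch of the textbook two-proportion formula (normal approximation), treating 2-year OS as a plain alive/dead-at-2-years outcome; the exact answer drifts upward depending on which continuity correction a given calculator applies:

```python
import math
from scipy.stats import norm

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm for comparing two proportions."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)  # 1.96 + 0.84 for 5% / 80%
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return z ** 2 * variance / (p1 - p2) ** 2

n = n_per_arm(0.20, 0.40)  # 20% vs 40% alive at 2 years
print(math.ceil(n), 2 * math.ceil(n))  # ~79 per arm, ~158 total before corrections
```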
 
I hate small trials. I mean, plug some numbers into a sample size calculator. Let's assume something very bold, like SBRT in this setting providing a 20% absolute survival benefit (from 20% to 40% surviving at 2 years). You would need a sample size of 162 to reduce your risk of Type 1 error to 5% with 80% power (i.e., a 20% chance of missing a true signal).

A post-hoc power calc on something like this gives power of ~57%. That means a 43% likelihood of missing even a crazy-big effect.

Oh well. Not like the COMET stats were all that great either. The worst part is that people are always congratulated on their work, and negative small trials are very damaging (as are positive ones).

I understand that it is hard to do a trial.
Yes. Agree with all of that. My comment was a bit tongue in cheek.
 
I hate small trials. I mean, plug some numbers into a sample size calculator. Let's assume something very bold, like SBRT in this setting providing a 20% absolute survival benefit (from 20% to 40% surviving at 2 years). You would need a sample size of 162 to reduce your risk of Type 1 error to 5% with 80% power (i.e., a 20% chance of missing a true signal).

A post-hoc power calc on something like this gives power of ~57%. That means a 43% likelihood of missing even a crazy-big effect.
"That don't sound right." Pre-hoc, expecting a doubling of 2y OS from 20 to 40%... With a difference that big (the median OS is probably nigh double too) you wouldn't need ~80 patients in each arm to "catch the signal" via KM analysis.

EDIT: simple S(t) more appropriate

[screenshots: sample size calculator output]
 
That's not for a non-parametric actuarial analysis, though; this is for coin-flipping-type analyses
(Specifically, we want the instantaneous survival function of one group vs another at 2 years; not a binary 20% probability vs 40% probability of yes/no alive at 2y... it's quite different math)

EDIT:
To rephrase: I bet a bitcoin that in randomly generated data (i.e., a KM simulation) modeling events and censoring in two groups (with the random numbers constrained to achieve 17.5-22.5% and 37.5-42.5% 2-year survival), by the time N=80 in both groups, the difference between the groups will be p<0.05 in 995 out of 1000 simulations or more. There is a very, very low chance of missing a difference at p<0.05 when the difference is this big and you've accrued 160 patients.
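
A minimal sketch of how one could actually run that bet, assuming exponential survival calibrated to the two 2-year rates and a 3-year administrative cutoff (both my assumptions, not anything specified in the thread), using the lifelines library for the log-rank test:

```python
import numpy as np
from lifelines.statistics import logrank_test  # pip install lifelines

rng = np.random.default_rng(0)

# Exponential hazards calibrated so S(2) = 0.20 (control) and 0.40 (experimental)
lam_ctrl = -np.log(0.20) / 2.0
lam_exp = -np.log(0.40) / 2.0
n_per_arm, n_sims, cutoff = 80, 1000, 3.0  # 3y administrative censoring: an assumption

significant = 0
for _ in range(n_sims):
    t_ctrl = rng.exponential(1 / lam_ctrl, n_per_arm)
    t_exp = rng.exponential(1 / lam_exp, n_per_arm)
    result = logrank_test(
        np.minimum(t_ctrl, cutoff), np.minimum(t_exp, cutoff),
        event_observed_A=t_ctrl < cutoff, event_observed_B=t_exp < cutoff,
    )
    significant += result.p_value < 0.05

print(f"log-rank p < 0.05 in {significant}/{n_sims} simulations")
```

Note that a plain exponential model implies HR = ln(0.4)/ln(0.2) ≈ 0.57; whether the win rate clears 995/1000 depends on how steep you make the curves and how you censor.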
 
"That don't sound right." Pre-hoc, expecting a doubling of 2y OS from 20 to 40%... With a difference that big (the median OS is probably nigh double too) you wouldn't need ~80 patients in each arm to "catch the signal" via KM analysis.
I'm using a simple sample size calc, but I find it unbelievable that a sample of 20 with a null survival probability of 0.2 is going to give you anything. This is expecting 4 vs. 8 survivors at a given time point (actually less than the difference in evaluable patients in this trial at the 2-year time point) and claiming that, in a population with significant variance in life expectancy, these numbers are meaningful.

Seems like I'm missing something big.

Are you sure that you are not calculating the sample size at the time point in question (for instance ~40 patients left at 2 years to evaluate)?
 
That's not for a non-parametric actuarial analysis, though; this is for coin-flipping-type analyses
(Specifically, we want the instantaneous survival function of one group vs another at 2 years; not a binary 20% probability vs 40% probability of yes/no alive at 2y... it's quite different math)
Source: Perez and Brady, Second Edition, Chapter 7, page 179 (yeah, I am old)

Table 7-4: Patients Required to Detect Improvement in Survival over Baseline Survival

Control arm 20% vs. experimental arm 40%:
α = 0.05, 1−β = 0.80 → n = 150
α = 0.05, 1−β = 0.90 → n = 200

????
 
I'm using a simple sample size calc, but I find it unbelievable that a sample of 20 with a null survival probability of 0.2 is going to give you anything. This is expecting 4 vs. 8 survivors at a given time point (actually less than the difference in evaluable patients in this trial at the 2-year time point) and claiming that, in a population with significant variance in life expectancy, these numbers are meaningful.

Seems like I'm missing something big.

Are you sure that you are not calculating the sample size at the time point in question (for instance ~40 patients left at 2 years to evaluate)?
What is "meaningful."

You can have a sample size of a million, or a sample size of 10.

For example, I do a study on a coin to see if it's a fair coin. I flip it 10 times (N=10) and get 10 heads. Do I need to do more coin flips to get a more meaningful result? Data that gives you a significant result is always meaningful (in theory at least) regardless of the sample size.

 
Source: Perez and Brady, Second Edition, Chapter 7, page 179 (yeah, I am old)

Table 7-4: Patients Required to Detect Improvement in Survival over Baseline Survival

Control arm 20% vs. experimental arm 40%:
α = 0.05, 1−β = 0.80 → n = 150
α = 0.05, 1−β = 0.90 → n = 200

????
Seems like fake news! Consult your local biostatistician.
 
I flip it 10 times (N=10) and get 10 heads. Do I need to do more coin flips to get a more meaningful result?
Um, yeah. I guess this is getting Bayesian, but if we expect an equally weighted coin (which we usually do), it's not that rare to get 10 heads in a row but damn rare to get 100, and the larger the sample, the closer it approximates the true odds of heads.

I'm not going to do the binomial calc to figure out the odds.

Again, are you sure you are not calcing a sample at a point in time with your KM calcs above? It's not passing the eyeball test for me. I'm sure you can do a log-rank test on 2 hypothetical samples with expectation values of 0.2 and 0.4 and confirm.

I'm going with the Wombat!!
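
For the record, the binomial calc being waved off is one line (assuming a fair-coin null):

```python
from scipy.stats import binomtest

print(binomtest(10, n=10, p=0.5).pvalue)  # ~0.002 two-sided for 10/10 heads
print(0.5 ** 10, 0.5 ** 100)              # 1/1024 one-sided vs ~8e-31 for 100 straight
```

So 10 heads in a row is roughly a 1-in-1000 event under a fair coin; 100 in a row is effectively impossible.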
 
I made survival curves to meet the criteria of 2y OS of 20% and 40%, respectively.

I "enrolled" 39 patients in the study. As predicted by the calc's above, it was "just" significant.

You guys are still not dealing actuarially!

If you enroll >100 patients in this "study" at these expected 2y OS's you will have MUCH greater power than 0.8.

[image: simulated KM curves]
 
I think the issue is that you constructed these "knowing" a priori the outcome. Sample size estimates make no a priori assumptions (I think).

Every single resource I look at gives a sample size between 150 and 200.
 
"knowing" a priori the outcome
Bingo! You are running a simulation, not calculating the sample size necessary to account for variability in real outcomes.

"Actuarially" means to me that you account for prior outcomes going forward, allowing you to account for censored data or later-accrued study subjects who haven't experienced the endpoint in question (often death). This is why life expectancy may be 78 at birth, but for a 70-year-old it's 86-87.

In an idealized clinical trial where no patients are censored for non-endpoint reasons and all patients were accrued at the same time, our binary thinking would come to approximate KM modeling at a given point in time. Things may differ a little for things like median survival, but not very much. (I think.)
 
Sample size estimates make no a priori assumptions
Eh? Sample size calcs always have to make assumptions. "Let's assume something very bold, like SBRT in this setting providing a 20% absolute survival benefit (from 20% to 40% surviving at 2 years)."

If we "assume" the experimental arm will have 2y OS of 40% vs 20% for the control, we do not need 160 patients for a 0.8 power in a time failure analysis.

Think of what the p-value would be on these curves if the N were 160 instead of 39.

[image: simulated KM curves]
 
I think
Eh? Sample size calcs always have to make assumptions.

If we "assume" the experimental arm will have 2y OS of 40% vs 20% for the control, we do not need 160 patients for a 0.8 power in a time failure analysis.

Think of what the p-value would be on these curves if the N were 160 instead of 39.

[image: simulated KM curves]
I think you are missing the point. You have performed the experiment as if you knew the result. Assumptions for sample size are based on estimates. We cannot know the results before the experiment.

Frequently our estimates for the control (and experimental) arm are way off, which usually leads to underpowered studies. See many recent prostate cancer studies.
 
If you do a sample size calc *correctly* for 2y OS of 20% vs 40%, the answer is *not* 160. That is the take home message for today ;)

I think we all have done enough survival (LC, DFS, etc.) analyses to know that when differences are that big, we usually find a significant result with fewer than 100 patients.

This doesn't lessen communitydoc's greater point: increasing sample sizes to catch a difference we would like to find, *just in case* one is there, is laudable.

"I think you are missing the point. You have performed the experiment as if you knew the result."
I performed the "experiment" as one possible set of curves among many that could meet communitydoc's a priori that SBRT would be better, at 40% 2y OS vs 20% 2y OS. Sure, in the real world we would not see (exactly) that, but those are the numbers we have to plug into our calculator, so to speak. On the first try, it was insignificant at 37 patients (P about 0.09). Then I added a patient (P about 0.06). Then I added another patient. Then it crossed into significance.

 
If you do a sample size calc *correctly* for 2y OS of 20% vs 40%, the answer is *not* 160. That is the take home message for today ;)

I guess you know more than all the cooperative groups that do Phase III trials with a control of 20% and an experimental arm of 40%, enrolling 175-250 patients. To think we could do these studies with only two dozen patients.

Count me completely unconvinced. I will admit that I don't know what your widget is doing, but "one sample" is concerning.
 
when differences are that big

We never know what the difference is going to be; we guess, and then make power calcs using tools like the ones Wombat has linked to. Admittedly, I don't have great insight into the impact of censoring on power, but I'm sure massive censoring causes big-time problems and does not increase statistical power. This is evident graphically (which is sort of non-parametric), where at the end of your KM curves a single event can take the curves from divergent to convergent.

I'll look into this, but I'm pretty confident this trial is very underpowered and may in fact be showing some meaningful signal (but because it is underpowered, we can't make strong inferences).
 
If you do a sample size calc *correctly* for 2y OS of 20% vs 40%, the answer is *not* 160. That is the take home message for today ;)

I think we all have done enough survival (LC, DFS, etc.) analyses to know that when differences are that big, we usually find a significant result with fewer than 100 patients.

This doesn't lessen communitydoc's greater point: increasing sample sizes to catch a difference we would like to find, *just in case* one is there, is laudable.

"I think you are missing the point. You have performed the experiment as if you knew the result."
I performed the "experiment" as one possible set of curves among many that could meet communitydoc's a priori that SBRT would be better, at 40% 2y OS vs 20% 2y OS. Sure, in the real world we would not see (exactly) that, but those are the numbers we have to plug into our calculator, so to speak. On the first try, it was insignificant at 37 patients (P about 0.09). Then I added a patient (P about 0.06). Then I added another patient. Then it crossed into significance.

You said
I performed the "experiment" as one possible set of curves among many that could meet communitydoc's a priori that SBRT would be better, at 40% 2y OS vs 20% 2y OS. Sure, in the real world we would not see (exactly) that, but those are the numbers we have to plug into our calculator, so to speak. On the first try, it was insignificant at 37 patients (P about 0.09). Then I added a patient (P about 0.06). Then I added another patient. Then it crossed into significance.

I say
Exactly! This is a spurious example. You kept "shaking the box" until you got the answer you wanted. This is why we use preplanned interim analyses instead of "looking at the data" in real time. This inflates Type I error.

As an example, let's take a simple roulette wheel. For the purpose of this example there are only red and black (no house). You use a random generator to create a long series of RBRRBBRRRBBB... I can easily "choose" a series of ten spins that would guarantee I win money.

You are gaming the system.

Same reason why you should switch doors in the Monty Hall problem.
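
The "shaking the box" point is easy to demonstrate with a quick simulation (a sketch under my own assumptions: a truly null two-arm binary outcome, with a z-test repeated after every added pair of patients):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_sims, n_max = 2000, 100
z_crit = norm.ppf(0.975)  # nominal two-sided 5% level
false_positives = 0

for _ in range(n_sims):
    a = rng.random(n_max) < 0.5  # arm A outcomes under the null (no true difference)
    b = rng.random(n_max) < 0.5  # arm B outcomes under the null
    for n in range(10, n_max + 1):  # "peek" after every added patient pair
        p1, p2 = a[:n].mean(), b[:n].mean()
        pooled = (p1 + p2) / 2
        se = np.sqrt(2 * pooled * (1 - pooled) / n)
        if se > 0 and abs(p1 - p2) / se > z_crit:
            false_positives += 1  # declared "significant" despite no real effect
            break

print(f"false-positive rate with peeking: {false_positives / n_sims:.1%}")  # well above 5%
```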
 
I guess you know more than all the cooperative groups that do Phase III trials with a control of 20% and an experimental arm of 40%, enrolling 175-250 patients. To think we could do these studies with only two dozen patients.

Count me completely unconvinced. I will admit that I don't know what your widget is doing, but "one sample" is concerning.
I am not out to convince anyone. Math is math. I am just a walrus, what do I know? (And it's not *my* widget!)

Have we ever seen a trial where they were talking a 20% failure rate vs a 40% failure rate at 2y and they wanted 160 patients??? You have to read between the lines here a bit, but here is Slotman's ES-SCLC PCI trial. (It's all I can think of at the moment.) They were looking at big differences (brain met occurrence vs not) and calc'd needing just 52 patients. But of course they had to figure in death. Above, we have been "figuring in death" all along (i.e., death was the event).

[screenshot: Slotman trial statistics section]
 
I am not out to convince anyone. Math is math. I am just a walrus, what do I know? (And it's not *my* widget!)

Have we ever seen a trial where they were talking a 20% failure rate vs a 40% failure rate at 2y and they wanted 160 patients??? You have to read between the lines here a bit, but here is Slotman's ES-SCLC PCI trial. (It's all I can think of at the moment.) They were looking at big differences (brain met occurrence vs not) and calc'd needing just 52 patients. But of course they had to figure in death. Above, we have been "figuring in death" all along.

[screenshot: Slotman trial statistics section]
Competing risks are a separate issue.

I still think your example is p-hacking.
 
Competing risks are a separate issue.

I still think your example is p-hacking.
Well. I have shown simulated data to match communitydoc's hypothetical. And calculated data with references which perfectly match the simulated data/the hypothetical. And trials where biostatisticians looking at large differences estimated needed N's <<< 160.

That's not really p-hacking.
 
I am not out to convince anyone. Math is math. I am just a walrus, what do I know? (And it's not *my* widget!)

Have we ever seen a trial where they were talking a 20% failure rate vs a 40% failure rate at 2y and they wanted 160 patients??? You have to read between the lines here a bit, but here is Slotman's ES-SCLC PCI trial. (It's all I can think of at the moment.) They were looking at big differences (brain met occurrence vs not) and calc'd needing just 52 patients. But of course they had to figure in death. Above, we have been "figuring in death" all along (i.e., death was the event).

[screenshot: Slotman trial statistics section]
The 287 number is key, not the 52 :D Look at the censoring for the SBRT trial in question!! We don't need people running 30-50 person trials! Unless you're Patchell :p
 
The 287 number is key, not the 52 :D
I knew you would say that! Brain mets were the event. Death was not. To get enough patients to study the event, they assumed a death rate and "bumped" the number way up. Sans deaths, they would need 52 patients based on the assumed/predicted differences in brain met events, PCI vs not, and that was an HR of 0.44. For the curves I showed above, which met your a priori of 2y OS of 20 vs 40%, the HR I'm getting between the curves is 0.38 (95% CI 0.16-0.91). This explains why we are seeing a sample size calc of about 38 or 39 vs the 52 when Slotman did his thing.

[screenshot: sample size calculation]
 
This statistical power discussion feels very:

[Top Gear ladies GIF]
 
I knew you would say that! Brain mets were the event. Death was not. To get enough patients to study the event, they assumed a death rate and "bumped" the number way up. Sans deaths, they would need 52 patients based on the assumed/predicted differences in brain met events, PCI vs not, and that was an HR of 0.44. For the curves I showed above, which met your a priori of 2y OS of 20 vs 40%, the HR I'm getting between the curves is 0.38 (95% CI 0.16-0.91). This explains why we are seeing a sample size calc of about 38 or 39 vs the 52 when Slotman did his thing.

[screenshot: sample size calculation]
I think that the widget is telling you the lowest possible number of patients needed, given a perfect distribution of events. It is, so to speak, the minimum needed to show an effect.

In real life, time to event is not predictable.

Yes, many statisticians are conservative and we do need more Bayesian methods, but I don't believe that a reputable statistician would recommend a sample size this small in a prospective trial, even with the large effect size.
 
I think that the widget is telling you the lowest possible number of patients needed, given a perfect distribution of events. It is, so to speak, the minimum needed to show an effect.
NB: A sample size calc is always the minimum number to meet the criteria (usually α=0.05, 1−β=0.8).
I don't believe that a reputable statistician would recommend a sample size this small in a prospective trial, even with the large effect size.
I have never seen such a large effect size hypothesized for the experimental arm of a randomized trial before; again, the closest I could come was Slotman's guess at the effect of PCI in ES-SCLC. To reiterate, he guessed an HR of 0.44. Communitydoc guessed (although neither he nor I realized it at the time) an HR of 0.38. Both of these are really, really large between-group differences if you look at any other prospective trial in comparison.

Source: Perez and Brady, Second Edition, Chapter 7, page 179 (yeah, I am old)

Table 7-4: Patients Required to Detect Improvement in Survival over Baseline Survival

Control arm 20% vs. experimental arm 40%:
α = 0.05, 1−β = 0.80 → n = 150
α = 0.05, 1−β = 0.90 → n = 200

????
I co-authored the 5th and 6th editions (I am old too). That fuzzy* math pre-dated me. I'd write a humorous letter to Luther now were he still alive.

*Freedman's table in Brady and Perez doesn't account for time
 
NB: A sample size calc is always the minimum number to meet the criteria (usually α=0.05, 1−β=0.8).

I have never seen such a large effect size hypothesized for the experimental arm of a randomized trial before; again, the closest I could come was Slotman's guess at the effect of PCI in ES-SCLC. To reiterate, he guessed an HR of 0.44. Communitydoc guessed (although neither he nor I realized it at the time) an HR of 0.38. Both of these are really, really large between-group differences if you look at any other prospective trial in comparison.


I co-authored the 5th and 6th editions (I am old too). That fuzzy* math pre-dated me. I'd write a humorous letter to Luther now were he still alive.

*Freedman's table in Brady and Perez doesn't account for time
The important number is the number of events that inform the hypothesis tested.
Humor me more, please; serious questions:

I think your sample size only works if all participants experience the event by the time of analysis.

This never occurs.

We never have complete follow-up, and this is why K-M is used.

In the brain met example, the events (n=52) and the HR (0.44) are driving the "math", but more than 250 patients are required because patients die or are censored BEFORE experiencing the event of interest.

So if we knew that all participants would experience the event, then only 52 would be required.

Do you agree?
 
I think your sample size only works if all participants experience the event by the time of analysis... We never have complete follow-up and this is why K-M is used.

So if we knew that all participants would experience the event, then only 52 would be required.

Do you agree?

I don't think so. I am... inelegant in explaining the KM things. But I have a good sense of the numbers, theory, and assumptions. (Btw, I agree of course re: "We never have complete follow-up and this is why K-M is used.") If all participants experience an event, then the survival for that group is 0%/the failure rate is 100%. So, no, we don't assume all participants experience an event in a sample size calc. "By the time of analysis" is a moot point per the parameters of communitydoc's a priori:

[screenshot: communitydoc's quoted assumption]


He is specifying the instantaneous survival function at t=2 years. As I showed above, this 20% vs 40% difference at 2 years with nice, benign, slopey KM curves implies an HR of 0.38 between the groups. And let's say we had lots of retrospective data showing some huge improvement for a non-standard treatment that implied HR=0.38. Yeah, I would not say we would need an N>100 trial. Look at another calculator:

[screenshot: another sample size calculator]
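
One side note on the effect size itself: under a strict proportional-hazards (exponential) assumption, the two 2-year survival rates pin the HR down directly, and it comes out less extreme than the 0.38 estimated from those particular simulated curves:

```python
import math

# Under proportional hazards, S_exp(t) = S_ctrl(t) ** HR, so at t = 2 years:
hr = math.log(0.40) / math.log(0.20)
print(round(hr, 2))  # 0.57 under an exponential / proportional-hazards assumption
```

Which HR you power for (0.57 vs 0.38) is a big part of why different calculators spit out such different sample sizes.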
 
I don't think so. I am... inelegant in explaining the KM things. But I have a good sense of the numbers, theory, and assumptions. (Btw, I agree of course re: "We never have complete follow-up and this is why K-M is used.") If all participants experience an event, then the survival for that group is 0%/the failure rate is 100%. So, no, we don't assume all participants experience an event in a sample size calc. "By the time of analysis" is a moot point per the parameters of communitydoc's a priori:

[screenshot: communitydoc's quoted assumption]


He is specifying the instantaneous survival function at t=2 years. As I showed above, this 20% vs 40% difference at 2 years with nice, benign, slopey KM curves implies an HR of 0.38 between the groups. And let's say we had lots of retrospective data showing some huge improvement for a non-standard treatment that implied HR=0.38. Yeah, I would not say we would need an N>100 trial. Look at another calculator:

[screenshot: another sample size calculator]
If you do a straight, simple 0.2 vs 0.4 survival* comparison, then yes, of course you get a different sample size needed:
[screenshot: sample size calculator]


* I am cheating; it's the wrong word here. "Survival" means something very specific, at least pertaining to KM curves.
 
If you do a straight, simple 0.2 vs 0.4 survival* comparison, then yes, of course you get a different sample size needed:
[screenshot: sample size calculator]


* I am cheating; it's the wrong word here. "Survival" means something very specific, at least pertaining to KM curves.
OK, I get the proportion vs. survival difference. Thanks for teaching me.

From the survival portion you come up with 34 events (for HR 0.38).

(I was assuming HR 0.5, which is wrong for the exponential reasons you allude to.)

Let's go with 34 events. Is this the sample size required? My understanding is that this is the number of events necessary for the power; the sample size would then depend on how many people it would take to get that many events (the baseline risk of the control). Or is this wrong as well?
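
For reference, the arithmetic behind those numbers is Schoenfeld's formula: the power requirement pins down the number of events, and the sample size is that count inflated by the expected event fraction. A minimal sketch (the 0.6 event fraction below is purely an illustrative assumption):

```python
import math
from scipy.stats import norm

def events_needed(hr, alpha=0.05, power=0.80):
    """Schoenfeld: events required by a log-rank test with 1:1 randomization."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return 4 * z ** 2 / math.log(hr) ** 2

d = events_needed(0.38)
print(math.ceil(d))  # ~34 events, matching the number quoted above

event_fraction = 0.6  # illustrative assumption; depends on follow-up and baseline risk
print(math.ceil(d / event_fraction))  # ~56 patients under that assumption
```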
 
I have moved these posts to their own thread to keep the loop together.

In regards to the actual trial, 8 Gy x 3 is too low. Trying to induce the abscopal effect is a loser; there are a lot of H&N patients where it's been proven to be a loser (MSKCC trial). Either RT EVERYTHING or RT nothing.

In regards to the statistical discussion between the Wallnerus and the Wombat:

[nerds GIF]
 
Actually, the trial design was flawed. This only works if you use Radscopal doses on a 2nd met, 1-2 Gy, combined with abscopal-range doses in the primary met. Obviously, this stuff is bulletproof and works every time, and patients love hearing these cool-sounding names. :rolleyes:

Low-dose radiation treatment enhances systemic antitumor immune responses by overcoming the inhibitory stroma | Journal for ImmunoTherapy of Cancer "Ultimately, we propose that our radiation strategy with H-XRT and L-XRT (which we call ‘RadScopal’ technique) in combination with checkpoint inhibitors, modulates the tumor microenvironment (TME) of both primary and secondary tumors to maximize systemic antitumor effects in solid tumors."
 
Let's go with 34 events. Is this the sample size required? My understanding is that this is the number of events necessary for the power; the sample size would then depend on how many people it would take to get that many events (the baseline risk of the control). Or is this wrong as well?
You are correct, good pick up....

Which is why you don't pay me to be your biostats guy. But you can pay me to be a harsh critic! From my simulated KM example above, in 39 patients I forecasted 21 events. So the probable sample size is going to lie in the 80-max range from that UCSF calculator, but again, depending on which calculator, time points, and assumptions you choose, it can be as low as the upper 30s for 0.8 power.
 
You are correct, good pick up....

Which is why you don't pay me to be your biostats guy. But you can pay me to be a harsh critic! From my simulated KM example above, in 39 patients I forecasted 21 events. So the probable sample size is going to lie in the 80-max range from that UCSF calculator, but again, depending on which calculator, time points, and assumptions you choose, it can be as low as the upper 30s for 0.8 power.
I will try to summarize what I take away from this thread.

1) Rely on professionals. Believe it or not, I have a master's in Clinical Epidemiology, but I have always believed that a professional statistician should be involved with trial design, etc. Don't follow my example; in real life, relying on web-based calculators is not the way to do things. Over the years I have had many residents who say they do their own stats, and I politely decline to be involved (on multiple occasions).

2) Effect size is frequently overestimated and frequently leads to underpowered studies. The conversation goes like this...
New PI: Based on my review of the literature, it is plausible to hypothesize an HR for the new wonderdrug of 0.85.
Stats Pro: OK that will require a sample size of 1200 patients
New PI: That is not feasible...we can't expect more than 400 patients to be accrued

...Weeks pass

New PI: Based on my review of the literature it is plausible to hypothesize a HR of 0.6.
Stats Pro: Ok that will require a sample size of 400 patients
New PI: that's the ticket!!

...Years pass

Stats Pro: The final analysis finds that the HR is 0.82 (95% CI 0.45-1.25). There were fewer events than we expected.
New PI: So there is a trend, since the 95% CI includes 0.6.
Stats Pro: Ugh... I need a new job.

I have been a witness to this conversation on multiple occasions.

3) If someone puts e^(iπ) + 1 = 0 in their info, don't engage in a math contest.

Cheers
 

Reminds me of the PEMBRO-RT trial - another negative positive trial (or positive negative trial?)...

"A statistical analysis indicated that with a sample of 74 patients, 37 in each arm, the trial would have a power of 82% with an odds ratio of 4 to detect the difference between a response rate of 20% in the control arm and 50% in the experimental arm at a 2-sided significance level of P < .10."
[images: PEMBRO-RT trial figures]
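
As a crude sanity check (a plain two-proportion normal approximation, not the odds-ratio-based calculation the investigators describe), the stated design lands in the same ballpark:

```python
import math
from scipy.stats import norm

# PEMBRO-RT design numbers: 20% vs 50% response, 37 per arm, two-sided P < .10
p1, p2, n, alpha = 0.20, 0.50, 37, 0.10
se = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)
power = norm.cdf(abs(p2 - p1) / se - norm.ppf(1 - alpha / 2))
print(f"{power:.0%}")  # ~89% here vs the 82% they report; same neighborhood
```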
 

Actually, the trial design was flawed. This only works if you use Radscopal doses on a 2nd met, 1-2 Gy, combined with abscopal-range doses in the primary met. Obviously, this stuff is bulletproof and works every time, and patients love hearing these cool-sounding names. :rolleyes:

Low-dose radiation treatment enhances systemic antitumor immune responses by overcoming the inhibitory stroma | Journal for ImmunoTherapy of Cancer "Ultimately, we propose that our radiation strategy with H-XRT and L-XRT (which we call ‘RadScopal’ technique) in combination with checkpoint inhibitors, modulates the tumor microenvironment (TME) of both primary and secondary tumors to maximize systemic antitumor effects in solid tumors."
I agree that it was flawed, and I don't think it was just related to dose or stats (though I agree both were a bit flawed). Here is my logic. What are the possible responses to IO therapy? Progressive disease, stable disease, PR, or CR. What percentage of patients achieve any response to IO? It's right in the graph: 75% progression within 10 months. What are the chances that anyone who doesn't even achieve stable disease is going to get an augmented systemic response to IO therapy? Very tiny. In other words, in unselected patients IO is unlikely to work and RT is unlikely to help. We can argue about powering, but any effect we see is arguably small, and again, I think that just reflects the fact that we were probably asking too much of radiation with that design. If you consider the biology, this makes sense. So many things have to go right to stimulate systemic immunity. In an individual tumor, radiation might be able to help with a couple, but it can't do much more than that.

I think the real question we want to answer is: does SBRT help in patients who achieve at least stable disease or a partial response to IO therapy? In other words, if we know IO is doing something, can RT give it the little push it needs to get over the hump?

Basic science experiments largely support this hypothesis. What cell lines consistently give good abscopal responses? B16F10 (a highly mutagenic murine melanoma model) and MC38 (an MLH1-deficient, MSI-mutated colorectal model), to name a few. Both also have about a 30% response rate (+/- a bit) to IO alone. What happens if you add SBRT dosing to an IO-resistant cell line like CT26 (an MSS colorectal line)? Nothing.
 
Basic science experiments largely support this hypothesis. What cell lines consistently give good abscopal responses? B16F10 (a highly mutagenic murine melanoma model) and MC38 (an MLH1-deficient, MSI-mutated colorectal model), to name a few. Both also have about a 30% response rate (+/- a bit) to IO alone. What happens if you add SBRT dosing to an IO-resistant cell line like CT26 (an MSS colorectal line)? Nothing.
Totally agree. We are shooting blindly with most of the current trial designs. If the best we have is "we know it works sometimes, but we have no predictive capacity", then perhaps we need dedicated, high-enrolling trials (with multiple current-generation genomic, immunologic, and imaging metrics) to get closer to an answer. Tons of work to be done, and inarguably foolish to declare IO+SBRT a failed idea by Ost.

The post you quoted was mostly humor, obviously.
 
I hear the argument "3 x 8 Gy is too low" a lot, but please think about the aim of the trial.

This was not an oligometastatic trial or a trial that was supposed to show that local control of irradiated mets played a role in the disease course. This was a "let's enhance the systemic effect of IO by irradiating one lesion" trial. And in this setting, 3 x 8 Gy is, based on preclinical data, an acceptable concept. It's the regimen mainly advocated by the Formenti research group.

Why is this trial important?
It is certainly flawed, because numerous patients with different tumors and associated biologies were enrolled and the sample size was small. The abscopal effect is a rare event that may occur in only a few of the patients; thus, showing its effect on PFS/OS in a small trial with a diverse population will be difficult.

The bad thing is that some of us - I presume - get referrals from med oncs for radiotherapy during IO, sometimes justified with an argument like "The mets are not really symptomatic, but the patient is on IO; maybe RT will enhance his response rate." Those referrals will drop as more and more of these phase II trials come back negative, for instance:

I do not really believe in the concept (the number needed to treat is probably high), but I do believe that treating some of those mets earlier may be good for symptom avoidance/local control.
 