Descriptive statistics summarize and describe the characteristics of a data set, while inferential statistics use sample data to make predictions or inferences about a larger population. Descriptive statistics include measures of central tendency (mean, median, mode) and measures of variability (standard deviation, variance), while inferential statistics involve hypothesis testing and estimation techniques.
To calculate the population standard deviation, use the formula:
σ = √(Σ(x - μ)² / N)
Where σ is the standard deviation, x is each value in the data set, μ is the mean, and N is the number of values.
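The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `population_std` and the sample values are assumptions made for the example, and the standard library's `statistics.pstdev` is used only as a cross-check.

```python
import math
import statistics


def population_std(values):
    """Population standard deviation: sqrt(sum((x - mu)^2) / N)."""
    n = len(values)
    mu = sum(values) / n  # the mean, written as mu in the formula
    squared_deviations = sum((x - mu) ** 2 for x in values)
    return math.sqrt(squared_deviations / n)


# Illustrative values only (not taken from the text).
data = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = population_std(data)

print(sigma)                    # 2.0
print(statistics.pstdev(data))  # standard-library cross-check
```

The mean of these values is 5 and the squared deviations sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.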
The coefficient of variation (CV) is a measure of relative variability that expresses the standard deviation as a percentage of the mean. It's calculated as:
CV = (Standard Deviation / Mean) * 100
The CV is useful for comparing the variability of data sets with different units or vastly different means.
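As a rough sketch of why the CV helps with such comparisons, the example below uses two made-up data sets with the same standard deviation but different means. The function name, the variable names and the values are assumptions for illustration, and the population standard deviation is used because that is the formula given in the text.

```python
import statistics


def coefficient_of_variation(values):
    """CV = (standard deviation / mean) * 100, using the population SD."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean * 100


# Hypothetical measurements in different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

print(f"heights: CV = {cv_heights:.1f}%")
print(f"weights: CV = {cv_weights:.1f}%")
```

Both lists have the same spread in absolute terms, but the weights vary more relative to their mean, so their CV is larger.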
Both t-tests and ANOVA (Analysis of Variance) are used to compare means, but:
T-tests are appropriate for comparing two means, while ANOVA compares three or more groups in a single test and so avoids the inflated risk of Type I errors that comes from running many separate pairwise t-tests.
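The Type I error point can be illustrated with a little arithmetic. The sketch below assumes that every pair of groups is compared with its own test at the 5% level and that those tests are independent. Pairwise tests on shared groups are not truly independent, so the result is an approximation. The function name is hypothetical.

```python
from math import comb


def familywise_error_rate(num_groups, alpha=0.05):
    """Approximate chance of at least one false positive when every pair
    of groups is compared separately (assumes independent tests)."""
    comparisons = comb(num_groups, 2)  # number of pairwise t-tests
    return 1 - (1 - alpha) ** comparisons


for groups in (2, 3, 4, 5):
    rate = familywise_error_rate(groups)
    print(f"{groups} groups, {comb(groups, 2)} comparisons: {rate:.3f}")
```

With two groups there is a single comparison and the rate stays at 5%. With four groups there are six comparisons and the approximate rate rises above 26%, which is the inflation a single ANOVA avoids.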
Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It involves:
This process helps researchers determine if there is enough evidence to support a particular claim about a population parameter.
Both Pearson and Spearman correlation coefficients measure the strength and direction of association between two variables, but:
Spearman correlation works on ranks, so it is more robust to outliers and captures monotonic relationships that are not linear, whereas Pearson correlation measures only linear association.
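The difference can be seen on a small made-up data set in which y always increases with x but not along a straight line. The sketch below computes Spearman as Pearson applied to ranks; the helper names and the data are assumptions for illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic but non-linear

print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

Because the ordering of y matches the ordering of x exactly, the Spearman coefficient is 1, while the Pearson coefficient falls below 1 because the points do not lie on a straight line.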
The normal distribution, also known as the Gaussian distribution, is crucial in statistics because:
Popular statistical software packages include:
Each software has its strengths and is used in various fields for different types of statistical analyses.
Clinical studies aim to gain evidence that a treatment leads to an improved health outcome. Patients and diseases are complex, and patients respond in different ways to treatment.
The value of a treatment cannot be judged from a single patient, as there are many factors other than the treatment that determine the outcome, and it is only by measuring the effect on many people that an understanding is achieved.
The most powerful method is the randomized trial, in which patients are randomly allocated to groups that receive different treatments. This helps to balance these other factors between groups and removes selection bias. For example, clinicians may believe that certain patients will benefit more from the new treatment, and if those patients are chosen to receive it, this could create bias.
It might be that these patients were going to improve without the treatment being tested, and if these results are analysed, a spurious conclusion that patients benefit from the treatment could be reached. Another way to reduce bias, where it is possible, is blinding: neither the patients nor the doctors treating them know who received the trial treatment.
In a drug trial, blinding can be achieved with an identical-looking tablet or injection, which is called a placebo. Some trial outcomes are subjective, and belief in the intervention by the clinician or the patient could improve the outcome; this is called the placebo effect.
The number of patients needed depends on the magnitude of the treatment's effect on the variable being measured, relative to the magnitude of the other factors that lead to variability. The question that is usually asked is: is one treatment better than the other, in either direction? But better may not be enough if the difference is very small.
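How patient numbers depend on effect size and variability can be sketched with a common normal-approximation formula for comparing two group means, n per group = 2 × ((z for alpha + z for power) × SD / effect)². This formula is an assumption brought in for illustration and is not stated in the text. The function name, the 80% power and the numerical values are likewise hypothetical.

```python
import math
from statistics import NormalDist


def patients_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a difference in
    means of size `effect` when the outcome has standard deviation `sd`
    (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Same variability, but the second effect is half the size of the first.
n_large_effect = patients_per_group(effect=5.0, sd=10.0)
n_small_effect = patients_per_group(effect=2.5, sd=10.0)

print(n_large_effect, n_small_effect)
```

Halving the effect relative to the variability roughly quadruples the number of patients required, which is why small treatment effects need large trials.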
A very small difference may not be worthwhile, as a treatment may have other downsides, including safety, practicality, and cost. The question then is whether the upside, the measured benefit, is clinically significant. This is different to being statistically significant.
Statistically significant means that the difference, no matter how small, was unlikely to have happened by chance. An intervention could be clinically significant but not statistically significant, meaning that it looks promising but not enough information has been gathered yet to know that what has been measured so far is genuine; it may still have arisen by chance.
The most commonly used threshold in medicine is 5%: the probability of a difference at least as large as the one measured arising purely by chance, with no treatment effect, is 5% or less. If this threshold is passed, then it is accepted that the treatment has an effect.
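The 5% threshold, and the idea that a promising difference may not yet be statistically significant, can be illustrated with a simple two-proportion z-test. This test is an assumption chosen for the example, and the function name and patient counts are invented. The same 15-point difference in response rate is checked in a larger and a smaller trial.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """Two-sided p-value for a difference between two proportions
    (pooled z-test, normal approximation)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


THRESHOLD = 0.05  # the 5% threshold described in the text

# Hypothetical trials: 60% vs 45% responders in both cases.
p_large_trial = two_proportion_p_value(60, 100, 45, 100)
p_small_trial = two_proportion_p_value(12, 20, 9, 20)

for label, p in (("100 per group", p_large_trial), ("20 per group", p_small_trial)):
    verdict = "significant" if p <= THRESHOLD else "not significant"
    print(f"{label}: p = {p:.3f} ({verdict})")
```

The measured difference is identical in both trials, but only the larger one passes the threshold; the smaller one has not gathered enough information to rule out chance.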
It is possible to ask other questions when assessing a novel treatment. Instead of asking whether one treatment is different to the other, it is possible to ask whether the new treatment is no worse than the existing one by more than a margin equivalent to a minimal clinically significant difference. A trial designed around this question is called a non-inferiority trial.
The rationale for this is that a new treatment may not be better than the existing treatment for the main or primary outcome, but it may have some other advantages like fewer side effects or cost benefits. With the passage of time, it may be harder to prove that newer treatments are better than existing ones.
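One common way to frame the non-inferiority question is with a confidence interval: the new treatment is declared non-inferior if the lower limit of the interval for the difference (new minus existing) lies above the negative of the margin. The sketch below assumes a binary outcome, a normal-approximation interval and a two-sided 95% level. None of these choices comes from the text, and the function name, counts and margins are hypothetical.

```python
import math
from statistics import NormalDist


def non_inferiority_check(success_new, n_new, success_old, n_old,
                          margin, confidence=0.95):
    """Return (is_non_inferior, lower_limit) for the difference in
    success proportions, new minus existing treatment."""
    p_new, p_old = success_new / n_new, success_old / n_old
    diff = p_new - p_old
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower_limit = diff - z * se
    return lower_limit > -margin, lower_limit


# Hypothetical trial: 85% success on the new treatment, 86% on the existing one.
wide_margin_ok, lower = non_inferiority_check(170, 200, 172, 200, margin=0.10)
tight_margin_ok, _ = non_inferiority_check(170, 200, 172, 200, margin=0.05)

print(f"lower limit of difference: {lower:.3f}")
print(f"non-inferior with a 10-point margin: {wide_margin_ok}")
print(f"non-inferior with a 5-point margin:  {tight_margin_ok}")
```

The same data support non-inferiority with the wider margin but not with the tighter one, which shows how much the conclusion depends on what is accepted as the minimal clinically significant difference.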
An alternative question is: is the new treatment better than an existing one by at least a minimal clinically significant margin? This question is explored in the paper posted here: Power for Clinical and Statistical Significance.
The aim is to design studies which are both clinically and statistically significant. One of the difficulties with the latter two questions is that in practice it may not be clear what constitutes the minimally significant difference for patients.
It may not have been previously addressed, and there may be no universal answer to this value, as it might differ between healthcare systems and cultures and change with time. However, if a difference of this size is proven, then the standard question, 'is there any difference at all?', is also answered.