Once you have collected data on your system or process, the next step is to determine which probability distribution best describes it. Common types of probability distribution include the discrete uniform, Bernoulli, binomial, negative binomial, Poisson, geometric, continuous uniform, normal (bell curve), exponential, gamma and beta distributions. Eliminating even a few candidates from this list makes it much faster to find the one with the highest R squared value.
Plot the data, for example as a histogram, to get a visual representation of its shape.
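A quick way to see the shape is a text histogram. The following is a minimal sketch using only the Python standard library; the function name `text_histogram`, the bin width, and the sample values are illustrative assumptions, not part of any particular tool.

```python
from collections import Counter

def text_histogram(samples, bin_width=1.0, bar_char="#"):
    """Return one 'bin start | bar' line per bin, in ascending order."""
    # Assign each sample to a bin by flooring its value to a multiple of bin_width.
    counts = Counter(int(x // bin_width) for x in samples)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        # Empty bins are kept so that gaps in the data stay visible.
        lines.append(f"{b * bin_width:8.2f} | {bar_char * counts.get(b, 0)}")
    return lines

# Hypothetical sample values, used only to illustrate the output format.
data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 6]
print("\n".join(text_histogram(data)))
```

Each row is one bin, and the length of the bar is the number of samples in that bin, so peaks, flat regions and skew are visible at a glance.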
One of the first steps in determining which distribution the data follows, and thus which type of equation to use to model it, is to rule out what it cannot be.
• If there are any peaks in the data set, it cannot be a discrete uniform distribution.
• If the data has more than one peak, it is not Poisson or binomial.
• If it has a single peak, no secondary peaks, and a gradual slope on each side, it may be a Poisson or a gamma distribution. It cannot be a discrete uniform distribution.
• If the data is symmetric, with no skew toward one side, a gamma or Weibull distribution is unlikely, although both can look nearly symmetric for some shape parameters.
• If the plotted data is flat or has a peak in the middle, it is not a geometric distribution or an exponential distribution, because both of those are highest at the start and decrease steadily from there.
• If the rate at which an event occurs varies with an environmental variable, it probably is not a Poisson distribution, which assumes a constant average rate.
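Some of the elimination rules above can be sketched in code. The example below is one possible reading of them, not a standard algorithm: it finds peaks in a list of histogram bin counts, applies the peak-based rules, and computes skewness separately so the gamma and Weibull rule can be judged by eye. The function names and the sample counts are assumptions made for illustration, and real data with sampling noise would usually need smoothing before counting peaks.

```python
def find_peaks(counts):
    """Return the index where each local maximum starts; a flat top counts once."""
    peaks, n, i = [], len(counts), 0
    while i < n:
        j = i
        while j + 1 < n and counts[j + 1] == counts[i]:
            j += 1  # extend across a plateau of equal counts
        left_lower = i == 0 or counts[i - 1] < counts[i]
        right_lower = j == n - 1 or counts[j + 1] < counts[i]
        whole_range_flat = i == 0 and j == n - 1  # perfectly even data has no peak
        if left_lower and right_lower and not whole_range_flat:
            peaks.append(i)
        i = j + 1
    return peaks

def skewness(samples):
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5

def rule_out(counts):
    """Apply the peak-based elimination rules to histogram bin counts."""
    peaks = find_peaks(counts)
    excluded = set()
    if peaks:
        excluded.add("discrete uniform")         # any peak rules out uniform
    if len(peaks) > 1:
        excluded.update({"Poisson", "binomial"})  # more than one peak
    if not peaks or any(p > 0 for p in peaks):
        # Flat data, or a peak away from the first bin.
        excluded.update({"geometric", "exponential"})
    return excluded

# Hypothetical bin counts with a single central peak.
print(sorted(rule_out([1, 3, 5, 3, 1])))
```

A skewness near zero supports the "no skew" case in the list, while a clearly positive value keeps right-skewed candidates such as gamma in play.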
After the list of candidates has been narrowed down, fit each remaining type of probability distribution to the data and calculate its R squared value. The one with the highest R squared value is the most likely match.
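One way to carry out this comparison, sketched here under stated assumptions, is to compare the observed relative frequencies with each candidate's predicted probabilities and compute R squared as 1 - SS_res / SS_tot. The example assumes non-negative integer count data, considers only Poisson and geometric candidates, and estimates their parameters from the sample mean; the function names and sample values are hypothetical.

```python
import math
from collections import Counter

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R squared is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def geometric_pmf(k, p):
    # Number of failures before the first success, k = 0, 1, 2, ...
    return (1 - p) ** k * p

def compare_fits(samples):
    """Return {candidate name: R squared} for non-negative integer samples."""
    n = len(samples)
    mean = sum(samples) / n
    counts = Counter(samples)
    ks = range(max(samples) + 1)
    observed = [counts.get(k, 0) / n for k in ks]
    predicted = {
        # Both parameters are estimated from the sample mean.
        "Poisson": [poisson_pmf(k, mean) for k in ks],
        "geometric": [geometric_pmf(k, 1 / (1 + mean)) for k in ks],
    }
    return {name: r_squared(observed, pred) for name, pred in predicted.items()}

# Hypothetical count data with a single peak near 2.
data = [0] * 2 + [1] * 5 + [2] * 7 + [3] * 6 + [4] * 3 + [5] + [6]
scores = compare_fits(data)
print(max(scores, key=scores.get), scores)
```

R squared here can be negative when a candidate fits worse than a flat line at the mean observed frequency, which is itself a useful signal that the candidate should be dropped.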
Remove one outlier data point, then recalculate R squared. If the same type of probability distribution is still the closest match, you can be more confident that it is the correct distribution to use for the data set.
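This robustness check can be sketched as follows. The example is an illustration under assumptions: it treats the value farthest from the median as the outlier, scores only Poisson and geometric candidates for non-negative integer data, and uses hypothetical function names and sample values.

```python
import math
import statistics
from collections import Counter

def r_squared_by_model(samples):
    """R squared of Poisson and geometric PMFs against observed frequencies."""
    n, mean = len(samples), statistics.fmean(samples)
    counts = Counter(samples)
    ks = range(max(samples) + 1)
    observed = [counts.get(k, 0) / n for k in ks]
    p = 1 / (1 + mean)  # geometric parameter estimated from the mean
    models = {
        "Poisson": [math.exp(-mean) * mean ** k / math.factorial(k) for k in ks],
        "geometric": [(1 - p) ** k * p for k in ks],
    }
    obs_mean = statistics.fmean(observed)
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return {
        name: 1 - sum((o - e) ** 2 for o, e in zip(observed, pred)) / ss_tot
        for name, pred in models.items()
    }

def drop_most_extreme(samples):
    """Return a copy without the single value farthest from the median."""
    med = statistics.median(samples)
    worst = max(range(len(samples)), key=lambda i: abs(samples[i] - med))
    return samples[:worst] + samples[worst + 1:]

def best_fit_is_stable(samples):
    """Compare the best-scoring model before and after removing one outlier."""
    full = r_squared_by_model(samples)
    trimmed = r_squared_by_model(drop_most_extreme(samples))
    best_full = max(full, key=full.get)
    best_trimmed = max(trimmed, key=trimmed.get)
    return best_full, best_trimmed, best_full == best_trimmed

# Hypothetical count data with one extreme value (12).
data = [0] * 2 + [1] * 5 + [2] * 7 + [3] * 6 + [4] * 3 + [5] + [12]
print(best_fit_is_stable(data))
```

If the two names differ, the ranking depends heavily on a single point, and the choice of distribution deserves a closer look before it is used to model the data.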