Random

Statistical randomness refers to a process of selection in which each item in a given set is intended to have an equal probability of being chosen. A sequence or set of outcomes is said to be statistically random when it contains no recognizable patterns or regularities. Statistical randomness is important because a large part of statistics involves using smaller samples to represent an entire population. Formally, statistical randomness is defined in terms of random variables: functions that assign a numerical value to each potential outcome in a given sample space (the set of all possible outcomes of an experiment).

Random sampling

Random sampling refers to specific, rigorous procedures for randomly selecting a subset of individuals (the sample) from a larger set (the population) so that the subset is an unbiased representation of that population. Ideally, a study would collect data from the entire population of interest, but this is often very difficult or prohibitively expensive. Samples are therefore typically used to make inferences about the population as a whole. Even when a sample is properly collected, it is only an estimate of the population and can never represent it fully, so there will always be some degree of sampling error.

In order for a sample to most effectively represent a population, the sampling methods used need to ensure that the selection process is random. Note that random sampling refers to the selection process, not the particular observations made about the sample. If the sampling methods used are not adequate, any inferences or conclusions made based on a flawed sample may not be valid.

Generating random samples

The most basic sampling technique is simple random sampling, in which a group of subjects (the sample) is selected from a larger group (the population) in such a way that each member of the population has an equal chance of being selected for the sample.

One of the simplest (but potentially tedious) ways to obtain a simple random sample is by using a lottery. For example, if there are 15 students in a classroom, and we wanted to select 5 of them at random, we could write each of their names on a separate piece of paper, and put all the names in a hat. We can then select 5 names from the hat to generate our random sample. In this case, there are only 15 students in the classroom, so it wouldn't have been difficult to just use the entire population, or have a larger sample size. However, as the population and sample get larger and larger, this method of selecting a sample becomes more tedious. In such cases, it is common to use computers or calculators to generate the random sample.

Using the same example as above, to select a sample using a computer and a random number generator, we would first assign each of the 15 students a number. One way to do this would be to alphabetize the students' last names, then assign a number (01-15) to each student. In the table below, numbers are assigned to the first 5 students on the list:

Cline	01
Gonzalez	02
Ibarra	03
Morse	04
Nguyen	05
⋮	⋮

We would continue assigning numbers to the remaining 10 students, then use a random number generator that produces values from 01 to 15 to pick 5 distinct numbers (redrawing if a number repeats), and finally select the 5 students with the corresponding numbers. This method scales far better than the lottery, since a computer can handle much larger populations than is practical by hand.
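As a minimal sketch of this procedure, the Python snippet below draws 5 distinct numbers from 01-15 using the standard library's random module. The function name draw_simple_random_sample and the seed value are illustrative assumptions, not part of the original example; the seed is only there so the draw can be repeated.

```python
import random


def draw_simple_random_sample(population_size, sample_size, seed=None):
    """Return sample_size distinct numbers drawn from 1..population_size.

    random.sample draws without replacement, so no number can repeat and
    every number has the same chance of being selected.
    """
    rng = random.Random(seed)  # seed is an illustrative assumption
    numbers = range(1, population_size + 1)
    return sorted(rng.sample(numbers, sample_size))


# 15 students numbered 01-15, sample of 5 (seed chosen arbitrarily).
chosen = draw_simple_random_sample(15, 5, seed=2024)
print([f"{n:02d}" for n in chosen])
```

The students whose assigned numbers appear in the printed list would form the sample. Leaving seed as None gives a different draw on each run.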

Sampling error

Sampling error is the error that occurs when the statistical characteristics of a population are estimated using a sample of that population. It arises because a sample does not include all the members of the population being studied, so the statistical characteristics of the sample generally differ from those of the population as a whole. Because the true population values are typically not known, it is usually not possible to measure sampling error exactly.
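To make the idea concrete, the sketch below uses a simulated population so that the true mean is known, which is rarely the case in practice. The population values, the sample size of 100, and the seed are all made-up assumptions for illustration, not real data.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed (arbitrary) so the sketch is repeatable

# Simulated population of 10,000 made-up values; in a real study the
# full population would usually not be available.
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

# Simple random sample of 100 members, drawn without replacement.
sample = rng.sample(population, 100)
sample_mean = statistics.fmean(sample)

# Sampling error of the mean: sample estimate minus true population value.
sampling_error = sample_mean - population_mean

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sampling error:  {sampling_error:+.2f}")
```

The sampling error here is nonzero even though the sample was drawn correctly, which is the point of the definition above.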

It is possible to reduce sampling error by increasing the size of the sample. Generally, the larger the sample size, the more representative the sample is of the population, and the smaller the sampling error. However, it is possible for a sample size to be too large. With a very large sample, even a very small effect that is of no practical importance to the study can become statistically significant and lead to the rejection of the null hypothesis.
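The relationship between sample size and sampling error can be checked with a small simulation. The sketch below is written under stated assumptions: the population is simulated (uniform values between 0 and 100), the helper mean_abs_sampling_error is a hypothetical name, and the sample sizes, trial count, and seed are arbitrary choices.

```python
import random
import statistics


def mean_abs_sampling_error(population, sample_size, trials, rng):
    """Average absolute error of the sample mean over repeated samples."""
    population_mean = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # without replacement
        errors.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(errors)


rng = random.Random(1)  # arbitrary seed for repeatability

# Simulated population of 5,000 made-up values (not real data).
population = [rng.uniform(0, 100) for _ in range(5_000)]

# Compare the typical error for three illustrative sample sizes.
results = {
    n: mean_abs_sampling_error(population, n, trials=200, rng=rng)
    for n in (10, 100, 1000)
}

for n, error in results.items():
    print(f"sample size {n:>4}: mean absolute sampling error {error:.3f}")
```

Under these assumptions the average error shrinks as the sample size grows, although any single sample can still land closer to or farther from the population mean than this average suggests.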