Wikipedia defines statistics as follows: "Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data." It distinguishes between two statistical methods: descriptive statistics and inferential statistics.

Descriptive statistics summarizes a set of data with summary measures. Familiar examples include the mean, mode, variance and standard deviation. A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features of a collection of information.

Inferential statistics goes further than descriptive statistics, because we draw inferences from the data. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It assumes that the observed data set is sampled from a larger population.

Notation and the problem

As we're only human and finite, we cannot observe every object and every event there is to observe in the universe. We only observe a finite number of items in our lifetime.

Let's take the example of apples. Let's say you go to the store and buy all the apples there are to buy (don't you just love it when people start math problems this way?). Each apple will have a specific colour and weight (size). Let's designate the colour of the ith apple as $x_{i1}$ and its weight as $x_{i2}$. Hence, each apple (for our use case) can be described as

$$x_i = (x_{i1}, x_{i2})$$

and we can describe the set of all the apples we bought as

$$\{x_1, x_2, \ldots, x_n\}$$

The subscript i on each x signifies the ith apple of the n apples we bought. If we bought 4 apples, we'll have four observations, $x_1$ to $x_4$.

The mean as an example

The mean (or average) is rather straightforward. It is the sum of all the weights of the apples we bought, divided by the number of apples:

$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_{i2}$$
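To make the definition of the mean concrete, here is a minimal Python sketch that computes the sample mean for a small sample. The four weights are hypothetical values (in grams) invented for illustration, and `sample_mean` is a helper name introduced here, not something from the text. The standard library's `statistics.mean` is used only as a cross-check.

```python
from statistics import mean


def sample_mean(xs: list[float]) -> float:
    """Sample mean: the sum of the observations divided by their count, n."""
    if not xs:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(xs) / len(xs)


# Hypothetical weights (grams) of the n = 4 apples we bought; made-up data.
weights = [150.0, 172.5, 160.0, 137.5]

x_bar = sample_mean(weights)
print(x_bar)  # 155.0

# The standard library computes the same descriptive statistic.
assert x_bar == mean(weights)
```

If we read the same number as an estimate of the population mean, the arithmetic doesn't change. Only our interpretation of the result does.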
The mean (statistic) of the sample we bought is represented by an x with a bar on top, $\bar{x}$. This is descriptive statistics. The other symbol, the μ with a hat on top ($\hat{\mu}$), indicates that we're using $\bar{x}$ as an estimator for the actual mean of all the apples in the world.

Think about it. Will you ever be able to buy all the apples in the world to sum their weights and divide by the total number? Of course not. The best we can do is get a sample, take the sample average and infer from this average that the mean weight of the world's apples is approximately $\bar{x}$. Hence, $\hat{\mu} = \bar{x} \approx \mu$, where μ with no hat signifies the actual population mean (that is, the true mean weight of all the world's apples).

What about categorical issues?

We now note that the colour of an apple is not as easily quantifiable as the weight. How can we describe the apple's colour? Consider the indicator function below, which records whether the ith apple is red:

$$x_{i1} = \begin{cases} 1 & \text{if apple } i \text{ is red} \\ 0 & \text{otherwise} \end{cases}$$
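The indicator function can be written directly in code. In this minimal sketch, the list of colours is hypothetical and `is_red` is an illustrative helper name. Summing the indicator values counts the red apples, and dividing that count by n gives the sample proportion.

```python
# Hypothetical colours of the n = 4 apples we bought; made-up data.
colours = ["red", "green", "red", "yellow"]


def is_red(colour: str) -> int:
    """Indicator x_{i1}: 1 if the apple is red, 0 otherwise."""
    return 1 if colour == "red" else 0


x_i1 = [is_red(c) for c in colours]  # one indicator value per apple
n = len(colours)
r = sum(x_i1)   # count of red apples in the sample
p_hat = r / n   # sample proportion of red apples

print(x_i1, r, p_hat)  # [1, 0, 1, 0] 2 0.5
```

Encoding a category as 0 or 1 lets tools built for numbers, such as the sum and the average, say something about colour.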
If we sum the $x_{i1}$ values across our n apples, we get the number of red apples in our sample. Dividing that count by n gives the sample proportion:

$$r = \sum_{i=1}^{n} x_{i1}, \qquad \hat{p} = \frac{r}{n}$$

So r indicates how many red apples we have in our sample, and $\hat{p}$ is an estimator for p (the true proportion of red apples across all the apples in the world), in the same way that $\hat{\mu}$ was an estimator for μ. r is therefore a descriptive statistic, and $\hat{p}$, used as an estimate of p, is an inferential statistic.

Conclusion

Pretty neat, hey? Now I want your mind to start working on the assumptions that underlie the items discussed above, especially with regard to inferential statistics. We won't discuss them yet, but we'll do so later in the series.

Coming up:

- Probability distributions
- Common probability distributions
- Limitations of statistics and underlying assumptions
- The problem of induction
- Chance and sovereignty of God
- ...

Statistics, likelihoods, and probabilities mean everything to men, nothing to God.