Wednesday, January 11, 2017

Mastering Social Media Mining 2 - Mining Twitter Hashtags, Topics, and Time Series

The curve that we observe approximates a power law (https://en.wikipedia.org/wiki/Power_law).

In statistics, a power law is a functional relationship between two quantities in which one varies as a power of the other; in this case, the two quantities are the frequency of a term and its position within the ranking of terms by frequency.
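As a rough illustration of this relationship, the sketch below ranks terms by frequency and estimates the slope of log(frequency) against log(rank); a power law appears as an approximately straight line on such a log-log plot. The function names `rank_frequency` and `loglog_slope` and the toy term list are assumptions made for this example, not part of the original analysis, and a least-squares fit on log-log data is only a quick diagnostic rather than a rigorous estimate of the exponent.

```python
import math
from collections import Counter


def rank_frequency(terms):
    """Return (rank, frequency) pairs; rank 1 is the most frequent term."""
    counts = Counter(terms)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in pairs]
    ys = [math.log(freq) for _, freq in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical toy data: the term at rank r occurs 120 / r times.
terms = []
for rank, term in enumerate(["a", "b", "c", "d", "e", "f"], start=1):
    terms.extend([term] * (120 // rank))

pairs = rank_frequency(terms)
slope = loglog_slope(pairs)
print(pairs)
print(round(slope, 3))
```

Because the toy frequencies are exactly inversely proportional to rank, the estimated slope is -1; real term frequencies from tweets would only approximate a straight line.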

This type of distribution always shows a long tail (https://en.wikipedia.org/wiki/Long_tail): a small number of very frequent items dominates the distribution, while a large number of items occur with much lower frequencies.

This phenomenon is often summarized by the 80-20 rule, or Pareto principle (https://en.wikipedia.org/wiki/Pareto_principle), which states that roughly 80% of the effects come from 20% of the causes (in our context, roughly 20% of the unique terms account for about 80% of all term occurrences).
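One way to check how closely a given set of terms follows the 80-20 rule is to compute the share of all occurrences covered by the most frequent 20% of unique terms. The following is a minimal sketch; the helper `top_share` and the sample counts are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
from collections import Counter


def top_share(terms, fraction=0.2):
    """Share of all occurrences covered by the top `fraction` of unique terms."""
    counts = Counter(terms)
    if not counts:
        raise ValueError("terms must not be empty")
    freqs = sorted(counts.values(), reverse=True)
    # Number of unique terms in the top slice (at least one).
    k = max(1, int(len(freqs) * fraction))
    return sum(freqs[:k]) / sum(freqs)


# Hypothetical counts: 10 unique terms, 100 occurrences in total.
sample_counts = {
    "python": 50, "data": 30, "mining": 4, "twitter": 4, "hashtag": 3,
    "topic": 3, "series": 2, "time": 2, "social": 1, "media": 1,
}
terms = [term for term, n in sample_counts.items() for _ in range(n)]

share = top_share(terms)
print(share)
```

Here the top 2 of the 10 unique terms account for 80 of the 100 occurrences, so the share is 0.8. On real data the figure will rarely be exactly 80%, since the rule is a rough guideline rather than a fixed property.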

