Tools such as SAS Visual Analytics use terms like outlier and maximum, and I wasn’t sure of the difference.
Today I was looking at box plots, and while I could see the difference between an outlier and a maximum, I didn’t really know how each is calculated. Here goes…
While mean, median and percentile are fairly clear, I’m not sure how the maximum and outlier are calculated.
SAS defines the maximum and outlier as:
Outlier: an observation outside the lower and upper fences. The fences are located at a distance of 1.5 times the interquartile range (IQR = Q3 – Q1) above and below the box, that is, above Q3 and below Q1.
Max: maximum data value less than or equal to the upper fence.
Where:
IQR (inter quartile range) = Q3 – Q1
Q1 – 1st quartile (25th percentile). The data must contain a nonmissing value for this quartile.
Q3 – 3rd quartile (75th percentile). The data must contain a nonmissing value for this quartile.
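To make the definitions above concrete, here is a minimal Python sketch of the fence, maximum and outlier calculation. It is not SAS's implementation. The function names are made up for illustration, the quartiles use simple linear interpolation (SAS may use a different percentile definition, so Q1 and Q3 can differ slightly), and the whisker minimum is assumed to mirror the definition of Max.

```python
def quartile(sorted_values, fraction):
    """Percentile by linear interpolation between neighbouring values.

    Assumption: this interpolation rule is one common choice, not
    necessarily the one SAS uses.
    """
    position = (len(sorted_values) - 1) * fraction
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def box_plot_summary(values):
    """Return quartiles, fences, whisker ends and outliers for a box plot."""
    data = sorted(values)
    q1 = quartile(data, 0.25)
    q3 = quartile(data, 0.75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Observations inside the fences; everything else is an outlier.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "min": min(inside),  # assumed mirror of Max: smallest value >= lower fence
        "max": max(inside),  # largest value <= upper fence
        "outliers": outliers,
    }


# Made-up data: nine ordinary values and one extreme value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(summary["upper_fence"])  # 14.5
print(summary["max"])          # 9
print(summary["outliers"])     # [50]
```

With this data the largest raw value is 50, but the box plot "Max" is 9, because 50 lies above the upper fence and is drawn as an outlier instead.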
I’m doing a part-time course in Data Analytics at the University of Brighton. Seems like a good excuse to post some stuff here and see if my site gets a new lease of life.
I’ll probably post things on here that I need to know.
This is an example plot of Google Analytics (GA) data stored in JSON format. The data is plotted using d3.js. Have a look at the gallery. I don’t think I have done d3.js justice with my line charts.
Using data from Google Analytics (see other posts), a Fast Fourier Transform is performed on the time series data. The absolute value (magnitude) of each result is saved for plotting using d3.js.
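As a rough illustration of the "transform, then take the absolute value" step, here is a small Python sketch. It uses a naive discrete Fourier transform rather than a real FFT (same result, just slower), and the visit counts are synthetic numbers with a weekly cycle, not actual GA data. The function name is hypothetical.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Naive O(n^2) discrete Fourier transform, returning |X[k]| for each bin.

    A real FFT computes the same values faster; this version only
    illustrates what is being calculated.
    """
    n = len(samples)
    magnitudes = []
    for k in range(n):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total))
    return magnitudes


# Synthetic daily visits over four weeks with a 7-day cycle (made-up data).
visits = [100 + 20 * math.sin(2 * math.pi * day / 7) for day in range(28)]
magnitudes = dft_magnitudes(visits)

# Ignore bin 0 (the overall level) and look for the strongest cycle
# in the first half of the spectrum.
peak_bin = max(range(1, len(magnitudes) // 2), key=lambda k: magnitudes[k])
print(peak_bin)                # 4 cycles in 28 days
print(len(visits) / peak_bin)  # 7.0 days per cycle
```

The magnitudes list is the kind of array that could then be written out as JSON for plotting.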
I’m doing a talk at Big Data Brighton on April 25th. I will add the slides afterwards, but for now I’ll put up the MATLAB .m scripts. All at: https://github.com/espeecat/big-data-brighton-april-2013
Slides for Time Series Data and FFT
This shows a single sinusoid and its Fourier Transform.
This is the example which takes several sinusoids, sums them and shows the Fourier Transform of each and of the sum.
This is the filtering example.