If I have data with three dimensions (2 sample locations, 2 treatments, and 2 sources) and I need to make it easy to compare the two treatments to one another in the context of paired sample locations and sources, what's the best way to do it? Oh, yeah, the data spans 6 logs.
I currently have 12(!) bar graphs to separate out the high, low, and middle of the dynamic ranges, with a set comparing treatments within one location and a set comparing treatments within another location, so 3x2x2 graphs. There's gotta be a better way.
- Mr. Gunn
What do your data look like? Can we see one or two rows of data?
- Pierre Lindenbaum
Sure, let me throw it into a Google spreadsheet.
- Mr. Gunn
Looking at your data... first (stupid) observation: wouldn't it be easier to handle in a table with four columns: Treatment, Property-Name, Place, Result?
- Pierre Lindenbaum
... with those four columns you could upload your data into a SQL engine and make aggregates (group by, sum, mean...)
- Pierre Lindenbaum
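As a rough illustration of the four-column layout and a group-by aggregate, here is a minimal Python sketch. The rows, the labels (control/treated, siteA/siteB, prop1/prop2) and the helper `group_mean` are all made up for the example; they are not the data from the spreadsheet, and a real SQL engine would do the same job with GROUP BY.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical long-format rows: (treatment, property, place, result).
# The values are invented for illustration; they are not the thread's data.
rows = [
    ("control", "prop1", "siteA", 1200.0),
    ("treated", "prop1", "siteA", 1800.0),
    ("control", "prop1", "siteB", 1000.0),
    ("treated", "prop1", "siteB", 1400.0),
    ("control", "prop2", "siteA", 0.02),
    ("treated", "prop2", "siteA", 0.03),
    ("control", "prop2", "siteB", 0.04),
    ("treated", "prop2", "siteB", 0.05),
]


def group_mean(rows, key_indexes):
    """Average the result column within each combination of the key columns."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        groups[key].append(row[3])
    return {key: mean(values) for key, values in groups.items()}


# Similar in spirit to:
#   SELECT property, treatment, AVG(result) ... GROUP BY property, treatment
by_property_treatment = group_mean(rows, key_indexes=(1, 0))
for key, value in sorted(by_property_treatment.items()):
    print(key, value)
```

Once the data is in this shape, changing which columns you group by is a one-argument change, which is most of the appeal of the long layout.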
Thanks, Pierre. I guess I asked a vague question, maybe because I'm not really sure what I want. I want to display the quantitative differences between the treatment groups for each property, and it's hard to display the differences in a compact manner with the values spanning such a large range. I've tried taking the log of the data, but that makes it harder to read and not nearly as clear.
- Mr. Gunn
Linear model/ANOVA with terms for the categories you are looking at? Or PCA across the matrix, zero out the component(s) specifying location, and home in on treatment-informative axes (Tracy-Widom stats are useful here). LM is certainly simpler: upload as a table into R, then go: lm(prop1 ~ location + source + treatment, data=myTable). Look at the qqnorm() for the residuals, which should lie close to the reference line (qqline()) if the residuals are roughly normal; marked curvature suggests the model is off, e.g. a predictor's effect isn't linear. In that case you will have to do adaptive fitting of the explanatory variables (using something like gam() from the mgcv package).
- Chris Cotsapas
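To make the main-effects model a bit more concrete, here is a plain-Python sketch of what a fit like lm(prop1 ~ location + source + treatment) does under the hood. It is not the R call itself: the 0/1 coding, the invented responses, and the helpers `solve` and `fit_ols` are assumptions for illustration, and the plotting position used for the normal quantiles is a simple one that is not necessarily what R's qqnorm() uses.

```python
from itertools import product
from statistics import NormalDist


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical 2 x 2 x 2 design, coded 0/1: (location, source, treatment).
design = list(product([0, 1], repeat=3))

# Invented responses: 10 + 2*location + 1*source + 5*treatment + small noise.
noise = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]
y = [10 + 2 * l + 1 * s + 5 * t + e for (l, s, t), e in zip(design, noise)]

# Design matrix: intercept column plus one column per main effect.
X = [[1, l, s, t] for l, s, t in design]
beta = fit_ols(X, y)

fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Pairs for a normal Q-Q plot: theoretical quantile vs. sorted residual.
n = len(residuals)
theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
qq_pairs = list(zip(theoretical, sorted(residuals)))

print("intercept, location, source, treatment:", [round(b, 3) for b in beta])
```

The treatment coefficient is the quantity of interest here: it is the estimated treatment effect after accounting for location and source. Plotting `qq_pairs` gives the same kind of diagnostic as qqnorm() on the residuals.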
If you then want to compare properties apples-to-apples, you can convert them into Z scores (i.e. standardize): for each property P and each sample i, compute (P_i - mean(P)) / sd(P).
- Chris Cotsapas
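A minimal Python sketch of that standardization, assuming each property is held as a plain list of values. The property names and numbers are invented, and `z_scores` is a hypothetical helper; it uses the sample standard deviation (n - 1 denominator), which is also what R's sd() computes.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize: subtract the mean, then divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample SD with an n - 1 denominator
    return [(v - m) / s for v in values]


# Hypothetical properties on very different scales (values invented).
properties = {
    "prop1": [1200.0, 1800.0, 1000.0, 1400.0],
    "prop2": [0.02, 0.03, 0.04, 0.05],
}

# Each property is standardized on its own, so units never mix.
standardized = {name: z_scores(vals) for name, vals in properties.items()}

for name, vals in standardized.items():
    print(name, [round(v, 3) for v in vals])
```

After this step every property has mean 0 and standard deviation 1, so they can share one axis, at the cost of no longer being in their original units.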
Chris, I thought about PCA, but I thought it might be a bit overkill, since I already know the main important dimensions. I'm a bit limited on how I can do transformations of the data, too, since they all have their own characteristic units and the numbers will be interpreted in the context of the known normal ranges.
- Mr. Gunn
lm() should still work. Each prop is modelled separately, so units aren't a problem.
- Chris Cotsapas
Props to @JATetro for pointing out that I just needed to take the relative differences, which collapses everything into the same range, making it graphable on the same chart.
- Mr. Gunn
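For anyone wanting to try the relative-difference idea, a minimal Python sketch follows. It assumes the difference is taken as percent change of the treated value against its paired control; the thread does not say which group is the baseline, so that choice, the keys, and the numbers are all hypothetical.

```python
def relative_difference(treated, control):
    """Percent change of the treated value relative to its paired control."""
    return 100.0 * (treated - control) / control


# Hypothetical paired measurements keyed by (property, location, source).
# Values are invented and deliberately span several orders of magnitude.
pairs = {
    ("prop1", "siteA", "src1"): {"control": 1200000.0, "treated": 1500000.0},
    ("prop1", "siteB", "src1"): {"control": 1000000.0, "treated": 1100000.0},
    ("prop2", "siteA", "src1"): {"control": 4.0, "treated": 3.0},
    ("prop2", "siteB", "src1"): {"control": 8.0, "treated": 10.0},
}

# One number per pair, all on the same percent scale regardless of raw units.
percent_change = {
    key: relative_difference(v["treated"], v["control"])
    for key, v in pairs.items()
}

for key, value in sorted(percent_change.items()):
    print(key, f"{value:+.1f}%")
```

Because each control/treated pair collapses to a single percentage, properties measured in millions and properties measured in single digits end up on one chart.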
I was going to suggest taking logs to achieve the same end, but relative diffs is better. Again (http://friendfeed.com/the-lif...), kudos all round: this is a great example of community in action.
- Bill Hooker
Yes, I was thrilled to have so many good suggestions from skilled people. Bill - I tried the log thing, but it makes the comparisons less visually apparent. Taking %diff messes up the units, so I'll have to put that in as a table or something, but the good thing about it is that it cuts the number of data points in half.
- Mr. Gunn
@chris I think the R function scale(matrix) takes a similar approach to standardising data: by default it centres each column and divides by its standard deviation (it uses the root mean square instead only when center=FALSE). You can use it on a whole data matrix rather than looping over each variable.
- Michael Barton
I agree with Chris's recommendation to standardise the data if you're doing any modelling, as this can prevent individual parameters with very large variation from biasing model estimation.
- Michael Barton