Thursday June 15, 2006
Jan de Leeuw on Psychometrics. Lots of disciplines, generically discipline foo, have Foometrics. Classic Jan. Most psychometricians use SPSS, some SAS; not a lot of interaction between psychometrics and statistics (“…historical reasons, most of them silly”). Psychometric software distributed by incorporating it as modules in standard packages (SPSS, SAS, Stata); guarantees good distribution, some money, but certainly not efficient computation (e.g., CATEGORIES for CA in SPSS, CALIS for SEM in SAS, gllamm in Stata). Standalone: SEM packages like LISREL, EQS, M_PLUS, AMOS, or MLA packages like HLM, ML-WIN (standalone companies, by psychometricians, who work for universities, and so are poor, and are trying to make some money). Black-box, proprietary; machinery completely hidden (because it is proprietary); but models/programs complicated, many parameters, complicated optimizations, doubtful standard errors (simple stuff in SAS/Stata/SPSS, hence harder stuff in commercial products). Hence, R in psychometrics has advantages:
- distance to academic statistics becomes smaller
- software is more transparent, driven by interpreted code. Reproducible results are more likely
- one can teach with R. One can teach SAS, but one cannot teach with SAS, or LISREL
- Software should be free
psychoR; “let a thousand flowers bloom”. http://www.cuddyvalley.org/psychoR. Special issues of Journal of Statistical Software.
- simple and multiple correspondence analysis: MASS, FactoMineR, homals, ade4, PTAk; homals to become gifi. distaccoc, scalassoc, singlepeaked, logithom (IRT variants).
- ltm simple Rasch model (and extensions). mprobit. logistic IRT in VGAM. simple unifying algorithm for doing lots of latent variable models, including most IRT models.
- factor analysis: factanal in stats. MCMCpack (!) ordinal and mixed factor models. Related to IRT. See also homals
- three-mode analysis: PTAk, various forms of k-mode component…
- SEM: sem (R package) using the RAM specification. Needs a lot of add-ons to compete with the stand-alone SEM packages. psychoR does least squares SEM.
- multidimensional scaling: non-metric MDS in MASS, labdsv, ecodist, vegan and xgobi/ggobi. Kruskal-type least squares loss functions. Classic Torgerson metric MDS, and principal coordinate analysis (Gower) in …. psychoR has metric and non-metric least squares multidimensional scaling.
- HLM and LogLin not discussed because they are mostly outside Psychometrics…
Brian Everitt: Cluster analysis. I.J.Good 1965, why/what is cluster analysis? Last given reason, “for fun”…! Role of cluster analysis as a clinical, diagnostic tool in psychology. Thorndike, Psychometrika 1953 18:267-276. Pretty funny. Friedman and Rubin: JASA, 1968? Data-mining once a pejorative!
Break-out: Olivia Lau doing Zelig. Hmmm.
Claudio Agostinelli, Circular data, cool pictures.
Kevin Quinn, on MCMCpack. Why not use WinBUGS, JAGS, OpenBUGS? Black-box feel of WinBUGS, slow, warning messages about WinBUGS giving wrong answers. [SDJ: sure, but isn’t MCMCpack black-box-ish? This is kind of a generic tension in statistical software design, easy to use tends to “black box”…] Kevin shows the IRT setup. Example 2: work in progress, Bayes Factors for Model Comparison, different methods for marginal likelihood computation (e.g., Chib95). BayesFactor function, nice. Example 3: single block Metropolis sampling for a user-defined model; the user supplies a function that returns the log-posterior density, and their function then does the Metropolis sampling (worked example was negative binomial regression).
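To make Example 3 concrete, here is a minimal sketch of the idea: a random-walk Metropolis sampler that takes a user-supplied log-posterior function and updates all parameters in a single block. This is not MCMCpack's code or interface; the names `metropolis` and `log_post_normal`, the Gaussian proposal, and the step size are my own assumptions, and the target is a toy normal density, not the negative binomial regression from the talk.

```python
import math
import random


def metropolis(log_post, theta0, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis; all parameters are proposed jointly (one block).

    log_post: user-supplied function returning the log-posterior density
              (up to an additive constant) for a list of parameter values.
    Returns the list of draws and the acceptance rate.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_post(theta)
    draws = []
    accepted = 0
    for _ in range(n_iter):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        candidate = log_post(proposal)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
            accepted += 1
        draws.append(list(theta))
    return draws, accepted / n_iter


def log_post_normal(theta):
    """Toy target (hypothetical): normal with mean 3 and unit variance."""
    return -0.5 * (theta[0] - 3.0) ** 2


if __name__ == "__main__":
    draws, rate = metropolis(log_post_normal, [0.0], n_iter=20000, step=1.0, seed=1)
    kept = [d[0] for d in draws[2000:]]  # discard burn-in
    print("posterior mean estimate:", sum(kept) / len(kept))
    print("acceptance rate:", rate)
```

The appeal of this design is that the sampler knows nothing about the model: swapping in a different model only means writing a different log-posterior function.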
Get some IRT data from http://work.psych.uiuc.edu/irt/downloads.asp
Pat Burns: random portfolios, back-testing. Permutation tests, an amount of money in each asset (usually a constraint, such as only 100 stocks you can have a position in). Permute the amounts in each asset. 6 lines of R. Permuted portfolios have volatilities that are too high; real portfolios have constraints. E.g., non-negative weights (no shorting), weights lie in some bounds, liquidity constraints, limit constraints, threshold constraints. Random portfolios now amount to random sampling in the set of constrained portfolios. “Easy to say, hard to do”. Use a genetic algorithm, typically. 6 lines of R to several thousand lines of C code. R used as a visualization tool on the back-end.
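A rough sketch of the two ideas, in Python, not Pat's code: the naive version just permutes the amounts across assets, and the constrained version draws long-only weights with a per-asset cap. The function names and the cap are hypothetical, and I use simple rejection sampling here as a stand-in for the genetic algorithm he described; rejection only works while the constraints are loose enough that random draws often satisfy them.

```python
import random


def permuted_portfolio(amounts, rng):
    """Naive random portfolio: shuffle the money amounts across assets."""
    shuffled = list(amounts)
    rng.shuffle(shuffled)
    return shuffled


def random_constrained_weights(n_assets, max_weight, rng, max_tries=10000):
    """Long-only weights summing to 1, each at most max_weight.

    Draws uniformly from the simplex (normalised exponentials) and rejects
    draws that break the cap. A stand-in for a proper constrained sampler.
    """
    if n_assets * max_weight < 1.0:
        raise ValueError("cap too tight: weights cannot sum to 1")
    for _ in range(max_tries):
        raw = [rng.expovariate(1.0) for _ in range(n_assets)]
        total = sum(raw)
        weights = [r / total for r in raw]
        if max(weights) <= max_weight:
            return weights
    raise RuntimeError("no feasible portfolio found; constraints too tight")


if __name__ == "__main__":
    rng = random.Random(42)
    print(permuted_portfolio([100, 250, 0, 50, 600], rng))
    print(random_constrained_weights(10, 0.3, rng))
```

With realistic constraint sets (liquidity, thresholds, limits on the number of positions) the acceptance rate of this kind of rejection step collapses, which is presumably why the real thing runs to thousands of lines of C.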
And this guy just came up to me, saw that I had been blogging earlier in the day, and asked whether I was into blog aggregators. Planet Planet, he mentioned. Looks interesting.