Brian proposed his dissertation today. How do you indicate “doctoral candidate”? “PhD(c)”? “PhD ABD”?
I have been working on a reliable optimization method for this crazy function.
f.egg<-function(x,y){
(2+cos(x)+cos(y))/(100+x^2+y^2)
}
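A quick way to see why this function is hard to optimize is to evaluate it at a few points. The sketch below repeats the definition so it runs on its own; the evaluation points are chosen here only for illustration. The origin is the global maximum, because the numerator can be at most 4 and the denominator is at least 100, while the neighbouring peaks created by the cosine terms are lower.

```r
f.egg <- function(x, y) {
  (2 + cos(x) + cos(y)) / (100 + x^2 + y^2)
}

f.egg(0, 0)             # global maximum: 4 / 100
f.egg(2 * pi, 0)        # close to a neighbouring peak, lower because the denominator grows
f.egg(2 * pi, 2 * pi)   # close to another peak, lower still
```

An optimizer that only moves uphill from a start value such as (5, 5) can stall on one of these lower peaks.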

I noticed that if I used a large standard deviation in the random normal generator, the optimizer would jump all over the place but would not settle down in the best optimum. So I added a second step that takes the best of the attempted values and uses it as the start value for a second run. That second run uses a smaller standard deviation that does not allow major jumps. The two-step anneal function can be seen below.
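twostep.anneal<-function(f,mu,n=1000,sig=c(1,.1),t=1,g=0.999){
  # current position and its function value
  xm = mu[1]
  ym = mu[2]
  fm = f(xm,ym)
  # storage for the path of both steps
  x = rep(NA,n*2)
  y = rep(NA,n*2)
  fx = rep(NA,n*2)
  for(k in 1:2){
    s=sig[k]
    if(k==2){
      # restart the second step from the best point of the first step
      xm=best[1]
      ym=best[2]
      fm=f(xm,ym)   # refresh fm so it matches the restart point
    }
    for (i in 1:n){
      # propose a move with step standard deviation s
      dxm = xm+rnorm(1,0,s)
      dym = ym+rnorm(1,0,s)
      fdm = f(dxm, dym)
      t = t*g   # cool the temperature
      # uphill moves are always accepted; downhill moves with probability (fdm/fm)^(1/t)
      if (runif(1) < (fdm/fm)^(1/t)){
        xm = dxm
        ym = dym
        fm = fdm
      }
      x[(i+(k-1)*n)] = xm
      y[(i+(k-1)*n)] = ym
      fx[(i+(k-1)*n)] = fm
    }
    # best point visited so far (NA entries are ignored)
    ii = which.max(fx)
    best=c(x[ii], y[ii])
  }
  list(x = x, y = y, fx = fx, best = best , fbest = fx[ii], t=t)
}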
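The acceptance step inside the loop can be looked at in isolation. The helper below, accept.prob, is a hypothetical name introduced only for this illustration; it returns the probability that the rule runif(1) < (f.new/f.old)^(1/t) accepts a proposed move, assuming both function values are positive.

```r
# Hypothetical helper (not part of twostep.anneal): probability of accepting
# a proposed move under the rule runif(1) < (f.new / f.old)^(1 / t)
accept.prob <- function(f.new, f.old, t) {
  min(1, (f.new / f.old)^(1 / t))
}

accept.prob(0.04, 0.03, t = 1)      # uphill move: always accepted
accept.prob(0.03, 0.04, t = 1)      # downhill move while t is still large
accept.prob(0.03, 0.04, t = 0.01)   # the same move after t has cooled
```

As t shrinks by the factor g on every iteration, downhill moves become increasingly unlikely, so the search gradually turns into a pure hill climb.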
To run the function and see the path of the anneal function use this code.
x<-seq(-30,30,.5)
y<-seq(-30,30,.5)
z<-outer(x,y, f.egg)
aa<-twostep.anneal(f.egg, n=1000, sig=c(2, 2), mu=c(5,5), t=2, g=.99)
contour(x,y,z)
lines(aa$x, aa$y, col=2)
I have always been fascinated by unique experiences. During last night’s Super Bowl I remarked to the other people in the room after the punter ran around in the end zone waiting for someone to tackle him so he could give the other team 2 points: “I bet that is the first time that has happened in a Super Bowl, and maybe the last…”
It occurs to me that assessment provides many unique opportunities as well. In my master’s thesis, “A Confirmatory Factor Analysis of the Latent Structure and Measurement Invariance in the University of Utah’s Student Course Feedback Instrument,” I was the first person to do a thorough analysis of the factor structure of the instrument. I found four factors that best explained students’ responses to the 14 questions, and lo and behold, we suddenly know something that no one had bothered to discover before.
The labels I gave to the factors that emerged were Organization, Creating an Effective Learning Environment, Instructor Skills, and Course Outcomes. I also found that responses to these factors were invariant across gender, ethnicity, and type of college. So, we now know something that was probably important to know years ago, which is that our instrument is invariant: a “5” for a male professor means the same thing as a “5” for a female professor. Very cool. And, quite a relief to find out…
This is the poster that coined our company name. It posits that these great figures showed us how to read minds by reading data. More on that theme in later posts.
People pictured (from top-left): Rosalind Picard (mother of affective computing), Noam Chomsky (father of modern linguistics), Bing Liu (father of sentiment mining), George Gallup (father of opinion polling), Ralph Elliott (father of wave theory), Adam Smith (father of economics), Sigmund Freud (father of psychoanalysis), danah boyd (mother of new media scholarship), Friedrich von Hayek (father of neo-liberalism), Alan Turing (father of artificial intelligence), and René Descartes (father of modern philosophy).
p.s., I created it for an invited presentation at an academic conference on Business Analytics.
News: Tyler (JackStat) wrote a paper that will be published in the prestigious (16% acceptance rate) Psychometrika!
photo credits:
http://www.psychometrika.org/journal/PMjSubscriptionInst.html
At the time this blog was created, Cronbach’s 1951 paper on coefficient alpha had 18,132 citations according to Google Scholar. Coefficient alpha is mainly used to assess the internal consistency reliability of a test or survey.
Although it may have been forgotten, Cronbach’s proof established that coefficient alpha is the mean of all split-half reliabilities that have an equal number of items in each half. The proof is often criticized, and it has been said that it is valid only when the items exhibit tau equivalence (all of the factor loadings are equal in the population) and unidimensionality (all items load onto only one factor). I argue that the proof is still valid when the items violate these two assumptions, but the estimate of reliability will be biased.
To demonstrate this I put together a web application that shows the value of alpha, the mean of a number of split-half reliabilities (the number can be increased or decreased), and the population reliability for the data structure you select. Notice that in the 1-factor models both estimates (alpha and the mean of the split-half reliabilities) are close to the population value, but in the 3-factor and 5-factor models a lower-bound bias is present.
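The equality between alpha and the mean split-half reliability can also be checked numerically. The sketch below is not the code behind the web application. It assumes a hypothetical 6-item, one-factor covariance matrix with unequal loadings (so tau equivalence does not hold), and it assumes that each split-half reliability is computed with the Flanagan-Rulon formula, 4 cov(A, B) / var(total), where A and B are the two half-test scores.

```r
# Hypothetical 6-item population covariance matrix: one factor, unequal loadings
loadings <- c(.5, .6, .7, .8, .6, .7)
sigma <- outer(loadings, loadings) + diag(1 - loadings^2)

k <- ncol(sigma)
total.var <- sum(sigma)   # variance of the total score

# Coefficient alpha computed from the covariance matrix
alpha <- (k / (k - 1)) * (1 - sum(diag(sigma)) / total.var)

# Every way to pick half of the items; the remaining items form the other half
halves <- combn(k, k / 2)

# Flanagan-Rulon split-half coefficient for each split
split.half <- apply(halves, 2, function(a) {
  b <- setdiff(seq_len(k), a)
  4 * sum(sigma[a, b]) / total.var
})

alpha
mean(split.half)
```

The two printed values agree even though the loadings differ, which is the sense in which the proof holds without tau equivalence; whether either value is close to the population reliability is a separate question.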
I have been conducting several simulations that use a covariance matrix. I needed to expand the code that I found in the psych package to handle more than 2 latent variables (the code probably allows it, but I didn’t figure out how). I ran across Jöreskog’s 1971 paper and realized that I could use the confirmatory factor analysis model equation to build the population covariance matrix.
The code below demonstrates a 5-factor congeneric data structure.
fx is the factor loading matrix, err is a diagonal matrix holding the error variances, and phi is the matrix of correlations between the latent variables.
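Written in matrix form, the confirmatory factor analysis model equation is the following. The Greek symbols are conventional notation assumed here rather than taken from the post: Λ corresponds to fx, Φ to phi, Θ to err, and Σ to the resulting population covariance matrix sigma.

```latex
\Sigma = \Lambda \Phi \Lambda^{\top} + \Theta
```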
#######################################
###---Population Covariance Generation
#######################################
###---Loadings
fx<-t(matrix(c(
.5,0,0,0,0,
.6,0,0,0,0,
.7,0,0,0,0,
.8,0,0,0,0,
0,.5,0,0,0,
0,.6,0,0,0,
0,.7,0,0,0,
0,.8,0,0,0,
0,0,.5,0,0,
0,0,.6,0,0,
0,0,.7,0,0,
0,0,.8,0,0,
0,0,0,.5,0,
0,0,0,.6,0,
0,0,0,.7,0,
0,0,0,.8,0,
0,0,0,0,.5,
0,0,0,0,.6,
0,0,0,0,.7,
0,0,0,0,.8), nrow=5))
###--Error Variances
err<-diag(c(.6^2,.7^2,.8^2,.9^2,
.6^2,.7^2,.8^2,.9^2,
.6^2,.7^2,.8^2,.9^2,
.6^2,.7^2,.8^2,.9^2,
.6^2,.7^2,.8^2,.9^2))
###---5x5 matrix of factor covariances
phi<-matrix(c(rep(.3, 25)), nrow=5)
diag(phi)<-1
sigma<-(fx%*%phi%*%t(fx)+err)
######################################
For sample data I used the mvrnorm() function from the MASS package. The mean argument must be a vector with one entry per item, here all zeros.
library(MASS)
mvrnorm(100, rep(0, nrow(fx)), sigma)
To simulate parallel-form data, the nonzero loadings in the fx matrix need to be equal to one another, and the diagonal entries of the err matrix need to be equal to one another. One could also manipulate the phi matrix and thus change the correlations between the latent variables.
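As a minimal sketch of the parallel case, the code below builds a one-factor, four-item structure with equal loadings and equal error variances, then draws a sample from it. The names fx.par, err.par, phi.par, and sigma.par, and the values .7 and .51, are made up for this illustration.

```r
library(MASS)  # provides mvrnorm(); MASS ships with R as a recommended package

# One factor, four items: equal loadings and equal error variances
fx.par  <- matrix(rep(.7, 4), ncol = 1)
err.par <- diag(rep(.51, 4))
phi.par <- matrix(1, 1, 1)

# Same model equation as above: loadings %*% phi %*% t(loadings) + error variances
sigma.par <- fx.par %*% phi.par %*% t(fx.par) + err.par
sigma.par   # every variance is 1 and every covariance is .49

set.seed(1)
dat <- mvrnorm(500, mu = rep(0, nrow(fx.par)), Sigma = sigma.par)
round(cov(dat), 2)   # sample covariances scatter around the population values
```

Because all items share one loading and one error variance, every item has the same variance and every pair of items has the same covariance, which is what makes the forms parallel.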
For the last year I have been developing a package, “Lambda4”, to improve internal consistency reliability estimation. When I conceived the package, my primary concern was H.G. Osburn’s maximized lambda4 estimator. Despite a very thorough search I could not find a stats package that could use Osburn’s method. I wanted to learn R, so I jumped in and tried to write the function. The original function has changed dramatically as I learned methods to speed up the code and found tweaks to the original method that improved the precision of the estimator. That function is now called cov.lambda4() and provides a modern perspective on reliability estimation. The package is slowly developing into a set of functions that I have developed, along with a collection of classic and forgotten estimators of internal consistency reliability. A major update is on the way that will include all 6 of Guttman’s lambdas and a couple of other relatively unknown estimators. Follow the blog if you want to hear more about the specific functions in the package; I will be adding posts for each of them. If you want to download the package, you can use the following code in R.
install.packages("Lambda4")
For further documentation, go here:
http://cran.r-project.org/web/packages/Lambda4/index.html
If you have any ideas or comments please post them.