How are errors terms calculated in GARCH model by rugarch package?

Hi,

I am fitting a GARCH(1,1) model to the data and want to look at the 
innovation distribution.

###################
  #generate the data
     set.seed(1)
     N = 5000
     omega = 0.5
     alpha = 0.08
     beta = 0.91
     X1 = rep(0,N)
     X2 = rep(0,N)
     sig1 = rep(0,N)
     sig2 = rep(0,N)

     for(i in 2:N){
       sig1[i] = sqrt(omega + alpha * X1[i-1] + beta * sig1[i-1]^2)
       X1[i] = sig1[i] * rnorm(1)
       sig2[i] = sqrt(omega + alpha * X2[i-1] + beta * sig2[i-1]^2)
       X2[i] = sig2[i] * rt(1, df = 8)
     }

     X1 = X1[-c(1:1000)]
#################

I first generate the data and fit it to GARCH(1,1) model with t innovation.

########################
     spec = ugarchspec(mean.model=list(armaOrder=c(0,0),include.mean=F),
                   distribution.model="std") # GARCH(1,1) model
     myfit = ugarchfit(spec, X1)
########################

Suppose the fitted model is called `myfit` then I can get the error 
terms by

###################
z1 = myfit at fit$z
###################

However, if I only extract the parameters omega, alpha, and beta 
estimated by the `rugarch` package and calculate the error terms manually as

$\sigma_t^2 = \omega + \alpha X_{t-1}^2 + \beta \sigma_{t-1}^2$

$z_t = X_t / \sigma_t$

The code is:

###########################
     omega1 = myfit at fit$coef[1]
     alpha1 = myfit at fit$coef[2]
     beta1 = myfit at fit$coef[3]

     z2 = rep(0,(N-1000))
     sighat1 = rep(0,(N-1000))
     sighat1[2] = 1
     sighat1[1] = 1
     for(i in 2:(N - 1000)){
       sighat1[i] = sqrt(omega1 + alpha1 * X1[i-1] + beta1 * sighat1[i-1]^2)
       z2[i] = X1[i]/sighat1[i]
     }
###########################

I got very different results between these two approaches by comparing 
the Q-Q plot of `z1` and `z2`.

###########
     qqnorm(z1)
     qqline(z1)
     qqnorm(z2)
     qqline(z2)
###########

`z1` seems to be normally distributed following the data generating 
process, while `z2` has a heavy tail following the model specification. 
I was wondering why they are so different? I arbitrarily chose 1 for for 
first two terms of $\hat{\sigma}$, so does the difference come from the 
initial values?

And more generally, how does `rugarch` package fit the GARCH model and 
choose initial values? My understanding is that we need to find 
parameters using either QMLE or MLE and then find error terms 
iteratively using my second approach. But I am not sure how is the 
initial value chosen.

Thanks!
Patrick