Modeling truncated counts with glmer

Hi,

In my experiment, 20 participants did a word-pair learning task in two
conditions (repeated measures).
40 pairs of nouns were presented on a monitor, each for 4 s and with an
interval of 1 s. The words of each pair were moderately semantically
related (e.g., brain/consciousness and solution/problem). Two
different word lists were used for each subject's two experimental
conditions, with the order of word lists balanced across subjects and
conditions. The subject had unlimited time to recall the appropriate
response word, and did three recall trials in succession for each list:

Condition 1, List A > T1, T2, T3
Condition 2, List B > T1, T2, T3

No feedback was given as to whether the remembered word was correct or not.

I've seen some people analyze this with ANOVA; others subtract the total
number of correct pairs in one condition from the other, per subject,
and run a t-test on the differences. Since this is count data, a
generalized linear model should be more appropriate, right?
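
For concreteness, here is a minimal sketch of the difference-score approach mentioned above, in base R. It uses simulated stand-in data: the data frame `sim`, its column names, and all values are made up for illustration and are not my real data.

```r
# Simulated stand-in data: 20 subjects x 2 conditions x 3 recall trials.
# All names and values here are hypothetical, not the real data set.
set.seed(1)
sim <- expand.grid(subject = 1:20,
                   treatment = c("Control", "Treatment"),
                   trial = 1:3)
# Correct pairs out of 40; the success probability rises with trial
sim$correct <- rbinom(nrow(sim), size = 40, prob = 0.5 + 0.1 * sim$trial)

# Total correct pairs per subject and condition (summed over the 3 trials)
totals <- aggregate(correct ~ subject + treatment, data = sim, FUN = sum)

# One row per subject, one column per condition
wide <- reshape(totals, idvar = "subject", timevar = "treatment",
                direction = "wide")

# One difference score per subject: Treatment minus Control
diff_score <- wide$correct.Treatment - wide$correct.Control

# One-sample t-test on the differences (equivalent to a paired t-test)
tt <- t.test(diff_score)
print(tt)
```

Note that this collapses the three trials into a single total per condition, so it says nothing about how recall changes from T1 to T3.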

head(data)
  subjectNumber expDay      bmi treatment tones       hour abruf correctPair incorrectPair
          <dbl>  <chr>    <dbl>    <fctr> <dbl>     <time> <dbl>       <dbl>         <dbl>
1             1     N2 22.53086   Control     0 27900 secs     1          26            14
2             1     N2 22.53086   Control     0 27900 secs     2          40             0
3             1     N2 22.53086   Control     0 27900 secs     3          40             0
4             2     N1 22.53086   Control     0 27900 secs     1          22            18
5             2     N1 22.53086   Control     0 27900 secs     2          33             7
6             2     N1 22.53086   Control     0 27900 secs     3          36             4

I fitted a model with glmer.nb(correctPair ~ I((abruf - 1)^2) *
treatment + (1|subjectNumber), data=data). The residuals don't look
good to me (see http://imgur.com/a/AJXGq), and the model produces fitted
values above 40, which cannot occur in practice because each list has
only 40 pairs (I'm not sure whether this is important).
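
To make the fitted-values issue reproducible, here is a minimal sketch of how I fit the model and compare the fitted values with the ceiling of 40. It assumes the lme4 package and uses simulated stand-in data: `dat`, `subj_eff`, and all numeric values are hypothetical, not my real data. The negative binomial distribution has no upper bound, so nothing in the model itself knows about the limit of 40.

```r
library(lme4)

# Simulated stand-in data; every name and value here is hypothetical.
set.seed(2)
dat <- expand.grid(subjectNumber = 1:20,
                   treatment = c("Control", "Treatment"),
                   abruf = 1:3)
subj_eff <- rnorm(20, sd = 0.2)   # subject-level random intercepts
mu <- exp(log(25) + 0.15 * (dat$abruf - 1) + subj_eff[dat$subjectNumber])
# Overdispersed counts, capped at the 40 pairs available per list
dat$correctPair <- pmin(rnbinom(nrow(dat), size = 20, mu = mu), 40)

# Same model structure as in the glmer.nb call above
fit <- glmer.nb(correctPair ~ I((abruf - 1)^2) * treatment +
                  (1 | subjectNumber), data = dat)

# Fitted values on the response scale, compared with the ceiling of 40
fv <- fitted(fit)
max(fv)        # largest fitted mean
sum(fv > 40)   # number of fitted means above the possible maximum
```

The fit may emit convergence or singular-fit warnings on simulated data like this; the point is only the check on `fitted(fit)`.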

I'm interested in knowing whether there is any difference between
conditions: are the values at time point 1 (abruf = 1) different? Do
people remember less in one condition than in the other (a different
number of pairs at time point 3)?

If the direction I'm taking is completely wrong, please let me know.

Best,
Santiago