Basically, when the user hovers over a word, you can make an AJAX call to a controller action, passing the word under the cursor.
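A minimal sketch of the client side of that idea, in TypeScript. The route `/Glossary/Lookup`, the `word` query parameter, and the names `lookupUrl`, `lookup` and `fetchText` are all assumptions made for illustration; substitute whatever your controller action actually exposes. The fetching function is passed in as a parameter so the logic can be exercised without a browser, and repeated hovers over the same word reuse the first request.

```typescript
// Hypothetical route of the controller action; replace with your own.
const GLOSSARY_URL = "/Glossary/Lookup";

// Build the request URL for a word. The word is trimmed and lower-cased
// so that "Word" and "word" map to the same request.
function lookupUrl(word: string): string {
  const normalized = word.trim().toLowerCase();
  return `${GLOSSARY_URL}?word=${encodeURIComponent(normalized)}`;
}

// Cache of in-flight and completed lookups, keyed by request URL.
const lookupCache = new Map<string, Promise<string>>();

// Look up a word, issuing at most one request per distinct word.
// `fetchText` is whatever performs the HTTP GET and returns the body.
function lookup(
  word: string,
  fetchText: (url: string) => Promise<string>
): Promise<string> {
  const url = lookupUrl(word);
  let pending = lookupCache.get(url);
  if (pending === undefined) {
    pending = fetchText(url);
    lookupCache.set(url, pending);
    // Forget failed requests so a later hover can retry.
    pending.catch(() => lookupCache.delete(url));
  }
  return pending;
}

// In a browser this might be wired up roughly as follows
// (showTooltip is a hypothetical function of your own):
//
//   span.addEventListener("mouseover", () => {
//     lookup(span.textContent || "", url => fetch(url).then(r => r.text()))
//       .then(showTooltip);
//   });
```

Caching the promise rather than the result means that a second hover arriving while the first request is still in flight does not trigger another call.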
This approach could be slow, so you could instead parse every word in the text ahead of time and prepare a glossary entry for each one; but this is only practical if you know how many words a text could contain.
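A rough sketch of the precomputed variant, again in TypeScript. Everything here is an assumption for illustration: `define` stands in for whatever dictionary lookup you have, `maxWords` is an arbitrary cap, and the word-splitting regex only handles ASCII letters, digits and apostrophes. When the text has more distinct words than the cap, the function returns `null` so the caller can fall back to per-word requests.

```typescript
// Split a text into its distinct words, lower-cased, in order of first appearance.
function distinctWords(text: string): string[] {
  const matches = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  return Array.from(new Set(matches));
}

// Build the whole glossary up front.
// Returns null when the text has more distinct words than `maxWords`,
// signalling that precomputing is not worthwhile for this text.
function buildGlossary(
  text: string,
  define: (word: string) => string | undefined,
  maxWords: number = 5000
): Map<string, string> | null {
  const words = distinctWords(text);
  if (words.length > maxWords) {
    return null;
  }
  const glossary = new Map<string, string>();
  for (const word of words) {
    const definition = define(word);
    // Words without a definition are simply left out.
    if (definition !== undefined) {
      glossary.set(word, definition);
    }
  }
  return glossary;
}
```

The resulting map could be serialized into the page so that hovering a word needs no request at all.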