
UPDATED: 6 MAY, 2022

GRAM: Fast Fine-tuning of Pre-trained Language Models for Content-based Collaborative Filtering

Yoonseok Yang, Kyu Seok Kim, Minsam Kim, Juneyoung Park

Applying pre-trained language models (PLM) to Knowledge Tracing (KT) is an important task for improving student assessment accuracy on cold-start items. Simply put, natural language processing (NLP) complements KT by analyzing questions that have no response data yet and on which the KT model has not been trained. However, the inherent hurdle is that training such a model end-to-end (E2E) is expensive.

Definition:

Knowledge Tracing models attempt to accurately determine a student's current knowledge state, such as whether the student is likely to respond correctly to a given question.
GRadient Accumulation for Multi-modality in CCF (GRAM) was introduced to reduce the time spent training such models; empirical evidence suggests that GRAM drastically reduces GPU memory usage and improves training efficiency by up to 146× on several datasets.
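A rough sketch of the underlying idea follows, in PyTorch with toy component names (encoder, kt_head, and the data shapes are all illustrative assumptions, not the authors' code). One reading of "gradient accumulation" here is that many student interactions reference the same question, so each question can be encoded by the expensive PLM once, the per-interaction KT loss can be computed against detached copies of those encodings, and the accumulated gradients can then be pushed back through the encoder in a single pass instead of once per interaction.

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real components (purely illustrative).
VOCAB, D = 100, 32
item_texts = torch.randint(0, VOCAB, (10, 8))            # 10 questions, 8 tokens each
encoder = nn.Sequential(nn.Embedding(VOCAB, D), nn.Flatten(), nn.Linear(8 * D, D))
kt_head = nn.Linear(D, 1)                                 # predicts correctness from an item vector
opt = torch.optim.Adam(list(encoder.parameters()) + list(kt_head.parameters()), lr=1e-3)

# One accumulation step: many interactions, few unique items.
interactions = torch.randint(0, 10, (256,))               # question id per student interaction
labels = torch.rand(256).round()                           # 1 = correct, 0 = incorrect

# 1) Encode each unique question ONCE; detach so the expensive encoder
#    is not part of the graph of every per-interaction loss.
item_vecs = encoder(item_texts)                            # (10, D)
item_vecs_detached = item_vecs.detach().requires_grad_(True)

# 2) Compute the interaction-level loss against the detached vectors,
#    accumulating d(loss)/d(item_vec) on them.
logits = kt_head(item_vecs_detached[interactions]).squeeze(-1)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
loss.backward()                                            # grads land on kt_head and item_vecs_detached

# 3) Backpropagate the accumulated item-vector gradients through the
#    encoder in a single pass, then update all parameters.
item_vecs.backward(item_vecs_detached.grad)
opt.step()
opt.zero_grad()
```

Under this reading, the savings come from replacing one encoder backward pass per interaction with one per unique question per step, which is where the reported memory and speed gains would originate.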
