Code-switching (CS) speech recognition has drawn increasing attention in recent years, as speakers commonly alternate between languages within a single utterance or discourse. In this work, we propose the Hierarchical Attention-based Recurrent Decoder (HARD) to build a context-aware end-to-end code-switching speech recognition system. HARD is an attention-based decoder that employs a hierarchical recurrent network to enhance the model's awareness of the previously generated history (sub-sequences) during decoding. The architecture uses two LSTMs to model encoder hidden states at both the character level and the sub-sequence level, which enables the model to recognize utterances that switch between languages more precisely. We also employ language identification (LID) as an auxiliary task in a multi-task learning (MTL) setup to boost speech recognition performance. We evaluate the effectiveness of our model on the SEAME dataset. Results show that our multi-task learning HARD (MTL-HARD) model improves over the baseline Listen, Attend and Spell (LAS) model, reducing the character error rate (CER) from 29.91% to 26.56% and the mixed error rate (MER) from 38.99% to 34.50%, and a case study shows that MTL-HARD carries historical information across sub-sequences.
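
To make the two-level decoder concrete, the sketch below shows one way such a hierarchy could be wired up in PyTorch: a character-level LSTM runs at every decoding step, while a sub-sequence-level LSTM is updated only at sub-sequence boundaries and feeds the attention query and an auxiliary LID head. All layer sizes, the boundary signal, and module names (e.g. `HierarchicalDecoder`, `lid_head`) are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a hierarchical attention-based recurrent decoder with an
# auxiliary LID head. Hyperparameters and structure are assumed for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalDecoder(nn.Module):
    def __init__(self, vocab_size, enc_dim=256, emb_dim=128, hid_dim=256, num_lids=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Character-level LSTM: advances at every decoding step.
        self.char_lstm = nn.LSTMCell(emb_dim + enc_dim, hid_dim)
        # Sub-sequence-level LSTM: updated only when a sub-sequence ends,
        # carrying longer-range history across sub-sequences.
        self.subseq_lstm = nn.LSTMCell(hid_dim, hid_dim)
        # Attention query conditioned on both levels of decoder state.
        self.att_query = nn.Linear(2 * hid_dim, enc_dim)
        self.out = nn.Linear(hid_dim + enc_dim, vocab_size)   # character prediction
        self.lid_head = nn.Linear(hid_dim, num_lids)          # auxiliary LID task (MTL)

    def attend(self, query, enc_states):
        # enc_states: (B, T, enc_dim); query: (B, enc_dim)
        scores = torch.bmm(enc_states, query.unsqueeze(2)).squeeze(2)   # (B, T)
        alpha = F.softmax(scores, dim=1)
        return torch.bmm(alpha.unsqueeze(1), enc_states).squeeze(1)     # (B, enc_dim)

    def forward(self, enc_states, targets, boundaries):
        """targets: (B, L) character ids; boundaries: (B, L) float tensor,
        1.0 where a sub-sequence (e.g. a word or language segment) ends."""
        B, L = targets.shape
        H = self.char_lstm.hidden_size
        h_c = c_c = enc_states.new_zeros(B, H)   # character-level state
        h_s = c_s = enc_states.new_zeros(B, H)   # sub-sequence-level state
        context = enc_states.new_zeros(B, enc_states.size(2))
        char_logits, lid_logits = [], []
        for t in range(L):
            emb = self.embed(targets[:, t])
            h_c, c_c = self.char_lstm(torch.cat([emb, context], dim=1), (h_c, c_c))
            # Advance the higher-level LSTM only where a sub-sequence just ended.
            h_s_new, c_s_new = self.subseq_lstm(h_c, (h_s, c_s))
            b = boundaries[:, t].unsqueeze(1)
            h_s = b * h_s_new + (1 - b) * h_s
            c_s = b * c_s_new + (1 - b) * c_s
            query = self.att_query(torch.cat([h_c, h_s], dim=1))
            context = self.attend(query, enc_states)
            char_logits.append(self.out(torch.cat([h_c, context], dim=1)))
            lid_logits.append(self.lid_head(h_s))
        return torch.stack(char_logits, dim=1), torch.stack(lid_logits, dim=1)
```

Under this sketch, multi-task training would combine a cross-entropy loss on the character logits with a weighted cross-entropy loss on the LID logits (e.g. loss = ce_char + lambda * ce_lid); the weighting scheme is an assumption, as the abstract does not specify it.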