Abstract—Image captioning helps the reader to understand the context that is to be conveyed. It is highly used in the applications like finding the missing or hidden patterns, in guiding the self driving cars, in robots which are automated using Artificial Intelligence and Machine Learning. It is also used to scrutinize large amount of images. This paper includes different image captioning models by using the concept of deep learning. Here we discuss about the frameworks like CNN-CNN, CNN-RNN and some more with their implementation overview with their advantages and disadvantages.