Automatic video summarization aims to provide brief representation of videos. Its evaluation is quite challenging, usually relying on comparison with user summaries. This study views it in a different perspective in terms of verifying the consistency of user summaries, as the outcome of video summarization is usually judged based on them. We focus on human consistency evaluation of static video summaries in which the user summaries are evaluated among themselves using the consistency modelling method we proposed recently. The purpose of such consistency evaluation is to check whether the users agree among themselves. The evaluation is performed on different publicly available datasets. Another contribution lies in the creation of static video summaries from the available video skims of the SumMe datatset. The results show that the level of agreement varies significantly between the users for the selection of key frames, which denotes the hidden challenge in automatic video summary evaluation. Moreover, the maximum agreement level of the users for a certain dataset, may indicate the best performance that the automatic video summarization techniques can achieve using that dataset.