Assuming the local descriptor to be d-dimensional, the
dimension $D$ of our representation is $D = k \times d$. In the
following, we represent the descriptor by $v_{i,j}$, where the
indices $i = 1, \dots, k$ and $j = 1, \dots, d$ respectively index the
visual word and the local descriptor component. Hence, a
component of v is obtained as a sum over all the image descriptors:

$$v_{i,j} = \sum_{x \,\text{such that}\, \mathrm{NN}(x) = c_i} \left( x_j - c_{i,j} \right)$$

where $x_j$ and $c_{i,j}$ respectively denote the $j$-th component of
the descriptor $x$ considered and of its corresponding visual
word $c_i$. The vector $v$ is subsequently $L_2$-normalized by
$v := v / \|v\|_2$.
Experiments show that excellent results can be
obtained even with a relatively small number of visual
words $k$: we consider values ranging from $k = 16$ to $k = 256$.
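
The aggregation and normalization described above can be illustrated with a minimal, dependency-free Python sketch. It rests on assumptions not stated in this passage: each descriptor is hard-assigned to its nearest visual word under squared Euclidean distance, the summed quantity is the difference between a descriptor component and the corresponding visual-word component, and the D = k × d components are stored word by word in a flat list. The function name `vlad` and the toy data are hypothetical and chosen for illustration only.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into a single vector of dimension D = k * d.

    descriptors: list of n local descriptors, each a list of d floats.
    centroids:   list of k visual words, each a list of d floats.
    """
    k, d = len(centroids), len(centroids[0])
    v = [0.0] * (k * d)  # component (i, j) is stored at flat index i * d + j

    for x in descriptors:
        # Assumption: the "corresponding visual word" is the nearest one
        # under squared Euclidean distance.
        i = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
        # Accumulate the difference x_j - c_{i,j} for every component j.
        for j in range(d):
            v[i * d + j] += x[j] - centroids[i][j]

    # L2-normalize: v := v / ||v||_2 (left unchanged if the norm is zero).
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v] if norm > 0 else v


# Toy example with k = 2 visual words and d = 2 dimensions, so D = 4.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]]
v = vlad(descriptors, centroids)
```

In this toy case the first two descriptors fall on the first visual word and the third on the second, so the unnormalized vector is (1, 2, -1, 0) before division by its norm. With d = 128 and k between 16 and 256, the same layout would give D between 2048 and 32768.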