VLAD: vector of locally aggregated descriptors

Assuming the local descriptor to be d-dimensional, the dimension D of our representation is D = k × d. In the following, we represent the descriptor by v_{i,j}, where the indices i = 1 ... k and j = 1 ... d respectively index the visual word and the local descriptor component. Hence, a component of v is obtained as a sum over all the image descriptors:


$$ v_{i,j} = \sum_{x \,:\, \mathrm{NN}(x) = c_i} \left( x_j - c_{i,j} \right) $$

where x_j and c_{i,j} respectively denote the j-th component of the descriptor x considered and of its corresponding visual word c_i. The vector v is subsequently L2-normalized by

v := v / ||v||_2.

Experimental results show that excellent results can be obtained even with a relatively small number of visual words k: we consider values ranging from k = 16 to k = 256.
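As a rough illustration of the aggregation and L2 normalization described above, here is a minimal NumPy sketch. The function name `vlad`, the array layout, and the random test data are assumptions made for illustration only. The codebook of k visual words is assumed to have been learned beforehand (for example with k-means), and each descriptor is assigned to its nearest visual word by brute-force Euclidean distance.

```python
import numpy as np

def vlad(descriptors, centroids):
    """Aggregate local descriptors (n x d) into a VLAD vector of length k * d.

    Hypothetical helper: `centroids` is a (k x d) codebook assumed to be
    learned elsewhere.
    """
    k, d = centroids.shape
    # Squared Euclidean distance from every descriptor to every visual word: (n, k).
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)  # index i of the visual word assigned to each x

    v = np.zeros((k, d))
    for i in range(k):
        assigned = descriptors[nearest == i]
        if len(assigned) > 0:
            # v_{i,j} = sum over assigned descriptors of (x_j - c_{i,j})
            v[i] = (assigned - centroids[i]).sum(axis=0)

    v = v.ravel()  # concatenate the k residual sums into one D = k * d vector
    norm = np.linalg.norm(v)
    # L2 normalization; an all-zero vector is left as is (a choice made here).
    return v / norm if norm > 0 else v

# Example with assumed sizes: 500 descriptors of dimension d = 128, k = 16 visual words.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 128))
centroids = rng.normal(size=(16, 128))
v = vlad(descriptors, centroids)
print(v.shape)  # (2048,)
```

With d = 128 and k = 16, the representation has D = 16 × 128 = 2048 components, which matches the shape printed above.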
