I am trying to downscale an 8-bit grayscale image by a factor of 36 (6x6, i.e. 6 horizontally and 6 vertically). I want to use ARM NEON instructions. My code looks like this:
//I de-interleave 3 x 8 pixels from the first line (named line0) and
//add them together, which downscales horizontally by a factor of 3
//Line 0
vld3.u8 {d0, d1, d2}, [line0]!
vaddl.u8 q3, d0, d1
vaddw.u8 q3, q3, d2
vld3.u8 {d3, d4, d5}, [line0]!
vaddl.u8 q4, d3, d4
vaddw.u8 q4, q4, d5
//I do this for six successive lines
//So virtually, I have reduced by a factor 3x6=18
//Line 1
vld3.u8 {d0, d1, d2}, [line1]!
vaddw.u8 q3, q3, d0
vaddw.u8 q3, q3, d1
vaddw.u8 q3, q3, d2
vld3.u8 {d3, d4, d5}, [line1]!
vaddw.u8 q4, q4, d3
vaddw.u8 q4, q4, d4
vaddw.u8 q4, q4, d5
.....
//Line 5
vld3.u8 {d0, d1, d2}, [line5]!
vaddw.u8 q3, q3, d0
vaddw.u8 q3, q3, d1
vaddw.u8 q3, q3, d2
vld3.u8 {d3, d4, d5}, [line5]!
vaddw.u8 q4, q4, d3
vaddw.u8 q4, q4, d4
vaddw.u8 q4, q4, d5
//At this point, I want to add two adjacent pixels
//to give my last factor by 2.
//I also want to merge two successive q registers
//In other words, I want to do the following:
/*
q5[0] = q3[0] + q3[1]
q5[1] = q3[2] + q3[3]
q5[2] = q3[4] + q3[5]
q5[3] = q3[6] + q3[7]
q5[4] = q4[0] + q4[1]
q5[5] = q4[2] + q4[3]
q5[6] = q4[4] + q4[5]
q5[7] = q4[6] + q4[7]
*/
//This code doesn't seem to work as expected...
vpaddl.u16 q3, q3
vpaddl.u16 q4, q4
vext.u16 q5, q4, q3, #4
//Now, I want to divide by 36.
//In other words, I want to do the following:
/*
d0 = q5 / 36
*/
//The best I can do is to divide by 32
vshrn.i16 d0, q5, #5
I seem to have two problems: how can I add adjacent values within the q registers (and merge the two results into one register), and how can I divide by 36?
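To pin down the result I am after, here is a minimal scalar C sketch of the last two steps. It is a reference model only, not NEON code. The function names `pairwise_merge` and `div36` are hypothetical, and the reciprocal-multiply trick in `div36` (shift right by 2, multiply by 7282, shift right by 16) is an assumption about one possible way to avoid a real division, not something the assembly above does. It relies on every sum being at most 36 * 255 = 9180.

```c
#include <assert.h>
#include <stdint.h>

/* Reference for the pairwise step: sums adjacent pairs of a[0..7] into
 * out[0..3] and adjacent pairs of b[0..7] into out[4..7].
 * Here a plays the role of q3, b of q4 and out of q5. */
void pairwise_merge(const uint16_t a[8], const uint16_t b[8], uint16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = (uint16_t)(a[2 * i] + a[2 * i + 1]);
        out[4 + i] = (uint16_t)(b[2 * i] + b[2 * i + 1]);
    }
}

/* Reference for the division step: floor(sum / 36) without a divide.
 * Assumption: sum is a total of 36 8-bit pixels, so sum <= 9180. */
uint8_t div36(uint16_t sum)
{
    assert(sum <= 36u * 255u);
    uint32_t quarter = sum >> 2;               /* floor(sum / 4), at most 2295 */
    /* 7282 = ceil(65536 / 9); the rounding error stays below 1/9 for
     * quarter < 32768, so this equals floor(quarter / 9) in that range. */
    return (uint8_t)((quarter * 7282u) >> 16);
}
```

For example, `div36(9180)` gives 255 and `div36(35)` gives 0. On NEON, the pairwise step looks like a candidate for `vpadd.i16` applied to the d halves of q3 and q4 (d6/d7 and d8/d9), and the multiply for `vmull.u16` followed by `vshrn.i32 #16`, but I have not verified that mapping here.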