[PATCH] USB: add zr364xx V4L2 driver
- From: Trent Piepho <xyzzy (at) speakeasy.org>
- Date: Wed, 14 Feb 2007 14:57:09 -0800 (PST)
On Wed, 14 Feb 2007, Alan wrote:
> > > My comment is not very good, in fact on some cameras I need to swap the bytes
> > > to have correct JPEG data (so this is not an endianness issue I think).
> > > Maybe there is a macro to swap bytes in a buffer? I cannot find it.
> >
> > Sorry, there's a swab32, but no swab16. I misremembered.
>
> Its just called "swab" for 16bit values and is a gcc builtin/string
> function.
The C library function swab() isn't usable in the kernel, as it's not part
of the kernel's C lib.
Gcc doesn't have a builtin swab/bswap16 yet, maybe it will someday:
http://gcc.gnu.org/ml/gcc-patches/2006-07/msg00496.html
The kernel does have swab64, swab32, and yes, swab16 macros! They're all
defined in the same place in asm/byteorder.h. There are architecture
optimized versions for some cases, but not for swab16 on x86, where gcc
supposedly does ok (or does it? See *** below).
There are three versions of the swabXX functions, a normal one, one that
takes a pointer to the data, and one that swaps the data in-place. The
more specialized versions might be faster in some cases. I don't see any
version that swaps an array of data, like C-lib swab(), which would be a lot
more useful than swab16 vs swab16p vs swab16s, IMHO.
uint16_t *p;
*p = swab16(*p); // one way
*p = swab16p(p); // maybe better
swab16s(p); // best
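To make the difference between the three forms concrete, here is a minimal userspace sketch. The demo_ names are made up for illustration and are not the kernel macros; they only mimic the semantics described above (by value, via pointer, in place). demo_swab16_array() is the sort of array helper I said I couldn't find, again purely hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins, NOT the kernel's asm/byteorder.h macros. */

/* By value: returns x with its two bytes exchanged. */
static inline uint16_t demo_swab16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/* Via pointer: reads *p and returns the swapped value, *p is unchanged. */
static inline uint16_t demo_swab16p(const uint16_t *p)
{
	return demo_swab16(*p);
}

/* In place: swaps the bytes of *p where it sits. */
static inline void demo_swab16s(uint16_t *p)
{
	*p = demo_swab16(*p);
}

/* Hypothetical array helper: swaps n 16-bit words in place. */
static inline void demo_swab16_array(uint16_t *buf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		demo_swab16s(&buf[i]);
}
```

With a helper like the last one, the driver loop would collapse to a single call over BUFFER_SIZE/2 words.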
>> + /* swap to good indian if camera needs it */
>> + if (cam->method == 0)
>> + for (i = 0; i < BUFFER_SIZE; i += 2) {
>> + swap = cam->buffer[i];
>> + cam->buffer[i] = cam->buffer[i + 1];
>> + cam->buffer[i + 1] = swap;
>> + }
+ /* swap to good endian if camera needs it */
+ if (cam->method == 0)
+ for (i = 0; i < BUFFER_SIZE/2; i++) {
+ swab16s((uint16_t*)cam->buffer +i);
+ }
or
+ /* swap to good native american if camera needs it */
+ if (cam->method == 0) {
+ uint16_t *buf = (uint16_t *)cam->buffer;
+ for (i = 0; i < BUFFER_SIZE/2; i++)
+ swab16s(buf++);
+ }
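As a sanity check that the 16-bit version really does the same thing as the original byte-by-byte loop, regardless of host endianness, something like the following userspace sketch can be used. The function names are made up, and it uses memcpy() instead of a pointer cast so that it makes no assumption about the alignment of the buffer (the kernel versions above do assume cam->buffer is suitably aligned).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise pair swap, same logic as the loop in the original patch. */
static void swap_pairs_bytewise(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i + 1 < len; i += 2) {
		unsigned char tmp = buf[i];

		buf[i] = buf[i + 1];
		buf[i + 1] = tmp;
	}
}

/* Same result using a 16-bit swap on each pair of bytes. */
static void swap_pairs_16bit(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i + 1 < len; i += 2) {
		uint16_t v;

		memcpy(&v, buf + i, sizeof(v));      /* load without alignment assumptions */
		v = (uint16_t)((v << 8) | (v >> 8)); /* exchange the two bytes */
		memcpy(buf + i, &v, sizeof(v));      /* store back */
	}
}
```

Running both on copies of the same buffer and comparing with memcmp() should show identical output on little- and big-endian hosts, since swapping the two bytes of a 16-bit word is the same operation in memory either way.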
*** Does gcc really optimize swab16() well?
Compiled this with gcc 4.0.1 for athlon (using 2.6.20's compiler options):
void bar(uint16_t *p)
{
	int i;

	for (i = 0; i < 127; i++)
		swab16s(p + i);
}
Resulting asm code does not look that good to me. gcc does a copy, two
shifts, and then an or to effect the swab16. Surely rotating a 16-bit
register would be faster? There shouldn't be any partial register stalls.
I don't see why gcc decides to add two to the pointer, then offset it by -2
when it uses it. What's the point of that?
bar:
pushl %ebx #
movl $1, %ebx # %ebx = i
leal 2(%eax), %ecx # %ecx = p+2, why add 2? just use eax
.p2align 4,,7
.L21:
movzwl -2(%ecx), %eax # Have to offset by -2
incl %ebx #
movl %eax, %edx # do the swab16
sall $8, %edx #
shrl $8, %eax #
orl %eax, %edx #
movw %dx, -2(%ecx) #
addl $2, %ecx # why not skip this and use (%ecx,%ebx,2)
cmpl $128, %ebx # counting from -128...0 would avoid this
jne .L21
popl %ebx # used too many registers
ret
Anyway, surely this would be faster:
bar:
movl $-128, %ecx # start at -128, count to 0
add $256, %eax # %eax = &p[128], so (%eax,%ecx,2) starts at p[0]
.p2align 4,,7
.L21:
movzwl (%eax,%ecx,2), %edx
rorw $8, %dx # all that's needed for swab16
movw %dx, (%eax,%ecx,2)
inc %ecx
jnz .L21
ret
Ok, the loop optimization is a little hard for gcc, but isn't it supposed
to be able to figure out "rorw $8, %reg"?
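For reference, the same two ideas can be written in C. This is only a sketch under my own naming (rot16_by_8 and bar_negidx are hypothetical): the shift-and-or expression is exactly a 16-bit rotate by 8, and the loop indexes backwards from the end of the array so that the counter runs up to zero. Whether a given gcc version turns either of these into the asm above is something I have not verified.

```c
#include <stddef.h>
#include <stdint.h>

/* A 16-bit rotate by 8 bits, which is the same thing as a byte swap. */
static inline uint16_t rot16_by_8(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

/*
 * Swap n 16-bit words in place, counting from -n up to 0 so that the
 * loop can terminate on a zero test. n must not be negative.
 */
static void bar_negidx(uint16_t *p, ptrdiff_t n)
{
	uint16_t *end = p + n;  /* one past the last element */

	for (ptrdiff_t i = -n; i != 0; i++)
		end[i] = rot16_by_8(end[i]);
}
```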