unsigned char* image = ....;
for (int i = 0; i < height; i++)
{
    for (int j = 0; j < width; j++)
    {
        // Calculate value
        [...]
        unsigned char value = ......
        [...]
        // Finish calculating value
        image[i*widthStep+j]= value;
    }
}
If I remove the line image[i*widthStep+j]= value; or change value to a constant, the loop becomes very fast. Is there any way I can optimize the above code?
Hi Zhuge, the line image[i*widthStep+j]= value; is the one I want to optimize. It takes 80% of the processing time of this method, so I wonder if there is a faster way to write an unsigned char into an unsigned char array.
80%? Don't trust that value, wherever you got it.
At most, that line translates to two instructions (a mov and an add), so unless the calculation of the value is very simple, most of the time will be spent there. Besides, you'll have to do the actual assignment one way or another.
However, depending on what you're calculating and how you're doing it, it might be possible to vectorize the loop (i.e. calculate and assign several values at once) or make it easier for the compiler to do it (note that at least for g++, automatic vectorization is turned off at the standard optimization level -O2, so you have to compile with -O3 or add -ftree-vectorize).
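To illustrate what "making it easier for the compiler" can look like, here is a minimal sketch. The function name brighten, the src buffer, and the saturating add are all assumptions standing in for the elided value calculation. The point is only the loop shape: the row offset is hoisted, and the inner loop walks adjacent bytes with no cross-iteration dependency.

```cpp
#include <cstddef>

// Hypothetical per-pixel calculation (not from the original post):
// add `delta` to every pixel, clamping at 255.
void brighten(const unsigned char* src, unsigned char* dst,
              int width, int height, std::size_t widthStep,
              unsigned char delta)
{
    for (int i = 0; i < height; ++i) {
        // Compute the row start once instead of i*widthStep+j per pixel.
        const unsigned char* srcRow = src + i * widthStep;
        unsigned char* dstRow = dst + i * widthStep;

        // Unit-stride loop with independent iterations: a shape that
        // auto-vectorizers generally handle well.
        for (int j = 0; j < width; ++j) {
            int sum = srcRow[j] + delta;
            dstRow[j] = static_cast<unsigned char>(sum > 255 ? 255 : sum);
        }
    }
}
```

Whether the compiler actually vectorizes this depends on the compiler, its version, and the flags used, so it is worth checking the generated code or the compiler's vectorization report.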
Thanks Athar. I got the profiling figure by commenting out that statement and measuring the execution time.
Sorry, I made a mistake: the statement was actually image[ii*widthStep+jj]= value; where ii and jj are derived from i and j, so they are not as "continuous" as i and j. I don't know about vectorization. Is it still possible in this case? I'm using Visual Studio 2005.
If the pixels processed aren't adjacent to each other, vectorization might be difficult.
But again, that depends on the exact calculations you're doing. It might be possible to restructure the code so vectorization is still possible.
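As one example of such a restructuring, suppose (purely as an assumption, since the real mapping was never posted) that ii = i and jj = width - 1 - j, i.e. a horizontal mirror. If the mapping can be inverted, the loop can run over the destination in order and look up the source instead, so the writes become adjacent again. Both function names below are made up for the sketch.

```cpp
#include <cstddef>

// Scatter form: iterate over the source, write to a computed destination.
// Hypothetical mapping: ii = i, jj = width - 1 - j.
void mirrorScatter(const unsigned char* src, unsigned char* dst,
                   int width, int height, std::size_t widthStep)
{
    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            int jj = width - 1 - j;
            dst[i * widthStep + jj] = src[i * widthStep + j];
        }
    }
}

// Gather form: iterate over the destination in increasing order and
// invert the mapping to find the source pixel. The stores are now
// unit-stride and ascending.
void mirrorGather(const unsigned char* src, unsigned char* dst,
                  int width, int height, std::size_t widthStep)
{
    for (int i = 0; i < height; ++i) {
        const unsigned char* srcRow = src + i * widthStep;
        unsigned char* dstRow = dst + i * widthStep;
        for (int jj = 0; jj < width; ++jj) {
            dstRow[jj] = srcRow[width - 1 - jj];
        }
    }
}
```

This only works when the mapping from (i, j) to (ii, jj) is invertible and cheap to invert; for an arbitrary mapping it may not apply.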
As for VS05, it's fairly old, so I don't know if it supports automatic vectorization. Try to see if you can enable "Streaming SIMD instructions" somewhere in the optimization settings. If the compiler can't do it for your loop, you'll have to fall back on doing it manually using SSE2 intrinsics.
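For reference, a manual SSE2 version might look roughly like the sketch below. It assumes an x86 target with SSE2 and a compiler that ships emmintrin.h, and it again uses a made-up calculation (a saturating add of delta) in place of the real one; addSaturateRow is a hypothetical name. It processes 16 pixels per iteration and finishes any leftover pixels with a scalar loop.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Hypothetical example: dst[j] = min(src[j] + delta, 255) for one row.
void addSaturateRow(const unsigned char* src, unsigned char* dst,
                    std::size_t n, unsigned char delta)
{
    // Broadcast delta into all 16 byte lanes.
    const __m128i vdelta = _mm_set1_epi8(static_cast<char>(delta));

    std::size_t j = 0;
    for (; j + 16 <= n; j += 16) {
        // Unaligned load/store, so the row need not be 16-byte aligned.
        __m128i v = _mm_loadu_si128(
            reinterpret_cast<const __m128i*>(src + j));
        v = _mm_adds_epu8(v, vdelta);  // unsigned saturating add, 16 bytes
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + j), v);
    }

    // Scalar tail for the remaining (n % 16) pixels.
    for (; j < n; ++j) {
        int sum = src[j] + delta;
        dst[j] = static_cast<unsigned char>(sum > 255 ? 255 : sum);
    }
}
```

Calling this once per row (with src + i*widthStep and dst + i*widthStep) would cover a whole image. How much it helps depends entirely on what the real calculation is.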
If you commented out the image[index] = value line, then any compiler with common sense would realise that value isn't actually used, and if the calculation of value had no side effects, it would cut out the whole loop. Use a proper profiler.
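If a profiler isn't at hand, a fairer hand-rolled comparison keeps the calculation alive in both variants, for instance by folding every value into a checksum that the caller then uses. The sketch below is one way to do that; computeValue is a hypothetical stand-in for the real calculation, and the other function names are made up as well.

```cpp
#include <cstddef>
#include <ctime>

// Hypothetical stand-in for the elided value calculation.
inline unsigned char computeValue(int i, int j)
{
    return static_cast<unsigned char>((i * 31 + j * 17) & 0xFF);
}

// Variant A: calculate, store into the image, and accumulate a checksum.
unsigned long calculateAndStore(unsigned char* image, int width, int height,
                                std::size_t widthStep)
{
    unsigned long sum = 0;
    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            unsigned char value = computeValue(i, j);
            image[i * widthStep + j] = value;
            sum += value;
        }
    }
    return sum;
}

// Variant B: same calculation without the store. Because the checksum
// is returned, the compiler cannot discard the calculation as unused.
unsigned long calculateOnly(int width, int height)
{
    unsigned long sum = 0;
    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            sum += computeValue(i, j);
        }
    }
    return sum;
}

// Times variant B with std::clock and hands the checksum back so the
// caller can use it (e.g. print it).
double timeCalculateOnly(int width, int height, unsigned long& checksum)
{
    std::clock_t start = std::clock();
    checksum = calculateOnly(width, height);
    return static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
}
```

Timing variant A the same way and comparing the two gives a rough idea of what the store costs. std::clock is coarse, so the image needs to be large or the call repeated many times for the numbers to mean anything.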
Depending on the situation, you could speed that up using an OpenGL fragment (pixel) shader, but I doubt it's necessary, because I don't think that's the problem.