Use the script ``poly1.py`` to check how much time it takes to evaluate the following polynomial:
y = .25*x**3 + .75*x**2 - 1.5*x - 2
with x in the range [-1, 1], using 10 million points.
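A minimal sketch of such a timing with plain NumPy (the actual ``poly1.py`` script may be structured differently):

```python
import numpy as np
from time import time

N = 10 * 1000 * 1000  # 10 million points
x = np.linspace(-1, 1, N)

t0 = time()
y = .25*x**3 + .75*x**2 - 1.5*x - 2
print("numpy: %.3f s" % (time() - t0))
```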
The expression below:
y = ((.25*x + .75)*x - 1.5)*x - 2
represents the same polynomial as the original one, but with interesting side effects on efficiency. Repeat the computation with numpy and numexpr and draw your own conclusions.
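To convince yourself that the two forms are equivalent, you can compare them numerically; this sketch leaves out the timing machinery and only checks the values:

```python
import numpy as np

x = np.linspace(-1, 1, 1000)
y1 = .25*x**3 + .75*x**2 - 1.5*x - 2   # expanded form (uses x**3, x**2)
y2 = ((.25*x + .75)*x - 1.5)*x - 2     # factored (Horner) form: only * and +
print(np.allclose(y1, y2))
```

The factored form avoids the power operations entirely, which is the source of the efficiency difference you should observe.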
The C program ``poly.c`` does the same computation as above, but in pure C. Compile it like this:
gcc -O3 -o poly poly.c -lm
and execute it.
Make sure you are on a multi-processor machine, then repeat the last computation in ``poly1.py``, increasing the number of threads one by one (change the number in the ``for nt in range(1):`` loop).
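The thread-scanning loop might look like the following sketch (assuming numexpr is installed; ``set_num_threads`` is numexpr's call for setting the thread count):

```python
import numpy as np
import numexpr as ne
from time import time

x = np.linspace(-1, 1, 10 * 1000 * 1000)

for nt in range(1, 5):       # try 1..4 threads instead of just 1
    ne.set_num_threads(nt)
    t0 = time()
    y = ne.evaluate("((.25*x + .75)*x - 1.5)*x - 2")
    print("%d thread(s): %.3f s" % (nt, time() - t0))
```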
On the same multi-processor machine, recompile the above ``poly.c``, this time with OpenMP support:
gcc -O3 -o poly poly.c -lm -fopenmp # notice the new -fopenmp flag!
and execute it for several numbers of threads:
OMP_NUM_THREADS=desired_number_of_threads ./poly
Compare its performance with the parallel numexpr.
Using the previous examples, compute the expression:
y = x
That is, do a simple copy of the ``x`` vector. What performance are you seeing? How does it evolve when using different numbers of threads?
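A sketch of the copy benchmark with plain NumPy. An expression as trivial as ``y = x`` is memory-bound, so adding compute threads can hardly help here:

```python
import numpy as np
from time import time

x = np.linspace(-1, 1, 10 * 1000 * 1000)

t0 = time()
y = x.copy()   # what "y = x" means as an array expression
dt = max(time() - t0, 1e-9)
# effective memory bandwidth: one read plus one write of 8-byte floats
print("copy: %.3f s, %.1f GB/s" % (dt, 2 * x.nbytes / dt / 1e9))
```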
Look at the source of ``carray-eval.py`` and run it. Reason about the timings for the first expression it evaluates, i.e.:
((.25*x + .75)*x - 1.5)*x - 2
Repeat your reasoning with the second expression:
((.25*x + .75)*x - 1.5)*x - 2 < 0
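As a plain-NumPy reference for what this second expression computes (carray's ``eval`` does the same on compressed chunks), note that the result is now a boolean array:

```python
import numpy as np

x = np.linspace(-1, 1, 1000 * 1000)
y = ((.25*x + .75)*x - 1.5)*x - 2 < 0   # boolean result, 1 byte per element
print("negative fraction: %.6f" % (y.sum() / len(y)))
```

The boolean output is much smaller (and much more compressible) than the float output of the first expression, which matters for the timings you should observe.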
Look at the source of the ``carray-ctable.py`` script and run it.
Enter the IPython console and generate the big ``t`` ctable (just copy and paste the appropriate statements from the previous ``carray-ctable.py``). Then set the number of threads with:
ca.set_nthreads(your_number_of_threads)