The following post was first written in Russian ~1 year ago.

In most big Ukrainian cities, when you rent a flat, you have to pay for the first month, for the last month (deposit) and to pay one-half to the real estate agent. So if you see an advertisement of flat for 12000 UAH (hryvnias) per month, you'll have to pay 12000*2 + 12000/2 = 30000 UAH during signing the contract. This is: $2x + \frac{x}{2}$, where x = price per month.

While I was browsing advertisements, I calculated that I would need to pay 30k UAH for a flat that is offered for 12k UAH per month. Then I found another advertisement, where a flat was offered for 12500 per month. How much should I pay at the beginning? I was too lazy for mental calculation (particularly, to divide 12500 by 2) and I asked myself, can I calculate this (re)using the information about previous flat? We do understand that with the rise of "per month" price, the "total" price is also rising. And if a cost (per month) risen by 500 UAH, how much we will add to the "total" price? It's easy to guess that with each hryvnia (1 UAH) in "per month" cost, the "total" price will rise by 2.5 UAH. And if the "per month" price risen by 500 UAH, then the "total" price will rise by 1250 UAH. If you got 12500 now instead of 12000, then the "total" price will not be 30000 (old result), but 31250 UAH (the old result plus 1250).

To calculate the "total" price from scratch, you have to do 3 operations -- multiplication, division, addition. To calculate it in my case, you have do multiply the difference by 2.5 and add it to the old "total" price. If you're too lazy to run the calculator, and calculate in your mind instead, the second way is easier.

And so, 2.5 is the derivative of the function $2x + \frac{x}{2}$. It shows how the result of function rises with the rise of (input) $x$.

$2x + \frac{x}{2} = 2.5x$, and the derivative of the $nx$ function is just $n$.

We have to plot a function graph for the $7x$ function, or to calculate a table of its values (also known as 'tabulating'). We write something like:

#!/usr/bin/env python3 for x in range(15): print (x, x*7)

And it works. But how to make it faster? If to think harder? We can calculate difference each time (this is 7) instead of calculating $7x$ each time. Because we calculate with the step of 1, in the 1..14 range, i.e., sequentially.

$7$ is the derivative of $7x$.

#!/usr/bin/env python3 a=0 for x in range(15): print (x, a) a = a + 7

Of course, addition works faster than multiplication.

Now for something harder: $x^2$:

#!/usr/bin/env python3 for x in range(15): print (x, x*x)

This can be a good interview problem -- try to make this function faster. Is it possible at all? We don't need to calculate $x^2$ each time, but to calculate a difference at each step.

Find a "difference function" using Wolfram Mathematica:

In[]:= f[x_] = x^2 In[]:= ExpandAll[f[x + 1] - f[x]] Out[]:= 1 + 2x

(ExpandAll reduces the expression, making it simpler/shorter.)

By the way, doesn't it remind you of anything? $2x$ is the derivative of $x^2$.

We write a such code:

#!/usr/bin/env python3 a=0 for x in range(15): print (x, a) a = a + 2*x + 1

It works in the same manner. But is it faster? Will rewrite it to pure C. For test, we will calculate the sum of all these values: $\sum_{i=1}^{...} i^2$

#include <stdint.h> #include <stdio.h> #include <time.h> int main() { uint64_t _max=0x1ffffffff; uint64_t sum=0; time_t start=time(NULL); for (uint64_t i=0; i<_max; i++) { sum=sum+i*i; }; printf ("sum=%lx\n", sum); printf ("time=%ld\n", time(NULL)-start); sum=0; start=time(NULL); uint64_t prev=0; for (uint64_t i=0; i<_max; i++) { sum=sum+prev; prev=prev+(2*i+1); }; printf ("sum=%lx\n", sum); printf ("time=%ld\n", time(NULL)-start); };

Finally:

sum=aaaaaaaeffffffff time=15 sum=aaaaaaaeffffffff time=12

On my ancient Intel Core 2 Duo CPU T9400 @ 2.53GHz locked at 2.13 GHz, first version ran for 15 seconds, second version -- 12. Maybe this is not very impressive result, but as they say, "byte here, byte there and oops, we ended up saving a lot".

Of course, it's faster because calculation of $2x+1$ is just a bit shift left, then increment. It's faster than multiplying $x$ by $x$.

In fact, the correct result is 0x2aaaaaaa4aaaaaaaeffffffff, but our code calculating the sum by modulo $2^{64}$ (width of a 64-bit register). Anyway, we don't need the correct result, we only need to get assured that two methods provide the same result.

Now what if we will calculate $x^3$?

In Wolfram Mathematica find a "difference function":

In[]:= f[x_] = x^3 In[]:= ExpandAll[f[x + 1] - f[x]] Out[]= 1 + 3x + 3x^2

Something familiar again. The derivative of $x^3$ is $3x^2$.

Of course, it works:

#!/usr/bin/env python3 for x in range(0, 15): print (x, x*x*x) print ("") a=0 for x in range(0, 15): print (x, a) a = a + 3*x*x + 1 + x*3

The pure C version again:

#include <stdint.h> #include <stdio.h> #include <time.h> int main() { uint64_t _max=0x1ffffffff; uint64_t sum=0; time_t start=time(NULL); for (uint64_t i=0; i<_max; i++) { sum=sum+i*i*i; }; printf ("sum=%lx\n", sum); printf ("time=%ld\n", time(NULL)-start); sum=0; start=time(NULL); uint64_t prev=0; for (uint64_t i=0; i<_max; i++) { sum=sum+prev; uint64_t tmp=3*i; prev=prev+(tmp*i + tmp + 1); }; printf ("sum=%lx\n", sum); printf ("time=%ld\n", time(NULL)-start); };

sum=fffffffa00000001 time=21 sum=fffffffa00000001 time=17

Not significant, but something. The second case has two multiplying operations, but it works a bit faster. It's probably because calculating of $3x$ is faster than to multiply $x$ by a random value -- $3x = 2x + x$, and $2x$ is a bit shift left -- usually, compilers optimize it in such way.

But it all works that good only for integers. It may be very different for floating point numbers.

First of all, if you use 'float' instead of 'int', out optimization tricks works even slower -- FPU may behave differently than CPU. Second (nasty) problem, rounding errors appears.

You can compare this with woord board cutting. Your customer bring a woord board to you and asking you to make a board of the same size. To 'copy' it. But you can't measure it for some reason, because someone took your measuring tape. (Maybe kids?)

You put customer's board to your (blank) board and cut a piece. Your customer asking for another one, but he's taking away his "reference" board, because he need it urgently for some reason. You put your "copy" to another (blank) board and repeat all. Your customer taking away the "copy", because he need it urgently too, leaving you with the "second copy". It is easy to see that if you continue to cut boards like this with such a customer, their length will gradually increase.

If the error at each step us 1-2mm (barely noticable), after 10 boards, the error may be 1-2cm. After 100 copies, the error may be 10-20cm, and this is completely unacceptable.

This is what we do here -- we add values to accumulator. The result is the (very) long chain of addition operations, and this is very bad for floating point numbers.

#include <math.h> #include <stdio.h> #include <stdint.h> #include <time.h> int main() { uint64_t _max=500000; float sum1=0; time_t start=time(NULL); for (uint64_t i=0; i<_max; i++) { float _i=(float)i; sum1=sum1+_i*_i*_i; }; printf ("sum1=%f\n", sum1); printf ("time=%ld\n", time(NULL)-start); float sum2=0; start=time(NULL); float prev=0; for (uint64_t i=0; i<_max; i++) { float _i=(float)i; sum2=sum2+prev; float tmp=3*_i; prev=prev+(tmp*_i + tmp + 1); }; printf ("sum2=%f\n", sum2); printf ("time=%ld\n", time(NULL)-start); double true_error=fabs((double)sum1-(double)sum2); printf ("true error=%f\n", true_error); printf ("relative true error=%f\n", (true_error / (double)sum1)); };

I use 'float' consciously (so that problems would surface early), but for relative error measuring -- 'double'.

sum1=15625012601280660504576.000000 time=0 sum2=15625137576170320035840.000000 time=0 true error=124974889659531264.000000 relative true error=0.000008

Error in the 6th place. However, the first result is also far for correct. According to Wolfram Mathematica, the correct value: 15624937500062500000000 (error in the 5th place).

Increasing _max to 1000000:

sum1=249984565122584337711104.000000 time=0 sum2=249976044312089352732672.000000 time=0 true error=8520810494984978432.000000 relative true error=0.000034

The error is already noticable -- in the 5th sign. But correct value (Wolfram Mathematica): 249999500000250000000000 (and again, the error in the 5th place).

Increasing _max to 10000000:

sum1=2493939293444969611153899520.000000 time=0 sum2=2487728496076280489639411712.000000 time=0 true error=6210797368689121514487808.000000 relative true error=0.002490

The error in 3rd place. Wolfram Mathematica: 2499999500000025000000000000.

Increasing _max to 100000000:

sum1=20282409603651670423947251286016.000000 time=0 sum2=10141204801825835211973625643008.000000 time=1 true error=10141204801825835211973625643008.000000 relative true error=0.500000

We can't talk about correctness here. Wolfram Mathematica: 24999999500000002500000000000000 (you can easily see some regularities in these numbers).

But we'll measure speed for _max=10000000000:

sum1=21267647932558653966460912964485513216.000000 time=32 sum2=166153499473114484112975882535043072.000000 time=40 true error=21101494433085539482347937081950470144.000000 relative true error=0.992188

Alas, works slower.

And now, how come that this "difference function" produced by Mathematica is *almost* like derivative,
but with some additional coefficients?

Let's find the "difference function" for $x^2$ gradually decreasing step from 1 to 0.001:

In[]:= f[x_] = x^2 In[]:= ExpandAll[f[x + 1] - f[x]/1] Out[]= 1 + 2x In[]:= ExpandAll[(f[x + 0.5] - f[x])/0.5] Out[]= 0.5 + 2x In[]:= ExpandAll[(f[x + 0.1] - f[x])/0.1] Out[]= 0.1 + 2x In[]:= ExpandAll[(f[x + 0.01] - f[x])/0.01] Out[]= 0.01 + 2x In[]:= ExpandAll[(f[x + 0.001] - f[x])/0.001] Out[]= 0.001 + 2x

You see, the "coefficient" at left is decreasing gradually, but $2x$ is left as is.

And what about $x^3$?

In[]:= f[x_] = x^3 In[]:= ExpandAll[f[x + 1] - f[x]/1] Out[]= 1 + 3x + 3x^2 In[]:= ExpandAll[(f[x + 0.5] - f[x])/0.5] Out[]= 0.25 + 1.5x + 3x^2 In[]:= ExpandAll[(f[x + 0.1] - f[x])/0.1] Out[]= 0.01 + 0.3x + 3x^2 In[]:= ExpandAll[(f[x + 0.01] - f[x])/0.01] Out[]= 0.0001 + 0.03x + 3x^2 In[]:= ExpandAll[(f[x + 0.001] - f[x])/0.001] Out[]= 1.*10^-6 + 0.003x + 3x^2

There are 2 polynomial terms at left that decreasing gradually, but $3x^2$ left as is.

And so, derivative function is "difference function" as if this step would be "infinitely small" and these coefficients would finally disappear. (This is why "infinitely small" are present in calculus.) But a step can't be zero, otherwise it will all make no sense.

Derivative is a limit of "difference function" with step approaching to zero:

In[]:= f[x_] = x^2 In[]:= Limit[(f[x + step] - f[x])/step, step -> 0] Out[]= 2x In[]:= D[x^2] Out[]= 2x In[]:= f[x_] = x^3 In[]:= Limit[(f[x + step] - f[x])/step, step -> 0] Out[]= 3x^2 In[]:= D[x^3] Out[]= 3x^2

*Limit*, because the step will never be zero, it only approaching it.

List of my other blog posts. My company.

Yes, I know about these lousy Disqus ads. Please use adblocker. I would consider to subscribe to 'pro' version of Disqus if the signal/noise ratio in comments would be good enough.