
Let's talk about Realtime
--------------------------------------

As of this point we haven't really done anything in realtime. Let's now 
show the realtime nature of QNX.

How does QNX handle events?

Review the section in the System Architecture Guide on realtime performance.
It is contained in the chapter about the Microkernel. Look at the charts
detailing interrupt and scheduling latency.

What does all this really mean?

Essentially what happens is that the microkernel schedules all processes 
based on the following order from highest to lowest:

          - highest priority interrupt routine runs to completion
          - other interrupts finish based on their priority
             (interrupts are fully nested and prioritized)
          - the highest priority process that is READY to run will run

Note that as far as the microkernel is concerned, even system processes such
as Proc32, Dev, Net etc. all follow the same rules as your own processes. There
is nothing special about these processes according to the microkernel.

Let's look at a real world example.
------------------------------------------------

Problem:
Suppose I need to poll a device each millisecond (1000 Hz) and read in 5
values from some software ports. 
Then I need to do some floating point math on the input values.

If the readings don't happen on time then the device could fail. I need them
happening exactly at 1 millisecond intervals.

Let's look at some code... 
edit the program 'problem1.c'

Let's see what it does:
    - it creates a timer with a 1 msec repeat rate to return a proxy
    - upon reception of the proxy it reads 5 values from the 8254 chip
      and then does some math on them.
    - repeat

First off, in order to have a timer with a granularity of 1msec we need to
change our ticksize to be less than this value.

** Change your ticksize to be 0.5 milliseconds. ** This is important!
e.g.
   ticksize 0.5        (you must be root in order to do this)

Now let's see what the exact timing of the timer chip is:
type  'ticksize -e'
It will print something like:
       Tick size: 499504 nanoseconds.
This is the true value of the tick. It is almost 0.5 milliseconds but not
quite.. close enough for our purposes based on the granularity of the timer
chip.

You should be on a Pentium or better machine in order to continue.. a 386 or
486 may not have enough CPU cycles to keep up with these interrupt rates.
A Pentium 100Mhz is a good minimum system.

Make the program and run it.  (be sure to run it as root ... it needs root
privilege in order to execute the inp() instructions)

e.g. problem1   <ENTER>

It will run on the current console. Of course there is no output from the
program.

So how do we know what our timings are really like? Is our tick really
happening?  What is happening?

There are several possible approaches to analyze what is going on. Different
engineers each may have their favourite approaches:

1. As part of your loop, outp() a byte to a serial port where you have a
   scope hooked up. Then you can watch for repeat rates and variances on
   the scope.

2. Use a logic analyser.

3. Use an instrumented kernel that records all interrupt/context switching.

4. Use the precise timing values of the Pentium chip (the rdtsc instruction).

For the purposes of this evaluation we will use option #4. Option #3 is
also possible .. we'll talk about that approach a little later.

The Pentium processor has a free running 64-bit counter that increments at
the granularity of your clock speed. For example if I have a 100Mhz Pentium
then each tick of the timer represents 1/100M or 10 nanoseconds.

Before we use this counter let's take a further look into what our problem1.c
program was doing.

   - the timer_create code sent a message to the Process manager (Proc) telling
     it to make a timer for me.
   - once the timer is created then the microkernel is the component that
     will do the Trigger on my proxy every 1 millisecond.
   - after the Trigger, as problem1 is already in the Receive loop, it
     will bump out of the Receive and be marked as READY. 
   - if problem1 is the highest priority ready to run process then it will run
     until it blocks again (until we call Receive).

So what are possible things that could affect our timing?
A. any hardware interrupts will be handled first
B. any higher priority READY processes will be allowed to run first

The interrupt handlers that QNX provides in their drivers are designed to be
as fast as possible. We take great pains in avoiding any disabling of 
interrupts (cli/sti) as well as keeping the handlers short. For example, a 
network driver will read bytes from the network chip upon an interrupt within
the handler and then will do further processing (assemble/parse the packet)
at process level time.

Higher priority processes are easier to control. In QNX you can set your
priority to be as high as 29. So we can sit above the filesystem, above the
network, above the serial ports (Dev.ser) etc...

But wait a minute... Proc32 is at priority 30. It will always preempt me
whenever it needs to run.
Why would Proc32 ever need to run?
Proc32 handles messages from other processes when they ask it for these
services:
         - system information (sin)
         - process information (sin/ps)
         - create/destroy a timer
         - spawn a new process, fork, kill a process
         - memory operations (mmap, allocate chunks of memory to its 
           free pool etc.)  

Any time Proc32 is scheduled to run when it is asked to do something, Proc32
can preempt me and then my timing could be off.
What can we do?

The answer is to drop the priority of the Process manager down so that we
can sit above it.

First let's illustrate what happens if you don't drop Proc32 down in priority
and your realtime process is preempted by Proc.

Let's take our problem1.c program and add some timing analysis to it.
We will use Trace calls in order to log variances.

Take a look at problem1_analyse.c
(You need to be on a Pentium class machine to continue with this analysis)

This uses the rdtsc instruction to snap the 64-bit free running Pentium
counter.
Make the program.
Take a look at the usage message:

type 'use problem1_analyse'. you will see:

problem1_analyse [ -u uprate (nsec)  -p priority  -m max_variance (nsec)]

priority   = priority to run at (0 to 29) Default=10
e.g. 
problem1_analyse -p 29 -u 499504 -m 1000000

Our ticksize is still 499504 nanoseconds. 
(make sure this is still true... ticksize -e will tell you this)
Run it as root at priority 2:

problem1_analyse -p2 -u 499504 -m 1000000
We've told it to run at priority 2, do some inp and floating point calculations
every 499504 nanoseconds.
If the variance on timing is out by 1000000 nanoseconds (1 usec) then we
write a Trace message.

The program prints out information on your Pentium processor and then starts
doing the repetitive calculations.
Note in the code that we do Trace calls in order to log variances. As well,
every 10000 iterations (log_rate) we will write a Trace call showing the 
current count and sample data.

In order to see the trace output we use the traceinfo program.
Switch to a free console.
Make sure you're in the example6 directory.

type:
       traceinfo -e trace.txt -M0xfffff

This tells traceinfo to only print trace events for code 0xfffff (which is
our TRACE_TEMPORARY code) and to use the text string described in trace.txt

So, our ticksize is 499504 nanoseconds.
problem1_analyse is running at priority 2.

From this directory at a shell prompt type:
       traceinfo -e trace.txt -M0xfffff

You will see all the Trace() calls made by the program.
Something like:

Sep 12 15:44:46 3 fffff001 counter=10666799 lastoverrun was 12 counts ago:
   diff= 0.002938905, var= 0.002454405, bench= 0.000484500
Sep 12 15:44:46 3 fffff001 counter=10666810 lastoverrun was 11 counts ago:
   diff= 0.002619040, var= 0.002134540, bench= 0.000484500
Sep 12 15:44:46 3 fffff001 counter=10666840 lastoverrun was 30 counts ago:
   diff= 0.002571593, var= 0.002087093, bench= 0.000484500

What does this mean... well, the first entry indicates that at counter
10666799 we had an overrun of 0.002454405 seconds... we were supposed to
be at bench=0.000484500   (our half a millisecond update rate).

Kill off the existing problem1_analyse program (hit ctrl-C on its console
or type 'slay problem1_analyse' from another console).

Now let's run it at a higher priority.

Run it as root at priority 29:

problem1_analyse -p29 -u 499504 -m 1000000

Switch to a shell prompt on another console and type:
       traceinfo -e trace.txt -M0xfffff

Your output may now look like:

Sep 12 16:20:41 3 fffff002 counter=10000 Values are good: diff= 0.000498681,
     var= 0.000014533, bench= 0.000484149
Sep 12 16:20:45 3 fffff001 counter=16950 lastoverrun was 16950 counts ago:
     diff= 0.001869709, var= 0.001385560, bench= 0.000484149
Sep 12 16:20:45 3 fffff001 counter=17520 lastoverrun was 570 counts ago:
     diff= 0.002783637, var= 0.002299489, bench= 0.000484149
Sep 12 16:20:46 3 fffff002 counter=20000 Values are good: diff= 0.000499494,
     var= 0.000015345, bench= 0.000484149

We're getting better... overruns are occurring every 500 to 1000 iterations
now instead of every 15 iterations.
Proc at priority 30 is still in our way.

What we need to do is drop Proc down in priority to 28... then we can run
our program at priority 29 and have exact timings.

Let's do this now...

Dropping the Priority of the Process Manager
--------------------------------------------
To do this we must customize our boot image. (This is examined in more depth
in example 11). For the cases of this realtime section we will quickly make
an change to our boot image rather than go into detail.

login as root

cd to /boot/build
type 'ls'

you will see a number of files here. These are text files that we can edit.
*** check this *** You should have a file called 'hard.1'

Edit the file. It will look like:

sys/Proc32
$ Proc32 -l 1 

sys/Slib32
$ Slib32

sys/Slib16
$ Slib16

/bin/Fsys
$ Fsys 

/bin/$d
$ 1000 $d $p

/bin/mount
$ 1000 mount -p $h $h$t /

/bin/sinit
$ sinit TERM=qansi

What we want to do is add an extra option to the Proc line.
Change:

sys/Proc32
$ Proc32 -l 1 

to:

sys/Proc32
$ Proc32 -l 1 -P 28

Save the file.

Now we'll build a new OS image.
Type the following:

cd /boot
make b=hard.1 

(note...if you do not have an IDE drive then you may need to specify your
driver on the 'make' command line. (driver defaults to Fsys.eide))
  e.g. make b=hard.1 d=Fsys.aha7scsi
If you have a non-IDE drive then check out example 11 first...

You will have made a new boot image.. you will see build information printed
to the screen.

Now we want to:
     - make a copy of our current boot (in case of problems)
     - copy in the new boot image

First we'll make a copy of our existing boot environment:
Type the following:
   cp /.boot /.altboot
   cp /etc/config/sysinit.1 /etc/config/altsysinit

Now we'll make the new image active.
Type the following:
   cp /boot/images/hard.1 /.boot

Wait for 5 seconds and then reboot your machine.

Two things will happen:

1. you will boot successfully
   if you do so, login then type 'sin'.
   check the priority of Proc32. It should be at 28.
   If you have success, proceed to the 'Let's get Really Realtime' section
   below.

2. you can't boot   (hangs or boots unsuccessfully)
   Check first that you had removed the floppy from the drive before trying
   to reboot.
   We can boot using our backup image..

   Reboot the machine.
   When you see the message about 'Press ESC for alternate boot' hit the
   ESC key.
   The /.altboot image will load (we copied our original one to it) and 
   it will execute /etc/config/altsysinit  (which was a copy of our original
   sysinit file)
   After doing this you should be able to login. 
   Go back to the beginning of the 'Dropping the Priority of the Process
   Manager' section and try again. 
   ** don't do the steps about copying the /.boot to /.altboot. Our /.altboot
   is the good image.
   If you still have problems, check out example 11 now. Give us a call if
   things aren't working out.

Let's get Really Realtime
------------------------------------

So, Proc32 is now at priority 28. We can run programs at priority 29.

Let's redo our problem1_analyse example at priority 29.

Set your ticksize to 0.5 msecs.
e.g. ticksize 0.5

(check by typing 'ticksize -e' to ensure we are at 499504 nanoseconds)

Run problem1_analyse as root at priority 29:

problem1_analyse -p29 -u 499504 -m 1000000

Switch to a free console.
Make sure you're in the example6 directory.

type:
       traceinfo -e trace.txt -M0xfffff

You may still see some overruns occasionally. Why?
If you examine the timing differences you will see that sometimes you will
see a variation of 2 times the desired time. This is because we chose a timer
value (499504) which is an exact multiple of the ticksize.
We want to ensure that our timer to expire is slightly less than the exact
ticksize of the machine. The kernel calculates our expiry time for us and
will trigger a repetitive timer if it is less than but not equal to the
current time.

Try the test again with a nanosecond value of 10 or 20 nanoseconds less than
the exact ticksize.
e.g.
problem1_analyse -p29 -u 499490 -m 1000000

Switch to a free console.
Make sure you're in the example6 directory.

type:
       traceinfo -e trace.txt -M0xfffff

You should now see no overruns... all calls are being made within 1ms
intervals. You will just see the entries indicating good values, such
as:

Sep 12 16:24:11 3 fffff002 counter=10000 Values are good: diff= 0.000498662,
   var= 0.000015054, bench= 0.000483609
Sep 12 16:24:16 3 fffff002 counter=20000 Values are good: diff= 0.000499264,
   var= 0.000015655, bench= 0.000483609
Sep 12 16:24:21 3 fffff002 counter=30000 Values are good: diff= 0.000499505,
   var= 0.000015896, bench= 0.000483609
Sep 12 16:24:26 3 fffff002 counter=40000 Values are good: diff= 0.000500859,
   var= 0.000017250, bench= 0.000483609
Sep 12 16:24:31 3 fffff002 counter=50000 Values are good: diff= 0.000524968,
   var= 0.000041360, bench= 0.000483609
   
So what did we learn about realtime?
-----------------------------------------
1. you can run your highspeed code at priorities higher than Proc.
2. if you are at high priority, make sure you don't use calls that will
   block on lower priority processes.
   For example, a printf() will block on Dev in order to print to Dev.con
   If you use a printf then you are dependent on Dev.
   Only use calls that are non-blocking:
         - write to shared memory 
         - use Trace calls for analysis
3. Using regular POSIX functions (timer_create) you can do high level
   realtime work. Repetitive timers are triggered by the kernel, not done
   by Proc, so will happen at the lowest level in the kernel.

A further note:

Most systems don't poll at high speeds. They are interrupt driven by
hardware. The same design considerations exist. You want to keep your
interrupt handler nice and short, and simply do the work that you need
to do at interrupt time.
For further work, return a proxy from your handler and do more processing
at process-level time.

