
QNX 4.24 Raw Network Packet Interface (PRELIMINARY)

***WARNING***
	This is a system level interface!  All code/data
	is trusted -- there are no integrity checks.  It
	is _very_ easy to corrupt and bring down the entire
	system.


Introduction

With QNX 4, the Net manager is responsible for managing
network packets.  Net needs to interface with the network
drivers, Proc, and with user applications.  The interface
between Net and a user application is the focus of this
document and is presented with the aid of code snippets
from the examples if_info and netraw.

Allowing a user application access to receive and transmit
'raw' packets enables the design of new protocols, or the
implementation of an existing proprietary protocol.  Where
possible, new protocols should be designed on top of an
existing protocol (e.g. IP or even UDP), eliminating the need
for the 'raw interface'.  The 'raw interface' is intended to
accommodate the rare needs of non-standard networking to
existing devices/services.

To create a raw net application, you will need the header files
net41msg.h, net_nq.h and net_drvr.h along with the object file
netq_gen.o (these can be found under the examples).  A raw
application must be compiled and linked for the small memory model
with stack checking disabled (see the example Makefile).  Privity 1
(-T1) is required for transmitting (disable/enable).


Drivers and Adaptors

The Net manager doesn't really know a lot about the physical
adaptors used to transmit (tx) or receive (rx) packets.  Each
adaptor must have a driver controlling it before Net can use it.
Also, each adaptor (or driver) must be on a unique logical LAN id.
This LAN id is the main key that Net uses to identify the interface.
Physically, 2 adaptors may share the same media (be part of the same
LAN), but Net perceives them as being 2 unique logical LANs.

	NOTE: by default, Net (and thus applications) is only given
	packets destined for the adaptor, including broadcasts.
	To receive all packets transmitted on the network, regardless
	of where they are destined, the driver must support a promiscuous
	mode, usually enabled with the -P option.  Additionally, to receive
	packets with multicast addresses, the -M option must be used, if
	supported by the driver.

Getting Adaptor Information

Acquiring information about network adaptors involves sending Net
2 types of messages: _NET_DRVR_INFO and _NET_RNODEMAP (defined in
net41msg.h).

The _NET_DRVR_INFO message is used to find information on network
drivers.  The most interesting pieces of information returned are
the presence of drivers, their LAN ids and what type (sub-type is
rarely interesting) of drivers they are (types and sub-types are
defined in net_drvr.h).

The _NET_RNODEMAP message is used to find information on the network
adaptors themselves.  The most interesting piece of information
returned is the physical address of the adaptor.

NOTE: Both of these messages return variable length data which 
starts at an empty array.  These empty arrays are defined just 
for the offset into the message, so it is important to make the
reply buffer larger than the reply message structure.
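
The buffer sizing described in the note above can be sketched as follows.
This is only an illustration: struct demo_reply, struct demo_entry and both
functions are hypothetical stand-ins, not the real layouts from net41msg.h,
and a one-element array is used here in place of the empty array so the
sketch compiles with any C compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical layouts -- NOT the real structures from net41msg.h	*/
struct demo_entry {
	short	lan;
	short	type;
};

struct demo_reply {
	short	status;
	short	count;				/* number of entries that follow	*/
	struct demo_entry entries[1];	/* only marks where the data starts	*/
};

/* number of bytes needed for a reply holding max_entries entries	*/
size_t
demo_reply_size( size_t max_entries )
{
	assert(max_entries > 0);
	return offsetof(struct demo_reply, entries)
		+ max_entries * sizeof(struct demo_entry);
}

/* allocate a zeroed reply buffer; the caller must free() it	*/
struct demo_reply *
demo_reply_alloc( size_t max_entries )
{
	return calloc(1, demo_reply_size(max_entries));
}
```

The point is that the reply buffer is sized from the offset of the array
plus the expected amount of data, never from sizeof the structure alone.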

When matching information from these 2 messages, you need to key
off the LAN id (the respective lan_addr fields).

See the example "if_info"


Interfacing Policy

The raw packet interface focuses primarily on performance and not ease
of use, resulting in a slightly complex interface which yields little
overhead.  To achieve high performance, Net and a raw application share
not just data, but also code.  This method introduces a system stability
issue -- a raw application can easily corrupt Net's data space and also
cause Net to execute 'bad' code -- causing network traffic to cease
(best case) or become corrupted (worst case).

Note that because code/data is shared, the application must permit access
to its segments so Net can alias them into its address space.  This is done
via qnx_segment_arm() and must be performed before registration begins.  Net
will give the application these aliases in the reply.  It is very important
to build every far pointer that Net will reference using Net's ds alias.

Also note that once your data segment is armed, it cannot grow, so _all_
significant data objects should be allocated before calling qnx_segment_arm().
(A free(malloc(XXX)) is useful to pre-grow your heap before arming it.)
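
The free(malloc(XXX)) idiom mentioned above could be wrapped as in the
sketch below.  The name pregrow_heap() is hypothetical, and whether the
freed memory stays in the heap for later reuse depends on the C library;
the sketch assumes it does, as the text above implies.  It would be called
before qnx_segment_arm().

```c
#include <assert.h>
#include <stdlib.h>

/* Grow the heap by nbytes and hand the block straight back to the	*/
/* allocator so later malloc() calls need not grow the data segment.	*/
/* Returns 0 on success, -1 if the memory could not be obtained.	*/
int
pregrow_heap( size_t nbytes )
{
	void *p;

	assert(nbytes > 0);
	p= malloc(nbytes);
	if(p == NULL)
		return -1;
	free(p);
	return 0;
}
```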


Shared Code

A raw application specifies 3 functions for Net (and only Net, the application
itself should never call these) to execute.  These routines should be very
small and must not make any blocking or kernel calls.

The first 2 are for receiving packets, which is done in 2 steps.  First,
Net will far call an allocation routine specifying the size of the packet
being received.  This function should not call malloc(), but instead acquire
a buffer from a pre-allocated list of buffers.  The return value is the
address of the buffer into which Net copies the packet directly from the
adaptor's receive buffer (a return of 0:0 will cause Net to simply drop the
packet).  Net does not buffer any raw packets (or duplicate them for multiple
raw applications) to save the overhead of the memory copy -- it just looks at
the protocol and determines what should be done with the packet.  It is the
responsibility of the raw application to buffer received packets.  After Net
has copied a packet into the specified buffer, it will far call the second
routine, which indicates that the buffer now contains a valid received packet.
This routine should simply put the buffer on a receive queue and return.  The
return value of this function is a proxy id, which Net will Trigger(), giving
the raw application synchronous notification (via Receive()) that a packet
was received.

The third function is used to indicate that an attempted transmission is
complete.  The return value of this function is also a proxy id, which Net 
will Trigger().
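
The two-step receive described above can be pictured with the portable
sketch below.  Everything in it is a stand-in: demo_net_deliver() plays the
role of Net, a single static buffer replaces the pre-allocated list, the
functions are ordinary near functions, and DEMO_PROXY is an arbitrary
number rather than a real proxy id.  Real callbacks are far functions and
must also protect their lists with disable()/enable().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_BUF_LEN	64
#define DEMO_PROXY		7	/* arbitrary stand-in for a proxy id	*/

static char	demo_buf[DEMO_BUF_LEN];
static int	demo_buf_busy;	/* 1 while demo_buf holds a packet		*/
static int	demo_len;		/* length of the packet in demo_buf		*/
static int	demo_dropped;	/* packets refused by the alloc step	*/

/* step 1: hand out a buffer, or NULL to have the packet dropped	*/
void *
demo_rx_alloc( int length )
{
	if(demo_buf_busy || length < 0 || length > DEMO_BUF_LEN){
		demo_dropped++;
		return NULL;
	}
	demo_buf_busy= 1;
	demo_len= length;
	return demo_buf;
}

/* step 2: the buffer is filled; return the proxy to be triggered	*/
int
demo_rx_done( void )
{
	return DEMO_PROXY;
}

/* stand-in for Net: returns the proxy it would Trigger(), or -1	*/
/* when the application refused the packet							*/
int
demo_net_deliver( const void *pkt, int length )
{
	void *dst;

	assert(pkt != NULL);
	dst= demo_rx_alloc(length);
	if(dst == NULL)
		return -1;
	memcpy(dst, pkt, (size_t)length);
	return demo_rx_done();
}
```

Because the only buffer is never released in this sketch, a second
delivery is dropped, which mirrors what happens when the free list of a
real application runs empty.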


Shared Data

There are 2 types of memory objects which are shared.  The first type is
user defined and is used for reception.  As mentioned above, this memory
is accessed by Net by far calling into the raw application.  The second
memory object is used for transmission and is managed by Net.  This object
is known as a qpkt, and is really just a header which provides such 
information as which adaptor to use to transmit and pointers to the actual
packet data (in the raw application's data space). 

There are 3 qpkt lists shared between Net and a raw application.  Net provides
2 of these lists -- a free list which contains recycled qpkt's, and a 
transmit list (queue) which contains qpkts ready for transmission.  The third qpkt
list is specified by the raw application, and contains qpkts which have
already been transmitted (or attempted).  These lists are accessed by the
functions q_put/get_first/last() from the netq_gen.o object file (see the TX'ing
Raw Packets section).


Proxies

Proxies are used between Net and the application for synchronous notification
of events.  The application is given a proxy id that is owned by Net.  This
proxy should be Trigger()'d after putting a qpkt on the transmit list only
if the q_put_last() function returned 1 -- meaning there is now 1 entry and
Net was not transmitting.  Trigger()'ing every time is not necessary.
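
The "trigger only when the list goes from empty to one entry" rule can be
sketched with a mock queue.  The names below are hypothetical and the queue
is a plain array, not the real _net_q list; demo_triggers++ stands in for
Trigger()'ing Net's proxy, and returning 0 for a full queue is an
assumption of this sketch only.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_QMAX	8

struct demo_q {
	int	items[DEMO_QMAX];
	int	count;
};

static int demo_triggers;	/* times the consumer was woken up	*/

/* append an item; returns the number of entries now on the list	*/
/* (0 if the queue was full -- an assumption of this sketch)		*/
int
demo_put_last( struct demo_q *q, int item )
{
	assert(q != NULL);
	if(q->count >= DEMO_QMAX)
		return 0;
	q->items[q->count++]= item;
	return q->count;
}

/* queue an item and wake the consumer only if it was idle	*/
void
demo_send( struct demo_q *q, int item )
{
	if(demo_put_last(q, item) == 1)
		demo_triggers++;	/* stands in for Trigger() */
}
```

Queuing three items back to back wakes the consumer once; the consumer is
expected to keep draining the list until it is empty.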

The application is expected to acquire at least one proxy of its own for Net
to Trigger().  Net triggers the proxy given by the return value of the
tx_done_fn and rx_done_fn functions after each of them is called.  While a
single proxy can be used, using two makes the code simpler.


Raw Application Registration

Before an application can tx/rx raw packets, it must register with
the Net manager.  Registration is done by sending a message of type
_NET_RAW_REG to Net.  This message contains:

struct _net_raw_reg {
    short int		type;				_NET_RAW_REG
    short int		subtype;			unused
	short int		rpt_len;			0-4; valid parts of rpt[]
	short int		zero1;
    unsigned char	rpt[_RPT_LEN];		rx protocol type; up to 4 bytes
    short int		rx_lan_addr;		specific lan; all lans=(-1)
	short int		zero2;
									NOTE: functions must have the same segments
	short unsigned	rx_alloc_fn_seg;	far pointer to 
	long			rx_alloc_fn_ofs;		rx buffer allocate function
	short unsigned	rx_done_fn_seg;		far pointer to
	long			rx_done_fn_ofs;			rx done function
	short unsigned	tx_done_fn_seg;		far pointer to
	long			tx_done_fn_ofs;			tx done function
    short unsigned	ds;					data segment to be shared with Net 
    short int		zero3;
	short unsigned	txd_q_seg;			far pointer to
	long			txd_q_ofs;				tx'd qpkt list head
    long			zero4;
    long			zero5;
};



The 3 far function pointers (rx_alloc_fn_*, rx_done_fn_*, and tx_done_fn_*)
were discussed in the Shared Code section.  The far qpkt pointer (txd_q_*)
was discussed in the Shared Data section.  What have not been discussed are
the rpt* and rx_lan_addr fields.

The rpt field specifies which protocol type you want to rx.  This protocol 
is usually specified using 2 bytes, however 0-4 bytes may be used -- the rpt_len
field specifies how many bytes were used in rpt[].  

For example, to rx the IP protocol (0x0800), rpt_len would be set to 2 and 
rpt would be set to 0x0800.  To rx the ARP protocol (0x0806), rpt_len would
again be set to 2 and rpt would be set to 0x0806.  To rx both IP and ARP, 
two separate messages could be used (register twice), or because both protocols
start with 0x08 you could register once with rpt_len set to 1 and rpt set to
0x08.  A rpt_len of 0 means all packets; no filtering on type.  
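
The prefix matching described above can be sketched as below.  This only
illustrates the semantics as the text states them and is not Net's actual
code; rpt_matches() is a hypothetical name, and the sketch assumes the type
bytes are compared in the order they appear in the packet (most significant
byte first).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first rpt_len bytes of the packet's type field	*/
/* equal rpt[], otherwise 0.  A rpt_len of 0 matches every packet.	*/
int
rpt_matches( const unsigned char *pkt_type,
			 const unsigned char *rpt, int rpt_len )
{
	assert(pkt_type != NULL && rpt != NULL);
	if(rpt_len < 0 || rpt_len > 4)
		return 0;
	return memcmp(pkt_type, rpt, (size_t)rpt_len) == 0;
}
```

Note that under this reading a one byte rpt of 0x08 accepts every type
whose first byte is 0x08, not only IP and ARP, so the application may still
need to check the second byte itself.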

The rx_lan_addr field specifies which adaptor you want to rx on.  This can
be a specific lan or the wild card _ANY_RLA (-1) which specifies all adaptors.
You cannot specify 2 out of 3 adaptors in one message -- you could either
register twice (on the desired 2 adaptors), or register once with rx_lan_addr
set to _ANY_RLA and discard packets rx'd on the third adaptor. 
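
The second approach (register once, discard what is not wanted) might use
a small filter like the sketch below on the LAN id of each received packet.
The function lan_wanted() and the macro DEMO_ANY_RLA are hypothetical names
for this illustration; DEMO_ANY_RLA simply mirrors the -1 wild card value
given above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ANY_RLA	(-1)	/* mirrors the _ANY_RLA wild card	*/

/* Returns 1 if lan appears in wanted[0..n-1] (or the list holds	*/
/* the wild card), otherwise 0 and the packet should be discarded.	*/
int
lan_wanted( int lan, const int *wanted, int n )
{
	int i;

	assert(wanted != NULL || n == 0);
	for(i=0; i<n; i++)
		if(wanted[i] == DEMO_ANY_RLA || wanted[i] == lan)
			return 1;
	return 0;
}
```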

Upon successful registration, Net will reply to the application with
the following response:

struct _net_raw_reg_reply {
    short int		status;			EOK
    short unsigned	handle;			opaque handle 
    short unsigned	cs;				unused
	short unsigned	zero1;
    short unsigned	ds;				alias of _net_raw_reg.ds	
	short unsigned	zero2;
    short unsigned	free_q;			free qpkt list head
	short unsigned	zero3;
    short unsigned	work_q;			transmit qpkt list head
	short unsigned	zero4;
    short unsigned	net_qseg;		segment for qpkt lists
	short unsigned	zero5;
    short unsigned	net_mid;		proxy id to get Net to transmit
	short unsigned	zero6;
};


The handle field is like a file descriptor -- it is used to deregister
with Net (like a close()), and it is used to tx a packet (like a write()).

The qpkt lists (free_q and work_q) were discussed in the Shared Data section.
The ds and cs fields are aliases to the segments specified in the registration
request message.  The cs is not needed, but the ds field is important -- all
shared data is specified as far, and any data given to Net must be addressed
with this alias and not the application's ds.

The net_mid field is a proxy owned by Net, which the application Trigger()'s
to let Net know something was put on its transmit queue (work_q). 

RX'ing Raw Packets

After registration, rx'ing packets is quite seamless.  Because of the
far called functions (alloc a buffer and rx_done) an application simply
needs to wait for Net (via rx_done) to trigger a proxy letting the application
know a packet was received.  The application should be in a Receive() loop
at this point, receive the proxy, and then take the packet off a list for
processing.

As an example, let's use the following to store received packets:


typedef char mac_t[6];
typedef struct _rx_pkt rxpkt_t;

#define RX_LEN	1514	/* sizeof data field -- largest pkt	*/
						/* to receive (1514 is the largest	*/
						/* Ethernet frame: 1500 byte MTU +	*/
						/* 14 byte header) but may be set	*/
						/* smaller to save memory.			*/

struct _rx_pkt{
	rxpkt_t	*next;		/* linked list						*/
	int		lan;		/* interface identifier				*/
	mac_t	from_addr;	/* sent from this remote address	*/
	mac_t	to_addr;	/* sent to this address				*/
	int		len;		/* length of data received			*/
	char	data[RX_LEN];		 
	};


We are going to need 2 rxpkt_t lists -- a free list and an rx list.
The free list contains pre-malloc'd rxpkt_t's which may be recycled.
The rx_alloc_fn function will drain the list as needed, and the application
will return them once the rx packet has been processed.
The rx list will need a tail pointer, so that when adding to the rx list,
newly rx'd packets can be put at the end so packets will be processed in
rx'd order.  The rx list is produced by the rx_done_fn(), and is consumed
by the application for processing (and then back to the free list).
To manage these lists, we will need the following:

rxpkt_t *rxhead, *rxtail;	/* list of rx'd pkts -- head & tail	*/
rxpkt_t *rxfree;			/* list of free pkts				*/
rxpkt_t	*rxcur;				/* pkt being rx'd -- not on any list
										and only used by Net callbacks	*/

Before registering with Net, we must pre-malloc the free list with
the following:

/* called during initialization before registering with Net	*/
/* these must be in the ds which will be registered with Net */
int
rx_init( int n )
{
	int i;
	rxpkt_t *r;
	
	for(i=0; i<n; i++){
		r= (rxpkt_t *)malloc(sizeof(rxpkt_t));
		if(!r)
			return -1;
		memset(r, 0, sizeof(rxpkt_t));
		r->next= rxfree;
		rxfree= r;
		}
	return 0;
}
		


Taking a free rxpkt from the list (which is really executed by Net 
when it needs a buffer to copy a raw packet into) would look like:

/* a far function which returns a far pointer.				*/
/* passed during registration as .rx_alloc_fn_ofs field in 
	_net_raw_reg message									*/
/* returns an address to copy the pkt into--NULL to drop it	*/
/* parameters m1,m2,m3 represent the 6 byte physical 
	address of the destination								*/
void far * far
rx_getbuf(int length, short m1, short m2, short m3)
{
	if(rxcur){
		/* already rx'ing -- should not happen	*/
		/* log error							*/
		rx_dropped++;
		return NULL;
	}

	if(length > RX_LEN){
		/* data too large -- log error			*/
		rx_dropped++;
		return NULL;
	}

	disable();		/* must protect -- keep it short	*/
					/* grab a free rxpkt				*/
		rxcur= rxfree;
		if(rxcur)
			rxfree= rxfree->next;
	enable();

    if(rxcur){
        short *p;
        void far *fp;

        rxcur->len= length;             /* store length of packet   */
        p= (short *)rxcur->to_addr;     /* store physical address   */
        *p++ = m1;                      /*   of destination         */
        *p++ = m2;
        *p++ = m3;
            /* has to be addressable by Net */
        fp=(MK_FP(net_ds, &(rxcur->data)));
        return fp;
    }else{
        /* no buffer available -- log error     */
        rx_dropped++;
        return NULL;
    }
}

... and putting a new packet on the rx list (also executed by Net):

/* passed during registration as .rx_done_fn_ofs field in 
	_net_raw_reg message									*/
/* rx'd packet now in rxcur->data							*/
/* store lan and mac info in rxcur, put rxcur into end of rx list	*/
/* return rx_done_proxy -- Net will trigger this			*/
int far
rx_done(short lan, short m1, short m2, short m3)
{
	short *p;

	rxcur->lan= lan;			/* interface id packet was received on */
	p= (short *)rxcur->from_addr;
	*p++ = m1;					/* m1, m2, m3 make the 6byte MAC address */
	*p++ = m2;					/* 	 of the sender of the packet		 */
	*p++ = m3;
	rxcur->next= NULL;

	disable();		/* put at end of rx list */
		if(rxtail)
			rxtail->next= rxcur;
		else		/* if no tail, must be empty (no head either) */
			rxhead= rxcur;
		rxtail= rxcur;
		rxcur= NULL;
	enable();
		
		/* wake up application */
	return rx_done_proxy;
}




The main Receive() loop of the application looks like:

	for(;;){
		pid= Receive(0,0,0);
		if(pid == -1)
			exit(-1);
		else if(pid == rx_done_proxy){
			rxpkt= get_rxpkt();		/* take it off the rx list */
			if(rxpkt==NULL){
				printf("RX error\n");
				continue;
				}
			handle_rx(rxpkt);		/* process packet */
			put_rxpkt(rxpkt);		/* recycle packet */
			rx_cnt++;
		}else
			handle_msg(pid);
		}



... where get/put_rxpkt are:


/* called by application to grab oldest rx packet	*/
rxpkt_t *
get_rxpkt()
{
	rxpkt_t *r= NULL;

	disable();
		if(rxhead){
			r= rxhead;
			rxhead= rxhead->next;
			if(rxhead==NULL)	
				rxtail= NULL;
		}
	enable();
	return r;
}

/* called by application to re-use rx packet */
void
put_rxpkt( rxpkt_t *r )
{
	disable();
		r->next= rxfree;
		rxfree= r;
	enable();
}
			



TX'ing Raw Packets

Transmitting packets is done using qpkt's, and basically involves getting
a free qpkt from Net, filling it in with a pointer to the packet, and putting
the qpkt on Net's transmit queue.  Net will process the qpkt, put it on the
application's tx'd qpkt list, and then far call the tx_done_fn() function.

Putting and getting from qpkt lists is performed by the functions found in
the netq_gen.o file, namely: 

struct _net_nq_pkt far * q_get_first(struct _net_q far *list, short irq_off, short unsigned list_segment );
struct _net_nq_pkt far * q_get_last (struct _net_q far *list, short irq_off, short unsigned list_segment );

int q_put_first(struct _net_q far *list, struct _net_nq_pkt far *qpkt, short irq_off, short unsigned list_segment );
int q_put_last (struct _net_q far *list, struct _net_nq_pkt far *qpkt, short irq_off, short unsigned list_segment );


Note: struct _net_nq_pkt is known as a qpkt object, and struct _net_q is a
qpkt list (queue) head.  The exact definitions of these structures are not
important -- _net_q can be treated as opaque, and raw applications only need
to set a couple of fields of _net_nq_pkt (after memset'ing to 0).

These functions each take a qpkt list (segment must be addressable by Net)
and an irq_off flag.  Raw applications must always set irq_off to 1.

The q_get* functions return a qpkt pointer while the q_put* functions return
the number of elements in the list the qpkt was put on.

As mentioned in the above note, a raw application need only fill in a qpkt
partially.  This is because qpkts were used before the raw interface was
designed into Net and are used extensively for QNX native networking.

The required qpkt fields are overloaded for raw applications and include:
	
	qpkt->type;					_VC_RAW
	qpkt->dst_vid;				LAN id to tx packet
	qpkt->src_vid;				handle returned from registration
								packet stored in an mx array: 
	qpkt->mx_seg;					segment (must be Net's ds alias)
	qpkt->mx_off;					offset
	qpkt->mx_parts;					number of parts in mx entry
	qpkt->phys_dst_nid;			destination address

All other field members of the qpkt (struct _net_nq_pkt) should be set to 0.

There are 2 things which need mentioning here.  The first is the mx entry.
The mx entry must be in the registered ds and made with Net's alias.  The
second item is the lack of a back pointer to a user defined structure containing
state about the packet.  Both of these issues can be solved in the following
manner.  Consider:

typedef struct _tx_pkt txpkt_t;

struct _tx_pkt{
	struct _mxfer_entry	far mx[5];	/* packet data; qpkt->mx_off points here,
										and thus to the _tx_pkt				*/
	txpkt_t				*next;		/* link list							*/
	struct _net_nq_pkt far	*qpkt;		/* back pointer to the qpkt -- optional	*/
		/* other private data can be stored here as needed */
	};


Now, by casting the mx pointer contained in the qpkt to a txpkt_t, the application
can access and store any information needed.  Also, a pool of txpkt_t's can
be pre-malloc'd and managed like the rxpkt_t's -- similar code without the need
to protect the lists because they will not be accessed by Net.  Of course, there
is a small drawback -- the mx entry array must be a fixed size.
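
The cast works because C guarantees that a pointer to a structure and a
pointer to its first member refer to the same address.  The sketch below
shows the idea with hypothetical stand-in types (struct demo_mx is not the
real _mxfer_entry, and near pointers are used instead of offsets into the
registered ds).

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical stand-in for struct _mxfer_entry	*/
struct demo_mx {
	void		*addr;
	unsigned	len;
};

struct demo_txpkt {
	struct demo_mx		mx[5];	/* MUST stay the first member		*/
	struct demo_txpkt	*next;	/* linked list						*/
	int					tag;	/* private per-packet state			*/
};

/* recover the enclosing demo_txpkt from a pointer to its mx array	*/
struct demo_txpkt *
txpkt_from_mx( struct demo_mx *mx )
{
	assert(mx != NULL);
	return (struct demo_txpkt *)mx;
}
```

If the mx array were ever moved away from the start of the structure, the
cast would silently point at the wrong place, so a check of
offsetof(struct demo_txpkt, mx) == 0 during initialization is cheap
insurance.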

A simple tx routine could look like:

void
tx( char *data, int len, int lan, mac_t addr ){
	struct _mxfer_entry far *mx_ptr;
	txpkt_t	*txpkt;

	txpkt= get_txpkt();			
	txpkt->qpkt= q_get_first(net_free_q, 1, net_q_seg);
	_fmemset(txpkt->qpkt, 0, sizeof(struct _net_nq_pkt));
	txpkt->qpkt->type= _VC_RAW;
	txpkt->qpkt->dst_vid= lan;
	txpkt->qpkt->src_vid= net_handle;
	txpkt->qpkt->mx_seg= net_ds;
	txpkt->qpkt->mx_off= FP_OFF(txpkt->mx);
	txpkt->qpkt->mx_parts= 1;
	_fmemcpy(txpkt->qpkt->phys_dst_nid, addr, 6);
	mx_ptr= MK_FP(net_ds, txpkt->mx);
	_setmx(txpkt->mx, MK_FP(net_ds, data), len);	/* data must be in the registered ds */
	if(q_put_last(net_tx_q, txpkt->qpkt, 1, net_q_seg) == 1)
		Trigger(net_proxy);
}

By default, the physical address of the adaptor will be used for the
source address field of the frame of the packet.  You can override 
this default by using the qpkt->zero1 and qpkt->remote_seg fields.
If qpkt->zero1 is non-zero, then the fields remote_seg, remote_off,
and ext_remote_off will be used to form the 6 byte physical address.
An example tx routine that specifies both the source and destination
physical addresses would look like:

void
tx( char *data, int len, int lan, mac_t to_addr, mac_t from_addr ){
	struct _mxfer_entry far *mx_ptr;
	txpkt_t	*txpkt;

	txpkt= get_txpkt();			
	txpkt->qpkt= q_get_first(net_free_q, 1, net_q_seg);
	_fmemset(txpkt->qpkt, 0, sizeof(struct _net_nq_pkt));
	txpkt->qpkt->type= _VC_RAW;
	txpkt->qpkt->dst_vid= lan;
	txpkt->qpkt->src_vid= net_handle;
	txpkt->qpkt->mx_seg= net_ds;
	txpkt->qpkt->mx_off= FP_OFF(txpkt->mx);
	txpkt->qpkt->mx_parts= 1;
	_fmemcpy(txpkt->qpkt->phys_dst_nid, to_addr, 6);
	if(from_addr){
		txpkt->qpkt->zero1= 1;
		_fmemcpy(&(txpkt->qpkt->remote_seg), from_addr, 6);
		}
	mx_ptr= MK_FP(net_ds, txpkt->mx);
	_setmx(txpkt->mx, MK_FP(net_ds, data), len);	/* data must be in the registered ds */
	if(q_put_last(net_tx_q, txpkt->qpkt, 1, net_q_seg) == 1)
		Trigger(net_proxy);
}


Once Net has processed the qpkt, it will place it in the application's
txd_q list and update the status (overloaded type field) and finally
call the tx_done_fn() specified in the register message.  This function
should simply return a proxy.  The main Receive() loop now looks like:


	for(;;){
		pid= Receive(0,0,0);
		if(pid == -1)
			exit(-1);
		else if(pid == rx_done_proxy){
			rxpkt= get_rxpkt();
			if(rxpkt==NULL){
				printf("RX error\n");
				continue;
				}
			handle_rx(rxpkt);
			put_rxpkt(rxpkt);
			rx_cnt++;
		}else if(pid == tx_done_proxy){
				tx_cnt++;
					/* get qpkt from the txd_q */
				qpkt= q_get_first(&txd_q, 1, net_q_seg);
				if(qpkt==NULL){
					printf("TX error\n");
					tx_failed++;
					continue;
					}
					/* check the status */
				if(qpkt->type == EOK){
						/* packet was sent ok... */
				}else if(qpkt->type == EIO){
						/* log error */
						/* maybe try to retransmit ? */
					tx_failed++;
				}else{
						/* should not happen ? */
					tx_failed++;
				}

					/* free up txpkt and release qpkt */
				txpkt= (void *)qpkt->mx_off;
				put_txpkt(txpkt);	/* calls q_put_last() to free qpkt */
		}else
			handle_msg(pid);
		}



See the netraw example for complete source.
