
DOI: 10.1109/35.819911


HIGH-PERFORMANCE PROTOCOL STACKS

Approaches to Improving Performance of STREAMS-Based Protocol Stacks

Vimal K. Khanna, Cabletron Systems

IEEE Communications Magazine • February 2000

Editorial liaison: A. Pakstas.

ABSTRACT

STREAMS kernel mechanisms are being used to implement networking protocols in a number of operating systems, such as UNIX, Windows, and pSOS. STREAMS provides a number of desirable features for modular protocol stack implementation by defining a high degree of standardization for implementing protocol layer modules and for message transfer between the modules. But this strictly layered and modular approach prevents flexibility in implementing techniques that can result in high performance. The original STREAMS-based stacks have shown very poor protocol performance. A number of improved techniques for implementing STREAMS-based protocol stacks have been suggested in the literature in the past few years. These techniques have resulted in high-performance STREAMS-based stacks that match the performance of BSD UNIX-based stacks. This article studies these new approaches and discusses the performance gains they achieve. These approaches can be used for implementing high-performance protocol stacks in the STREAMS kernel, on both uniprocessor and multiprocessor systems.

INTRODUCTION

Networking protocol stacks follow a layered model. When a protocol stack is implemented on a host, the transport and underlying layers are implemented in the host kernel. These layers are made to run over a subnetwork protocol driver and the corresponding network interface card (NIC). STREAMS kernel mechanisms are being used to implement protocols in a number of operating systems. Traditionally, STREAMS was used in AT&T UNIX [1] and is now also being used in Windows [2]. Besides these host operating systems, STREAMS is also being used in the pSOS [3] embedded operating system. A large base of networking protocol implementers is using STREAMS to implement their protocol stacks in these operating systems.

STREAMS provides a number of desirable features for implementing layered protocol stacks. STREAMS follows a very strict layered modular approach for protocol implementation. This facilitates implementing protocol layers as independent modules. STREAMS provides well-defined standard interfaces for message transfer between modules, and between application programs and modules. Modules can be dynamically inserted and removed, and reused in different protocol stacks. These STREAMS features facilitate easy and fast development of protocol stacks. The standard interfaces facilitate easy reusability and migration to different platforms and applications.

Although these STREAMS mechanisms ease the implementation of protocols, their lack of flexibility to optimize the implementation introduces inefficiencies in protocol performance. Traditional STREAMS-based protocol stacks have been known to have very low performance on both uni- and multiprocessor systems. These stacks have been much slower than similar implementations on BSD UNIX Sockets-based stacks [4]. The original STREAMS mechanisms prevent transfer of flow control information across nonadjacent protocol layer modules, have inefficient memory buffer handling capabilities, suffer from scheduling delays, and lack parallelism (required for high performance on multiprocessors).

The past few years have seen innovative techniques being suggested to improve the performance of STREAMS mechanisms. These techniques have provided solutions to the above problems. This has resulted in protocol stack implementations that give high performance, matching the performance of similar stacks implemented in the BSD environment. This article studies these techniques from the literature, including a technique suggested by the author. It should be noted that some of the techniques discussed in this article suggest improvements over a version of STREAMS earlier than that in [1], but the techniques give the same benefits when they are implemented on this more recent STREAMS version. Knowledge of these techniques can help implementers achieve high performance for protocol stacks being implemented in the STREAMS environment. These gains are coupled with the other usual benefits of the STREAMS environment (e.g., ease of development, reusability, and the availability of standardized interfaces).

Since a comparison between STREAMS and Sockets mechanisms has already been made in [5], and mechanisms for porting programs between these two subsystems have been discussed in [6], these issues are not discussed in this article. This article concentrates only on the STREAMS architecture and its implementation approaches.

This article is arranged as follows. The STREAMS architecture is discussed first. The inefficiencies of this architecture are then examined. Finally, the different techniques being used to address these inefficiencies are studied, and it is shown that these techniques result in high-performance implementations.


STREAMS MECHANISMS


A typical protocol implementation on a host consists of the transport layer and underlying layers residing in the kernel, and the higher layers residing as user-level applications. STREAMS mechanisms reside in the kernel. This kernel-based approach provides a performance advantage over some user-application-based protocol implementation techniques suggested in recent literature (e.g., Java-based implementations). In a timesharing system like UNIX, code in the kernel executes at the highest priority and cannot be preempted. Thus, protocol processing is much faster in the kernel than if the protocol were executing as a low-priority, preemptible user application process (with context switching overheads).


In a typical STREAMS-based implementation, a user application establishes a transport connection to the remote host by accessing the underlying transport layer through an access point. The transport layer of the protocol stack resides over the Internet layer. These two layers communicate through an access point, as shown in Fig. 1. Below the Internet layer, a separate access point is required for each underlying subnetwork interface (e.g., Ethernet) protocol driver.

Let us consider the case of data transmission. The user application sends the data down to the underlying transport layer through the appropriate access point. The transport layer sends this data down to the Internet layer module through its access point. The Internet layer interprets the remote network address, chooses the access point of the appropriate subnetwork interface driver, and transmits the data over it. Exactly the opposite process occurs on the receive side.

STREAMS mechanisms allow the protocol layers to run in the kernel as STREAMS “drivers” or “modules.” Under STREAMS, each layer runs as an independent driver/module. These drivers/modules have well-defined mechanisms to pass data messages and commands between each other. The duplex path from the application through the underlying drivers/modules down to the NIC is called a “stream” (lowercase letters are used to differentiate it from the overall concept, STREAMS).

We briefly discuss some important STREAMS mechanisms. The overall architecture is shown in Fig. 1, and a more detailed view in Fig. 2. To present these mechanisms, the original STREAMS architecture as defined in [1] has been chosen.

Figure 1. Multiplexing drivers result in the creation of multiple unconnected streams.

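Figure 2. A STREAMS-based stack. (The diagram shows the transport application in the user area and, in the kernel area, the stream head, the transport driver, and the Internet driver. The transport driver has four queues, URQ, UWQ, LRQ, and LWQ, each pointing to a message queue bounded by a high and a low water mark. The procedures shown are trans-ursrv(), trans-uwput(), trans-lrput(), trans-lwsrv(), net-ursrv(), and net-uwput().)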

MODULARITY

STREAMS allows modular development of protocols. Each layer is implemented as a separate module or driver. We will first discuss modules, and then drivers. A module performs transformations on messages being transferred from the application to the underlying hardware. A protocol layer can be implemented as a module. Each module supports a set of data structures called queues. A queue structure facilitates message passing between adjacent modules and acts as a placeholder for local messages of the module. Each queue has a pointer to a linked list of messages being stored by the module.

Drivers are similar to modules, with the additional facility of having more than one stream connected to them. In Fig. 1 the transport driver has multiple streams coming from above and one underlying stream. This facility allows drivers to multiplex messages. Two user applications each have an independent stream, # 1 and # 2, running from the application to the transport driver through the intervening modules. For example, in stream # 1 the application interacts with the kernel at a STREAMS module called the stream head, which converts user application layer buffers to kernel messages. The head interacts with a module, timod, that handles transport connection related functions, such as maintaining the state of the connection. The stream ends at the transport driver. The transport driver has an independent single stream to the Internet driver, stream # 3. The transport layer multiplexes data from both streams # 1 and # 2 onto this stream. Data received on stream # 3 is demultiplexed to the respective upper stream according to the message header.

In this article, the terms driver and module will be used interchangeably, unless the reference is specifically to a multiplexing driver, where the term driver will always be used.

The drivers can be dynamically loaded and linked at runtime. Thus, users can create a complete protocol stack at runtime by linking appropriate protocol layer drivers over each other. These mechanisms give users ease of configuration and flexibility. A more detailed look at streams is shown in Fig. 1; Fig. 2 shows the queues in a stream.
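The multiplexing and demultiplexing performed by the transport driver can be illustrated with a small user-space model. This is only a sketch under stated assumptions: the class `TransportMux`, its method names, and the dictionary-based message header are hypothetical and are not part of any STREAMS interface.

```python
from collections import defaultdict


class TransportMux:
    """Toy multiplexing driver: many upper streams share one lower stream."""

    def __init__(self):
        self.lower_stream = []                   # stands in for stream # 3
        self.upper_streams = defaultdict(list)   # stream id -> delivered data

    def send_down(self, stream_id, payload):
        # Multiplex: tag each message with a header naming its upper stream.
        self.lower_stream.append({"stream": stream_id, "data": payload})

    def receive_up(self, message):
        # Demultiplex: the header decides which upper stream gets the data.
        self.upper_streams[message["stream"]].append(message["data"])


mux = TransportMux()
mux.send_down(1, b"from app 1")
mux.send_down(2, b"from app 2")
mux.receive_up({"stream": 2, "data": b"reply for app 2"})
```

Both applications' data ends up on the single lower list, while the reply is delivered only to the upper stream named in its header.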

Adjacent driver modules interact with each other through a pair of queues: a read queue (RQ) and a write queue (WQ). The RQ is for messages coming from the underlying drivers and being sent to the higher layers. The WQ handles messages coming from the higher layers and being sent down.

FLOW CONTROL AND SCHEDULING MECHANISMS

STREAMS mechanisms allow high and low water marks to be defined for each queue. User calls to send messages are put to sleep when the byte count in the message list of the WQ between the user application and the transport layer module exceeds its high water mark. Thus, the user cannot send any more messages. The transport layer frees messages when the corresponding packets are acknowledged. When the byte count falls below the low water mark, the user application is automatically awakened by the STREAMS mechanisms, and new messages can be sent.

When the higher layer is a STREAMS driver rather than a user application, the flow control mechanisms work as follows. The higher-layer driver uses a procedure called canput() to check whether the high water mark of the lower driver's WQ has been reached. If the mark has not been crossed, it keeps sending data down through a standard procedure called putnext(). This invokes the put() procedure of the lower driver, which keeps storing the data in its WQ. When the lower driver's high water mark is reached, the higher-layer driver does not send any more data down, but stores the data in its own WQ using a procedure called putq(). This storing of data eventually causes its own high water mark to be crossed, which in turn prevents its higher layer from sending more data. Thus, flow control propagates across different layers.

As the lower layer transmits its data, its total queue size falls below its low water mark. The STREAMS kernel procedures then cause a predefined service() procedure of its higher-layer driver to be scheduled automatically. In this service() procedure, the higher layer retrieves the stored messages from its queue by calling a procedure called getq() and sends more data down.

Let us look at the architecture shown in Fig. 2 (for simplicity, timod is not shown) and see how the above mechanisms work. The transport driver has four queues associated with it: upper read queue (URQ), upper write queue (UWQ), lower read queue (LRQ), and lower write queue (LWQ). The queues have associated message buffer queues, which store the messages and have their own high water marks (HWMs) and low water marks (LWMs). The application sends data to the stream head, which sends it down to the UWQ through its put procedure, trans-uwput(). The transport driver stores the data in its message queue and sends a copy to the Internet driver by invoking the net-uwput() procedure on the LWQ. When the Internet driver exerts flow control, the message is not sent down, but only stored in the transport message queue. The application is put to sleep when the HWM of the UWQ is crossed. The trans-lwsrv() service procedure is automatically invoked when the Internet driver relaxes the flow control. This service() procedure picks the untransmitted messages from the UWQ message queue and sends them down. The messages stored in the UWQ message queue are cleared when their acknowledgments are received. When the LWM of the message queue is reached, the application is awakened and can send more data. Similar procedures are followed on the read side, where the Internet driver sends data up through the LRQ, and the transport driver sends it to the application through the URQ.
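The water-mark flow control described in this section can be sketched as a small user-space model. The procedure names canput(), putq(), and getq() come from the text. Everything else is an assumption made for illustration: the `Queue` class and its fields, the helper functions `send_down()` and `service()`, the water-mark values, and the exact comparisons (`>=` for the high mark, `<=` for the low mark). Real kernel queues differ in many details.

```python
from collections import deque


class Queue:
    """Toy write queue with byte-count water marks (not the kernel queue)."""

    def __init__(self, name, hiwat, lowat, upstream=None):
        self.name = name
        self.hiwat = hiwat
        self.lowat = lowat
        self.upstream = upstream      # the higher-layer queue feeding this one
        self.messages = deque()
        self.count = 0                # bytes currently stored
        self.full = False             # set once the high water mark is reached
        self.service_scheduled = False

    def canput(self):
        # The caller may send only while this queue is not flow controlled.
        return not self.full

    def putq(self, msg):
        self.messages.append(msg)
        self.count += len(msg)
        if self.count >= self.hiwat:  # assumed comparison
            self.full = True

    def getq(self):
        msg = self.messages.popleft()
        self.count -= len(msg)
        if self.full and self.count <= self.lowat:  # assumed comparison
            # Back-enable: schedule the service procedure of the layer above.
            self.full = False
            if self.upstream is not None:
                self.upstream.service_scheduled = True
        return msg


def send_down(upper, lower, msg):
    """Higher-layer put logic: pass the message on, or hold it locally."""
    if lower.canput():
        lower.putq(msg)          # stands in for putnext() into the lower put()
    else:
        upper.putq(msg)          # lower layer is flow controlled: queue here


def service(upper, lower):
    """Higher-layer service procedure: drain held messages while allowed."""
    upper.service_scheduled = False
    while upper.messages and lower.canput():
        lower.putq(upper.getq())


transport = Queue("transport WQ", hiwat=8, lowat=4)
internet = Queue("internet WQ", hiwat=8, lowat=4, upstream=transport)
for _ in range(3):
    send_down(transport, internet, b"abcd")
held_before = transport.count             # bytes held back in the upper queue
internet.getq()                           # lower layer transmits one message
scheduled = transport.service_scheduled
service(transport, internet)
```

In this run the third message is held in the transport queue because the Internet queue has reached its high water mark. Draining one message from the Internet queue brings it down to its low water mark, which schedules the transport service procedure, and that procedure then forwards the held message.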

MESSAGE HANDLING

The STREAMS message structure is shown in Fig. 3. A message starts with a message block. This links to a data block, which contains the message type and size, and points to a data buffer. The data buffer contains the data. A long message can consist of a large number of message blocks linked together by the b_cont pointers in the message blocks.

An allocb() procedure needs to be called to create a message. This allocates the message block together with the associated data buffer of the requested size. Long messages can be created by calling allocb() multiple times and linking the blocks together with a linkb() procedure. If the system is low on memory, allocb() can fail; a bufcall() procedure then needs to be called, which queues the allocation request and calls a function when the buffer is available.

STREAMS also provides improved data buffer handling capabilities for reducing the number of memory copies. These are called extended buffer handling mechanisms. They allow STREAMS messages to point directly to client-supplied non-STREAMS data buffers. For example, the data buffer in Fig. 3 can be dual-port RAM on an NIC. The NIC stores the data received from the communication link in its dual-port memory data buffer. Since this buffer is mapped to a STREAMS message, the protocol layers process this data just like data in any normal STREAMS message. The message passes through the different protocol layer drivers until it reaches the stream head, where the data buffer is copied to user buffers. Thus, only a single copy is required in the kernel, which results in improved performance. It should be noted that copying is required at the user/kernel interface, since in an operating system the kernel buffers are different from user buffers.
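The message layout and the extended buffer handling idea can be sketched in user space as follows. The names allocb(), linkb(), and b_cont follow the text, but their Python signatures are simplifications. The helpers `wrap_external()` and `total_size()`, the class names, and the `nic_ram` buffer are hypothetical names introduced only for this illustration.

```python
class DataBlock:
    """Holds the message type and the buffer; it may wrap a caller's buffer."""

    def __init__(self, buffer, msg_type="M_DATA"):
        self.msg_type = msg_type
        self.buffer = buffer


class MessageBlock:
    def __init__(self, data_block):
        self.data_block = data_block
        self.b_cont = None            # next message block of the same message


def allocb(size):
    # Allocate a message block plus a fresh buffer of the requested size.
    return MessageBlock(DataBlock(bytearray(size)))


def wrap_external(buffer):
    # Extended buffer handling: point at an existing buffer, no copy made.
    return MessageBlock(DataBlock(buffer))


def linkb(head, tail):
    # Append `tail` to the end of the b_cont chain that starts at `head`.
    block = head
    while block.b_cont is not None:
        block = block.b_cont
    block.b_cont = tail


def total_size(head):
    # Sum the buffer sizes of every block in the message.
    size = 0
    block = head
    while block is not None:
        size += len(block.data_block.buffer)
        block = block.b_cont
    return size


nic_ram = bytearray(b"frame received by the NIC")   # hypothetical dual-port RAM
msg = allocb(16)
linkb(msg, wrap_external(nic_ram))
nic_ram[0:5] = b"FRAME"          # visible through the message: same buffer
```

Because the second block refers to `nic_ram` itself rather than to a copy, a later write to that buffer is seen through the message, which is the property that lets the protocol layers process received data without an extra kernel copy.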

Figure 3. A STREAMS message consisting of two message blocks.

INEFFICIENCIES OF STREAMS MECHANISMS

These original STREAMS mechanisms suffer from a number of inefficiencies when used to implement protocol stacks over uni- and multiprocessor systems. These inefficiencies are grouped under four broad headings, and the reasons for them are described below.

INFLEXIBLE INTERMODULE COMMUNICATION

STREAMS defines standard, well-defined messages and interfaces between drivers. Developers must adhere to these interfaces. This strict separation prevents high-performance mechanisms that require messages to be processed across the boundary between the stream head and the underlying protocol layers. For example, the integrated layer processing (ILP) technique of combining checksum calculation with the user-to-kernel copy operation is not possible. ILP techniques optimize costly memory access operations in a protocol by integrating all data manipulations in a single processing loop. Since the stream head is a separate driver from the transport driver, it is unaware of transport segment boundaries while copying data from user space to kernel buffers. Hence, the stream head cannot perform transport segment checksum calculation simultaneously with this copy operation.

Also, the multiplexer drivers prevent flow of information across adjacent driver layers. For example, in Fig. 1 the transport mux driver causes multiple streams to be created. The mux driver terminates stream # 1; hence, the stream does not continue below the mux driver (stream # 3 is an independent stream). This characteristic of the mux driver prevents the usual flow control mechanisms of STREAMS (water marks and the associated scheduling) from working across these streams. The implication is that flow control information in the subnet driver cannot flow to the user applications running over this subnet, since the information flow is terminated at the transport driver. Subnet protocols like asynchronous transfer mode (ATM) or frame relay usually receive congestion information from the network and are supposed to throttle the applications above them that are responsible for sending large amounts of data. This is impractical to implement in STREAMS, since the subnet stream cannot flow control the application streams.

Finally, flow control can be exerted only on the basis of the byte count in a message queue. This is a significant limitation, since a driver cannot exert flow control in response to any other event.

LIMITED MESSAGE HANDLING CAPABILITIES

STREAMS message processing at the receiving application is inefficient. On a read call, the application receive buffers are filled only with the received data currently available at the stream head, and the read call returns. This is unlike Sockets mechanisms, where the socket layer tries to fill the read buffers completely by waiting for more incoming data before returning the call. Thus, STREAMS receive mechanisms require more read system calls, and the corresponding context switches, than Sockets mechanisms. This degrades performance.

ISSUES IN SCHEDULING MECHANISMS

STREAMS suggests that implementers carry out most of their protocol processing in service() procedures instead of the put() procedure. The processing routines should queue their messages in the put() procedure, to be processed when STREAMS schedules the service() procedure for the queue. This prevents the current process from hogging the kernel and gives other kernel functions a chance to execute.

This introduces queuing/dequeuing delays. Also, service() procedures normally run in a different thread of execution, for example at the end of a system call or on return from an interrupt routine. This introduces delays due to context switching and the time that elapses before the service() procedures are scheduled. These delays cause significant performance degradation.

These problems do not exist with put(), and the processing is faster. A put() procedure is called with the message to be processed as a parameter. The procedure processes the message immediately and passes it to the next protocol layer by calling the put() procedure of the next layer. Thus, no queuing and dequeuing of messages or any scheduling delays are incurred.
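The difference between the two processing styles can be made concrete with a toy model that counts queue operations. This is only a sketch: the function names `run_put_path()` and `run_service_path()`, the `ops` counter, and the two lambda "layers" are hypothetical, and scheduling delays and context switches are not modeled at all.

```python
from collections import deque

ops = {"enqueue": 0, "dequeue": 0}


def run_put_path(messages, layers):
    """Each layer's put() processes the message and calls the next put()."""
    delivered = []
    for msg in messages:
        for layer in layers:
            msg = layer(msg)          # immediate processing, nothing is queued
        delivered.append(msg)
    return delivered


def run_service_path(messages, layers):
    """Each put() only queues; a later service() pass does the processing."""
    pending = deque(messages)
    for layer in layers:
        queue = deque()
        while pending:
            queue.append(pending.popleft())   # put(): store the message
            ops["enqueue"] += 1
        while queue:                          # service(): runs when scheduled
            msg = queue.popleft()             # retrieve the stored message
            ops["dequeue"] += 1
            pending.append(layer(msg))
    return list(pending)


layers = [lambda m: m + b"|tp", lambda m: m + b"|ip"]
msgs = [b"a", b"b", b"c"]
direct = run_put_path(msgs, layers)
deferred = run_service_path(msgs, layers)
```

Both paths deliver the same processed messages, but the service-style path performs one enqueue and one dequeue per message per layer, which is the overhead the text attributes to service() processing.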

However, the use of service() procedures provides additional scheduling mechanisms for developers that can be effectively used to implement many of the desired protocol functions which require such scheduling (e.g., timer handling). Thus, improved STREAMS architectures need to choose judiciously between the use of put() and service() procedures.

LACK OF PARALLELISM

STREAMS mechanisms are not optimized for multiprocessor architectures. For good scalability on such architectures, one must be able to distribute tasks between the processors and allow tasks to run in parallel. In STREAMS, the kernel maintains a single global list of enabled service() procedures, which are processed in FIFO order by the STREAMS scheduler. This strict FIFO scheduling prevents parallelism. The service() procedures run in threads that are out of context of the main data path; trying to run them in parallel may lead to data being delivered out of sequence, unless appropriate locking mechanisms are applied. A large amount of locking consumes time and degrades performance.

Also, mux drivers prevent a stream from being formed end to end. In Fig. 1, one would like different message paths to be parallelized. Protocol processing on a message, from the time it leaves an application until it is transmitted on the NIC, should run on one processor so that the complete in-sequence data path runs in parallel with other similar data paths. This would require one to lock a stream and then process data on it. But since streams break at the mux, end-to-end processing on the stream is not possible: locking stream # 1 cannot lock the stream across the driver, stream # 3. (Since stream # 3 is used by multiple upper streams, any single upper stream locking this lower stream would prevent all other upper streams from using it, preventing parallelism.) Thus, per-stream parallelism becomes difficult to implement.

A STUDY OF IMPROVED ARCHITECTURES

OUT OF BAND ARCHITECTURE

The author implemented a STREAMS-based protocol stack in [7] that will be referred to as the out of band (OOB) architecture. The techniques devised were able to address some of the STREAMS limitations. The work provides improved intermodule communication and makes effective use of STREAMS scheduling mechanisms.

The STREAMS mechanisms force strict byte-count-based intermodule flow control and prevent one from exerting flow control on the occurrence of any other event in the protocol driver. These mechanisms were made flexible by adopting a message storing architecture, as shown in Fig. 4. The messages coming from the UWQ were stored in a locally defined message queue that was independent of the STREAMS message queue. Thus, storage of messages in this queue does not exert flow control on the upper modules. A high water mark of 1 byte was defined for the actual STREAMS message queue, where no message was stored.

Figure 4. Message queues for data in the OOB architecture.

The protocol driver could exert flow control on the upper module at any time by inserting a 1-byte message in the STREAMS message queue. The flow control can be relaxed by removing this 1-byte message. Thus, one can exert flow control on any event occurring in the module, independent of the byte count in the message queue. This could be used by the subnet driver to exert flow control over higher layers on receiving congestion notifications. Alternatively, the same mechanisms can be used by transport protocols (e.g., OSI TP4) that would like to exert flow control on applications based on the number of packets in their window, instead of the number of bytes (as in TCP).

The OSI TP4 protocol was implemented using this architecture. The standard flow control mechanisms require one to choose a transport driver high water mark based on a maximum number of unacknowledged bytes in the transmit window. Since TP4 end-to-end flow control mechanisms are based on a packet-count-based window size, one needs to set the HWM byte count to the product of the maximum window size (in number of packets) and the maximum transport packet size. If the average packet is much smaller than this maximum, large memory buffers are held in the transport driver (which cannot be transmitted remotely, since the maximum packet count has been reached) before the driver can exert flow control on the applications.

In our architecture the TP4 packets were kept in the OOB queue. The packet count was tracked, and 1-byte flow control was exerted when the packet count reached the maximum transport window size. Thus, no memory buffer wastage could occur. Our performance analysis at a maximum packet size of 1 kbyte and an average packet size of 512 bytes showed that the worst-case memory buffer usage in our architecture was 50 percent of that of the standard STREAMS approach. It should be noted that the overhead of having one additional byte for exerting flow control is insignificant compared to the large memory buffer savings observed.

Another contribution of that work was making intelligent use of STREAMS scheduling mechanisms to implement the timer handling functions of protocols. The protocol driver stores the data to be used on timeout in local structures and starts the timer by calling the timeout() procedure, with the time period as the parameter. On timeout, a user-defined function is invoked that executes the protocol functions by referring to the data stored in these local structures (e.g., for retransmission of packets these structures will point to the messages to be retransmitted). This implementation was simplified by assigning a separate queue for timer handling. A controlling application program opens the transport driver and creates a stream to it. All the data to be used on expiry of the timer are stored in the message queue of this stream. On timer expiry, the timeout routine calls a STREAMS qenable() routine to enable the service() procedure of this queue. The procedure gets the relevant structures from this message queue and executes its functions. Thus, a simplified implementation is obtained by using the existing STREAMS queues and scheduling mechanisms.

It should be noted that the Retix architecture [8] also makes effective use of STREAMS scheduling mechanisms by having an independent timer driver based on them. The difference is that our timer queue is within the transport driver, whereas their architecture uses a single timer driver that services all protocol layers.
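The 1-byte flow control technique of the OOB architecture, and the buffer arithmetic behind the 50 percent figure, can be sketched as follows. This is a user-space illustration under stated assumptions: the class `OOBTransportQueue`, its methods, the two `worst_case_bytes_*` helpers, and the window size of 4 packets are all hypothetical. Only the 1-byte high water mark, the 1 kbyte maximum packet size, and the 512-byte average packet size come from the text.

```python
from collections import deque


class OOBTransportQueue:
    """Toy OOB upper write queue: data kept locally, 1-byte flow control gate."""

    HIWAT = 1                               # high water mark of the STREAMS queue

    def __init__(self, window_packets):
        self.window_packets = window_packets
        self.local_queue = deque()          # locally defined queue, not counted
        self.streams_queue = deque()        # STREAMS queue: holds the gate byte only

    def streams_byte_count(self):
        return sum(len(m) for m in self.streams_queue)

    def canput(self):
        return self.streams_byte_count() < self.HIWAT

    def put(self, packet):
        self.local_queue.append(packet)
        if len(self.local_queue) >= self.window_packets and self.canput():
            self.streams_queue.append(b"\x00")   # exert flow control

    def acknowledge(self):
        self.local_queue.popleft()
        if self.streams_queue:
            self.streams_queue.popleft()         # relax flow control


def worst_case_bytes_standard(window_packets, max_packet):
    # Byte-count HWM must cover a full window of maximum-size packets.
    return window_packets * max_packet


def worst_case_bytes_oob(window_packets, avg_packet):
    # Packet-count gate: at most one window of actual packets is held.
    return window_packets * avg_packet


q = OOBTransportQueue(window_packets=4)     # window size chosen for illustration
for _ in range(4):
    q.put(b"x" * 512)
blocked = not q.canput()
q.acknowledge()
reopened = q.canput()
ratio = worst_case_bytes_oob(4, 512) / worst_case_bytes_standard(4, 1024)
```

The gate closes on the packet count, not on the number of bytes stored, and the ratio of the two worst-case figures is the average packet size divided by the maximum packet size, whatever window size is assumed.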

SEQUENT ARCHITECTURE

We now look at the first parallel STREAMS architecture, from Sequent [9]. This architecture runs on their tightly coupled Symmetry multiprocessor system. For improved scalability on multiprocessors, multiple tasks should work in parallel on multiple processors. The Sequent architecture provides two types of parallelism: horizontal parallelism, where functions within each stream run on one processor (e.g., in Fig. 1, stream # 1 will run on one processor and # 2 on another); and vertical parallelism, where functions within a stream can themselves run in parallel on different processors (e.g., timod and transport driver functions of a single stream running in parallel). Data consistency is achieved by fine-grained locking.

An interesting design consideration of the architecture was improving STREAMS scheduling mechanisms to implement parallelism. STREAMS provides strict FIFO scheduling of service() procedures on its global queue. This was modified by having the scheduler run the procedures on the global queue on any available processor. The addition and deletion of procedures on this queue is synchronized by a global lock. The thread of execution of the procedures must be provided intelligently to dynamically balance the load on the processors. Two alternative mechanisms were adopted.

It has been observed that symmetric multiprocessor operating systems inherently balance the total process load across the available processors. Thus, if a number of application processes are present, a runnable process will normally be dispatched to run on the processor running the lowest-priority process. This fact was used to create multiple system daemon processes to provide the thread of execution for service() procedures. These processes wait on a common semaphore and are awakened when a new service() procedure is enqueued in the global queue. The service() procedure then executes in the thread of the daemon process. Since daemon processes are dynamically balanced by the operating system, service() procedures are also balanced automatically without any extra effort.

Similarly, symmetric multiprocessor hardware dynamically directs an interrupt to the processor running the lowest-priority process. This fact was used to send a scheduling interrupt so that an instance of the scheduler runs as the handler of this interrupt. This provides the thread for the service() procedure. Since interrupt handlers are dynamically balanced, so are these service() procedures.

Although [9] does not give any experimental data, it does mention that performance improvement was observed when additional processors were added, due to this improved parallelism.

The Sequent architecture also improves the message handling capabilities of STREAMS for both uni- and multiprocessors. If the message memory allocation call fails, STREAMS requires a bufcall() routine to be called to invoke a user-defined function when the buffer later becomes available. But if a stream closes after calling bufcall(), the user-defined function is unable to find its data structures and corrupts the kernel. To avoid this, a new unbufcall() utility was created that cancels the previous bufcall(). This unbufcall() must be called as part of any module's close procedure.
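The bufcall()/unbufcall() pairing described for the Sequent architecture can be illustrated with a toy registry of deferred callbacks. The names bufcall() and unbufcall() follow the text, but their signatures here are simplified assumptions, and `BufcallTable`, `Module`, and `memory_available()` are hypothetical names used only for this sketch.

```python
import itertools


class BufcallTable:
    """Toy registry of deferred allocation callbacks (not the kernel's)."""

    def __init__(self):
        self._pending = {}
        self._ids = itertools.count(1)

    def bufcall(self, callback, arg):
        # Register `callback(arg)` to run once memory becomes available.
        call_id = next(self._ids)
        self._pending[call_id] = (callback, arg)
        return call_id

    def unbufcall(self, call_id):
        # Cancel a pending request; a no-op if it already ran.
        self._pending.pop(call_id, None)

    def memory_available(self):
        # Run and clear every callback that is still registered.
        pending, self._pending = self._pending, {}
        for callback, arg in pending.values():
            callback(arg)


class Module:
    def __init__(self, table):
        self.table = table
        self.retried = []
        self.call_id = table.bufcall(self.retried.append, "retry allocb")

    def close(self):
        # Cancel before the module's data structures are torn down.
        self.table.unbufcall(self.call_id)
        self.retried = None


table = BufcallTable()
open_module = Module(table)
closed_module = Module(table)
closed_module.close()
table.memory_available()
```

Only the module that is still open has its callback run. In the kernel, an uncancelled callback for a closed stream would touch freed data structures, which is the corruption that the close-time cancellation is meant to prevent.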

SVR4MP ARCHITECTURE

The Sequent architecture was based on the

assumption that developers are performing most

protocol processing in the service() procedures

rather than in the put() procedures. There can

still be different views on this point, but the System V Release 4 MultiProcessor (SVR4MP) parallel architecture [10] is based on the assumption that put() procedures are invoked more often than service() procedures. Its authors analyzed a running TCP/IP stack and showed that the put-to-service ratio is 1.6:1.

Since parallelizing multiple service() proce-

dures is not required, they have not used the

fine-grained vertical parallelism and complex

scheduling mechanisms adopted in the Sequent

architecture. Instead, horizontal stream-level

parallelism was adopted where each stream was

run over a single processor. In effect, there is

only one operation per stream allowed at a time.

A lock per stream was used, which significantly reduced the large number of locking operations required in finer-grained parallelism approaches.

Thus, throughout the system call and corre-

sponding put() procedures in different modules,

a single stream lock was held. Independent streams can run in parallel on multiple processors, improving performance. Results showed that TCP/IP performance improved when additional 33 MHz Intel 486 processors were added to a Compaq Systempro. The speedups over one CPU were 1.78 at two processors, 2.35 at three, 2.55 at four, and 2.93 at five processors.
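To put these figures in perspective, the short calculation below derives the parallel efficiency (speedup divided by processor count) and the gain contributed by each added processor. Only the speedup values come from the reported results; the one-CPU entry of 1.00 is the baseline by definition, and the helper names efficiency and marginal_gain are illustrative.

```python
# Reported SVR4MP speedups over one CPU; the 1-CPU entry is the baseline.
speedups = {1: 1.00, 2: 1.78, 3: 2.35, 4: 2.55, 5: 2.93}


def efficiency(speedup, processors):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / processors


def marginal_gain(table):
    """Extra speedup contributed by each additional processor."""
    cpus = sorted(table)
    return {n: round(table[n] - table[p], 2) for p, n in zip(cpus, cpus[1:])}


if __name__ == "__main__":
    for n in sorted(speedups):
        print(f"{n} CPU(s): speedup {speedups[n]:.2f}, "
              f"efficiency {efficiency(speedups[n], n):.3f}")
    print(marginal_gain(speedups))
```

Efficiency falls from 0.89 at two processors to about 0.59 at five, so per-stream locking scales, but with diminishing returns as processors are added.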


However, the inflexible intermodule communication of mux drivers prevents a single lock from working across the streams on the two sides of a mux. For example, in Fig. 1, after locking stream #1 and processing data on it, the transport driver put() cannot invoke the Internet driver put() procedure, since stream #3 is an independent stream and has not been locked. Thus, the SVR4MP model fails in this case.

IEEE Communications Magazine • February 2000


Table 1. A summary of improvements achieved by different architectures.

Feature                                   Architecture
Flexible intermodule communication        OOB, SVR4MP, demux
Improved message handling                 Sequent, demux
Efficient use of scheduling mechanisms    OOB, Sequent, Retix
Improved parallel architecture            Sequent, SVR4MP, demux

This problem was solved in SVR4MP by designing a coupler module over the stream below a mux driver. The upper-layer call now uses this module to try to acquire the lock of the lower stream. If this succeeds, the message is passed to the lower stream. If the acquisition fails, the coupler module queues the message in its own queue and tries again later from its service() procedure. A separate queue lock, independent of the stream lock, has been provided so that messages can be queued even when the stream lock is not held. When the lower-layer service() procedure runs, it executes with its stream lock held, and hence can process the data.
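The coupler's try-lock-or-queue behavior can be sketched in user space as follows. This is a hedged illustration rather than the SVR4MP implementation: the Stream and Coupler classes and their method names are hypothetical, Python locks stand in for kernel locks, and the rule that a new message waits behind any backlog (to preserve ordering) is an assumption of this sketch, not something the description above states.

```python
import threading
from collections import deque


class Stream:
    """A stream protected by a single per-stream lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.delivered = []

    def put(self, msg):
        # The caller must already hold self.lock.
        self.delivered.append(msg)


class Coupler:
    """Sits above the lower stream of a mux and mediates access to its lock."""

    def __init__(self, lower):
        self.lower = lower
        self.queue = deque()
        self.queue_lock = threading.Lock()  # independent of the stream lock

    def put(self, msg):
        """Called from the upper stream; returns True if passed down directly."""
        if self.lower.lock.acquire(blocking=False):  # non-blocking attempt
            try:
                with self.queue_lock:
                    backlog = bool(self.queue)
                    if backlog:
                        # Assumption: keep FIFO order behind older messages.
                        self.queue.append(msg)
                if not backlog:
                    self.lower.put(msg)
                    return True
            finally:
                self.lower.lock.release()
            return False
        # Lower stream is busy: queue without needing its stream lock.
        with self.queue_lock:
            self.queue.append(msg)
        return False

    def service(self):
        """Run later, holding the lower stream lock, to drain the backlog."""
        with self.lower.lock:
            while True:
                with self.queue_lock:
                    if not self.queue:
                        break
                    msg = self.queue.popleft()
                self.lower.put(msg)
```

A message sent while the lower stream lock is free goes straight down; one sent while the lock is held elsewhere is parked in the coupler queue and delivered by the next service() run.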

DEMUX ARCHITECTURE

The demultiplexed STREAMS architecture [5] provides improved intermodule communication and efficient message handling on uniprocessors, and improved parallelism on multiprocessors. This architecture, like SVR4MP, uses put() rather than service() procedures to avoid scheduling delays. As shown in Fig. 5, the architecture is based on having an end-to-end stream from the stream head to the lowest layer. The transport and Internet layers reside as modules, not as drivers. The protocol control functions are also implemented as separate streams; for example, an ICMP/IP modules stream handles control packets. The only demultiplexing is performed at an anchorage driver residing above the subnet driver. This driver looks at all protocol layer headers and demultiplexes incoming packets to the transport applications or control streams for which they are destined.

The work also gives improved message handling facilities in STREAMS. In original STREAMS, when an application sends data on a stream, the stream head copies the user buffer to the kernel, and the transport driver segments it into packets. While segmenting, the checksums are calculated. The demux architecture moves the segmentation functions to the application libraries (called XTI). The library routines segment the data before sending it down. These segmented buffers are then copied from user to kernel buffers, and the checksum is computed on them during the copy (which is possible since the segment boundaries are already known). Integrating copying and checksumming gives high performance.

The receive buffer handling performance was improved by having the stream head completely fill the application buffers on getting a read call. Also, to take care of the special case of interactive applications (where the application cannot wait for the complete buffer to be received), an option was provided in the library to make the read call return immediately with the data currently available in the stream head receive buffers.

This architecture removes the limitations imposed by a mux transport driver. The end-to-end stream from application to anchorage driver allows the driver to invoke flow control through to the transport layer on receipt of subnet congestion information. Protocols like ATM and frame relay need to control their data transmission rate on receipt of congestion notification frames. In the demux architecture, the incoming congestion notification makes the anchorage driver exert flow control, which prevents the corresponding transport layer from sending more data on a congested link.

These improvements resulted in the STREAMS TCP/IP stack giving performance similar to that of a BSD TCP/IP stack. The experiments were conducted running one TCP connection in loopback mode on a 42 MHz Power DPX/20 running AIX/3.2.5. At a transport service data unit (TSDU) size of 8 kbytes, the standard STREAMS stack gave a throughput of 3000 kbytes/s, while both the demux and BSD stacks gave a throughput of 5000 kbytes/s.
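One of the improvements behind these figures, integrating the user-to-kernel copy with checksum calculation once segment boundaries are known, can be sketched as below. This is an illustration under stated assumptions, not the demux implementation: it computes only the 16-bit one's-complement Internet checksum over the payload (no TCP pseudo-header), a bytearray stands in for a kernel buffer, and the names segment, copy_and_checksum, and internet_checksum are hypothetical.

```python
def internet_checksum(data: bytes) -> int:
    """Reference: 16-bit one's-complement checksum in a separate pass."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def copy_and_checksum(src: bytes, dst: bytearray, offset: int = 0) -> int:
    """Copy src into dst[offset:] and checksum it in the same pass."""
    total = 0
    n = len(src)
    for i in range(0, n - 1, 2):
        hi, lo = src[i], src[i + 1]
        dst[offset + i] = hi  # the copy ...
        dst[offset + i + 1] = lo
        total += (hi << 8) | lo  # ... and the sum touch each byte once
    if n % 2:
        dst[offset + n - 1] = src[n - 1]
        total += src[n - 1] << 8
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def segment(data: bytes, size: int):
    """Library-side segmentation: split user data before it is sent down."""
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    user_data = bytes(range(256)) * 10
    for seg in segment(user_data, 1000):
        kernel_buf = bytearray(len(seg))
        csum = copy_and_checksum(seg, kernel_buf)
        print(len(seg), hex(csum), bytes(kernel_buf) == seg)
```

Because each segment is already delimited in user space, the copy loop can accumulate the checksum as it goes, and the data is traversed once instead of twice.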

The architecture also simplifies multiproces-

sor implementation. As in SVR4MP, per-stream

locking was used to run all streams in parallel.

The clean end-to-end stream from the applica-

tion avoids the locking problems associated

with the breaking of the streams due to mux

drivers. An interesting advantage of the demux

approach is that it gives high parallelism

because the functions within the protocols, such

as control packet handling, can also run in par-

allel on different processors. This is possible

since each function is running in a different

stream (e.g., ICMP/IP is a separate stream).

This is not possible to implement in any other architecture, since all the protocol functions run as a monolithic unit in their modules. Analysis has shown that the demux STREAMS architecture scales better than the BSD parallel architecture on a four-processor 66 MHz PC601 ESCALA running AIX/4.1.2. The speedup from one to four processors was 2.6 in the demux architecture, compared to 2.2 in the BSD architecture.

Figure 5. Demux STREAMS TCP/IP stack implementation. (Diagram labels: stream head, TIMOD, TCP modules, ICMP module, anchorage driver, subnet driver.)
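The single demultiplexing step that gives each connection and each control function its own stream can be sketched as follows. This is a simplified illustration, not the anchorage driver itself: it assumes IPv4 packets with TCP or ICMP payloads, identifies a TCP stream by its address and port 4-tuple, and uses plain strings in place of real streams. AnchorageDemux, make_ipv4, and the registration methods are hypothetical names.

```python
import struct

ICMP, TCP = 1, 6  # IPv4 protocol numbers


class AnchorageDemux:
    """Looks at the IP and transport headers once and picks the target stream."""

    def __init__(self):
        self.tcp_streams = {}      # (local ip, local port, remote ip, remote port)
        self.control_streams = {}  # protocol number -> control stream

    def register_tcp(self, local_ip, local_port, remote_ip, remote_port, stream):
        self.tcp_streams[(local_ip, local_port, remote_ip, remote_port)] = stream

    def register_control(self, protocol, stream):
        self.control_streams[protocol] = stream

    def classify(self, packet: bytes):
        """Return the stream an incoming packet is destined for, or None."""
        header_len = (packet[0] & 0x0F) * 4  # IHL field, in 32-bit words
        protocol = packet[9]
        src_ip, dst_ip = packet[12:16], packet[16:20]
        if protocol == TCP:
            src_port, dst_port = struct.unpack_from("!HH", packet, header_len)
            return self.tcp_streams.get((dst_ip, dst_port, src_ip, src_port))
        return self.control_streams.get(protocol)


def make_ipv4(protocol: int, src: bytes, dst: bytes, payload: bytes) -> bytes:
    """Build a minimal 20-byte IPv4 header (checksum left at zero) plus payload."""
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload),
                         0, 0, 64, protocol, 0, src, dst)
    return header + payload


if __name__ == "__main__":
    host, peer = bytes([10, 0, 0, 2]), bytes([10, 0, 0, 1])
    demux = AnchorageDemux()
    demux.register_tcp(host, 80, peer, 40000, "tcp-stream-1")
    demux.register_control(ICMP, "icmp-stream")
    segment = struct.pack("!HH", 40000, 80) + bytes(16)
    print(demux.classify(make_ipv4(TCP, peer, host, segment)))
    print(demux.classify(make_ipv4(ICMP, peer, host, bytes(8))))
```

Since classification happens once at the bottom of the stack, the TCP stream and the ICMP control stream never share a lock and can be processed on different processors.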

REFERENCES

[1] AT&T, Unix System V Release 4 Programmer’s Guide: STREAMS, Prentice Hall, 1992.

[2] H. Custer, “Inside Windows NT,” Microsoft, 1995, pp. 285–326.

[3] Integrated System Inc., “pSOSystem System Concepts,” rel. 2.0, 1995.

[4] S. J. Leffler et al., The Design and Implementation of the 4.3 BSD Unix Operating System, Addison-Wesley, 1990.

[5] V. Roca, T. Braun, and C. Diot, “Demultiplexed Architectures: A Solution for Efficient STREAMS-Based Communication Stacks,” IEEE Network, July/Aug. 1997, pp. 16–26.

[6] B. Krupczak, K. L. Calvert, and M. H. Ammar, “Increasing the Portability and Re-Usability of Protocol Code,” IEEE/ACM Trans. Net., Aug. 1997, p. 445.

[7] V. K. Khanna, “ISO Transport Protocol Implementation in Unix STREAMS Environment,” Internetworking Res. and Experience, vol. 6, 1995, pp. 143–66.

[8] SCO, “SCO/RETIX OSI/LT-610 Administrator’s Guide,” 1990.

[9] A. Garg, “Parallel STREAMS: A Multi-Processor Implementation,” Proc. USENIX, Winter 1990, pp. 163–76.

[10] S. Saxena et al., “Pitfalls in Multithreading SVR4 STREAMS and Other Weightless Processes,” Proc. USENIX, Winter 1993, San Diego, CA, pp. 85–96.


CONCLUSIONS


STREAMS provides a number of facilities to

ease development of protocol stacks, but the

original STREAMS-based stacks have had very

low performance. The reasons for this are ana-

lyzed in this article. These are limitations in

intermodule communication, inefficient message

handling, problems in scheduling mechanisms,

and lack of parallelism in the STREAMS archi-

tecture. A study of a number of mechanisms

devised in the past few years to improve the per-

formance of STREAMS-based stacks is present-

ed. The use of these mechanisms results in

STREAMS-based protocol implementations that

give high performance, coupled with the other

advantages available with STREAMS. The per-

formance of one recent architecture, namely

demux, has been shown to match the perfor-

mance of protocol stacks implemented in a BSD

Sockets kernel. Improved STREAMS parallel

architectures have shown good scalability on

multiprocessor systems. A summary of the archi-

tectures discussed and the improvements due to

these is given in Table 1.


BIOGRAPHY


VIMAL K. KHANNA heads Cabletron software engineering in

India. He has 14 years of networking software develop-

ment experience in the industry. He has developed prod-

ucts and executed projects in networking protocols like

TCP/IP, SNMP, OSI TP4/CLNP, ATM, frame relay, X.25, Ether-

net, wireless LANs, and others. His areas of research inter-

est also include suggesting operating system extensions to

support high-speed networking protocols. His indepen-

dently written papers have been published in international

networking journals. He is on the editorial board of IEEE

Communications Magazine. He is a recipient of the National Talent Search Scholarship and the Central Board of Secondary Education Silver Medal in India. He has held

positions of merit in the All India Senior Secondary and

Indian Engineering Services examinations.
