Hardware functionality
of the medium-speed
AAPN Demonstrator Prototype
Technical Report by
Pino G. Dicorato, Peter Farkas, Sofia A. Paredes, James Zhang
Principal Investigators: Gregor v. Bochmann, Trevor J. Hall
Centre for Research in Photonics
School of Information Technology and Engineering
University of Ottawa
December 2008
This work was supported by the Natural Sciences and Engineering Research Council (NSERC) and industrial
and government partners, through the Agile All-Photonic Networks (AAPN) Research Network.
Abstract
The design and implementation of the hardware functionality of a prototype optical time division multiplexed
communications network is presented in this report. The optical network follows the principles of the Agile All-
Photonic Network (AAPN) paradigm: a wavelength division multiplexed network with an overlaid-star topology
whose edge nodes perform electrical to optical conversion and whose core switches are all-optical and therefore
very fast-switching. The AAPN operates in a time division multiplexing (TDM) mode at the optical core.
The demonstrator prototype is intended to be a scaled-down version of an AAPN: it consists of a collapsed
single-wavelength network of 4 edge nodes surrounding one 4x4 optical core switch. One of the edge nodes
operates as the master edge node and is used to control the fast optical core switch. Each edge node is
composed of a PC and an FPGA development board, with a 100 Mbps Ethernet network card interconnecting
them. A 1 Gbps optical transceiver connects the FPGAs to the core switch.
The design of the edge node consists of a division of labour between the FPGA component and the PC
component. The "Hardware Functionality", as it is called here, consists of the low level, fast functions
implemented on the FPGA development board, such as interfacing to the PC component, precise slotted optical
transmission, optical burst-mode reception and configuration of the network core switch. The hardware has been
implemented as custom circuits built from programmable logic elements on a System on Chip (SoC)
field programmable gate array (FPGA). The design of the custom circuits is implemented using a Mealy finite
state machine model programmed using VHDL code. Other elements within the design are implemented using
compiled software within a hard-core microprocessor on the FPGA using on-chip memory. This report also
includes the implementation of the core optical switch, which was done with a Clos array of 2x2 free-space
optical switches.
Slotted optical transmission is the process of transmitting a data slot (a relatively large data packet) to the optical
core of the network precisely at the time dictated by the TDM scheduler. The hardware assembles the slot within
the FPGA memory, performs a parallel-to-serial conversion, encodes the slot and converts it to an optical signal
for transmission at the pre-defined time. For optical burst-mode reception a clock is first derived with clock-data
recovery circuitry in order for the data to be sampled appropriately and then converted back to the electrical
domain after the start of the slot being received has been identified. Upon reception, the data is deserialized for
further processing in the electrical domain and sent to the PC component of the edge node.
High-level functionality of the AAPN prototype, such as traffic aggregation, traffic monitoring, bandwidth allocation functions and network synchronization protocols, was implemented in the software component on the PC of the edge nodes; this was addressed in a separate, parallel project ("Software Control Platform") not discussed in this document.
Table of contents

Hardware functionality of the medium-speed AAPN Demonstrator Prototype
Abstract
Table of contents
List of figures
List of tables
1. Objectives
2. AAPN Demonstrator Prototype Overview
2.1. AAPN architecture overview
2.2. Demonstrator Prototype architecture
2.3. Edge Node Architecture in the Demonstrator Prototype
3. Hardware functionality overview
3.1. Edge node
3.2. Fast optical core switch
4. Design specifications
4.1. PC-FPGA interface Ethernet frame
4.2. Time Diagram of the time slot in the optical core
4.3. AAPN slot in the optical domain
5. PC-FPGA Interface
5.1. Overview
5.2. Detailed description of the Ethernet interface
5.3. Detailed description of the Asynchronous Serial Bridge API
5.3.1. READ ADDRESS Register
5.3.2. WRITE ADDRESS Register
5.3.3. READ LENGTH Register
5.3.4. WRITE LENGTH Register
5.3.5. FLAG LENGTH Register
5.3.6. CONTROL Register
5.3.7. STATUS Register
6. Slotted Transmission
6.1. Overview
6.2. Detailed description of slot transmission
6.2.1. Algorithmic State Machine
6.2.2. Hardware blocks
7. Optical Transmission and Reception
7.1. Overview
7.2. Detailed description
7.3. GXB Transmitter Interface
7.4. GXB Receiver Interface
8. Slot Reception
8.1. Overview
8.2. Detailed description of slot reception
9. Core Optical Switch
9.1. Overview
9.2. Non-blocking architecture
10. Experimental test results
11. Summary and Discussion
12. Future work
References
Appendix 1. Project team members
Appendix 2. Hardware codes
List of figures

Figure 2.1. AAPN Architecture: (a) Overlaid star topology that characterises an Agile All-Photonic Network. (b) Use of selectors in AAPN to allow for larger numbers of edge nodes.
Figure 2.2. AAPN Core node architecture.
Figure 2.3. AAPN Demonstrator Prototype architecture. E1 corresponds to the master edge node.
Figure 2.4. Block diagram of the main functions in the edge node of the demonstrator prototype.
Figure 2.5. Software architecture of a master node and a slave node [11].
Figure 3.1. Stratix GX Development Board by Altera Corp. [8].
Figure 3.2. SFP MSA Transceiver by Fujitsu Limited [9].
Figure 3.3. Free-X™ 2x2 optical switch by Civcom Inc. [10].
Figure 4.1. PC-FPGA interface: Ethernet payload contents in the downstream direction.
Figure 4.2. PC-FPGA interface: Ethernet payload contents in the upstream direction.
Figure 4.3. Timing diagram in the optical domain, downstream direction.
Figure 4.4. Timing diagram in the optical domain, upstream direction.
Figure 4.5. Contents of the AAPN time slot.
Figure 5.1. Block diagram of the PC-FPGA interface within the edge node prototype.
Figure 5.2. Ethernet frame field structure.
Figure 5.3. Flowchart describing the Ethernet interface program.
Figure 6.1. Main functional blocks of the Slotted Transmission module.
Figure 6.2. AAPN slot transmission block diagram.
Figure 6.3. Algorithmic State Machine of the Slotted Transmission module - part 1.
Figure 6.4. Algorithmic State Machine of the Slotted Transmission module - part 2.
Figure 6.5. Main hardware blocks of the Slotted Transmission module.
Figure 6.6. Data path block of the Slotted Transmission module.
Figure 6.7. Main hardware design blocks of the Slotted Transmission module.
Figure 6.8. Clock difference hardware design block of the Slotted Transmission module.
Figure 6.9. Comparator blocks for the Padding hardware design block and the Slot Transmission hardware design block of the Slotted Transmission module.
Figure 7.1. Stratix GX Serializer and Deserializer.
Figure 7.2. Finite State Machine of the GXB Transmitter Interface.
Figure 7.3. Hardware block diagram of the GXB Transmitter Interface.
Figure 7.4. Finite state machine of the GXB Receiver Interface.
Figure 7.5. Packet Alignment Scenarios with Four-Segment Packets.
Figure 7.6. Hardware block diagram of the GXB Receiver Interface.
Figure 8.1. AAPN slot receiver block diagram.
Figure 8.2. AAPN slot receiver processor block diagram.
Figure 8.3. Slot receiver processor algorithmic state machine.
Figure 9.1. Architecture of the AAPN demonstrator optical core switch.
Figure 9.2. Bench top assembly of the AAPN optical core switch.
Figure 10.1. Loopback test configuration setup.
Figure 10.2. Loopback test configuration using core switch setup.
Figure 11.1. View of the FPGA component of the AAPN edge node for the optical loopback test.
List of tables

Table 4.1. Description of the field contents of the Ethernet payload in the PC-FPGA interface.
Table 4.2. Clock and word size parameters for the optical transmission.
Table 4.3. Description of the time slot contents in the optical domain.
Table 5.1. Register X-1: READ ADDRESS Register.
Table 5.2. Register X-2: WRITE ADDRESS Register.
Table 5.3. Register X-3: READ LENGTH Register.
Table 5.4. Register X-4: WRITE LENGTH Register.
Table 5.5. Register X-5: FLAG LENGTH Register.
Table 5.6. Register X-6: CONTROL Register.
Table 5.7. Register X-7: STATUS Register.
Table 6.1. Top level entity signals of the Slotted Transmission module.
Table 6.2. Data path signals of the Slotted Transmission module.
Table 6.3. Control path signals of the Slotted Transmission module.
Table 8.1. Slot receiver processor signal definitions.
Table 8.2. Modifiable parameters in the slot receiver processor.
Table 9.1. Optical core switch connectivity.
Table 9.2. AAPN core switch truth table settings.
Table 10.1. Received word sequence in the loopback test configuration.
1. Objectives
Within the context of the AAPN Research Network [1], the objective for the implementation of the Demonstrator
Prototype (Theme 3) is to build a scaled-down version of the AAPN to demonstrate that the ideas developed
under the AAPN project are of practical interest by showing that the component technologies developed under
Theme 2 "Enabling Technologies" and the network architectures, optimization methods and control protocols
developed under Theme 1 "Architectures and Networks" can be combined into an operational network.
The Demonstrator Prototype has been designed to be a combination of software and hardware functionalities.
The objectives for the hardware functionality are:
- To implement the functions of the edge node that require high speed and precision:
  - Slotted optical transmission (E-O conversion),
  - Burst-mode reception from the optical core (O-E conversion), and
  - Reconfiguration of the core photonic switch
- To implement and test a fast photonic switch at the core node
2. AAPN Demonstrator Prototype Overview
2.1. AAPN architecture overview
An Agile All-Photonic Network, AAPN [2], is a wavelength division multiplexed network that consists of several
overlaid stars formed by edge nodes interconnected by bufferless optical core nodes (Figure 2.1). The edge
nodes aggregate incoming traffic in larger size data units called “slots” and transmit them over the photonic links.
The core nodes consist of a wavelength stack of bufferless transparent photonic switches (a set of space
switches, one switch for each wavelength as shown in Figure 2.2) and perform fast switching in order to provide
bandwidth allocation in sub-wavelength granularities.
Figure 2.1. AAPN Architecture: (a) Overlaid star topology that characterises an Agile All-Photonic Network. (b) Use of selectors in AAPN to allow for larger numbers of edge nodes.
Most proposals for photonic network architectures envisage a mesh topology, which distributes the load over
many switches. In the AAPN, the core switches have enormous capacity, which permits the simpler overlaid star
topology to compare favourably with mesh architectures [3]. The AAPN is mainly characterized by its “agility”;
that is, the ability to rapidly adapt bandwidth allocation as the traffic demand varies, which is possible to achieve
since the core switches are fast (1µs switching time). Moreover, routing in a star network is trivial and, since the
core is all photonic, the costly optics-electronics-optics conversions are completely eliminated.
Figure 2.2. AAPN Core node architecture.
To share the network bandwidth, the AAPN may operate in two modes: Time Division Multiplexing (TDM) or
Optical Burst Switching (OBS). In TDM mode, time is ‘slotted’ and the slots arrive at the core switch
synchronously (slots are of fixed size); while in OBS mode the slots/bursts arrive at the core switch
asynchronously (slots may be of variable size). In both cases, a centralised or distributed scheduling method is
necessary to allocate the available bandwidth and solve contention among the edge nodes.
In a TDM-AAPN, the slots are a fixed size of 100kb (10µs at the target link rate of 10Gbps) and all arrive
synchronously at the core switch from the various edge nodes. Transmission schedules at the edge nodes must
therefore be appropriately coordinated with the core switch configuration schedule in order to transfer slots to
their correct destinations. Propagation delays must be taken into account. To allocate variable bandwidth
between pairs of edge nodes (flows), the schedules must be computed such that a larger or a smaller number of
time slots within a (possibly repeating) frame are assigned to flows with a higher or a lower bandwidth demand,
respectively. With a centralised approach, this involves the edge nodes sending measures of incoming traffic or
bandwidth requests to the core node, which then computes a frame schedule that adapts the core network
bandwidth to the different measures/requests. The response of the centralised system is therefore delayed at
least by the maximum edge-core round-trip propagation time plus the computation time of the schedule at the
core node.
Several centralised (e.g. [4] [5] [6]) and distributed (e.g. [7]) bandwidth sharing methods have been designed in
the context of a TDM-AAPN. In all these scheduling methods, the queueing structure in the source module
consists of a set of simple buffers/queues, organized as virtual destination queues; i.e., one queue in a source
node for each of the destination edge nodes. Traffic measures and/or bandwidth requests may be computed
from the length of these queues. In the destination modules, buffers are required as well for the process of
deaggregation of slots back into the external (IP or MPLS) packets and these are conveniently organized as
virtual source queues; i.e., one queue in a destination node for each of the source nodes.
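As a minimal illustration of this queueing organization (a C sketch with identifiers of our own choosing; the actual queues are implemented in the PC software of the prototype):

#include <stddef.h>

#define N_EDGE 4                /* edge nodes in the demonstrator */
#define SLOT_WORDS 250          /* 32-bit words per AAPN data slot */

typedef struct {
    unsigned int data[SLOT_WORDS];
} aapn_slot_t;

/* A simple FIFO of slots; its length doubles as the traffic measure
   or bandwidth request reported to the scheduler. */
typedef struct {
    aapn_slot_t *slots;
    size_t head, tail, len;
} slot_queue_t;

/* Source module: one virtual destination queue per destination edge node. */
slot_queue_t virtual_destination_queue[N_EDGE];

/* Destination module: one virtual source queue per source edge node,
   used while deaggregating slots back into the external IP/MPLS packets. */
slot_queue_t virtual_source_queue[N_EDGE];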
2.2. Demonstrator Prototype architecture
The Prototype architecture, as shown in Figure 2.3, consists of 4 edge nodes, with one of them being the master
edge node used to control the fast photonic switch at the core node (i.e., the core node is an edge node with
extended control functionality). Each edge node consists of a combination of software functions running on a PC
and hardware functions operating on a Stratix GX FPGA development board [8].
The optical transmission/reception is performed at 1Gbps using small-form pluggable (SFP) multi-service
agreement (MSA) transceivers made by Fujitsu [9] that are plugged into MSA cages on the FPGA development
board. The optical core switch is implemented with Civcom's Free-X 2x2 Solid Free Space optical technology
switches [10].
Only one wavelength is used for the prototype and it is assumed that no separate control channel would be
available between the edge and core switches; the necessary signalling information will be sent over the data
channel provided by the AAPN network.
A distributed software Control Platform that controls the high level functions of the core, the multiplexers and
edge nodes of an AAPN network (e.g. bandwidth allocation, traffic monitoring and network synchronization) has
been implemented separately ([11][12]) from the work presented in this technical report. This software runs in
the PC module of the edge/core nodes. The Control Platform is easily adapted to different AAPN demonstrators
since the majority of the software modules are independent of the nature of the particular optical
transmission/switching components and because only the control interfaces to the optical components must be
adapted to the control protocols supported by them.
Figure 2.3. AAPN Demonstrator Prototype architecture. E1 corresponds to the master edge node.
2.3. Edge Node Architecture in the Demonstrator Prototype
The main functional blocks of the prototype edge nodes are shown in Figure 2.4. The diagram also shows the type and form of the information exchanged between the different blocks.
implemented on the PC are the synchronization among edge nodes, traffic monitoring, bandwidth allocation, etc
(Figure 2.5). The slotted transmission scheduling is therefore dictated by the software modules in the PC, while
the actual execution of it is done with the FPGA.
Figure 2.4. Block diagram of the main functions in the edge node of the demonstrator prototype. (Blocks include traffic aggregation into virtual queues of slots, 1 Gbps optical transmission and reception, padding insertion/removal, CRC generation and comparison, PC-FPGA clock difference tracking, extraction of the TimeToTransmit and switch configuration information, and the bandwidth request/assignment functions on the PC.)
Figure 2.5. Software architecture of a master node and a slave node [11]
Other low-level, high-speed functions implemented in the FPGA are the following:
- Fast control interfaces for the optical burst-mode transceivers and the optical core switch
- Addition of padding to the AAPN data slots
- Cyclic Redundancy Check for optical transmission
At the HW functionality level, the core and edge node implementations are identical. At the core node, the output port carrying the switch configuration signal is connected to the driver of the photonic core switch, while at the edge nodes this port is simply not used.
Integrating and interfacing the PC with the FPGA requires the following two functions:
- Exchanging the AAPN data slots between the PC and the FPGA via an Ethernet link for transmission/reception in the optical domain
- Local PC-FPGA synchronization within the edge node
3. Hardware functionality overview
3.1. Edge node
The FPGA components have been designed to perform the low-level functions that require high speed operation
and high precision, such as the configuration of the optical core switch, slotted transmission, slot reception,
cyclic redundancy check, time stamping, padding, E/O and O/E conversions and local synchronization in the
edge node.
All the hardware functionality has been implemented on Stratix GX Development boards [8] (Figure 3.1) using
custom circuits built from programmable logic elements on a System on Chip (SoC) field programmable
gate array (FPGA). The design of the custom circuits is implemented using a Mealy finite state machine model
programmed using VHDL code. Other elements within the design are implemented using compiled software
within a hard-core microprocessor on the FPGA using on-chip memory.
The following modules were built:
1) The PC-FPGA Interface module receives Ethernet frames from the PC, which contain the AAPN slots and
their respective control information in their payloads. The interface has two sub-modules implemented in the
Stratix FPGA. The first is implemented in the NIOS II RISC processor and receives the Ethernet frames. If found error free, the payloads are sent to the second sub-module, which then sends all the information via an asynchronous serial bridge to the Stratix GX FPGA, where the rest of the modules of the HW functionality are placed. Upon reception, the interface module creates an Ethernet frame whose payload is the AAPN data slot and its control information, and then sends the Ethernet frame to the PC component.
2) The Slotted Transmission module receives the AAPN slots and their control information, then sends the
data slots to the optical transmitter at exactly the time scheduled by the TDM allocation algorithm. It keeps track
of the PC-FPGA clock difference (used to keep the two components locally synchronized), and outputs the
control signals to configure the optical core switch when the edge node in question is the master edge node.
3) The Optical Transmission/Reception module first adds padding (to achieve the higher rate of the core) and
a Cyclic Redundancy Check (CRC) field to the AAPN slot and then performs the conversion to the optical
domain by operating a Fujitsu SFP transceiver [9] at 1Gbps using 8B/10B encoding. Two FIFOS are used for
clock domain crossing. For reception, a block of the the Stratix GX FPGA is programmed as a very simple burst-
mode receiver that achieves Clock Data Recovery (CDR) in 15µs. The receiving sub-module removes the
padding after calculating the CRC field from the received data.
4) The Slot Reception module retrieves the slots from the optical receiver, performs error validation with the
CRC field, timestamps the slot with the reception time, merges the local synchronization details into the slot and
sends the AAPN slot to the PC-FPGA interface module for further transmission to the PC component.
Figure 3.1. Stratix GX Development Board by Altera Corp. [8] (showing the SFP slots, the Ethernet port, the Stratix and Stratix GX FPGAs, and the electrical ports used to configure the optical switch)
A dummy traffic generator and a simple analyzer have also been implemented in order to test each of the
modules separately. All the modules have been implemented and loop-back tests of modules 3) and 4) have
been successfully completed. We are currently debugging modules 1) and 2). The main challenges for the FPGA
work have been the lack of documentation for the development board and the incompatibility with previous
versions of the hardware code compilers.
Figure 3.2. SFP MSA Transceiver by Fujitsu Limited [9].
3.2. Fast optical core switch
The 4x4 core switch has been implemented with a Clos array of six small switches. Fast port-to-port non-
blocking interconnection is achieved using six Solid Free Space 2x2 optical switches bought from Civcom Inc.
[10]. Actuation to configure the switches is provided by six DC-coupled electrical signals generated by the
Slotted Transmission module in the master edge node. The truth table for the core switch configuration is
programmed in the software (PC) component of the edge node and the configuration information needed for
each AAPN slot is sent within its “control information” fields as will be described in the design specifications.
The Solid Free Space technology developed by Civcom Inc. allows a switching speed of 400 ns, which complies with the AAPN design target of 1µs; however, it turned out to have a limited reconfiguration frequency of 6 kHz
arising from the electronic drivers of the switch. This became a major speed limitation for the prototype since
consequently only one slot can be transmitted every 166.67µs (as opposed to the 10µs AAPN design target).
Our current design defines a time slot of 200µs for this reason.
Figure 3.3. Free-X™ 2x2 optical switch by Civcom Inc. [10].
4. Design specifications
The design specifications are the result of continuous discussion among members of the software and hardware
teams over several months. They take into consideration both the characteristics and limitations of the software
and hardware equipment available to build the AAPN prototype.
4.1. PC-FPGA interface Ethernet frame
The contents of the Ethernet frames used in the PC-FPGA interface are shown in Figure 4.1 for the downstream
direction (towards the photonic core) and in Figure 4.2 for the upstream direction (towards the PC). The
CTRINFO set of bit fields contains the control information required for the accompanying AAPN Data Slot and
is relevant only to the PC-FPGA communications. The AAPN Data Slot set of fields is the stream of bits to be
sent to the photonic core of the network at a particular time dictated by the scheduler. All the fields in the
Ethernet frame payload are described in Table 4.1.
Downstream Ethernet payload (8,160 bits):
- Control Information, CTRINFO (160 bits):
  - SwitchConfiguration (32 bits): S0 to S5 (6 bits) plus 26 empty bits
  - PCSendingTime (64 bits)
  - TimeToSendOptical (64 bits)
- AAPN Data Slot, DATA (8,000 bits = 250 x 32-bit words):
  - TimeToSendOptical (64 bits)
  - TimeReceivedOptical (64 bits)
  - other data

Figure 4.1. PC-FPGA interface: Ethernet payload contents in the downstream direction.
In the upstream direction, the CTRINFO set of fields is added after the AAPN block of data in order to increase the speed of operation, given that the CRC and the TransmissionError bit cannot be calculated until all the bits of the AAPN DATA field have been processed.
Upstream Ethernet payload (8,096 bits):
- AAPN Data Slot, DATA (8,000 bits = 250 x 32-bit words):
  - TimeToSendOptical (64 bits)
  - TimeReceivedOptical (64 bits)
  - other data
- Control Information, CTRINFO (96 bits):
  - CRC (32 bits): TransmissionError (1 bit) plus 31 empty bits
  - ClockDifference (64 bits)

Figure 4.2. PC-FPGA interface: Ethernet payload contents in the upstream direction.
Table 4.1. Description of the field contents of the Ethernet payload in the PC-FPGA interface

- SwitchConfiguration (32 bits, 1 word): Set of bits used to configure the core optical switch. Only the 6 least-significant bits are used; a full word is used to make it easier for the PC component to write the value into the slot field.
- S0, S1, S2, S3, S4, S5 (6 bits): Actuation signals for the six 2x2 switches that make up the 4x4 core optical switch. DC-coupled TTL electrical signal outputs from the connector pin-outs of the Stratix GX Development Board.
- PCSendingTime (64 bits, 2 words): Clock value in the PC at the moment of sending the Ethernet frame to the FPGA. Time value in FPGA format (an unsigned integer value).
- TimeToSendOptical (64 bits, 2 words): Time slot at which the AAPN data slot must be transmitted to the optical core of the network, as determined by the scheduling algorithm running on the PC. Time value in FPGA format (an unsigned integer value); it corresponds to the start of the time slot, as opposed to the start of the meaningful data.
- TimeReceivedOptical (64 bits, 2 words): Time at which the AAPN data slot is received at the destination edge node; it corresponds to the beginning of the time slot in the optical domain. Time value in FPGA format (an unsigned integer value). In the downstream direction this field is empty.
- other data (7,872 bits, 246 words): The rest of the data in the AAPN slot. Irrelevant to the operation of the FPGA component.
- CRC (32 bits, 1 word): Cyclic Redundancy Check of the data transmitted optically, including the padding. In the upstream direction only the least significant bit is used, but a full word is still used to facilitate extraction of the value in the PC by respecting the 32-bit word definition.
- TransmissionError (1 bit): A "0" indicates that there has been a transmission error (e.g. digital decoding errors, i.e. sampling the slot in the wrong order, or loss of synchronization in the clocking between transmit and receive).
- ClockDifference (64 bits): The difference between the PCSendingTime in the last Ethernet frame arriving from the PC and the clock value in the FPGA at the time the last Ethernet frame was sent to the PC. Time in FPGA format. This value is used by the PC control software to maintain synchronization between the PC and the FPGA components.
A copy of the TimeToSendOptical field was added to the Control Information part of the Ethernet frame in the
downstream direction in order to facilitate the hardware operation. It is also important to note that the maximum
Ethernet payload is 400 x 32-bit words (12,800 bits) and, since the Ethernet network card pads the payload
automatically, there is no need to add another field in the upstream direction to make the implementation
symmetrical.
The TimeReceivedOptical field is defined as the time at which the AAPN data slot is received at the
destination edge node. As defined in the AAPN literature, it corresponds to the beginning of its assigned time
slot in the optical domain (as opposed to the beginning of the actual meaningful information being received).
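For illustration, the downstream payload of Figure 4.1 maps naturally onto a packed C structure; the following sketch uses the field names from the figures (the packed attribute and integer types are our assumptions, and byte/bit ordering on the wire is not addressed here):

#include <stdint.h>

/* Downstream (PC to FPGA) Ethernet payload: 8,160 bits in total. */
typedef struct __attribute__((packed)) {
    /* CTRINFO: 160 bits of control information */
    uint32_t switch_configuration;        /* only the 6 LSBs (S0..S5) are used */
    uint64_t pc_sending_time;             /* PC clock when the frame was sent */
    uint64_t time_to_send_optical;        /* start of the assigned time slot */
    /* AAPN Data Slot: 8,000 bits = 250 x 32-bit words */
    uint64_t slot_time_to_send_optical;   /* copy carried inside the slot */
    uint64_t slot_time_received_optical;  /* empty in the downstream direction */
    uint32_t other_data[246];             /* not interpreted by the FPGA */
} aapn_downstream_payload_t;              /* 1,020 bytes = 8,160 bits */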
4.2. Time Diagram of the time slot in the optical core
The target AAPN time slot length in the optical domain has been defined as 10µs (100kb) at 10Gbps in the
AAPN literature [2]. For this project, however, given the limitation set by the 6kHz maximum reconfiguration
frequency of the Free-X Civcom switches, the time slot for the prototype has been rounded up to 200µs as the
minimum achievable time slot is 166.67µs.
Figure 4.3 shows the timing diagram of slots in the optical domain for the downstream direction while Figure 4.4
shows the corresponding diagram for the upstream direction. The figures depict the relations between the field
contents of the AAPN slot and the transmission/reception time parameters.
The Guard Time is defined as the time during which no meaningful information can be transmitted/received. For the AAPN prototype design the Guard Time is composed of:
1. the time taken by the optical switch to reconfigure (SwitchConfigurationTime), plus
2. the Clock Data Recovery (CDR) time; i.e., the time it takes the optical burst receiver to lock to the bit stream being received.
For the prototype these amount to 0.4µs + 24.6µs = 25µs (see Figure 4.5).
Figure 4.3. Timing diagram in the optical domain, downstream direction. (The optical data begins at TimeToSendOptical + Guard Time within the time slot.)
Figure 4.4. Timing diagram in the optical domain, upstream direction. (TimeReceivedOptical marks the beginning of the time slot; the data arrives at the optical transceiver after the Guard Time.)
4.3. AAPN slot in the optical domain
The clock parameters and word size for the electrical and optical domains are shown in Table 4.2. Within this
context, the content of the AAPN time slot used for transmission/reception in the optical domain of the prototype
is shown in Figure 4.5.
Table 4.2. Clock and word size parameters for the optical transmission
- Clock rate: 5.00E+07 Hz (50 MHz)
- Clock cycle: 2.0E-08 seconds (20 ns)
- Electrical word size: 32 bits
- Optical word size: 16 bits
Time slot length in the optical domain: 2.00E-04 seconds (200µs), composed of:
- Guard Time: 2.50E-05 seconds (25µs)
  - SCT (switch configuration time): 4.00E-07 seconds
  - CDR (clock data recovery time): 2.46E-05 seconds
- Optical data transmitted: 1.75E-04 seconds (175µs), 8,750 optical words in total:
  - SYNCH: 16 bits (1 optical word), 2.00E-08 seconds
  - DATA: 8,000 bits (500 optical words, 250 electrical words), 1.00E-05 seconds
  - PADDING: 131,904 bits (8,244 optical words, 4,122 electrical words), 1.6488E-04 seconds
  - CRC: 32 bits (2 optical words, 1 electrical word), 4.00E-08 seconds
  - EOT: 48 bits (3 optical words), 6.00E-08 seconds
  (Electrical domain: 4,373 words in total)

Figure 4.5. Contents of the AAPN time slot
During the first stages of the project, padding was added to the AAPN slot in the optical domain in order to
compensate for a slight bottleneck that was foreseen in the Ethernet interface. The reason the padding ended up so large by the end of the project is that, although the maximum line rate of the Ethernet network cards is 100 Mbps, the achieved line rate between the PC and the FPGA was measured at only ~10 Mbps. This drastic bottleneck, and thus limitation in the implementation, was the result of two factors:
- The Ethernet port on the Altera development board is linked to only one of the board's FPGAs (the Stratix FPGA), which was used to extract and buffer the Ethernet data and then transfer this data via a custom-designed interface to the other FPGA (the Stratix GX). This Ethernet data extraction and hand-off constrains the throughput.
- The PC-FPGA Ethernet interface on the Stratix chip was designed to use an embedded microprocessor (NIOS RISC processor) and off-chip memory. The direct memory access (DMA) turned out to be the main slowing component of the interface, an issue that did not come to light until the last few stages of the project timeline.
In the downstream direction, the cyclic redundancy check (CRC) bits at the end of the optical data transmitted are calculated before transmission with the TimeReceivedOptical field set to zero. Upon reception at the destination edge node, the CRC is calculated and checked before populating the TimeReceivedOptical and the ClockDifference fields. The most significant bit of the CRC field is then set to 1 if there was a transmission error in the optical domain (or left equal to 0 if the optical transmission had no errors).
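The CRC handling just described can be sketched in C as follows (the report does not specify the CRC polynomial, so the IEEE 802.3 CRC-32 below is an assumption, as are the helper names):

#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (IEEE 802.3 polynomial, an assumed choice). */
static uint32_t crc32(const uint8_t *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Transmit side: zero TimeReceivedOptical, then compute the CRC
   over the DATA and PADDING fields. */
void prepare_crc(uint8_t *slot, size_t slot_len,
                 size_t time_received_offset, uint32_t *crc_field) {
    memset(slot + time_received_offset, 0, 8);    /* 64-bit field */
    *crc_field = crc32(slot, slot_len);
}

/* Receive side: check the CRC before populating TimeReceivedOptical and
   ClockDifference; on error, set the MSB of the CRC field to 1. */
int check_crc(const uint8_t *slot, size_t slot_len, uint32_t *crc_field) {
    int error = (crc32(slot, slot_len) != *crc_field);
    if (error)
        *crc_field |= 0x80000000u;
    return error;
}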
Table 4.3. Description of the time slot contents in the optical domain

- SCT: Optical switch configuration time.
- CDR: Clock Data Recovery time; the time taken by the optical burst-mode receiver to identify the clock rate of the incoming bit stream.
- SYNCH (16 bits, 1 optical word): Synchronization bits; a specific bit sequence used to recognize the start of a block of information.
- DATA (8,000 bits, 500 optical words, 250 electrical words): AAPN block of data; the bit stream with the meaningful network information.
- PADDING (131,904 bits, 8,244 optical words, 4,122 electrical words): Bit sequence appended to the AAPN data in order to fill the remaining space in the 200µs time slot. A sequence of alternating ones and zeroes.
- CRC (32 bits, 2 optical words, 1 electrical word): Cyclic Redundancy Check, calculated over the DATA and PADDING fields.
- EOT (48 bits, 3 optical words): End Of Transmission; a specific bit sequence used to indicate the end of the block of information.
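As a cross-check of the arithmetic in Figure 4.5 and Table 4.3, the slot layout can be captured as compile-time constants (a C11 sketch; the identifiers are ours):

/* Optical slot budget at the 50 MHz clock of Table 4.2:
   one 16-bit optical word every 20 ns. */
#define SYNCH_BITS     16
#define DATA_BITS      8000
#define PADDING_BITS   131904
#define CRC_BITS       32
#define EOT_BITS       48
#define OPTICAL_BITS   (SYNCH_BITS + DATA_BITS + PADDING_BITS + CRC_BITS + EOT_BITS)

/* 140,000 bits = 8,750 optical words x 20 ns = 175 us of optical data;
   adding the 25 us guard time fills the 200 us time slot exactly. */
_Static_assert(OPTICAL_BITS == 140000, "optical payload size mismatch");
_Static_assert(OPTICAL_BITS / 16 == 8750, "optical word count mismatch");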
5. PC-FPGA Interface
5.1. Overview
The PC interfaces with the development board via an Ethernet connection. To connect to the Ethernet hardware,
the Stratix FPGA was used. However, to connect to the optical transceivers the Stratix GX must be used, thus
the implementation required the use of both FPGAs on the development board. The Stratix FPGA communicated
with the PC over Ethernet and it communicated with the Stratix GX via a high speed serial link referred to as the
Asynchronous Serial Bus (ASB). Within the Stratix FPGA a SoC design was implemented with Altera’s SOPC
builder to link the PC to the Stratix GX FPGA, as shown in Figure 5.1. It contains an Ethernet interface, an
embedded NIOS II processor, a RAM and the custom designed ASB.
Figure 5.1. Block diagram of the PC-FPGA interface within the edge node prototype
5.2. Detailed description of the Ethernet interface
To process the data exchanged with the PC via the Ethernet interface, one could adopt either a software approach or a hardware approach. Processing an Ethernet frame entirely in hardware (digital circuits) is advantageous in terms of transmission speed; however, the development would take much longer and require more manpower than the software approach. Altera provides an easy-to-use soft processor (NIOS II) that can be instantiated on an Altera FPGA device to implement the data processing functions in software in a relatively short time. The NIOS system also includes hardware modules, such as DMA, that can be used to speed up the data processing. Given the project team's time constraints, this was the option chosen to design the interface. Chapter 11 describes some problems that were later found with this approach.
Raw Ethernet frames are used between the PC and the Ethernet Interface (lan91c111). The Ethernet frame
structure is shown in Figure 5.2. The Ethernet interface program only deals with the following fields: Destination
Address, Source Address, Length, and AAPN Data. The AAPN Data field is processed as a whole, without inspecting its internal structure.
Starting Delimiter (1 byte) | Destination Address (6 bytes) | Source Address (6 bytes) | Length (2 bytes) | Payload: AAPN Data Slot + Control Information (46 - 1500 bytes) | Frame Check Sequence (4 bytes)

Figure 5.2. Ethernet frame field structure
The interface program can be roughly divided into three functional blocks as shown in Figure 5.3: Initialization,
Receiving Slot (from the PC) and Sending Slot (towards the PC).
Figure 5.3. Flowchart describing the Ethernet interface program
The Initialization Stage performs the following procedure:
1. initialize the operating system
2. initialize the Ethernet interface hardware
3. initialize the drivers
4. initialize the Asynchronous Serial Bridge (ASB)
After the Initialization Stage, the software enters the communication part, which runs receiving and sending
functions simultaneously. The Sending Function (upstream direction, towards the PC) executes the following
steps:
1. wait for slot data from the GX FPGA
2. allocate packet memory
3. fill in the Destination Address with the PC’s MAC address
4. fill in the Source Address with the Altera board’s MAC address
5. fill in the length (check whether one slot = multiple packets)
6. use the DMA to transfer data from the ASB to the packet memory
7. send the packet to the PC using hardware DMA
The Receiving Function (downstream direction, from the PC to the FPGA board) executes the following steps:
1. wait for a packet from the PC
2. allocate packet memory
3. check whether the Destination Address and Source Address fields are correct
4. check the length of the data (check whether one slot = multiple packets)
5. DMA the Ethernet payload (control information plus slot data) in the buffer to the ASB
The interface program employs the TCP/IP stack included in the NIOS operating system, where the following
packet buffers are used to store the incoming Ethernet packets:
extern queue bigfreeq; /* big free buffers */
extern queue lilfreeq; /* small free buffers */
All the received Ethernet frames are stored in a data structure defined as:
extern queue rcvdq; /* queue of all received (but undemuxed) packets */
When an Ethernet packet is received, the corresponding IRQ interrupt of the NIOS operating system is launched to fill a buffer using DMA, which is then stored in rcvdq. The program thus keeps checking the length of rcvdq to see whether any Ethernet frame has been received from the PC.
In the upstream direction, to check whether there is any slot data coming from the ASB, the following command
is used:
IORD_ASB_STATUS(ASB_BASE);
The Ethernet interface program iteratively checks the data from the ASB and the data from the PC and then
does the corresponding processing.
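A simplified view of this polling loop is sketched below; it reuses the rcvdq queue and the ASB status macro from above, while the queue-length accessor and the two frame handlers are hypothetical placeholders for the real program logic:

#include "ASB.h"              /* ASB register macros (Section 5.3) */

extern queue rcvdq;           /* received-but-undemuxed packets (TCP/IP stack) */

extern int  queue_length(queue *q);          /* hypothetical accessor */
extern void process_downstream_frame(void);  /* validate, DMA payload to ASB */
extern void forward_slot_to_pc(void);        /* build frame, DMA to Ethernet */

void interface_main_loop(void) {
    for (;;) {
        /* Downstream: the IRQ/DMA handler has queued an Ethernet frame. */
        if (queue_length(&rcvdq) > 0)
            process_downstream_frame();

        /* Upstream: a slot's worth of words is buffered in the ASB
           (SLOT_RDY is bit 2 of the STATUS register). */
        if (IORD_ASB_STATUS(ASB_BASE) & 0x4)
            forward_slot_to_pc();
    }
}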
5.3. Detailed description of the Asynchronous Serial Bridge API
Packets stored in the on-chip RAM could be transferred to the Stratix GX by writing to the registers of the ASB.
A DMA is implemented within the ASB; it requires only the start address and the length of the packet to set up a transfer. Once the transfer is initiated, the ASB starts buffering the packet in a circular buffer. At
the same time, once at least one 32-bit word is buffered, the ASB hardware separates the word into 4 bytes and
sends them out over four separate 100 Mbps parallel to serial converters. To implement this, four wires that link
the Stratix to the Stratix GX are used, in addition to a reset wire.
On the Stratix GX side the packets are received asynchronously via four serial to parallel converters also running
at 100 Mbps. The bytes are reconstructed into 32-bit words and are buffered. The system has access to empty
and full flags, in addition to a register that contains the number of words buffered. Packets sent from the Stratix GX to the Stratix work much the same way, except that on the Stratix GX there is no DMA and the application hardware interfaces directly to the circular buffers.
When using the ASB within an embedded system, the embedded software must reference the “ASB.h” header
file:
#include "ASB.h"
This file provides the user with the necessary commands in order to work with the ASB. The commands
available are:
IORD_ASB_RD_ADDRESS(base)
IOWR_ASB_RD_ADDRESS(base, data)
IORD_ASB_WR_ADDRESS(base)
IOWR_ASB_WR_ADDRESS(base, data)
IORD_ASB_RD_LENGTH(base)
IOWR_ASB_RD_LENGTH(base, data)
IORD_ASB_WR_LENGTH(base)
IOWR_ASB_WR_LENGTH(base, data)
IORD_ASB_CTRL(base)
IOWR_ASB_CTRL(base, data)
IORD_ASB_FLAG_LENGTH(base)
IOWR_ASB_FLAG_LENGTH(base, data)
IORD_ASB_STATUS(base)
These commands read and write to the ASB’s registers:
READ ADDRESS Register
WRITE ADDRESS Register
READ LENGTH Register
WRITE LENGTH Register
FLAG LENGTH Register
CONTROL Register
STATUS Register
The registers and their associated commands are detailed in the following sections.
5.3.1. READ ADDRESS Register
The READ ADDRESS register, shown in Table 5.1, contains the starting memory address where data will
be read from for a transmission. To write to the register the IOWR_ASB_RD_ADDRESS(base, data) instruction
should be used. To read the register, the IORD_ASB_RD_ADDRESS(base) instruction should be used.
Table 5.1. Register X-1: READ ADDRESS Register
bit 31 - 0 RD_ADDRESS: Memory address of first word to transmit
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
// Replace 0x010000 with the desired address
int x;
// write to register
IOWR_ASB_RD_ADDRESS( ASB_BASE, 0x010000 );
// read register
x = IORD_ASB_RD_ADDRESS(ASB_BASE);
printf( "RD_ADDRESS: %x\n",x);
5.3.2. WRITE ADDRESS Register
The WRITE ADDRESS register, shown in Table 5.2, contains the starting memory address where received data
will be written to. To write to the register the IOWR_ASB_WR_ADDRESS(base, data) instruction should be
used. To read the register the IORD_ASB_WR_ADDRESS(base) instruction should be used.
Table 5.2. Register X-2: WRITE ADDRESS Register
bit 31 - 0 WR_ADDRESS: Destination memory address of first word received
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
// Replace 0x010000 with the desired address
int x;
// write to register
IOWR_ASB_WR_ADDRESS( ASB_BASE, 0x010000 );
// read register
x = IORD_ASB_WR_ADDRESS(ASB_BASE);
printf( "WR_ADDRESS: %x\n",x);
5.3.3. READ LENGTH Register
The READ LENGTH register, shown in Table 5.3, contains the number of words that will be read from memory
and transmitted. To write to the register the IOWR_ASB_RD_LENGTH(base, data) instruction should be used.
To read the register the IORD_ASB_RD_ LENGTH(base) instruction should be used.
Table 5.3. Register X-3: READ LENGTH Register
bit 31 - 16 Unimplemented: read as ‘0’
bit 15 - 0 RD_LENGTH: The number of words to be read sequentially from
memory and transmitted.
The largest value it can store is 65,535.
When read, the register will always contain a value one lower than what was written. This is done to make the hardware simpler.
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
int x, txlen;
txlen = 400;
// write to register
IOWR_ASB_RD_LENGTH( ASB_BASE, txlen );
// read register
x=IORD_ASB_RD_LENGTH(ASB_BASE);
printf( "RD_LENGTH: %d\n",x);
5.3.4. WRITE LENGTH Register
The WRITE LENGTH register, shown in Table 5.4, contains the number of received words that will be written into
memory. To write to the register the IOWR_ASB_WR_LENGTH(base, data) instruction should be used. To
read the register the IORD_ASB_WR_LENGTH(base) instruction should be used.
Table 5.4. Register X-4: WRITE LENGTH Register
bit 31 - 16 Unimplemented: read as ‘0’
bit 15 - 0 WR_LENGTH: The number of received words to be sequentially
written into memory
The largest value it can store is 65,535
When read, the register will always contain a value one lower than what was written. This is done to make the hardware simpler.
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
int x, rxlen;
rxlen = 400;
// write to register
IOWR_ASB_WR_LENGTH( ASB_BASE, rxlen );
// read register
x=IORD_ASB_WR_LENGTH(ASB_BASE);
printf( "WR_LENGTH: %d\n",x);
5.3.5. FLAG LENGTH Register
The FLAG LENGTH register, shown in Table 5.5, contains the minimum number of received words required to
trigger the SLOT_RDY flag. To write to the register the IOWR_ASB_FLAG_LENGTH(base, data) instruction
should be used. To read the register the IORD_ASB_ FLAG_LENGTH(base) instruction should be used.
Table 5.5. Register X-5: FLAG LENGTH Register
bit 31 - 16 Unimplemented: read as ‘0’
bit 15 - 0 FLAG_LENGTH: The minimum number of received words required to
trigger the SLOT_RDY flag
The largest value it can store is 65,535
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
int x, flaglen;
flaglen = 400;
// write to register
IOWR_ASB_FLAG_LENGTH( ASB_BASE, flaglen);
// read register
x=IORD_ASB_FLAG_LENGTH(ASB_BASE);
printf( "FLAG_LENGTH: %d\n",x);
5.3.6. CONTROL Register
The CONTROL register, shown in Table 5.6, is used to adjust the state of the ASB. It allows the user to reset
or initialize the hardware, and start either reception or transmission. To write to the register the
IOWR_ASB_CTRL(base, cmd) instruction should be used, where cmd can be ASB_RESET, ASB_START,
ASB_GO_RX or ASB_GO_TX. To read the register the IORD_ASB_CTRL(base) instruction should be used. Note
that the ASB is full duplex, which means that transmission and reception can be executed simultaneously.
Table 5.6. Register X-6: CONTROL Register
bit 31 - 4 Unimplemented: read as ‘0’
bit 3 RESET: Restart ASB
1 = Resets ASB hardware
0 = Normal operation
bit 2 START: Initialize ASB
1 = Puts ASB to listening state
0 = Normal operation
bit 1 GO_RX: Initiate reception
1 = Start reception
0 = ASB reception idle
Used as a signal and is not registered; thus reading it will always produce a ‘0’.
bit 0 GO_TX: Initiate transmission
1 = Start transmission
0 = ASB transmission idle
Used as a signal and is not registered; thus reading it will always produce a ‘0’.
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
/* Initialize Hardware. */
IOWR_ASB_CTRL(ASB_BASE, ASB_RESET);
IOWR_ASB_CTRL(ASB_BASE, ASB_START);
// receive
IOWR_ASB_CTRL( ASB_BASE, ASB_GO_RX);
// transmit
IOWR_ASB_CTRL( ASB_BASE, ASB_GO_TX);
5.3.7. STATUS Register
The STATUS register, shown in Table 5.7, contains the status of the ASB, the number of words available to read,
the slot ready flag and busy flags. The STATUS register can only be read by the user’s application and cannot be written to. To read the register the IORD_ASB_STATUS(base) instruction should be used.
Table 5.7. Register X-7: STATUS register
bit 31 - 13 Unimplemented: read as ‘0’
bit 12 - 3 DATA_RDY: Number of words available to read
bit 2 SLOT_RDY: Slot ready
1 = number of words available >= flag_length
0 = number of words available < flag_length
For the SLOT_RDY flag to be triggered there must be more than zero
words available, even if the flag_length register is set to zero
bit 1 WR_BUSY: Write busy
1 = ASB busy writing received data to memory
0 = ASB reception idle
bit 0 RD_BUSY: Read busy
1 = ASB busy transmitting data from memory
0 = ASB transmission idle
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
int status;
status = IORD_ASB_STATUS(ASB_BASE);
printf( "STATUS: [SLOT_RDY:%d][RX_BUSY:%d][TX_BUSY:%d][nwords:%d]\n",
        (status&4)>>2, (status&2)>>1, status&1, status>>3 );
6. Slotted Transmission
6.1. Overview
The slot transmission for an AAPN slot is primarily responsible for taking an AAPN data slot received from the
higher level aggregation software on the PC, digitally processing the contents of the slot and forwarding it at the
scheduled time to an optical transceiver where the digital stream is converted to an optical signal. This
chapter focuses on the Slotted Transmission module, up to the point just before the digital data is
forwarded to the optical transceiver.
Figure 6.1. Main functional blocks of the Slotted Transmission module
The Slotted Transmission module retrieves data from the Asynchronous Serial Bridge and stores them in a large
FIFO created to hold more than one block of data, which are then processed one by one. Each block of data
consists of Control Information (SwitchConfiguration = 32 bits, PCSendingTime = 64 bits and
TimeToSendOptical = 64 bits) and the AAPN data slot (TimeToSendOptical, TimeReceivedOptical
and "other data"). The Slotted Transmission module controls the overall transmission (including configuration of
the optical switches) to make sure that the AAPN slots are sent to the core at the right scheduled time and after
the appropriate guard times. It also makes sure that padding bits and a cyclic redundancy check (CRC) field are
included in the data transmitted.
The processing requires an appropriate finite state machine architecture to control the type of data that is
inserted into the slot. The slot transmission on the FPGA needs to apply transmission allowance control to
the outgoing slot from a large buffer, in order to accommodate the mismatch between the clock domains. The main functional blocks of
the Slotted Transmission are the Control and Data blocks, as shown in Figure 6.1.
6.2. Detailed description of slot transmission
Once the AAPN data slot is retrieved from the PC-FPGA interface, the Slotted Transmission module extracts the
SwitchConfiguration information for the optical switch from the Control Information data fields. It then saves
the PCSendingTime, which is used to calculate the clock difference between the PC and FPGA, and stores it in
a 64-bit register. This information is necessary for synchronization between PC and FPGA time: when any other
data slot is received, the clock difference is added to the control information by the Slot Reception block during
its upstream transmission towards the PC. In addition to the previous functions, the Slotted Transmission module
also extracts the TimeToSendOptical information, which is used to enable the transmitter and make the data
available to the Optical Transmission module at the appropriate time dictated by the schedule calculated in
software.
The TimeToSendOptical is a crucial element in Slotted Transmission; handling it entails taking into account all
possible scenarios of incoming data (a minimal sketch of this decision logic is given after the list):
1. When TimeToSendOptical < FPGA Time, the packet of data is dropped because its time to be
transmitted to the optical core has passed and transmitting it would most probably cause a collision.
2. When TimeToSendOptical = FPGA Time, the Slotted Transmission module performs the following:
a) The SwitchConfiguration information is loaded (i.e. optical switches are configured),
b) The Guard Time is accounted for (an FPGA counter is activated as soon as the switch is loaded,
at TimeToSendOptical), and finally
c) The optical transmitter is enabled and data sent to the Optical Transmission module in the
following order: slot data, padding bits (1010… sequence) and CRC at the end of transmission.
3. When TimeToSendOptical > FPGA Time, the Slotted Transmission module waits until the
TimeToSendOptical is equal to the FPGA time and then performs step 2 (a, b and c).
4. When TimeToSendOptical = 0, the Slotted Transmission block performs step 2 (a, b and c)
immediately. This option is used by the network synchronization protocols and other special cases
when the network is not operating in TDM mode.
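As an illustration, the following C sketch models the four scenarios above. It is a behavioural sketch only: the type and function names (slot_t, fpga_time, configure_switch, wait_guard_time, transmit_slot) are illustrative and do not correspond to actual entities in the VHDL design.
Example Code (illustrative sketch):
#include <stdint.h>

typedef struct {
    uint32_t switch_config;         /* SwitchConfiguration */
    uint64_t time_to_send_optical;  /* TimeToSendOptical   */
    /* ... remaining slot fields ... */
} slot_t;

extern uint64_t fpga_time(void);             /* 64-bit FPGA counter        */
extern void configure_switch(uint32_t cfg);  /* load the optical switches  */
extern void wait_guard_time(void);           /* switch + CDR settling time */
extern void transmit_slot(const slot_t *s);  /* slot data, padding, CRC    */

void handle_slot(const slot_t *s)
{
    uint64_t t = s->time_to_send_optical;
    if (t != 0 && t < fpga_time())
        return;                          /* case 1: too late, drop the slot    */
    while (t != 0 && fpga_time() < t)
        ;                                /* case 3: wait until the slot is due */
    configure_switch(s->switch_config);  /* cases 2/4, a) load switch config   */
    wait_guard_time();                   /* b) account for the guard time      */
    transmit_slot(s);                    /* c) slot data, padding bits, CRC    */
}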
Figure 6.2 provides a block diagram of the custom design of the slot transmission circuit within the FPGA. The
ASB FIFO shown in this figure is located in the Altera Stratix chipset on the development board. This chipset is
responsible for obtaining the slot information from the Ethernet interface. The information displayed to the right
of the ASB FIFO is integrated within the Stratix GX chipset on the development board. A custom state machine
provides control signals to each FIFO (ASB, Assembly and TCVR). The Assembly FIFO is dimensioned at
slightly more than twice the size of the slot (i.e. 512 words deep, where 1 word = 32 bits). It is responsible for
temporarily buffering the slot for the time the state machine needs to decode the necessary slot
information for processing. The decoding of the information from the slot is executed by a synchronous Mealy
state machine that generates control signals for demultiplexing each slot data field and for the read and write
operations to the FIFOs.
[Figure: the ASB FIFO (interface to the PC) feeds a 512-word Assembly FIFO controlled by the Assembly AAPN Tx state machine; a 64-bit FPGA Time counter, Clock Difference Calculator, Tx Enable Calculator and CRC Calculator & Comparator process the PC Sending Time, TimeToSendOptical and slot fields, producing the 6-bit Switch Config signal and the sending time, current time, slot payload, slot padding and CRC value forwarded to the TRCVR FIFO/interface to the core; all blocks run on the 50 MHz clock]
Figure 6.2. AAPN slot transmission block diagram
The control information is processed first (refer to Figure 4.1). The switch configuration state
(SwitchConfiguration) is the first element decoded and the 6-bit signal is latched. The latch is required in
order to provide a constant voltage to the optical switch drivers. The PCSendingTime data from the slot is then
decoded. This data field is transmitted to the “Clock Difference Calculator”, which takes the current FPGA time
(obtained from a 64-bit counter) and subtracts the PCSendingTime to calculate the ClockDifference field.
The “Clock Difference Calculator” block is a 64-bit subtractor circuit and the output is latched to allow the data to
be retrieved by the “Slot Receiver” module.
The first instance of the TimeToSendOptical data field from the slot is demultiplexed and sent to an input of a
64-bit adder circuit. The second input to the adder block is a hard-wired value (constant = 0xC4E2), which
represents the number of 20 ns clock cycles that make up the guard time (switch reconfiguration time plus
clock data recovery time), which is needed for the optical receiver to operate properly at the edge node. This
time is compared with the current 64-bit FPGA time (counter) and, as soon as the times are equal, the transmitter
enable signal is set HIGH to allow the actual AAPN slot data to be transferred to the optical transceiver block.
The remaining data fields that are then decoded represent the actual AAPN data slot and their contents are just
transferred to the optical transceiver module unchanged and unprocessed. The state machine selects the
appropriate port on the multiplexer to arrange that the correct sequence is transmitted to the transceiver FIFO
(refer to Figure 4.1 and Figure 4.5):
1. The second instance of the TimeToSendOptical
2. The TimeReceivedOptical
3. The rest of the meaningful data sent by the PC ("other data")
4. The padding generated to fill the 200 µs optical slot, a sequence of ‘1’s and ‘0’s (4122 words)
5. The 32-bit Cyclic Redundancy Check value to provide bit verification at the receiver
Three of the fields (SwitchConfiguration, PCSendingTime, TimeToSendOptical) in the control
information just processed are then discarded and the ClockDifference field is stored in the FPGA for future
use. The cyclic redundancy check method used is described in [13] [14] and [15].
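The following sketch restates the arithmetic of the “Clock Difference Calculator” and the transmit-enable comparison in C. Only the constant 0xC4E2 comes from the design; the function names are illustrative.
Example Code (illustrative sketch):
#include <stdint.h>

#define GUARD_CYCLES 0xC4E2u  /* number of 20 ns clock cycles of guard time */

/* 64-bit subtractor: current FPGA time minus PCSendingTime */
uint64_t clock_difference(uint64_t fpga_time, uint64_t pc_sending_time)
{
    return fpga_time - pc_sending_time;  /* latched for the Slot Receiver */
}

/* The transmitter enable goes HIGH as soon as the current FPGA time
 * equals TimeToSendOptical plus the guard time. */
int tx_enable(uint64_t fpga_time, uint64_t time_to_send_optical)
{
    return fpga_time == time_to_send_optical + GUARD_CYCLES;
}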
6.2.1. Algorithmic State Machine
Figure 6.3 and Figure 6.4 show the Algorithmic State Machine (ASM) of the module, which depicts all the system
operations performed by the module. The top level entity signals shown in the ASM are defined in Table 6.1, the
data path signals are defined in Table 6.2 and the control path signals are defined in Table 6.3.
Figure 6.3. Algorithmic State Machine of the Slotted Transmission module - part 1
Figure 6.4. Algorithmic State Machine of the Slotted Transmission module - part 2
6.2.2. Hardware blocks
Figure 6.5 depicts the main hardware blocks of the Slotted Transmission module and the Data Path block is
shown in Figure 6.6. Please zoom in for a detailed view of the figures.
Figure 6.5. Main hardware blocks of the Slotted Transmission module
Figure 6.6. Data path block of the Slotted Transmission module
Table 6.1. Top level entity signals of the Slotted Transmission module
Signal name Type Description
clock input, std logic 50 MHz clock
FPGA_Time input, std logic vector 64 bit actual FPGA time
tx_full input, std logic status signal from the transceiver
wrreq output, std logic write request control signal to the Transmitter module
data_in input, std logic vector 32 bit data input from ASB FIFO
mux_data_out output, std logic vector 32 bit data out from the Slotted Transmission module
CD_result output, std logic vector 64 bit result of Clock Difference, to be read by the Receiver module
Switch_config output, std logic vector 6 bit output signals to optical switches
en_tx output, std logic output enable signal to Transmitter module
CRC_out output, std logic vector 32 bit output CRC result to Transmitter module
RDA input, std logic input signal indicating Read Data Available from ASB FIFO
Table 6.2. Data path signals of the Slotted Transmission module
Signal name Type Description
clock input, std logic clock
FPGA_Time input, std logic vector 64 bit actual FPGA time
reset input, std logic reset signal
rst_counter input std logic resets all module counters
rdreq input std logic read request control signal
ld_switch input std logic control signal to the temporary switch register
ld_tx_time_msb input std logic control signal to the tx time msb register
ld_tx_time_lsb input std logic control signal to the tx time lsb register
ld_pc_time_msb input std logic control signal to the pc time msb register
ld_pc_time_lsb input std logic control signal to the pc time lsb register
ld_CD input std logic control signal to the clock difference register
ld_main_sw_config input std logic control signal to the main switch configuration register
en_CRC input std logic enable signal for CRC
en_slot_counter input std logic enables slot counter
en_pad_counter input std logic enables pad counter
mux_sel input std logic vector 2 bit control signals for multiplexer
slot_rdy out std logic status signal indicating that slot is ready
time_to_tx out std logic status signal indicating the time when switch is loaded
rdy_to_tx out std logic enable transmitter signal
slot_sent out std logic status signal indicating that the slot data is sent
pad_sent out std logic status signal indicating that the pad bits are sent
data_in input std logic vector 32 bit input data
mux_data_out output std logic vector 32 bit output data from the multiplexer
CD_result output std logic vector 64 bit clock difference result
Table 6.3. Control path signals of the Slotted Transmission module
Signal name Description
clock, reset global signals
tx_full status signal from the Transmitter block
wrreq control signal to the Transmitter block
slot_rdy, time_to_tx, rdy_to_tx status signals from Data Path to Control Path
slot_sent, pad_sent output status signals from Data Path
rst_counters, rdreq, ld_switch output control signals
ld_tx_time_msb, ld_tx_time_lsb output enable signals for Time to Send Optical
ld_pc_time_msb, ld_pc_time_lsb output enable signals for PC Sending Time
ld_CD, ld_main_sw_config output enable signal for the CD result and switch configuration
en_tx, en_CRC output signals to Transmitter block
en_slot_counter,en_pad_counter enable signals to slot and pad counters
mux_sel select signals for multiplexer to Data Path
The hardware design blocks of the Slotted Transmission module are shown in Figure 6.7 (main blocks), Figure
6.8 (Clock Difference block) and Figure 6.9 (Padding block and Slot Transmission block). Please note that the
ASB is called "UART" here.
Figure 6.7. Main hardware design blocks of the Slotted Transmission module
Figure 6.8. Clock difference hardware design block of the Slotted Transmission module
Figure 6.9. Comparator blocks for the Padding hardware design block and the Slot Transmission
hardware design block of the Slotted Transmission module
7. Optical Transmission and Reception
7.1. Overview
Optical transmission is the process of transmitting a slot at a particular edge node towards the optical core of the
network by performing a parallel-to-serial conversion (SERDES), encoding it using 8B/10B and converting the
digital data from logical electrical bit form to an equivalent digital bit form in the optical domain. The electrical-to-
optical conversion is done by a distributed feedback (DFB) laser, directly modulated within a pluggable
transceiver module. At the destination edge node, the light is captured with the use of PIN diodes, which convert
the optical photon energy back into photocurrent. The photocurrent can then be converted to an electrical
voltage which can be interpreted as digital pulses of the data stream.
The process of conversion to the optical domain modulates the bias current of the laser by a small percentage
and in doing so, light energy experiences amplitude dependence proportional to the bit stream on/off sequence.
These optical components and the driving circuitry reside within a compact module form factor called the small
form-factor pluggable (SFP) transceiver. The SFP contains the electronic circuitry needed to drive and modulate the laser and
convert the photocurrent from the receiver to the necessary voltage. This section will not go into the details of this
process; rather, it will focus on the digital state machine needed to provide data directly to the SFP from
the FPGA and on the configuration of the SFP.
7.2. Detailed description
The FPGA development board provides direct access to the SFP transceivers via an AC-coupled differential
interface at 100 ohm impedance from the Stratix GX FPGA. The differential pair provides noise immunity from
any common-mode noise that may appear between the devices. The optical interface is limited to line
rates between 622 Mbps and 2.7 Gbps. The interface does, however, allow pre-emphasis, programmable
terminations and an equalizer to account for any data skew that may occur on the evaluation PCB interface of
both devices. The effects were minimal and no pre-emphasis was provisioned on the transmitter.
In order to properly receive the incoming data stream, a clock must be derived so that the data can be sampled
appropriately. To derive such a clock the output voltage produced by the SFP receiver is provided to the Stratix
GX FPGA and an internal programmable digital clock-data recovery (CDR) circuit is used. The CDR is itself a
phase-locked loop (PLL) using an internal oscillator on the board as a reference clock. The frequency difference
threshold for the CDR (i.e., the allowed difference between the reference and the data) was provisioned to be
1000 ppm for the demonstrator to allow for low-quality oscillators used on the evaluation board. When the CDR
synchronizes onto the incoming data, a “lock” status signal is raised.
Figure 7.1. Stratix GX Serializer and Deserializer
On the transmitter side, a phase-locked loop is used to derive a stable clock for the data transmission. The PLL
here derives both a “fast clock” and a “reference clock”. The global clock was chosen to drive the PLL because it
has the same frequency as the internal logic of the FPGA. The maximum frequency of the digital transceiver
interface was proven to be close to 100 MHz in simulation. The “fast clock” is used to transmit the data of the
fast register in the serializer and the reference clock is used for the rest of the digital components of the
transceivers including the CDR.
The data stream is serialized prior to transmission and deserialized at reception because the maximum
frequency of the digital logic is much lower than the serial data rate. As shown in Figure 7.1, the high
speed clock of the deserializer on the Stratix GX FPGA is obtained from the received data stream using the CDR
and the low speed clock is derived by dividing the fast clock, by 10 in this case. The deserializer of the receiver
shifts the received data serially into registers driven by the fast clock. The slow clock loads the data
present inside the register in parallel in order to be sent to the other digital components of the receiver.
The serializer of the transmitter requires a PLL that is able to derive a fast and a slow clock. As shown in Figure
7.1, the slow clock loads the parallel data from the digital components of the transceiver into fast registers of the
serializer every n cycles, in this case n=10. The fast clock shifts the bits present in these registers to be
transmitted.
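As an aside, the 10:1 serializer behaviour described above can be modelled as follows. This is a behavioural sketch, not the transceiver megafunction, and the bit order shown (LSB first) is an assumption.
Example Code (illustrative sketch):
#include <stdint.h>

typedef struct {
    uint16_t shift_reg;  /* fast register of the serializer            */
    int bit_count;       /* bits left before the next slow-clock load  */
} serializer_t;

/* One fast-clock cycle: load a 10-bit word every n = 10 cycles (the
 * slow clock) and shift one bit out per cycle (the fast clock). */
int serializer_tick(serializer_t *s, uint16_t parallel_in_10b)
{
    if (s->bit_count == 0) {          /* slow-clock parallel load */
        s->shift_reg = parallel_in_10b;
        s->bit_count = 10;
    }
    int bit = s->shift_reg & 1;       /* fast-clock serial shift  */
    s->shift_reg >>= 1;
    s->bit_count--;
    return bit;
}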
The FPGA also contains an 8B/10B encoder/decoder that is used to encode the digital data stream before it is
serialized, and decode it after it is deserialized. The encoding converts an 8-bit data sequence to a 10-bit data
sequence by stuffing bits with a specific coding algorithm that removes long
runs of “1”s and “0”s from the data; a situation that is detrimental to a receiver due to the nature of the clock
recovery circuitry: the circuitry expects equally probable transitions of “1” and “0” but, in the event of a long run of
“1”s or “0”s, the signal level produces a “DC” voltage which causes the CDR circuitry to fail in deriving a clock
from the data.
7.3. GXB Transmitter Interface
To follow the transmission protocol, a Finite State Machine (FSM) has been designed that indicates to the
transmitter when to send the synchronization stream, the end-of-transmission stream and the actual data.
As shown in Figure 7.2, the initial state after reset is DISABLE. In this state the transmitter sends the EOT
stream (see Figure 4.5, Contents of the AAPN time slot) continuously, since the current transmission is
finished and there is no data to be sent. The receiver is thus informed that the current transmission is done and does
not wait for more packets (during this time it is possible to perform a switching operation over the medium).
Figure 7.2. Finite State Machine of the GXB Transmitter Interface
When the transmitter is enabled it goes into the WAITING state, in which synchronizing streams (SYNCH) are
sent. This state is used to synchronize the receiver before sending packets and when the transmission FIFO is
empty (i.e. when the transmitter is idle). The transmitter can be forced to go into this state, even if the
transmission FIFO is not empty, when the send_sync signal is logic ‘1’. As can be seen, the send_sync signal
cannot hold the transmitter in the WAITING state for more than one clock cycle, which ensures that the
transmitter is not idling when there is data to be transmitted.
When the transmission FIFO is not empty, the next state the FSM goes into is the transmit state (TX), in which
one segment of a packet is sent per clock cycle. The transmitter goes into the TX_DONE state after the
transmission of the last segment. This state is used to signal that the transmission of a packet is completed;
afterwards, if the transmitter is enabled, it goes into the TX or WAITING state. The transmitter goes back into the TX
state if the transmission FIFO is not empty and no request to send a synchronization stream has been made;
otherwise, it goes into the WAITING state. When the transmitter is disabled, it goes directly into the DISABLE
state. In order to ensure completion of the current packet transmission, there is no transition from the TX state to
the DISABLE state. A minimal sketch of this next-state logic follows.
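The sketch below models these transitions in C; the state and signal names follow the text, but this is only a behavioural sketch of the VHDL state machine, not the actual code.
Example Code (illustrative sketch):
typedef enum { DISABLE, WAITING, TX, TX_DONE } tx_state_t;

tx_state_t tx_next_state(tx_state_t state, int enable, int fifo_empty,
                         int send_sync, int last_segment)
{
    switch (state) {
    case DISABLE:                        /* sends the EOT stream        */
        return enable ? WAITING : DISABLE;
    case WAITING:                        /* sends the SYNCH stream      */
        if (!enable) return DISABLE;
        return fifo_empty ? WAITING : TX;
    case TX:                             /* one segment per clock cycle */
        return last_segment ? TX_DONE : TX;
    case TX_DONE:
        if (!enable) return DISABLE;
        return (!fifo_empty && !send_sync) ? TX : WAITING;
    }
    return DISABLE;
}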
The diagram of the GXB hardware block is depicted in Figure 7.3. The only signals available to the user are the
following: enable, reset, clk, send_sync, data_in, active, load, analyzer_req,
tx_ctrl_enable and data_out. The FSM is shown at the top of the diagram with its input and output
signals indicating the current state. The block diagram contains a counter, comparator, register, encoder and
multiplexers which are required to transmit according to the protocol.
The counter in Figure 7.3 is used to know when the last segment will be transmitted. A comparator is used to
verify if the counter output (count) is equal to the number of segments minus two, since it starts counting at 0. If
the count value is equal to the next-to-last segment, the last_segment signal has a logical value of ‘1’; else it
is ‘0’. The same signal can be used to request the traffic generator to generate the next packet (it takes a clock
cycle for the generator to generate the next packet), together with the analyzer request signal
(analyzer_req). This counter increments in the TX state; otherwise it is synchronously cleared.
Prior to the transmission, the current packet to be transmitted is loaded onto the transmitter buffer register
(tx_buffer). For simplicity, the content of this register is loaded when the state is not TX. In the TX state, the
data is shifted to the right by 16 bits which is equal to the segment size. The 16 LSBs, tx_buffer[n-1..0]
are used for the transmission of the current segment.
An encoder is used to generate the select signal for the data stream multiplexer (whose output is data_out),
which also selects the appropriate type of character, tx_ctrlenable, to be sent (data or control). When the
transmitter is in states TX and TX_DONE, the input 0 of the multiplexers is selected. The character sent over
tx_ctrlenable has a logical value of ‘0’ and the tx_buffer[n-1..0] output is selected in order to send a
segment of the packet over data_out.
In the DISABLE state, the transmitter must send the EOT stream and thus input 1 of the multiplexer is
selected. The tx_ctrlenable will have a logical output of ‘1’ because the EOT_STREAM is composed of control
characters. In the WAITING state, the multiplexer’s input 2 is selected in order to send the synchronization
stream (SYNC_STREAM) over data_out, with tx_ctrlenable at logic ‘1’ to indicate that control characters
are sent.
The load signal is used as a read signal for the transmission FIFO (data_in) in order to have a new packet
loaded into the tx_buffer. The load signal is logic ‘1’ when the current state is TX_DONE or WAITING, the
transmission FIFO is not empty and the transmitter is enabled. When a request to send a synchronization stream
is made in the TX_DONE state, the load signal is logic ‘0’. Without this extra verification, one packet would be read
in the TX_DONE state and then another in the WAITING state, which
would result in lost packets. The verification of the enable signal is needed because the transmitter does not send
any packet when it is disabled.
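Expressed as a boolean condition (reusing the tx_state_t type from the sketch above), the load logic just described is, roughly:
Example Code (illustrative sketch):
int tx_load(tx_state_t state, int fifo_empty, int enable, int send_sync)
{
    /* read a new packet from the transmission FIFO into tx_buffer */
    return (state == TX_DONE || state == WAITING)
           && !fifo_empty && enable
           && !(state == TX_DONE && send_sync);
}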
The active signal indicates when the transmitter is not in the DISABLE state. As mentioned previously, the
transmitter does not go into the DISABLE state before the current transmission is completed. When the active
signal is logic ‘0’, the transmitter is disabled and transmits the end-of-transmission stream.
The VHDL code associated with the block diagram details the actual usage. In it, the number-of-segments
variable (NB_OF_SEGMENTS) specifies how many segments form a packet, and its minimum value is four. The
variable NB_OF_SEGMENTS_LOG2 is the logarithm base two of the variable NB_OF_SEGMENTS. The
variable SEGMENT_SIZE has to be set to 16 because two 8-bit characters are sent every clock cycle.
Figure 7.3. Hardware block diagram of the GXB Transmitter Interface
7.4. GXB Receiver Interface
The receiver obeys the transmission protocol using the FSM of Figure 7.4. When the reset signal is applied to
the receiver, the FSM goes to the DISABLE state, where the receiver waits to be enabled (sync_enable), to
receive a signal (i.e. no Loss of Signal (LOS): sync_los is logic ‘0’) and for its Clock Data Recovery (CDR) to
lock onto the incoming data stream, which is indicated by the sync_pll_locked signal. The receiver uses
metastability-hardened flip-flops to ensure proper Clock Domain Crossing (CDC) of these signals. The CDR is locked
to the data only when sync_pll_locked is logic ‘1’. LOS is indicated by sync_los being set to logic ‘1’.
The receiver is enabled when sync_enable is logic ‘1’. The enable signal has to be CDCed because the receiver
constantly changes transmitting source, which is in a different clock domain, as do the signals sync_los and
sync_pll_locked. Before the CDC, these signals have the prefix async instead of sync; the prefixes stand
for asynchronous and synchronous to the receiver clock domain. In any state, if the receiver is either disabled,
loses its incoming signal or is not locked to the incoming data stream, it will go into the DISABLE state.
When the receiver is enabled and it receives a signal with the CDR lock on the data, it goes into the SYNCHING
state. In this state, the receiver waits for a valid synchronization stream to ensure the received words are
properly aligned. A valid synchronization stream is made of the character K28.5 followed by K28.0 which results
in the stream h”1CBC”. When the character K28.5, sync is equal to b“01”, is detected in the eight LSBs and the
eight Most Significant Bits (MSB) contain the character K28.0 which is indicated by the signal sync_detect[1],
the receiver can go into the RECEIVING state. The other possible transition to the RECEIVING state is done by
the detection of the character K28.5, sync[1], in the eight MSBs. In the RECEIVING state, the receiver will
verify if the control character K28.0 is present in the 8 LSBs, sync_detect[0], and the sync_ver signal is set.
The reception of two consecutive alignment characters, K28.5, is an example of protocol error. From the
SYNCHING state, the receiver will synchronize to the most significant synchronization character and go into the
RECEIVING state in order to verify if the synchronization stream is valid. This is done by verifying if
sync_detect[0] and sync_ver are both logic ‘1’.
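As a concrete check, a valid synchronization stream in a received 16-bit word can be sketched as follows; K28.5 encodes as data byte 0xBC and K28.0 as 0x1C, giving the stream h”1CBC” mentioned above. The ctrl flags are assumed to mark which byte positions carry control characters.
Example Code (illustrative sketch):
#include <stdint.h>

int sync_stream_valid(uint16_t word, int ctrl_msb, int ctrl_lsb)
{
    return ctrl_msb && ctrl_lsb &&
           (word >> 8) == 0x1C &&    /* K28.0 in the 8 MSBs */
           (word & 0xFF) == 0xBC;    /* K28.5 in the 8 LSBs */
}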
Figure 7.4. Finite state machine of the GXB Receiver Interface
When the receiver is in the RECEIVING state, it receives packet segments. The receiver has to go into the
SYNCHING state when a protocol error occurs. The receiver FSM monitors for two types of protocol error in the
RECEIVING state: an invalid synchronization stream and the reception of an end of transmission stream. The
former is characterized by the reception of the character K28.5 without K28.0 or vice-versa. When the
synchronization stream is done in the current clock cycle, sync_detect[1] and sync[0] have to be both
logic ‘1’, which indicate the presence of the character K28.0 and K28.5 in the 8 MSBs and LSBs respectively.
The protocol error can be detected by taking the xor of the signals sync_detect[1] and sync[0]. When the
character K28.5 is detected in the 8 MSBs, sync[1], the sync_ver signal is set to logic ‘1’. At the next clock
cycle, the receiver expects the character K28.0 in the 8 LSBs, sync_detect[0]. A protocol error can be
detected by taking the xor of the signals sync_ver and sync_detect[0]. The reception of an end of stream
character in the 8 LSBs, eot[0], will force the receiver to go in the SYNCHING state. The only case where the
receiver will not go into the SYNCHING state is when the character K28.5 is detected in the 8 MSBs, sync[1]
because it is considered as being sent more recently. The reception of an EOT stream character in the 8 MSBs,
41
eot[1], will force the receiver to go immediately in the SYNCHING state because the character is considered
as being sent more recently.
In the FSM, two more conditions other than protocol errors have to be verified when the receiver is in the
RECEIVING state. The first condition to verify is whether the current segment being sent is the last one. This is
indicated by the last_segment signal being logic ‘1’. When it is not the last segment, the receiver remains in
the RECEIVING state; otherwise it can go into the ALIGNMENT_RX or RX_LAST state. The last condition is based on
the alignment signal. When the alignment signal is logic ‘1’, the receiver goes into the ALIGNMENT_RX state;
otherwise it goes into the RX_LAST state.
Before proceeding to the ALIGNMENT_RX state of Figure 7.4, the two possible cases of alignment have to be
explained. Figure 7.5 a) and b) show the cases where the alignment character was aligned to the 8 LSBs and
MSBs respectively, when it takes four segments to form a packet. Each row represents a packet segment being
received during one clock cycle. In a) it is possible to see that a full segment is received in one clock cycle, so it
takes four clock cycles to receive a complete packet. In b) only half a segment is received per clock
cycle because of the synchronization stream, so it takes five clock cycles to receive the entire packet. The
last cycle of the reception contains the first half of the first segment of the next packet. The ALIGNMENT_RX
state was added for this extra clock cycle it takes to receive a packet.
Figure 7.5. Packet Alignment Scenarios with Four Segments Packets
In the ALIGNMENT_RX state of the FSM of Figure 7.4, the receiver checks for protocol errors, but it is
important to know when the current packet reception is completed. According to Figure 7.5 b), the 8 LSBs
received during the ALIGNMENT_RX state are part of the current packet being received. Knowing this,
any control character received in the 8 LSBs is considered a protocol error. When there is no
alignment character, K28.5, in the 8 MSBs and any control character is detected in the 8 LSBs, the receiver is
forced to go into the SYNCHING state by the second condition of the FSM. The sync_detect[1] and sync[0]
signals are XORed in order to detect an invalid synchronization stream, i.e. the reception of the control
character K28.0 without K28.5 or vice-versa. The third condition takes into consideration that the 8 MSBs contain
the alignment character K28.5, sync[1], and the 8 LSBs contain any control character. The receiver is forced
to go into the RECEIVING state because the alignment character in the 8 MSBs is considered as being sent more
recently than the 8 LSBs. In the third condition, the detection of a valid synchronization stream in the 8 LSBs will
force the receiver to go directly into the RECEIVING state. If the receiver is not sent into the SYNCHING or
RECEIVING state, it goes into the RX_LAST state.
The RX_LAST state of Figure 7.4 is used to indicate that a packet has been completely received and is ready to be
stored in a FIFO. In this state, any protocol error or end-of-transmission stream will force the receiver to go into
the SYNCHING state. The detection of an end-of-stream character in the 8 MSBs will force the receiver to go
directly into the SYNCHING state because it is considered as being sent more recently. In the absence of an
alignment character in the 8 MSBs, sync[1], an end-of-transmission stream in the 8 LSBs, eot[0], will force
the receiver to go into the SYNCHING state. An invalid synchronization sequence, such as the presence of the
alignment character in the 8 LSBs, sync[0], without the control character K28.0, sync_detect[0], or vice-
versa, will force the receiver to go into the SYNCHING state. Similarly, the presence of the control character K28.0
in the 8 LSBs, sync_detect[0], without the sync_ver signal being logic ‘1’, or vice-versa, will force the
receiver to go into the SYNCHING state. When an end-of-transmission request is made (i.e. the signal eot_req is
logic ‘1’) and no alignment character is present, the receiver has to go into the SYNCHING state. When the alignment
character is present in the 8 MSBs, the receiver has to go into the RECEIVING state, where the synchronization
request will be validated. When the receiver is not forced to go into the SYNCHING state, it continues the
packet reception in the RECEIVING state.
The FSM of Figure 7.4 gives priority to the following events, in decreasing order (a simplified sketch of the
resulting next-state logic is given after this list):
1. The enable signal, the LOS detection signal and the CDR lock signal: the receiver is disabled.
2. A control character, K28.5 or EOT, in the 8 MSBs is assumed to be more recently sent than the 8
LSBs. For example, if two synchronization characters K28.5 are present, only the one in the 8 MSBs will be taken
into consideration. At the next clock cycle, there will be a verification for the presence of the character
K28.0.
3. The presence of a control character in the 8 LSBs will force the receiver to verify the synchronization
stream or end the current reception.
4. The reception of the actual data.
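The simplified sketch below condenses these priorities into a next-state function; the many per-byte checks described in the text are collapsed into the sync_valid, proto_error and eot flags, so this is an approximation of the FSM, not a faithful model.
Example Code (illustrative sketch):
typedef enum { RX_DISABLE, SYNCHING, RECEIVING, ALIGNMENT_RX, RX_LAST } rx_state_t;

rx_state_t rx_next_state(rx_state_t state, int enable, int los, int pll_locked,
                         int sync_valid, int proto_error, int eot,
                         int last_segment, int alignment)
{
    if (!enable || los || !pll_locked)
        return RX_DISABLE;                 /* priority 1, in any state   */
    switch (state) {
    case RX_DISABLE:
        return SYNCHING;
    case SYNCHING:                         /* wait for valid K28.5+K28.0 */
        return sync_valid ? RECEIVING : SYNCHING;
    case RECEIVING:
        if (proto_error || eot) return SYNCHING;   /* priorities 2 and 3 */
        if (!last_segment)      return RECEIVING;
        return alignment ? ALIGNMENT_RX : RX_LAST;
    case ALIGNMENT_RX:
        return proto_error ? SYNCHING : RX_LAST;
    case RX_LAST:                          /* rx_done asserted here      */
        return (proto_error || eot) ? SYNCHING : RECEIVING;
    }
    return RX_DISABLE;
}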
The receiver block diagram associated with the FSM of Figure 7.4 is depicted in Figure 7.6. The reset signal is
the only global signal. The input signals from the transceiver instantiation are clk, sync, ctrl_char,
rx_in, async_los and async_pll_locked. The input signal async_enable is used to enable the
receiver. The output signals synching_rx, sync_los, sync_pll_locked and rx_done are used to know
the status of the transceiver interface. The signal rx_done is used for writing the incoming packet onto the
receiver FIFO. The currently received packet is output on the signal rx_out.
In the VHDL code associated with the block diagram of Figure 7.6, the number-of-segments variable
(NB_OF_SEGMENTS) specifies how many segments form a packet; the minimum it can be set to is
four. The variable NB_OF_SEGMENTS_LOG2 is obtained by taking the logarithm base two of the variable
NB_OF_SEGMENTS. The last variable is the segment size, which has to be set to 16 because two 8-bit
characters are sent every clock cycle.
The first component to be described in Figure 7.6 is the alignment register, which has a value of 0 or 1 when the
reception looks like the diagram of Figure 7.5 a) or b), respectively. It outputs the alignment signal, which
indicates to the FSM of Figure 7.4 whether to go into the ALIGNMENT_RX or RX_LAST state from the RECEIVING state.
It is set to logic ‘1’ when the alignment character is detected in the 8 MSBs, sync[1], and the receiver is not in
the DISABLE state. It is synchronously cleared when there is no alignment character present in the 8 MSBs,
sync[1], because that position is considered as being sent more recently in time. To be synchronously cleared, it
also requires an alignment character to be present in the 8 LSBs, sync[0], and that the receiver not be in the
DISABLE state.
The receiver buffer in Figure 7.6, rx_buffer, is used to shift right the incoming segments, rx_in, upon their
arrival. Its width is the packet length plus half a segment size. This extra width is necessary when the receiver is
aligned like in Figure 7.5 b) which means half a segment is received per clock cycle. In this case, it would
contain the segment_0_0 in the 8 LSBs followed by the remaining segments and the 8 MSBs would be part of
the next packet. The ALIGNMENT_RX state of Figure 7.4 forces the receiver to shift the incoming packet to the
right one more clock cycle and store the segment_0_0 in the 8 LSBs. When the receiver receives a complete
segment in the current clock cycle like in Figure 7.5 a), the packet is contained in the MSBs of the receiver buffer
(rx_buffer). The alignment register is used to control a two-input multiplexer which takes these two alignment
cases and outputs the received packet properly through rx_out. When the alignment register has a value of 0,
the MSBs of the receiver buffer are selected; else it is the LSBs of the receiver buffer. The output of this
multiplexer has to be fed to the receiver output FIFO.
A register is needed in order to validate the synchronization stream at the next clock cycle when the alignment
character K28.5 is detected in the 8 MSBs. As explained using Figure 7.4, when the sync_ver signal
is logic ‘1’ the receiver expects the control character K28.0 in the 8 LSBs. In Figure 7.6, this register is
named sync_ver and is synchronously set when the alignment character is detected in the 8 MSBs, sync[1],
and the current state is not DISABLE. The register is synchronously cleared whenever the 8 MSBs do not contain
an alignment character or the receiver is in the DISABLE state.
In the RX_LAST state of the FSM of Figure 7.4, the receiver has completed the reception of a packet. In this
state, the signal rx_done has a logical value of “1” in Figure 7.4 which indicates that the packet is ready to be
written in the output FIFO. The RX_LAST state signal from the state machine is used to generate the rx_done
signal.
Figure 7.6. Hardware block diagram of the GXB Receiver Interface
When the receiver is in the DISABLE or SYNCHING state, it does not receive data. In Figure 7.4 the signal
synching_rx is generated when the state machine is not in the DISABLE or SYNCHING state. This signal is
useful to know whether or not the receiver is receiving data.
In the receiver block diagram of Figure 7.6, a counter is used in order to know when the last
segment arrives. It is synchronously reset in the DISABLE state because there is no segment being received.
The detection of an alignment character (i.e. the sync signal has a value different from b“00”) will synchronously
reset the counter because it signifies the beginning of a new transmission. The counter is synchronously reset
when an end-of-transmission character is detected in the 8 LSBs, eot[0]. When the EOT character is present
in the 8 LSBs, it signifies that the current transmission is completed and there are no valid segments in the receiver
buffer (rx_buffer). The counter is synchronously reset when an end-of-transmission character is detected in
the 8 MSBs and the current state is not ALIGNMENT_RX. This comes from the fact that the receiver has
received a complete packet but has to go into RX_LAST in order to signal that its reception is completed with the
rx_done signal. The counter is also synchronously reset during the reception of the last segment in the RECEIVING state when
the alignment signal has a value of 0. The last segment is received in the RECEIVING state when the
alignment signal is ‘0’ according to Figure 7.5 a), and the counter can be synchronously reset for this reason
when the signal last_segment is logic ‘1’. The counter is synchronously reset when an end-of-transmission
request, eot_req, is made. It is incremented when the request to synchronously clear it is not present and the
current state is RECEIVING or RX_LAST. In these states the receiver receives segments and it needs to keep
count. When the receiver is in the ALIGNMENT_RX state and there is no request to reset it, its value is set to 1
because it has already received half a segment.
The counter output count in Figure 7.6 is used for detecting when the last segment is received. An equality
comparator is used to detect when the count output is equal to the number of segments forming a packet
minus one. This comparator generates the last segment signal (last_segment), which is used by the state
machine and for synchronously resetting the counter. Four equality comparators in Figure 7.6 are used to detect the
reception of the K28.0 characters of a synchronization stream (sync_detect[1..0]) and also to detect the end-of-
transmission K28.6 characters (eot[1..0]). A comparator is needed for the detection of a control character
in the 8 MSBs and LSBs using the receiver data input (rx_in) and the control character detection signal
(ctrl_char[1..0]). K28.6 and K28.0 are control characters and the ctrl_char[1..0] signal is used to
distinguish them from regular characters. An extra precaution was taken by verifying that the alignment character,
sync[1..0], was not detected at the same time. The detection of an alignment character with a different
disparity would result in an unknown character that could be interpreted as the K28.6 or K28.0 character.
In Figure 7.6, a register is used for registering the end-of-transmission character when it is received in the 8
MSBs in the ALIGNMENT_RX state. According to Figure 7.5 b), the last half segment can be received together with
an end-of-transmission segment. The receiver has to go into the RX_LAST state in order to signal that the packet
has been received correctly with the rx_done signal. Then, from the RX_LAST state, the counter can be reset.
This register is set to one only if the end-of-transmission character is present in the 8 MSBs (eot[1]) and there
is no end of transmission present in the 8 LSBs. When an end-of-transmission character is present in the 8 LSBs
in the ALIGNMENT_RX state, it is considered a protocol error and there is no need for an end-of-transmission
request. When the end-of-transmission request register is not being set, it is synchronously reset.
8. Slot Reception
8.1. Overview
After the optical-to-electrical conversion is done by the optical receiver, the processing of the slot information
occurs in the FPGA where the digital data stream is buffered and decoded. This chapter describes the details of
the slot processing once the data stream is buffered in the FPGA memory FIFOs. The processing requires the
appropriate state machine architecture to control what is decoded from the incoming slot and what is put
together to assemble the slot that will be sent to the PC module of the edge node, which includes adding the
time the slot was received, the CRC status and the PC-FPGA clock difference. The Slot Reception module also
needs to consider slot entry allowance control for the incoming and outgoing slots from the buffers and, finally,
synchronization so that the slot is processed with a clock that is equivalent to the transmission clock.
8.2. Detailed description of slot reception
The slot receiver for an AAPN slot is primarily responsible for taking an AAPN slot from an optical signal,
converting it to an electrical signal, digitally processing the contents of the slot and forwarding it to the
PC, where the contents of the bit stream are analyzed and used at a higher level of the network hierarchy. This
chapter focuses on the “slot receiver processor”.
Figure 8.1 captures the general block diagram of the slot receiving process of the hardware module of the AAPN
prototype. The block diagram shows the functions required for the optical signal to be converted, transmitted and
buffered electrically within an FPGA development board. The data stream then travels through a series of logic circuits located within
two FPGAs so that it can propagate to the PC via an Ethernet link. The architecture shows two
FPGAs but the design could have been constructed with only one FPGA. The reason for having two is merely a
result of the restrictions of the printed circuit evaluation board used for the project. The first FPGA transfers
Ethernet frames between the PC and the FPGA development board. The second FPGA is designed to contain
the ASB receiver needed to accept the transmitted packets from the first FPGA. The second FPGA also contains
the remaining logical circuits needed for clock recovery, deserializing, buffering and, finally, the additional
processing circuits required to analyze the incoming slot data stream. Clock extraction is done on the receive
path so that all timing is equivalent to the transmitter clock.
[Figure: Optical Rx (O-E conversion) → CDR → SERializer-DESerializer → 8B/10B Decoder → TRCVR FIFO → Slot Receiver Processor → ASB buffer and transmitter on FPGA #2, followed by the ASB-to-Ethernet PC transmitter on FPGA #1]
Figure 8.1. AAPN slot receiver block diagram
The “slot receiver processor” (Figure 8.2) is primarily responsible for extracting particular fields from the
incoming AAPN slot and recreating a new slot with these fields plus additional fields needed by the higher layer
software algorithms at the PC end. It uses a custom state machine to create control signals that connect to a
“slot field” demultiplexer. The sequence of outputs of the demultiplexer (on the right side of the figure) is, as
defined in Figure 4.1 and Figure 4.2:
1. sending time (TimeToSendOptical),
2. received time (TimeReceivedOptical), still set to zero
3. slot payload ("other data"),
4. slot padding,
5. the CRC value calculated at the source node just before transmission
[Figure: the TRCVR FIFO/interface from the core feeds a demultiplexer producing sending time, received time, slot payload, slot padding and CRC value; a CRC Calculator & Comparator, a 64-bit FPGA Time counter and the clock difference input feed the Re-assembly AAPN Rx State Machine, which assembles sending time, current time, slot payload, CRC status and clock difference into the Assembly FIFO (253 words × 32 bits) and on to the ASB FIFO/interface to the PC; all blocks run on the 50 MHz clock]
Figure 8.2. AAPN slot receiver processor block diagram
The control signals are generated by a series of decoders at the outputs of several counters, where each control
signal represents the value of the counters when they reach the length, in 4-byte words, of each specific slot field.
Each AAPN slot field is obtained from the transceiver FIFO buffer (after the O-E conversion).
For the assembly of the new slot required to be transmitted to the PC, the custom state machine also generates
the control signals and transfers them to a multiplexer (on the left side of the figure) and a FIFO buffer where it
selects the “slot field” and writes this into the buffer. This buffer is called the “Assembly FIFO” and is sized
according to the number of 4-byte words equivalent to the length of the new slot and therefore only stores one
slot.
The “slot receiver processor” is synchronized with a common clock derived from the signal received from the
optical core, and the data is forwarded directly to the “Assembly FIFO” using the same data width as the input
from the transceiver FIFO. Since the communication path can potentially corrupt the data being received,
a 32-bit cyclic redundancy check (CRC) circuit [13] [14] [15], referred to as the “CRC calculator”, is also included
after the demultiplexer. The custom state machine also provides a control signal to the “CRC calculator” block so
that it is activated when each “slot field” is read from the demultiplexer and disabled otherwise. The “CRC
calculator” is asynchronous and requires these control signals to synchronize the computation with the correct
“slot field” of the received signal. A comparator is used to compare the calculated CRC with the CRC received
within the slot. One of the slot fields created for the recreated slot to be transmitted to the PC is the
“CRC status” (TransmissionError). If the output of the comparator shows that the received CRC and
the computed value are the same, a logical ‘1’ will be set; otherwise, a logical value of ‘0’ will be set.
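For reference, a generic bitwise CRC-32 computation and the comparison just described can be sketched as below. The polynomial and bit ordering shown (the common reflected polynomial 0xEDB88320) are illustrative; the exact parameters used by the hardware are those of [13] [14] [15].
Example Code (illustrative sketch):
#include <stdint.h>
#include <stddef.h>

uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
    }
    return ~crc;
}

/* CRC status field: ‘1’ when the received CRC matches the computed one */
int crc_status(uint32_t computed, uint32_t received)
{
    return computed == received;
}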
The sequence of outputs of the multiplexer (on the left side of the figure) are, as defined in Figure 4.2:
1. sending time (TimeToSendOptical),
2. current time (which becomes TimeReceivedOptical),
3. slot payload ("other data"),
4. CRC status (which becomes the CRC field including the TransmissionError bit)
5. clock difference (ClockDifference) as it was calculated when the last slot arrived from the PC
module
The “FPGA Global Time” represents the actual time in units of counts and is used to produce the "current time"
(TimeReceivedOptical). The FPGA Global Time is a 64-bit number at the output of a counter which uses a
50 MHz reference oscillator. At this clock rate, the counter value should roll over after approximately 11,690
years: (2^64 − 1) × 20 ns ÷ (60 sec/min × 60 min/hr × 24 hr/day × 365.25 days/year). A large data width was chosen for
this counter so that the state machine of the “slot receiver processor” could consider this scenario negligible.
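The quoted figure can be sanity-checked with a few lines of C:
Example Code (illustrative sketch):
#include <stdio.h>

int main(void)
{
    double seconds = 18446744073709551615.0 * 20e-9;  /* (2^64 - 1) x 20 ns */
    printf("roll-over after ~%.0f years\n",
           seconds / (60.0 * 60.0 * 24.0 * 365.25));  /* ~11690 years */
    return 0;
}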
The “clock difference” slot field is computed by the “slot transmitter processor” circuit when the last slot arrived at
the FPGA module; it is latched so that the “slot receiver processor” can capture the value at the moment it
was originally calculated. Since this is computed before slots are received from the optical core, the “clock
difference” slot field should always represent a valid (positive) number.
Once an entire “new slot” is stored within the “Assembly FIFO”, the custom state machine provides additional
control signals to transfer data to the ASB transmitter buffer. Specifically, read and write enable signals are
triggered to both buffers to allow the transfer to start.
In order for the “slot receiver processor” to be adaptive in cases where: 1) the incoming serial line rate is
faster than the internal buffer bandwidth; 2) the transmitted slot experiences delay before it is received by
the optical receiver; or 3) the ASB transmitter buffer is sized smaller than the “Assembly FIFO” buffer; the
custom state machine de-asserts all output control signals, disabling any read action from the incoming
receiver buffer when it is empty and any write action to the ASB FIFO buffer when it is full. While
the output signals are de-asserted, the state machine is placed into a suspended mode in which it
continues to monitor the full and empty signals until they are de-asserted, allowing the state machine to
resume operation.
Table 8.1 details the signals and their definitions for the “slot receiver processor” and Table 8.2 details the
variables that can be modified to adjust the length of a particular slot field. Here the slot field lengths, in 4-byte
blocks, that can be modified are the “slot payload”, “slot padding” and finally the “Assembly FIFO depth”, which
represents the “new slot” length, also in 4-byte blocks.
Table 8.1. Slot receiver processor signal definitions
Name Type Description
CLOCK Input Global clock for the slot receiver processor state machine
RESET Input puts the state machine in the reset state (active high)
TRCVR_FIFO_DATA_IN Input 32-bit data that arrives from the transceiver FIFO
CLOCK_DIFF_RESULT Input 64-bit data signal calculated by the Slotted Transmission
module, representing the PC-FPGA clock difference
TRCVR_FIFO_EMPTY_STATUS Input a logical one indicates that the transceiver FIFO is empty
UART_FIFO_FULL_STATUS Input a logical one indicates that the ASB FIFO is full
FPGA_GLOBAL_TIME Input FPGA time in units of counts. It is a 64-bit counter
TRCVR_FIFO_RDREQ Output Read request signal for the transceiver FIFO
PRESTORE_UART_FIFO_OUT Output 32-bit data that will be forwarded to the ASB FIFO and
represents the output of the Assembly FIFO
UART_FIFO_WRREQ Output Write request signal for the ASB FIFO
Table 8.2. Modifiable parameters in the slot receiver processor
Name and current value Type Description
payload_fragment
maximum datawidth = 256
actual size = 246
Counter with an
end-limit decoder
Total length, in 4-byte blocks for the slot payload.
datawidth is the maximum size of the counter as
an integer (should be a power of 2).
Actual size is the slot payload length.
padding_fragment_label
maximum datawidth = 8192
actual size = 4122
Counter with an
end-limit decoder
Total length, in 4-byte blocks for the slot padding.
datawidth is the maximum size of the counter as
an integer (should be a power of 2).
Actual size is the slot padding length.
uart_fifo_packet_counter_label
maximum datawidth = 256
actual size = 253
Counter with an
end-limit decoder
Total length, in 4-byte blocks, for the new slot;
equivalent to the length of the buffer.
datawidth is the maximum size of the counter as
an integer (should be a power of 2).
Actual size is the new slot length.
The algorithmic state machine of the “slot receiver processor” is detailed in Figure 8.3 (the figure spans two
pages; please zoom in for a detailed view). The output signals are shown to the right of the state diagram, indicating
their logical value at each state. Please note that the ASB is denoted as "UART" in this figure.
[Figure: ASM chart of the slot receiver processor; the main states read the sending time from the transceiver FIFO into the Assembly FIFO, write the current FPGA time (MSB and LSB), write the payload while computing the CRC, read and CRC the padding, compare the CRC and write the CRC status (valid or error), write the clock difference (MSB and LSB) obtained from the Tx FSM, and finally transfer the assembled slot to the UART (ASB) FIFO, with transceiver-FIFO-empty and UART-FIFO-full status checks between steps]
Figure 8.3. Slot receiver processor algorithmic state machine
9. Core Optical Switch
9.1. Overview
The intent of an optical switching fabric is to allow modulated light signals to traverse a path between any two
edge nodes without any optical-to-electrical-to-optical conversion as in traditional switching architectures. The
non-blocking architecture approach allows light paths to reach any edge node freely and
without any interference from other selected ports. High-port-count switches that can reconfigure on the scale of
nanoseconds were not yet commercially available and had not been completed by Theme 2 of the AAPN
Research Network [1] at the time this prototype project started. Therefore, a fabric built from fast switches of
lower port density was used to obtain the size and speed required for the AAPN hardware demonstrator. The
design is also to be wavelength agnostic so that any edge node transmitting at any particular wavelength can
communicate to another edge node through the switching fabric. This is achieved through the use of Solid Free
Space technology within the Civcom devices.
9.2. Non-blocking architecture
The architecture of the core optical switch is shown in Figure 9.1. It is a 4x4 non-blocking switch that is made up
of 2x2 switches. The architecture is a Clos-type where each individual switch is actuated by one TTL control
signal. For a 4x4 architecture, six switches are required and, therefore, 6 control lines are needed (S0 to S5).
The switches are interconnected with single-mode optical fiber. This architecture was favoured for the design of
the core switch since the Civcom switches are available only in a 2x2 format or smaller [10]. Table 9.1 provides a
description of each port symbol and their connections to the AAPN network.
[Diagram: 4x4 non-blocking switch composed of six 2x2 switches driven by control lines S0 to S5, with inputs IN0 to IN3, outputs OUT0 to OUT3, and internal interconnections labelled a to h.]
Figure 9.1. Architecture of the AAPN demonstrator optical core switch
Table 9.1. Optical core switch connectivity

Switch Port   Port Connection Name
IN0           Master Transmitter
IN1           Edge Node 0 Transmitter
IN2           Edge Node 1 Transmitter
IN3           Edge Node 2 Transmitter
OUT0          Master Receiver
OUT1          Edge Node 0 Receiver
OUT2          Edge Node 1 Receiver
OUT3          Edge Node 2 Receiver
In order to establish a path from a particular edge node's transmitter to another edge node's receiver, the correct setting of each switch must be obtained through the appropriate control signals S0 to S5. The truth table of the switch, Table 9.2, gives the control-line settings needed for each input-to-output configuration used by the prototype. The table does not list every possible combination of control-line settings, since duplicate settings that achieve the same path configuration have been removed. Moreover, it is assumed that no edge node transmits data to itself (these additional cases can easily be included in the future for testing purposes).
Table 9.2. AAPN core switch truth table settings
S0 S1 S2 S3 S4 S5 Hex Code OUT-0 OUT-1 OUT-2 OUT-3
0 0 0 0 1 1 0x03 IN-1 IN-0 IN-3 IN-2
0 0 0 1 1 1 0x07 IN-2 IN-0 IN-3 IN-1
0 0 1 0 1 1 0x0B IN-1 IN-3 IN-0 IN-2
0 0 1 1 0 0 0x0C IN-3 IN-2 IN-1 IN-0
0 0 1 1 0 1 0x0D IN-3 IN-2 IN-0 IN-1
0 0 1 1 1 0 0x0E IN-2 IN-3 IN-1 IN-0
0 0 1 1 1 1 0x0F IN-2 IN-3 IN-0 IN-1
0 1 0 1 1 0 0x16 IN-3 IN-0 IN-1 IN-2
0 1 1 0 1 0 0x1A IN-1 IN-2 IN-3 IN-0
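To illustrate how the truth table can be used by the control software of the master edge node, the following C sketch looks up the hex code for a requested output-to-input assignment. This is a hypothetical helper written for this report (the type and function names are invented), not part of the delivered hardware code.

    #include <stddef.h>

    /* One row of Table 9.2: the hex code driven on the control lines
     * S0 (MSB) to S5 (LSB) and the input routed to each output. */
    typedef struct {
        unsigned char code;          /* value for control lines S0..S5 */
        unsigned char in_for_out[4]; /* in_for_out[k] = IN-x feeding OUT-k */
    } SwitchSetting;

    static const SwitchSetting truth_table[] = {
        {0x03, {1, 0, 3, 2}}, {0x07, {2, 0, 3, 1}}, {0x0B, {1, 3, 0, 2}},
        {0x0C, {3, 2, 1, 0}}, {0x0D, {3, 2, 0, 1}}, {0x0E, {2, 3, 1, 0}},
        {0x0F, {2, 3, 0, 1}}, {0x16, {3, 0, 1, 2}}, {0x1A, {1, 2, 3, 0}},
    };

    /* Return the control word realizing the requested permutation
     * (wanted[k] = index of the input for OUT-k), or -1 if Table 9.2
     * contains no such row (e.g. a loopback configuration). */
    int lookup_control_word(const unsigned char wanted[4])
    {
        for (size_t i = 0; i < sizeof truth_table / sizeof truth_table[0]; i++) {
            const SwitchSetting *s = &truth_table[i];
            if (s->in_for_out[0] == wanted[0] && s->in_for_out[1] == wanted[1] &&
                s->in_for_out[2] == wanted[2] && s->in_for_out[3] == wanted[3])
                return s->code;
        }
        return -1;
    }

For example, requesting {3, 0, 1, 2} (OUT-0 fed by IN-3, OUT-1 by IN-0, and so on) returns 0x16, the value to be driven onto S0 to S5.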
Figure 9.2 shows the implemented 4x4 optical switch mounted on a bench top and powered by the AC/DC power supply required for the internal high-voltage drivers. The Civcom switches were packaged in metal boxes for easy and safe use in the lab.
Figure 9.2. Bench top assembly of the AAPN optical core switch
10. Experimental test results
The correct operation of the "slot transmitter" and "slot receiver" processors was confirmed using two test configurations. In the first, the output of the transmitter was connected directly to the receiver, as in Figure 10.1, in order to validate that the slot sequence received is identical to the one transmitted. The transmitted sequence was generated by a custom slot generator that sends 32-bit words. The received sequence was printed on a computer screen, where it was analyzed for consistency.
[Diagram: FPGA #1 hosts the slot transmitter and the optical Tx (E-O conversion); FPGA #2 hosts the optical Rx (O-E conversion) and the slot receiver. A custom slot source feeds the transmitter from a PC (Custom Slot Rx from PC) and the receiver returns the slots to a PC (Custom Slot Tx to PC).]
Figure 10.1. Loopback test configuration setup
The second test configuration was obtained by replacing the optical loopback with the core switch, as in Figure 10.2. Again, the received sequence was analyzed for consistency with what was transmitted.
[Diagram: same arrangement as Figure 10.1, with the 4x4 optical core switch inserted between the optical Tx (E-O conversion) of FPGA #1 and the optical Rx (O-E conversion) of FPGA #2.]
Figure 10.2. Loopback test configuration using the core switch
Both configurations yielded identical, correct outcomes for the received sequence. For this experiment, the number of words transmitted is 255 and the number of words received is 253, as expected: the received length is exactly the size of the new slot, i.e. the length of the "Assembly FIFO", which is 64 bits shorter than the transmitted slot (refer to the slot format specifications in chapter 4). The custom sequence begins at 1000 and increments by one until the slot is filled. The returned value of each word field can be seen in Table 10.1, which describes the fields reported when the slot is received by the screening PC. As expected, the first returned value is 1005 and not 1000, since the first words are cut out by the slot transmitter processor: in the downstream direction, the first 3 parameters, representing the SwitchConfiguration, the PCSendingTime and the TimeToSendOptical (a total of 5 x 32-bit words), have been removed.
The correct computation of the CRC has been verified by purposely replacing one of the slot fields with a
different value just prior to optical transmission so that the receiver calculates a different CRC compared to the
transmitter. In this case, the status reported was 0, indicating an error in transmission.
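The consistency check applied to the received words can be expressed compactly as follows (a hypothetical C sketch written for illustration, with word indices taken from Table 10.1; the actual analysis was done by inspecting the values printed on the screening PC).

    #include <stdint.h>

    #define SLOT_WORDS     253   /* received slot length in 32-bit words */
    #define FIRST_VALUE    1005u /* 1000 + the 5 words cut by the transmitter */
    #define CRC_STATUS_IDX 250   /* index of the CRC status flag */

    /* Returns 0 if the received slot matches the expected pattern,
     * -1 on a sequence mismatch and -2 on a CRC error indication. */
    int check_received_slot(const uint32_t slot[SLOT_WORDS])
    {
        for (int i = 0; i < CRC_STATUS_IDX; i++) {
            if (i == 2 || i == 3)
                continue; /* words 2-3 hold the FPGA global time, not the pattern */
            if (slot[i] != FIRST_VALUE + (uint32_t)i)
                return -1; /* expected 1005, 1006, then 1009, 1010, ... */
        }
        if (slot[CRC_STATUS_IDX] == 0)
            return -2; /* CRC status 0 signals a transmission error */
        /* Words 251-252 carry the clock difference and vary per run. */
        return 0;
    }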
Table 10.1. Received word sequence in the loopback test configuration
Slot Word Index   Received Slot Sequence   Description
0                 1005                     Least significant 32-bit word of the TimeToSendOptical parameter
1                 1006                     Most significant 32-bit word of the TimeToSendOptical parameter
2                 0                        Least significant 32-bit word of the current FPGA global time
3                 439940955                Most significant 32-bit word of the current FPGA global time
4                 1009                     Start of the slot payload ("other data")
5 to 249          1010 to 1254             Slot payload, incrementing by one per word
250               0                        32-bit CRC status flag; a status of 0 indicates a transmission error
251               4294966295               Least significant 32-bit word of the clock difference obtained from the slot transmitter processor
252               439889910                Most significant 32-bit word of the clock difference obtained from the slot transmitter processor
The critical synchronization parameters between the Slotted Transmission module and the Slot Reception module are the guard time and the correct number of padding words. The padding length (in words) is adjusted by the amount of guard time needed, in order to keep the total slot length fixed. For the 200 µs time slot in the current design, a total of 4122 padding words are needed.
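The relation between guard time and padding length is simple: the padding must absorb whatever part of the fixed slot is not consumed by the guard time and the slot data. The sketch below captures this relation in C; all parameters are assumptions made for the sake of illustration, not figures extracted from the FPGA design.

    #include <stdint.h>

    /* Illustrative relation only. One Mbps carries exactly one bit
     * per microsecond, so us * Mbps yields bits directly. */
    uint32_t padding_words(double slot_len_us,    /* total slot length, e.g. 200 */
                           double guard_time_us,  /* guard time per slot */
                           double line_rate_mbps, /* optical line rate, e.g. 1000 */
                           double data_bits,      /* slot data bits per slot */
                           double word_bits)      /* width of one padding word */
    {
        double usable_bits = (slot_len_us - guard_time_us) * line_rate_mbps;
        return (uint32_t)((usable_bits - data_bits) / word_bits);
    }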
11. Summary and Discussion
The hardware implementation of the "medium speed" demonstrator prototype has been finalized, with only a few minor bugs still to be addressed. The demonstrator is intended to be a scaled-down version of an AAPN: it consists of a collapsed network of one 4x4 optical core switch and 4 edge nodes, each composed of a PC and an FPGA development board. A 100 Mbps Ethernet network card connects the PC to the FPGA development board, and a 1 Gbps optical transceiver connects the FPGAs to the core switch. One of the edge nodes operates as the master edge node and is used to control the fast optical core switch.
Figure 11.1. View of the FPGA component of the AAPN edge node for the optical loopback test
The design of the edge node consists of a division of labour between the FPGA component and the PC component. High-level functionality such as traffic aggregation, traffic monitoring, bandwidth allocation and network synchronization protocols was implemented in the software component on the PC of the edge nodes, which was addressed in the parallel "Software Control Platform" project described in [11] and [12]. Low-level, fast functions have been implemented in hardware, which is the work presented in this report. The "Hardware Functionality", as it is called here, consists of: interfacing to the PC component, precise slotted optical transmission, optical burst-mode reception, and configuration of the network core switch. The hardware has been implemented using custom circuits developed with programmable logic elements on a System on Chip (SoC) field programmable gate array (FPGA). The custom circuits are designed as Mealy finite state machines programmed in VHDL. Other elements of the design are implemented as compiled software running on a hard-core microprocessor on the FPGA using on-chip memory.
The core optical switching fabric has also been built as part of the Hardware Functionality. It has been implemented and tested using six Civcom Free-X 2x2 switches in a Clos-type architecture. The switch is fully operational and can be reconfigured in 400 ns, which complies with the AAPN target of 1 µs. It has, however, a limited reconfiguration frequency of only 6 kHz, which dictates that only one AAPN data slot can be transmitted every 166.67 µs. The original design target was a time slot of 10 µs.
Some of the knowledge gained while working on this project is discussed below.
The HW component presented here has been designed for a network operating in TDM mode; however, it will also work for an AAPN operating in Optical Burst Switching (OBS) mode, because the HW functionality does not determine the TimeToSendOptical, it only executes it. To use the FPGA component in an OBS-AAPN, only the definition of the variable TimeReceivedOptical has to be changed: the Guard Time should not be subtracted from it.
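The required change amounts to a one-line difference, as in the following C sketch. The variable names follow the report; representing the times as 64-bit FPGA counter ticks is an assumption made for illustration.

    #include <stdint.h>

    /* rx_timestamp: FPGA counter value latched when the burst arrives;
     * guard_time: per-slot guard interval, in the same tick units. */
    uint64_t time_received_optical_tdm(uint64_t rx_timestamp, uint64_t guard_time)
    {
        return rx_timestamp - guard_time; /* TDM: guard time subtracted */
    }

    uint64_t time_received_optical_obs(uint64_t rx_timestamp, uint64_t guard_time)
    {
        (void)guard_time; /* OBS: no guard-time correction applied */
        return rx_timestamp;
    }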
Correct alignment of the slot pointer in the optical domain was difficult to achieve: the transmit and receive components must have their slot indices aligned in order for the custom state machines within the transmitter and receiver portions of the FPGA to encode and decode the correct slot information.
Local synchronization of the two components of the edge node is a complex issue. During the initialization phase of the edge node and in normal operation, synchronization between the PC and its corresponding FPGA was achieved by calculating, at the FPGA, the difference between the timestamp field within the slot and the current time in the FPGA. This FPGA time is an integer value (implemented as a counter) that resides on the FPGA and is inserted into the slot as a 64-bit value. The difference between these integer values is denoted the clock difference; it is sent back to the PC in order to keep track of the delay offset in the electrical domain needed for future transmissions in the optical domain. It is important that this clock difference be calculated immediately after the PC time has been extracted from the Ethernet frame; otherwise one risks measuring not only the clock difference but also the time elapsed between a slot being transmitted to the core and a subsequent slot being received at the edge node (at low network loads, this may be a long time).
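The calculation itself reduces to a subtraction of two 64-bit counter values, as in this sketch (hypothetical C written for this report; in the prototype the computation is done in hardware).

    #include <stdint.h>

    /* pc_timestamp: 64-bit time extracted from the Ethernet frame;
     * fpga_time: FPGA counter sampled immediately after extraction,
     * so that queueing delays are not folded into the result. */
    int64_t clock_difference(uint64_t pc_timestamp, uint64_t fpga_time)
    {
        /* Unsigned subtraction is wrap-around safe on 64-bit counters;
         * the signed result is returned to the PC as the delay offset
         * used when scheduling future optical transmissions. */
        return (int64_t)(fpga_time - pc_timestamp);
    }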
The efficiency of the prototype would also be greatly improved by implementing a more sophisticated optical burst-mode receiver, since clock recovery currently takes 25 µs. This time is nevertheless acceptable relative to the current time slot length of 200 µs: it represents 12.5% of the time slot, close to the 10% target in the AAPN literature. It is also important to note that a suitable, much faster optical transceiver will be provided by researchers working on Theme 2 "Enabling Technologies" of the AAPN Research Network [1].
Not all the functionality of the prototype has been fully tested, because of several problems in the integration of software and hardware. Every hardware and software part has been tested separately, but errors still appear when certain blocks are put together. The team is still working on this task.
The data rate between the PC and the FPGA proved to be the largest obstacle. Although the maximum line rate of the Ethernet card is 100 Mbps, the speed measured between the PC and the Stratix FPGA was only ~10 Mbps. The limitation is a result of the Ethernet interface being implemented using an embedded microprocessor (NIOS RISC processor) and off-chip memory, an approach whose advantage is, in general, a reduction of implementation time and effort. Due to the immaturity of the related technologies (especially the operating system) and the lack of technical support, however, there were many bugs in both the hardware and software systems, which resulted in a large amount of time and effort spent simply trying to program around them. Moreover, both the IDE (Integrated Development Environment) and the NIOS operating system keep being upgraded, and every new version has a different set of system bugs; even maintaining unchanged code thus became time-consuming. In the end, the development of the software-based PC-FPGA interface took much longer than originally expected, and a pure hardware implementation could have been completed with the same amount of time and effort.
The speed limitation of the Ethernet interface can be removed by implementing dedicated hardware on the FPGA (a custom state machine) and using SRAM and/or internal FPGA memory. The most desirable alternative would be to eliminate the intermediary Stratix FPGA chipset and link the Ethernet port directly to the Stratix GX FPGA chipset; however, this is not possible given the interconnection layout of the development board. Unfortunately, the limited PC-FPGA link rate imposes a large bottleneck on the design of the AAPN edge node, as the architecture led us to expect the bottleneck on the electrical-to-optical side, not at the PC-to-FPGA interface.
Though the digital design elements (the transmit portion of the FPGA, the receive portion of the FPGA, and the optical transmission) proved to have their limitations, the implementation choices were always made with the aim of achieving the desired performance. In the design of the transmission to the optical core, the Mealy state machine allowed full control of the flow of slots from the incoming buffer located on the PC-to-FPGA interface. Placing queues at the boundaries between design blocks avoids synchronization issues between different design methodologies. The same concept was applied in the reception from the optical core, where there is a rate differential between the slots coming from the optical transceivers and the lower-speed PC-to-FPGA interface. The FIFO buffers provide a level of flow control for the incoming information and hold off data when the opposite side is overloaded. The transceiver design also addresses this issue in a different manner, denoted clock-domain crossing. Unfortunately, the different bus widths require the design to handle the information in smaller segments (16 bits instead of 32 bits), and extra logic is required to address this. To avoid the heavy buffer memory requirements needed to synchronize the buses between the different digital blocks, the design within the FPGA could be modified to keep the same bus width throughout, in order to maximize speed while preserving simplicity.
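The extra logic for the narrower transceiver bus is essentially a gearbox that assembles two 16-bit segments into one 32-bit word. The following behavioural sketch in C illustrates the intent only (the real logic is a VHDL state machine plus a FIFO, and whether the low or the high half arrives first is an assumption made here).

    #include <stdbool.h>
    #include <stdint.h>

    /* State of the 16-to-32-bit width adaptation between the
     * transceiver clock domain and the internal 32-bit data path. */
    typedef struct {
        uint16_t low_half; /* first (assumed least significant) segment */
        bool     have_low; /* true while waiting for the second segment */
    } Gearbox;

    /* Feed one 16-bit segment; returns true and writes *word_out once
     * a complete 32-bit word has been assembled. */
    bool gearbox_push(Gearbox *g, uint16_t segment, uint32_t *word_out)
    {
        if (!g->have_low) {
            g->low_half = segment;
            g->have_low = true;
            return false;
        }
        *word_out = ((uint32_t)segment << 16) | g->low_half;
        g->have_low = false;
        return true;
    }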
12. Future work
There are two main items for future work on the edge node. The first is to implement an improved version of the optical receiver that does not need PLLs to recover the clock, since much faster operation is needed. The second is to simplify and improve the implementation of the Ethernet interface between the PC and the FPGA board, which would not only remove the biggest bottleneck of the implementation but would also allow for a smoother integration. This could be achieved by:
- designing custom circuitry instead of using the NIOS processor on the Stratix chipset, and/or
- using a faster and larger FPGA development board, preferably one that allows a direct connection between the Ethernet port and the faster FPGA chipset.
It is also intended to measure the optical impairments of the core switch in order to establish the distance penalties of the core relative to the edge nodes. However, given that the biggest limitation of the current optical switches is their low reconfiguration frequency, newer and more integrated technologies, such as semiconductor optical amplifiers, are being sought to address both the switching-frequency and the loss issues.
Funding for these activities will be sought through other university grant proposals, mainly for projects on Passive Optical Networks, where the knowledge gathered from the AAPN projects can be used effectively given the strong relationship between the two areas.
References
[1] The Agile All-Photonic Networks (AAPN) Research Network, 2003-2007. Available:
http://www.aapn.mcgill.ca/.
[2] T.J. Hall, S. A. Paredes and G. v. Bochmann, “An Agile All-Photonic Network”, International Conference on
Optical Communications and Networks, ICOCN 2005; Bangkok, Thailand, 14-16 December 2005.
[3] R. Vickers and M. Beshai, “PetaWeb architecture”, 9th International Telecommunications and Networking
Planning Symposium, Toronto, Canada, 2000.
[4] J. Zheng, C. Peng, G. v. Bochmann and T.J. Hall, “Load balancing in all-optical overlaid-star TDM networks”,
Proceedings of IEEE SARNOFF’06 conference, Princeton, USA, 27-28 March, 2006.
[5] C. Peng, S.A. Paredes, T.J. Hall and G. v. Bochmann, “Constructing Service Matrices for Agile All-Optical
Cores”, The 11th IEEE Symposium on Computers and Communications, ISCC 2006; Pula-Cagliari, Italy,
26-29 June 2006, pp 967-973.
[6] L. Mason, A. Vinokurov, N. Zhao, D. Plant, "Topological design and dimensioning of agile all-photonic
networks", Computer Networks, Vol. 50, No. 2, February 2006, pp 268-287.
[7] S. A. Paredes, T. J. Hall, "A Load-Balanced Agile All-Photonic Network", The 12th IEEE Symposium on
Computers and Communications (ISCC 2007), Aveiro, Portugal, 1-4 July 2007, pp 107-114.
[8] Stratix GX Development Board, Altera Corp, 2003. Available:
http://www.altera.com/literature/ds/ds_stx_gx_dev_bd.pdf , http://www.altera.com/literature/lit-sgx.jsp ,
http://www.altera.com/
[9] SFP MSA Transceiver, Fujitsu Limited, January 2008. Available:
http://www.fujitsu.com/downloads/OPTCMP/lineup/sfpmsa/sfp-catalog-e.pdf
[10] Free-X™ Ultra-Fast Optical Switch Series, Civcom Inc., 2001-2009. Available:
http://www.civcom.com/admin/pdf/SysPic/OSdatasheet.pdf ,
http://www.civcom.com/Free_light.asp?MainID=11&Name=Free-X%20Family ,
http://www.civcom.com/admin/Articles/SPic/SFS.pdf , http://www.civcom.com.
[11] Y. Deng, "Design and Implementation of Signaling and Traffic Control for AAPN", Ph.D. thesis, School of
Information Technology and Engineering, University of Ottawa, 2007.
[12] G. v. Bochmann, "Design of an agile all-photonic network", Proc. SPIE, Vol. 6784, 67842Y (2007),
DOI:10.1117/12.751911, November 2007.
[13] R. F. Hobson, K. L. Cheung, “A High-Performance CMOS 32-Bit Parallel CRC Engine”, IEEE Journal of
Solid-State Circuits, Vol. 34, No. 2, Feb. 1999, pp 233-235.
[14] M. Sprachmann, “Automatic Generation of Parallel CRC Circuits”, IEEE Design and Test of Computers,
Vol. 18, No. 3, May 2001, pp 108-114.
[15] G. Albertengo, R. Sisto, “Parallel CRC Generation”, IEEE Micro, Vol. 10, No. 5, Oct. 1990, pp 63-71.
Appendix 1. Project team members
Team members, contact addresses, and major and minor tasks:

Gregor v. Bochmann, bochmann@site.uottawa.ca: Overall project supervision; Prototype design
Jonathan Couturier, jcouturi@site.uottawa.ca: Optical transmission and reception; Prototype design
Pino G. Dicorato, pdicorat@site.uottawa.ca: Edge node slot reception; Core optical switch; Prototype design; Testing; Edge node slot transmission; Optical transmission and reception
Peter Farkas, farkasengineering@gmail.com: Asynchronous Serial Bridge within the PC-FPGA interface; Custom dummy traffic generator and analyzer; Prototype design
Trevor J. Hall, thall@site.uottawa.ca: Overall project supervision; Prototype design
Sofia A. Paredes, sparedes@site.uottawa.ca: Project management; Prototype design; Coordination with software team
Blerim Qela, bqela@site.uottawa.ca: Edge node slot transmission; Prototype design
Robert Radziwilovicz, radziwil@site.uottawa.ca: Technical support; Prototype design
James Y. Zhang, zhang_yi_ming@hotmail.com: Ethernet interface within the PC-FPGA interface; Prototype design
Appendix 2. Hardware codes
The file aapn_prototype_hw_functionality.zip, delivered with this document, contains all the hardware
code written for this project.