Hardware functionality
of the medium-speed
AAPN Demonstrator Prototype
Technical Report by
Pino G. Dicorato, Peter Farkas, Sofia A. Paredes, James Zhang
Principal Investigators: Gregor v. Bochmann, Trevor J. Hall
Centre for Research in Photonics
School of Information Technology and Engineering
University of Ottawa
December 2008
This work was supported by the Natural Sciences and Engineering Research Council (NSERC) and industrial
and government partners, through the Agile All-Photonic Networks (AAPN) Research Network.
Abstract
The design and implementation of the hardware functionality of a prototype optical time division multiplexed
communications network is presented in this report. The optical network follows the principles of the Agile All-
Photonic Network (AAPN) paradigm: a wavelength division multiplexed network with an overlaid-star topology
whose edge nodes perform electrical to optical conversion and whose core switches are all-optical and therefore
very fast-switching. The AAPN operates in a time division multiplexing (TDM) mode at the optical core.
The demonstrator prototype is intended to be a scaled-down version of an AAPN: it consists of a collapsed
single-wavelength network of 4 edge nodes surrounding one 4x4 optical core switch. One of the edge nodes
operates as the master edge node and is used to control the fast optical core switch. Each edge node is
composed of a PC and an FPGA development board, with a 100 Mbps Ethernet network card interconnecting
them. A 1 Gbps optical transceiver connects the FPGAs to the core switch.
The design of the edge node consists of a division of labour between the FPGA component and the PC
component. The "Hardware Functionality", as it is called here, consists of the low level, fast functions
implemented on the FPGA development board, such as interfacing to the PC component, precise slotted optical
transmission, optical burst-mode reception and configuration of the network core switch. The hardware has been
implemented as custom circuits built from programmable logic elements on a System on Chip (SoC)
field programmable gate array (FPGA). The design of the custom circuits is implemented using a Mealy finite
state machine model programmed using VHDL code. Other elements within the design are implemented using
compiled software within a hard-core microprocessor on the FPGA using on-chip memory. This report also
includes the implementation of the core optical switch, which was done with a Clos array of 2x2 free-space
optical switches.
Slotted optical transmission is the process of transmitting a data slot (a relatively large data packet) to the optical
core of the network precisely at the time dictated by the TDM scheduler. The hardware assembles the slot within
the FPGA memory, performs a parallel-to-serial conversion, encodes the slot and converts it to an optical signal
for transmission at the pre-defined time. For optical burst-mode reception a clock is first derived with clock-data
recovery circuitry in order for the data to be sampled appropriately and then converted back to the electrical
domain after the start of the slot being received has been identified. Upon reception, the data is deserialized for
further processing in the electrical domain and sent to the PC component of the edge node.
High-level functionality of the AAPN prototype, such as traffic aggregation, traffic monitoring, bandwidth allocation functions and network synchronization protocols, was implemented in the software component on the PC of the edge nodes; this was addressed in a separate, parallel project ("Software Control Platform") not discussed in this document.
Table of contents

Hardware functionality of the medium-speed AAPN Demonstrator Prototype
Abstract
Table of contents
List of figures
List of tables
1. Objectives
2. AAPN Demonstrator Prototype Overview
2.1. AAPN architecture overview
2.2. Demonstrator Prototype architecture
2.3. Edge Node Architecture in the Demonstrator Prototype
3. Hardware functionality overview
3.1. Edge node
3.2. Fast optical core switch
4. Design specifications
4.1. PC-FPGA interface Ethernet frame
4.2. Time Diagram of the time slot in the optical core
4.3. AAPN slot in the optical domain
5. PC-FPGA Interface
5.1. Overview
5.2. Detailed description of the Ethernet interface
5.3. Detailed description of the Asynchronous Serial Bridge API
5.3.1. READ ADDRESS Register
5.3.2. WRITE ADDRESS Register
5.3.3. READ LENGTH Register
5.3.4. WRITE LENGTH Register
5.3.5. FLAG LENGTH Register
5.3.6. CONTROL Register
5.3.7. STATUS Register
6. Slotted Transmission
6.1. Overview
6.2. Detailed description of slot transmission
6.2.1. Algorithmic State Machine
6.2.2. Hardware blocks
7. Optical Transmission and Reception
7.1. Overview
7.2. Detailed description
7.3. GXB Transmitter Interface
7.4. GXB Receiver Interface
8. Slot Reception
8.1. Overview
8.2. Detailed description of slot reception
9. Core Optical Switch
9.1. Overview
9.2. Non-blocking architecture
10. Experimental test results
11. Summary and Discussion
12. Future work
References
Appendix 1. Project team members
Appendix 2. Hardware codes
List of figures

Figure 2.1. AAPN Architecture: (a) Overlaid star topology that characterises an Agile All-Photonic Network. (b) Use of selectors in AAPN to allow for larger numbers of edge nodes.
Figure 2.2. AAPN Core node architecture.
Figure 2.3. AAPN Demonstrator Prototype architecture. E1 corresponds to the master edge node.
Figure 2.4. Block diagram of the main functions in the edge node of the demonstrator prototype.
Figure 2.5. Software architecture of a master node and a slave node [11].
Figure 3.1. Stratix GX Development Board by Altera Corp. [8].
Figure 3.2. SFP MSA Transceiver by Fujitsu Limited [9].
Figure 3.3. Free-X™ 2x2 optical switch by Civcom Inc. [10].
Figure 4.1. PC-FPGA interface: Ethernet payload contents in the downstream direction.
Figure 4.2. PC-FPGA interface: Ethernet payload contents in the upstream direction.
Figure 4.3. Timing diagram in the optical domain, downstream direction.
Figure 4.4. Timing diagram in the optical domain, upstream direction.
Figure 4.5. Contents of the AAPN time slot.
Figure 5.1. Block diagram of the PC-FPGA interface within the edge node prototype.
Figure 5.2. Ethernet frame field structure.
Figure 5.3. Flowchart describing the Ethernet interface program.
Figure 6.1. Main functional blocks of the Slotted Transmission module.
Figure 6.2. AAPN slot transmission block diagram.
Figure 6.3. Algorithmic State Machine of the Slotted Transmission module - part 1.
Figure 6.4. Algorithmic State Machine of the Slotted Transmission module - part 2.
Figure 6.5. Main hardware blocks of the Slotted Transmission module.
Figure 6.6. Data path block of the Slotted Transmission module.
Figure 6.7. Main hardware design blocks of the Slotted Transmission module.
Figure 6.8. Clock difference hardware design block of the Slotted Transmission module.
Figure 6.9. Comparator blocks for the Padding hardware design block and the Slot Transmission hardware design block of the Slotted Transmission module.
Figure 7.1. Stratix GX Serializer and Deserializer.
Figure 7.2. Finite State Machine of the GXB Transmitter Interface.
Figure 7.3. Hardware block diagram of the GXB Transmitter Interface.
Figure 7.4. Finite state machine of the GXB Receiver Interface.
Figure 7.5. Packet Alignment Scenarios with Four-Segment Packets.
Figure 7.6. Hardware block diagram of the GXB Receiver Interface.
Figure 8.1. AAPN slot receiver block diagram.
Figure 8.2. AAPN slot receiver processor block diagram.
Figure 8.3. Slot receiver processor algorithmic state machine.
Figure 9.1. Architecture of the AAPN demonstrator optical core switch.
Figure 9.2. Bench top assembly of the AAPN optical core switch.
Figure 10.1. Loopback test configuration setup.
Figure 10.2. Loopback test configuration using core switch setup.
Figure 11.1. View of the FPGA component of the AAPN edge node for the optical loopback test.
List of tables

Table 4.1. Description of the field contents of the Ethernet payload in the PC-FPGA interface.
Table 4.2. Clock and word size parameters for the optical transmission.
Table 4.3. Description of the time slot contents in the optical domain.
Table 5.1. Register X-1: READ ADDRESS Register.
Table 5.2. Register X-2: WRITE ADDRESS Register.
Table 5.3. Register X-3: READ LENGTH Register.
Table 5.4. Register X-4: WRITE LENGTH Register.
Table 5.5. Register X-5: FLAG LENGTH Register.
Table 5.6. Register X-6: CONTROL Register.
Table 5.7. Register X-7: STATUS Register.
Table 6.1. Top level entity signals of the Slotted Transmission module.
Table 6.2. Data path signals of the Slotted Transmission module.
Table 6.3. Control path signals of the Slotted Transmission module.
Table 8.1. Slot receiver processor signal definitions.
Table 8.2. Modifiable parameters in the slot receiver processor.
Table 9.1. Optical core switch connectivity.
Table 9.2. AAPN core switch truth table settings.
Table 10.1. Received word sequence in the loopback test configuration.
1. Objectives
Within the context of the AAPN Research Network [1], the objective for the implementation of the Demonstrator
Prototype (Theme 3) is to build a scaled-down version of the AAPN to demonstrate that the ideas developed
under the AAPN project are of practical interest by showing that the component technologies developed under
Theme 2 "Enabling Technologies" and the network architectures, optimization methods and control protocols
developed under Theme 1 "Architectures and Networks" can be combined into an operational network.
The Demonstrator Prototype has been designed to be a combination of software and hardware functionalities.
The objectives for the hardware functionality are:
- To implement the functions of the edge node that require high speed and precision:
  - Slotted optical transmission (E-O conversion),
  - Burst-mode reception from the optical core (O-E conversion), and
  - Reconfiguration of the core photonic switch
- To implement and test a fast photonic switch at the core node
2. AAPN Demonstrator Prototype Overview
2.1. AAPN architecture overview
An Agile All-Photonic Network, AAPN [2], is a wavelength division multiplexed network that consists of several
overlaid stars formed by edge nodes interconnected by bufferless optical core nodes (Figure 2.1). The edge
nodes aggregate incoming traffic in larger size data units called “slots” and transmit them over the photonic links.
The core nodes consist of a wavelength stack of bufferless transparent photonic switches (a set of space
switches, one switch for each wavelength as shown in Figure 2.2) and perform fast switching in order to provide
bandwidth allocation in sub-wavelength granularities.
Figure 2.1. AAPN Architecture: (a) Overlaid star topology that characterises an Agile All-Photonic Network. (b) Use of selectors in AAPN to allow for larger numbers of edge nodes.
Most proposals for photonic network architectures envisage a mesh topology, which distributes the load over
many switches. In the AAPN, the core switches have enormous capacity, which permits the simpler overlaid star
topology to compare favourably with mesh architectures [3]. The AAPN is mainly characterized by its “agility”;
that is, the ability to rapidly adapt bandwidth allocation as the traffic demand varies, which is possible to achieve
since the core switches are fast (1µs switching time). Moreover, routing in a star network is trivial and, since the
core is all photonic, the costly optics-electronics-optics conversions are completely eliminated.
Figure 2.2. AAPN Core node architecture.
To share the network bandwidth, the AAPN may operate in two modes: Time Division Multiplexing (TDM) or
Optical Burst Switching (OBS). In TDM mode, time is ‘slotted’ and the slots arrive at the core switch
synchronously (slots are of fixed size); while in OBS mode the slots/bursts arrive at the core switch
asynchronously (slots may be of variable size). In both cases, a centralised or distributed scheduling method is
necessary to allocate the available bandwidth and solve contention among the edge nodes.
In a TDM-AAPN, the slots are a fixed size of 100kb (10µs at the target link rate of 10Gbps) and all arrive
synchronously at the core switch from the various edge nodes. Transmission schedules at the edge nodes must
therefore be appropriately coordinated with the core switch configuration schedule in order to transfer slots to
their correct destinations. Propagation delays must be taken into account. To allocate variable bandwidth
between pairs of edge nodes (flows), the schedules must be computed such that a larger or a smaller number of
time slots within a (possibly repeating) frame are assigned to flows with a higher or a lower bandwidth demand,
respectively. With a centralised approach, this involves the edge nodes sending measures of incoming traffic or
bandwidth requests to the core node, which then computes a frame schedule that adapts the core network
bandwidth to the different measures/requests. The response of the centralised system is therefore delayed at
least by the maximum edge-core round-trip propagation time plus the computation time of the schedule at the
core node.
Several centralised (e.g. [4] [5] [6]) and distributed (e.g. [7]) bandwidth sharing methods have been designed in
the context of a TDM-AAPN. In all these scheduling methods, the queueing structure in the source module
consists of a set of simple buffers/queues, organized as virtual destination queues; i.e., one queue in a source
node for each of the destination edge nodes. Traffic measures and/or bandwidth requests may be computed
from the length of these queues. In the destination modules, buffers are required as well for the process of
deaggregation of slots back into the external (IP or MPLS) packets and these are conveniently organized as
virtual source queues; i.e., one queue in a destination node for each of the source nodes.
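As a minimal illustration of this queueing organization (a C sketch with identifiers of our own choosing; the actual queues are implemented in the PC software of the prototype):

#include <stddef.h>

#define N_EDGE 4                /* edge nodes in the demonstrator */
#define SLOT_WORDS 250          /* 32-bit words per AAPN data slot */

typedef struct {
    unsigned int data[SLOT_WORDS];
} aapn_slot_t;

/* A simple FIFO of slots; its length doubles as the traffic measure
   or bandwidth request reported to the scheduler. */
typedef struct {
    aapn_slot_t *slots;
    size_t head, tail, len;
} slot_queue_t;

/* Source module: one virtual destination queue per destination edge node. */
slot_queue_t virtual_destination_queue[N_EDGE];

/* Destination module: one virtual source queue per source edge node,
   used while deaggregating slots back into the external IP/MPLS packets. */
slot_queue_t virtual_source_queue[N_EDGE];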
2.2. Demonstrator Prototype architecture
The Prototype architecture, as shown in Figure 2.3, consists of 4 edge nodes, with one of them being the master
edge node used to control the fast photonic switch at the core node (i.e., the core node is an edge node with
extended control functionality). Each edge node consists of a combination of software functions running on a PC
and hardware functions operating on a Stratix GX FPGA development board [8].
The optical transmission/reception is performed at 1Gbps using small-form pluggable (SFP) multi-service
agreement (MSA) transceivers made by Fujitsu [9] that are plugged into MSA cages on the FPGA development
board. The optical core switch is implemented with Civcom's Free-X 2x2 Solid Free Space optical technology
switches [10].
Only one wavelength is used for the prototype and it is assumed that no separate control channel would be
available between the edge and core switches; the necessary signalling information will be sent over the data
channel provided by the AAPN network.
A distributed software Control Platform that controls the high level functions of the core, the multiplexers and
edge nodes of an AAPN network (e.g. bandwidth allocation, traffic monitoring and network synchronization) has
been implemented separately ([11][12]) from the work presented in this technical report. This software runs in
the PC module of the edge/core nodes. The Control Platform is easily adapted to different AAPN demonstrators
since the majority of the software modules are independent of the nature of the particular optical
transmission/switching components and because only the control interfaces to the optical components must be
adapted to the control protocols supported by them.
Figure 2.3. AAPN Demonstrator Prototype architecture. E1 corresponds to the master edge node.
2.3. Edge Node Architecture in the Demonstrator Prototype
The main functional blocks of the prototype edge nodes are shown in Figure 2.4. The diagram also shows the type and form of the information exchanged between the different blocks.
implemented on the PC are the synchronization among edge nodes, traffic monitoring, bandwidth allocation, etc
(Figure 2.5). The slotted transmission scheduling is therefore dictated by the software modules in the PC, while
the actual execution of it is done with the FPGA.
Figure 2.4. Block diagram of the main functions in the edge node of the demonstrator prototype. (Blocks include traffic aggregation into virtual queues of slots, 1 Gbps optical transmission and reception, padding insertion/removal, CRC generation and comparison, PC-FPGA clock difference tracking, extraction of the TimeToTransmit and switch configuration information, and the bandwidth request/assignment functions on the PC.)
Figure 2.5. Software architecture of a master node and a slave node [11]
Other low-level, high-speed functions implemented in the FPGA are the following:
- Fast control interfaces for the optical burst-mode transceivers and the optical core switch
- Addition of padding to the AAPN data slots
- Cyclic Redundancy Check for optical transmission
At the HW functionality level, the core and edge node implementations are identical. At the core node, the output port carrying the switch configuration signal is connected to the driver of the photonic core switch, while at the edge nodes this port is simply not used.
Integrating and interfacing the PC with the FPGA requires the following two functions:
- Exchanging the AAPN data slots between the PC and the FPGA via an Ethernet link for transmission/reception in the optical domain
- Local PC-FPGA synchronization within the edge node
3. Hardware functionality overview
3.1. Edge node
The FPGA components have been designed to perform the low-level functions that require high speed operation
and high precision, such as the configuration of the optical core switch, slotted transmission, slot reception,
cyclic redundancy check, time stamping, padding, E/O and O/E conversions and local synchronization in the
edge node.
All the hardware functionality has been implemented on Stratix GX Development boards [8] (Figure 3.1) using
custom circuits built from programmable logic elements on a System on Chip (SoC) field programmable
gate array (FPGA). The design of the custom circuits is implemented using a Mealy finite state machine model
programmed using VHDL code. Other elements within the design are implemented using compiled software
within a hard-core microprocessor on the FPGA using on-chip memory.
The following modules were built:
1) The PC-FPGA Interface module receives Ethernet frames from the PC, which contain the AAPN slots and
their respective control information in their payloads. The interface has two sub-modules implemented in the
Stratix FPGA. The first is implemented in the NIOS II RISC processor and receives the Ethernet frames. If found error free, the payloads are sent to the second sub-module, which then sends all the information via an asynchronous serial bridge to the Stratix GX FPGA, where the rest of the modules of the HW functionality are placed. Upon reception, the interface module creates an Ethernet frame whose payload is the AAPN data slot and its control information, and then sends the Ethernet frame to the PC component.
2) The Slotted Transmission module receives the AAPN slots and their control information, then sends the
data slots to the optical transmitter at exactly the time scheduled by the TDM allocation algorithm. It keeps track
of the PC-FPGA clock difference (used to keep the two components locally synchronized), and outputs the
control signals to configure the optical core switch when the edge node in question is the master edge node.
3) The Optical Transmission/Reception module first adds padding (to achieve the higher rate of the core) and
a Cyclic Redundancy Check (CRC) field to the AAPN slot and then performs the conversion to the optical
domain by operating a Fujitsu SFP transceiver [9] at 1Gbps using 8B/10B encoding. Two FIFOS are used for
clock domain crossing. For reception, a block of the the Stratix GX FPGA is programmed as a very simple burst-
mode receiver that achieves Clock Data Recovery (CDR) in 15µs. The receiving sub-module removes the
padding after calculating the CRC field from the received data.
4) The Slot Reception module retrieves the slots from the optical receiver, performs error validation with the
CRC field, timestamps the slot with the reception time, merges the local synchronization details into the slot and
sends the AAPN slot to the PC-FPGA interface module for further transmission to the PC component.
Figure 3.1. Stratix GX Development Board by Altera Corp. [8] (showing the SFP slots, the Ethernet port, the Stratix and Stratix GX FPGAs, and the electrical ports used to configure the optical switch)
A dummy traffic generator and a simple analyzer have also been implemented in order to test each of the
modules separately. All the modules have been implemented and loop-back tests of modules 3) and 4) have
been successfully completed. We are currently debugging modules 1) and 2). The main challenges for the FPGA
work have been the lack of documentation for the development board and the incompatibility with previous
versions of the hardware code compilers.
Figure 3.2. SFP MSA Transceiver by Fujitsu Limited [9].
3.2. Fast optical core switch
The 4x4 core switch has been implemented with a Clos array of six small switches. Fast port-to-port non-
blocking interconnection is achieved using six Solid Free Space 2x2 optical switches bought from Civcom Inc.
[10]. Actuation to configure the switches is provided by six DC-coupled electrical signals generated by the
Slotted Transmission module in the master edge node. The truth table for the core switch configuration is
programmed in the software (PC) component of the edge node and the configuration information needed for
each AAPN slot is sent within its “control information” fields as will be described in the design specifications.
The Solid Free Space technology developed by Civcom Inc. allows a switching speed of 400 ns, which complies with the AAPN design target of 1µs; however, it turned out to have a limited reconfiguration frequency of 6 kHz
arising from the electronic drivers of the switch. This became a major speed limitation for the prototype since
consequently only one slot can be transmitted every 166.67µs (as opposed to the 10µs AAPN design target).
Our current design defines a time slot of 200µs for this reason.
Figure 3.3. Free-X™ 2x2 optical switch by Civcom Inc. [10].
4. Design specifications
The design specifications are the result of continuous discussion among members of the software and hardware
teams over several months. They take into consideration both the characteristics and limitations of the software
and hardware equipment available to build the AAPN prototype.
4.1. PC-FPGA interface Ethernet frame
The contents of the Ethernet frames used in the PC-FPGA interface are shown in Figure 4.1 for the downstream
direction (towards the photonic core) and in Figure 4.2 for the upstream direction (towards the PC). The
CTRINFO set of bit fields contains the control information required for the accompanying AAPN Data Slot and
is relevant only to the PC-FPGA communications. The AAPN Data Slot set of fields is the stream of bits to be
sent to the photonic core of the network at a particular time dictated by the scheduler. All the fields in the
Ethernet frame payload are described in Table 4.1.
Downstream Ethernet payload (8,160 bits):
- Control Information, CTRINFO (160 bits):
  - SwitchConfiguration (32 bits): S0 to S5 (6 bits) plus 26 empty bits
  - PCSendingTime (64 bits)
  - TimeToSendOptical (64 bits)
- AAPN Data Slot, DATA (8,000 bits = 250 x 32-bit words):
  - TimeToSendOptical (64 bits)
  - TimeReceivedOptical (64 bits)
  - other data

Figure 4.1. PC-FPGA interface: Ethernet payload contents in the downstream direction.
In the upstream direction, the CTRINFO set of fields is added after the AAPN block of data in order to increase the speed of operation, given that the CRC and the TransmissionError bit cannot be calculated until all the bits of the AAPN DATA field have been processed.
Upstream Ethernet payload (8,096 bits):
- AAPN Data Slot, DATA (8,000 bits = 250 x 32-bit words):
  - TimeToSendOptical (64 bits)
  - TimeReceivedOptical (64 bits)
  - other data
- Control Information, CTRINFO (96 bits):
  - CRC (32 bits): TransmissionError (1 bit) plus 31 empty bits
  - ClockDifference (64 bits)

Figure 4.2. PC-FPGA interface: Ethernet payload contents in the upstream direction.
Table 4.1. Description of the field contents of the Ethernet payload in the PC-FPGA interface

- SwitchConfiguration (32 bits, 1 word): Set of bits used to configure the core optical switch. Only the 6 least-significant bits are used; a full word is used to make it easier for the PC component to write the value into the slot field.
- S0, S1, S2, S3, S4, S5 (6 bits): Actuation signals for the six 2x2 switches that make up the 4x4 core optical switch. DC-coupled TTL electrical signal outputs from the connector pin-outs of the Stratix GX Development Board.
- PCSendingTime (64 bits, 2 words): Clock value in the PC at the moment of sending the Ethernet frame to the FPGA. Time value in FPGA format (an unsigned integer value).
- TimeToSendOptical (64 bits, 2 words): Time slot at which the AAPN data slot must be transmitted to the optical core of the network, as determined by the scheduling algorithm running on the PC. Time value in FPGA format (an unsigned integer value); it corresponds to the start of the time slot, as opposed to the start of the meaningful data.
- TimeReceivedOptical (64 bits, 2 words): Time at which the AAPN data slot is received at the destination edge node; it corresponds to the beginning of the time slot in the optical domain. Time value in FPGA format (an unsigned integer value). In the downstream direction this field is empty.
- other data (7,872 bits, 246 words): The rest of the data in the AAPN slot. Irrelevant to the operation of the FPGA component.
- CRC (32 bits, 1 word): Cyclic Redundancy Check of the data transmitted optically, including the padding. In the upstream direction only the least significant bit is used, but a full word is still used to facilitate extraction of the value in the PC by respecting the 32-bit word definition.
- TransmissionError (1 bit): A "0" indicates that there has been a transmission error (e.g. digital decoding errors, i.e. sampling the slot in the wrong order, or loss of synchronization in the clocking between transmit and receive).
- ClockDifference (64 bits): The difference between the PCSendingTime in the last Ethernet frame arriving from the PC and the clock value in the FPGA at the time the last Ethernet frame was sent to the PC. Time in FPGA format. This value is used by the PC control software to maintain synchronization between the PC and the FPGA components.
A copy of the TimeToSendOptical field was added to the Control Information part of the Ethernet frame in the
downstream direction in order to facilitate the hardware operation. It is also important to note that the maximum
Ethernet payload is 400 x 32-bit words (12,800 bits) and, since the Ethernet network card pads the payload
automatically, there is no need to add another field in the upstream direction to make the implementation
symmetrical.
The TimeReceivedOptical field is defined as the time at which the AAPN data slot is received at the
destination edge node. As defined in the AAPN literature, it corresponds to the beginning of its assigned time
slot in the optical domain (as opposed to the beginning of the actual meaningful information being received).
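For illustration, the downstream payload of Figure 4.1 maps naturally onto a packed C structure; the following sketch uses the field names from the figures (the packed attribute and integer types are our assumptions, and byte/bit ordering on the wire is not addressed here):

#include <stdint.h>

/* Downstream (PC to FPGA) Ethernet payload: 8,160 bits in total. */
typedef struct __attribute__((packed)) {
    /* CTRINFO: 160 bits of control information */
    uint32_t switch_configuration;        /* only the 6 LSBs (S0..S5) are used */
    uint64_t pc_sending_time;             /* PC clock when the frame was sent */
    uint64_t time_to_send_optical;        /* start of the assigned time slot */
    /* AAPN Data Slot: 8,000 bits = 250 x 32-bit words */
    uint64_t slot_time_to_send_optical;   /* copy carried inside the slot */
    uint64_t slot_time_received_optical;  /* empty in the downstream direction */
    uint32_t other_data[246];             /* not interpreted by the FPGA */
} aapn_downstream_payload_t;              /* 1,020 bytes = 8,160 bits */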
4.2. Time Diagram of the time slot in the optical core
The target AAPN time slot length in the optical domain has been defined as 10µs (100kb) at 10Gbps in the
AAPN literature [2]. For this project, however, given the limitation set by the 6kHz maximum reconfiguration
frequency of the Free-X Civcom switches, the time slot for the prototype has been rounded up to 200µs as the
minimum achievable time slot is 166.67µs.
Figure 4.3 shows the timing diagram of slots in the optical domain for the downstream direction while Figure 4.4
shows the corresponding diagram for the upstream direction. The figures depict the relations between the field
contents of the AAPN slot and the transmission/reception time parameters.
The Guard Time is defined as the time during which no meaningful information can be transmitted/received. For the AAPN prototype design the Guard Time is composed of:
1. the time taken by the optical switch to reconfigure (SwitchConfigurationTime), plus
2. the Clock Data Recovery (CDR) time; i.e., the time it takes the optical burst receiver to lock to the bit stream being received.
For the prototype these amount to 0.4µs + 24.6µs = 25µs (see Figure 4.5).
Figure 4.3. Timing diagram in the optical domain, downstream direction. (The optical data begins at TimeToSendOptical + Guard Time within the time slot.)
Figure 4.4. Timing diagram in the optical domain, upstream direction. (TimeReceivedOptical marks the beginning of the time slot; the data arrives at the optical transceiver after the Guard Time.)
4.3. AAPN slot in the optical domain
The clock parameters and word size for the electrical and optical domains are shown in Table 4.2. Within this
context, the content of the AAPN time slot used for transmission/reception in the optical domain of the prototype
is shown in Figure 4.5.
Table 4.2. Clock and word size parameters for the optical transmission
- Clock rate: 5.00E+07 Hz (50 MHz)
- Clock cycle: 2.0E-08 seconds (20 ns)
- Electrical word size: 32 bits
- Optical word size: 16 bits
Time slot length in the optical domain: 2.00E-04 seconds (200µs), composed of:
- Guard Time: 2.50E-05 seconds (25µs)
  - SCT (switch configuration time): 4.00E-07 seconds
  - CDR (clock data recovery time): 2.46E-05 seconds
- Optical data transmitted: 1.75E-04 seconds (175µs), 8,750 optical words in total:
  - SYNCH: 16 bits (1 optical word), 2.00E-08 seconds
  - DATA: 8,000 bits (500 optical words, 250 electrical words), 1.00E-05 seconds
  - PADDING: 131,904 bits (8,244 optical words, 4,122 electrical words), 1.6488E-04 seconds
  - CRC: 32 bits (2 optical words, 1 electrical word), 4.00E-08 seconds
  - EOT: 48 bits (3 optical words), 6.00E-08 seconds
  (Electrical domain: 4,373 words in total)

Figure 4.5. Contents of the AAPN time slot
During the first stages of the project, padding was added to the AAPN slot in the optical domain in order to
compensate for a slight bottleneck that was foreseen in the Ethernet interface. The reason the padding ended up so large by the end of the project is that, although the maximum line rate of the Ethernet network cards is 100 Mbps, the achieved line rate between the PC and the FPGA was measured at only ~10 Mbps. This drastic bottleneck, and thus limitation in the implementation, was the result of two factors:
- The Ethernet port on the Altera development board is linked to only one of the board's FPGAs (the Stratix FPGA), which was used to extract and buffer the Ethernet data and then transfer this data via a custom-designed interface to the other FPGA (the Stratix GX). This Ethernet data extraction and hand-off constrains the throughput.
- The PC-FPGA Ethernet interface on the Stratix chip was designed to use an embedded microprocessor (NIOS RISC processor) and off-chip memory. The direct memory access (DMA) turned out to be the main slowing component of the interface, an issue that did not come to light until the last few stages of the project timeline.
In the downstream direction, the cyclic redundancy check (CRC) bits at the end of the optical data transmitted are calculated before transmission with the TimeReceivedOptical field set to zero. Upon reception at the destination edge node, the CRC is calculated and checked before populating the TimeReceivedOptical and the ClockDifference fields. The most significant bit of the CRC field is then set to 1 if there was a transmission error in the optical domain (or left equal to 0 if the optical transmission had no errors).
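The CRC handling just described can be sketched in C as follows (the report does not specify the CRC polynomial, so the IEEE 802.3 CRC-32 below is an assumption, as are the helper names):

#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (IEEE 802.3 polynomial, an assumed choice). */
static uint32_t crc32(const uint8_t *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Transmit side: zero TimeReceivedOptical, then compute the CRC
   over the DATA and PADDING fields. */
void prepare_crc(uint8_t *slot, size_t slot_len,
                 size_t time_received_offset, uint32_t *crc_field) {
    memset(slot + time_received_offset, 0, 8);    /* 64-bit field */
    *crc_field = crc32(slot, slot_len);
}

/* Receive side: check the CRC before populating TimeReceivedOptical and
   ClockDifference; on error, set the MSB of the CRC field to 1. */
int check_crc(const uint8_t *slot, size_t slot_len, uint32_t *crc_field) {
    int error = (crc32(slot, slot_len) != *crc_field);
    if (error)
        *crc_field |= 0x80000000u;
    return error;
}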
Table 4.3. Description of the time slot contents in the optical domain

- SCT: Optical switch configuration time.
- CDR: Clock Data Recovery time; the time taken by the optical burst-mode receiver to identify the clock rate of the incoming bit stream.
- SYNCH (16 bits, 1 optical word): Synchronization bits; a specific bit sequence used to recognize the start of a block of information.
- DATA (8,000 bits, 500 optical words, 250 electrical words): AAPN block of data; the bit stream with the meaningful network information.
- PADDING (131,904 bits, 8,244 optical words, 4,122 electrical words): Bit sequence appended to the AAPN data in order to fill the remaining space in the 200µs time slot. A sequence of alternating ones and zeroes.
- CRC (32 bits, 2 optical words, 1 electrical word): Cyclic Redundancy Check, calculated over the DATA and PADDING fields.
- EOT (48 bits, 3 optical words): End Of Transmission; a specific bit sequence used to indicate the end of the block of information.
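As a cross-check of the arithmetic in Figure 4.5 and Table 4.3, the slot layout can be captured as compile-time constants (a C11 sketch; the identifiers are ours):

/* Optical slot budget at the 50 MHz clock of Table 4.2:
   one 16-bit optical word every 20 ns. */
#define SYNCH_BITS     16
#define DATA_BITS      8000
#define PADDING_BITS   131904
#define CRC_BITS       32
#define EOT_BITS       48
#define OPTICAL_BITS   (SYNCH_BITS + DATA_BITS + PADDING_BITS + CRC_BITS + EOT_BITS)

/* 140,000 bits = 8,750 optical words x 20 ns = 175 us of optical data;
   adding the 25 us guard time fills the 200 us time slot exactly. */
_Static_assert(OPTICAL_BITS == 140000, "optical payload size mismatch");
_Static_assert(OPTICAL_BITS / 16 == 8750, "optical word count mismatch");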
5. PC-FPGA Interface
5.1. Overview
The PC interfaces with the development board via an Ethernet connection. To connect to the Ethernet hardware,
the Stratix FPGA was used. However, to connect to the optical transceivers the Stratix GX must be used, thus
the implementation required the use of both FPGAs on the development board. The Stratix FPGA communicated
with the PC over Ethernet and it communicated with the Stratix GX via a high speed serial link referred to as the
Asynchronous Serial Bus (ASB). Within the Stratix FPGA a SoC design was implemented with Altera’s SOPC
builder to link the PC to the Stratix GX FPGA, as shown in Figure 5.1. It contains an Ethernet interface, an
embedded NIOS II processor, a RAM and the custom designed ASB.
Figure 5.1. Block diagram of the PC-FPGA interface within the edge node prototype
5.2. Detailed description of the Ethernet interface
To process the data exchanged with the PC via the Ethernet interface, one could adopt either a software approach or a hardware approach. Processing an Ethernet frame entirely in hardware (digital circuits) is advantageous in terms of transmission speed; however, the development would take much longer and require more manpower than the software approach. Altera provides an easy-to-use soft processor (NIOS II) that can be instantiated on an Altera FPGA device to implement the data processing functions in software in a relatively short time. The NIOS system also includes hardware modules, such as DMA, that can be used to speed up the data processing. Given the project team's time constraints, this was the option chosen to design the interface. Chapter 11 describes some problems that were later found with this approach.
Raw Ethernet frames are used between the PC and the Ethernet Interface (lan91c111). The Ethernet frame
structure is shown in Figure 5.2. The Ethernet interface program only deals with the following fields: Destination
Address, Source Address, Length, and AAPN Data. The AAPN Data field is processed as a whole, without inspecting its internal structure.
Starting Delimiter (1 byte) | Destination Address (6 bytes) | Source Address (6 bytes) | Length (2 bytes) | Payload: AAPN Data Slot + Control Information (46 - 1500 bytes) | Frame Check Sequence (4 bytes)

Figure 5.2. Ethernet frame field structure
The interface program can be roughly divided into three functional blocks as shown in Figure 5.3: Initialization,
Receiving Slot (from the PC) and Sending Slot (towards the PC).
Figure 5.3. Flowchart describing the Ethernet interface program
The Initialization Stage performs the following procedure:
1. initialize the operating system
2. initialize the Ethernet interface hardware
3. initialize the drivers
4. initialize the Asynchronous Serial Bridge (ASB)
After the Initialization Stage, the software enters the communication part, which runs receiving and sending
functions simultaneously. The Sending Function (upstream direction, towards the PC) executes the following
steps:
1. wait for slot data from the GX FPGA
2. allocate packet memory
3. fill in the Destination Address with the PC’s MAC address
4. fill in the Source Address with the Altera board’s MAC address
5. fill in the length (check whether one slot = multiple packets)
6. use the DMA to transfer data from the ASB to the packet memory
7. send the packet to the PC using hardware DMA
The Receiving Function (downstream direction, from the PC to the FPGA board) executes the following steps:
1. wait for a packet from the PC
2. allocate packet memory
3. check whether the Destination Address and Source Address fields are correct
4. check the length of the data (check whether one slot = multiple packets)
5. DMA the Ethernet payload (control information plus slot data) in the buffer to the ASB
The interface program employs the TCP/IP stack included in the NIOS operating system, where the following
packet buffers are used to store the incoming Ethernet packets:
extern queue bigfreeq; /* big free buffers */
extern queue lilfreeq; /* small free buffers */
All the received Ethernet frames are stored in a data structure defined as:
extern queue rcvdq; /* queue of all received (but undemuxed) packets */
When an Ethernet packet is received, the corresponding IRQ interrupt of the NIOS operating system is launched to fill a buffer using DMA, which is then stored in rcvdq. The program thus keeps checking the length of rcvdq to see whether any Ethernet frame has been received from the PC.
In the upstream direction, to check whether there is any slot data coming from the ASB, the following command
is used:
IORD_ASB_STATUS(ASB_BASE);
The Ethernet interface program iteratively checks the data from the ASB and the data from the PC and then
does the corresponding processing.
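A simplified view of this polling loop is sketched below; it reuses the rcvdq queue and the ASB status macro from above, while the queue-length accessor and the two frame handlers are hypothetical placeholders for the real program logic:

#include "ASB.h"              /* ASB register macros (Section 5.3) */

extern queue rcvdq;           /* received-but-undemuxed packets (TCP/IP stack) */

extern int  queue_length(queue *q);          /* hypothetical accessor */
extern void process_downstream_frame(void);  /* validate, DMA payload to ASB */
extern void forward_slot_to_pc(void);        /* build frame, DMA to Ethernet */

void interface_main_loop(void) {
    for (;;) {
        /* Downstream: the IRQ/DMA handler has queued an Ethernet frame. */
        if (queue_length(&rcvdq) > 0)
            process_downstream_frame();

        /* Upstream: a slot's worth of words is buffered in the ASB
           (SLOT_RDY is bit 2 of the STATUS register). */
        if (IORD_ASB_STATUS(ASB_BASE) & 0x4)
            forward_slot_to_pc();
    }
}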
5.3. Detailed description of the Asynchronous Serial Bridge API
Packets stored in the on-chip RAM could be transferred to the Stratix GX by writing to the registers of the ASB.
A DMA is implemented within the ASB; it requires only the start address and the length of the packet to set up a transfer. Once the transfer is initiated, the ASB starts buffering the packet in a circular buffer. At
the same time, once at least one 32-bit word is buffered, the ASB hardware separates the word into 4 bytes and
sends them out over four separate 100 Mbps parallel to serial converters. To implement this, four wires that link
the Stratix to the Stratix GX are used, in addition to a reset wire.
On the Stratix GX side the packets are received asynchronously via four serial to parallel converters also running
at 100 Mbps. The bytes are reconstructed into 32-bit words and are buffered. The system has access to empty
and full flags, in addition to a register that contains the number of words buffered. Packets sent from the Stratix GX to the Stratix work much the same way, except that on the Stratix GX there is no DMA and the application hardware interfaces directly to the circular buffers.
When using the ASB within an embedded system, the embedded software must reference the “ASB.h” header
file:
#include "ASB.h"
This file provides the user with the necessary commands in order to work with the ASB. The commands
available are:
IORD_ASB_RD_ADDRESS(base)
IOWR_ASB_RD_ADDRESS(base, data)
IORD_ASB_WR_ADDRESS(base)
IOWR_ASB_WR_ADDRESS(base, data)
IORD_ASB_RD_LENGTH(base)
IOWR_ASB_RD_LENGTH(base, data)
IORD_ASB_WR_LENGTH(base)
IOWR_ASB_WR_LENGTH(base, data)
IORD_ASB_CTRL(base)
IOWR_ASB_CTRL(base, data)
IORD_ASB_FLAG_LENGTH(base)
IOWR_ASB_FLAG_LENGTH(base, data)
IORD_ASB_STATUS(base)
These commands read and write to the ASB’s registers:
READ ADDRESS Register
WRITE ADDRESS Register
READ LENGTH Register
WRITE LENGTH Register
FLAG LENGTH Register
CONTROL Register
STATUS Register
The registers and their associated commands are detailed in the following sections.
5.3.1. READ ADDRESS Register
The READ ADDRESS register, shown in Table 5.1, contains the starting memory address where data will
be read from for a transmission. To write to the register the IOWR_ASB_RD_ADDRESS(base, data) instruction
should be used. To read the register, the IORD_ASB_RD_ADDRESS(base) instruction should be used.
Table 5.1. Register X-1: READ ADDRESS Register
bit 31 - 0 RD_ADDRESS: Memory address of first word to transmit
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
// Replace 0x010000 with the desired address
int x;
// write to register
IOWR_ASB_RD_ADDRESS( ASB_BASE, 0x010000 );
// read register
x = IORD_ASB_RD_ADDRESS(ASB_BASE);
printf( "RD_ADDRESS: %x\n",x);
5.3.2. WRITE ADDRESS Register
The WRITE ADDRESS register, shown in Table 5.2, contains the starting memory address where received data
will be written to. To write to the register the IOWR_ASB_WR_ADDRESS(base, data) instruction should be
used. To read the register the IORD_ASB_WR_ADDRESS(base) instruction should be used.
Table 5.2. Register X-2: WRITE ADDRESS Register
bit 31 - 0 WR_ADDRESS: Destination memory address of first word received
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
// Replace 0x010000 with the desired address
int x;
// write to register
IOWR_ASB_WR_ADDRESS( ASB_BASE, 0x010000 );
// read register
x = IORD_ASB_WR_ADDRESS(ASB_BASE);
printf( "WR_ADDRESS: %x\n",x);
5.3.3. READ LENGTH Register
The READ LENGTH register, shown in Table 5.3, contains the number of words that will be read from memory
and transmitted. To write to the register the IOWR_ASB_RD_LENGTH(base, data) instruction should be used.
To read the register the IORD_ASB_RD_ LENGTH(base) instruction should be used.
Table 5.3. Register X-3: READ LENGTH Register
bit 31 - 16 Unimplemented: read as ‘0’
bit 15 - 0 RD_LENGTH: The number of words to be read sequentially from
memory and transmitted.
The largest value it can store is 65,535.
When read, the register will always contain a value one lower than what was written. This is done to make the hardware simpler.
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
int x, txlen;
txlen = 400;
// write to register
IOWR_ASB_RD_LENGTH( ASB_BASE, txlen );
// read register
x=IORD_ASB_RD_LENGTH(ASB_BASE);
printf( "RD_LENGTH: %d\n",x);
5.3.4. WRITE LENGTH Register
The WRITE LENGTH register, shown in Table 5.4, contains the number of received words that will be written into
memory. To write to the register the IOWR_ASB_WR_LENGTH(base, data) instruction should be used. To
read the register the IORD_ASB_WR_LENGTH(base) instruction should be used.
Table 5.4. Register X-4: WRITE LENGTH Register
bit 31 - 16 Unimplemented: read as ‘0’
bit 15 - 0 WR_LENGTH: The number of received words to be sequentially
written into memory
The largest value it can store is 65,535
When read, the register will always contain a value one lower than what was written. This is done to make the hardware simpler.
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
int x, rxlen;
rxlen = 400;
// write to register
IOWR_ASB_WR_LENGTH( ASB_BASE, rxlen );
// read register
x=IORD_ASB_WR_LENGTH(ASB_BASE);
printf( "WR_LENGTH: %d\n",x);
5.3.5. FLAG LENGTH Register
The FLAG LENGTH register, shown in Table 5.5, contains the minimum number of received words required to
trigger the SLOT_RDY flag. To write to the register the IOWR_ASB_FLAG_LENGTH(base, data) instruction
should be used. To read the register the IORD_ASB_ FLAG_LENGTH(base) instruction should be used.
Table 5.5. Register X-5: FLAG LENGTH Register
bit 31 - 16 Unimplemented: read as ‘0’
bit 15 - 0 FLAG_LENGTH: The minimum number of received words required to
trigger the SLOT_RDY flag
The largest value it can store is 65,535
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
int x, flaglen;
flaglen = 400;
// write to register
IOWR_ASB_FLAG_LENGTH( ASB_BASE, flaglen);
// read register
x=IORD_ASB_FLAG_LENGTH(ASB_BASE);
printf( "FLAG_LENGTH: %d\n",x);
5.3.6. CONTROL Register
The CONTROL register, shown in Table 5.6, is used to adjust the state of the ASB. It allows the user to reset
or initialize the hardware, and start either reception or transmission. To write to the register the
IOWR_ASB_CTRL(base, cmd) instruction should be used, where cmd can be ASB_RESET, ASB_START,
ASB_GO_RX or ASB_GO_TX. To read the register the IORD_ASB_CTRL(base) instruction should be used. Note
that the ASB is full duplex, which means that transmission and reception can be executed simultaneously.
Table 5.6. Register X-6: CONTROL Register
bit 31 - 4 Unimplemented: read as ‘0’
bit 3 RESET: Restart ASB
1 = Resets ASB hardware
0 = Normal operation
bit 2 START: Initialize ASB
1 = Puts ASB to listening state
0 = Normal operation
bit 1 GO_RX: Initiate reception
1 = Start reception
0 = ASB reception idle
Used as a signal and is not registered; thus reading it will always produce a ‘0’.
bit 0 GO_TX: Initiate transmission
1 = Start transmission
0 = ASB transmission idle
Used as a signal and is not registered; thus reading it will always produce a ‘0’.
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
/* Initialize Hardware. */
IOWR_ASB_CTRL(ASB_BASE, ASB_RESET);
IOWR_ASB_CTRL(ASB_BASE, ASB_START);
// receive
IOWR_ASB_CTRL( ASB_BASE, ASB_GO_RX);
// transmit
IOWR_ASB_CTRL( ASB_BASE, ASB_GO_TX);
5.3.7. STATUS Register
The STATUS register, shown in Table 5.7, contains the status of the ASB, the number of words available to read,
the slot ready flag and busy flags. The STATUS register can only be read by the user’s application and cannot be written to. To read the register the IORD_ASB_STATUS(base) instruction should be used.
Table 5.7. Register X-7: STATUS register
bit 31 - 13 Unimplemented: read as ‘0’
bit 12 - 3 DATA_RDY: Number of words available to read
bit 2 SLOT_RDY: Slot ready
1 = number of words available >= flag_length
0 = number of words available < flag_length
For the SLOT_RDY flag to be triggered there must be more than zero
words available, even if the flag_length register is set to zero
bit 1 WR_BUSY: Write busy
1 = ASB busy writing received data to memory
0 = ASB reception idle
bit 0 RD_BUSY: Read busy
1 = ASB busy transmitting data from memory
0 = ASB transmission idle
Example Code:
// Assuming ASB_BASE represents the base address of the ASB
int status;
status = IORD_ASB_STATUS(ASB_BASE);
printf( "STATUS: [SLOT_RDY:%d][RX_BUSY:%d][TX_BUSY:%d][nwords:%d]\n",
        (status&4)>>2, (status&2)>>1, status&1, status>>3 );
6. Slotted Transmission
6.1. Overview
The slot transmission for an AAPN slot is primarily responsible for taking an AAPN data slot received from the
higher level aggregation software on the PC, digitally processing the contents of the slot and forwarding it at the
scheduled time to an optical transceiver where the digital stream is converted to an optical signal. This
chapter focuses on the Slotted Transmission module, up to the point just before the digital data is
forwarded to the optical transceiver.
Figure 6.1. Main functional blocks of the Slotted Transmission module
The Slotted Transmission module retrieves data from the Asynchronous Serial Bridge and stores them in a large
FIFO created to hold more than one block of data, which are then processed one by one. Each block of data
consists of Control Information (SwitchConfiguration = 32 bits, PCSendingTime = 64 bits and
TimeToSendOptical = 64 bits) and the AAPN data slot (TimeToSendOptical, TimeReceivedOptical
and "other data"). The Slotted Transmission module controls the overall transmission (including configuration of
the optical switches) to make sure that the AAPN slots are sent to the core at the right scheduled time and after
the appropriate guard times. It also makes sure that padding bits and a cyclic redundancy check (CRC) field are
included in the data transmitted.
The processing requires an appropriate finite state machine architecture to control the type of data that is
inserted into the slot. The slot transmission on the FPGA needs to apply transmission allowance control to
the outgoing slot from a large buffer, in order to accommodate the mismatch between the clock domains. The main functional blocks of
the Slotted Transmission are the Control and Data blocks, as shown in Figure 6.1.
6.2. Detailed description of slot transmission
Once the AAPN data slot is retrieved from the PC-FPGA interface, the Slotted Transmission module extracts the
SwitchConfiguration information for the optical switch from the Control Information data fields. It then saves
the PCSendingTime, which is used to calculate the clock difference between the PC and FPGA, and stores it in
a 64-bit register. This information is necessary for synchronization between PC and FPGA time: when any other
data slot is received, the clock difference is added to the control information by the Slot Reception block during
its upstream transmission towards the PC. In addition to the previous functions, the Slotted Transmission module
also extracts the TimeToSendOptical information, which is used to enable the transmitter and make the data
available to the Optical Transmission module at the appropriate time dictated by the schedule calculated in
software.
The TimeToSendOptical is a crucial element in Slotted Transmission; handling it entails taking into account all
possible scenarios of incoming data (a minimal sketch of this decision logic is given after the list):
1. When TimeToSendOptical < FPGA Time, the packet of data is dropped because its time to be
transmitted to the optical core has passed and transmitting it would most probably cause a collision.
2. When TimeToSendOptical = FPGA Time, the Slotted Transmission module performs the following:
a) The SwitchConfiguration information is loaded (i.e. optical switches are configured),
b) The Guard Time is accounted for (an FPGA counter is activated as soon as the switch is loaded,
at TimeToSendOptical), and finally
c) The optical transmitter is enabled and data sent to the Optical Transmission module in the
following order: slot data, padding bits (1010… sequence) and CRC at the end of transmission.
3. When TimeToSendOptical > FPGA Time, the Slotted Transmission module waits until the
TimeToSendOptical is equal to the FPGA time and then performs step 2 (a, b and c).
4. When TimeToSendOptical = 0, the Slotted Transmission block performs step 2 (a, b and c)
immediately. This option is used by the network synchronization protocols and other special cases
when the network is not operating in TDM mode.
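As an illustration, the following C sketch models the four scenarios above. It is a behavioural sketch only: the type and function names (slot_t, fpga_time, configure_switch, wait_guard_time, transmit_slot) are illustrative and do not correspond to actual entities in the VHDL design.
Example Code (illustrative sketch):
#include <stdint.h>

typedef struct {
    uint32_t switch_config;         /* SwitchConfiguration */
    uint64_t time_to_send_optical;  /* TimeToSendOptical   */
    /* ... remaining slot fields ... */
} slot_t;

extern uint64_t fpga_time(void);             /* 64-bit FPGA counter        */
extern void configure_switch(uint32_t cfg);  /* load the optical switches  */
extern void wait_guard_time(void);           /* switch + CDR settling time */
extern void transmit_slot(const slot_t *s);  /* slot data, padding, CRC    */

void handle_slot(const slot_t *s)
{
    uint64_t t = s->time_to_send_optical;
    if (t != 0 && t < fpga_time())
        return;                          /* case 1: too late, drop the slot    */
    while (t != 0 && fpga_time() < t)
        ;                                /* case 3: wait until the slot is due */
    configure_switch(s->switch_config);  /* cases 2/4, a) load switch config   */
    wait_guard_time();                   /* b) account for the guard time      */
    transmit_slot(s);                    /* c) slot data, padding bits, CRC    */
}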
Figure 6.2 provides a block diagram of the custom design of the slot transmission circuit within the FPGA. The
ASB FIFO shown in this figure is located in the Altera Stratix chipset on the development board. This chipset is
responsible for obtaining the slot information from the Ethernet interface. The information displayed to the right
of the ASB FIFO is integrated within the Stratix GX chipset on the development board. A custom state machine
provides control signals to each FIFO (ASB, Assembly and TCVR). The Assembly FIFO is dimensioned at
slightly more than twice the size of the slot (i.e. 512 words deep, where 1 word = 32 bits). It is responsible for
temporarily buffering the slot for the time the state machine needs to decode the necessary slot
information for processing. The decoding of the information from the slot is executed by a synchronous Mealy
state machine that generates control signals for demultiplexing each slot data field and for the read and write
operations to the FIFOs.
[Figure: the ASB FIFO (interface to the PC) feeds a 512-word Assembly FIFO controlled by the Assembly AAPN Tx state machine; a 64-bit FPGA Time counter, Clock Difference Calculator, Tx Enable Calculator and CRC Calculator & Comparator process the PC Sending Time, TimeToSendOptical and slot fields, producing the 6-bit Switch Config signal and the sending time, current time, slot payload, slot padding and CRC value forwarded to the TRCVR FIFO/interface to the core; all blocks run on the 50 MHz clock]
Figure 6.2. AAPN slot transmission block diagram
The control information is processed first (refer to Figure 4.1). The switch configuration state
(SwitchConfiguration) is the first element decoded and the 6-bit signal is latched. The latch is required in
order to provide a constant voltage to the optical switch drivers. The PCSendingTime data from the slot is then
decoded. This data field is transmitted to the “Clock Difference Calculator”, which takes the current FPGA time
(obtained from a 64-bit counter) and subtracts the PCSendingTime to calculate the ClockDifference field.
The “Clock Difference Calculator” block is a 64-bit subtractor circuit and the output is latched to allow the data to
be retrieved by the “Slot Receiver” module.
The first instance of the TimeToSendOptical data field from the slot is demultiplexed and sent to an input of a
64-bit adder circuit. The second input to the adder block is a hard-wired value (constant = 0xC4E2), which
represents the number of 20 ns clock cycles that make up the guard time (switch reconfiguration time plus
clock data recovery time), which is needed for the optical receiver to operate properly at the edge node. This
time is compared with the current 64-bit FPGA time (counter) and, as soon as the times are equal, the transmitter
enable signal is set HIGH to allow the actual AAPN slot data to be transferred to the optical transceiver block.
The remaining data fields that are then decoded represent the actual AAPN data slot and their contents are just
transferred to the optical transceiver module unchanged and unprocessed. The state machine selects the
appropriate port on the multiplexer to arrange that the correct sequence is transmitted to the transceiver FIFO
(refer to Figure 4.1 and Figure 4.5):
1. The second instance of the TimeToSendOptical
2. The TimeReceivedOptical
3. The rest of the meaningful data sent by the PC ("other data")
4. The padding generated to fill the 200 µs optical slot, a sequence of ‘1’s and ‘0’s (4122 words)
5. The 32-bit Cyclic Redundancy Check value to provide bit verification at the receiver
Three of the fields (SwitchConfiguration, PCSendingTime, TimeToSendOptical) in the control
information just processed are then discarded and the ClockDifference field is stored in the FPGA for future
use. The cyclic redundancy check method used is described in [13] [14] and [15].
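The following sketch restates the arithmetic of the “Clock Difference Calculator” and the transmit-enable comparison in C. Only the constant 0xC4E2 comes from the design; the function names are illustrative.
Example Code (illustrative sketch):
#include <stdint.h>

#define GUARD_CYCLES 0xC4E2u  /* number of 20 ns clock cycles of guard time */

/* 64-bit subtractor: current FPGA time minus PCSendingTime */
uint64_t clock_difference(uint64_t fpga_time, uint64_t pc_sending_time)
{
    return fpga_time - pc_sending_time;  /* latched for the Slot Receiver */
}

/* The transmitter enable goes HIGH as soon as the current FPGA time
 * equals TimeToSendOptical plus the guard time. */
int tx_enable(uint64_t fpga_time, uint64_t time_to_send_optical)
{
    return fpga_time == time_to_send_optical + GUARD_CYCLES;
}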
6.2.1. Algorithmic State Machine
Figure 6.3 and Figure 6.4 show the Algorithmic State Machine (ASM) of the module, which depicts all the system
operations performed by the module. The top level entity signals shown in the ASM are defined in Table 6.1, the
data path signals are defined in Table 6.2 and the control path signals are defined in Table 6.3.
Figure 6.3. Algorithmic State Machine of the Slotted Transmission module - part 1
Figure 6.4. Algorithmic State Machine of the Slotted Transmission module - part 2
6.2.2. Hardware blocks
Figure 6.5 depicts the main hardware blocks of the Slotted Transmission module and the Data Path block is
shown in Figure 6.6. Please zoom in for a detailed view of the figures.
Figure 6.5. Main hardware blocks of the Slotted Transmission module
Figure 6.6. Data path block of the Slotted Transmission module
Table 6.1. Top level entity signals of the Slotted Transmission module
Signal name Type Description
clock input, std logic 50 MHz clock
FPGA_Time input, std logic vector 64 bit actual FPGA time
tx_full input, std logic status signal from the transceiver
wrreq output, std logic write request control signal to the Transmitter module
data_in input, std logic vector 32 bit data input from ASB FIFO
mux_data_out output, std logic vector 32 bit data out from the Slotted Transmission module
CD_result output, std logic vector 64 bit result of Clock Difference, to be read by the Receiver module
Switch_config output, std logic vector 6 bit output signals to optical switches
en_tx output, std logic output enable signal to Transmitter module
CRC_out output, std logic vector 32 bit output CRC result to Transmitter module
RDA input, std logic input signal indicating Read Data Available from ASB FIFO
Table 6.2. Data path signals of the Slotted Transmission module
Signal name Type Description
clock input, std logic clock
FPGA_Time input, std logic vector 64 bit actual FPGA time
reset input, std logic reset signal
rst_counter input std logic resets all module counters
rdreq input std logic read request control signal
ld_switch input std logic control signal to the temporary switch register
ld_tx_time_msb input std logic control signal to the tx time msb register
ld_tx_time_lsb input std logic control signal to the tx time lsb register
ld_pc_time_msb input std logic control signal to the pc time msb register
ld_pc_time_lsb input std logic control signal to the pc time lsb register
ld_CD input std logic control signal to the clock difference register
ld_main_sw_config input std logic control signal to the main switch configuration register
en_CRC input std logic enable signal for CRC
en_slot_counter input std logic enables slot counter
en_pad_counter input std logic enables pad counter
mux_sel input std logic vector 2 bit control signals for multiplexer
slot_rdy out std logic status signal indicating that slot is ready
time_to_tx out std logic status signal indicating the time when switch is loaded
rdy_to_tx out std logic enable transmitter signal
slot_sent out std logic status signal indicating that the slot data is sent
pad_sent out std logic status signal indicating that the pad bits are sent
data_in input std logic vector 32 bit input data
mux_data_out output std logic vector 32 bit output data from the multiplexer
CD_result output std logic vector 64 bit clock difference result
Table 6.3. Control path signals of the Slotted Transmission module
Signal name Description
clock, reset global signals
tx_full status signal from the Transmitter block
wrreq control signal to the Transmitter block
slot_rdy, time_to_tx, rdy_to_tx status signals from Data Path to Control Path
slot_sent, pad_sent output status signals from Data Path
rst_counters, rdreq, ld_switch output control signals
ld_tx_time_msb, ld_tx_time_lsb output enable signals for Time to Send Optical
ld_pc_time_msb, ld_pc_time_lsb output enable signals for PC Sending Time
ld_CD, ld_main_sw_config output enable signal for the CD result and switch configuration
en_tx, en_CRC output signals to Transmitter block
en_slot_counter,en_pad_counter enable signals to slot and pad counters
mux_sel select signals for multiplexer to Data Path
The hardware design blocks of the Slotted Transmission module are shown in Figure 6.7 (main blocks), Figure
6.8 (Clock Difference block) and Figure 6.9 (Padding block and Slot Transmission block). Please note that the
ASB is called "UART" here.
Figure 6.7. Main hardware design blocks of the Slotted Transmission module
Figure 6.8. Clock difference hardware design block of the Slotted Transmission module
Figure 6.9. Comparator blocks for the Padding hardware design block and the Slot Transmission
hardware design block of the Slotted Transmission module
7. Optical Transmission and Reception
7.1. Overview
Optical transmission is the process of transmitting a slot at a particular edge node towards the optical core of the
network by performing a parallel-to-serial conversion (SERDES), encoding it using 8B/10B and converting the
digital data from logical electrical bit form to an equivalent digital bit form in the optical domain. The electrical-to-
optical conversion is done by a distributed feedback (DFB) laser, directly modulated within a pluggable
transceiver module. At the destination edge node, the light is captured with the use of PIN diodes, which convert
the optical photon energy back into photocurrent. The photocurrent can then be converted to an electrical
voltage which can be interpreted as digital pulses of the data stream.
The process of conversion to the optical domain modulates the bias current of the laser by a small percentage
and in doing so, light energy experiences amplitude dependence proportional to the bit stream on/off sequence.
These optical components and the driving circuitry reside within a compact module form factor called the small
form-factor pluggable (SFP) transceiver. The SFP contains the electronic circuitry needed to drive and modulate the laser and
convert the photocurrent from the receiver to the necessary voltage. This section will not go into the details of this
process; rather, it will focus on the digital state machine needed to provide data directly to the SFP from
the FPGA and on the configuration of the SFP.
7.2. Detailed description
The FPGA development board provides direct access to the SFP transceivers via an AC-coupled differential
interface at 100 ohm impedance from the Stratix GX FPGA. The differential pair provides noise immunity from
any common-mode noise that may appear between the devices. The optical interface is limited to line
rates between 622 Mbps and 2.7 Gbps. The interface does, however, allow pre-emphasis, programmable
terminations and an equalizer to account for any data skew that may occur on the evaluation PCB interface of
both devices. The effects were minimal and no pre-emphasis was provisioned on the transmitter.
In order to properly receive the incoming data stream, a clock must be derived so that the data can be sampled
appropriately. To derive such a clock the output voltage produced by the SFP receiver is provided to the Stratix
GX FPGA and an internal programmable digital clock-data recovery (CDR) circuit is used. The CDR is itself a
phase-locked loop (PLL) using an internal oscillator on the board as a reference clock. The frequency difference
threshold for the CDR (i.e., the allowed difference between the reference and the data) was provisioned to be
1000 ppm for the demonstrator to allow for low-quality oscillators used on the evaluation board. When the CDR
synchronizes onto the incoming data, a “lock” status signal is raised.
Figure 7.1. Stratix GX Serializer and Deserializer
On the transmitter side, a phase-locked loop is used to derive a stable clock for the data transmission. The PLL
here derives both a “fast clock” and a “reference clock”. The global clock was chosen to drive the PLL because it
has the same frequency as the internal logic of the FPGA. The maximum frequency of the digital transceiver
interface was proven to be close to 100 MHz in simulation. The “fast clock” is used to transmit the data of the
fast register in the serializer and the reference clock is used for the rest of the digital components of the
transceivers including the CDR.
The data stream is serialized prior to transmission and deserialized at reception because the maximum
frequency of the digital logic is much lower than the serial data rate. As shown in Figure 7.1, the high
speed clock of the deserializer on the Stratix GX FPGA is obtained from the received data stream using the CDR
and the low speed clock is derived by dividing the fast clock, by 10 in this case. The deserializer of the receiver
shifts the received data serially into registers driven by the fast clock. The slow clock loads the data
present inside the register in parallel in order to be sent to the other digital components of the receiver.
The serializer of the transmitter requires a PLL that is able to derive a fast and a slow clock. As shown in Figure
7.1, the slow clock loads the parallel data from the digital components of the transceiver into fast registers of the
serializer every n cycles, in this case n=10. The fast clock shifts the bits present in these registers to be
transmitted.
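As an aside, the 10:1 serializer behaviour described above can be modelled as follows. This is a behavioural sketch, not the transceiver megafunction, and the bit order shown (LSB first) is an assumption.
Example Code (illustrative sketch):
#include <stdint.h>

typedef struct {
    uint16_t shift_reg;  /* fast register of the serializer            */
    int bit_count;       /* bits left before the next slow-clock load  */
} serializer_t;

/* One fast-clock cycle: load a 10-bit word every n = 10 cycles (the
 * slow clock) and shift one bit out per cycle (the fast clock). */
int serializer_tick(serializer_t *s, uint16_t parallel_in_10b)
{
    if (s->bit_count == 0) {          /* slow-clock parallel load */
        s->shift_reg = parallel_in_10b;
        s->bit_count = 10;
    }
    int bit = s->shift_reg & 1;       /* fast-clock serial shift  */
    s->shift_reg >>= 1;
    s->bit_count--;
    return bit;
}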
The FPGA also contains an 8B/10B encoder/decoder that is used to encode the digital data stream before it is
serialized, and decode it after it is deserialized. The encoding converts an 8-bit data sequence to a 10-bit data
sequence by stuffing bits with a specific coding algorithm that removes long
runs of “1”s and “0”s from the data; a situation that is detrimental to a receiver due to the nature of the clock
recovery circuitry: the circuitry expects equally probable transitions of “1” and “0” but, in the event of a long run of
“1”s or “0”s, the signal level produces a “DC” voltage which causes the CDR circuitry to fail in deriving a clock
from the data.
7.3. GXB Transmitter Interface
To follow the transmission protocol, a Finite State Machine (FSM) has been designed that indicates to the
transmitter when to send the synchronization stream, the end-of-transmission stream and the actual data.
As shown in Figure 7.2, the initial state after reset is DISABLE. In this state the transmitter sends the EOT
stream (see Figure 4.5, Contents of the AAPN time slot) continuously, since the current transmission is
finished and there is no data to be sent. The receiver is thus informed that the current transmission is done and does
not wait for more packets (during this time it is possible to perform a switching operation over the medium).
Figure 7.2. Finite State Machine of the GXB Transmitter Interface
When the transmitter is enabled it goes into the WAITING state, in which synchronizing streams (SYNCH) are
sent. This state is used to synchronize the receiver before sending packets and when the transmission FIFO is
empty (i.e. when the transmitter is idle). The transmitter can be forced to go into this state, even if the
transmission FIFO is not empty, when the send_sync signal is logic ‘1’. As can be seen, the send_sync signal
cannot hold the transmitter in the WAITING state for more than one clock cycle, which ensures that the
transmitter is not idling when there is data to be transmitted.
When the transmission FIFO is not empty, the next state the FSM goes into is the transmit state (TX), in which
one segment of a packet is sent per clock cycle. The transmitter goes into the TX_DONE state after the
transmission of the last segment. This state is used to signal that the transmission of a packet is completed;
afterwards, if the transmitter is enabled, it goes into the TX or WAITING state. The transmitter goes back into the TX
state if the transmission FIFO is not empty and no request to send a synchronization stream has been made;
otherwise, it goes into the WAITING state. When the transmitter is disabled, it goes directly into the DISABLE
state. In order to ensure completion of the current packet transmission, there is no transition from the TX state to
the DISABLE state. A minimal sketch of this next-state logic follows.
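The sketch below models these transitions in C; the state and signal names follow the text, but this is only a behavioural sketch of the VHDL state machine, not the actual code.
Example Code (illustrative sketch):
typedef enum { DISABLE, WAITING, TX, TX_DONE } tx_state_t;

tx_state_t tx_next_state(tx_state_t state, int enable, int fifo_empty,
                         int send_sync, int last_segment)
{
    switch (state) {
    case DISABLE:                        /* sends the EOT stream        */
        return enable ? WAITING : DISABLE;
    case WAITING:                        /* sends the SYNCH stream      */
        if (!enable) return DISABLE;
        return fifo_empty ? WAITING : TX;
    case TX:                             /* one segment per clock cycle */
        return last_segment ? TX_DONE : TX;
    case TX_DONE:
        if (!enable) return DISABLE;
        return (!fifo_empty && !send_sync) ? TX : WAITING;
    }
    return DISABLE;
}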
The diagram of the GXB hardware block is depicted in Figure 7.3. The only signals available to the user are the
following: enable, reset, clk, send_sync, data_in, active, load, analyzer_req,
tx_ctrl_enable and data_out. The FSM is shown at the top of the diagram with its input and output
signals indicating the current state. The block diagram contains a counter, comparator, register, encoder and
multiplexers which are required to transmit according to the protocol.
The counter in Figure 7.3 is used to know when the last segment will be transmitted. A comparator is used to
verify if the counter output (count) is equal to the number of segments minus two, since it starts counting at 0. If
the count value is equal to the next-to-last segment, the last_segment signal has a logical value of ‘1’; else it
is ‘0’. The same signal can be used to request the traffic generator to generate the next packet (it takes a clock
cycle for the generator to generate the next packet), together with the analyzer request signal
(analyzer_req). This counter increments in the TX state; otherwise it is synchronously cleared.
Prior to the transmission, the current packet to be transmitted is loaded onto the transmitter buffer register
(tx_buffer). For simplicity, the content of this register is loaded when the state is not TX. In the TX state, the
data is shifted to the right by 16 bits which is equal to the segment size. The 16 LSBs, tx_buffer[n-1..0]
are used for the transmission of the current segment.
An encoder is used to generate the select signal for the data stream multiplexer (whose output is data_out),
which also selects the appropriate type of character, tx_ctrlenable, to be sent (data or control). When the
transmitter is in states TX and TX_DONE, the input 0 of the multiplexers is selected. The character sent over
tx_ctrlenable has a logical value of ‘0’ and the tx_buffer[n-1..0] output is selected in order to send a
segment of the packet over data_out.
In the DISABLE state, the transmitter must send the EOT stream and thus input 1 of the multiplexer is
selected. The tx_ctrlenable will have a logical output of ‘1’ because the EOT_STREAM is composed of control
characters. In the WAITING state, the multiplexer’s input 2 is selected in order to send the synchronization
stream (SYNC_STREAM) over data_out, with tx_ctrlenable at logic ‘1’ to indicate that control characters
are sent.
The load signal is used as a read signal for the transmission FIFO (data_in) in order to have a new packet
loaded into the tx_buffer. The load signal is logic ‘1’ when the current state is TX_DONE or WAITING, the
transmission FIFO is not empty and the transmitter is enabled. When a request to send a synchronization stream
is made in the TX_DONE state, the load signal is logic ‘0’. Without this extra verification, one packet would be read
in the TX_DONE state and then another in the WAITING state, which
would result in lost packets. The verification of the enable signal is needed because the transmitter does not send
any packet when it is disabled.
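Expressed as a boolean condition (reusing the tx_state_t type from the sketch above), the load logic just described is, roughly:
Example Code (illustrative sketch):
int tx_load(tx_state_t state, int fifo_empty, int enable, int send_sync)
{
    /* read a new packet from the transmission FIFO into tx_buffer */
    return (state == TX_DONE || state == WAITING)
           && !fifo_empty && enable
           && !(state == TX_DONE && send_sync);
}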
The active signal indicates when the transmitter is not in the DISABLE state. As mentioned previously, the
transmitter does not go into the DISABLE state before the current transmission is completed. When the active
signal is logic ‘0’, the transmitter is disabled and transmits the end-of-transmission stream.
The VHDL code associated with the block diagram details the actual usage. In it, the number-of-segments
variable (NB_OF_SEGMENTS) specifies how many segments form a packet, and its minimum value is four. The
variable NB_OF_SEGMENTS_LOG2 is the logarithm base two of the variable NB_OF_SEGMENTS. The
variable SEGMENT_SIZE has to be set to 16 because two 8-bit characters are sent every clock cycle.
Figure 7.3. Hardware block diagram of the GXB Transmitter Interface
7.4. GXB Receiver Interface
The receiver obeys the transmission protocol using the FSM of Figure 7.4. When the reset signal is applied to
the receiver, the FSM goes to the DISABLE state, where the receiver waits to be enabled (sync_enable), to
receive a signal (i.e. no Loss of Signal (LOS): sync_los is logic ‘0’) and for its Clock Data Recovery (CDR) to
lock onto the incoming data stream, which is indicated by the sync_pll_locked signal. The receiver uses
metastability-hardened flip-flops to ensure proper Clock Domain Crossing (CDC) of these signals. The CDR is locked
to the data only when sync_pll_locked is logic ‘1’. LOS is indicated by sync_los being set to logic ‘1’.
The receiver is enabled when sync_enable is logic ‘1’. The enable signal has to be CDCed because the receiver
constantly changes transmitting source, which is in a different clock domain, as do the signals sync_los and
sync_pll_locked. Before the CDC, these signals have the prefix async instead of sync; the prefixes stand
for asynchronous and synchronous to the receiver clock domain. In any state, if the receiver is either disabled,
loses its incoming signal or is not locked to the incoming data stream, it will go into the DISABLE state.
When the receiver is enabled and it receives a signal with the CDR lock on the data, it goes into the SYNCHING
state. In this state, the receiver waits for a valid synchronization stream to ensure the received words are
properly aligned. A valid synchronization stream is made of the character K28.5 followed by K28.0 which results
in the stream h”1CBC”. When the character K28.5, sync is equal to b“01”, is detected in the eight LSBs and the
eight Most Significant Bits (MSB) contain the character K28.0 which is indicated by the signal sync_detect[1],
the receiver can go into the RECEIVING state. The other possible transition to the RECEIVING state is done by
the detection of the character K28.5, sync[1], in the eight MSBs. In the RECEIVING state, the receiver will
verify if the control character K28.0 is present in the 8 LSBs, sync_detect[0], and the sync_ver signal is set.
The reception of two consecutive alignment characters, K28.5, is an example of protocol error. From the
SYNCHING state, the receiver will synchronize to the most significant synchronization character and go into the
RECEIVING state in order to verify if the synchronization stream is valid. This is done by verifying if
sync_detect[0] and sync_ver are both logic ‘1’.
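As a concrete check, a valid synchronization stream in a received 16-bit word can be sketched as follows; K28.5 encodes as data byte 0xBC and K28.0 as 0x1C, giving the stream h”1CBC” mentioned above. The ctrl flags are assumed to mark which byte positions carry control characters.
Example Code (illustrative sketch):
#include <stdint.h>

int sync_stream_valid(uint16_t word, int ctrl_msb, int ctrl_lsb)
{
    return ctrl_msb && ctrl_lsb &&
           (word >> 8) == 0x1C &&    /* K28.0 in the 8 MSBs */
           (word & 0xFF) == 0xBC;    /* K28.5 in the 8 LSBs */
}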
Figure 7.4. Finite state machine of the GXB Receiver Interface
When the receiver is in the RECEIVING state, it receives packet segments. The receiver has to go into the
SYNCHING state when a protocol error occurs. The receiver FSM monitors for two types of protocol error in the
RECEIVING state: an invalid synchronization stream and the reception of an end of transmission stream. The
former is characterized by the reception of the character K28.5 without K28.0 or vice-versa. When the
synchronization stream is done in the current clock cycle, sync_detect[1] and sync[0] have to be both
logic ‘1’, which indicate the presence of the character K28.0 and K28.5 in the 8 MSBs and LSBs respectively.
The protocol error can be detected by taking the xor of the signals sync_detect[1] and sync[0]. When the
character K28.5 is detected in the 8 MSBs, sync[1], the sync_ver signal is set to logic ‘1’. At the next clock
cycle, the receiver expects the character K28.0 in the 8 LSBs, sync_detect[0]. A protocol error can be
detected by taking the xor of the signals sync_ver and sync_detect[0]. The reception of an end of stream
character in the 8 LSBs, eot[0], will force the receiver to go in the SYNCHING state. The only case where the
receiver will not go into the SYNCHING state is when the character K28.5 is detected in the 8 MSBs, sync[1]
because it is considered as being sent more recently. The reception of an EOT stream character in the 8 MSBs,
41
eot[1], will force the receiver to go immediately in the SYNCHING state because the character is considered
as being sent more recently.
In the FSM, two more conditions other than protocol errors have to be verified when the receiver is in the
RECEIVING state. The first condition to verify is whether the current segment being sent is the last one. This is
indicated by the last_segment signal being logic ‘1’. When it is not the last segment, the receiver remains in
the RECEIVING state; otherwise it can go into the ALIGNMENT_RX or RX_LAST state. The last condition is based on
the alignment signal. When the alignment signal is logic ‘1’, the receiver goes into the ALIGNMENT_RX state;
otherwise it goes into the RX_LAST state.
Before proceeding to the ALIGNMENT_RX state of Figure 7.4, the two possible cases of alignment have to be
explained. Figure 7.5 a) and b) show the cases where the alignment character was aligned to the 8 LSBs and
MSBs respectively, when it takes four segments to form a packet. Each row represents a packet segment being
received during one clock cycle. In a) it is possible to see that a full segment is received in one clock cycle, so it
takes four clock cycles to receive a complete packet. In b) only half a segment is received per clock
cycle because of the synchronization stream, so it takes five clock cycles to receive the entire packet. The
last cycle of the reception contains the first half of the first segment of the next packet. The ALIGNMENT_RX
state was added for this extra clock cycle it takes to receive a packet.
Figure 7.5. Packet Alignment Scenarios with Four Segments Packets
In the ALIGNMENT_RX state of the FSM of Figure 7.4, the receiver checks for protocol errors, but it is
important to know when the current packet reception is completed. According to Figure 7.5 b), the 8 LSBs
received during the ALIGNMENT_RX state are part of the current packet being received. Knowing this,
any control character received in the 8 LSBs is considered a protocol error. When there is no
alignment character, K28.5, in the 8 MSBs and any control character is detected in the 8 LSBs, the receiver is
forced to go into the SYNCHING state by the second condition of the FSM. The sync_detect[1] and sync[0]
signals are XORed in order to detect an invalid synchronization stream, i.e. the reception of the control
character K28.0 without K28.5 or vice-versa. The third condition takes into consideration that the 8 MSBs contain
the alignment character K28.5, sync[1], and the 8 LSBs contain any control character. The receiver is forced
to go into the RECEIVING state because the alignment character in the 8 MSBs is considered as being sent more
recently than the 8 LSBs. In the third condition, the detection of a valid synchronization stream in the 8 LSBs will
force the receiver to go directly into the RECEIVING state. If the receiver is not sent into the SYNCHING or
RECEIVING state, it goes into the RX_LAST state.
The RX_LAST state of Figure 7.4 is used to indicate that a packet has been completely received and is ready to be
stored in a FIFO. In this state, any protocol error or end-of-transmission stream will force the receiver to go into
the SYNCHING state. The detection of an end-of-stream character in the 8 MSBs will force the receiver to go
directly into the SYNCHING state because it is considered as being sent more recently. In the absence of an
alignment character in the 8 MSBs, sync[1], an end-of-transmission stream in the 8 LSBs, eot[0], will force
the receiver to go into the SYNCHING state. An invalid synchronization sequence, such as the presence of the
alignment character in the 8 LSBs, sync[0], without the control character K28.0, sync_detect[0], or vice-
versa, will force the receiver to go into the SYNCHING state. Similarly, the presence of the control character K28.0
in the 8 LSBs, sync_detect[0], without the sync_ver signal being logic ‘1’, or vice-versa, will force the
receiver to go into the SYNCHING state. When an end-of-transmission request is made (i.e. the signal eot_req is
logic ‘1’) and no alignment character is present, the receiver has to go into the SYNCHING state. When the alignment
character is present in the 8 MSBs, the receiver has to go into the RECEIVING state, where the synchronization
request will be validated. When the receiver is not forced to go into the SYNCHING state, it continues the
packet reception in the RECEIVING state.
The FSM of Figure 7.4 gives priority to the following events, in decreasing order (a simplified sketch of the
resulting next-state logic is given after this list):
1. The enable signal, the LOS detection signal and the CDR lock signal: the receiver is disabled.
2. A control character, K28.5 or EOT, in the 8 MSBs is assumed to be more recently sent than the 8
LSBs. For example, if two synchronization characters K28.5 are present, only the one in the 8 MSBs will be taken
into consideration. At the next clock cycle, there will be a verification for the presence of the character
K28.0.
3. The presence of a control character in the 8 LSBs will force the receiver to verify the synchronization
stream or end the current reception.
4. The reception of the actual data.
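The simplified sketch below condenses these priorities into a next-state function; the many per-byte checks described in the text are collapsed into the sync_valid, proto_error and eot flags, so this is an approximation of the FSM, not a faithful model.
Example Code (illustrative sketch):
typedef enum { RX_DISABLE, SYNCHING, RECEIVING, ALIGNMENT_RX, RX_LAST } rx_state_t;

rx_state_t rx_next_state(rx_state_t state, int enable, int los, int pll_locked,
                         int sync_valid, int proto_error, int eot,
                         int last_segment, int alignment)
{
    if (!enable || los || !pll_locked)
        return RX_DISABLE;                 /* priority 1, in any state   */
    switch (state) {
    case RX_DISABLE:
        return SYNCHING;
    case SYNCHING:                         /* wait for valid K28.5+K28.0 */
        return sync_valid ? RECEIVING : SYNCHING;
    case RECEIVING:
        if (proto_error || eot) return SYNCHING;   /* priorities 2 and 3 */
        if (!last_segment)      return RECEIVING;
        return alignment ? ALIGNMENT_RX : RX_LAST;
    case ALIGNMENT_RX:
        return proto_error ? SYNCHING : RX_LAST;
    case RX_LAST:                          /* rx_done asserted here      */
        return (proto_error || eot) ? SYNCHING : RECEIVING;
    }
    return RX_DISABLE;
}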
The receiver block diagram associated with the FSM of Figure 7.4 is depicted in Figure 7.6. The reset signal is
the only global signal. The input signals from the transceiver instantiation are clk, sync, ctrl_char,
rx_in, async_los and async_pll_locked. The input signal async_enable is used to enable the
receiver. The output signals synching_rx, sync_los, sync_pll_locked and rx_done are used to know
the status of the transceiver interface. The signal rx_done is used for writing the incoming packet onto the
receiver FIFO. The currently received packet is output on the signal rx_out.
In the VHDL code associated with the block diagram of Figure 7.6, the number-of-segments variable
(NB_OF_SEGMENTS) specifies how many segments form a packet; the minimum it can be set to is
four. The variable NB_OF_SEGMENTS_LOG2 is obtained by taking the logarithm base two of the variable
NB_OF_SEGMENTS. The last variable is the segment size, which has to be set to 16 because two 8-bit
characters are sent every clock cycle.
The first component to be described in Figure 7.6 is the alignment register, which has a value of 0 or 1 when the
reception looks like the diagram of Figure 7.5 a) or b), respectively. It outputs the alignment signal, which
indicates to the FSM of Figure 7.4 whether to go into the ALIGNMENT_RX or RX_LAST state from the RECEIVING state.
It is set to logic ‘1’ when the alignment character is detected in the 8 MSBs, sync[1], and the receiver is not in
the DISABLE state. It is synchronously cleared when there is no alignment character present in the 8 MSBs,
sync[1], because that position is considered as being sent more recently in time. To be synchronously cleared, it
also requires an alignment character to be present in the 8 LSBs, sync[0], and that the receiver not be in the
DISABLE state.
The receiver buffer in Figure 7.6, rx_buffer, is used to shift right the incoming segments, rx_in, upon their
arrival. Its width is the packet length plus half a segment size. This extra width is necessary when the receiver is
aligned like in Figure 7.5 b) which means half a segment is received per clock cycle. In this case, it would
contain the segment_0_0 in the 8 LSBs followed by the remaining segments and the 8 MSBs would be part of
the next packet. The ALIGNMENT_RX state of Figure 7.4 forces the receiver to shift the incoming packet to the
right one more clock cycle and store the segment_0_0 in the 8 LSBs. When the receiver receives a complete
segment in the current clock cycle like in Figure 7.5 a), the packet is contained in the MSBs of the receiver buffer
(rx_buffer). The alignment register is used to control a two-input multiplexer which takes these two alignment
cases and outputs the received packet properly through rx_out. When the alignment register has a value of 0,
the MSBs of the receiver buffer are selected; else it is the LSBs of the receiver buffer. The output of this
multiplexer has to be fed to the receiver output FIFO.
A register is needed in order to validate the synchronization stream at the next clock cycle when the alignment
character K28.5 is detected in the 8 MSBs. As explained using Figure 7.4, when the sync_ver signal
is logic ‘1’ the receiver expects the control character K28.0 in the 8 LSBs. In Figure 7.6, this register is
named sync_ver and is synchronously set when the alignment character is detected in the 8 MSBs, sync[1],
and the current state is not DISABLE. The register is synchronously cleared whenever the 8 MSBs do not contain
an alignment character or the receiver is in the DISABLE state.
In the RX_LAST state of the FSM of Figure 7.4, the receiver has completed the reception of a packet. In this
state, the signal rx_done has a logical value of “1” in Figure 7.4 which indicates that the packet is ready to be
written in the output FIFO. The RX_LAST state signal from the state machine is used to generate the rx_done
signal.
Figure 7.6. Hardware block diagram of the GXB Receiver Interface
When the receiver is in the DISABLE or SYNCHING state, it does not receive data. In Figure 7.4 the signal
synching_rx is generated when the state machine is not in the DISABLE or SYNCHING state. This signal is
useful to know whether or not the receiver is receiving data.
In the receiver block diagram of Figure 7.6, a counter is used in order to know when the last
segment arrives. It is synchronously reset in the DISABLE state because there is no segment being received.
The detection of an alignment character (i.e. the sync signal has a value different from b“00”) will synchronously
reset the counter because it signifies the beginning of a new transmission. The counter is synchronously reset
when an end-of-transmission character is detected in the 8 LSBs, eot[0]. When the EOT character is present
in the 8 LSBs, it signifies that the current transmission is completed and there are no valid segments in the receiver
buffer (rx_buffer). The counter is synchronously reset when an end-of-transmission character is detected in
the 8 MSBs and the current state is not ALIGNMENT_RX. This comes from the fact that the receiver has
received a complete packet but has to go into RX_LAST in order to signal that its reception is completed with the
rx_done signal. The counter is also synchronously reset during the reception of the last segment in the RECEIVING state when
the alignment signal has a value of 0. The last segment is received in the RECEIVING state when the
alignment signal is ‘0’ according to Figure 7.5 a), and the counter can be synchronously reset for this reason
when the signal last_segment is logic ‘1’. The counter is synchronously reset when an end-of-transmission
request, eot_req, is made. It is incremented when the request to synchronously clear it is not present and the
current state is RECEIVING or RX_LAST. In these states the receiver receives segments and it needs to keep
count. When the receiver is in the ALIGNMENT_RX state and there is no request to reset it, its value is set to 1
because it has already received half a segment.
The counter output count in Figure 7.6 is used for detecting when the last segment is received. An equality
comparator is used to detect when the count output is equal to the number of segments forming a packet
minus one. This comparator generates the last segment signal (last_segment), which is used by the state
machine and for synchronously resetting the counter. Four equality comparators in Figure 7.6 are used to detect the
reception of the K28.0 characters of a synchronization stream (sync_detect[1..0]) and also to detect the end-of-
transmission K28.6 characters (eot[1..0]). A comparator is needed for the detection of a control character
in the 8 MSBs and LSBs using the receiver data input (rx_in) and the control character detection signal
(ctrl_char[1..0]). K28.6 and K28.0 are control characters and the ctrl_char[1..0] signal is used to
distinguish them from regular characters. An extra precaution was taken by verifying that the alignment character,
sync[1..0], was not detected at the same time. The detection of an alignment character with a different
disparity would result in an unknown character that could be interpreted as the K28.6 or K28.0 character.
In Figure 7.6, a register is used for registering the end-of-transmission character when it is received in the 8
MSBs in the ALIGNMENT_RX state. According to Figure 7.5 b), the last half segment can be received together with
an end-of-transmission segment. The receiver has to go into the RX_LAST state in order to signal that the packet
has been received correctly with the rx_done signal. Then, from the RX_LAST state, the counter can be reset.
This register is set to one only if the end-of-transmission character is present in the 8 MSBs (eot[1]) and there
is no end of transmission present in the 8 LSBs. When an end-of-transmission character is present in the 8 LSBs
in the ALIGNMENT_RX state, it is considered a protocol error and there is no need for an end-of-transmission
request. When the end-of-transmission request register is not being set, it is synchronously reset.
8. Slot Reception
8.1. Overview
After the optical-to-electrical conversion is done by the optical receiver, the processing of the slot information
occurs in the FPGA where the digital data stream is buffered and decoded. This chapter describes the details of
the slot processing once the data stream is buffered in the FPGA memory FIFOs. The processing requires the
appropriate state machine architecture to control what is decoded from the incoming slot and what is put
together to assemble the slot that will be sent to the PC module of the edge node, which includes adding the
time the slot was received, the CRC status and the PC-FPGA clock difference. The Slot Reception module also
needs to consider slot entry allowance control for the incoming and outgoing slots from the buffers and, finally,
synchronization so that the slot is processed with a clock that is equivalent to the transmission clock.
8.2. Detailed description of slot reception
The slot receiver for an AAPN slot is primarily responsible for taking an AAPN slot from an optical signal,
converting it to an electrical signal, digitally processing the contents of the slot and forwarding it to the
PC, where the contents of the bit stream are analyzed and used at a higher level of the network hierarchy. This
chapter focuses on the “slot receiver processor”.
Figure 8.1 captures the general block diagram of the slot receiving process of the hardware module of the AAPN
prototype. The block diagram shows the functions required for the optical signal to be converted, transmitted and
buffered electrically within an FPGA development board. The data stream then travels through a series of logic circuits located within
two FPGAs so that it can propagate to the PC via an Ethernet link. The architecture shows two
FPGAs but the design could have been constructed with only one FPGA. The reason for having two is merely a
result of the restrictions of the printed circuit evaluation board used for the project. The first FPGA transfers
Ethernet frames between the PC and the FPGA development board. The second FPGA is designed to contain
the ASB receiver needed to accept the transmitted packets from the first FPGA. The second FPGA also contains
the remaining logical circuits needed for clock recovery, deserializing, buffering and, finally, the additional
processing circuits required to analyze the incoming slot data stream. Clock extraction is done on the receive
path so that all timing is equivalent to the transmitter clock.
[Figure: Optical Rx (O-E conversion) → CDR → SERializer-DESerializer → 8B/10B Decoder → TRCVR FIFO → Slot Receiver Processor → ASB buffer and transmitter on FPGA #2, followed by the ASB-to-Ethernet PC transmitter on FPGA #1]
Figure 8.1. AAPN slot receiver block diagram
The “slot receiver processor” (Figure 8.2) is primarily responsible for extracting particular fields from the
incoming AAPN slot and recreating a new slot with these fields plus additional fields needed by the higher layer
software algorithms at the PC end. It uses a custom state machine to create control signals that connect to a
“slot field” demultiplexer. The sequence of outputs of the demultiplexer (on the right side of the figure) is, as
defined in Figure 4.1 and Figure 4.2:
1. sending time (TimeToSendOptical),
2. received time (TimeReceivedOptical), still set to zero
3. slot payload ("other data"),
4. slot padding,
5. the CRC value calculated at the source node just before transmission
[Figure: the TRCVR FIFO/interface from the core feeds a demultiplexer producing sending time, received time, slot payload, slot padding and CRC value; a CRC Calculator & Comparator, a 64-bit FPGA Time counter and the clock difference input feed the Re-assembly AAPN Rx State Machine, which assembles sending time, current time, slot payload, CRC status and clock difference into the Assembly FIFO (253 words × 32 bits) and on to the ASB FIFO/interface to the PC; all blocks run on the 50 MHz clock]
Figure 8.2. AAPN slot receiver processor block diagram
The control signals are generated by a series of decoders at the outputs of several counters, where each control
signal represents the value of the counters when they reach the length, in 4-byte words, of each specific slot field.
Each AAPN slot field is obtained from the transceiver FIFO buffer (after the O-E conversion).
For the assembly of the new slot required to be transmitted to the PC, the custom state machine also generates
the control signals and transfers them to a multiplexer (on the left side of the figure) and a FIFO buffer where it
selects the “slot field” and writes this into the buffer. This buffer is called the “Assembly FIFO” and is sized
according to the number of 4-byte words equivalent to the length of the new slot and therefore only stores one
slot.
The “slot receiver processor” is synchronized with a common clock derived from the signal received from the
optical core, and the data is forwarded directly to the “Assembly FIFO” using the same data width as the input
from the transceiver FIFO. Since the communication path can potentially corrupt the data being received,
a 32-bit cyclic redundancy check (CRC) circuit [13] [14] [15], referred to as the “CRC calculator”, is also included
after the demultiplexer. The custom state machine also provides a control signal to the “CRC calculator” block so
that it is activated when each “slot field” is read from the demultiplexer and disabled otherwise. The “CRC
calculator” is asynchronous and requires these control signals to synchronize the computation with the correct
“slot field” of the received signal. A comparator is used to compare the calculated CRC with the CRC received
within the slot. One of the slot fields created for the recreated slot to be transmitted to the PC is the
“CRC status” (TransmissionError). If the output of the comparator shows that the received CRC and
the computed value are the same, a logical ‘1’ will be set; otherwise, a logical value of ‘0’ will be set.
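For reference, a generic bitwise CRC-32 computation and the comparison just described can be sketched as below. The polynomial and bit ordering shown (the common reflected polynomial 0xEDB88320) are illustrative; the exact parameters used by the hardware are those of [13] [14] [15].
Example Code (illustrative sketch):
#include <stdint.h>
#include <stddef.h>

uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1));
    }
    return ~crc;
}

/* CRC status field: ‘1’ when the received CRC matches the computed one */
int crc_status(uint32_t computed, uint32_t received)
{
    return computed == received;
}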
The sequence of outputs of the multiplexer (on the left side of the figure) are, as defined in Figure 4.2:
1. sending time (TimeToSendOptical),
2. current time (which becomes TimeReceivedOptical),
3. slot payload ("other data"),
4. CRC status (which becomes the CRC field including the TransmissionError bit)
5. clock difference (ClockDifference) as it was calculated when the last slot arrived from the PC
module
The “FPGA Global Time” represents the actual time in units of counts and is used to produce the "current time"
(TimeReceivedOptical). The FPGA Global Time is a 64-bit number at the output of a counter which uses a
50 MHz reference oscillator. At this clock rate, the counter value should roll over after approximately 11,690
years: (2^64 − 1) × 20 ns ÷ (60 sec/min × 60 min/hr × 24 hr/day × 365.25 days/year). A large data width was chosen for
this counter so that the state machine of the “slot receiver processor” could consider this scenario negligible.
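The quoted figure can be sanity-checked with a few lines of C:
Example Code (illustrative sketch):
#include <stdio.h>

int main(void)
{
    double seconds = 18446744073709551615.0 * 20e-9;  /* (2^64 - 1) x 20 ns */
    printf("roll-over after ~%.0f years\n",
           seconds / (60.0 * 60.0 * 24.0 * 365.25));  /* ~11690 years */
    return 0;
}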
The “clock difference” slot field is computed by the “slot transmitter processor” circuit when the last slot arrived at
the FPGA module; it is latched so that the “slot receiver processor” can capture the value at the moment it
was originally calculated. Since this is computed before slots are received from the optical core, the “clock
difference” slot field should always represent a valid (positive) number.
Once an entire “new slot” is stored within the “Assembly FIFO”, the custom state machine provides additional
control signals to transfer data to the ASB transmitter buffer. Specifically, read and write enable signals are
triggered to both buffers to allow the transfer to start.
In order for the “slot receiver processor” to be adaptive in cases where: 1) the incoming serial line rate is
faster than the internal buffer bandwidth; 2) the transmitted slot experiences delay before it is received by
the optical receiver; or 3) the ASB transmitter buffer is sized smaller than the “Assembly FIFO” buffer; the
custom state machine de-asserts all output control signals, disabling any read action from the incoming
receiver buffer when it is empty and any write action to the ASB FIFO buffer when it is full. While
the output signals are de-asserted, the state machine is placed into a suspended mode in which it
continues to monitor the full and empty signals until they are de-asserted, allowing the state machine to
resume operation.
Table 8.1 details the signals and their definitions for the “slot receiver processor” and Table 8.2 details the
variables that can be modified to adjust the length of a particular slot field. Here the slot field lengths, in 4-byte
blocks, that can be modified are the “slot payload”, “slot padding” and finally the “Assembly FIFO depth”, which
represents the “new slot” length, also in 4-byte blocks.
Table 8.1. Slot receiver processor signal definitions
Name Type Description
CLOCK Input Global clock for the slot receiver processor state machine
RESET Input puts the state machine in the reset state (active high)
TRCVR_FIFO_DATA_IN Input 32-bit data that arrives from the transceiver FIFO
CLOCK_DIFF_RESULT Input 64-bit data signal calculated by the Slotted Transmission
module, representing the PC-FPGA clock difference
TRCVR_FIFO_EMPTY_STATUS Input a logical one indicates that the transceiver FIFO is empty
UART_FIFO_FULL_STATUS Input a logical one indicates that the ASB FIFO is full
FPGA_GLOBAL_TIME Input FPGA time in units of counts. It is a 64-bit counter
TRCVR_FIFO_RDREQ Output Read request signal for the transceiver FIFO
PRESTORE_UART_FIFO_OUT Output 32-bit data that will be forwarded to the ASB FIFO and
represents the output of the Assembly FIFO
UART_FIFO_WRREQ Output Write request signal for the ASB FIFO
Table 8.2. Modifiable parameters in the slot receiver processor
Name and current value Type Description
payload_fragment
maximum datawidth = 256
actual size = 246
Counter with an
end-limit decoder
Total length, in 4-byte blocks for the slot payload.
datawidth is the maximum size of the counter as
an integer (should be a power of 2).
Actual size is the slot payload length.
padding_fragment_label
maximum datawidth = 8192
actual size = 4122
Counter with an
end-limit decoder
Total length, in 4-byte blocks for the slot padding.
datawidth is the maximum size of the counter as
an integer (should be a power of 2).
Actual size is the slot padding length.
uart_fifo_packet_counter_label
maximum datawidth = 256
actual size = 253
Counter with an
end-limit decoder
Total length, in 4-byte blocks, for the new slot;
equivalent to the length of the buffer.
datawidth is the maximum size of the counter as
an integer (should be a power of 2).
Actual size is the new slot length.
The algorithmic state machine of the “slot receiver processor” is detailed in Figure 8.3 (the figure spans two
pages; please zoom in for a detailed view). The output signals are shown to the right of the state diagram, indicating
their logical value at each state. Please note that the ASB is denoted as "UART" in this figure.
[Figure: ASM chart of the slot receiver processor; the main states read the sending time from the transceiver FIFO into the Assembly FIFO, write the current FPGA time (MSB and LSB), write the payload while computing the CRC, read and CRC the padding, compare the CRC and write the CRC status (valid or error), write the clock difference (MSB and LSB) obtained from the Tx FSM, and finally transfer the assembled slot to the UART (ASB) FIFO, with transceiver-FIFO-empty and UART-FIFO-full status checks between steps]
Figure 8.3. Slot receiver processor algorithmic state machine
9. Core Optical Switch
9.1. Overview
The intent of an optical switching fabric is to allow modulated light signals to traverse a path between any two
edge nodes without any optical-to-electrical-to-optical conversion as in traditional switching architectures. The
non-blocking architecture approach allows light paths to reach any edge node freely and
without any interference from other selected ports. High-port-count switches that can reconfigure on the scale of
nanoseconds were not yet commercially available and had not been completed by Theme 2 of the AAPN
Research Network [1] at the time this prototype project started. Therefore, a fabric built from fast switches of
lower port density was used to obtain the size and speed required for the AAPN hardware demonstrator. The
design is also to be wavelength agnostic so that any edge node transmitting at any particular wavelength can
communicate to another edge node through the switching fabric. This is achieved through the use of Solid Free
Space technology within the Civcom devices.
9.2. Non-blocking architecture
The architecture of the core optical switch is shown in Figure 9.1. It is a 4x4 non-blocking switch that is made up
of 2x2 switches. The architecture is a Clos-type where each individual switch is actuated by one TTL control
signal. For a 4x4 architecture, six switches are required and, therefore, 6 control lines are needed (S0 to S5).
The switches are interconnected with single-mode optical fiber. This architecture was favoured for the design of
the core switch since the Civcom switches are available only in a 2x2 format or smaller [10]. Table 9.1 provides a
description of each port symbol and their connections to the AAPN network.
[Diagram: 4x4 non-blocking switch composed of six 2x2 switches driven by control lines S0 to S5, with inputs IN0 to IN3, outputs OUT0 to OUT3, and internal interconnections labelled a to h.]
Figure 9.1. Architecture of the AAPN demonstrator optical core switch
Table 9.1. Optical core switch connectivity

Switch Port   Port Connection Name
IN0           Master Transmitter
IN1           Edge Node 0 Transmitter
IN2           Edge Node 1 Transmitter
IN3           Edge Node 2 Transmitter
OUT0          Master Receiver
OUT1          Edge Node 0 Receiver
OUT2          Edge Node 1 Receiver
OUT3          Edge Node 2 Receiver
In order to establish a path from a particular edge node's transmitter to another edge node's receiver, the correct setting of each switch must be obtained through the appropriate control signals S0 to S5. The truth table of the switch, Table 9.2, gives the control-line settings needed for each input-to-output configuration used by the prototype. The table does not list every possible combination of control-line settings, since duplicate settings that achieve the same path configuration have been removed. Moreover, it is assumed that no edge node transmits data to itself (these additional cases can easily be included in the future for testing purposes).
Table 9.2. AAPN core switch truth table settings
S0 S1 S2 S3 S4 S5 Hex Code OUT-0 OUT-1 OUT-2 OUT-3
0 0 0 0 1 1 0x03 IN-1 IN-0 IN-3 IN-2
0 0 0 1 1 1 0x07 IN-2 IN-0 IN-3 IN-1
0 0 1 0 1 1 0x0B IN-1 IN-3 IN-0 IN-2
0 0 1 1 0 0 0x0C IN-3 IN-2 IN-1 IN-0
0 0 1 1 0 1 0x0D IN-3 IN-2 IN-0 IN-1
0 0 1 1 1 0 0x0E IN-2 IN-3 IN-1 IN-0
0 0 1 1 1 1 0x0F IN-2 IN-3 IN-0 IN-1
0 1 0 1 1 0 0x16 IN-3 IN-0 IN-1 IN-2
0 1 1 0 1 0 0x1A IN-1 IN-2 IN-3 IN-0
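To illustrate how the truth table can be used by the control software of the master edge node, the following C sketch looks up the hex code for a requested output-to-input assignment. This is a hypothetical helper written for this report (the type and function names are invented), not part of the delivered hardware code.

    #include <stddef.h>

    /* One row of Table 9.2: the hex code driven on the control lines
     * S0 (MSB) to S5 (LSB) and the input routed to each output. */
    typedef struct {
        unsigned char code;          /* value for control lines S0..S5 */
        unsigned char in_for_out[4]; /* in_for_out[k] = IN-x feeding OUT-k */
    } SwitchSetting;

    static const SwitchSetting truth_table[] = {
        {0x03, {1, 0, 3, 2}}, {0x07, {2, 0, 3, 1}}, {0x0B, {1, 3, 0, 2}},
        {0x0C, {3, 2, 1, 0}}, {0x0D, {3, 2, 0, 1}}, {0x0E, {2, 3, 1, 0}},
        {0x0F, {2, 3, 0, 1}}, {0x16, {3, 0, 1, 2}}, {0x1A, {1, 2, 3, 0}},
    };

    /* Return the control word realizing the requested permutation
     * (wanted[k] = index of the input for OUT-k), or -1 if Table 9.2
     * contains no such row (e.g. a loopback configuration). */
    int lookup_control_word(const unsigned char wanted[4])
    {
        for (size_t i = 0; i < sizeof truth_table / sizeof truth_table[0]; i++) {
            const SwitchSetting *s = &truth_table[i];
            if (s->in_for_out[0] == wanted[0] && s->in_for_out[1] == wanted[1] &&
                s->in_for_out[2] == wanted[2] && s->in_for_out[3] == wanted[3])
                return s->code;
        }
        return -1;
    }

For example, requesting {3, 0, 1, 2} (OUT-0 fed by IN-3, OUT-1 by IN-0, and so on) returns 0x16, the value to be driven onto S0 to S5.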
Figure 9.2 shows the implemented 4x4 optical switch mounted on a bench top and powered by the AC/DC power supply required for the internal high-voltage drivers. The Civcom switches were packaged in metal boxes for easy and safe use in the lab.
Figure 9.2. Bench top assembly of the AAPN optical core switch
10. Experimental test results
The correct operation of the "slot transmitter" and "slot receiver" processors was confirmed using two test configurations. In the first, the output of the transmitter was connected directly to the receiver, as in Figure 10.1, in order to validate that the slot sequence received is identical to the one transmitted. The transmitted sequence was generated by a custom slot generator that sends 32-bit words. The received sequence was printed on a computer screen, where it was analyzed for consistency.
[Diagram: FPGA #1 hosts the slot transmitter and the optical Tx (E-O conversion); FPGA #2 hosts the optical Rx (O-E conversion) and the slot receiver. A custom slot source feeds the transmitter from a PC (Custom Slot Rx from PC) and the receiver returns the slots to a PC (Custom Slot Tx to PC).]
Figure 10.1. Loopback test configuration setup
The second test configuration was obtained by replacing the optical loopback with the core switch, as in Figure 10.2. Again, the received sequence was analyzed for consistency with what was transmitted.
[Diagram: same arrangement as Figure 10.1, with the 4x4 optical core switch inserted between the optical Tx (E-O conversion) of FPGA #1 and the optical Rx (O-E conversion) of FPGA #2.]
Figure 10.2. Loopback test configuration using the core switch
Both configurations yielded identical, correct outcomes for the received sequence. For this experiment, the number of words transmitted is 255 and the number of words received is 253, as expected: the received length is exactly the size of the new slot, i.e. the length of the "Assembly FIFO", which is 64 bits shorter than the transmitted slot (refer to the slot format specifications in chapter 4). The custom sequence begins at 1000 and increments by one until the slot is filled. The returned value of each word field can be seen in Table 10.1, which describes the fields reported when the slot is received by the screening PC. As expected, the first returned value is 1005 and not 1000, since the first words are cut out by the slot transmitter processor: in the downstream direction, the first 3 parameters, representing the SwitchConfiguration, the PCSendingTime and the TimeToSendOptical (a total of 5 x 32-bit words), have been removed.
The correct computation of the CRC has been verified by purposely replacing one of the slot fields with a
different value just prior to optical transmission so that the receiver calculates a different CRC compared to the
transmitter. In this case, the status reported was 0, indicating an error in transmission.
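The consistency check applied to the received words can be expressed compactly as follows (a hypothetical C sketch written for illustration, with word indices taken from Table 10.1; the actual analysis was done by inspecting the values printed on the screening PC).

    #include <stdint.h>

    #define SLOT_WORDS     253   /* received slot length in 32-bit words */
    #define FIRST_VALUE    1005u /* 1000 + the 5 words cut by the transmitter */
    #define CRC_STATUS_IDX 250   /* index of the CRC status flag */

    /* Returns 0 if the received slot matches the expected pattern,
     * -1 on a sequence mismatch and -2 on a CRC error indication. */
    int check_received_slot(const uint32_t slot[SLOT_WORDS])
    {
        for (int i = 0; i < CRC_STATUS_IDX; i++) {
            if (i == 2 || i == 3)
                continue; /* words 2-3 hold the FPGA global time, not the pattern */
            if (slot[i] != FIRST_VALUE + (uint32_t)i)
                return -1; /* expected 1005, 1006, then 1009, 1010, ... */
        }
        if (slot[CRC_STATUS_IDX] == 0)
            return -2; /* CRC status 0 signals a transmission error */
        /* Words 251-252 carry the clock difference and vary per run. */
        return 0;
    }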
Table 10.1. Received word sequence in the loopback test configuration
Slot Word Index   Received Slot Sequence   Description
0                 1005                     Least significant 32-bit word of the TimeToSendOptical parameter
1                 1006                     Most significant 32-bit word of the TimeToSendOptical parameter
2                 0                        Least significant 32-bit word of the current FPGA global time
3                 439940955                Most significant 32-bit word of the current FPGA global time
4                 1009                     Start of the slot payload ("other data")
5 to 249          1010 to 1254             Slot payload, incrementing by one per word
250               0                        32-bit CRC status flag; a status of 0 indicates a transmission error
251               4294966295               Least significant 32-bit word of the clock difference obtained from the slot transmitter processor
252               439889910                Most significant 32-bit word of the clock difference obtained from the slot transmitter processor
The critical synchronization parameters between the Slotted Transmission module and the Slot Reception module are the guard time and the correct number of padding words. The padding length (in words) is adjusted by the amount of guard time needed, in order to keep the total slot length fixed. For the 200 µs time slot in the current design, a total of 4122 padding words are needed.
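The relation between guard time and padding length is simple: the padding must absorb whatever part of the fixed slot is not consumed by the guard time and the slot data. The sketch below captures this relation in C; all parameters are assumptions made for the sake of illustration, not figures extracted from the FPGA design.

    #include <stdint.h>

    /* Illustrative relation only. One Mbps carries exactly one bit
     * per microsecond, so us * Mbps yields bits directly. */
    uint32_t padding_words(double slot_len_us,    /* total slot length, e.g. 200 */
                           double guard_time_us,  /* guard time per slot */
                           double line_rate_mbps, /* optical line rate, e.g. 1000 */
                           double data_bits,      /* slot data bits per slot */
                           double word_bits)      /* width of one padding word */
    {
        double usable_bits = (slot_len_us - guard_time_us) * line_rate_mbps;
        return (uint32_t)((usable_bits - data_bits) / word_bits);
    }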
11. Summary and Discussion
The hardware implementation of the "medium speed" demonstrator prototype has been finalized, with only a few minor bugs still to be addressed. The demonstrator is intended to be a scaled-down version of an AAPN: it consists of a collapsed network of one 4x4 optical core switch and 4 edge nodes, each composed of a PC and an FPGA development board. A 100 Mbps Ethernet network card connects the PC to the FPGA development board, and a 1 Gbps optical transceiver connects the FPGAs to the core switch. One of the edge nodes operates as the master edge node and is used to control the fast optical core switch.
Figure 11.1. View of the FPGA component of the AAPN edge node for the optical loopback test
The design of the edge node consists of a division of labour between the FPGA component and the PC component. High-level functionality such as traffic aggregation, traffic monitoring, bandwidth allocation and network synchronization protocols was implemented in the software component on the PC of the edge nodes, which was addressed in the parallel "Software Control Platform" project described in [11] and [12]. Low-level, fast functions have been implemented in hardware, which is the work presented in this report. The "Hardware Functionality", as it is called here, consists of: interfacing to the PC component, precise slotted optical transmission, optical burst-mode reception, and configuration of the network core switch. The hardware has been implemented using custom circuits developed with programmable logic elements on a System on Chip (SoC) field programmable gate array (FPGA). The custom circuits are designed as Mealy finite state machines programmed in VHDL. Other elements of the design are implemented as compiled software running on a hard-core microprocessor on the FPGA using on-chip memory.
The core optical switching fabric has also been built as part of the Hardware Functionality. It has been implemented and tested using six Civcom Free-X 2x2 switches in a Clos-type architecture. The switch is fully operational and can be reconfigured in 400 ns, which complies with the AAPN target of 1 µs. It has, however, a limited reconfiguration frequency of only 6 kHz, which dictates that only one AAPN data slot can be transmitted every 166.67 µs. The original design target was a time slot of 10 µs.
Some of the knowledge gained while working on this project is discussed below.
The HW component presented here has been designed for a network operating in TDM mode; however, it will also work for an AAPN operating in Optical Burst Switching (OBS) mode, because the HW functionality does not determine the TimeToSendOptical, it only executes it. To use the FPGA component in an OBS-AAPN, only the definition of the variable TimeReceivedOptical has to be changed: the Guard Time should not be subtracted from it.
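The required change amounts to a one-line difference, as in the following C sketch. The variable names follow the report; representing the times as 64-bit FPGA counter ticks is an assumption made for illustration.

    #include <stdint.h>

    /* rx_timestamp: FPGA counter value latched when the burst arrives;
     * guard_time: per-slot guard interval, in the same tick units. */
    uint64_t time_received_optical_tdm(uint64_t rx_timestamp, uint64_t guard_time)
    {
        return rx_timestamp - guard_time; /* TDM: guard time subtracted */
    }

    uint64_t time_received_optical_obs(uint64_t rx_timestamp, uint64_t guard_time)
    {
        (void)guard_time; /* OBS: no guard-time correction applied */
        return rx_timestamp;
    }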
Correct alignment of the slot pointer in the optical domain was difficult to achieve: the transmit and receive components must have their slot indices aligned in order for the custom state machines within the transmitter and receiver portions of the FPGA to encode and decode the correct slot information.
Local synchronization of the two components of the edge node is a complex issue. During the initialization phase of the edge node and in normal operation, synchronization between the PC and its corresponding FPGA was achieved by calculating, at the FPGA, the difference between the timestamp field within the slot and the current time in the FPGA. This FPGA time is an integer value (implemented as a counter) that resides on the FPGA and is inserted into the slot as a 64-bit value. The difference between these integer values is denoted the clock difference; it is sent back to the PC in order to keep track of the delay offset in the electrical domain needed for future transmissions in the optical domain. It is important that this clock difference be calculated immediately after the PC time has been extracted from the Ethernet frame; otherwise one risks measuring not only the clock difference but also the time elapsed between a slot being transmitted to the core and a subsequent slot being received at the edge node (at low network loads, this may be a long time).
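The calculation itself reduces to a subtraction of two 64-bit counter values, as in this sketch (hypothetical C written for this report; in the prototype the computation is done in hardware).

    #include <stdint.h>

    /* pc_timestamp: 64-bit time extracted from the Ethernet frame;
     * fpga_time: FPGA counter sampled immediately after extraction,
     * so that queueing delays are not folded into the result. */
    int64_t clock_difference(uint64_t pc_timestamp, uint64_t fpga_time)
    {
        /* Unsigned subtraction is wrap-around safe on 64-bit counters;
         * the signed result is returned to the PC as the delay offset
         * used when scheduling future optical transmissions. */
        return (int64_t)(fpga_time - pc_timestamp);
    }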
The efficiency of the prototype would also be greatly improved by implementing a more sophisticated optical burst-mode receiver, since clock recovery currently takes 25 µs. This time is nevertheless acceptable relative to the current time slot length of 200 µs: it represents 12.5% of the time slot, close to the 10% target in the AAPN literature. It is also important to note that a suitable, much faster optical transceiver will be provided by researchers working on Theme 2 "Enabling Technologies" of the AAPN Research Network [1].
Not all the functionality of the prototype has been fully tested, because of several problems in the integration of software and hardware. Every hardware and software part has been tested separately, but errors still appear when certain blocks are put together. The team is still working on this task.
The data rate between the PC and the FPGA proved to be the largest obstacle. Although the maximum line rate of the Ethernet card is 100 Mbps, the speed measured between the PC and the Stratix FPGA was only ~10 Mbps. The limitation is a result of the Ethernet interface being implemented using an embedded microprocessor (NIOS RISC processor) and off-chip memory, an approach whose advantage is, in general, a reduction of implementation time and effort. Due to the immaturity of the related technologies (especially the operating system) and the lack of technical support, however, there were many bugs in both the hardware and software systems, which resulted in a large amount of time and effort spent simply trying to program around them. Moreover, both the IDE (Integrated Development Environment) and the NIOS operating system keep being upgraded, and every new version has a different set of system bugs; even maintaining unchanged code thus became time-consuming. In the end, the development of the software-based PC-FPGA interface took much longer than originally expected, and a pure hardware implementation could have been completed with the same amount of time and effort.
The speed limitation of the Ethernet interface can be removed by implementing dedicated hardware on the FPGA (a custom state machine) and using SRAM and/or internal FPGA memory. The most desirable alternative would be to eliminate the intermediary Stratix FPGA chipset and link the Ethernet port directly to the Stratix GX FPGA chipset; however, this is not possible given the interconnection layout of the development board. Unfortunately, the limited PC-FPGA link rate imposes a large bottleneck on the design of the AAPN edge node, as the architecture led us to expect the bottleneck on the electrical-to-optical side, not at the PC-to-FPGA interface.
Though the digital design elements (the transmit portion of the FPGA, the receive portion of the FPGA, and the optical transmission) proved to have their limitations, the implementation choices were always made with the aim of achieving the desired performance. In the design of the transmission to the optical core, the Mealy state machine allowed full control of the flow of slots from the incoming buffer located on the PC-to-FPGA interface. Placing queues at the boundaries between design blocks avoids synchronization issues between different design methodologies. The same concept was applied in the reception from the optical core, where there is a rate differential between the slots coming from the optical transceivers and the lower-speed PC-to-FPGA interface. The FIFO buffers provide a level of flow control for the incoming information and hold off data when the opposite side is overloaded. The transceiver design also addresses this issue in a different manner, denoted clock-domain crossing. Unfortunately, the different bus widths require the design to handle the information in smaller segments (16 bits instead of 32 bits), and extra logic is required to address this. To avoid the heavy buffer memory requirements needed to synchronize the buses between the different digital blocks, the design within the FPGA could be modified to keep the same bus width throughout, in order to maximize speed while preserving simplicity.
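The extra logic for the narrower transceiver bus is essentially a gearbox that assembles two 16-bit segments into one 32-bit word. The following behavioural sketch in C illustrates the intent only (the real logic is a VHDL state machine plus a FIFO, and whether the low or the high half arrives first is an assumption made here).

    #include <stdbool.h>
    #include <stdint.h>

    /* State of the 16-to-32-bit width adaptation between the
     * transceiver clock domain and the internal 32-bit data path. */
    typedef struct {
        uint16_t low_half; /* first (assumed least significant) segment */
        bool     have_low; /* true while waiting for the second segment */
    } Gearbox;

    /* Feed one 16-bit segment; returns true and writes *word_out once
     * a complete 32-bit word has been assembled. */
    bool gearbox_push(Gearbox *g, uint16_t segment, uint32_t *word_out)
    {
        if (!g->have_low) {
            g->low_half = segment;
            g->have_low = true;
            return false;
        }
        *word_out = ((uint32_t)segment << 16) | g->low_half;
        g->have_low = false;
        return true;
    }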
12. Future work
There are two main items for future work on the edge node. The first is to implement an improved version of the optical receiver that does not need PLLs to recover the clock, since much faster operation is needed. The second is to simplify and improve the implementation of the Ethernet interface between the PC and the FPGA board, which would not only remove the biggest bottleneck of the implementation but would also allow for a smoother integration. This could be achieved by:
- designing custom circuitry instead of using the NIOS processor on the Stratix chipset, and/or
- using a faster and larger FPGA development board, preferably one that allows a direct connection between the Ethernet port and the faster FPGA chipset.
It is also intended to measure the optical impairments of the core switch in order to establish the distance penalties of the core relative to the edge nodes. However, given that the biggest limitation of the current optical switches is their low reconfiguration frequency, newer and more integrated technologies, such as semiconductor optical amplifiers, are being sought to address both the switching-frequency and the loss issues.
Funding for these activities will be sought through other university grant proposals, mainly for projects on Passive Optical Networks, where the knowledge gathered from the AAPN projects can be used effectively given the strong relationship between the two areas.
References
[1] The Agile All-Photonic Networks (AAPN) Research Network, 2003-2007. Available:
http://www.aapn.mcgill.ca/.
[2] T.J. Hall, S. A. Paredes and G. v. Bochmann, “An Agile All-Photonic Network”, International Conference on
Optical Communications and Networks, ICOCN 2005; Bangkok, Thailand, 14-16 December 2005.
[3] R. Vickers and M. Beshai, “PetaWeb architecture”, 9th International Telecommunications and Networking
Planning Symposium, Toronto, Canada, 2000.
[4] J. Zheng, C. Peng, G. v. Bochmann and T.J. Hall, “Load balancing in all-optical overlaid-star TDM networks”,
Proceedings of IEEE SARNOFF’06 conference, Princeton, USA, 27-28 March, 2006.
[5] C. Peng, S.A. Paredes, T.J. Hall and G. v. Bochmann, “Constructing Service Matrices for Agile All-Optical
Cores”, The 11th IEEE Symposium on Computers and Communications, ISCC 2006; Pula-Cagliari, Italy,
26-29 June 2006, pp 967-973.
[6] L. Mason, A. Vinokurov, N. Zhao, D. Plant, "Topological design and dimensioning of agile all-photonic
networks", Computer Networks, Vol. 50, No. 2, February 2006, pp 268-287.
[7] S. A. Paredes, T. J. Hall, "A Load-Balanced Agile All-Photonic Network", The 12th IEEE Symposium on
Computers and Communications (ISCC 2007), Aveiro, Portugal, 1-4 July 2007, pp 107-114.
[8] Stratix GX Development Board, Altera Corp, 2003. Available:
http://www.altera.com/literature/ds/ds_stx_gx_dev_bd.pdf , http://www.altera.com/literature/lit-sgx.jsp ,
http://www.altera.com/
[9] SFP MSA Transceiver, Fujitsu Limited, January 2008. Available:
http://www.fujitsu.com/downloads/OPTCMP/lineup/sfpmsa/sfp-catalog-e.pdf
[10] Free-X™ Ultra-Fast Optical Switch Series, Civcom Inc., 2001-2009. Available:
http://www.civcom.com/admin/pdf/SysPic/OSdatasheet.pdf ,
http://www.civcom.com/Free_light.asp?MainID=11&Name=Free-X%20Family ,
http://www.civcom.com/admin/Articles/SPic/SFS.pdf , http://www.civcom.com.
[11] Y. Deng, "Design and Implementation of Signaling and Traffic Control for AAPN", Ph.D. thesis, School of
Information Technology and Engineering, University of Ottawa, 2007.
[12] G. v. Bochmann, "Design of an agile all-photonic network", Proc. SPIE, Vol. 6784, 67842Y (2007),
DOI:10.1117/12.751911, November 2007.
[13] R. F. Hobson, K. L. Cheung, “A High-Performance CMOS 32-Bit Parallel CRC Engine”, IEEE Journal of
Solid-State Circuits, Vol. 34, No. 2, Feb. 1999, pp 233-235.
[14] M. Sprachmann, “Automatic Generation of Parallel CRC Circuits”, IEEE Design and Test of Computers,
Vol. 18, No. 3, May 2001, pp 108-114.
[15] G. Albertengo, R. Sisto, “Parallel CRC Generation”, IEEE Micro, Vol. 10, No. 5, Oct. 1990, pp 63-71.
Appendix 1. Project team members
Team members, contact addresses, and major and minor tasks:

Gregor v. Bochmann, bochmann@site.uottawa.ca: Overall project supervision; Prototype design
Jonathan Couturier, jcouturi@site.uottawa.ca: Optical transmission and reception; Prototype design
Pino G. Dicorato, pdicorat@site.uottawa.ca: Edge node slot reception; Core optical switch; Prototype design; Testing; Edge node slot transmission; Optical transmission and reception
Peter Farkas, farkasengineering@gmail.com: Asynchronous Serial Bridge within the PC-FPGA interface; Custom dummy traffic generator and analyzer; Prototype design
Trevor J. Hall, thall@site.uottawa.ca: Overall project supervision; Prototype design
Sofia A. Paredes, sparedes@site.uottawa.ca: Project management; Prototype design; Coordination with software team
Blerim Qela, bqela@site.uottawa.ca: Edge node slot transmission; Prototype design
Robert Radziwilovicz, radziwil@site.uottawa.ca: Technical support; Prototype design
James Y. Zhang, zhang_yi_ming@hotmail.com: Ethernet interface within the PC-FPGA interface; Prototype design
Appendix 2. Hardware codes
The file aapn_prototype_hw_functionality.zip, delivered with this document, contains all the hardware
code written for this project.