High-Performance and Low-Cost Computing for Indoor Airflow

July 2009

Conference: the 11th Conference of International Building Performance Simulation Association (Building Simulation 2009)
At: Glasgow, U.K

HIGH-PERFORMANCE AND LOW-COST COMPUTING FOR INDOOR AIRFLOW

Wangda Zuo and Qingyan Chen

National Air Transportation Center of Excellence for Research in the Intermodal Transport Environment (RITE),

School of Mechanical Engineering, Purdue University, USA

Corresponding email: yanchen@purdue.edu

ABSTRACT

Computational fluid dynamics (CFD) can provide detailed information on flow motion, temperature distributions, and species dispersion in buildings. However, simulating airflow in a building with CFD on a computer with a single central processing unit (CPU) may take hours, days, or even weeks. Parallel computing on a multi-CPU supercomputer or computer cluster can reduce the computing time, but the cost of such high-performance computing is prohibitive for many designers. This paper introduces high-performance parallel computing of airflow simulations on a graphics processing unit (GPU). The GPU reduces the computing time by a factor of 10 to 30. Furthermore, such a GPU costs only $500, which is less than 2% of the cost of a multi-CPU supercomputer or computer cluster with the same performance.

INTRODUCTION

To design a comfortable, healthy, and energy-efficient building, it is essential to know key parameters of the indoor air, such as the distributions of air velocity, air temperature, species concentrations, and pressure. These data can be obtained by computer simulations (Chen 2009; Davidson 1989). The simulation results should be informative so that designers can evaluate not only the macroenvironment in the entire building but also the microenvironment in each room. The simulations should also be fast enough to explore various alternatives during the design process (Hughes et al. 1994).

By solving the Navier-Stokes equations and other transport equations on an enormous number of computing nodes, CFD can capture the flow details with good accuracy (Ladeinde and Nearon 1997; Nielsen 2004). However, when the simulated flow domain is large and complex, such as the flow in a moderate-size building, CFD can be computationally demanding if the simulation is performed on a single-CPU computer (Lin et al. 2005; Mazumdar and Chen 2007).

To accelerate CFD simulations, many researchers, such as Crouse et al. (2002) and Mazumdar and Chen (2008), have executed simulations in parallel on multi-CPU computers. Parallel computing can greatly reduce the computing time. However, it does not reduce the cost of purchasing and installing the equipment, the space needed for the computers, or the capacity of the cooling system required in that space (Feng and Hsu 2004). Hence, multi-CPU computing is a luxury for building designers. It is necessary to find high-performance and low-cost computing hardware for simulating flow in buildings.

Recently, the GPU has attracted attention for parallel computing. Unlike a CPU, a GPU is the core of a computer graphics card and integrates multiple streaming processors on a chip. The GPU structure is highly parallelized for high-performance graphics processing. For example, the NVIDIA GeForce 8800 GTX GPU, available in 2006, integrates 128 processors and has a peak computing speed of 367 GFLOPS. By comparison, the peak performance of an Intel Core 2 Duo 3.0 GHz CPU available at the same time is only about 32 GFLOPS (Kirk and Hwu 2008). Figure 1 compares the peak performance of the CPU (Intel) and the GPU (NVIDIA). The performance gap between the CPU and the GPU has been widening since 2003 (NVIDIA 2007), and this trend is likely to continue. Besides its high performance, a GPU is inexpensive. For example, a graphics card with an NVIDIA GeForce 8800 GTX GPU costs only around $500 and can easily be installed in a personal computer.

A CPU handles sequential jobs, so it increases its computing speed principally by increasing its clock frequency. A GPU, in contrast, is normally used for graphics processing, which is typically a parallel job. GPU development therefore increases computing capacity by adding more processors to handle the parallel job. It is technically easy and economically inexpensive to integrate a large number of low-frequency processors into one chip (Kirk and Hwu 2008). With many low-clock-frequency processors working in parallel, a GPU can achieve a high computing speed. For example, although the clock frequency of the NVIDIA GeForce 8800 GTX GPU is only 575 MHz, its computing speed can be as high as 367 GFLOPS with 128 processors. This development strategy makes a graphics card both low in cost and high in performance.

Eleventh International IBPSA Conference, Glasgow, Scotland, July 27-30, 2009

Figure 1 Comparison of computing speed of GPU and CPU (peak performance in GFLOPS versus year, 2003 to 2007)

In spite of these advantages in cost and performance, applications of the GPU have focused on image processing. Programming a GPU requires a deep understanding of its hardware and programming languages. Prior to 2006, GPU programming interfaces, such as OpenGL (Shreiner and OpenGL Architecture Review Board 2008) and Direct3D (Walsh 2006), were designed for graphics processing, and it was difficult to use them for solving the Navier-Stokes equations. Hence, only a few attempts were made, and those by experts in graphics (Ho et al. 2008; Scheidegger et al. 2005; Wei et al. 2004).

In 2006, NVIDIA introduced a new GPU programming environment named Compute Unified Device Architecture (CUDA) (NVIDIA 2007). CUDA is an extension of the C language. Like other high-level programming languages, CUDA allows users to program the GPU without knowing the details of the hardware. Furthermore, CUDA is compatible with the standard C language. If a flow simulation code is written in C, a user only needs to rewrite the parallel computing part in CUDA, which can save a lot of code development time. Some researchers have started to use CUDA for GPU programming. For example, Rodrigues et al. (2008) used it for molecular simulations and Manavski (2007) for cryptography. Their computations ran 10 to 20 times faster than those on a CPU. Thus, it is also of interest to perform indoor flow simulations with a fast fluid dynamics (FFD) model on a GPU.

IMPLEMENTATION

Flow Model

This investigation applied the fast fluid dynamics (FFD) model proposed by Stam (1999). It is a simplified CFD model that solves the continuity equation, the Navier-Stokes equations, and the transport equations for energy and species concentrations for transient, incompressible fluid flow. To solve these partial differential equations efficiently, FFD splits them into a series of simple equations and solves the simple equations one by one. The current FFD model is first order in time and second order in space. Applying the FFD model, Zuo and Chen (2009) simulated different indoor airflows about 50 times faster than with a CFD model. This investigation used the FFD model for flow simulation on the GPU.
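
As a minimal sketch of the splitting idea, the Python code below advances a one-dimensional scalar on a periodic mesh in two separate sub-steps: a semi-Lagrangian advection step followed by an implicit diffusion step. It follows the general approach of Stam (1999) and is not the authors' FFD code. The function names, the periodic boundary, the Jacobi solver, and all parameter values are assumptions made for illustration.

```python
import math


def advect_semi_lagrangian(phi, u, dt, dx):
    """Trace each node back along the velocity and interpolate linearly."""
    n = len(phi)
    out = []
    for i in range(n):
        x_dep = i - u * dt / dx          # departure point, in index units
        i0 = math.floor(x_dep)           # node to the left of the departure point
        w = x_dep - i0                   # linear interpolation weight in [0, 1)
        out.append((1.0 - w) * phi[i0 % n] + w * phi[(i0 + 1) % n])
    return out


def diffuse_implicit(phi, nu, dt, dx, sweeps=50):
    """Backward-Euler diffusion, solved approximately with Jacobi sweeps."""
    n = len(phi)
    r = nu * dt / dx ** 2
    new = list(phi)
    for _ in range(sweeps):
        new = [(phi[i] + r * (new[(i - 1) % n] + new[(i + 1) % n])) / (1.0 + 2.0 * r)
               for i in range(n)]
    return new


def split_step(phi, u, nu, dt, dx):
    """One time step: solve the advection equation first, then diffusion."""
    return diffuse_implicit(advect_semi_lagrangian(phi, u, dt, dx), nu, dt, dx)


# Hypothetical test problem: a single bump carried to the right on 32 nodes.
phi0 = [1.0 if 8 <= i < 12 else 0.0 for i in range(32)]
phi1 = split_step(phi0, u=1.0, nu=0.01, dt=0.05, dx=1.0 / 32)
```

Both sub-steps only form weighted averages of existing values, so the result stays within the range of the initial field even though the time step here moves the flow more than one cell per step. The linear interpolation also smears the bump, which is one source of the numerical diffusion associated with this kind of scheme.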

Software and Hardware

The implementation used CUDA, which organizes the parallel work on a GPU into three levels (Figure 2). The highest level is the “grid”. Each grid consists of multiple “blocks”, and every block has many “threads”. A thread is the basic computing unit of the GPU. Mathematical and logical operations are performed by threads.

Figure 2 Schematic of parallel computing with CUDA (the host (CPU) launches Grid 1, Grid 2, Grid 3, 4, ... on the device (GPU); each grid contains blocks indexed from (0,0) to (2,2), and each block, for example Block(0,0) and Block(1,0), contains threads indexed from (0,0) to (2,2))

This study used an NVIDIA GeForce 8800 GTX GPU. This GPU has 16 streaming multiprocessors (SMs) (Rixner 2002), and each SM can hold up to 8 blocks or 768 threads at one time. Thus, the entire GPU can simultaneously hold up to 16 × 768 = 12,288 threads. Because CUDA does not allow a block to be split between two SMs, the block assignment is crucial for realizing the full capacity of the GPU. For example, if a block has 512 threads, then only one block can be assigned to an SM and the remaining 256 thread slots of that SM are unused. If a block contains 256 threads, then three blocks can fully occupy all 768 threads of an SM. Theoretically, the 8800 GTX GPU can reach its peak performance when all 12,288 threads are running at the same time. In practice, the achieved performance also depends on many other factors, such as the time for reading data from or writing data to memory.
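
The block-assignment arithmetic above can be checked with a short Python sketch. It uses only the limits quoted in the text (16 SMs, at most 8 blocks and 768 threads per SM) and ignores other resources, such as registers and shared memory, that can also limit how many blocks fit on an SM. The function names are hypothetical.

```python
NUM_SMS = 16               # streaming multiprocessors on the GPU (from the text)
MAX_THREADS_PER_SM = 768   # resident threads per SM (from the text)
MAX_BLOCKS_PER_SM = 8      # resident blocks per SM (from the text)


def resident_blocks_per_sm(threads_per_block):
    """Whole blocks that fit on one SM; a block cannot be split across SMs."""
    if threads_per_block <= 0:
        raise ValueError("threads_per_block must be positive")
    return min(MAX_BLOCKS_PER_SM, MAX_THREADS_PER_SM // threads_per_block)


def sm_occupancy(threads_per_block):
    """Fraction of the SM's thread slots that are used."""
    used = resident_blocks_per_sm(threads_per_block) * threads_per_block
    return used / MAX_THREADS_PER_SM


for size in (64, 256, 512):
    print(size, resident_blocks_per_sm(size), round(sm_occupancy(size), 3))
```

With 512 threads per block only one block fits and a third of the thread slots stay empty, while 256 threads per block fill the SM with three blocks. Very small blocks are limited by the 8-block cap instead of the thread cap.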

Mapping Strategy

When working in parallel, it is important to map the thread indices (threadID.x, threadID.y) in a block onto the coordinates of the mesh nodes (i, j). The current implementation applied the following formulas:

$i = \mathrm{blockDim.x} \times \mathrm{blockID.x} + \mathrm{threadID.x}$, (1)

$j = \mathrm{blockDim.y} \times \mathrm{blockID.y} + \mathrm{threadID.y}$. (2)

where blockID.x and blockID.y are the indices of the block that contains the thread, and blockDim.x and blockDim.y are the block dimensions in the x and y directions, respectively. Both block dimensions are 16 in our implementation, so each block contains 16 × 16 = 256 threads.
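
The mapping of Equations (1) and (2) can be emulated sequentially in Python, as in the sketch below. In CUDA C the corresponding built-in variables are named blockDim, blockIdx, and threadIdx, and the threads run in parallel instead of in nested loops. The bounds check that skips threads falling outside the mesh is an assumption, since the text does not say how such threads were handled, and the function names are hypothetical.

```python
BLOCK_DIM = (16, 16)  # blockDim.x and blockDim.y, as stated in the text


def blocks_needed(n_nodes, block_size):
    """Number of blocks needed to cover n_nodes in one direction (rounded up)."""
    return (n_nodes + block_size - 1) // block_size


def mesh_index(block_id, thread_id, block_dim=BLOCK_DIM):
    """Equations (1) and (2): map block and thread indices to a node (i, j)."""
    i = block_dim[0] * block_id[0] + thread_id[0]
    j = block_dim[1] * block_id[1] + thread_id[1]
    return i, j


def emulate_launch(nx, ny, block_dim=BLOCK_DIM):
    """Visit every thread of every block and count the visits to each node."""
    visits = [[0] * ny for _ in range(nx)]
    for bx in range(blocks_needed(nx, block_dim[0])):
        for by in range(blocks_needed(ny, block_dim[1])):
            for tx in range(block_dim[0]):
                for ty in range(block_dim[1]):
                    i, j = mesh_index((bx, by), (tx, ty), block_dim)
                    if i < nx and j < ny:   # assumed guard for out-of-mesh threads
                        visits[i][j] += 1
    return visits


visits = emulate_launch(33, 33)  # the 33 x 33 mesh of the laminar cavity case
```

Each mesh node is visited by exactly one thread, which is what allows one thread to update one node without coordinating with the others. A 33 × 33 mesh needs 3 × 3 blocks of 16 × 16 threads, so most threads of the last row and column of blocks have no node to work on.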

SIMULATION RESULTS

To demonstrate FFD simulations on the GPU, this investigation calculated three airflows relevant to the indoor environment and compared the results with those computed on the CPU and with data from the literature. The three flows were laminar and turbulent flows in a lid-driven cavity and natural convective flow in a tall cavity. The GPU and CPU versions used exactly the same meshes and numerical settings.

Laminar Flow in a Lid Driven Cavity (Re = 100)

The first case was the laminar flow in a lid-driven cavity. Based on the cavity length and the lid velocity, the Reynolds number of the flow was Re = 100. This study used a mesh of 33 × 33 grid points. The reference data were the high-quality CFD results obtained by Ghia et al. (1982). As shown in Figure 3, FFD on the GPU predicted the same velocity profiles as FFD on the CPU. Furthermore, the FFD results were similar to the reference data. Although this is a simple case, it shows that the GPU can be used for numerical calculations.

(a) Horizontal velocity U (m/s) at x = 0.5 L, plotted against y (m)

(b) Vertical velocity V (m/s) at y = 0.5 L, plotted against x (m)

Figure 3 Comparison of the velocity profiles (Re = 100) calculated by the FFD model on CPU and GPU with the CFD data (Ghia) in a lid-driven square cavity

Turbulent Flow in a Lid Driven Cavity (Re = 10000)

The second case was a turbulent flow in the lid-driven cavity with Re = 10000. To capture the characteristics of the flow, this study used a very fine mesh of 513 × 513 grid points, the same mesh size as used by Ghia et al. (1982). Because FFD is first order in time, a very small time step (0.005 s) was necessary to reduce the error. Running such an unsteady simulation on a single CPU is very time consuming, so this study ran the simulation only on the GPU. As shown in Figure 4, the FFD model on the GPU computed the horizontal and vertical velocity profiles accurately. The computed profiles agree with the reference data obtained by Ghia et al. (1982).

(a) Horizontal velocity U (m/s) at x = 0.5 L, plotted against y (m)

(b) Vertical velocity V (m/s) at y = 0.5 L, plotted against x (m)

Figure 4 Comparison of the velocity profiles (Re = 10000) calculated by the FFD model on GPU with the reference data (Ghia) in a lid-driven cavity

Figure 5 shows the streamlines computed by the FFD model on the GPU for the turbulent flow. The FFD model on the GPU properly predicted a large recirculation in the center of the cavity. It also computed secondary recirculations at the lower-left, lower-right, and upper-left corners. Furthermore, it captured two tertiary recirculations at the lower-left and lower-right corners. The differences between the FFD prediction and the reference data (Ghia et al. 1982) are negligible.

Natural Convection in a Tall Cavity

The flows in the previous two cases were isothermal. The FFD model on the GPU was further tested for a non-isothermal flow: natural convection in a tall cavity 0.076 m wide and 2.18 m high. The left wall was cooled to 15.1 °C and the right wall heated to 34.7 °C. The top and bottom walls were insulated. The corresponding Rayleigh number was 0.86×10^6. A coarse mesh of 11 × 21 was applied to the FFD simulations on both CPU and GPU. The reference data were from the experiment performed by Betts and Bokhari (2000). Figure 6 shows that the FFD model on the GPU gave the same velocity and temperature profiles as the model on the CPU. Although the results obtained by the FFD model differ from the experimental data, the error was caused by the FFD model, not by the GPU.
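
As a rough plausibility check on the stated Rayleigh number, the sketch below evaluates Ra = g β ΔT W³ / (ν α) from the wall temperatures and the cavity width given above. The air properties (kinematic viscosity ν and thermal diffusivity α near 25 °C) and the ideal-gas expansion coefficient β = 1/T are assumed textbook values, not values taken from the paper, so only the order of magnitude is meaningful.

```python
def rayleigh_number(t_hot_c, t_cold_c, width_m,
                    nu=1.55e-5,     # assumed kinematic viscosity of air, m^2/s
                    alpha=2.2e-5,   # assumed thermal diffusivity of air, m^2/s
                    g=9.81):        # gravitational acceleration, m/s^2
    """Rayleigh number based on the cavity width and the wall temperature difference."""
    delta_t = t_hot_c - t_cold_c
    t_mean_k = 0.5 * (t_hot_c + t_cold_c) + 273.15
    beta = 1.0 / t_mean_k           # ideal-gas thermal expansion coefficient, 1/K
    return g * beta * delta_t * width_m ** 3 / (nu * alpha)


ra = rayleigh_number(t_hot_c=34.7, t_cold_c=15.1, width_m=0.076)
print(f"Ra = {ra:.2e}")
```

With these assumed properties the estimate falls somewhat below 10^6, the same order of magnitude as the value quoted for the experiment. The exact figure depends on the property values chosen.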

Figure 5 Streamlines of a turbulent lid-driven cavity flow (Re = 10000) computed by the FFD on GPU

(a) Velocity profiles, V (m/s) versus x (m), at y = 0.218 m, 1.090 m, and 1.926 m (b) Temperature profiles, T (°C) versus x (m), at the same heights, between the wall temperatures of 15.1 °C and 34.7 °C

Figure 6 Comparison of the velocity and temperature profiles predicted by the FFD model on CPU and GPU with the experimental data

DISCUSSION

Computing Time

To compare the FFD simulation speed on the GPU with that on the CPU, this study used the computing time for the lid-driven cavity flow at Re = 100 as an example. The CPU simulations were conducted on an HP workstation with an Intel Xeon CPU and the GPU simulations on an NVIDIA GeForce 8800 GTX GPU. The simulations were performed for 100 time steps with different meshes.

Figure 7 illustrates that the CPU computing time increased linearly with the mesh size. When the grid number was smaller than 3.6×10^[?], the FFD model on the CPU was faster than that on the GPU. Transferring data during the GPU simulations takes time, and for a small mesh this transfer time can exceed the time saved by parallel computing. Hence, parallel computing on the GPU should be applied to cases with a large mesh size.

Figure 7 Comparison of the computing time used by the FFD model on GPU with that on CPU (logarithmic axes: number of grids versus computing time)

Figure 7 also shows that the GPU computing time was almost constant when the mesh size was less than 4×10^[?], because such a mesh was not large enough for the GPU to use its full capacity. When the mesh size was greater than 4×10^[?], the GPU computing time increased along two paths. The points on the solid line are the cases whose mesh size was a multiple of 256, and the points on the dashed line are the cases whose mesh size was not a multiple of 256. As mentioned previously, each mesh node was assigned to one thread and a block had 256 threads. If the mesh size was a multiple of 256, all 256 threads of every block were used, so the workload was equal among the blocks. Otherwise, some threads in a block were idle and the workload was unequal among the blocks. This imbalance can severely penalize the computing speed. For example, the simulation with 640 × 640 grid points, a multiple of 256, took 9.977 s, but that with 639 × 639 grid points needed 28.875 s. Although the latter case had fewer grid points than the former, its computing time was almost three times as long. Nevertheless, the FFD model on the GPU was still 10 times faster than that on the CPU even if the grid number was not a multiple of 256, and about 30 times faster if it was.
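
The 640 × 640 versus 639 × 639 comparison can be reproduced arithmetically with the sketch below. It assumes that the mesh is tiled with 16 × 16 blocks and that the number of blocks in each direction is rounded up, which the text does not state explicitly. The timings are the two values quoted above, and the function name is hypothetical.

```python
def launch_stats(nx, ny, block=16):
    """Blocks launched and idle threads when an nx x ny mesh is tiled with square blocks."""
    blocks_x = -(-nx // block)      # ceiling division
    blocks_y = -(-ny // block)
    launched = blocks_x * blocks_y * block * block
    nodes = nx * ny
    return {
        "nodes": nodes,
        "blocks": blocks_x * blocks_y,
        "idle_threads": launched - nodes,
        "multiple_of_256": nodes % (block * block) == 0,
    }


aligned = launch_stats(640, 640)
unaligned = launch_stats(639, 639)
slowdown = 28.875 / 9.977           # measured times quoted in the text, in seconds
print(aligned, unaligned, round(slowdown, 2))
```

Under these assumptions both meshes launch the same number of blocks, and the 639 × 639 mesh leaves only a small fraction of the threads idle. The measured slowdown is therefore far larger than the share of idle threads alone, which fits the text's description of the penalty as severe.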

Impact

This study implemented the FFD model for flow simulations on the GPU. Since the FFD model solves the same governing equations as the CFD model, it is also possible to implement a CFD model on the GPU. One can also expect CFD simulations on a GPU to be faster than those on a CPU. For CFD codes written in C, the implementation will be relatively easy since only the parallel computing part needs to be rewritten in CUDA.

It is possible to further reduce the computing time by using multi-GPU clusters. For example, the NVIDIA Tesla personal supercomputer has 4 GPUs with 960 processors. It is about 250 times faster than a single-CPU personal computer.

CONCLUSION

This study performed flow simulations with the FFD model on a GPU and a CPU. The FFD simulation on the GPU was 10 to 30 times faster than that on the CPU. The cost of a GPU is less than 2% of that of a supercomputer or computer cluster with the same performance. The GPU can also be used for CFD simulations and other scientific computing.

ACKNOWLEDGEMENT

This project was funded by the U.S. Federal Aviation Administration (FAA) Office of Aerospace Medicine through the National Air Transportation Center of Excellence for Research in the Intermodal Transport Environment (RITE) under Cooperative Agreement 04-C-ACE-PU-002. Although the FAA has sponsored this project, it neither endorses nor rejects the findings of this research. The presentation of this information is in the interest of invoking technical community comment on the results and conclusions of the research.

REFERENCES

Betts, P.L. and Bokhari, I.H. (2000) "Experiments on turbulent natural convection in an enclosed tall cavity", International Journal of Heat and Fluid Flow, 21, 675-683.

Chen, Q. (2009) "Ventilation performance prediction for buildings: A method overview and recent applications", Building and Environment, 44, 848-858.

Crouse, B., Krafczyk, M., Kuhner, S., Rank, E., and Van Treeck, C. (2002) "Indoor air flow analysis based on lattice Boltzmann methods", Energy and Buildings, 34, 941-949.

Davidson, L. (1989) Numerical simulation of turbulent flow in ventilated rooms, Ph.D. Thesis, Chalmers University of Technology, Goeteborg, Sweden.

Feng, W. and Hsu, C. (2004) "The origin and evolution of green destiny", In: Proceedings of IEEE Cool Chips VII: An International Symposium on Low-Power and High-Speed Chips, Yokohama, Japan.

Ghia, U., Ghia, K.N., and Shin, C.T. (1982) "High-Re solutions for incompressible flow using the Navier-Stokes equations and a multigrid method", Journal of Computational Physics, 48, 387-411.

Ho, T.Y., Lam, P.M., and Leung, C.S. (2008) "Parallelization of cellular neural networks on GPU", Pattern Recognition, 41, 2684-2692.

Hughes, J., King, V., Rodden, T., and Andersen, H. (1994) "Moving out from the control room: Ethnography in system design", In: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, Chapel Hill, North Carolina, pp. 429-439.

Kirk, D. and Hwu, W.-M.W. (2008) Lecture notes of programming massively parallel processors, University of Illinois, Urbana-Champaign.

Ladeinde, F. and Nearon, M.D. (1997) "CFD applications in the HVAC&R industry", ASHRAE Journal, 39, 44-48.

Lin, C., Horstman, R., Ahlers, M., Sedgwick, L., Dunn, K., and Wirogo, S. (2005) "Numerical simulation of airflow and airborne pathogen transport in aircraft cabins - part 1: Numerical simulation of the flow field", ASHRAE Transactions, 111.

Manavski, S.A. (2007) "CUDA compatible GPU as an efficient hardware accelerator for AES cryptography", In: Proceedings of 2007 IEEE International Conference on Signal Processing and Communications (ICSPC 2007), Dubai, United Arab Emirates.

Mazumdar, S. and Chen, Q. (2007) "Impact of moving bodies on airflow and contaminant transport inside aircraft cabins", In: Proceedings of ROOMVENT 2007, Helsinki, Finland, pp. 13-15.

Mazumdar, S. and Chen, Q. (2008) "Influence of cabin conditions on placement and response of contaminant detection sensors in a commercial aircraft", Journal of Environmental Monitoring, 10, 71-81.

Nielsen, P.V. (2004) "Computational fluid dynamics and room air movement", Indoor Air, 14, 134-143.

NVIDIA. (2007) NVIDIA CUDA compute unified device architecture - programming guide (version 1.1), Santa Clara, California, NVIDIA Corporation.

Rixner, S. (2002) Stream processor architecture, Boston & London, Kluwer Academic Publishers.

Rodrigues, C.I., Hardy, D.J., Stone, J.E., Schulten, K., and Hwu, W.-M.W. (2008) "GPU acceleration of cutoff pair potentials for molecular modeling applications", In: Proceedings of the 2008 International Conference on Computing Frontiers, New York, pp. 273-282.

Scheidegger, C.E., Comba, J.L.D., Da Cunha, R.D., and Corporation, N. (2005) "Practical CFD simulations on programmable graphics hardware using SMAC", Computer Graphics Forum, 24, 715-728.

Shreiner, D. and OpenGL Architecture Review Board. (2008) OpenGL programming guide: The official guide to learning OpenGL (version 2.1) (6th ed.), Upper Saddle River, New Jersey, Addison-Wesley.

Stam, J. (1999) "Stable fluids", In: Proceedings of 26th International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH’99), Los Angeles, pp. 121-128.

Walsh, P. (2006) Advanced visual effects with Direct3D, Boston, Massachusetts, Course Technology.

Wei, X., Li, W., Mueller, K., and Kaufman, A.E. (2004) "The Lattice-Boltzmann method for simulating gaseous phenomena", IEEE Transactions on Visualization and Computer Graphics, 10, 164-176.

Zuo, W. and Chen, Q. (2009) "Real-time or faster-than-real-time simulation of airflow in buildings", Indoor Air, 19, 33-44.

Advanced simulations of air distributions in buildings

Article

Full-text available

Wangda Zuo

Real-time or faster-than-real-time flow simulation is crucial for studying the airflow in buildings or other enclosed environments, such as building emergency management, public health protection, sustainable building design, and building energy performance evaluation. The simulation should be informative by providing airflow motion, temperature distribution, and species concentration. Meanwhile, it is also important for the simulation to compute quickly since the allowed time is limited. However, none of the current techniques for flow modeling and computing hardware can satisfy such requirements. Nodal models for flow simulations are simple and fast, but not informative. Computational Fluid Dynamics (CFD) is accurate, but it is too time-consuming. To obtain a quick and informative solution, this study has proposed a Fast Fluid Dynamics (FFD) method, which is an intermediate approach between the nodal model and the CFD. This investigation used the FFD method with and without turbulence treatments to systematically study four basic flows in buildings, and compared the numerical results with the corresponding CFD results and the data from the literature. The comparison showed that the FFD could offer more complete flow information than the nodal model, but less accurate results than the CFD. At the same time, the FFD was about 50 times faster than the CFD. Because of a significant numerical viscosity in the FFD model, the FFD with the laminar model had the best overall performance in terms of reasonable accuracy and simulation time. The FFD simulation was improved by advancing its numerical schemes and optimizing the implementation. By modifying the time-splitting method and optimizing the implementation, this work successfully saved the computing time by 50%. Meanwhile, a finite volume discretization scheme was applied to enhance the mass conservation. 
In addition, a mass correction function was proposed to provide a simple practical solution for mass conservation in any specific domain. Furthermore, a hybrid scheme was developed to minimize the numerical viscosity caused by the linear interpolation in the semi-Lagrangian method. The improved FFD program was further validated by simulating particle dispersion in a pipe. The prediction had a good agreement with the experimental data. However, this was only a simple case. The FFD needs more validations for contaminant transport. It is also possible to further enhance the computing speed by performing the computation in parallel. Multi-processor supercomputers are widely used for parallelized flow simulations. However, they are very expensive and not portable. Instead, this study conducted the FFD simulation in parallel on a Graphics Processing Unit (GPU). The implementation used a NVIDIA GTX 8800 GPU and a Compute Unified Device Architecture (CUDA) language that is compatible with C language. The FFD code on the GPU was then applied to simulate four basic indoor flows. It turned out that the FFD code on the GPU could produce the same result as the one on a Central Processing Unit (CPU). In addition, the FFD simulations on a GPU were 10 to 30 times faster than the simulations on a CPU, depending on the grid resolution. In the future, the FFD scheme can to be improved to achieve a better performance for turbulent flow. Meanwhile, the FFD prediction for species concentration and three-dimensional FFD code needs to be further validated. It is also worthwhile to extend the capability of the FFD program, such as studying the flow in a complex geometry and coupling it with an energy simulation tool.

Numerical simulation of airflow and airborne pathogen transport in aircraft cabins - Part II: Numerical simulation of airborne pathogen transport

Article

Jan 2005

An initial study to develop a numerical tool using computational fluid dynamics (CFD) methods for investigating the potential of disease transmission in commercial aircraft is completed. To gain insight of the general airflow pattern, a detailed CFD model of a small section in the passenger cabin of a B767-300 passenger cabin was built and a Reynolds-averaged Navier-Stokes (RANS) simulation was performed. By comparing with the available test data, the RANS simulation substantially underpredicted the turbulence intensity, especially in and around the breathing zone. A separate large eddy simulation (LES) was conducted to obtain a more realistic turbulent energy transport in a generic cabin model. The LES-predicted turbulence level is in fairly good agreement with the test data. Based on the LES results, the k and e equations used in the RANS simulation were modified by using a special user subroutine. A RANS simulation with adjusted turbulence was then employed to simulate the dispersion of airborne pathogen in the detailed passenger cabin model. These adjustments allow for the simulation of disease transmission using less than 1/100 of the computing hardware resources required for an equivalent LES of airflow and particle transport.

Stream Processor Architecture

Article

Jan 2002

Scott Rixner

CFD applications in the HVAC and R industry

Article

Jan 1997
ASHRAE J

A simple treatment of computational fluid dynamics and heat transfer (usually abbreviated as CFD) is presented in this article. The authors have intentionally avoided the more advanced concepts such as mesh adaptivity or parallel processing (Ladeinde, 1992), and have limited the examples to very simple, albeit useful, systems. They will define CFD and describe the questions answered by CFD results and the advantages and limitations of the approach compared to physical experiments. They will then discuss the growing popularity of CFD and existing industrial applications in mechanical engineering. The obvious applications of this technique in the HVAC and R industry are presented prior to the conclusion of this article.

OpenGL Programming Guide: The Official Guide to Learning OpenGL, Versions 3.0 and 3.1

Article

Jan 2010

Dave Shreiner

CUDA compatible GPU as an efficient hardware accelerator for AES Cryptography

Conference Paper

Dec 2007

Svetlin Manavski

This paper presents a study of the efficiency in applying modern graphics processing units in symmetric key cryptographic solutions. It describes both traditional style approaches based on the OpenGL graphics API and new ones based on the recent technology trends of major hardware vendors. It presents an efficient implementation of the advanced encryption standard (AES) algorithm in the novel CUDA platform by Nvidia. AES is currently the most widely adopted modern symmetric key encryption standard. The performance of the new fastest GPU solution is compared with those of the reference sequential implementations running on an Intel Pentium IV 3.0 GHz CPU. Unlike previous research in this field, the results of this effort show for the first time the GPU can perform as an efficient cryptographic accelerator. The developed solutions run up to 20 times faster than OpenSSL and in the same range of performance of existing hardware based implementations.

Experiments on turbulent natural convection in an enclosed tall cavity

Article

Dec 2000
INT J HEAT FLUID FL

Experiments have been undertaken to investigate the natural convection of air in a tall differentially heated rectangular cavity (2.18 m high by 0.076 m wide by 0.52 m in depth). They were performed with temperature differentials between the vertical plates of 19.6°C and 39.9°C, giving Rayleigh numbers based on the width of 0.86×106 and 1.43×106. Under these conditions the flow in the core of the cavity is fully turbulent and property variations with temperature are comparatively small. A previously used experimental rig has been modified, by fitting partially conducting top and bottom walls and outer guard channels, to provide boundary conditions which avoid the inadequately defined sharp changes in temperature gradient and other problems associated with insufficient insulation on nominally adiabatic walls. Mean and turbulent temperature and velocity variations within the cavity have been measured, together with heat fluxes and turbulent shear stresses. The temperature and flow fields were found to be closely two-dimensional, except close to the front and back walls, and anti-symmetric across the diagonal of the cavity. The partially conducting roof and floor provide locally unstable thermal stratification in the wall jet flows there, which enhances the turbulence as the flow moves towards the temperature controlled plates. The results provide a greatly improved benchmark for the testing of turbulence models in this low turbulence Reynolds number flow.

Indoor air flow analysis based on lattice Boltzmann methods

Article

Oct 2002
ENERG BUILDINGS

The modeling of convective flows based on a 3D lattice Boltzmann approach for low Mach number flows with variable density combined with a large eddy turbulence model is presented. The ability to handle non-Boussinesq density variation problems is depicted for two-dimensional Rayleigh–Bénard convection at a Rayleigh number Ra=800,000. A complex three-dimensional example shows the status of our work with respect to turbulent flow in and around a building, so far without consideration of the energy equation in the full scale 3D case. Integrated within a CAD environment, the spatial geometric model, based on an IFC building product data model, is discretized using a hierarchic data structure. Results are presented for a Reynolds number Re=75,000 computed on a high-performance parallel vector computer. State-of-the-art visualization techniques integrate the simulation results and the CAD model into a virtual reality environment. The VR environment allows also for an interactive analysis of thermal comfort criteria, being demonstrated for an indoor air flow simulation of an open-plan office.

Parallelization of cellular neural networks on GPU

Article

Aug 2008
PATTERN RECOGN

Recently, cellular neural networks (CNNs) have been demonstrated to be a highly effective paradigm applicable in a wide range of areas. Typically, CNNs can be implemented using VLSI circuits, but this would unavoidably require additional hardware. On the other hand, we can also implement CNNs purely by software; this, however, would result in very low performance when given a large CNN problem size. Nowadays, conventional desktop computers are usually equipped with programmable graphics processing units (GPUs) that can support parallel data processing. This paper introduces a GPU-based CNN simulator. In detail, we carefully organize the CNN data as 4-channel textures, and efficiently implement the CNN computation as fragment programs running in parallel on a GPU. In this way, we can create a high performance but low-cost CNN simulator. Experimentally, we demonstrate that the resultant GPU-based CNN simulator can run 8–17 times faster than a CPU-based CNN simulator.

High-Re solutions for incompressible flow using the Navier-Stokes equations and a multigrid method

Article

Dec 1982

The vorticity-stream function formulation of the two-dimensional incompressible Navier-Stokes equations is used to study the effectiveness of the coupled strongly implicit multigrid (CSI-MG) method in the determination of high-Re fine-mesh flow solutions. The driven flow in a square cavity is used as the model problem. Solutions are obtained for configurations with Reynolds number as high as 10,000 and meshes consisting of as many as 257 × 257 points. For Re = 1000, the (129 × 129) grid solution required 1.5 minutes of CPU time on the AMDAHL 470 V/6 computer. Because of the appearance of one or more secondary vortices in the flow field, uniform mesh refinement was preferred to the use of one-dimensional grid-clustering coordinate transformations.

Ventilation performance prediction for buildings: A method overview and recent applications

Article

Apr 2009
BUILD ENVIRON

Qingyan Chen

This paper presented an overview of the tools used to predict ventilation performance in buildings. The tools reviewed were analytical models, empirical models, small-scale experimental models, full-scale experimental models, multizone network models, zonal models, and Computational Fluid Dynamics (CFD) models. This review found that the analytical and empirical models had made minimal contributions to the research literature in the past year. The small- and full-scale experimental models were mainly used to generate data to validate numerical models. The multizone models were improving, and they were the main tool for predicting ventilation performance in an entire building. The zonal models had limited applications and could be replaced by the coarse-grid fluid dynamics models. The CFD models were most popular and contributed to 70 percent of the literature found in this review. Considerable efforts were still made to seek more reliable and accurate models. It has been a trend to improve their performance by coupling CFD with other building simulation models. The applications of CFD models were mainly for studying indoor air quality, natural ventilation, and stratified ventilation as they were difficult to be predicted by other models.
