Thunderhead Engineering
Topic: parallel run error - forrtl: severe (157): Program Exception - access violation
Deenee
« on: September 07, 2010, 09:15:26 AM »

Hello,
When running FDS in parallel, I encountered a forrtl severe (157) access violation error, similar to the one posted previously in the thread http://www.thunderheadeng.net/forum/index.php/topic,304.msg543.html#msg543. I followed the advice in that thread and used FDS 5.4.3, but it still does not work.
To isolate the problem, I ran the same model with only one mesh (Fire zone), and it runs fine.
Could you advise on the likely cause of the problem?
The input file is attached.


PS - screen dumps of the error message and of the successful single-mesh run are attached.

* VR10FullModel.fds (108.49 KB - downloaded 105 times.)

* error.png (18.09 KB, 668x331 - viewed 138 times.)

* success run.png (20.23 KB, 668x331 - viewed 98 times.)
« Last Edit: September 07, 2010, 09:19:08 AM by Deenee »
Bryan Klein
« Reply #1 on: September 07, 2010, 02:09:24 PM »

An access violation (error 157) is usually due to some kind of memory read/write error.
With almost 10 million mesh cells, you may be running into out-of-bounds or overflow errors.

How many computers are you using to run the simulation?
How much memory is available on each computer?
How many meshes are you allocating to each computer?

-Bryan Klein
Deenee
« Reply #2 on: September 07, 2010, 03:17:28 PM »

Bryan,
As you can see from the attached screenshot, there are ten processes, each meant to handle one mesh; that is why the domain is divided into ten meshes.
I tried to keep each mesh at about a million cells (some are slightly more) to stay within the guidance of a million cells per core.
R
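
The cell count of each mesh can be checked directly from the input file. The sketch below assumes each mesh is declared in an FDS `&MESH ... /` namelist record with an `IJK=i,j,k` entry and that no quoted string in the record contains a `/`. The sample input is hypothetical and is not taken from VR10FullModel.fds.

```python
import re

# One &MESH ... / namelist record, possibly spanning several lines.
MESH_RECORD = re.compile(r"&MESH\b(.*?)/", re.DOTALL | re.IGNORECASE)
IJK = re.compile(r"IJK\s*=\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)", re.IGNORECASE)
MESH_ID = re.compile(r"\bID\s*=\s*'([^']*)'", re.IGNORECASE)


def mesh_cell_counts(fds_text):
    """Return a list of (mesh_id, cell_count) for every &MESH record."""
    counts = []
    for n, record in enumerate(MESH_RECORD.findall(fds_text), start=1):
        ijk = IJK.search(record)
        if ijk is None:
            continue  # record without IJK; skipped in this sketch
        i, j, k = (int(v) for v in ijk.groups())
        name = MESH_ID.search(record)
        counts.append((name.group(1) if name else f"mesh {n}", i * j * k))
    return counts


# Hypothetical two-mesh input, for illustration only.
sample = """
&MESH ID='Fire zone', IJK=100,100,100, XB=0.0,10.0,0.0,10.0,0.0,10.0 /
&MESH ID='Corridor', IJK=120,100,90,
      XB=10.0,22.0,0.0,10.0,0.0,9.0 /
"""

for mesh_id, cells in mesh_cell_counts(sample):
    flag = "over" if cells > 1_000_000 else "within"
    print(f"{mesh_id}: {cells:,} cells ({flag} the 1,000,000 target)")
```

Running this against the real input file (read with `open(path).read()`) would show which meshes exceed the one-million-cell target and by how much.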
Bryan Klein
« Reply #3 on: September 07, 2010, 03:29:25 PM »

I looked at the screenshots, but my questions still remain.
10 processes/meshes does not equal 10 computers, 10 processors (cores) or 10GB of RAM (minimum).

For a simulation like this, you would probably need to run FDS on a 64-bit computer with at least 16 GB of RAM.
Or, you would need to distribute it over a few separate computers, each with enough RAM to handle the number of cells assigned to it.

As a rough estimate, you will need about 1 GB of RAM for each million grid cells. As the number of millions increases, so does the overhead above the 1 GB per million cells estimate.

Could you please give a bit more information about the computer(s) you are trying to run this on?

-Bryan
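
The 1 GB per million cells rule of thumb from the reply above can be turned into a quick estimate. This is a minimal sketch: the `overhead_fraction` parameter is a hypothetical knob, since the thread gives no figure for the extra overhead, and the 50% value used below is only an assumption for illustration.

```python
GB_PER_MILLION_CELLS = 1.0  # rule of thumb quoted in the reply above


def estimated_ram_gb(cells, overhead_fraction=0.0):
    """Rough RAM estimate in GB for a given number of grid cells.

    overhead_fraction is a hypothetical allowance for the extra overhead
    the reply mentions; the thread gives no figure for it.
    """
    base = cells / 1_000_000 * GB_PER_MILLION_CELLS
    return base * (1.0 + overhead_fraction)


total_cells = 10_000_000  # "almost 10 million mesh cells" in this thread
print(f"baseline: {estimated_ram_gb(total_cells):.1f} GB")
print(f"with 50% assumed overhead: {estimated_ram_gb(total_cells, 0.5):.1f} GB")
```

The baseline figure is a lower bound under this rule; the real requirement depends on the model and on how the meshes are spread across machines.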
Deenee
« Reply #4 on: September 07, 2010, 03:55:04 PM »

Bryan
The model is distributed over 7 computers, all 32-bit. Three of the computers handle 2 meshes each (6 meshes), and the remaining 4 computers handle one mesh each. We have a network of computers that are used for modelling. Most of them are quad core, a few dual core. The minimum RAM is 2 GB for the dual-core machines and higher for the higher-spec machines.
I can't tell you exactly which system has which spec, as I am not responding from my workplace, but I can tell you tomorrow (UK time) if you want the specifics. Let me know if you need that information.
R
Bryan Klein
« Reply #5 on: September 07, 2010, 04:31:45 PM »

My first pass recommendation is to confirm that you have enough resources per computer to handle the requirements of the simulation.
Even if a computer has 2 GB of physical RAM, that does not mean there will be enough available for all of the processes (OS overhead, etc.).

These errors are always tricky, as it is not easy to determine if it is a hardware resource issue, an OS memory handling issue or a bug in the program itself.
Another approach is to step up incrementally from one mesh to all ten and see when it fails. 

Since this is happening before the first time step, my guess is that it is a memory overflow/error problem on one of the nodes in the network.

-Bryan
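
The per-computer resource check recommended above can be sketched as follows, using the layout described in Reply #4 (three computers with two meshes each, four with one). Several values are assumptions: the host names are hypothetical, every machine is given the 2 GB minimum because the actual specs are not stated, each mesh is taken as one million cells, and `os_reserve_gb` is a guessed allowance for the OS and other programs.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str      # hypothetical label; the thread names no machines
    ram_gb: float  # physical RAM installed
    meshes: int    # number of meshes (processes) assigned to this host


def check_host(host, cells_per_mesh=1_000_000, gb_per_million=1.0,
               os_reserve_gb=0.5):
    """Return (needed_gb, available_gb, fits) for one host.

    os_reserve_gb is an assumed allowance for the OS and other programs.
    """
    needed = host.meshes * cells_per_mesh / 1_000_000 * gb_per_million
    available = host.ram_gb - os_reserve_gb
    return needed, available, needed <= available


# Layout from Reply #4: three computers with two meshes, four with one.
# 2 GB is the minimum RAM mentioned; actual per-machine specs are unknown.
hosts = [Host(f"pc{i}", 2.0, 2) for i in range(1, 4)]
hosts += [Host(f"pc{i}", 2.0, 1) for i in range(4, 8)]

for h in hosts:
    needed, available, fits = check_host(h)
    status = "ok" if fits else "SHORT"
    print(f"{h.name}: needs ~{needed:.1f} GB, ~{available:.1f} GB usable -> {status}")
```

Under these assumptions, any 2 GB machine carrying two meshes comes up short, which is consistent with the suggestion to verify resources per computer before looking for a program bug.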