I recently moved to build 10.0.0.2391, and I have a piece of problem code that was deployed last Friday. About 4 runs later (we batch-process twice daily), this issue seems to have arisen. We have run this processing load twice daily for the last 3 years without this problem being reported - and the users would be very aware had it happened.
What I seem to be seeing is a parallel loop executing twice for a given item in a collection :-
for parallel lLoopA:Integer := 0 to cSiteMaster.Items.Count - 1 do
begin
  var lSiteMasterItem : KKDSiteMasterItem;
  lSiteMasterItem := cSiteMaster.Items[lLoopA];
  PutLog(String.Format('Starting : {0}', [lSiteMasterItem.Description]));
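For what it's worth, I don't know the internals of `for parallel`, so this is only a generic illustration (in Python, with names of my own invention) of one way a parallel loop body can observe the same index twice: workers reading a shared loop variable instead of a per-iteration copy. It is not a claim about what the Oxygene compiler actually generates:

```python
import threading

seen = []
lock = threading.Lock()

def record(value):
    with lock:
        seen.append(value)

# Pitfall: each lambda captures the loop *variable* i, not its value at
# creation time. All threads start after the loop has finished, so every
# worker observes the final value of i -- the "same item" several times.
threads = [threading.Thread(target=lambda: record(i)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen)            # [3, 3, 3, 3] -- one index handled four times

# Fix: bind the current value per iteration (a default argument copies it),
# so each worker gets its own index.
seen.clear()
threads = [threading.Thread(target=lambda v=i: record(v)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(seen))    # [0, 1, 2, 3] -- each index handled exactly once
```

A correctly implemented parallel-for should of course behave like the second variant, handing each index to exactly one worker.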
The PutLog method is a straightforward text logger to the console and a text file :-
class method Program.PutLog(const aValue:String);
begin
  Console.WriteLine(aValue);
  cLogFile.WriteLine(String.Format('{0} : {1}', [DateTime.Now.ToString('dd MMM yyyy HH:mm:ss'), aValue]));
end;
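Probably unrelated to the duplicate entry, but worth noting for anyone reading: if cLogFile is a plain stream writer, two parallel iterations calling PutLog at the same time can interleave their writes. The usual guard is a lock around the shared writer; a minimal Python sketch (names hypothetical, `io.StringIO` standing in for the real file handle):

```python
import datetime
import io
import threading

_log_lock = threading.Lock()
_log_file = io.StringIO()   # stands in for the real shared log file handle

def put_log(value: str) -> None:
    """Console/file logger; the lock serialises concurrent writers."""
    with _log_lock:
        stamp = datetime.datetime.now().strftime('%d %b %Y %H:%M:%S')
        _log_file.write(f'{stamp} : {value}\n')

# Hammer it from several threads: with the lock, every line stays intact.
workers = [threading.Thread(target=put_log, args=(f'Starting : site {n}',))
           for n in range(50)]
for w in workers:
    w.start()
for w in workers:
    w.join()
lines = _log_file.getvalue().splitlines()
print(len(lines))   # 50 -- one complete, untorn line per call
```

Without the lock, interleaved writes could garble lines, but they would not explain a second, well-formed "Starting :" entry.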
cSiteMaster is a class-level object holding a list of items. Once loaded from the database (which happens earlier in the process), that list is not sorted or accessed anywhere else, other than being iterated to process each “site”.
For some reason the second run today logged two “Starting :” entries for the same entity. The “Bristol” site is also the first in the list.
15:42:30 | Starting : Bristol
15:42:30 | Starting : Bristol
15:42:50 | Progress 11.87% ( 1,496 of 12,600 for Process Bristol)
15:43:12 | Progress 51.78% ( 6,524 of 12,600 for Process Bristol)
15:43:32 | Progress 87.10% (10,974 of 12,600 for Process Bristol)
15:43:52 | Progress 95.92% (12,086 of 12,600 for Process Bristol)
15:43:52 | Finished : Bristol
However, the subsequent logging shows no sign of the duplicate run progressing through the processing. In theory that could be my code failing to pick up an exception, but it should do - and has done in the past when issues have arisen inside parallel code.
All other sites in the list processed normally and reported their correct outputs. The “Bristol” site, however, produced a mixed set of results that I cannot explain other than by data corruption during processing, i.e. data potentially being lost / overwritten due to the site being processed twice.
A subsequent run of the process did not show the problem, and neither did the 3 runs prior to the problem run, all of which used the .2391 build. It does seem too big a coincidence, though, that this has never shown up in the past 3 years of running this code twice daily, every day.
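Given how intermittent this is, one thing I may try is a small harness around the dispatch: record every index handed to a worker, flag any batch where an index appears twice, and repeat it many times. A Python stand-in for the idea (a thread pool in place of `for parallel`; the names are mine):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import threading

def run_once(item_count: int = 20) -> list:
    """Dispatch one 'batch' and return any indices seen more than once."""
    seen = []
    lock = threading.Lock()

    def process(index):
        with lock:
            seen.append(index)   # record which item this worker received

    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(process, range(item_count)))
    return [i for i, n in Counter(seen).items() if n > 1]

# Repeat many batches, since the duplicate showed up only once in years.
duplicate_runs = [r for r in range(100) if run_once()]
print(duplicate_runs)   # expected [] with a correct dispatcher
```

If an equivalent Oxygene harness ever reported a duplicate, that would point at the parallel dispatch itself rather than at my processing code.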
Any thoughts on whether this may be related to the .2391 build and its use of “parallel”?