Evaluating EFDC+ Test Results
Automated testing systems are crucial to software development and quality assurance. For DSI, they help ensure that our software functions as intended and meets the required standards. One important aspect of automated testing is the use of tolerances for each test, which define the acceptable deviation from a baseline run; when a test metric exceeds its tolerance, the test is considered to have failed. This blog post explores how the testing journal provides a high-level overview of test results, how time series data are used for further analysis, and how different tools and techniques can be used to resolve test failures or improve model accuracy.
The automated testing system defines a set of tolerances for each test. If a test metric exceeds its tolerance relative to the baseline run, the test is marked as failed. These tolerances can be adjusted as needed.
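Conceptually, the pass/fail decision is a straightforward comparison of each reported metric against its tolerance. The short Python sketch below illustrates the idea; the tolerance values, dictionary layout, and function names are hypothetical, not the actual DSI test configuration.

```python
# Hypothetical tolerance table; the real values live in the test
# configuration and can be adjusted as needed.
TOLERANCES = {
    ("temperature", "max_abs_error"): 1.0e-4,
    ("temperature", "mean_error"): 1.0e-5,
    ("salinity", "max_abs_error"): 1.0e-4,
}

def evaluate(results, tolerances=TOLERANCES):
    """Compare each reported metric against its tolerance.

    `results` maps (parameter, metric) to the worst value observed over the
    run, e.g. {("temperature", "max_abs_error"): 2.3e-5}. Any metric that
    exceeds its tolerance is returned as a failure; an empty list means the
    test passed.
    """
    failures = []
    for key, value in results.items():
        tolerance = tolerances.get(key)
        if tolerance is not None and value > tolerance:
            failures.append((key, value, tolerance))
    return failures
```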
The first evaluation is a high-level overview provided by the testing journal. At a glance, we can see whether any tests fell outside the accepted threshold, as these are reported as a “Fail”.
Each test metric produces a value for each timestep, so a time series of these values is generated; the test journal, however, displays only the largest value of each metric at any point in time. The metrics reported for each parameter (e.g., temperature, salinity) are the max absolute error, mean error, scaled root mean squared error, and relative root mean squared error.
Max Absolute Error – The maximum absolute difference between the two model runs. The models are compared cell by cell, and the largest absolute difference across the model domain is recorded for each timestep. If this value is zero, every other metric will also be zero (meaning the models are identical).
Mean Error – The absolute value of the difference between the mean values of the two compared model runs.
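To make the metrics concrete, here is a minimal Python sketch of how the per-timestep values might be computed for one parameter and then collapsed to the single numbers shown in the journal. The function and array names are hypothetical, and the normalization used for the two RMSE variants is an assumption, since the exact formulas are not described here.

```python
import numpy as np

def timestep_metrics(baseline, candidate):
    """Compare two model snapshots (arrays of cell values) for one timestep.

    Returns the per-timestep metrics described above. The names and the
    exact RMSE scaling are illustrative, not the EFDC+ test-harness code.
    """
    diff = candidate - baseline
    max_abs_error = np.max(np.abs(diff))                  # worst cell in the domain
    mean_error = abs(candidate.mean() - baseline.mean())  # difference of the means
    rmse = np.sqrt(np.mean(diff ** 2))
    # Assumed normalizations: by the baseline's range and by its mean.
    scaled_rmse = rmse / (baseline.max() - baseline.min() + 1e-12)
    relative_rmse = rmse / (abs(baseline.mean()) + 1e-12)
    return max_abs_error, mean_error, scaled_rmse, relative_rmse

def journal_summary(baseline_series, candidate_series):
    """The journal reports the largest value of each metric over all timesteps."""
    metrics = np.array([timestep_metrics(b, c)
                        for b, c in zip(baseline_series, candidate_series)])
    return metrics.max(axis=0)  # one maximum per metric
```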
If a failed test is observed, we can use the auto-generated time series (output as HTML) to provide additional insight into the results. The example below shows an increasing max absolute error for DOC, DON, & DOP. We can hover over the data lines and see exactly where in the model domain these errors are occurring.
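The interactive HTML output could be produced with any plotting library that supports hover labels. The sketch below uses Plotly purely as an example of the idea and should not be read as the actual report generator; the cell-label format is an assumption.

```python
import plotly.graph_objects as go

def plot_metric_series(times, values, cell_labels, metric_name, out_html):
    """Write an interactive HTML time series of a test metric.

    Hovering over a point shows which cell produced that value
    (e.g. "L=1024 (I=35, J=12)" -- a hypothetical label format).
    """
    fig = go.Figure(
        go.Scatter(
            x=times,
            y=values,
            mode="lines+markers",
            text=cell_labels,
            hovertemplate="%{x}<br>%{y:.3e}<br>%{text}<extra></extra>",
            name=metric_name,
        )
    )
    fig.update_layout(title=metric_name,
                      xaxis_title="Model time",
                      yaxis_title="Error")
    fig.write_html(out_html)
```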
At this point, we might examine the commit to see if there is anything in the code we can easily attribute the changes to, or we might open EFDC+ Explorer for a more detailed look at the model behavior. This would include looking at the plan view of the model for various parameters, such as water level, velocity, or temperature, at different times to identify instability or unexpected behavior in the model. In addition, we often plot time series of other related parameters, as well as vertical profiles, slice views, and model comparison plots.
If it is determined that there is a bug, or if the new behavior is anticipated and considered more accurate, a test resolution is logged with a description of the findings, the related commit, and the related tests.
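For illustration, a resolution record might look something like the structure below; the field names are an assumption rather than the actual schema used in our test system.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TestResolution:
    """Illustrative structure for a logged test resolution (hypothetical fields)."""
    description: str          # summary of the findings
    commit: str               # hash of the related commit
    related_tests: list[str]  # tests affected by the change
    is_bug: bool              # bug fix needed vs. anticipated, more accurate behavior
    logged_on: date = field(default_factory=date.today)
```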
In conclusion, automated testing systems with defined tolerances play a vital role in ensuring the quality and accuracy of our software. The testing journal provides an initial overview of test results, and time series data can be used to gain deeper insights into the behavior of the software. Analysis tools such as EFDC+ Explorer can be utilized for in-depth analysis, and resolutions for test failures or anticipated behavior changes can be logged for further reference. By leveraging these testing methodologies and techniques, our software developers and quality assurance teams can identify and resolve issues efficiently, resulting in more reliable and accurate software systems.