Introduction
With the arrival of Microsoft Visual Studio® .NET 2003, with its integrated support for Smart Device Applications, it is possible to develop applications for a broad range of devices using managed code. Software developers can now use languages like Microsoft Visual Basic® .NET and Microsoft Visual C#® for device development. Although this sounds promising, one question remains: is it possible to make use of the real-time capabilities of Windows CE .NET while using managed code to write applications for an embedded device? In this paper we answer that question and suggest a possible scenario in which real-time behavior can be combined with .NET functionality.
A Managed and an Unmanaged World
Some of the advantages of a managed environment like Microsoft’s Common Language Runtime, such as safer and platform-independent software, might turn out to be a disadvantage in a real-time environment. Typically, we cannot afford to wait for a just-in-time (JIT) compiler to compile a method before using it, and we cannot wait for a garbage collector to reclaim memory that is no longer in use. Both features might interfere with deterministic system behavior. It is possible to force the garbage collector to run by calling GC.Collect(). However, we want the GC to perform its task by itself, since it is highly optimized. To allow hard real-time behavior, it would be great if there were a way to distinguish between hard real-time functionality, written in native or unmanaged Microsoft Win32® code, and other functionality, written in managed code. Making use of Platform Invoke, or P/Invoke, we can do just that.
Platform Invoke at Work
The simple definition in MSDN help states that P/Invoke is the functionality provided by the common language runtime to enable managed code to call unmanaged native DLL entry points. In other words, P/Invoke gives us an escape route from managed .NET code to unmanaged Win32 code. To be able to use this mechanism within Windows CE .NET, the native Win32 functions that we want to call must be exported from a dynamic link library. Since the managed .NET environment does not know anything about Microsoft Visual C++® name mangling, the functions to be called from within a managed application should use C naming conventions (extern "C") as well. To be able to use functionality from within a DLL, we need to build a wrapper class around the function entry points from within our managed application. Listing 1 shows an example of a small, unmanaged DLL and a wrapper class in managed code.


Listing 1a & 1b: Calling into unmanaged code

Listing 1c: Win32 DLL to be called from within managed code
Using the wrapper class, it is possible to call functions that exist inside the DLL. Since this mechanism works for all exported DLL functions and since almost all Win32 APIs are exported in coredll.dll, it also provides a way to call into almost any Win32 API. In our test, P/Invoke is used to have a managed application call into an unmanaged real-time thread.
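As a minimal sketch of the native side of such a call, the C++ fragment below exports two functions with C linkage so that their names are not mangled. The function names, the RTDEMO_API macro and the stored value are assumptions made for illustration; they are not taken from the DLL used in our test.

```cpp
#include <cstdint>

// Export macro (hypothetical name): on Windows, including Windows CE, the
// functions are exported from the DLL; elsewhere the macro expands to nothing
// so that the sketch still builds.
#if defined(_WIN32)
#define RTDEMO_API __declspec(dllexport)
#else
#define RTDEMO_API
#endif

// Hypothetical value that a real-time thread would keep up to date.
static std::int32_t g_lastPeriodUs = 100;

// extern "C" switches off C++ name mangling, so a managed wrapper can locate
// the entry point by its plain name.
extern "C" RTDEMO_API std::int32_t GetLastPeriodMicroseconds()
{
    return g_lastPeriodUs;
}

extern "C" RTDEMO_API void SetLastPeriodMicroseconds(std::int32_t periodUs)
{
    g_lastPeriodUs = periodUs;
}
```

A managed wrapper class would then declare the same entry point names with the DllImport attribute and call them like ordinary static methods.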
A Real-time Scenario
Imagine the following scenario: A system needs hard real-time functionality to retrieve information from an external source. The information is stored in the system and will be presented to the user in some graphical way. Figure 1 shows a possible scenario for this problem.

We see a real-time thread receiving an interrupt from an external source. The thread processes the interrupt and stores relevant information to be presented to the user. On the right-hand side, a separate UI thread, written in managed code, reads information that was previously stored by the real-time thread. Given the fact that context switches between processes are expensive, we want the entire system to live within the same process. If we separate real-time functionality from user interface functionality by putting real-time functionality in a dynamic link library and providing an interface between that DLL and the other parts of the system, we have achieved our goal of having one single process dealing with all parts of the system. Communication between the UI thread and the RT thread is possible by means of P/Invoking into the native Win32 code.
The Actual Test
We want to make the test representative, yet as simple as possible so it can be repeated easily on other systems as well. For that purpose, the source code to run the experiment yourself is available for download. Our test requires a way to feed interrupts into the system and a way to output probes so we can measure the performance of the system. We feed the system using a block wave, generated by a signal generator. Of course, the Windows CE .NET platform should be capable of hosting the .NET Compact Framework (.NET CF). Paul Yao has written an article indicating which Windows CE .NET modules and components should be present to run managed applications (see “Microsoft .NET Compact Framework for Windows CE .NET” on msdn.microsoft.com). The test is meant to be both representative and reproducible: to repeat it on your own hardware, you only need to find a suitable interrupt source for input. Listing 2 shows how to hook a physical interrupt to an interrupt service thread (IST).

Listing 2: Connecting a physical interrupt to an interrupt service thread
To test the real-time behavior of an application making use of managed code and the .NET Compact Framework, we created a Windows CE platform based on the Standard SDK. We also included the RTM version of the .NET Compact Framework in the platform. The operating system runs on a Geode GX1 at 300 MHz. We feed the system with a block wave, directly connected to the IRQ5 line on the PC104 bus (pin 23). Figure 2 shows the system used for the experiment. The frequency of the block wave is 10 kHz. On each rising edge, an interrupt is generated. The interrupt is processed by an interrupt service thread (IST). In the IST we send out probe pulses to the parallel port to be able to view an output signal. We also store the time at which the IST was activated, making use of the high-resolution QueryPerformanceCounter API. To be able to measure timing information over a long period of time, we also store the maximum and minimum time as well as the average time. The time from interrupt occurrence to probe output is an indication of the IRQ-to-IST latency. The timing information acquired by the high-resolution timer indicates when the IST is activated. Ideally, the time between two activations should be 100 µs for an interrupt rate of 10 kHz. All timing information is passed to the graphical user interface at regular intervals.
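The bookkeeping described above can be sketched in a few lines of portable C++. This is an illustration under assumptions, not the code of the test DLL: the names ExpectedPeriodUs, TicksToUs and PeriodStats are hypothetical, and the tick values would come from QueryPerformanceCounter and QueryPerformanceFrequency on the target.

```cpp
#include <cstdint>
#include <limits>

// Expected period in microseconds for a given interrupt frequency in Hz.
inline double ExpectedPeriodUs(double frequencyHz)
{
    return 1000000.0 / frequencyHz;
}

// Converts the difference between two performance-counter readings to
// microseconds. ticksPerSecond is the counter frequency.
inline double TicksToUs(std::int64_t tickDelta, std::int64_t ticksPerSecond)
{
    return static_cast<double>(tickDelta) * 1000000.0 /
           static_cast<double>(ticksPerSecond);
}

// Running minimum, maximum and average of the IST period, updated in
// constant time per sample so it can be maintained over a long run.
struct PeriodStats
{
    double minUs = std::numeric_limits<double>::max();
    double maxUs = 0.0;
    double sumUs = 0.0;
    std::uint64_t count = 0;

    void Add(double periodUs)
    {
        if (periodUs < minUs) minUs = periodUs;
        if (periodUs > maxUs) maxUs = periodUs;
        sumUs += periodUs;
        ++count;
    }

    double AverageUs() const
    {
        return count ? sumUs / static_cast<double>(count) : 0.0;
    }
};
```

For the 10 kHz block wave, ExpectedPeriodUs(10000.0) yields the 100 µs mentioned above; the measured minimum and maximum then show how far individual IST activations deviate from it.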

As the .NET CF itself cannot be used in hard real-time situations, as explained earlier, we decided to use it for presentation purposes only and to use a DLL, written in eMbedded Visual C++ 4.0 (eVC++ 4.0), for all real-time functionality. For communication between the DLL and the .NET CF GUI, a double buffering mechanism is used in combination with P/Invoke. The GUI requests new timing information at regular intervals, making use of a System.Threading.Timer object. The DLL decides when it has time available to pass information to the GUI. Until data is ready, the GUI is blocked. The refresh rate of the information presented in the GUI is user selectable. For our test we used a refresh interval of 50 ms.
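One way to picture such a double buffering mechanism is the portable C++ sketch below. It is an assumption about the general technique, not the implementation in our DLL: the TimingData and DoubleBuffer names are hypothetical, and a standard mutex and condition variable stand in for whatever synchronization objects the native code uses.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical snapshot of the timing information shown in the GUI.
struct TimingData
{
    double minUs;
    double maxUs;
    double avgUs;
};

class DoubleBuffer
{
public:
    // Real-time side: fills the back buffer. Readers never see this buffer
    // until Publish is called.
    void Write(const TimingData& data)
    {
        buffers_[writeIndex_] = data;
    }

    // Real-time side: called only when the writer decides it has time to
    // hand data over. Swaps the roles of the two buffers.
    void Publish()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            writeIndex_ = 1 - writeIndex_;
            ready_ = true;
        }
        readyCv_.notify_one();
    }

    // GUI side: blocks until a buffer has been published, then returns a
    // copy of the most recently published data.
    TimingData Read()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        readyCv_.wait(lock, [this] { return ready_; });
        ready_ = false;
        return buffers_[1 - writeIndex_];
    }

private:
    TimingData buffers_[2] = {};
    int writeIndex_ = 0;   // index of the back buffer
    bool ready_ = false;   // true once unread data has been published
    std::mutex mutex_;
    std::condition_variable readyCv_;
};
```

The point of the design is that the writer only ever touches the back buffer, so the reader can take its time copying the front buffer without holding up the real-time side for longer than the swap itself.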
The following pseudocode explains the operation of the IST and the mechanism by which the GUI retrieves information, stored in the native Win32 DLL.
Interrupt Service Thread:
Managed code periodical update of display data:

During the test we hooked up an oscilloscope and made printouts of both the scope and the Windows CE graphical display 10 minutes into the experiment. Figure 3 shows the interrupt latency, measured with the oscilloscope. Best case, the latency is 14.0 µs; worst case, it is 54.4 µs, meaning a jitter of 40.4 µs. Figure 4 shows the periodic time at which the IST is activated. This figure is a screen shot of the actual user interface. Ideally the IST should run every 100 µs, which is also the average time during our measurement (the blue line in the middle). We also measured overall minimum (green) and maximum (red) times, as well as minimum and maximum times over the sample period of 50 ms (the white block). The deviation we found during the test period is limited to ±40 µs.


The Results
We measured over a longer period of time to make sure that both the Garbage Collector and the JIT compiler were frequently active. Thanks to the folks at Microsoft, we were able to monitor the behavior of the .NET CF because they provided us with a performance counters registry key. Using this key, a number of performance counters within the .NET CF are activated. We mainly used this performance information to verify that the JITter and the Garbage Collector actually ran. It also gave a nice indication of the number of objects used during the course of the test.

Listing 5: Handling timer messages in a managed world
As you can see in listing 5, we instantiate a number of objects each time we periodically update the screen. Two pens and a graphics object are created during each screen update. The functions td.ShowValue and td.SetTimerPointer each create a brush as well. Since td.SetTimerPointer is called twice per screen update, a total of 6 objects are created during each update of the screen. Since we update the screen every 50 ms, a total of 120 objects are created per second. Over 10 minutes of execution, 72000 objects are created. All these objects are potentially subject to garbage collection. In table 1, the number of allocated objects roughly corresponds to these theoretical values.
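The arithmetic above can be written out as a small C++ sketch. The function and constant names are hypothetical, and the breakdown of the 6 objects per update follows our reading of the paragraph above.

```cpp
#include <cstdint>

// Objects created in one screen update: 2 pens + 1 graphics object
// + 1 brush in td.ShowValue + 2 brushes from the two calls to
// td.SetTimerPointer.
constexpr std::int64_t kObjectsPerUpdate = 2 + 1 + 1 + 2;

// Number of objects expected after runMinutes when the screen is updated
// every updatePeriodMs milliseconds.
constexpr std::int64_t ExpectedAllocations(std::int64_t objectsPerUpdate,
                                           std::int64_t updatePeriodMs,
                                           std::int64_t runMinutes)
{
    return objectsPerUpdate * runMinutes * 60 * 1000 / updatePeriodMs;
}

// Percentage by which an observed count falls short of the expected count.
inline double ShortfallPercent(std::int64_t expected, std::int64_t observed)
{
    return 100.0 * static_cast<double>(expected - observed) /
           static_cast<double>(expected);
}
```

With a 50 ms update period, ExpectedAllocations(kObjectsPerUpdate, 50, 10) gives the 72000 objects mentioned above; against the 66898 objects reported in table 1, ShortfallPercent shows a difference of roughly 7%.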
We have included performance counter results for both a 10 minute and a 100 minute run. This data was recorded during our actual test. As you can see, after running for 10 minutes, garbage collection occurred without noticeable degradation in performance. Table 2 shows the performance counters for a run of approximately 100 minutes. In this run, full garbage collection occurred. During this run, only 461499 objects were created instead of the 720000 expected objects. This is approximately 36% less than expected. The difference is likely to be caused by the performance counters which, according to Microsoft, result in a performance penalty of about 30% within the managed application. However, real-time behavior of the system was not influenced, as you can see in the following figure.

Further evidence that the garbage collector and the JITter did not influence real-time behavior can be found in the remote process viewer. In the next figure you can see a screen dump of the remote process viewer for the managed application. All threads in the application (except the real-time thread with priority 0) run at normal priority (251). During our measurements we did not find that the JITter or the garbage collector needed kernel blocking to perform their tasks.

Table 1: .NET CF performance results after running the test for ten minutes
| Counter | Value | n | mean | min | max |
|---|---|---|---|---|---|
| Execution Engine Startup Time | 492 | 0 | 0 | 0 | 0 |
| Total Program Run Time | 603752 | 0 | 0 | 0 | 0 |
| Peak Bytes Allocated | 1115238 | 0 | 0 | 0 | 0 |
| Number Of Objects Allocated | 66898 | 0 | 0 | 0 | 0 |
| Bytes Allocated | 1418216 | 66898 | 21 | 8 | 24020 |
| Number Of Simple Collections | 0 | 0 | 0 | 0 | 0 |
| Bytes Collected By Simple Collection | 0 | 0 | 0 | 0 | 0 |
| Bytes In Use After Simple Collection | 0 | 0 | 0 | 0 | 0 |
| Time In Simple Collect | 0 | 0 | 0 | 0 | 0 |
| Number Of Compact Collections | 1 | 0 | 0 | 0 | 0 |
| Bytes Collected By Compact Collections | 652420 | 1 | 652420 | 652420 | 652420 |
| Bytes In Use After Compact Collection | 134020 | 1 | 134020 | 134020 | 134020 |
| Time In Compact Collect | 357 | 1 | 357 | 357 | 357 |
| Number Of Full Collections | 0 | 0 | 0 | 0 | 0 |
| Bytes Collected By Full Collection | 0 | 0 | 0 | 0 | 0 |
| Bytes In Use After Full Collection | 0 | 0 | 0 | 0 | 0 |
| Time In Full Collection | 0 | 0 | 0 | 0 | 0 |
| GC Number Of Application Induced Collections | 0 | 0 | 0 | 0 | 0 |
| GC Latency Time | 357 | 1 | 357 | 357 | 357 |
| Bytes Jitted | 14046 | 259 | 54 | 1 | 929 |
| Native Bytes Jitted | 70636 | 259 | 272 | 35 | 3758 |
| Number of Methods Jitted | 259 | 0 | 0 | 0 | 0 |
| Bytes Pitched | 0 | 0 | 0 | 0 | 0 |
| Number of Methods Pitched | 0 | 0 | 0 | 0 | 0 |
| Number of Exceptions | 0 | 0 | 0 | 0 | 0 |
| Number of Calls | 3058607 | 0 | 0 | 0 | 0 |
| Number of Virtual Calls | 1409 | 0 | 0 | 0 | 0 |
| Number Of Virtual Call Cache Hits | 1376 | 0 | 0 | 0 | 0 |
| Number of PInvoke Calls | 176790 | 0 | 0 | 0 | 0 |
| Total Bytes In Use After Collection | 421462 | 1 | 421462 | 421462 | 421462 |
Table 2: .NET CF performance results after running the test for one hundred minutes
| Counter | Value | n | mean | min | max |
|---|---|---|---|---|---|
| Execution Engine Startup Time | 478 | 0 | 0 | 0 | 0 |
| Total Program Run Time | 5844946 | 0 | 0 | 0 | 0 |
| Peak Bytes Allocated | 1279678 | 0 | 0 | 0 | 0 |
| Number Of Objects Allocated | 461499 | 0 | 0 | 0 | 0 |
| Bytes Allocated | 8975584 | 461499 | 19 | 8 | 24020 |
| Number Of Simple Collections | 0 | 0 | 0 | 0 | 0 |
| Bytes Collected By Simple Collection | 0 | 0 | 0 | 0 | 0 |
| Bytes In Use After Simple Collection | 0 | 0 | 0 | 0 | 0 |
| Time In Simple Collect | 0 | 0 | 0 | 0 | 0 |
| Number Of Compact Collections | 11 | 0 | 0 | 0 | 0 |
| Bytes Collected By Compact Collections | 8514912 | 11 | 774082 | 656456 | 786476 |
| Bytes In Use After Compact Collection | 1679656 | 11 | 152696 | 147320 | 153256 |
| Time In Compact Collect | 5395 | 11 | 490 | 436 | 542 |
| Number Of Full Collections | 2 | 0 | 0 | 0 | 0 |
| Bytes Collected By Full Collection | 397428 | 2 | 198714 | 1916 | 395512 |
| Bytes In Use After Full Collection | 79924 | 2 | 39962 | 17328 | 62596 |
| Time In Full Collection | 65 | 2 | 32 | 2 | 63 |
| GC Number Of Application Induced Collections | 0 | 0 | 0 | 0 | 0 |
| GC Latency Time | 5460 | 13 | 420 | 2 | 542 |
| Bytes Jitted | 19143 | 356 | 53 | 1 | 929 |
| Native Bytes Jitted | 95684 | 356 | 268 | 35 | 3758 |
| Number of Methods Jitted | 356 | 0 | 0 | 0 | 0 |
| Bytes Pitched | 85304 | 326 | 261 | 35 | 3758 |
| Number of Methods Pitched | 385 | 0 | 0 | 0 | 0 |
| Number of Exceptions | 0 | 0 | 0 | 0 | 0 |
| Number of Calls | 21778124 | 0 | 0 | 0 | 0 |
| Number of Virtual Calls | 1067 | 0 | 0 | 0 | 0 |
| Number Of Virtual Call Cache Hits | 1029 | 0 | 0 | 0 | 0 |
| Number of PInvoke Calls | 1996991 | 0 | 0 | 0 | 0 |
| Total Bytes In Use After Collection | 5632119 | 13 | 433239 | 84637 | 493054 |
Pitfalls
During the test, increasing the frequency of the block wave led to unexpected results in the managed application. Especially when the screen needed frequent repaints (because areas of the screen were invalid), the application randomly hung the system. Investigating this problem revealed behavior that experienced Win32 programmers might not expect. In a Win32 application, using a timer results in a WM_TIMER message each time the timer expires. However, WM_TIMER messages are low-priority messages in the message queue, only posted when there are no higher-priority messages to be processed. This behavior can lead to missed timer ticks, but since SetTimer does not give us an accurate timer to begin with, this is no problem, especially if the timer is used to update a graphical user interface. In the managed application, however, we use a System.Threading.Timer object to create a timer. This timer calls a delegate every time it expires. The delegate is called from a separate thread taken from a thread pool. If the system is too busy with other activities, like repainting an entire screen, more timer delegates, each in a separate thread, are activated before previously activated delegates have finished. This might consume all available threads in the thread pool, causing the system to hang. The solution to prevent this behavior is found in listing 3. Each time a timer delegate is activated, we stop the timer object by invoking the Change method of the Timer object, to indicate that we do not want the next timer callback until we have processed the current one.
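The stop-and-restart pattern can be illustrated with a small simulation in C++. This is not the managed code from the article: SimulatedTimer is a hypothetical stand-in whose Tick method is called by hand to imitate one timer expiry, and its Change method takes a simple on/off flag instead of the due time and period of the real Timer.Change.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for a periodic timer. Tick() simulates one expiry.
class SimulatedTimer
{
public:
    using Callback = std::function<void(SimulatedTimer&)>;

    explicit SimulatedTimer(Callback callback)
        : callback_(std::move(callback)) {}

    // Enables or disables delivery of further callbacks.
    void Change(bool enabled) { enabled_ = enabled; }

    // Simulates one expiry. Returns true if the callback was invoked,
    // false if the expiry was ignored because the timer was stopped.
    bool Tick()
    {
        if (!enabled_)
        {
            ++skippedTicks_;
            return false;
        }
        callback_(*this);
        return true;
    }

    int SkippedTicks() const { return skippedTicks_; }

private:
    Callback callback_;
    bool enabled_ = true;
    int skippedTicks_ = 0;
};

// Callback that follows the pattern from the text: stop the timer first,
// do the slow work, then re-arm the timer.
struct ScreenUpdater
{
    int updates = 0;

    void OnTimer(SimulatedTimer& timer)
    {
        timer.Change(false);  // no further callbacks until this one is done
        timer.Tick();         // expiry during a slow repaint: ignored
        timer.Tick();         // a second one: ignored as well
        ++updates;            // stands in for repainting the screen
        timer.Change(true);   // re-arm for the next period
    }
};
```

Without the first Change call, the simulated expiries inside OnTimer would re-enter the callback before it had finished, which is the simulated counterpart of timer delegates piling up in the thread pool.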
Proof of Results
To be able to compare the results of our experiment with typical results in the same setting, we also wrote a Win32 application that invoked the same DLL with real-time functionality. The Win32 application is functionally identical to the managed application. It provides the system with a graphical user interface in which timing information is displayed in a window. This application paints timing results upon reception of WM_TIMER messages, solely making use of Win32 APIs. The update rate of the screen for both applications is user selectable, but for both applications we chose an update interval of 50 milliseconds. Basically we did not find any difference in performance, as figures 6 and 7 show. In figure 6 the interrupt latency is again measured with an oscilloscope. For the Win32 application, the latency is 14.4 µs best case and 55.2 µs worst case, meaning a jitter of 40.8 µs. These results are nearly identical to those of the test run with the .NET CF managed application (14.0 µs, 54.4 µs and 40.4 µs respectively). Figure 7 shows the periodic time at which the IST is activated, again for the Win32 application. Here too, the results match those of the .NET CF managed application. The source for the Win32 application is also downloadable so you can compare the behavior of the two applications yourself.


Conclusion
First we need to make absolutely sure that you understand that we are not suggesting the .NET CF for any real-time work by itself. We suggest that it can be used to advantage as a presentation layer. In such a system, the .NET CF can "peacefully co-exist" with real-time functionality, not affecting the real-time behavior of Windows CE .NET. In this article we have not benchmarked the graphics capabilities of the .NET CF. In our situation we did not find any significant difference between an application written entirely in Win32 and an application partly written in a managed environment with Visual C#. Given the higher programmer productivity and the richness of the .NET Compact Framework, there are many advantages in writing presentation layers in managed code and writing hard real-time functionality in unmanaged code. The clear distinction between these different types of functionality is something you get for free with this approach.
Acknowledgments
We had been thinking for quite a while about testing the usability of the .NET Compact Framework in real-time scenarios. This test was only possible by cooperating with people and companies that could provide us with the proper hardware and measuring equipment. Therefore we would like to thank Willem Haring of Getronics for his support, ideas and hospitality during this project. We would also like to thank the folks at Delem for their hospitality and for providing us with the necessary equipment to execute our tests.
About the Authors
Michel Verhagen works at PTS Software bv in the Netherlands. Michel is a Windows CE consultant and has four years of experience with Windows CE. His main expertise lies in the area of Platform Builder.
Maarten Struys also works at PTS Software bv, where he is responsible for the real-time and embedded competence center. Maarten is an experienced Windows (CE) developer, having worked with Windows CE since its introduction. Since 2000, Maarten has been working with managed code in .NET environments. He is also a freelance journalist for the two leading magazines on embedded systems development in the Netherlands. He recently opened a website with information about .NET in the embedded world.
Call to Action and Resources
http://www.pts.nl/
http://www.getronics.nl/
http://www.delem.nl/
www.microsoft.com/embedded
To download the source files used in this article, visit Microsoft's MSDN website.
Acronyms and Terms
· DLL Dynamic Link Library
· GC Garbage Collector
· GUI Graphical User Interface
· IST Interrupt Service Thread
· P/Invoke Platform Invoke
· RT Real-time
· RTM Release to Manufacturing
· SDK Software Development Kit
· UI User Interface