Why was this project started?

Back in 2012 we needed an interprocess communication (IPC) system for streaming and logging large amounts of scientific data across multiple systems. We looked into a number of "reliable communications" patterns and protocols before deciding on ZeroMQ (ZMQ) as the most promising candidate. Initially we wrote bindings directly against the ZMQ DLL, but the difficulty of translating its patterns into LabVIEW paradigms caused a significant number of headaches. In particular, platform instability arising from the protocol's required mode of operation could lead to lost data, which was unacceptable. To isolate the troubleshooting, this part of the IPC project was spun off into its own project and LVZMQ was born. It has been developed in Martijn Jasperse's spare time ever since.

Is the library stable?

In short, yes, and it has been since 2016. Early effort concentrated on the particular use-case that led to LVZMQ's creation, so early versions of the project could encounter reliability issues when not used in specific ways, owing to LabVIEW's intrinsically multi-threaded program flow. Furthermore, platforms other than 32-bit Windows could suffer severe crashes due to differences in LabVIEW's internal memory alignment, which are not trivial to correct. Both of these issues have been heavily debugged in the years since to prevent any ongoing problems.

Is the library being actively developed?

Now that the library is considered stable, the project is in maintenance mode. Features may be added on request, and from time to time new releases will be built to maintain compatibility with the current stable branch of ZMQ itself. However, substantial rewrites and further detailed examples are unlikely to be provided due to the author's limited time availability.

Why do I get Error 7 at Call Library Node?

Error 7 is LabVIEW's "File Not Found" error; if it is raised by a Call Library Node, it means LabVIEW cannot find the required DLLs. For VIs executing inside the LabVIEW development environment, these DLLs are installed by VIPM in "/addons/zeromq". In the runtime environment (built applications), they need to be in the "Data" subdirectory (see below).

How should I package an application that uses LVZMQ?

For various technical reasons, LabVIEW will likely not identify the required DLLs as dependencies when building an application, and this will result in Error 7 when running the compiled program. The solution is to force LabVIEW to include the required DLLs. An example project file is provided to show the correct settings.

Firstly, add the DLLs necessary for the target architecture to the project (both lvzmq*.dll and libzmq*.dll). Then modify the Build Specification to "Always Include" the DLLs (under the "Source Files" options), and tell LabVIEW to put those DLLs in the Support Directory (under "Source File Settings"). Now when you build the application, the DLLs will be bundled as well.

What changed in version 3?

Simply put, contexts and sockets changed from pointers (numbers) to objects. The reasons for this are discussed below, but it means that all subVIs using the library must be relinked: the connector pane of every VI in the library has changed, and LabVIEW will not substitute them - they must be replaced.

Keeping in line with the changes made in libzmq-v3, a number of VIs have also been renamed to match the new libzmq function names.

Why change LVZMQ in v3?

When labview-zmq started, it was a thin wrapper around the DLL of the official Windows build of ZMQ: raw pointers and direct function calls. The idea was to be as simple and as fast as possible, ideally using zero-copy techniques. It quickly became clear, however, that this was not feasible.

LabVIEW is intrinsically multithreaded and ZMQ is not thread-safe. ZMQ employs blocking calls that can only be unblocked in a particular way. ZMQ's memory management is incompatible with LabVIEW's, and LabVIEW does not support the function pointers needed to bridge the gap.

A helper DLL was required to prevent multiple instances of ZMQ causing segmentation faults. Tracking pointers then became a critical issue, and data structures had to be created to solve the interlinking problems. Because ZMQ's philosophy is that such book-keeping is the "user's problem", handling abort situations without terminating the LabVIEW Runtime Environment became tricky, requiring indirection.

To manage this book-keeping, either the helper C code had to become significantly more complicated (and error-prone), or objects had to be used in LabVIEW instead of pointers. Objects are also more intuitive than pointers, so the change is desirable for both reliability and ease of use.

LabVIEW treats a pointer as just a 64-bit numeric. This is confusing, since a pointer does not behave at all like a numeric, and it would be very bad if numeric operations were performed on it. Contexts and sockets are themselves objects internally in ZMQ, so it makes sense to represent them as such in LabVIEW. It also makes block diagrams substantially easier to read.

Importantly, tracking information can be stored as part of the object's associated internal data. This significantly reduces the complexity of the book-keeping, and allows future modifications to be made to the internal data structure without breaking compatibility.

It is worth noting that ZMQ itself is evolving quickly, and these language bindings must be able to adapt.

Why is there no v2?

Since the change to using objects is a significant shift, it justifies a major version-number change. It was decided to number LVZMQ after the version of libzmq it uses under the hood, so you immediately know which features it supports and which it does not.

Why not use zmq_ctx_shutdown()?

The API call zmq_ctx_shutdown() was only introduced in libzmq-v4.0, several years after this project started. Its purpose is to provide a mechanism to overcome the chicken-and-egg scenario in safely aborting blocking socket operations, for which LVZMQ's encapsulation solution was devised. LVZMQ could probably be rewritten to use this and other newer library calls instead of the existing pointer-tracking solution, but given the success of the current approach there is no motivation to change over.
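
For reference, a minimal C sketch of the newer mechanism (this is how a libzmq-v4 application would use it, not what LVZMQ currently does):

    /* thread A: blocked in a receive */
    char buf[256];
    if (zmq_recv(socket, buf, sizeof buf, 0) == -1 && zmq_errno() == ETERM) {
        zmq_close(socket);         /* unblocked by the shutdown below */
    }

    /* thread B: unblock everything without waiting */
    zmq_ctx_shutdown(context);     /* pending blocking calls return ETERM */
    zmq_ctx_term(context);         /* then wait for all sockets to close */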

What is "The Guide"?

ZeroMQ sockets are an advanced technology that enables many new design patterns and protocols. The manual, tutorial and all-round reference for using ZMQ is The Guide, which, although it does not provide examples in LabVIEW, discusses techniques and methods relevant to all implementations. As an open-source project, we cannot commit to porting the many examples to LabVIEW, but we may be able to implement specific listings by request if it would prove useful.

If you find an aspect of ZMQ confusing, The Guide should be your first port of call. It is a long document, but it doesn't have to be read from start to finish - just search for terms related to your problem and start there.

What is an "endpoint"?

An endpoint is a URI that specifies a protocol, a host name/IP address and a port number. It is a string of the form
proto://host:port
Valid protocols are tcp, ipc (interprocess communication, not available on Windows) and inproc (in-process communication, which only permits connections from other sockets in the same context). Endpoints are used when binding or connecting a socket to specify the details of the connection in a single place.
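
Some examples (the addresses are illustrative only):

    tcp://192.168.0.10:5555     connect to port 5555 on a specific host
    tcp://*:5555                bind to port 5555 on all interfaces
    inproc://status             in-process transport (the "port" is just a name)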

What is a ZMQ "context"?

A context is like a socket manager - it handles threads, memory management and message passing. Your application should only need one context; if you find yourself using multiple contexts, your protocol could likely be redesigned to be more efficient using standard ZMQ design patterns. Create a context when your application starts, use it to create sockets (see below), and terminate it when the application ends.

What is a ZMQ "socket"?

Sockets in ZeroMQ are spawned from a context (above) using lvzmq_socket.vi. Unlike standard sockets, which either "listen" or "connect", ZMQ sockets come in many types, each of which has different behaviour for a specific purpose. The socket type is defined when the socket is created and cannot be changed. The basic types and their general use-cases are:

REQ/REP - synchronous request-reply, e.g. remote procedure calls
PUB/SUB - one-to-many data distribution (publish-subscribe)
PUSH/PULL - pipelined distribution and collection of tasks
PAIR - an exclusive connection between exactly two peers

Further details can be found in The Guide and on the relevant API documentation page. Sophisticated schemes can be derived by combining different socket types on front- and back-ends, which The Guide discusses in detail.

What is the difference between "bind" and "connect"?

Bind is the equivalent of "listen" in regular socket communications; it waits on a port for a connection to be established by a "connect". Both bind and connect commands require an "endpoint" (see above): for bind, the endpoint specifies where to accept connections (the IP address can be "*" to allow connections from anywhere); for connect, it specifies where to connect to.
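
At the C level (which the LabVIEW VIs wrap), the distinction looks like this - a minimal sketch with illustrative endpoints:

    /* server side: accept connections on port 5555, any interface */
    void *rep = zmq_socket(context, ZMQ_REP);
    zmq_bind(rep, "tcp://*:5555");

    /* client side: connect to that server */
    void *req = zmq_socket(context, ZMQ_REQ);
    zmq_connect(req, "tcp://localhost:5555");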

Some socket types can bind/connect to multiple endpoints; check the API documentation (see above) for more details.

What is the correct structure of a ZMQ program?

The typical structure of a zmq-based application follows the theme of one context, many sockets: create a context at startup, spawn the sockets you need from it, do your communication, then close everything down (a sketch follows below). There are many variations. It is strongly recommended that context and socket wires are never branched, to prevent access violations (see a later response). All VIs that do not close an object provide an output terminal from which to continue using it, so there is never a need to branch an object wire. The reason is that each socket can only be accessed by one thread at a time; branching wires will cause program slowdown in the best case, and instability in the worst.

Finally, you should always clean up your sockets at the end of your program. Relying on the automatic "garbage handler" is not recommended: ensure all sockets are closed correctly and all contexts terminated. Failing to do so may cause LabVIEW to run out of memory or lose stability.
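
As a rough C-level sketch of this structure (LabVIEW code follows the same flow using the corresponding VIs):

    #include <zmq.h>

    int main(void)
    {
        void *context = zmq_ctx_new();                /* one context...  */
        void *socket  = zmq_socket(context, ZMQ_REP); /* ...many sockets */
        zmq_bind(socket, "tcp://*:5555");             /* illustrative endpoint */

        char buf[256];                                /* main loop would go here */
        if (zmq_recv(socket, buf, sizeof buf, 0) >= 0)
            zmq_send(socket, "ack", 3, 0);

        zmq_close(socket);                            /* always clean up */
        zmq_ctx_term(context);
        return 0;
    }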

What is "termination" and "reaping"?

ZeroMQ uses a number of blocking calls to provide particular functionality, and it is necessary that they unblock at the right time (see an earlier entry). Blocking calls only return when the owning context is terminated, in which case they return the error code ETERM. It is invalid behaviour to close a socket that is involved in a blocking operation: think of the socket as being "in use", with closing it "pulling the rug out from under it". Termination behaviour is therefore very important.

Context termination is a valid part of a zmq-based application and doesn't necessarily mean the program is finishing, so you should plan for blocking calls to potentially return the ETERM error. However, calls to zmq_ctx_destroy do not return until all sockets are closed. That is to say, you should check for ETERM, and if ETERM is raised you are required to close the socket. This is straightforward in C, where execution is linear and IF statements compact, but annoying in LabVIEW. That is why LVZMQ provides the "reap" feature.
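
In C, the pattern described above is roughly the following sketch (socket and context assumed already created):

    char buf[256];
    if (zmq_recv(socket, buf, sizeof buf, 0) == -1 && zmq_errno() == ETERM) {
        zmq_close(socket);    /* required, or zmq_ctx_destroy never returns */
    }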

If you call lvzmq_ctx_destroy with "Reap?" set to True, the context will be destroyed and all sockets within that context will be closed, saving you from having to do the cleanup yourself. You should do this at the end of your application when you are tidying up. When "Reap?" is set to False, it is assumed you want to do the clean-up yourself (there are some reasons for doing so) and the call will block until you complete the task. Note that while all sockets and contexts will be "garbage collected" by special handler functions when the VI finishes executing, this should not be relied upon in production code.

How do multi-part messages work?

One of the primary benefits of ZMQ is that it provides for multi-part messages. A message is only received if all of its parts arrive intact, and if any parts are dropped ZMQ automatically negotiates resending them, transparently to the user. This saves you from having to concatenate message parts into a single buffer, or split large buffers into sequential smaller ones, because it is built into the communications protocol itself.

When sending a message with zmq_send.vi, setting "more?" to true indicates the message is part of a multi-part message and more parts are to come through further send commands. Alternatively, an array of strings can be sent using zmq_send_multi.vi.

When receiving a message with zmq_recv.vi, the output "more?" indicates whether this message is part of a multi-part message and another receive call is required to collect the next part. Alternatively, zmq_recv_multi.vi will return an array containing all the message parts; it can be used for single-part messages as well, in which case it returns an array containing one element.
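
For comparison, the underlying C calls that these VIs wrap look roughly like this sketch:

    /* send a two-part message */
    zmq_send(socket, "part 1", 6, ZMQ_SNDMORE);
    zmq_send(socket, "part 2", 6, 0);         /* last part: no flag */

    /* receive every part of an incoming message */
    int more = 1;
    while (more) {
        char buf[256];
        zmq_recv(socket, buf, sizeof buf, 0);
        size_t len = sizeof more;
        zmq_getsockopt(socket, ZMQ_RCVMORE, &more, &len);  /* another part? */
    }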

What are the standard design patterns?

Standard design patterns can be found throughout The Guide, and several examples of different methods are distributed with LVZMQ (namely REQ/REP and PUB/SUB). Beyond the basic implementations, the interesting patterns are the so-called Pirate patterns, which implement reliable transport mechanisms (see below for an example).

How can I implement a reliable reply system?

Methods for establishing a reliable reply system are covered in Chapter 4 of The Guide. The advantage of using a reliable reply method is that lost messages are automatically resent, and lost connections automatically re-established. Generally this is what is desired in REQUEST-REPLY protocols; using one of the standard reliable strategies greatly simplifies implementation.

The Lazy Pirate Pattern is the simplest version of this, and an implementation is distributed with LVZMQ in the examples/ directory.

What is a "socket monitor"?

As mentioned earlier, it's possible to connect to an endpoint before a listener has bound to it. It is even possible to send data to that endpoint, which will be queued and received by a listener when it does bind. This "late binding" approach has many advantages, but can result in unexpected behaviour, as the only way to know whether you have actually connected your socket is to use a two-way socket paradigm and query for an "I'm alive!" response. To get around this and enable a wider range of protocols to be implemented more easily, ZMQ provides the "socket monitor", which broadcasts "events" such as when a socket is waiting and when it actually connects.

The way it works is that once you create your socket, you create a socket monitor for it, which spawns a background thread. When an event occurs, the monitor sends a message describing the event, using the PAIR protocol, to any sockets connected to the specified endpoint. The zmq_get_monitor VI is provided to receive these events and decode them into readable format. The monitor continues running until the original socket is closed.
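
At the C level this corresponds roughly to the following sketch (the inproc endpoint name is illustrative):

    /* ask libzmq to publish events for 'socket' on an inproc endpoint */
    zmq_socket_monitor(socket, "inproc://monitor.sock", ZMQ_EVENT_ALL);

    /* receive the events on a PAIR socket in the same context */
    void *mon = zmq_socket(context, ZMQ_PAIR);
    zmq_connect(mon, "inproc://monitor.sock");
    /* each message received on 'mon' describes one socket event */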

It should be noted that the socket monitor currently only works on TCP connections, not INPROC endpoints: the monitor will appear to have started successfully, but it will never receive any events.

How does polling work?

In the LabVIEW environment, we're used to using events to handle incoming data and process it on the fly, as opposed to breaking the program up into parallel blocking receive calls. The interface zeromq provides for handling multiple sockets is the zmq_poll call (http://api.zeromq.org/3-2:zmq-poll), which checks a list of sockets and asks "what is the status of this socket right now?". This means you can pass in a bunch of sockets and find out if any have data waiting to be received. You can then call zmq_recv on just those sockets, and they will return immediately since the data is already there.
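
A minimal C sketch of zmq_poll (two already-created sockets, using the short-timeout loop recommended at the end of this entry):

    /* assumes socket_a and socket_b exist; 'running' is your loop flag */
    zmq_pollitem_t items[2] = {
        { socket_a, 0, ZMQ_POLLIN, 0 },
        { socket_b, 0, ZMQ_POLLIN, 0 },
    };
    while (running) {
        zmq_poll(items, 2, 100);               /* 100 ms timeout, never infinite */
        if (items[0].revents & ZMQ_POLLIN) { /* data waiting on socket_a */ }
        if (items[1].revents & ZMQ_POLLIN) { /* data waiting on socket_b */ }
    }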

An obvious way to integrate this with event handling is to fire an event when zmq_poll indicates data is present at the socket. However, zmq_poll will keep indicating data is present until that data is removed - i.e. until zmq_recv is called by the event structure - potentially accumulating a large number of events.

One solution would be to put zmq_poll after zmq_recv in the event structure - but with two or more sockets, one socket will accumulate events for the other, doubling the number of events and producing recv calls that block because no data is actually waiting.

The implemented solution (zmq_recv_event) integrates the zmq_recv call with the zmq_poll code, so that one thread "manages sockets" and produces events, and another loop consumes them. This may be somewhat slower and is open to abuse by receiving data faster than it can be processed; that is considered an application-level concern, as it is a problem for any producer-consumer paradigm.

TL;DR: look at the example code (polling_events.vi) to see how to receive ZMQ data via an event structure.

Finally, note that the poll call cannot be interrupted, since it is a blocking call involving multiple sockets and it is unclear how chained interrupts should apply (see ImplementationDetails). Never invoke it with an infinite timeout: LabVIEW will hang until the call returns - including if you Abort or an error occurs. It is recommended to use small timeouts in a loop where possible (e.g. of order 100 ms).

Why do I get a "resetting VI" message after aborting?

This is caused by a background thread failing to terminate. ZMQ spawns many background threads to process messages, and if these threads do not terminate, LabVIEW considers execution not to have finished and will wait forever for them to stop, to prevent loss of data. This can only be resolved by closing LabVIEW via the Task Manager.

A lot of work has been put into tracking these threads correctly to prevent any lingering threads. If you experience this error, please lodge a bug report immediately with exact details.

How is encapsulation handled?

Because LabVIEW is a naturally multi-threaded environment, where threads are automatically spawned and managed, it can be very hard to ensure that calls that must be sequential are actually carried out sequentially. In particular, libzmq is not thread-safe, and accessing the same object (socket or context) from two different threads is undefined behaviour - which is to say, it may work or it may crash your system.

LVZMQ overcomes this with two layers of data encapsulation, one at the C level and one inside LabVIEW. In LabVIEW, a Data Value Reference (DVR) accessed through an In Place Element structure is used to force thread serialization, ensuring the same pointer cannot be accessed simultaneously. This prevents accidentally multithreading calls that were intended to be synchronous (and crashing LabVIEW as a result). This approach is essentially the LabVIEW equivalent of wrapping calls in a mutex.
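
For readers more familiar with C, the serialization the DVR provides is conceptually equivalent to the sketch below (LVZMQ achieves this with LabVIEW constructs, not with this code):

    #include <pthread.h>

    static pthread_mutex_t sock_lock = PTHREAD_MUTEX_INITIALIZER;

    /* every access to the shared socket goes through the lock */
    pthread_mutex_lock(&sock_lock);
    zmq_send(socket, "data", 4, 0);
    pthread_mutex_unlock(&sock_lock);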

The second layer encapsulates the raw socket pointer in a data structure in the helper library that contains additional information about the socket. This is necessary for automated clean-up, as the minimalist philosophy of ZMQ requires applications to do the book-keeping necessary to terminate correctly. This usually leads to the brute-force approach of terminating threads that are "misbehaving", but in LabVIEW that is not possible without terminating the host process. LVZMQ instead uses tracking structures and abort handlers to ensure that clean-up can be carried out in a way that is familiar to LabVIEW users, without the penalty of having to restart the LabVIEW process.
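
To illustrate the idea, a hypothetical wrapper structure might look like the following (the name and fields are illustrative, not LVZMQ's actual internal layout):

    /* what LabVIEW holds is a pointer to a structure like this,
       never the raw libzmq socket itself */
    typedef struct {
        void *zmq_socket;     /* the raw libzmq socket pointer     */
        void *owner_context;  /* context to terminate on abort     */
        int   is_blocking;    /* currently inside a blocking call? */
        int   is_closed;      /* guards against double-close       */
    } lv_socket_t;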

Can I pass a LVZMQ socket directly to another library call?

The short story is no - it is not possible to use LVZMQ sockets for another purpose. There are two reasons for this. Firstly, the "pointer" UINT that LVZMQ uses to represent a socket is actually a pointer to a C data structure, not a pointer directly to the socket. Other applications/libraries will not know how to interpret this structure correctly.

The second reason relates to how shared libraries work. Each time a library is loaded, it has its own "private" memory store that it uses to hold information. There are very specific rules preventing libraries (and applications) from modifying each other's memory, whether accidentally or maliciously. This disallowed behaviour is called an "access violation" and generally causes the host operating system to terminate the application (because either it's a virus or something has gone horribly wrong). As a result, sockets created in LVZMQ's memory cannot be accessed from other libraries - including other instances of libzmq. This means, for instance, that it is not possible to use a CLF node to call a libzmq function directly, even if you take care of the first point. This is also why all libzmq functions are wrapped in the helper DLL, even the ones that do not provide any extra functionality (besides error handling!).

Can I extend LVZMQ to add a missing function?

Absolutely, although you will almost certainly need to modify the helper DLL to implement any extra functionality, as LVZMQ sockets cannot be passed directly into libzmq calls (see the question above). New features are being progressively added to zmq, and are implemented in LVZMQ on an as-requested basis. If there is a particular feature you're interested in, it's best to ask about it on the forums.

How are errors handled?

ZMQ uses return codes to indicate success or failure, with a C-style errno/strerror description system. All VIs in LVZMQ check the code returned by each function call, and if it indicates an error occurred, an error cluster is created in accordance with standard error-handling procedures. The error code in the cluster is a platform-independent representation of ZMQ's internal error codes. These are defined in the XML document zmq-errors.txt in the standard installation directory LabVIEW 20XX/project/errors, which defines both the error codes and their descriptions; LabVIEW uses these automatically.
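
The underlying C convention being wrapped is, in sketch form:

    if (zmq_send(socket, "data", 4, 0) == -1) {
        int         code = zmq_errno();        /* platform-portable errno */
        const char *desc = zmq_strerror(code); /* human-readable message  */
        /* LVZMQ maps these onto a standard LabVIEW error cluster */
    }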

In accordance with standard conventions, most VIs will not execute if an error is fed in, though clean-up functions (lvzmq_close and lvzmq_term) will execute unconditionally. Internal error checking prevents further errors arising from calling these VIs on invalid objects.

How are aborts handled?

As discussed earlier, termination behaviour is very important, particularly when blocking calls are used. LabVIEW users are accustomed to hitting the "stop" button without thinking about the consequences, expecting to be able to debug. However, something needs to tell blocking calls to unblock so that you can continue to use LabVIEW. This is done by an abort handler.

All blocking calls have abort handlers, so that when execution is stopped in LabVIEW ("aborted"), the abort handler fires and terminates the contexts of all blocking sockets. This promptly causes the blocking calls to unblock. However, the termination call itself then blocks until all of its respective sockets are closed. This results in some problematic book-keeping, as it is invalid behaviour to close a socket twice, or to close a socket that is currently blocking. LVZMQ takes care of these details for you, so that when you abort, the clean-up is automatic.

LabVIEW's abort functionality is a debugging tool only and should not be a regular part of your application's operation (NI themselves say so), so you may experience reduced stability if you use abort. If you encounter a situation where aborting repeatedly crashes LabVIEW, please report it in the forums.

What are assertion fails?

ZMQ uses assertion statements internally to ensure it does not fall into an undefined state. If one of these assertions fails, the host program is terminated - which in this case is the LabVIEW run-time environment itself. However, starting with v1.6, an error-catch-continue scheme is included on Windows to prevent LabVIEW from crashing.

Implementation details: assertions are handled by a custom ZMQ_ASSERT macro (err.hpp), which on Windows is implemented (err.cpp) as a RaiseException Win32 API call. This can be caught by an UnhandledExceptionFilter handler, which overwrites the associated structure to force execution to continue. Note that a single API call can result in several exceptions raised for the same reason.
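
In outline, the catch-and-continue mechanism resembles the sketch below (simplified; a real filter must also verify that the exception actually originates from a ZMQ assertion before continuing):

    #include <windows.h>

    static LONG WINAPI catch_zmq_assert(EXCEPTION_POINTERS *info)
    {
        /* a real filter inspects 'info' to confirm this is a ZMQ assertion */
        return EXCEPTION_CONTINUE_EXECUTION;  /* resume after RaiseException */
    }

    /* installed once when the helper DLL loads */
    SetUnhandledExceptionFilter(catch_zmq_assert);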

What is socket exhaustion? [Windows]

When a socket is closed, it is not immediately available for reuse; there is some linger time dictated by the operating system. This means that if you are opening and closing sockets in a loop, you can run out of "free" sockets while waiting for the closed ones to be released. This is primarily a concern on Windows, and results in a "Call Library Node" error (not a ZMQ error) in zmq_init, with all subsequent calls failing.

Because this occurs at the OS level, you must redesign your protocol to use fewer sockets; it is unlikely that any correctly implemented protocol could exhaust the OS's socket pool. You can find more information in this ZMQ bug report and the MSDN page "Avoiding TCP/IP Port Exhaustion". There are some registry tweaks that improve the bottleneck (see the resolution section of the MSDN page), but if you are exhausting the socket pool, slightly increasing the pool is very unlikely to fix the issue.

At the C-level, the associated underlying error is one of the following:

Assertion failed: Address already in use (..\..\..\src\signaler.cpp:80)
Assertion failed: No buffer space available (..\..\..\src\signaler.cpp:260)

If this error occurs, either LabVIEW must be restarted (just closing the VI is insufficient) or you must wait sufficiently long for Windows to process the socket releases.