Anatomy of a simple CEP engine

[Versiunea romaneasca] [ in romana] [English edition]

If there wouldn’t be that CEP abbreviation there you’d probably ask how that something inherently complex could become simple. Thing is, everything starts by being simple at first then evolving to something more complicated.

Even if you might have read my Progress Apama article, I remind you here the definition given by Dan Hubscher to CEP : “the ability to relate two or more events, even if these events occur in a span of time, and then determining meaning from that relationship immediately, such as detecting a trading signal, or an alpha-generating profit opportunity”.

So, given that events have to be related, within a time constraint, there must be an event queue, which stores individual events , as they are being received. Sure, storing is not made ad infinitum, in an ever increasing queue ; the queue size has to be dynamic, but it’s length in time should not exceed a predetermined amount of time, say 5 minutes. This is why a CEP engine, that is mainly targetted for HFT and Level II trading, doesn’t have to mixed with Level I trading signals, because these span on a way higher amount of time, lots of minutes to hours or even more, making the event stack not only to grow to an unmanageable size, but also to slow down the engine speed.

The covered amount of time is the first limitation of the engine.

Second, is the link between simple events. Simple events are not fully independent, as the Progress Apama webinar slide might show. For instance, you are considering a cross of higher Bollinger Band as an event, but also the cross of the lower Bollinger Band. These events are completely opposite. Should both happen, at the time the second happens, the first one must elliminated from the queue, as you see in the scheme below that event 4 is deleted. There are two paths that may be chosen, either opposite events are removed from the event queue after some time – which is diffcult to determine, or before final validation of a complex event there are searched opposite simple events coming after the ones that make up the complex event.

Third, avoiding event redundancy. Bad implementations can cause redundancy, meaningly same event to be reported over and over again. For instance, “MSFT is below 5-minute moving average”. A parenthesis here – I don’t mean the 5 minute chart, rather the moving average of all ticks occurred in last 5 minutes, that’s why I said MetaTrader should have tick data and tick period enabled.When it happens, it is an event. But the next second, at the next tick, it will be reported again, and the engine has to reject it, because it already is in the queue, as there is the case with the orange event from the scheme, which doesn’t even gets a number, because it is rejected from the beginning. Such situations should be avoided from start – the engine mustn’t lose time for redundancy check, otherwise the engine should be able to filter for these events , but it’s better that the event stream is cleaned before.

This is the schematics of the engine. The tick events, or the more complex calculus events originate from an OnChart() event that is triggered by more Tick Event EAs, as they are described in the article Tick events in MQL5: complicated for now, but functional .

The first column is event number in the queue, from the most recent to the oldest. Each simple event, that is linked to the data, has a number allocated to it. The same event, happening on a different instrument, has to be given a different number, so same event sequence making up a complex event will be made up of same simple events but with different identfiers. The engine must know which events are completely opposite, for instance 1 and 3, also which event is allowed to appear multiple times in the queue, because some may be allowed (for instance execution events (the ones beginning with E in the scheme)). All other events will be forbidden from appearing multiple times – but however this should not be done by the engine per se, the one that moves the event queue, rather these must be forbidden from the pre-processing phase.

At the same time, the CEP engine must recognize each complex event knowing all simple event arrangements.
For instance, if a complex event for Microsoft (MSFT) is made of “10 followed by (11 or (12 and 13))” within 3 minutes, then the binder will look for, within 3 minutes: 10, 11 ; 10, 12, 13 ; 10, 13, 12 meaning for all alternate paths. If the same event would mapped for SP500 (^GSPC), given that its sub events will have different numbers, the condition might look like “5 followed by (15 or (16 and 17))”, having the arrangements: 5, 15 ; 5, 16, 17 ; 5, 17, 16. Such arrangements can be given directly to the engine, written directly inside, or generated inside with an algorithm, transforming the binary expression, with the notable difference that “FOLLOWED BY” is a non commutative AND. Because the rules are defined as series of events appearing in a certain order, the negation can’t exist as in a binary expression. It can be implemented only a a interdiction for some events to appear over the entire time constraint or within some sequences, as simple forbidden entries between required events.

You can see that I included in the scheme two bind columns. That is , one simple event can be binded to more than one complex event, because otherwise some rules would never trigger. Complex events would have a separate ID set, that is not to intersect with the simple events ID sets. The engine core will send the complex events to be processed by user in the EventsCallback(), but at the same time, the user could pipeline these events to another CEP engine.

The matching algorithm begins with the construction of the simple events matrix per each complex event. The simple event matrix will contain on each row an arrangement. Because checking is done backward, from the later events to older events, arrangements have to be written in the matrix in a reversed manner. Each complex event will need passing thru the events queue only for as long as the time constraint spans, in a vertical manner : first pass will check events on the first column, and simple events found this way will be pre-binded ; second pass will check the second column only for arrangements that had a pre-bind on the first. When no new pre-binds take place, matching is aborted for that specific complex event. However, to minimize the time needed, each pass will regard all events, before jumping to the next pass. When searches are cancelled for all complex events, algorithm ends.
When an arrangement has all simple-events pre-binded, it’s time to check for negations. First, there will be an entire period check for global negations – events forbidden during entire complex event, then for events forbidden between pre-binded events. When all negations are checked,a final check has to be done to ensure they still abide the time constraint, and only in this case, simple events are binded to the complex event, provided that in the bind array of each simple event this bind is not previously done – to avoid re-binding to same complex event. The resulting complex event is then put in a bind queue and passed to EventsCallback(). Sending events to EventsCallback() must take place after the algo is done, otherwise there might appear recursive engine calls that will not end their execution.For the simple events that are being binded, the current bind array position will be filled with the complex event, the position is then incremented by one.

Here is an example of different complex events in different stages of pre-binding:

For the MSFT event, the engine finds event 12 on the first pass, and pre-binds it on the third arrangement. On the second pass, it finds event 13, and pre-binds it to the second arrangement. It finds 12 too, but it can’t pre-bind it to the second arrangement because it occurred before 13. If on the next pass finds the older 10 event, third arrangement is completed, being no negations there, and if time still abides constraint, provided that complex event 1 was not binded yet, then simple events on the third arrangement are binded to complex event 1, bind counter incremented, event placed in the bind queue. For the ^GSPC event (complex event 2), the engine has completed already the first arrangement. Event 5 does not appear checked for second and third arrangement because events on the previous columns are not checked.

As for the execution events, these are sent by the EnumCallbacks() event of the DealHandler class from the Distinguishing quasi simultaneous fills – class to report last deals article. Simple execution events can also be mapped in more complex execution events because deals have magic numbers and comments, and if you buy two times in row same 1000 stocks, you will know which buy was part of which complex execution event. This is why such events could be treated either seperately or in combination with market events.

We will come with a coding example in a future article.