DLL Hell, MQL5 edition: UNICODE vs ANSI


Many, many years ago, when we were kids, in the early years of the crazy ’90s, two languages were battling it out in the developer world: Pascal, with a down-to-earth, easy-to-understand syntax well suited to a high-level language, and C++, with a more cryptic but terser syntax well suited to its middle-level nature. C++ won the battle, and everything done in Windows came to be compiled in C++ and bore its marks: null-terminated strings and what was known at that time as the standard calling convention.

Null-terminated strings were ordinary strings, known as ANSI strings; UNICODE was not yet in common use. Every character took a single byte, and strings had a dynamic length, because they were expected to end with a null (a zero byte). Applications therefore received a pointer indicating where to read the string from, and found where it ended by looking for the zero byte. As for the calling convention, on each procedure call the C++ compiler pushed the parameters onto the stack starting with the last and finishing with the first.

Null-terminated string (ANSI)

|---------------|
|c1|c2|....| 0  |
|b1|b2|....|bn+1|
|---------------|
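The end-of-string scan described above can be sketched in a few lines of C++. The function name `ansi_length` is hypothetical; it does what the standard `strlen` does, and shows that the terminating zero byte is the only length information such a string carries.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: walk from the pointer until the zero byte,
// as any code receiving a null-terminated ANSI string must do.
std::size_t ansi_length(const char* p)
{
    std::size_t n = 0;
    while (p[n] != '\0')   // the terminator is the only length information
        ++n;
    return n;
}
```

For example, `ansi_length("ABC")` is 3: the fourth byte, the 0, is not counted.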

Pascal was the exact opposite of C++ in all these respects. Its strings were ANSI too, one byte per character, but they had a fixed maximum length of 255 characters (or a compiler-defined size), plus an extra byte at the front holding the logical length of the string (how many characters were actually used). The calling convention was reversed as well: in the Pascal calling convention, parameters were pushed onto the stack from the first to the last.

Standard Pascal string (ANSI)

|------------------|
|ln|c1|c2|....|c255|
|b1|b2|b3|....|b256|
|------------------|
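For comparison, here is a rough C++ model of the Pascal layout above: one length byte followed by a fixed 255-byte buffer. The names `PascalString`, `to_pascal` and `from_pascal` are hypothetical, and this is only an illustration of the memory layout, not how any particular Pascal compiler implemented it. Because the whole string lives inside one fixed-size object, it can be copied as a unit, which is what allows the by-value passing mentioned in the next paragraph.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical model of a classic Pascal string: byte 0 holds the
// logical length, bytes 1..255 hold the characters (no terminator).
struct PascalString
{
    unsigned char len = 0;
    char data[255] = {};
};

// Copy a null-terminated string into the Pascal layout, truncating at 255.
PascalString to_pascal(const char* s)
{
    PascalString p;
    std::size_t n = std::strlen(s);
    if (n > 255)
        n = 255;
    p.len = static_cast<unsigned char>(n);
    std::memcpy(p.data, s, n);
    return p;
}

// The length comes from the prefix byte instead of a scan for zero.
std::string from_pascal(PascalString p)   // by value: all 256 bytes are copied
{
    return std::string(p.data, p.len);
}
```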

This is why Pascal strings could be passed to functions whole, by value, whereas in C++ a string can only be passed by reference (as a pointer to its first character).

Once C++ had won the battle, Pascal compilers had to adapt. Adopting the calling convention was easy. Strings were more complicated, as developers had to struggle with PChar, a pointer to a buffer of single-byte characters that was expected to hold a C++ null-terminated string and was always passed by reference.

As if this were not enough for developers, the UNICODE standard came along.

UNICODE is a complicated standard, and I don’t know all of it. The difference from ANSI is that UNICODE characters are wider: generally they take two bytes each, although there are also encodings that use 4 bytes per character. In the beginning, UNICODE strings seemed new and awkward, so they were called wide strings. Windows API functions working with wide strings have names ending in W, while their ANSI counterparts end in A (for example MessageBoxW and MessageBoxA); and just as pointers to null-terminated ANSI strings are char*, pointers to null-terminated UNICODE strings are wchar_t*.

Null-terminated string (UNICODE)

|------------------------------------------|
|  c1 |  c2 |....|      cn   |      0      |
|b1|b2|b3|b4|....|b 2n-1|b 2n|b 2n+1|b 2n+2|
|------------------------------------------|
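In C++ terms, the layout above can be sketched with `char16_t`, the fixed 16-bit code unit type. It is used here instead of `wchar_t` because `wchar_t` is 16 bits on Windows but 32 bits on most other platforms, so the sizes would not be portable. The helper name `wide_length` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the same scan as for ANSI, but the terminator is a
// whole 16-bit zero unit, i.e. two zero bytes.
std::size_t wide_length(const char16_t* p)
{
    std::size_t n = 0;
    while (p[n] != u'\0')
        ++n;
    return n;
}

// "ABC" as UTF-16: 3 units plus the terminator = 4 units = 8 bytes (2n + 2).
static_assert(sizeof(u"ABC") == 8, "2n + 2 bytes for n = 3");
```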

MQL5, like most programming environments today, is UNICODE. Even the plain strings you use every day are UNICODE: they look like ANSI, but their internal representation is UNICODE. This works because an ANSI character can be stored in a UNICODE character by filling the unneeded high byte with 0.

ANSI packaged in UNICODE (MQL5 normal strings)

|------------------------------------------|
|  c1 |  c2 |....|      cn   |      0      |
|b1|0 |b3|0 |....|b 2n-1| 0  |  0   |  0   |
|b1|b2|b3|b4|....|b 2n-1|b 2n|b 2n+1|b 2n+2|
|------------------------------------------|

So, in UNICODE-packed ANSI, every even-numbered byte (b2, b4, ...) is 0.
But what if you have an older C++ DLL written for ANSI?
That means it expects, and returns, null-terminated ANSI strings.

So, if you send an “ABC” string to such a DLL, it is laid out in memory as the bytes 65, 0, 66, 0, 67, 0, followed by the 0, 0 terminator.
The DLL will take the first 0 as the null terminating the string and will see only “A” out of the entire string.
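The mismatch described above can be reproduced without any DLL. This C++ sketch lays out “ABC” as UTF-16 bytes in little-endian order (the byte order of x86 Windows, assumed here) and then reads them the way an ANSI function would, stopping at the first zero byte. Both helper names are hypothetical; `utf16le_bytes` serializes the bytes explicitly, so the result does not depend on the byte order of the machine running it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: the bytes a UNICODE string occupies in memory,
// assuming little-endian UTF-16 (low byte first), terminator included.
std::vector<unsigned char> utf16le_bytes(const std::u16string& s)
{
    std::vector<unsigned char> out;
    for (char16_t c : s) {
        out.push_back(static_cast<unsigned char>(c & 0xFF));  // low byte
        out.push_back(static_cast<unsigned char>(c >> 8));    // high byte
    }
    out.push_back(0);  // the 16-bit null terminator is two zero bytes
    out.push_back(0);
    return out;
}

// Hypothetical helper: what an ANSI function sees when handed those bytes.
// std::string's const char* constructor stops at the first zero byte.
std::string seen_by_ansi_dll(const std::vector<unsigned char>& bytes)
{
    return std::string(reinterpret_cast<const char*>(bytes.data()));
}
```

For u"ABC" the bytes are 65, 0, 66, 0, 67, 0, 0, 0, and the ANSI reading yields just "A".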

If you receive an “ABC” from this type of DLL, you get the bytes 65, 66, 67, 0.
UNICODE MQL5 will read the first character from 65 and 66 (producing something Chinese-looking) and the second character from 67 and 0, which maps to “C”. Then it will keep reading, unless an access violation occurs first, until it finds 0 and 0 as the null terminator, producing complete gibberish. The access violation may be avoided because MT5 may have allocated enough space for the received string.
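The receiving direction can be sketched the same way, under the same little-endian assumption. Read as 16-bit units, the bytes 65, 66, 67, 0 give 0x4241 and 0x0043. U+4241 lies in the CJK Unified Ideographs Extension A block, which is why the first character looks Chinese, and 0x0043 is “C”. The helper name `read_as_utf16le` is hypothetical, and unlike the real situation it stops at the end of the buffer instead of reading past it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: read raw bytes as little-endian UTF-16 code units,
// the way a UNICODE caller interprets a buffer filled by an ANSI DLL.
// Stops at a 0x0000 unit or at the end of the buffer (no overrun here).
std::vector<char16_t> read_as_utf16le(const unsigned char* p, std::size_t size)
{
    std::vector<char16_t> units;
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        char16_t u = static_cast<char16_t>(p[i] | (p[i + 1] << 8));
        if (u == 0)
            break;
        units.push_back(u);
    }
    return units;
}
```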

Sadly, MQL5 doesn’t have an ANSI string type that would handle the conversion automatically. On the bright side, in both cases strings are passed by reference, so the problem is one of interpretation rather than a value/reference mismatch.

This means you have to send UNICODE strings whose bytes decode correctly as ANSI, and convert the ANSI strings you receive into UNICODE before using them.

When you receive an ANSI string inside a UNICODE string, read it one UNICODE character at a time as an unsigned short and split that value into its two ANSI bytes: the low byte (the code modulo 256) and the high byte (the code divided by 256). Append the low byte to the resulting UNICODE string as one character, then the high byte as the next one, and stop at the first zero byte, which is the ANSI null terminator. So every 2 bytes of the original ANSI data map into 4 bytes (2 UNICODE characters).
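The unpacking steps above can be sketched in portable C++, with `std::u16string` standing in for an MQL5 string. This mirrors the idea of the `ANSI2UNICODE` function in the include file below but is not the author’s code; the name `unpack_ansi` is hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: each 16-bit unit carries two ANSI bytes, low byte
// first, high byte second; every unit becomes up to two UNICODE characters.
std::u16string unpack_ansi(const std::u16string& packed)
{
    std::u16string res;
    for (char16_t unit : packed) {
        char16_t lo = static_cast<char16_t>(unit % 256);  // first ANSI character
        char16_t hi = static_cast<char16_t>(unit / 256);  // second ANSI character
        if (lo == 0)
            break;              // ANSI null terminator reached
        res += lo;
        if (hi == 0)
            break;              // odd length: the terminator sat in the high byte
        res += hi;
    }
    return res;
}
```

Stopping when the high byte is zero keeps any bytes that happen to follow the ANSI terminator in the buffer out of the result.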

When you want to pack an ANSI-compatible UNICODE string, such as an ordinary MQL5 string, as ANSI, read two UNICODE characters at a time and cast each one to unsigned char, the size of an ANSI character. Then combine them into one unsigned short, with the first character as the low byte and the second as the high byte (code = second * 256 + first), and append it as a new character to the resulting UNICODE string. If the length is odd, the last character is paired with 0, which also serves as the ANSI terminator.
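The packing direction, again as a hedged C++ sketch with `std::u16string` standing in for an MQL5 string. The name `pack_ansi` is hypothetical; it follows the same idea as `UNICODE2ANSI` below and assumes every input character fits in one byte, since the cast keeps only the low byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: two ANSI-compatible characters go into one 16-bit
// unit, the first as the low byte and the second as the high byte.
std::u16string pack_ansi(const std::u16string& s)
{
    std::u16string res;
    for (std::size_t i = 0; i < s.size(); i += 2) {
        unsigned char first = static_cast<unsigned char>(s[i]);  // low byte only
        unsigned char second =                                   // 0 pads an odd length
            (i + 1 < s.size()) ? static_cast<unsigned char>(s[i + 1]) : 0;
        res += static_cast<char16_t>(second * 256 + first);
    }
    return res;
}
```

For u"ABC" this yields the units 0x4241 and 0x0043, i.e. exactly the ANSI bytes 65, 66, 67 followed by a zero.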

The following is the code of the two conversion functions, written as an include file. Save it as stringlib.mqh in the Include folder (MQL5\Include) so that your programs can include it.

//+------------------------------------------------------------------+
//|                                                    stringlib.mqh |
//|                                       Copyright Bogdan Caramalac |
//|                                           http://mqlmagazine.com |
//+------------------------------------------------------------------+
#property copyright "Bogdan Caramalac"
#property link      "http://mqlmagazine.com"
#property version   "1.00"
 
//--- Unpacks ANSI bytes that arrived inside a UNICODE string:
//--- each UNICODE character holds two ANSI characters, low byte first.
string ANSI2UNICODE(string s)
  {
   string res="";
   int leng=StringLen(s);
   for(int i=0;i<leng;i++)
     {
      string f="  ";                          // two-character buffer
      ushort mychar=StringGetCharacter(s,i);
      ushort m=ushort(mychar%256);            // low byte: first ANSI character
      ushort d=ushort(mychar/256);            // high byte: second ANSI character
      if(m==0)
         break;                               // ANSI null terminator reached
      StringSetCharacter(f,0,m);
      StringSetCharacter(f,1,d);              // d==0 cuts f down to one character
      StringConcatenate(res,res,f);           // MQL5 form: the result is written into res
      if(d==0)
         break;                               // the terminator was in the high byte
     }
   return(res);
  }
 
//--- Packs an ANSI-compatible UNICODE string as ANSI: every two
//--- characters become one UNICODE character (first = low byte).
string UNICODE2ANSI(string s)
  {
   string res="";
   string unichar=" ";                        // one-character buffer
   int leng=StringLen(s);
   for(int ipos=0;ipos<leng;ipos+=2)
     {
      //--- uchar cast: each ANSI-compatible character fits in one byte
      uchar m=uchar(StringGetCharacter(s,ipos));
      uchar d=0;                              // odd length: pad with the terminator
      if(ipos+1<leng)
         d=uchar(StringGetCharacter(s,ipos+1));
      StringSetCharacter(unichar,0,ushort(d*256+m));
      StringConcatenate(res,res,unichar);     // MQL5 form: the result is written into res
     }
   return(res);
  }

To use the include file, simply write

#include <stringlib.mqh>

as in the following example:

//+------------------------------------------------------------------+
//|                                                  teststrings.mq5 |
//|                                       Copyright Bogdan Caramalac |
//|                                           http://mqlmagazine.com |
//+------------------------------------------------------------------+
#property copyright "Bogdan Caramalac"
#property link      "http://mqlmagazine.com"
#property version   "1.00"
 
#include <stringlib.mqh>
 
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
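   string original_unicode,ansi,converted_unicode;
   //--- even length: every pair of characters packs into one UNICODE character
   original_unicode="EvenString";
   ansi=UNICODE2ANSI(original_unicode);      // prints as Chinese-looking characters
   converted_unicode=ANSI2UNICODE(ansi);     // should give back the original
   Print(original_unicode," -> ",ansi," -> ",converted_unicode);
   //--- odd length: the last character is packed with a zero high byte
   original_unicode="OddString";
   ansi=UNICODE2ANSI(original_unicode);
   converted_unicode=ANSI2UNICODE(ansi);
   Print(original_unicode," -> ",ansi," -> ",converted_unicode);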
  }
//+------------------------------------------------------------------+

10 Responses to “DLL Hell, MQL5 edition: UNICODE vs ANSI”

  1. Olaf van wijk on October 17, 2011 at 11:09 am

    Thanks a lot for this.
    Great help for string communication between MQL and DLL’s

  2. Investeo on December 18, 2011 at 10:17 pm

    Now I know how to bite the Chinese DLL string, thanks ;)

  3. Marek on January 22, 2014 at 9:16 pm

    Great job!

    I have one suggestion:

    Instead of “StringConcatenate(res,res,unichar);”
    try “res=StringConcatenate(res,unichar);”
    and instead of “StringConcatenate(res,res,f);”
    try “res=StringConcatenate(res,f);”

    :)

  4. Bogdan Caramalac, MQLmagazine sr.editor on January 22, 2014 at 9:50 pm

    It doesn’t work, Marek! Even now! Still gives “wrong parameters count” compiler error. I expected a syntax upgrade in these years, to allow StringConcatenate to work like a normal function, but it seems they didn’t change anything about it.

  5. David Williams on February 11, 2014 at 6:58 pm

    Thanks for this code. As you probably know, since the build 600 update, all MQL4 strings are UNICODE. I’ve been looking for a method to convert some of my strings back to ASCII.

    For MQL4 support, I had to change this line:
    StringConcatenate(res,res,unichar);

    To this:
    res = StringConcatenate(res,unichar);

  6. Bogdan Caramalac, MQLmagazine sr.editor on February 11, 2014 at 7:28 pm

    Yes, I heard about this; a friend of mine works with it, but I have mostly moved away from it. It seems MQL5’s StringConcatenate writes its result by reference and returns only the resulting length, while MQL4’s StringConcatenate returns the string…

  7. Lok on July 14, 2014 at 6:02 pm

    Hey Bogdan, thank you so much! Your solution is elegant and saved me a lot of time!
    And indeed for MQL4 you have to move “res” out of StringConcatenate.

  8. Nac on April 6, 2015 at 12:41 pm

    Thank you for your explanation and solution, but even though I changed the code as suggested by Lok, David Williams and Marek, the ANSI result is ‘??????????????’. What can I do to solve it?

  9. Bogdan Caramalac, MQLmagazine sr.editor on April 6, 2015 at 1:54 pm

    Hi, can I have a look at your code (both the function body and the function call) ?

  10. Nac on April 6, 2015 at 9:04 pm

    Of course,

    The function in the .mqh:
    int f2M_create_from_file(string path);

    In my programme:

    int ann_load(string path)
      {
       int ann;
       ann=f2M_create_from_file(path);
       if(ann!=-1)
         {
          debug(1,"ANN: '"+path+"' OK "+ann);
         }
       if(ann==-1)
         {
          debug(0," Error !! "+ann);
         }
       return(ann);
      }