Facebook + Unicode ≠ Love

The theme of the last few days has been question marks appearing instead of Unicode characters in various software products. In this case, Facebook.

Facebook? Have you read this article written by Andrei Alexandrescu?

"The Best Programming Advice I Ever Got"

Coming back to question marks: I wanted to post a funny video on Facebook when I noticed that Facebook had picked up some unwanted question marks in the description:

If you go to the page of the funny video (in Romanian), you can see the caption: „Primul șlagăr care te învață să scrii corect înjurăturile.”. No question marks there!

I'm sure there is an explanation for why that happened... but it shouldn't have happened. Nobody said that Unicode is easy!
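One plausible mechanism (an assumption; nothing here shows what Facebook actually does) is a lossy conversion of the text to a legacy ANSI code page somewhere in the pipeline. The sketch below assumes Windows-1250 as that code page and uses Python's `errors="replace"` mode, which substitutes `?` for every character the code page cannot represent. The function name `lossy_roundtrip` is hypothetical.

```python
# Sketch: simulate a lossy Unicode -> ANSI code page conversion.
# Assumption: the intermediate code page is Windows-1250 (cp1250);
# the code page actually involved in the real pipeline is unknown.

CAPTION = "Primul șlagăr care te învață să scrii corect înjurăturile."


def lossy_roundtrip(text: str, codepage: str = "cp1250") -> str:
    """Encode to a legacy code page, replacing unmappable characters
    with '?', then decode the bytes back to a str."""
    return text.encode(codepage, errors="replace").decode(codepage)


if __name__ == "__main__":
    # ă and î exist in cp1250, but ș and ț (comma below) do not.
    print(lossy_roundtrip(CAPTION))
    # prints: Primul ?lagăr care te înva?ă să scrii corect înjurăturile.
```

Under this assumption only the comma-below letters turn into question marks, while ă and î survive, because Windows-1250 contains them.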

P.S. It seems Google+ does the same thing:

I was prepared to congratulate you, Google, on doing a better job than Facebook...


MSI + Unicode ≠ Love

I was upgrading my LibreOffice installation to version 3.6.1 when I noticed the question marks in the installer UI:

How can this be? The ș and ț (s and t with comma below) characters have been present in the Microsoft Sans Serif and Tahoma fonts since Windows 2000.

The only explanation is an ANSI installation program. No way! You would think that Windows Installer is Unicode, but it's not! Michael Kaplan wrote about this weirdness seven years ago: MSI Databases and Unicode?

WiX also states on their help page: „Top-level elements like Product, Module, Patch, and PatchCreation support a Codepage attribute. You can set this to a valid Windows code page by integer like 1252, or by web name like Windows-1252. UTF-7 and UTF-8 are not officially supported because of user interface issues. Unicode is not supported.”
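Since an MSI database is tied to a single code page, the question becomes which code page could hold correct Romanian text at all. The sketch below checks the Windows ANSI code pages 1250 to 1258 with Python's strict encoder; the helper names are hypothetical. None of them contains s and t with comma below, whereas ISO-8859-16 (which is not a Windows ANSI code page) does.

```python
# Sketch: which Windows ANSI code pages (cp1250..cp1258) can represent
# the correct Romanian letters with comma below?

ROMANIAN_COMMA_BELOW = "ȘșȚț"  # U+0218, U+0219, U+021A, U+021B
ANSI_CODEPAGES = [f"cp{n}" for n in range(1250, 1259)]


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)  # strict mode raises on unmappable characters
        return True
    except UnicodeEncodeError:
        return False


def supporting_codepages(text: str, codecs: list[str]) -> list[str]:
    """Return the subset of codecs that can encode text without loss."""
    return [c for c in codecs if can_encode(text, c)]


if __name__ == "__main__":
    print(supporting_codepages(ROMANIAN_COMMA_BELOW, ANSI_CODEPAGES))  # []
    print(can_encode(ROMANIAN_COMMA_BELOW, "iso8859_16"))  # True
```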

Programs like 7-Zip use MSI for the x64 target because the NSIS installer did not support x64. NSIS officially supports neither Unicode nor x64, but there are forks that do (Unicode fork, x64 fork).

I've filed a bug (#54232) for LibreOffice. I guess they will use the old ş and ţ (s and t with cedilla) characters to fix this problem.
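The cedilla workaround guessed at above would amount to a simple character substitution before the strings go into a Windows-1250 MSI database. A minimal sketch, assuming Windows-1250 as the target code page; the names `COMMA_TO_CEDILLA` and `to_legacy_romanian` are hypothetical and this is not LibreOffice's actual fix.

```python
# Sketch: replace comma-below letters with their cedilla look-alikes so
# Romanian text fits into Windows-1250 (assumed target code page).

COMMA_TO_CEDILLA = str.maketrans({
    "Ș": "Ş",  # U+0218 -> U+015E
    "ș": "ş",  # U+0219 -> U+015F
    "Ț": "Ţ",  # U+021A -> U+0162
    "ț": "ţ",  # U+021B -> U+0163
})


def to_legacy_romanian(text: str) -> bytes:
    """Map comma-below letters to cedilla ones, then encode strictly as
    cp1250 (raises UnicodeEncodeError if anything else is unmappable)."""
    return text.translate(COMMA_TO_CEDILLA).encode("cp1250")


if __name__ == "__main__":
    print(to_legacy_romanian("Ștergeți").decode("cp1250"))  # Ştergeţi
```

Because the encode step is strict, any other character missing from the code page still fails loudly instead of silently becoming a question mark.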

Windows 95 reached End of Life on 1 January 2003, and Windows 98, Windows 98 Second Edition, and Windows ME reached End of Life on 1 April 2007.

Microsoft still hasn't fixed this problem in Windows Installer, even though it has been five years since it stopped supporting the last ANSI operating system.


Qt Creator Visual C++ keyboard shortcuts

Qt Creator comes with a "MS_Visual_C++.kms" keyboard shortcut scheme. However, this scheme is based not on Visual C++ Professional but on Visual C++ Express.

Visual C++ Professional and Visual C++ Express do not share the same keyboard shortcut scheme. You can download the default keyboard scheme for Visual Studio 2010 C++ here.

If you have used Visual C++ Professional with Visual Assist, you will notice that the Visual Assist shortcuts are missing too. The Visual Assist default shortcuts can be viewed here.

Visual C++ allows multiple shortcuts for the same command. Unfortunately, Qt Creator doesn't, so at times I had to pick only one of several shortcuts.
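Reducing a multi-shortcut scheme to one shortcut per command is a small assignment problem: keep the first shortcut that no other command has already taken, and report the rest for manual review. A minimal sketch; the function name and the sample command names are hypothetical, and the greedy first-come order is just one possible policy.

```python
# Sketch: reduce a multi-shortcut scheme (as in Visual C++) to one shortcut
# per command (as required by Qt Creator), skipping shortcuts that an
# earlier command already took. Sample data is hypothetical.


def pick_single_shortcuts(
    scheme: dict[str, list[str]],
) -> tuple[dict[str, str], dict[str, list[str]]]:
    chosen: dict[str, str] = {}
    dropped: dict[str, list[str]] = {}
    taken: set[str] = set()
    for command, shortcuts in scheme.items():
        available = [s for s in shortcuts if s not in taken]
        if available:
            chosen[command] = available[0]
            taken.add(available[0])
        # Report every shortcut that was not kept, so it can be reviewed.
        rest = [s for s in shortcuts if s != chosen.get(command)]
        if rest:
            dropped[command] = rest
    return chosen, dropped


if __name__ == "__main__":
    chosen, dropped = pick_single_shortcuts({
        "ToggleBookmark": ["Ctrl+F2", "Ctrl+K, Ctrl+K"],
        "SomeOtherCommand": ["Ctrl+F2", "Alt+F2"],
    })
    print(chosen)   # {'ToggleBookmark': 'Ctrl+F2', 'SomeOtherCommand': 'Alt+F2'}
    print(dropped)  # {'ToggleBookmark': ['Ctrl+K, Ctrl+K'], 'SomeOtherCommand': ['Ctrl+F2']}
```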

Below is a table with the updates to the "MS_Visual_C++.kms" keyboard shortcuts scheme:

Command | Label | Shortcut | Old Shortcut
--- | --- | --- | ---
Toggle | Toggle Bookmark | Ctrl+F2 | Ctrl+K, Ctrl+K
OutputPane.nextitem | Next Item | F4 | F6
OutputPane.previtem | Previous Item | Shift+F4 | Shift+F6
VisualizeWhitespace | Visualize Whitespace | Ctrl+Shift+8 | Ctrl+E, Ctrl+V
LowercaseSelection | Lowercase Selection | Ctrl+U | Alt+U
UppercaseSelection | Uppercase Selection | Ctrl+Shift+U | Alt+Shift+U
Sidebar.Projects | Activate Projects Pane | Ctrl+Alt+L | Alt+X
Sidebar.Class View | Activate Class View Pane | Ctrl+Shift+C | (none)
AddNewFile | Add New... | Ctrl+Shift+A | (none)
AddExistingFiles | Add Existing Files... | Alt+Shift+A | (none)
RunToLine | Run to Line | Ctrl+F10 | (none)
AttachToLocalProcess | Attach to Running Local Application... | Ctrl+Alt+P | (none)
FindUsages | Find Usages | Alt+Shift+F | Ctrl+Shift+U
RenameSymbolUnderCursor | Rename Symbol Under Cursor | Alt+Shift+R | Ctrl+Shift+R
Methods | Methods and Functions | Alt+Shift+S | (none)
Methods in current Document | Methods in Current Document | Alt+M | (none)
JumpToDefinition | Follow Symbol Under Cursor | Alt+G | Ctrl+F12
SwitchDeclarationDefintion | Switch Between Method Declaration/Definition | Ctrl+F12 | Ctrl+Shift+F12
SwitchHeaderSource | Switch Header/Source | Alt+O | F4
Files in current project | Files in Current Project | Alt+Shift+O | (none)
CancelBuild | Cancel Build | Ctrl+Break (saved garbled by Qt Creator) | (none)

Cancel Build should have been Ctrl+Break. I filed this behavior as Qt Creator Bug #4609.

The complete "MS_Visual_C++_Visual_Assist.kms" can be downloaded from here. With these keyboard shortcuts, Qt Creator is able to pose as a very good substitute for Visual C++ and Visual Assist!
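For anyone who wants to generate such a scheme instead of editing it by hand, here is a minimal sketch that writes shortcut overrides as XML. It assumes the `<mapping>` / `<shortcut id="...">` / `<key value="..."/>` layout of Qt Creator's exported keyboard schemes; verify it against a file exported from your own Qt Creator version. The command ids used below are the shortened names from the table and may not match Qt Creator's full ids; `build_kms` is a hypothetical helper.

```python
# Sketch: build a .kms-style XML tree from a {command id: key} mapping.
# Assumption: Qt Creator's <mapping>/<shortcut id>/<key value> layout.
import xml.etree.ElementTree as ET


def build_kms(shortcuts: dict[str, str]) -> ET.ElementTree:
    """Create one <shortcut> element with a single <key> per command."""
    root = ET.Element("mapping")
    for command_id, key in shortcuts.items():
        shortcut = ET.SubElement(root, "shortcut", id=command_id)
        ET.SubElement(shortcut, "key", value=key)
    return ET.ElementTree(root)


if __name__ == "__main__":
    tree = build_kms({"RunToLine": "Ctrl+F10", "AttachToLocalProcess": "Ctrl+Alt+P"})
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

One `<key>` per `<shortcut>` mirrors the one-shortcut-per-command limitation mentioned above.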


Romanian diacritic marks in movie subtitles

One important "screen" is the TV screen. In Romania, movies are not dubbed but subtitled. This means that all dialogue is presented as text for viewers to read in order to follow the movie.

I remember wanting to learn to read because I wanted to read the movie subtitles.



By using www.cool-itv.net, which uses P2P SopCast technology to distribute cable TV stations, I was able to analyze which Romanian diacritics were used in movie subtitles.

Below are some screenshots from several TV stations:

Almost all of the TV stations used the old diacritics, S and T with cedilla (şŞţŢ), with the exception of the last one, which uses the correct diacritics, S and T with comma below (șȘțȚ).

The state-owned public TV broadcasters (TVR1, TVR2 and so on) did their homework: their software handles Unicode characters and they use the correct Romanian diacritics. Chapeau!

At least some of them use one kind of diacritic consistently (Discovery, Animal Planet, ProTV, Pro Cinema) instead of mixing s with cedilla and t with comma below, as HBO, Antena1, and Kanal D do.

One interesting case was TCM, which used A with caron (ǎ) instead of A with breve (ă). Prima also used A with tilde (ã) in its promotional clips.
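The categories above (cedilla, comma below, the HBO-style mix, and the caron or tilde substitutes for ă) are easy to detect mechanically in a subtitle text. A minimal sketch; the category labels and function names are hypothetical, chosen to match the observations in this post.

```python
# Sketch: classify which Romanian diacritic variants a subtitle text uses.
from collections import Counter

VARIANTS = {
    "s/t cedilla (old)": "ŞşŢţ",
    "s/t comma below (correct)": "ȘșȚț",
    "a breve (correct)": "Ăă",
    "a caron (substitute)": "Ǎǎ",
    "a tilde (substitute)": "Ãã",
}


def classify(text: str) -> Counter:
    """Count how many characters of each diacritic variant appear in text."""
    counts: Counter = Counter()
    for ch in text:
        for name, chars in VARIANTS.items():
            if ch in chars:
                counts[name] += 1
    return counts


def mixes_s_and_t_styles(text: str) -> bool:
    """True if text pairs cedilla s with comma-below t, or vice versa."""
    cedilla_s = any(ch in "Şş" for ch in text)
    comma_s = any(ch in "Șș" for ch in text)
    cedilla_t = any(ch in "Ţţ" for ch in text)
    comma_t = any(ch in "Țț" for ch in text)
    return (cedilla_s and comma_t) or (comma_s and cedilla_t)


if __name__ == "__main__":
    print(classify("Şi ţara mea, ăsta e ǎla"))
    print(mixes_s_and_t_styles("şi ța"))  # True
```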

The old diacritics are used because the specialized TV software in use is old, written before Microsoft started promoting Unicode.

Old software and a subtitle standard from 1991 are responsible for most of the incorrect Romanian diacritics. The 1991 standard is EBU (European Broadcasting Union) Tech. 3264-E, which doesn't support Unicode characters.

Hopefully all this will change with the arrival of the new EBU-TT Subtitling Specification, which was announced on 31 July 2012. With luck, in a couple of years all TV stations will be using the correct Romanian diacritics.

AVI / MKV Movies

The same grim story plays out in the underground movie subtitle scene. For example, if you go to http://www.opensubtitles.org/ro and download the Romanian subtitles for Iron Sky, you will only get subtitles encoded in the Windows-1250 code page. By the way, Iron Sky is a cool movie :)

Nobody uses Unicode to encode the subtitles. Unicode subtitles would spare users from having to set Windows-1250 / Central European / ISO-8859-2 as the default subtitle code page in their media player of choice.
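Re-encoding an existing Windows-1250 subtitle file as UTF-8 is a byte-level round trip through the code page. A minimal sketch; the function names are hypothetical. Working on bytes (not text-mode file I/O) keeps the original `\r\n` line endings intact. Writing a UTF-8 byte order mark is optional here; whether a particular player needs it to detect UTF-8 is an assumption to check.

```python
# Sketch: convert Windows-1250 subtitle bytes to UTF-8.
from pathlib import Path


def cp1250_bytes_to_utf8(data: bytes, bom: bool = True) -> bytes:
    """Decode Windows-1250 bytes and re-encode them as UTF-8.

    Line endings pass through unchanged because no text-mode I/O is used.
    """
    text = data.decode("cp1250")  # strict: raises on bytes undefined in cp1250
    return text.encode("utf-8-sig" if bom else "utf-8")


def convert_subtitle_file(src: Path, dst: Path) -> None:
    """Hypothetical helper: convert one subtitle file from cp1250 to UTF-8."""
    dst.write_bytes(cp1250_bytes_to_utf8(src.read_bytes()))
```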

I've created a small Windows tool (133 KB) that automatically converts subtitles from the old diacritics to the correct ones. The tool can be downloaded from here.
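The tool's own code isn't shown here. The core character mapping such a converter needs (S and T with cedilla to S and T with comma below) can be sketched in a few lines; the function name is hypothetical.

```python
# Sketch: map the old cedilla forms to the correct comma-below forms.
CEDILLA_TO_COMMA = str.maketrans("ŞşŢţ", "ȘșȚț")


def fix_romanian_diacritics(text: str) -> str:
    """Replace S/T with cedilla by S/T with comma below; other text is unchanged."""
    return text.translate(CEDILLA_TO_COMMA)


if __name__ == "__main__":
    print(fix_romanian_diacritics("Ştiinţă şi ŢARĂ"))  # Știință și ȚARĂ
```

The mapping is only safe for Romanian text: Turkish, for example, legitimately uses ş with cedilla, so a converter should only be applied to subtitles known to be Romanian.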

Below is a screenshot of the tool:

I hope Unicode subtitles will be more popular in the future. There is no need to stick to ANSI code pages!