2012-08-05

Romanian diacritic marks in movie subtitles

One important "screen" is the TV screen. In Romania, movies are not dubbed; they are subtitled. This means that all dialogue is presented as text that the viewer has to read in order to follow the movie.

I remember wanting to learn how to read precisely so that I could read the movie subtitles.

 

Television


Using www.cool-itv.net, which relies on P2P SopCast technology to distribute cable TV stations, I was able to analyze which Romanian diacritics were used in movie subtitles.

Below you will see some screenshots of some TV stations:

Almost all of the TV stations used the old diacritics, S and T with cedilla (şŞţŢ). The exception is the last one, which used the correct diacritics, S and T with comma below (șȘțȚ).
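The two sets of letters look almost identical on screen, but they are different Unicode characters. A minimal Python sketch that lists their code points and official names (the helper name `describe` is only illustrative):

```python
import unicodedata

# Old (cedilla) and correct (comma below) letters, written as escapes
# so the code points are unambiguous.
OLD_CEDILLA = "\u015f\u015e\u0163\u0162"
NEW_COMMA_BELOW = "\u0219\u0218\u021b\u021a"


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in OLD_CEDILLA + NEW_COMMA_BELOW:
        print(describe(ch))
```

The first four lines printed are the "WITH CEDILLA" letters (U+015F, U+015E, U+0163, U+0162); the last four are the "WITH COMMA BELOW" letters (U+0219, U+0218, U+021B, U+021A).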

The state-owned public TV broadcasters (TVR1, TVR2 and so on) did their homework: their software can handle Unicode characters, and they use the correct Romanian diacritics. Chapeau!

At least some of the stations are consistent and use the same kind of diacritic throughout: Discovery, Animal Planet, ProTV, and Pro Cinema. They do not mix s cedilla with t comma below, as HBO, Antena1, and Kanal D do.
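Checking a subtitle line for this kind of mixing is easy to automate. The sketch below is only an illustration (the function name `diacritic_style` and its return labels are made up for this example); it reports whether a piece of text uses cedilla letters, comma-below letters, or both:

```python
CEDILLA = set("\u015f\u015e\u0163\u0162")      # s/S/t/T with cedilla
COMMA_BELOW = set("\u0219\u0218\u021b\u021a")  # s/S/t/T with comma below


def diacritic_style(text: str) -> str:
    """Classify text as 'cedilla', 'comma-below', 'mixed' or 'none'."""
    has_cedilla = any(ch in CEDILLA for ch in text)
    has_comma = any(ch in COMMA_BELOW for ch in text)
    if has_cedilla and has_comma:
        return "mixed"
    if has_cedilla:
        return "cedilla"
    if has_comma:
        return "comma-below"
    return "none"
```

A line such as "\u015fi \u021bar\u0103" (s cedilla together with t comma below) is reported as "mixed", which is the situation described for HBO, Antena1, and Kanal D.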

One interesting case was TCM, which used A caron (ǎ) instead of A breve (ă). Prima also used A tilde (ã) in its promotional clips.

The old diacritics are still in use because the specialized TV software is old, written before Microsoft started promoting Unicode.

Old software and a subtitle standard from 1991 are responsible for the widespread use of incorrect Romanian diacritics. The 1991 standard is the EBU (European Broadcasting Union) TECH. 3264-E, which doesn't support Unicode characters.

Hopefully all this will change with the arrival of the new EBU-TT Subtitling Specification, which was announced on 31 July 2012. In a couple of years all TV stations should be using the correct Romanian diacritics.

AVI / MKV Movies


The same grim story plays out in the underground movie subtitle scene. For example, if you go to http://www.opensubtitles.org/ro and download the Romanian subtitle for Iron Sky, you will get only Windows-1250 code page subtitles. By the way, Iron Sky is a cool movie :)

Nobody uses Unicode to encode the subtitles, even though doing so would spare users from having to configure their media player of choice to use Windows-1250 / Central European / ISO-8859-2 as the default code page for subtitles.
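To see why the code page setting matters, here is a small Python sketch (the sample text and the helper name `decode_subtitle_bytes` are assumptions made for illustration). The same legacy bytes decode correctly only when the player guesses the right code page, while UTF-8 bytes leave nothing to guess:

```python
def decode_subtitle_bytes(raw: bytes, encoding: str) -> str:
    """Decode raw subtitle bytes with the given code page."""
    return raw.decode(encoding)


# "si tara" written with the old cedilla letters and a breve,
# the only forms that Windows-1250 can represent.
sample = "\u015fi \u0163ar\u0103"
legacy_bytes = sample.encode("cp1250")
utf8_bytes = sample.encode("utf-8")

if __name__ == "__main__":
    # Right guess: the Romanian letters come back intact.
    print(decode_subtitle_bytes(legacy_bytes, "cp1250"))
    # ISO-8859-2 happens to place these three letters at the same bytes.
    print(decode_subtitle_bytes(legacy_bytes, "iso8859_2"))
    # Wrong guess (Western European): the letters turn into other symbols,
    # and the a breve byte shows up as a tilde.
    print(decode_subtitle_bytes(legacy_bytes, "cp1252"))
    # UTF-8 needs no per-language code page setting.
    print(decode_subtitle_bytes(utf8_bytes, "utf-8"))
```

Note that the byte for a breve in Windows-1250 decodes to A tilde under Windows-1252. That may be where an A tilde in Romanian text comes from, but this is a guess and not something checked here.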

I've created a small Windows tool (133 KB) that automatically converts subtitles from the old diacritics to the correct ones. The tool can be downloaded from here.
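The source of the tool is not shown here, so the following is not its implementation. It is only a rough Python sketch of the same idea, assuming the input is Windows-1250 and the desired output is UTF-8 with comma-below letters (all names are made up for the example):

```python
from pathlib import Path

# Map the old cedilla letters to the correct comma-below letters.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s
    "\u015e": "\u0218",  # S
    "\u0163": "\u021b",  # t
    "\u0162": "\u021a",  # T
})


def fix_diacritics(text: str) -> str:
    """Replace s/t cedilla with s/t comma below."""
    return text.translate(CEDILLA_TO_COMMA)


def convert_bytes(raw: bytes, source_encoding: str = "cp1250") -> bytes:
    """Decode legacy bytes, fix the diacritics, re-encode as UTF-8."""
    return fix_diacritics(raw.decode(source_encoding)).encode("utf-8")


def convert_subtitle(src: str, dst: str, source_encoding: str = "cp1250") -> None:
    """Convert one .srt/.sub file; bytes are used so line endings stay as-is."""
    Path(dst).write_bytes(convert_bytes(Path(src).read_bytes(), source_encoding))
```

Calling `convert_subtitle("movie.srt", "movie.utf8.srt")` (hypothetical file names) would write a UTF-8 copy. The source encoding is a parameter because a subtitle file does not declare its own code page.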

Below is a screenshot of the tool:

I hope Unicode subtitles will become more popular in the future. There is no need to stick to ANSI code pages!

8 comments:

Dan Bujor said...

Nice work with this program. Thank you, it worked wonderfully! No matter which encodings and fonts I used, I could not get certain subtitles to display.

ursulupi said...

Very well done! Thank you!

Unknown said...

Extraordinary! Thank you

Anonymous said...

The little program is excellent. Thank you!

Unknown said...

Hello,
What does "Trage si lasa sa cada" (literally "pull and let fall") mean? In other words, what do I have to do to install it?

Unknown said...

Hello, what does "Lasa sa cada" (literally "let fall") mean? What do I have to do to install it?

Cristian said...

@Unknown "Trage și lasă să cadă" is the Romanian translation of Drag & Drop.

You launch the executable "subtitrări_unicode.exe" and then drag & drop a .sub/.srt file onto it.

snow said...

A few times a year I come back to this page to see whether anything new has appeared, such as multi-file drag & drop, or at least a GitHub link to the sources that I could get my hands on. I have been using the application for many years, thanks!