
Technical Evolution of Wikipedia - Teknika Evoluo de Vikipedio

  • 0:04 - 0:09
    Technical evolution of Wikipedia by Brion Vibber - Former CTO of Wikipedia
  • 0:09 - 0:12
    Good evening, everyone.
  • 0:12 - 0:15
    It's my pleasure to present to you, Brion Vibber.
  • 0:15 - 0:23
    For years, he's worked at the Wikimedia Foundation as their chief technical officer,
  • 0:23 - 0:31
    and I'm very happy that Lu [the German translator] could come. He runs Esperantoland.org.
  • 0:31 - 0:35
    I'll give the floor to Brion.
  • 0:35 - 0:52
    Wikipedia: is there anyone who doesn't know about Wikipedia?
  • 0:52 - 0:57
    So, a bit of Wikipedia history
  • 0:57 - 1:05
    and mostly about the technical aspect of multilingual support.
  • 1:05 - 1:14
    Originally, when Wikipedia was founded, it was only in English.
  • 1:14 - 1:22
    Now something which is nice and easy about English is that it doesn't have any accented characters.
  • 1:22 - 1:30
    But of course, an important problem with that is that American programmers, like myself,
  • 1:30 - 1:38
    don't know much about problems concerning letters and writing systems in other languages,
  • 1:38 - 1:48
    and because of that, a lot of software and websites don't handle languages well,
  • 1:48 - 1:54
    except those from Western Europe.
  • 1:54 - 1:59
    Many are interested in supporting other languages,
  • 1:59 - 2:08
    so we can take all human knowledge to everyone on earth.
  • 2:08 - 2:16
    So, it would be good to support other languages, but that didn't work well at first.
  • 2:16 - 2:22
    I set up many websites for Wikipedia in many languages,
  • 2:22 - 2:25
    several dozen languages, in fact.
  • 2:25 - 2:33
    But many of them were totally messed up, for example the Japanese Wikipedia.
  • 2:33 - 2:41
    Now, it can be written well. It has many characters.
  • 2:41 - 2:47
    It looks good and one can read and write. Everything works well now.
  • 2:47 - 2:52
    But originally, it looked very similar to that.
  • 2:52 - 3:02
    It remained an important problem for many languages: Japanese, Chinese, Russian, Hebrew, etc.
  • 3:02 - 3:06
    Many of them didn't work at all.
  • 3:06 - 3:20
    At one time, Polish even set up its own website for its wiki,
  • 3:20 - 3:28
    which supported Eastern European letters well, but still not Japanese nor Russian, etc.
  • 3:28 - 3:36
    At that time, I started to get to know Wikipedia through Esperanto.
  • 3:36 - 3:45
    I was a university student and I was learning French in a regular course.
  • 3:45 - 3:55
    But I became interested in other languages, and I taught myself Esperanto online and through books, etc.
  • 3:55 - 4:00
    On my computer and online, it was very interesting,
  • 4:00 - 4:08
    and the Esperanto Wikipedia was founded by our dear Chuck.
  • 4:08 - 4:21
    At that time, we still had a messed-up character set, at least for Esperanto, which has accented characters.
  • 4:24 - 4:31
    That way it has an accent on the letters.
  • 4:31 - 4:40
    But, for it to be written on the webpage, you had to write using the "x system".
  • 4:40 - 4:50
    So, "cx" changes to "ĉ", etc. It looks really ugly.
  • 4:50 - 5:07
    To make it more beautiful and show it the way it should be, I added Unicode support.
  • 5:07 - 5:19
    Unicode is a system to encode characters for every language in the world:
  • 5:19 - 5:28
    from Egyptian hieroglyphs to modern Japanese and Korean as well as many symbols
  • 5:28 - 5:35
    in one system, which can include all of them.
  • 5:35 - 5:45
    So we don't need a separate Polish system for Eastern Europe,
  • 5:45 - 5:49
    and French for Western Europe, etc.
  • 5:49 - 5:58
    We can have just one system, one program, one website for every language.
  • 5:58 - 6:13
    That worldwide system was already started in computing 20 years ago,
  • 6:13 - 6:23
    but in 2001 or 2002, when we founded Wikipedia, Unicode was still "new" online,
  • 6:23 - 6:30
    so it was difficult to use it in "American" programs.
  • 6:30 - 6:42
    You had to kind of study how UTF-8 works to put Unicode in a web page.
  • 6:42 - 6:53
    But, I was able to study it a bit, and I added support to Wikipedia's original software.
  • 6:53 - 7:04
    I gave it a converter from the "sx" to the correct "ŝ", etc.
  • 7:04 - 7:10
    But I found that it's not just for Esperanto.
  • 7:10 - 7:14
    It can work for other languages as well.
  • 7:14 - 7:22
    For example Russian, Japanese and Polish can work with Unicode.
  • 7:22 - 7:28
    Unfortunately, it was a bit more complicated,
  • 7:28 - 7:41
    because at that time we also upgraded to new Wikipedia software, which was completely different from the original.
  • 7:41 - 7:48
    It was better, but it still didn't support Unicode.
  • 7:48 - 7:57
    Of course, it was created by Western Europeans and Americans,
  • 7:57 - 8:05
    and it didn't know there were languages beyond those of Western Europe and North America
  • 8:05 - 8:08
    which have other letters.
  • 8:08 - 8:17
    So, that's why it was necessary to add Unicode support three times.
  • 8:17 - 8:21
    Originally for the Esperanto Wikipedia.
  • 8:21 - 8:34
    The second time for the new system, which was originally created for the English Wikipedia and didn't need Unicode.
  • 8:34 - 8:44
    And again when we completely changed the software to speed it up,
  • 8:44 - 8:49
    but with that third time, it was finished.
  • 8:49 - 8:59
    In 2002 and 2003, we tried to start new Wikipedias in many languages.
  • 8:59 - 9:13
    We brought the Polish wiki back in and were able to better unite it with the other languages.
  • 9:13 - 9:22
    For example, one language can link to a page about the same thing in another language.
  • 9:22 - 9:29
    Now with the same system for everything, one can do that.
  • 9:29 - 9:39
    It's better to combine the groups in their own language.
  • 9:39 - 9:46
    Similarly, there were other problems for the languages in the program online.
  • 9:46 - 9:54
    It was somewhat problematic that the traditional American programmers
  • 9:54 - 10:06
    and often even the Western Europeans created their own programs only in English.
  • 10:06 - 10:16
    It was a problem when someone didn't know English or didn't know it well
  • 10:16 - 10:21
    or just wanted to use a system in their own language.
  • 10:21 - 10:32
    Because of that, we also had to add a system to translate messages from the websites,
  • 10:32 - 10:38
    so everyone can understand it in their own language.
  • 10:38 - 10:52
    For example, we can see ... Article, Discussion, History, Delete
  • 10:52 - 10:57
    "Article", "Talk", "Edit", "History", only in English.
  • 10:57 - 11:01
    It's not very good, though generally one understands English.
  • 11:01 - 11:23
    So, we created a map between the messages and a short description about every message.
  • 11:23 - 11:41
    When we have something larger, long messages with sentences and paragraphs, etc.,
  • 11:41 - 11:46
    it's a bit more complicated than simple words.
  • 11:46 - 11:54
    That's why we give a name for every message.
  • 11:54 - 12:10
    In the program, it doesn't have an English sentence, it just has a name which is "login-message" or the like.
  • 12:10 - 12:22
    A file with the map for each individual language holds the name and the message.
  • 12:22 - 12:26
    The message can be translated into every language.
  • 12:26 - 12:36
    Similar systems are used in many programs of various kinds,
  • 12:36 - 12:50
    but what is most different about the Wikipedia system is that one can also change that message.
  • 12:50 - 13:14
    If I want to change that sentence a bit, so that my Wikipedia can have a standard or rule
  • 13:14 - 13:24
    about how one writes an article or chooses administrators, etc.
  • 13:24 - 13:31
    It can be different in the Wikipedia system.
  • 13:31 - 13:36
    One can ... I'm not logged in, so I can't ...
  • 13:36 - 13:49
    but the website administrators can use the wiki to change their own messages.
  • 13:49 - 13:54
    [Unfortunately then, my camera stopped working.]
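
The converter from "sx" to "ŝ" mentioned at 6:53 - 7:04 could be sketched roughly as follows. This is only an illustration in Python, not the code Wikipedia actually ran; the names `X_SYSTEM`, `x_to_unicode` and `to_utf8` are made up for this example, and the sketch naively converts every matching letter pair, so a word such as "Linux" would also be changed.

```python
import re

# Esperanto "x system" digraphs and the accented letters they stand for.
# (Hypothetical table name; the six letters themselves are standard Esperanto.)
X_SYSTEM = {"c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ"}

# One of the six base letters followed by "x" or "X".
_DIGRAPH = re.compile(r"([cghjsu])x", re.IGNORECASE)


def x_to_unicode(text: str) -> str:
    """Replace x-system digraphs such as "cx" with letters such as "ĉ"."""

    def replace(match: re.Match) -> str:
        base = match.group(1)
        accented = X_SYSTEM[base.lower()]
        # Keep the case of the base letter: "Sx" becomes "Ŝ".
        return accented.upper() if base.isupper() else accented

    return _DIGRAPH.sub(replace, text)


def to_utf8(text: str) -> bytes:
    """Convert x-system text and encode it as UTF-8 bytes for a web page."""
    return x_to_unicode(text).encode("utf-8")
```

For example, `x_to_unicode("cxirkaux")` gives "ĉirkaŭ", and `to_utf8("sx")` gives the two bytes `C5 9D`, which is how UTF-8 stores "ŝ".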
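
The message system described from 11:01 to 13:49 (a name for every message, one map per language, and changes made by each wiki's administrators) could be sketched like this. It is a minimal Python illustration under stated assumptions, not MediaWiki's real code: `MESSAGES`, `get_message`, the `overrides` argument and the sample strings are all hypothetical.

```python
# One map per language: message name -> text (sample strings for illustration).
MESSAGES = {
    "en": {"login-message": "Please log in.", "history": "History"},
    "eo": {"login-message": "Bonvolu ensaluti.", "history": "Historio"},
}


def get_message(name, lang, overrides=None, fallback="en"):
    """Look up a message by name.

    Order of lookup (an assumption of this sketch):
    1. text changed by the wiki's own administrators (overrides),
    2. the translation shipped for that language,
    3. the fallback language,
    4. the bare message name in angle brackets.
    """
    overrides = overrides or {}
    for table in (
        overrides.get(lang, {}),
        MESSAGES.get(lang, {}),
        MESSAGES.get(fallback, {}),
    ):
        if name in table:
            return table[name]
    return f"<{name}>"
```

The program itself only ever refers to a name such as "login-message"; which sentence the reader sees depends on the language map and on any local change, so each wiki can word its own rules without touching the program.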
Description:

Brion Vibber, previous CTO of Wikimedia, speaks of the early history of Wikipedia and Esperanto's influence on the multilingualism of the project. English subtitles coming soon. This video is under the Creative Commons BY-NC-SA license.

Brion Vibber, eks-teknikestro de Vikimedio, parolas pri la frua historio de Vikipedio kaj la influo de Esperanto pri la multlingveco de la projekto. Ĉi tiu filmeto estas sub la permesilo de Krea Komunaĵo BY-NC-SA.

Video Language:
Esperanto
Duration:
13:55
