0:00:04.285,0:00:09.285 Technical evolution of Wikipedia by Brion Vibber - Former CTO of Wikipedia 0:00:09.285,0:00:11.540 Good evening, everyone. 0:00:11.540,0:00:15.417 It's my pleasure to present to you Brion Vibber. 0:00:15.417,0:00:23.475 For years, he worked at the Wikimedia Foundation as their chief technical officer, 0:00:23.475,0:00:31.236 and I'm very happy that Lu [the German translator] could come. He runs Esperantoland.org. 0:00:31.236,0:00:35.167 I'll give the floor to Brion. 0:00:35.167,0:00:52.477 Wikipedia: is there anyone who doesn't know about Wikipedia? 0:00:52.477,0:00:57.167 So, a bit of Wikipedia history, 0:00:57.167,0:01:04.583 and mostly about the technical aspect of multilingual support. 0:01:04.583,0:01:14.417 Originally, when Wikipedia was founded, it was only in English. 0:01:14.417,0:01:22.042 Now, something which is nice and easy about English is that it doesn't have any accented characters. 0:01:22.042,0:01:30.417 But of course, an important problem with that is that American programmers, like myself, 0:01:30.417,0:01:38.042 don't know much about problems concerning letters and writing systems in other languages, 0:01:38.042,0:01:48.042 and because of that, a lot of software and websites don't handle languages well, 0:01:48.042,0:01:54.250 except those from Western Europe. 0:01:54.250,0:01:59.250 Many are interested in supporting other languages, 0:01:59.250,0:02:07.524 so we can take all human knowledge to everyone on earth. 0:02:07.524,0:02:16.375 So, it would be good to support other languages, but that didn't work well at first. 0:02:16.375,0:02:22.333 I set up many websites for Wikipedia in many languages, 0:02:22.333,0:02:25.167 several dozen languages, in fact. 0:02:25.167,0:02:33.417 But many of them were totally messed up, for example the Japanese Wikipedia. 0:02:33.417,0:02:40.917 Now, it can be written properly. It has many characters. 0:02:40.917,0:02:47.000 It looks good and one can read and write.
Everything works well now. 0:02:47.000,0:02:51.875 But originally, it looked very similar to that. 0:02:51.875,0:03:01.833 It remained an important problem for many languages: Japanese, Chinese, Russian, Hebrew, etc. 0:03:01.833,0:03:05.875 Many of them didn't work at all. 0:03:05.875,0:03:19.569 At one point, the Polish community even set up its own website for its wiki, 0:03:19.569,0:03:28.042 which supported Eastern European letters well, but still not Japanese or Russian, etc. 0:03:28.042,0:03:35.737 At that time, I started to get to know Wikipedia through Esperanto. 0:03:35.737,0:03:44.769 I was a university student and I was learning French in a regular course. 0:03:44.769,0:03:55.042 But I became interested in other languages, and I taught myself Esperanto online and through books, etc. 0:03:55.042,0:04:00.208 On my computer and online, it was very interesting, 0:04:00.208,0:04:08.292 and the Esperanto Wikipedia was founded by our dear Chuck. 0:04:08.292,0:04:20.947 At that time, we still had a messed-up character set, at least for Esperanto, which has accented characters. 0:04:23.716,0:04:31.000 That is, it has accents on some letters. 0:04:31.000,0:04:40.271 But to write it on a web page, you had to use the "x system". 0:04:40.271,0:04:50.482 So, "cx" stands in for "ĉ", etc. It looks really ugly. 0:04:50.482,0:05:07.479 To make it more beautiful and show it the way it should be, I added Unicode support. 0:05:07.479,0:05:18.542 Unicode is a system to encode characters for every language in the world: 0:05:18.542,0:05:28.273 from Egyptian hieroglyphs to modern Japanese and Korean, as well as many symbols, 0:05:28.273,0:05:35.375 in one system, which can include all of them. 0:05:35.375,0:05:44.793 That way, we don't need a separate Polish system for Eastern Europe, 0:05:44.793,0:05:48.667 and a French one for Western Europe, etc. 0:05:48.667,0:05:58.193 We can have just one system, one program, one website for every language.
0:05:58.193,0:06:12.875 That worldwide system was started in computing already 20 years ago, 0:06:12.875,0:06:23.417 but in 2001 or 2002, when we founded Wikipedia, Unicode was still "new" online, 0:06:23.417,0:06:30.327 so it was difficult to use it in "American" programs. 0:06:30.327,0:06:42.208 You had to kind of study how UTF-8 works to put Unicode in a web page. 0:06:42.208,0:06:52.851 But I was able to study it a bit, and I added support to Wikipedia's original software. 0:06:52.851,0:07:04.208 I gave it a converter from "sx" to the correct "ŝ", etc. 0:07:04.208,0:07:10.167 But I found that it's not just for Esperanto. 0:07:10.167,0:07:14.125 It can work for other languages as well. 0:07:14.125,0:07:21.504 For example, Russian, Japanese and Polish can work with Unicode. 0:07:21.504,0:07:27.750 Unfortunately, it was a bit more complicated, 0:07:27.750,0:07:40.917 because at that time we also upgraded to new Wikipedia software, which was completely different from the original. 0:07:40.917,0:07:47.792 It was better, but it still didn't support Unicode. 0:07:47.792,0:07:56.822 Of course, it was created by Western Europeans and Americans, 0:07:56.822,0:08:04.583 and it didn't know there were languages outside Western Europe and North America 0:08:04.583,0:08:08.060 which have other letters. 0:08:08.060,0:08:16.875 So, that's why it was necessary to add Unicode support three times. 0:08:16.875,0:08:20.808 Originally for the Esperanto Wikipedia. 0:08:20.808,0:08:33.958 The second time for the new system, which was originally created for the English Wikipedia and didn't need Unicode. 0:08:33.958,0:08:43.833 And again when we completely changed the software to speed it up, 0:08:43.833,0:08:49.000 but with that third time it was finally done. 0:08:49.000,0:08:59.333 In 2002 and 2003, we tried to start new Wikipedias in many languages. 0:08:59.333,0:09:12.583 We brought Polish back in and were able to better unite it with the other languages.
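The "sx" to "ŝ" converter and the UTF-8 step described in this part of the talk can be illustrated with a minimal Python sketch. It is not the code that ran on Wikipedia: the names `X_SYSTEM` and `x_to_unicode` and the sample phrase are made up for illustration, and a real converter would also need a way to keep a literal "x" after one of these letters.

```python
import re

# Hypothetical table for the Esperanto "x system": each digraph maps to
# the accented letter it stands in for.
X_SYSTEM = {"cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ"}

# Match any of the six base letters followed by "x", in either case.
_DIGRAPH = re.compile(r"[cghjsu]x", re.IGNORECASE)


def x_to_unicode(text: str) -> str:
    """Replace x-system digraphs with the accented Unicode letters."""

    def replace(match: re.Match) -> str:
        digraph = match.group(0)
        letter = X_SYSTEM[digraph.lower()]
        # Keep the capitalization of the base letter ("Sx" becomes "Ŝ").
        return letter.upper() if digraph[0].isupper() else letter

    return _DIGRAPH.sub(replace, text)


converted = x_to_unicode("Ehxosxangxo cxiujxauxde")
# UTF-8 is the byte encoding that actually goes into the web page.
utf8_bytes = converted.encode("utf-8")
print(converted)
print(utf8_bytes)
```

Each accented Esperanto letter takes two bytes in UTF-8, while the plain ASCII letters around it stay one byte each, which is part of why UTF-8 could be added to software written with English in mind.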
0:09:12.583,0:09:21.667 For example, a page in one language can link to a page about the same thing in another language. 0:09:21.667,0:09:29.458 Now with the same system for everything, one can do that. 0:09:29.458,0:09:38.833 It's better to combine the groups in their own language. 0:09:38.833,0:09:45.750 Similarly, there were other language problems in the online software. 0:09:45.750,0:09:53.667 It was somewhat problematic that the traditional American programmers 0:09:53.667,0:10:06.157 and often even the Western Europeans created their own programs only in English. 0:10:06.157,0:10:15.750 It was a problem when someone didn't know English or didn't know it well 0:10:15.750,0:10:20.958 or just wanted to use a system in their own language. 0:10:20.958,0:10:31.750 Because of that, we also had to add a system to translate messages from the websites, 0:10:31.750,0:10:37.958 so everyone can understand it in their own language. 0:10:37.958,0:10:52.417 For example, we can see ... Article, Discussion, History, Delete 0:10:52.417,0:10:57.042 "Article", "Talk", "Edit", "History", only in English. 0:10:57.042,0:11:01.458 It's not very good, though generally one understands English. 0:11:01.458,0:11:23.375 So, we created a map between the messages and a short description about every message. 0:11:23.375,0:11:41.382 When we have something larger, long messages with sentences and paragraphs, etc., 0:11:41.382,0:11:46.141 it's a bit more complicated than simple words. 0:11:46.141,0:11:54.385 That's why we give a name to every message. 0:11:54.385,0:12:10.384 The program doesn't contain an English sentence; it just has a name, such as "login-message" or the like. 0:12:10.384,0:12:21.552 A file with the map for each individual language holds the name and the message. 0:12:21.552,0:12:25.732 The message can be translated into every language.
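The named-message map described above can be sketched in Python as follows. This is an illustration only: the `MESSAGES` table, the `get_message` function, the "nav-history" name and the message texts are invented here, and falling back to English when a translation is missing is an assumption, not something stated in the talk. Only the name "login-message" comes from the talk.

```python
# One map per language: message name -> translated text.
# All names and texts are hypothetical examples.
MESSAGES = {
    "en": {
        "login-message": "Please log in to edit this page.",
        "nav-history": "History",
    },
    "eo": {
        "login-message": "Bonvolu ensaluti por redakti ĉi tiun paĝon.",
        "nav-history": "Historio",
    },
    "de": {
        "nav-history": "Versionsgeschichte",
    },
}


def get_message(name: str, lang: str, fallback: str = "en") -> str:
    """Look up a message by name, never by its English text."""
    catalog = MESSAGES.get(lang, {})
    if name in catalog:
        return catalog[name]
    # Assumption: use the fallback language when no translation exists yet,
    # and show the bare name in angle brackets if even that is missing.
    return MESSAGES[fallback].get(name, f"<{name}>")


print(get_message("nav-history", "eo"))
print(get_message("login-message", "de"))
```

Because the program refers only to the name, translators can add or fix a language by editing its map without touching the program itself.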
0:12:25.732,0:12:36.000 Similar systems are used in many programs of various kinds, 0:12:36.000,0:12:50.080 but what is most different about the Wikipedia system is that one can also change that message. 0:12:50.080,0:13:13.565 If I want to change that sentence a bit, so that my Wikipedia can have a standard or rule 0:13:13.565,0:13:24.000 about how one writes an article or chooses administrators, etc., 0:13:24.000,0:13:30.632 it can be different in the Wikipedia system. 0:13:30.632,0:13:36.344 One can ... I'm not logged in, so I can't ... 0:13:36.344,0:13:48.583 but the website administrators can use the wiki to change their own messages. 0:13:48.583,0:13:54.375 [Unfortunately then, my camera stopped working.]
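The idea that administrators of one wiki can change their own messages can be sketched as a layer of local overrides on top of the shipped translations. The class `MessageStore`, its methods and the `is_admin` flag are hypothetical names for this illustration, and the sketch makes no claim about how the Wikipedia software stores such overrides.

```python
class MessageStore:
    """Shipped default messages plus per-wiki overrides (hypothetical sketch)."""

    def __init__(self, defaults: dict[str, str]) -> None:
        self.defaults = dict(defaults)
        self.overrides: dict[str, str] = {}

    def set_override(self, name: str, text: str, is_admin: bool) -> None:
        # Assumption: only administrators may change a message,
        # as the speaker notes when he cannot do it while logged out.
        if not is_admin:
            raise PermissionError("only administrators can change messages")
        self.overrides[name] = text

    def get(self, name: str) -> str:
        # A local override wins; otherwise use the shipped default.
        if name in self.overrides:
            return self.overrides[name]
        return self.defaults.get(name, f"<{name}>")


store = MessageStore({"login-message": "Please log in to edit this page."})
store.set_override(
    "login-message",
    "Please log in and read our article rules before editing.",
    is_admin=True,
)
print(store.get("login-message"))
```

Keeping the overrides separate from the defaults means a wiki's local wording does not have to be rewritten when the shipped translations change.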