1
00:00:00,630 --> 00:00:04,030
Welcome to CS 101. I'm Dave Evans. I'll be your guide on this journey.

2
00:00:04,030 --> 00:00:07,047
This course will introduce you to the fundamental ideas in computing

3
00:00:07,047 --> 00:00:09,563
and teach you to read and write your own computer programs.

4
00:00:09,563 --> 00:00:13,063
We're going to do all that in the context of building a Web search engine.

5
00:00:13,063 --> 00:00:16,363
I'm guessing everyone here has at least used a search engine before.

6
00:00:16,363 --> 00:00:19,562
The goal of the first three units in this course is to build a Web crawler

7
00:00:19,562 --> 00:00:22,129
that will collect data from the Web for our search engine,

8
00:00:22,129 --> 00:00:24,663
and to learn about big ideas in computing by doing that.

9
00:00:24,663 --> 00:00:29,680
In Unit 1, we'll get started by extracting the first link on a web page.

10
00:00:29,680 --> 00:00:32,730
A Web crawler finds web pages for our search engine

11
00:00:32,730 --> 00:00:37,797
by starting from a "seed" page and following links on that page to find other pages.

12
00:00:37,797 --> 00:00:43,930
Each of those links leads to some new web page, which itself could have links that lead to other pages.

13
00:00:43,930 --> 00:00:46,507
As we follow those links, we'll find more and more web pages,

14
00:00:46,507 --> 00:00:50,232
building a collection of data that we'll use for our search engine.

15
00:00:50,479 --> 00:00:54,712
A web page is really just a chunk of text that comes from the Internet into your Web browser.

16
00:00:54,712 --> 00:00:56,580
We'll talk more about how that works in Unit 4.

17
00:00:56,580 --> 00:00:59,563
But for now, the important thing to understand is that

18
00:00:59,563 --> 00:01:02,497
a link is really just a special kind of text in that web page.

19
00:01:02,497 --> 00:01:07,347
When you click on a link in your browser, it will direct you to a new page.

20
00:01:07,347 --> 00:01:09,496
And you can keep following those links (...)

21
00:01:09,496 --> 00:01:14,213
What we'll do in this unit is write a program to extract that first link from the web page.

22
00:01:14,213 --> 00:01:18,213
In later units, we'll figure out how to extract all the links and build that collection for our search engine.
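
To make the idea of "a link is just a special kind of text" concrete, here is a minimal sketch of extracting the first link from a page. It is not the course's own code. It assumes the page is already available as a Python string and that links appear in the literal form `<a href="...">` with double quotes. The function name `get_first_link` and the sample page and URLs are hypothetical, chosen only for illustration.

```python
def get_first_link(page):
    """Return the URL of the first <a href="..."> link in page, or None.

    Assumption: links are written exactly as <a href="URL"> with double
    quotes. Real web pages vary more than this simple search handles.
    """
    # Find where the first link tag starts in the page text.
    start_tag = page.find('<a href=')
    if start_tag == -1:
        return None  # no link on this page

    # The URL sits between the next pair of double quotes.
    start_quote = page.find('"', start_tag)
    if start_quote == -1:
        return None
    end_quote = page.find('"', start_quote + 1)
    if end_quote == -1:
        return None

    return page[start_quote + 1:end_quote]


# Hypothetical sample page, used only to demonstrate the function.
sample_page = (
    '<html><body>See <a href="http://example.com/next">next</a> '
    'and <a href="http://example.com/other">other</a></body></html>'
)

if __name__ == "__main__":
    print(get_first_link(sample_page))
```

A crawler along the lines described above would call something like this on the seed page, fetch the page the link points to, and repeat. Finding every link on a page, rather than only the first, is left to the later units.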