WEBVTT

00:00:00.630 --> 00:00:04.030
Welcome to CS 101. I'm Dave Evans. I'll be your guide on this journey.

00:00:04.030 --> 00:00:07.047
This course will introduce you to the fundamental ideas in computing

00:00:07.047 --> 00:00:09.563
and teach you to read and write your own computer programs.

00:00:09.563 --> 00:00:13.063
We're going to do all that in the context of building a Web search engine.

00:00:13.063 --> 00:00:16.363
I'm guessing everyone here has at least used a search engine before.

00:00:16.363 --> 00:00:19.562
The goal of the first three units in this course is to build a Web crawler

00:00:19.562 --> 00:00:22.129
that will collect data from the Web for our search engine,

00:00:22.129 --> 00:00:24.663
and to learn about big ideas in computing by doing that.

00:00:24.663 --> 00:00:29.680
In Unit 1, we'll get started by extracting the first link on a web page.

00:00:29.680 --> 00:00:32.730
A Web crawler finds web pages for our search engine

00:00:32.730 --> 00:00:37.797
by starting from a "seed" page and following links on that page to find other pages.

00:00:37.797 --> 00:00:43.930
Each of those links leads to some new web page, which itself could have links that lead to other pages.

00:00:43.930 --> 00:00:46.507
As we follow those links, we'll find more and more web pages,

00:00:46.507 --> 00:00:50.232
building a collection of data that we'll use for our search engine.

00:00:50.479 --> 00:00:54.712
A web page is really just a chunk of text that comes from the Internet into your Web browser.

00:00:54.712 --> 00:00:56.580
We'll talk more about how that works in Unit 4.

00:00:56.580 --> 00:00:59.563
But for now, the important thing to understand is that

00:00:59.563 --> 00:01:02.497
a link is really just a special kind of text in that web page.

00:01:02.497 --> 00:01:07.347
When you click on a link in your browser, it will direct you to a new page.

00:01:07.347 --> 00:01:09.496
And you can keep following those links (...)

00:01:09.496 --> 00:01:14.213
What we'll do in this unit is write a program to extract that first link from the web page.

00:01:14.213 --> 00:01:18.213
In later units, we'll figure out how to extract all the links and build the collection for our search engine.
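
NOTE
Since a link is just a special kind of text in the page, pulling out the first one can be sketched with plain string searching. This is a minimal sketch, not necessarily the program the course builds: the function name get_first_link, the sample page, and the example.com URLs are hypothetical, and it assumes links are written as <a href="..."> with double quotes.
```python
def get_first_link(page):
    """Return the URL of the first <a href="..."> link in page, or None."""
    # Find where the first link tag starts (assumed form: <a href="...">).
    start_link = page.find('<a href=')
    if start_link == -1:
        return None  # no link on this page
    # The URL sits between the next pair of double quotes after the tag start.
    start_quote = page.find('"', start_link)
    if start_quote == -1:
        return None
    end_quote = page.find('"', start_quote + 1)
    if end_quote == -1:
        return None
    return page[start_quote + 1:end_quote]
# Hypothetical sample page, used only for illustration.
sample_page = ('<html><body>See <a href="http://example.com/a">A</a> and '
               '<a href="http://example.com/b">B</a>.</body></html>')
print(get_first_link(sample_page))  # prints http://example.com/a
```
A crawler could apply the same idea repeatedly, starting from the seed page, to follow links to other pages.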