WEBVTT 00:00:02.029 --> 00:00:06.085 John DeCrey: Welcome to week 10. 00:00:06.085 --> 00:00:10.100 We're going to introduce and talk a little bit more about 00:00:10.100 --> 00:00:13.400 third party modules and 00:00:13.400 --> 00:00:15.800 what that means and how to install it. 00:00:15.800 --> 00:00:18.805 Then we've got several code-alongs 00:00:18.805 --> 00:00:21.620 coming up that I think will be fun. 00:00:21.620 --> 00:00:27.110 I'd like to introduce my guest. 00:00:27.110 --> 00:00:28.760 This is Jerik. 00:00:28.760 --> 00:00:34.020 We've worked together for several years now. 00:00:34.180 --> 00:00:38.014 When did we first meet, was it 2014? 00:00:38.014 --> 00:00:40.800 Jerik: Yeah. That sounds right. 00:00:41.649 --> 00:00:46.200 John DeCrey: Anyway, Jerik lives 00:00:46.200 --> 00:00:49.880 and breathes in Python and he's done a lot 00:00:49.880 --> 00:00:54.080 of his own personal Python scripts 00:00:54.080 --> 00:00:58.105 that I'll let him talk about and some of them are funny. 00:00:58.105 --> 00:01:04.435 Then he's an avid motorcyclist, 00:01:04.435 --> 00:01:07.455 and he'll talk a little bit about that too. 00:01:07.455 --> 00:01:10.900 We'll say, how's the saying go, 00:01:12.170 --> 00:01:17.219 was it bad decisions lead to good stories? 00:01:17.219 --> 00:01:18.414 Jerik: Yes. 00:01:18.414 --> 00:01:21.580 John DeCrey: Unfortunately I have a lot of stories. 00:01:21.580 --> 00:01:25.340 Anyway, go ahead, Jerik. 00:01:26.199 --> 00:01:28.840 Jerik: Yes. Just to introduce myself, 00:01:28.840 --> 00:01:31.780 I've been working in the software industry 00:01:31.780 --> 00:01:36.410 for almost 12 years. 00:01:36.410 --> 00:01:39.690 I've mostly been in the QA side 00:01:39.690 --> 00:01:43.620 of things, that's testing and finding bugs. 00:01:43.790 --> 00:01:48.210 A lot of my job though has been creating automated tests. 00:01:48.210 --> 00:01:52.990 A lot of that has been through Python 2. 
00:01:53.420 --> 00:01:57.070 Our code-along exercises are going to go 00:01:57.070 --> 00:02:02.170 through some of the things I've learned doing that. 00:02:02.170 --> 00:02:05.710 I have some slides to go through at 00:02:05.710 --> 00:02:10.190 the beginning though, let me just start this. 00:02:11.180 --> 00:02:20.790 This is third party Python modules. I'll just read this. 00:02:20.790 --> 00:02:23.540 Modular programming refers to the process of breaking 00:02:23.540 --> 00:02:27.000 a large, unwieldy programming task into separate, 00:02:27.000 --> 00:02:29.460 smaller, more manageable subtasks or modules. 00:02:29.460 --> 00:02:31.220 Individual modules can then be 00:02:31.220 --> 00:02:33.160 cobbled together like building blocks to create 00:02:33.160 --> 00:02:37.970 a larger application. Just as an example. 00:02:37.970 --> 00:02:41.530 You guys are probably familiar with this when 00:02:41.530 --> 00:02:45.850 you do these import statements like this time. 00:02:45.850 --> 00:02:49.790 This is a module that's built into Python. 00:02:49.790 --> 00:02:54.170 It's a way that you can bring this into 00:02:54.170 --> 00:02:56.250 your program and utilize 00:02:56.250 --> 00:02:59.550 everything that's been built out around this module. 00:02:59.550 --> 00:03:02.330 It's just a way of not 00:03:02.330 --> 00:03:07.270 having to redo work that someone else has already done. 00:03:08.319 --> 00:03:12.230 John DeCrey: We have done imports throughout the course. 00:03:12.230 --> 00:03:16.270 The from is probably 00:03:16.270 --> 00:03:18.510 a little bit different than anyone has seen it. 00:03:18.510 --> 00:03:20.599 Can you talk about that a little bit? 00:03:20.599 --> 00:03:23.600 Jerik: Yeah. Let me go back to that. 00:03:25.490 --> 00:03:29.410 These from statements are 00:03:29.410 --> 00:03:31.770 similar to the import statements, 00:03:31.770 --> 00:03:41.290 but it's digging more into the module itself. 
00:03:42.469 --> 00:03:45.870 John DeCrey: So getting a specific item from the import. 00:03:45.870 --> 00:03:55.450 Jerik: Yeah. Exactly. The advantages 00:03:55.450 --> 00:03:57.650 of modular programming are 00:03:57.650 --> 00:04:01.410 that it's simpler, it's maintainable and reusable. 00:04:01.410 --> 00:04:05.070 When you import that time module, 00:04:05.070 --> 00:04:08.270 you don't have to do any maintenance on that. 00:04:09.230 --> 00:04:12.890 The people who run the Python project are going to 00:04:12.890 --> 00:04:16.350 be adding methods to that or fixing something. 00:04:16.350 --> 00:04:19.350 If we changed our time format, 00:04:19.350 --> 00:04:20.610 they would be the ones going 00:04:20.610 --> 00:04:22.490 and making sure that still worked 00:04:22.490 --> 00:04:27.750 with all the existing Python programs out there. 00:04:30.880 --> 00:04:35.560 In Python, that time module is one that's built in. 00:04:35.560 --> 00:04:38.300 There's a bunch that Python has built in. 00:04:38.300 --> 00:04:42.600 Then there's also a bunch of third party modules. 00:04:44.120 --> 00:04:49.610 These are examples of ones that are built into Python. 00:04:49.610 --> 00:04:53.620 It's just the Python Module Index. 00:04:57.650 --> 00:05:02.335 Then these are some common third party modules. 00:05:02.335 --> 00:05:03.880 Guaranteed, 00:05:03.880 --> 00:05:05.900 not every one of them is listed here 00:05:05.900 --> 00:05:12.250 because I would guess there's hundreds of thousands. 00:05:12.340 --> 00:05:15.840 The thing I love about Python 00:05:15.840 --> 00:05:19.480 is any problem that you're looking to solve, 00:05:19.480 --> 00:05:21.640 someone on the Internet has probably 00:05:21.640 --> 00:05:23.900 solved it and created a module for it. 00:05:23.900 --> 00:05:26.565 You can just go and import that module, 00:05:26.565 --> 00:05:29.780 and get up and 00:05:29.780 --> 00:05:33.744 running a lot faster than having to do it yourself. 
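The two import styles discussed above can be sketched with the built-in time module. This is a minimal illustration, not the slide's code; the function name pause_and_measure is made up for the example.

```python
import time              # brings in the whole module; names are reached as time.<name>
from time import sleep   # pulls one specific name out of the module


def pause_and_measure(seconds):
    """Sleep for the given number of seconds and return how long it took."""
    start = time.perf_counter()   # module-style access: time.perf_counter
    sleep(seconds)                # from-style access: no "time." prefix needed
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = pause_and_measure(0.1)
    print(f"Slept for about {elapsed:.2f} seconds")
```

Both names point at the same function; the from form only changes how you refer to it in your own file.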
00:05:33.744 --> 00:05:35.820 John DeCrey: Chances are high 00:05:35.820 --> 00:05:38.320 there's a module out there for it. 00:05:38.320 --> 00:05:40.480 The saying is there's an app for 00:05:40.480 --> 00:05:43.379 that; in Python, there's a module for that. 00:05:43.379 --> 00:05:49.710 Jerik: Exactly. You guys 00:05:49.710 --> 00:05:51.649 have probably already done this too. 00:05:51.649 --> 00:05:54.470 John DeCrey: We actually have not used pip yet. 00:05:54.470 --> 00:05:56.490 I'm trying to think, 00:05:56.490 --> 00:05:58.825 but I don't think that we have. 00:05:58.825 --> 00:06:02.204 As far as actually installing. 00:06:02.204 --> 00:06:06.130 Jerik: To install a third party module, 00:06:06.190 --> 00:06:11.050 the syntax is pip install and then the package name. 00:06:11.050 --> 00:06:13.510 I'll show an example of that 00:06:13.510 --> 00:06:16.544 later on when we do the code-along exercises. 00:06:16.544 --> 00:06:27.430 John DeCrey: We type that in at the Python console. 00:06:31.939 --> 00:06:36.610 Jerik: Now I want to just show some examples of 00:06:36.610 --> 00:06:38.430 different modules that I've used 00:06:38.430 --> 00:06:42.030 like throughout work and personal projects and stuff. 00:06:44.310 --> 00:06:47.390 This is only three lines of code, 00:06:47.390 --> 00:06:50.295 but it's going to do something really cool. 00:06:50.295 --> 00:06:54.450 This is a module called pytesseract. 00:06:58.450 --> 00:07:02.315 What it is, it's an image, 00:07:02.315 --> 00:07:04.850 it's an OCR engine. 00:07:04.850 --> 00:07:07.590 You can get a picture. 00:07:07.590 --> 00:07:09.505 This is just a picture of a receipt. 00:07:09.505 --> 00:07:15.050 You can get all the text data from it, 00:07:15.050 --> 00:07:17.020 which is crazy when it's 00:07:17.020 --> 00:07:20.430 three lines of code, and it can do that. 00:07:23.300 --> 00:07:28.980 The reason I needed this is I worked at a company 00:07:28.980 --> 00:07:34.490 where we would scrape the web for obituaries. 
00:07:34.490 --> 00:07:39.100 Then we sold the obituaries to 00:07:39.100 --> 00:07:46.940 a well-known Utah-based ancestry website. 00:07:48.960 --> 00:07:50.920 There's this website. 00:07:50.920 --> 00:07:52.359 John DeCrey: What company are you talking about? 00:07:52.359 --> 00:07:57.250 Jerik: Yeah. [LAUGHTER] There's 00:07:57.250 --> 00:07:59.305 this obituary website in Germany 00:07:59.305 --> 00:08:02.350 that for some reason on their web page 00:08:02.350 --> 00:08:07.315 instead of pasting the obituary like as text they had 00:08:07.315 --> 00:08:10.600 generated like a PNG or a picture file 00:08:10.600 --> 00:08:14.690 of the obituary and posted that on the website. 00:08:15.030 --> 00:08:20.410 We had our scraper go and get those images, 00:08:20.410 --> 00:08:24.040 and it would use pytesseract to get the text out of 00:08:24.040 --> 00:08:28.465 those images, to get the obituaries. 00:08:28.465 --> 00:08:31.390 But I can show this running. 00:08:31.390 --> 00:08:34.359 We have this picture. 00:08:34.359 --> 00:08:37.014 John DeCrey: That's a PNG file. 00:08:37.014 --> 00:08:40.930 Jerik: All this is going to do is open that 00:08:40.930 --> 00:08:44.500 and get the text from it and print it out. 00:08:44.500 --> 00:08:47.830 John DeCrey: We're invoking the image_to_string function, 00:08:47.830 --> 00:08:52.149 and then passing in the image to that function. 00:08:52.149 --> 00:08:55.150 Jerik: Yeah, so this image_to_string function is 00:08:55.150 --> 00:08:58.569 something that's built into this pytesseract module. 00:08:58.569 --> 00:09:00.230 John DeCrey: Perfect. 00:09:01.169 --> 00:09:03.009 Jerik: So running. 00:09:03.009 --> 00:09:06.640 John DeCrey: Well, look at that. Yeah, that's cool. 00:09:06.640 --> 00:09:09.010 That was a fuzzy picture too. 00:09:09.010 --> 00:09:13.330 It wasn't like, crisp, clean. 00:09:13.330 --> 00:09:14.995 Jerik: Yeah, it actually does pretty good. 
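The OCR demo above is not reproduced in the transcript, so the following is only a sketch of the same idea: open an image and pass it to pytesseract's image_to_string. It assumes the pillow and pytesseract packages are installed and that the Tesseract OCR program itself is on the machine; the function name extract_text is made up for the example.

```python
def extract_text(image_path):
    """Return the text that Tesseract finds in the image at image_path."""
    # Imported inside the function so this file still loads when the
    # third-party packages are not installed yet.
    from PIL import Image   # pip install pillow
    import pytesseract      # pip install pytesseract (also needs the Tesseract program)

    image = Image.open(image_path)
    return pytesseract.image_to_string(image)
```

Calling print(extract_text("receipt.png")) on a picture like the receipt in the demo would print whatever text the engine recognizes; "receipt.png" is a placeholder file name.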
00:09:14.995 --> 00:09:19.119 I was playing around with other images but. 00:09:19.119 --> 00:09:21.429 John DeCrey: Yeah. It did a good job. 00:09:21.429 --> 00:09:23.080 Jerik: To me, it was just crazy. 00:09:23.080 --> 00:09:27.490 Like that huge task of getting obituary data 00:09:27.490 --> 00:09:29.800 from picture files on the Internet 00:09:29.800 --> 00:09:32.994 was reduced to basically three lines of code. 00:09:32.994 --> 00:09:35.450 John DeCrey: Yeah, seriously. 00:09:36.089 --> 00:09:39.520 Jerik: It did not take me hours upon hours to do that. 00:09:39.520 --> 00:09:43.000 [LAUGHTER] [OVERLAPPING] 00:09:43.000 --> 00:09:45.220 John DeCrey: There's already, stuff out there, 00:09:45.220 --> 00:09:47.890 but if you wanted to design your own, like for example, 00:09:47.890 --> 00:09:52.135 you take a picture of receipts and have it parse out, 00:09:52.135 --> 00:09:54.550 saving it to your own data. 00:09:54.550 --> 00:09:56.409 There's just lots of. 00:09:56.409 --> 00:09:58.630 Jerik: Yeah, the sky's the limit. [LAUGHTER] 00:09:58.630 --> 00:10:01.015 John DeCrey: Of course stuff you can do with that. 00:10:01.015 --> 00:10:03.250 Actually, if I can talk a little bit 00:10:03.250 --> 00:10:05.440 about that for a minute. 00:10:05.440 --> 00:10:09.895 We did a project at the Uview Hospital. 00:10:09.895 --> 00:10:12.070 The hospital uses Epic, 00:10:12.070 --> 00:10:18.350 which is a huge EMR system, Electronic Medical Records. 00:10:23.430 --> 00:10:26.680 Every now and then, like nurses and 00:10:26.680 --> 00:10:28.090 the medical personnel that use 00:10:28.090 --> 00:10:29.650 it will get an error message. 00:10:29.650 --> 00:10:32.545 Sometimes the error, the screen, 00:10:32.545 --> 00:10:35.060 it's like it's a message box. 00:10:36.630 --> 00:10:41.140 They may not be sure what the recovery is or what 00:10:41.140 --> 00:10:42.790 the procedure is or process 00:10:42.790 --> 00:10:45.610 is when you encounter certain errors. 
00:10:45.610 --> 00:10:48.700 [NOISE] We developed a thing 00:10:48.700 --> 00:10:53.200 where the medical personnel can take a picture from 00:10:53.200 --> 00:10:57.325 the screen of that error and 00:10:57.325 --> 00:10:59.530 it translates it to text and does 00:10:59.530 --> 00:11:01.960 a look-up from a database, 00:11:01.960 --> 00:11:04.000 finds the error code, 00:11:04.000 --> 00:11:05.680 the error number, and stuff, 00:11:05.680 --> 00:11:07.600 and then brings up the documentation right 00:11:07.600 --> 00:11:09.909 there on the handheld. 00:11:09.909 --> 00:11:11.439 Jerik: That's awesome. 00:11:11.439 --> 00:11:14.365 John DeCrey: Just give more information on the error 00:11:14.365 --> 00:11:20.755 and what best practices moving forward from that error. 00:11:20.755 --> 00:11:24.025 Lots of really great opportunities. 00:11:24.025 --> 00:11:25.090 Like you said, 00:11:25.090 --> 00:11:27.230 the sky is the limit there. 00:11:30.869 --> 00:11:35.485 Jerik: Here's another example. 00:11:35.485 --> 00:11:42.444 Where is it? This is another thing from that same job. 00:11:42.444 --> 00:11:44.319 John DeCrey: Other funeral homes? 00:11:44.319 --> 00:11:50.200 Jerik: Yeah. [LAUGHTER] We wanted to make sure that we had 00:11:50.200 --> 00:11:53.170 all the funeral home websites that 00:11:53.170 --> 00:11:57.440 has the obituary data in the US and Canada. 00:11:58.560 --> 00:12:01.540 This isn't the script itself, 00:12:01.540 --> 00:12:03.460 but the same idea. 00:12:03.460 --> 00:12:07.675 A script that would go on Google Maps, 00:12:07.675 --> 00:12:10.090 search for a funeral home in 00:12:10.090 --> 00:12:13.255 every zip code and 00:12:13.255 --> 00:12:15.460 return a list of all the funeral homes. 00:12:15.460 --> 00:12:17.665 And then I would go and add it to 00:12:17.665 --> 00:12:19.390 a master list of 00:12:19.390 --> 00:12:22.179 that would check for like duplicates and stuff. 00:12:22.179 --> 00:12:24.220 John DeCrey: Let's talk about that. 
00:12:24.220 --> 00:12:26.680 Just those couple lines of code and what it's doing. 00:12:26.680 --> 00:12:29.575 We've got, on Line 4, 00:12:29.575 --> 00:12:34.570 so you've got requests.get and the Google URL, 00:12:34.570 --> 00:12:38.935 along with an appended search string, 00:12:38.935 --> 00:12:42.985 which is how Google searches work. 00:12:42.985 --> 00:12:47.360 Line 5, however, looks very cryptic. 00:12:47.420 --> 00:12:52.300 I know what that is. I do my best to avoid it. 00:12:52.300 --> 00:12:53.710 [LAUGHTER] 00:12:53.710 --> 00:12:54.850 Jerik: I probably knew what this meant at 00:12:54.850 --> 00:12:57.414 one point too. But I don't anymore. 00:12:57.414 --> 00:13:01.190 John DeCrey: This is called regular expressions. 00:13:03.300 --> 00:13:06.550 Whoever invented the syntax 00:13:06.550 --> 00:13:08.095 for regular [LAUGHTER] expressions, 00:13:08.095 --> 00:13:09.760 I don't know, should maybe 00:13:09.760 --> 00:13:11.290 go find something else to do instead. 00:13:11.290 --> 00:13:12.760 [LAUGHTER] 00:13:12.760 --> 00:13:14.420 Jerik: I agree. 00:13:14.789 --> 00:13:17.890 John DeCrey: I suppose if you spend 00:13:17.890 --> 00:13:20.680 enough time in it, it becomes memory. 00:13:20.680 --> 00:13:24.160 But anytime you have something that 00:13:24.160 --> 00:13:27.910 you have to have a guide and a cheat sheet and stuff, 00:13:27.910 --> 00:13:32.125 and legends and whatever you look up, 00:13:32.125 --> 00:13:35.120 I would say that it failed. 00:13:36.270 --> 00:13:41.500 Regular expressions on 00:13:41.500 --> 00:13:47.120 Line 5 is what that stuff is. Go ahead. 00:13:48.029 --> 00:13:50.635 Jerik: This is the third-party module 00:13:50.635 --> 00:13:52.345 that I'm using in this one. 00:13:52.345 --> 00:13:56.890 This requests, it basically just gets 00:13:56.890 --> 00:14:01.855 the entire HTML from this request, 00:14:01.855 --> 00:14:12.459 and then this regex gets the URLs out of the HTML. 
00:14:12.459 --> 00:14:15.130 John DeCrey: Just for clarification, 00:14:15.130 --> 00:14:16.765 that's just shorthand for 00:14:16.765 --> 00:14:18.550 regular expressions when you hear 00:14:18.550 --> 00:14:22.010 regex, same thing. 00:14:22.679 --> 00:14:25.960 Jerik: Then Google obviously has some of 00:14:25.960 --> 00:14:28.480 their own URLs in there, 00:14:28.480 --> 00:14:29.845 so I filter those out. 00:14:29.845 --> 00:14:31.930 I'll run this, you can see 00:14:31.930 --> 00:14:34.375 what [OVERLAPPING] it prints out. 00:14:34.375 --> 00:14:38.125 These are all what 00:14:38.125 --> 00:14:41.950 shows up when I search funeral homes in my zip code. 00:14:41.950 --> 00:14:44.560 [LAUGHTER] 00:14:44.560 --> 00:14:46.524 John DeCrey: That's right. Then you specified a zip code? 00:14:46.524 --> 00:14:49.570 Jerik: Yeah. When I ran this script, 00:14:49.570 --> 00:14:51.475 like for my job at that company, 00:14:51.475 --> 00:14:53.380 I think I just passed in 00:14:53.380 --> 00:14:56.320 all the zip codes and just let it run overnight. 00:14:56.320 --> 00:15:00.950 [LAUGHTER] Then in the morning had a list of like 20,000. 00:15:01.079 --> 00:15:03.610 John DeCrey: In Line 4, 00:15:03.610 --> 00:15:04.915 I just noticed that you're 00:15:04.915 --> 00:15:07.810 converting your zip code to a string. 00:15:07.810 --> 00:15:11.570 Why wouldn't you just wrap it inside of a string? 00:15:11.970 --> 00:15:15.410 Is there a particular reason? 00:15:20.639 --> 00:15:25.330 Jerik: I probably copied this from my real script and 00:15:25.330 --> 00:15:26.980 I was probably passing in 00:15:26.980 --> 00:15:30.190 a variable of [OVERLAPPING] the zip codes there. 00:15:30.190 --> 00:15:33.460 I was just lazy 00:15:33.460 --> 00:15:35.560 when I brought it over to this example one. 00:15:35.560 --> 00:15:39.670 [LAUGHTER] 00:15:39.670 --> 00:15:41.440 John DeCrey: It's pretty cool that you can do that too. 
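The script being walked through is not shown in the transcript, so here is a rough sketch of the same shape: fetch a page with requests, pull URLs out of the HTML with a regular expression, and filter out the search engine's own links. The search URL, the regex pattern, and the list of unwanted URL fragments are all assumptions made for illustration, not the values from the real script.

```python
import re

# Hypothetical values for illustration; the real script's URL, pattern,
# and filter list are not shown in the transcript.
SEARCH_URL = "https://www.google.com/search?q=funeral+homes+"
URL_PATTERN = re.compile(r'https?://[^\s"\'<>]+')
BAD_URL_PARTS = ("google.", "gstatic.", "schema.org")


def fetch_html(zip_code):
    """Download the search page for one zip code and return its HTML text."""
    import requests  # third-party: pip install requests
    # str() lets the caller pass the zip code as a number or as a string.
    res = requests.get(SEARCH_URL + str(zip_code), timeout=10)
    return res.text


def extract_urls(html):
    """Return the URLs found in html, minus the ones we do not care about."""
    urls = URL_PATTERN.findall(html)  # findall returns a list of strings
    good = []
    for url in urls:
        if not any(bad in url for bad in BAD_URL_PARTS):
            good.append(url)
    return good


if __name__ == "__main__":
    sample = ('<a href="https://www.google.com/maps">Maps</a> '
              '<a href="https://example-funeral-home.com/obits">Obits</a>')
    print(extract_urls(sample))
```

Keeping the download and the parsing in separate functions means the regex part can be tried on a saved chunk of HTML without hitting the network.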
00:15:41.440 --> 00:15:43.780 When that returns, it looks like 00:15:43.780 --> 00:15:47.660 the URLs on Line 5, 00:15:47.660 --> 00:15:51.650 all the results are coming into that urls, 00:15:51.650 --> 00:15:54.134 that looks like a list. Is that right? 00:15:54.134 --> 00:15:55.114 Jerik: Yeah. 00:15:55.114 --> 00:15:56.900 John DeCrey: Okay. 00:15:57.209 --> 00:16:00.174 Jerik: I can even do this. 00:16:00.174 --> 00:16:02.320 John DeCrey: Oh, I see. Then in your for loop, 00:16:02.320 --> 00:16:04.605 you're going through that list, 00:16:04.605 --> 00:16:06.100 and then also making sure 00:16:06.100 --> 00:16:09.040 that anything that's in the bad URLs is 00:16:09.040 --> 00:16:14.300 not in the URLs list. That's cool. 00:16:16.499 --> 00:16:18.560 Jerik: I have a breakpoint right here. 00:16:18.560 --> 00:16:21.800 Now, this res variable will have, 00:16:21.800 --> 00:16:27.540 this text is going to be the entire HTML from that, 00:16:27.540 --> 00:16:30.040 I don't know, there's a way to copy the whole thing. 00:16:30.040 --> 00:16:32.260 But anyway, it's like 00:16:32.260 --> 00:16:39.210 a big jumble of HTML that gets returned. 00:16:39.879 --> 00:16:42.390 John DeCrey: Very cool. 00:16:43.819 --> 00:16:48.680 Jerik: This next one is a personal project that is dumb, 00:16:48.680 --> 00:16:51.200 but awesome too because 00:16:51.200 --> 00:16:53.510 it's been [OVERLAPPING] paying out. 00:16:53.510 --> 00:16:58.440 [LAUGHTER] I call this one the Bing Rewarder. 00:16:58.440 --> 00:17:02.620 Bing, the search engine, has this dumb program 00:17:02.620 --> 00:17:07.175 where it's called Bing Rewards. 00:17:07.175 --> 00:17:08.880 It gives you these points 00:17:08.880 --> 00:17:12.500 every day for just searching on Bing. 00:17:12.500 --> 00:17:14.780 I think you can do like 20 searches 00:17:14.780 --> 00:17:17.230 on desktop and then like 15 00:17:17.230 --> 00:17:23.545 on mobile Bing and you get points for it. 
00:17:23.545 --> 00:17:27.819 I've made this script that just searches Bing every day. 00:17:27.819 --> 00:17:31.524 John DeCrey: Then you can redeem the points for goods, right? 00:17:31.524 --> 00:17:35.440 Jerik: Yeah. You can get gift cards and stuff. 00:17:35.440 --> 00:17:40.740 I use it for like Xbox Game Pass on PC, 00:17:40.740 --> 00:17:43.889 which is like a game service. 00:17:43.889 --> 00:17:46.164 John DeCrey: What does that typically cost? 00:17:46.164 --> 00:17:52.260 Jerik: I have no idea. [LAUGHTER] I 00:17:52.260 --> 00:17:55.080 bet it's like 20 bucks a month or 15 bucks a month, 00:17:55.080 --> 00:17:57.059 probably. Not too expensive. 00:17:57.059 --> 00:17:59.840 John DeCrey: I have to say more like 50 bucks a year. 00:17:59.840 --> 00:18:01.780 But if you're saying 25 [LAUGHTER] 00:18:01.780 --> 00:18:04.190 a month, that's pretty expensive. 00:18:04.349 --> 00:18:06.700 Jerik: This script has been working 00:18:06.700 --> 00:18:08.880 for like six years now. 00:18:08.880 --> 00:18:10.000 [OVERLAPPING] I thought it would 00:18:10.000 --> 00:18:11.499 have been shut down forever. 00:18:11.499 --> 00:18:12.680 John DeCrey: You've never paid for it. 00:18:12.680 --> 00:18:14.024 Because you get the Bing rewards. 00:18:14.024 --> 00:18:15.330 Jerik: Yeah, exactly. 00:18:15.330 --> 00:18:20.740 [LAUGHTER] I'll talk through the code a little bit. 00:18:20.740 --> 00:18:24.560 These functions, actually, 00:18:24.560 --> 00:18:27.960 I'll run through it, and then I'll go through it. 00:18:28.110 --> 00:18:31.160 What this is going to do is just going to open 00:18:31.160 --> 00:18:32.760 Bing in desktop mode, 00:18:32.760 --> 00:18:34.250 search it three times, 00:18:34.250 --> 00:18:36.740 and then open it in mobile mode, 00:18:36.740 --> 00:18:42.420 and search it three times. We'll run that. 00:18:51.570 --> 00:18:54.774 These are just random words. 00:18:54.774 --> 00:18:57.770 John DeGrey: From your generator there. 
00:18:59.909 --> 00:19:01.900 Jerik: I haven't thought about this before. 00:19:01.900 --> 00:19:02.440 This is actually 00:19:02.440 --> 00:19:05.965 a good example to bring up. 00:19:05.965 --> 00:19:09.235 I have these generated words, 00:19:09.235 --> 00:19:12.850 but there was also another third party module 00:19:12.850 --> 00:19:17.150 that was for just entering in real. 00:19:17.280 --> 00:19:20.780 But from the dictionary words. 00:19:21.180 --> 00:19:23.320 I was using that for a while, 00:19:23.320 --> 00:19:28.525 but it had to get it from 00:19:28.525 --> 00:19:31.480 their online database or whatever and 00:19:31.480 --> 00:19:34.734 sometimes it didn't work and so I went back to this. 00:19:34.734 --> 00:19:36.069 John DeGrey: I see. 00:19:36.069 --> 00:19:38.425 Jerik: That's just something you have to 00:19:38.425 --> 00:19:41.140 deal with with third party modules. 00:19:41.140 --> 00:19:45.204 Sometimes they don't work the best, I guess. 00:19:45.204 --> 00:19:46.960 John DeGrey: Yeah, we saw it run 00:19:46.960 --> 00:19:49.540 three times for the desktop and three times on 00:19:49.540 --> 00:19:51.700 the mobile and you just 00:19:51.700 --> 00:19:54.579 run this automatically daily, right? 00:19:54.579 --> 00:19:56.064 Jerik: Yeah. 00:19:56.064 --> 00:19:58.460 John DeGrey: It's awesome. 00:20:02.909 --> 00:20:08.820 Jerik: I call this the KSL deal finder. It's a script. 00:20:08.820 --> 00:20:11.565 When you search for something you want, 00:20:11.565 --> 00:20:14.640 like dirt bikes for example, 00:20:14.640 --> 00:20:17.900 you have your list of search results. 00:20:17.900 --> 00:20:20.485 The top ad is going to be the newest. 00:20:20.485 --> 00:20:23.515 This script gets the top ad and then it keeps 00:20:23.515 --> 00:20:27.445 checking for new ads to show up. 
00:20:27.445 --> 00:20:31.630 If a new ad is posted, my script, 00:20:31.630 --> 00:20:33.175 at least, sends an email 00:20:33.175 --> 00:20:35.860 to myself that there was a new ad. 00:20:35.860 --> 00:20:42.220 It's a way to be the first person to message 00:20:42.220 --> 00:20:44.649 the person who posts what they're selling. 00:20:44.649 --> 00:20:46.330 John DeGrey: Yeah, especially if it's a great deal 00:20:46.330 --> 00:20:47.980 and I love the story. 00:20:47.980 --> 00:20:49.900 If you wouldn't mind telling us 00:20:49.900 --> 00:20:54.470 the great deal you got on one of your motorcycles. 00:20:54.689 --> 00:20:58.975 Jerik: I was in the market for a new dirt bike 00:20:58.975 --> 00:21:04.000 and I had this script running and I was up in 00:21:04.000 --> 00:21:05.500 Heber and it was like 00:21:05.500 --> 00:21:07.870 eight o'clock at night and I was looking at 00:21:07.870 --> 00:21:09.940 a bike and didn't want it and 00:21:09.940 --> 00:21:13.854 then eight o'clock I get an email from the deal finder. 00:21:13.854 --> 00:21:14.844 John DeGrey: Eight o'clock at night? 00:21:14.844 --> 00:21:17.995 Jerik: Yes. Of this incredible deal. 00:21:17.995 --> 00:21:20.170 Like crazy deal. 00:21:20.170 --> 00:21:25.280 But it was down in Grand Junction, Colorado. 00:21:25.649 --> 00:21:29.079 John DeGrey: You're out of where? Where are you at? 00:21:29.079 --> 00:21:31.490 Jerik: Yeah, I'm in Orem. 00:21:32.010 --> 00:21:35.695 I think it was like a three or four hour drive. 00:21:35.695 --> 00:21:38.380 But anyways, I'm like, dang it, now 00:21:38.380 --> 00:21:41.605 I have to drive to Grand Junction, Colorado tonight. 00:21:41.605 --> 00:21:43.510 Yeah, I called the guy up. 
00:21:43.510 --> 00:21:45.310 I'm like, I can drive down 00:21:45.310 --> 00:21:47.545 now but I won't be there till like 00:21:47.545 --> 00:21:50.260 midnight or 1:00 AM and he said that was 00:21:50.260 --> 00:21:53.350 fine and so I start driving down 00:21:53.350 --> 00:21:55.195 there and it was a blizzard 00:21:55.195 --> 00:21:59.740 in Spanish Fork Canyon of all places, too. 00:21:59.740 --> 00:22:04.194 I hate that road, but I almost turned back. 00:22:04.194 --> 00:22:05.320 John DeGrey: Oh, is that right? 00:22:05.320 --> 00:22:06.550 Jerik: Probably like three or four times. 00:22:06.550 --> 00:22:09.230 Yeah. I even pulled over once. 00:22:09.269 --> 00:22:11.034 John DeGrey: Oh, really? 00:22:11.034 --> 00:22:15.235 Jerik: Yeah, I almost just because the deal was so good, 00:22:15.235 --> 00:22:18.009 I thought it had to have been a scam. 00:22:18.009 --> 00:22:20.260 John DeGrey: Oh, yeah, so the whole way 00:22:20.260 --> 00:22:23.030 down you're thinking, please don't be a scam. 00:22:23.309 --> 00:22:28.750 Jerik: But yeah, I ended up picking up the bike. 00:22:28.750 --> 00:22:33.715 Drove back. Yeah, I still have that dirt bike. 00:22:33.715 --> 00:22:35.605 I could still sell it for 00:22:35.605 --> 00:22:37.659 1,000 more than I bought it for. 00:22:37.659 --> 00:22:38.679 John DeGrey: Wow. 00:22:38.679 --> 00:22:45.549 Jerik: It was one was that access data so over six years. 00:22:45.549 --> 00:22:50.005 John DeGrey: Wow, so because of your script, 00:22:50.005 --> 00:22:51.550 you got an alert to 00:22:51.550 --> 00:22:57.790 the ad immediately as soon as it became on it. 00:22:57.790 --> 00:23:00.940 Jerik: Yeah, so I was definitely the first caller. 00:23:00.940 --> 00:23:04.360 But I've had like buddies use this script 00:23:04.360 --> 00:23:08.120 too and find deals on stuff too. 00:23:11.129 --> 00:23:14.694 John DeGrey: Then you still went to work that same day? 
00:23:14.694 --> 00:23:15.820 Jerik: Oh, yeah I got back 00:23:15.820 --> 00:23:18.640 at 4:00 AM and I still went to 00:23:18.640 --> 00:23:24.820 work and there's absolutely no way I could do that now. 00:23:24.820 --> 00:23:28.345 I think just getting over 30, 00:23:28.345 --> 00:23:31.839 I'm retired from doing stuff like that. 00:23:31.839 --> 00:23:34.494 John DeGrey: Until the next great deal comes along. 00:23:34.494 --> 00:23:36.609 Jerik: Oh yeah, probably. 00:23:36.609 --> 00:23:41.980 John DeGrey: By the way, 00:23:41.980 --> 00:23:44.140 for the listeners here, 00:23:44.140 --> 00:23:46.840 actually two things for anybody 00:23:46.840 --> 00:23:49.570 that's out of state not familiar with KSL, 00:23:49.570 --> 00:23:54.775 KSL is a local station here in Utah 00:23:54.775 --> 00:24:01.150 and they have online classified listings. 00:24:01.150 --> 00:24:02.590 It is probably one of the most common 00:24:02.590 --> 00:24:04.795 and popular in the state. 00:24:04.795 --> 00:24:11.600 In fact, other people in other states use it frequently. 00:24:12.840 --> 00:24:18.100 It's not as famous as eBay. 00:24:18.100 --> 00:24:19.930 Well, it may be as an auction too 00:24:19.930 --> 00:24:23.650 but online ads and stuff. 00:24:25.230 --> 00:24:28.360 We're actually going to do one of 00:24:28.360 --> 00:24:31.900 our labs using the KSL deal finder. 00:24:31.900 --> 00:24:35.200 Then just want to do a caution 00:24:35.200 --> 00:24:39.490 because any time that 00:24:39.490 --> 00:24:42.460 this is also referred to as like screen scraping, 00:24:42.460 --> 00:24:44.680 anytime that you're doing that, 00:24:44.680 --> 00:24:48.610 you want to make sure that you don't flag yourself as 00:24:48.610 --> 00:24:50.440 being a bot that could 00:24:50.440 --> 00:24:52.990 potentially get banned from a service. 00:24:52.990 --> 00:24:55.195 Jerik, I think 00:24:55.195 --> 00:24:57.999 didn't you get banned at one time from KSL? 
00:24:57.999 --> 00:25:00.459 Jerik: Yeah, they banned my IP. 00:25:00.459 --> 00:25:03.055 John DeGrey: Yeah, so they block your IP address. 00:25:03.055 --> 00:25:06.080 You can't use it anymore? 00:25:06.420 --> 00:25:09.490 I think we have the timer. 00:25:09.490 --> 00:25:12.685 This goes into a timer setting, 00:25:12.685 --> 00:25:16.960 loops and there's a sleep function 00:25:16.960 --> 00:25:18.520 that we can use to sleep for 00:25:18.520 --> 00:25:21.040 so many seconds and so he's got it set to 00:25:21.040 --> 00:25:24.129 15, which I'm assuming is 15 seconds? 00:25:24.129 --> 00:25:25.490 Jerik: Yeah. 00:25:25.949 --> 00:25:29.665 John DeGrey: That's probably well adequate. 00:25:29.665 --> 00:25:32.230 The thing is, you don't want to flag 00:25:32.230 --> 00:25:35.050 yourself and lower that down to maybe even 00:25:35.050 --> 00:25:41.200 like 10 or five seconds because chances are, 00:25:41.200 --> 00:25:43.300 a lot of these systems including KSL, 00:25:43.300 --> 00:25:45.820 they do have bot detectors and 00:25:45.820 --> 00:25:49.000 you'll definitely get yourself put on the radar. 00:25:49.000 --> 00:25:54.410 That's just a word of caution when using these scripts. 00:25:56.250 --> 00:25:58.779 Take it away, Jerik. 00:25:58.779 --> 00:26:03.670 Jerik: I don't have any examples but there's 00:26:03.670 --> 00:26:05.575 a couple of other projects that 00:26:05.575 --> 00:26:09.025 I did that I want to talk about. 00:26:09.025 --> 00:26:11.425 There's one I called the info wall. 00:26:11.425 --> 00:26:14.170 It was me and my buddies. 00:26:14.170 --> 00:26:20.020 We did like a Bluetooth speaker startup thing that 00:26:20.020 --> 00:26:23.335 we sold on Amazon 00:26:23.335 --> 00:26:26.320 and we wanted a pretty display 00:26:26.320 --> 00:26:29.810 that showed how many sales we got that day. 
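The deal finder's loop, as described earlier (remember the top ad, sleep 15 seconds, check again, notify when the top ad changes), can be sketched roughly as follows. The real script is not shown, so the function name, its parameters, and the way ads are fetched and reported are all assumptions; only the 15 second delay comes from the discussion.

```python
import time


def watch_for_new_ad(get_top_ad, notify, delay_seconds=15,
                     max_checks=None, sleep=time.sleep):
    """Poll get_top_ad() and call notify(ad) whenever the top ad changes.

    get_top_ad and notify are supplied by the caller (hypothetical hooks:
    one would scrape the listing page, the other would send the email).
    max_checks=None means keep checking until the program is stopped.
    """
    last_seen = get_top_ad()
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(delay_seconds)          # wait between requests to stay polite
        current = get_top_ad()
        if current != last_seen:      # a different top ad means a new posting
            notify(current)
            last_seen = current
        checks += 1
    return last_seen


if __name__ == "__main__":
    # Fake data and a no-op sleep so the demo finishes instantly.
    fake_ads = iter(["2015 dirt bike", "2015 dirt bike", "2019 dirt bike"])
    watch_for_new_ad(lambda: next(fake_ads), print,
                     max_checks=2, sleep=lambda seconds: None)
```

Passing the sleep function in as a parameter is only there so the loop can be tried without actually waiting; a real run would leave it at time.sleep and keep the delay at 15 seconds or more.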
00:26:29.820 --> 00:26:32.860 I made a little script that would go 00:26:32.860 --> 00:26:36.520 and look at our Amazon portal 00:26:36.520 --> 00:26:39.010 and just get the sales for that day. 00:26:39.010 --> 00:26:40.780 Now that I think about it, there's probably 00:26:40.780 --> 00:26:43.660 an API I could have looked into to get that data, 00:26:43.660 --> 00:26:47.020 but this is what I'm used to so I did that. 00:26:47.020 --> 00:26:49.000 John DeGrey: It's fun to do your own anyway 00:26:49.000 --> 00:26:50.845 then you have more control of 00:26:50.845 --> 00:26:53.210 getting exactly what you want. 00:26:53.699 --> 00:26:59.440 Jerik: Unfortunately, the info wall 00:26:59.440 --> 00:27:00.790 didn't have very good numbers. 00:27:00.790 --> 00:27:02.140 We didn't sell very many speakers, 00:27:02.140 --> 00:27:05.180 so that was retired. 00:27:05.370 --> 00:27:07.750 Another project, 00:27:07.750 --> 00:27:10.195 this is probably the dumbest project I've done. 00:27:10.195 --> 00:27:13.510 I called it NBA bet buddy. 00:27:13.510 --> 00:27:18.470 I used to bet on NBA games a lot. 00:27:19.260 --> 00:27:24.310 I tried an experiment where I had made 00:27:24.310 --> 00:27:28.840 a script that every morning 00:27:28.840 --> 00:27:32.755 it would go and get the Vegas odds of games 00:27:32.755 --> 00:27:34.525 and then compare it against 00:27:34.525 --> 00:27:37.855 expert picks for games that night. 00:27:37.855 --> 00:27:40.570 Then anytime there was a mismatch between 00:27:40.570 --> 00:27:43.195 the odds and the expert picks, 00:27:43.195 --> 00:27:45.700 it would send those to me 00:27:45.700 --> 00:27:46.990 because those are supposedly supposed to 00:27:46.990 --> 00:27:49.105 be like good ones to bet on. 00:27:49.105 --> 00:27:51.715 End of the story, they were not 00:27:51.715 --> 00:27:55.524 so that project is retired too. 00:27:55.524 --> 00:27:57.684 John DeGrey: Still a lot of risk? 00:27:57.684 --> 00:28:03.980 Jerik: Yes. 
Vegas knows how to make their money still. 00:28:04.559 --> 00:28:07.254 John DeGrey: That's funny. 00:28:07.254 --> 00:28:10.990 Jerik: All right. We're good 00:28:10.990 --> 00:28:14.109 to start the code-along exercises now? 00:28:14.109 --> 00:28:16.705 John DeGrey: Yeah, let's jump into it. 00:28:16.705 --> 00:28:18.339 What's our first code-along. 00:28:18.339 --> 00:28:21.950 Jerik: This one is going to be a simple one. 00:28:23.340 --> 00:28:26.169 I'll show you what this is going to do. 00:28:26.169 --> 00:28:28.704 John DeGrey: We're going to do this ski report. 00:28:28.704 --> 00:28:32.350 Jerik: Yeah. What this script is going to do, 00:28:32.350 --> 00:28:35.080 it's going to go to this web page, 00:28:35.080 --> 00:28:38.290 SkiUtah.com snow report. 00:28:38.290 --> 00:28:40.720 This is a good day to do it. 00:28:40.720 --> 00:28:43.960 This is October 23. 00:28:43.960 --> 00:28:46.389 We just had some snow. 00:28:46.389 --> 00:28:48.310 John DeGrey: Actually got some snow last 24 hours. 00:28:48.310 --> 00:28:51.640 Jerik: We got some data to extract. 00:28:51.640 --> 00:28:53.920 This is what our script, it's just going 00:28:53.920 --> 00:28:55.960 to get for snowbird, 00:28:55.960 --> 00:28:58.450 the 24 hour snowfall, 00:28:58.450 --> 00:29:00.280 48 and then the base 00:29:00.280 --> 00:29:02.155 and since this is the first snowfall, 00:29:02.155 --> 00:29:04.090 these are going to be the same. 00:29:04.090 --> 00:29:06.320 But that is all right. 00:29:12.120 --> 00:29:15.590 Let me make a new file. 00:29:17.899 --> 00:29:21.534 John DeGrey: You're just creating a new game pip file? 00:29:21.534 --> 00:29:30.070 Jerik: Yeah. Then let's see. 00:29:30.070 --> 00:29:32.785 This is where we're going to use 00:29:32.785 --> 00:29:35.440 the pip install syntax 00:29:35.440 --> 00:29:37.855 to install the modules that we need. 
00:29:37.855 --> 00:29:42.620 The first one we need is called selenium, 00:29:44.610 --> 00:29:49.670 so you just type pip install selenium into your terminal. 00:29:50.189 --> 00:29:52.990 John DeGrey: Most people, I don't know 00:29:52.990 --> 00:29:58.780 if anyone's using the IDE you're in, 00:29:58.780 --> 00:30:01.180 most of us are in PyCharm. 00:30:01.180 --> 00:30:04.610 If you open up the Python console, 00:30:05.940 --> 00:30:09.280 which is in PyCharm, 00:30:09.280 --> 00:30:11.005 down at the bottom you'll see 00:30:11.005 --> 00:30:15.250 a terminal and there will be a Python console. 00:30:15.250 --> 00:30:17.575 You can also do it from the terminal, 00:30:17.575 --> 00:30:21.280 but you still have to be in Python. 00:30:21.280 --> 00:30:25.599 But you go and there's a pip install. 00:30:25.599 --> 00:30:30.490 Jerik: Yeah, then this 00:30:30.490 --> 00:30:34.285 will say that I already have it installed. 00:30:34.285 --> 00:30:36.549 But if you don't, it will install it. 00:30:36.549 --> 00:30:39.170 John DeGrey: See how do you spell that? 00:30:41.249 --> 00:30:44.510 Jerik: S-E-L-E-N-I-U-M. 00:30:49.529 --> 00:30:51.574 John DeGrey: Okay. 00:30:51.574 --> 00:30:53.775 John DeCrey: Mine says the same thing. 00:30:53.775 --> 00:30:55.990 Already satisfied. 00:30:56.829 --> 00:31:01.355 Jerik: So Selenium. That's the module that 00:31:01.355 --> 00:31:06.400 you're using to hook to the web page, 00:31:06.400 --> 00:31:09.555 you can navigate web pages with it like click on buttons 00:31:09.555 --> 00:31:12.075 or enter text in search 00:31:12.075 --> 00:31:14.715 bars like it did for the Bing rewarder, 00:31:14.715 --> 00:31:18.360 or you can also get text from it too, 00:31:18.360 --> 00:31:21.420 so that's what we're going to use for this script. 00:31:21.420 --> 00:31:24.690 We do need another module, 00:31:24.690 --> 00:31:27.255 this one's called WebDriverManager, 00:31:27.255 --> 00:31:34.150 so you do pip install webdriver-manager.
00:31:36.049 --> 00:31:39.855 John DeCrey: Web driver and then hyphen, 00:31:39.855 --> 00:31:41.654 is there a space? 00:31:41.654 --> 00:31:43.919 Jerik: No space, just dash manager. 00:31:43.919 --> 00:31:45.820 John DeCrey: Dash manager. 00:31:50.629 --> 00:31:55.170 Jerik: So when I was getting this lesson 00:31:55.170 --> 00:32:01.305 ready yesterday, I noticed that. 00:32:01.305 --> 00:32:03.750 When I was running the scripts, 00:32:03.750 --> 00:32:06.285 I had to add one more package, 00:32:06.285 --> 00:32:09.060 I don't know why, but just in case, 00:32:09.060 --> 00:32:11.805 I might as well install it, 00:32:11.805 --> 00:32:15.015 it's called packaging, 00:32:15.015 --> 00:32:17.830 so you do pip install packaging, 00:32:26.660 --> 00:32:30.580 and now we should be good to go. 00:32:32.269 --> 00:32:36.974 John DeCrey: Install packaging? 00:32:36.974 --> 00:32:41.230 Jerik: Yeah. 00:32:41.689 --> 00:32:45.880 John DeCrey: Requirement already satisfied. Cool. 00:32:47.479 --> 00:32:52.080 Jerik: Then for each of these Selenium scripts, 00:32:52.080 --> 00:32:54.000 I'm just going to copy 00:32:54.000 --> 00:32:57.130 some stuff that gets everything set up. 00:32:59.420 --> 00:33:01.470 I don't know if I should leave it on 00:33:01.470 --> 00:33:03.255 the screen for a while, John. 00:33:03.255 --> 00:33:05.444 I'm good to keep going. 00:33:05.444 --> 00:33:08.040 John DeCrey: Actually, if you wouldn't mind 00:33:08.040 --> 00:33:10.990 just bumping your font up a little bit, 00:33:10.990 --> 00:33:15.300 I've been using 18. 00:33:15.889 --> 00:33:17.610 Jerik: How is that? 00:33:17.610 --> 00:33:20.170 John DeCrey: There you go. That looks great. 00:33:21.740 --> 00:33:23.835 What are the instructions? 00:33:23.835 --> 00:33:24.975 Instructions are to just 00:33:24.975 --> 00:33:27.899 copy what you got there on the screen? 00:33:27.899 --> 00:33:30.779 Jerik: You can pause now and copy. 
00:33:30.779 --> 00:33:33.620 John DeCrey: People watching this, 00:33:33.620 --> 00:33:35.990 you can just pause the video, 00:33:35.990 --> 00:33:38.795 go ahead and enter all that stuff 00:33:38.795 --> 00:33:43.980 in your script and then go ahead, Jerik. 00:33:44.690 --> 00:33:47.294 Do you want to explain? 00:33:47.294 --> 00:33:51.520 Jerik: Yes, so I'll just explain some of these. 00:33:52.100 --> 00:33:57.330 This driver object, it's 00:33:57.330 --> 00:33:59.550 basically the browser that 00:33:59.550 --> 00:34:01.275 we're going to be interacting with, 00:34:01.275 --> 00:34:04.665 so you can pass it a bunch of arguments, 00:34:04.665 --> 00:34:06.660 one of them is headless, 00:34:06.660 --> 00:34:08.250 what headless does is it makes it 00:34:08.250 --> 00:34:10.930 so it doesn't bring up the browser. 00:34:11.030 --> 00:34:13.485 Makes the script run faster, 00:34:13.485 --> 00:34:16.799 because it doesn't have to load images and stuff. 00:34:16.799 --> 00:34:19.515 John DeCrey: It's still running, 00:34:19.515 --> 00:34:20.880 but you don't see it. 00:34:20.880 --> 00:34:24.105 It's running as a back end process, 00:34:24.105 --> 00:34:26.860 so there's no user interface. 00:34:28.009 --> 00:34:30.285 Jerik: When I'm building scripts, 00:34:30.285 --> 00:34:32.310 I'll usually comment this out so I can see 00:34:32.310 --> 00:34:35.380 what's going on as it runs. 00:34:35.540 --> 00:34:41.190 Then this, I just had to add 00:34:41.190 --> 00:34:46.140 because I don't know if it's Chrome or Selenium, 00:34:46.140 --> 00:34:50.580 just logs, or these two lines I 00:34:50.580 --> 00:34:53.040 guess make it so it doesn't spit 00:34:53.040 --> 00:34:58.030 out logs that look scary. 00:35:03.110 --> 00:35:07.815 This is all setting up this driver, 00:35:07.815 --> 00:35:09.975 which is the web driver 00:35:09.975 --> 00:35:12.509 or the browser that we're going to interact with.
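A minimal sketch of the kind of setup block described here, for anyone who could not read it off the screen. The exact lines in the video are not in the transcript, so the flags below (`--headless`, `--log-level=3`, and the `excludeSwitches` option) are assumptions standing in for "headless" and the "two lines" that quiet the logs, and `chrome_arguments` and `build_driver` are hypothetical helper names.

```python
def chrome_arguments(headless=True):
    """Command-line flags passed to Chrome (assumed, not copied from the video)."""
    args = ["--log-level=3"]  # assumption: quiets most Chrome console logging
    if headless:
        # Pass headless=False while building a script to watch the browser work.
        args.append("--headless")
    return args


def build_driver(headless=True):
    """Create the Chrome driver; needs selenium and webdriver-manager installed."""
    # Imported here so this file still loads on a machine without the packages.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    # Assumption: this switch hides the extra startup messages Chrome prints.
    options.add_experimental_option("excludeSwitches", ["enable-logging"])
    # webdriver-manager downloads a matching chromedriver and returns its path.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Calling `driver = build_driver(headless=False)` would open a visible Chrome window; remember to call `driver.quit()` when the script is done.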
00:35:12.509 --> 00:35:15.150 John DeCrey: The driver, that's basically 00:35:15.150 --> 00:35:19.125 what's driving the screen scraping. 00:35:19.125 --> 00:35:22.780 That's what's loading the web browser thing. 00:35:26.269 --> 00:35:30.390 Jerik: To navigate to a web page using Selenium, 00:35:30.390 --> 00:35:33.280 you do driver.get, 00:35:35.870 --> 00:35:39.810 and then in this case 00:35:39.810 --> 00:35:42.700 we're going to go to this web page, 00:35:50.240 --> 00:35:56.025 and then there's better ways to do this, 00:35:56.025 --> 00:35:59.475 but this makes it a lot simpler. 00:35:59.475 --> 00:36:01.725 I'm just going to put a sleep there, 00:36:01.725 --> 00:36:03.345 so this is going to go to that web page 00:36:03.345 --> 00:36:04.845 and sleep for five seconds, 00:36:04.845 --> 00:36:07.545 this sleep is just going to make sure that 00:36:07.545 --> 00:36:11.560 everything's loaded before we try and do stuff with it. 00:36:13.539 --> 00:36:15.650 John DeCrey: Then five means we're 00:36:15.650 --> 00:36:18.000 sleeping it for five seconds. 00:36:21.949 --> 00:36:24.270 Jerik: The first thing we want to get is 00:36:24.270 --> 00:36:27.730 this 24 hour snowfall. 00:36:28.490 --> 00:36:37.770 To do that we can set it as a variable, 00:36:37.770 --> 00:36:41.580 so I'll just do hour 24 equals, 00:36:41.580 --> 00:36:45.840 and then you do everything through the driver. 00:36:45.840 --> 00:36:48.670 driver.find_element.
00:36:53.870 --> 00:36:56.070 All these things on the page, 00:36:56.070 --> 00:36:58.390 they're called elements, 00:36:59.990 --> 00:37:05.430 and then with Selenium, 00:37:05.430 --> 00:37:07.170 there's different ways you can find 00:37:07.170 --> 00:37:09.390 or you can hook to the elements, 00:37:09.390 --> 00:37:12.270 CSS selectors is a really popular one, 00:37:12.270 --> 00:37:16.305 but I've used different ones like XPath, 00:37:16.305 --> 00:37:18.060 I've had to use a few times, 00:37:18.060 --> 00:37:23.200 it just depends on what works or not, 00:37:23.630 --> 00:37:27.370 so we'll do CSS selector. 00:37:30.680 --> 00:37:39.090 Then just a quick overview on CSS selectors, 00:37:39.090 --> 00:37:41.505 so this is the page source, 00:37:41.505 --> 00:37:45.015 over on the right, you can click this little button, 00:37:45.015 --> 00:37:46.649 it's called the element picker. 00:37:46.649 --> 00:37:50.160 John DeCrey: How did you tell [OVERLAPPING]? 00:37:50.160 --> 00:37:50.624 Jerik: Sorry. 00:37:50.624 --> 00:37:52.049 John DeCrey: How you even got there? 00:37:52.049 --> 00:37:53.700 Jerik: So when you have Chrome open, 00:37:53.700 --> 00:37:56.805 you can press F12 and it will bring this up, 00:37:56.805 --> 00:37:58.289 it's called the DevTools. 00:37:58.289 --> 00:38:00.089 John DeCrey: And for Mac users? 00:38:00.089 --> 00:38:03.854 Jerik: I think it's F12 on Mac too, isn't it? 00:38:03.854 --> 00:38:06.090 John DeCrey: There's a menu option too 00:38:06.090 --> 00:38:11.440 from the Chrome menu. 00:38:11.479 --> 00:38:13.320 Jerik: Inspect, I think. 00:38:13.320 --> 00:38:14.890 John DeCrey: Inspect. 00:38:18.679 --> 00:38:23.519 Jerik: Excuse me. So this element picker. 00:38:23.519 --> 00:38:25.950 John DeCrey: So you click that little icon 00:38:25.950 --> 00:38:27.179 in the far left? 00:38:27.179 --> 00:38:27.944 Jerik: Yeah. 00:38:27.944 --> 00:38:30.300 John DeCrey: That's cool.
Then you can hover over 00:38:30.300 --> 00:38:33.539 the web page and it goes right into the elements? 00:38:33.539 --> 00:38:38.310 Jerik: Yeah. For this exercise, 00:38:38.310 --> 00:38:40.110 I'm going to show you a really easy way to 00:38:40.110 --> 00:38:42.645 get the selector for whatever element you want. 00:38:42.645 --> 00:38:45.480 You click on what you 00:38:45.480 --> 00:38:48.660 want and it highlights it here in the source, 00:38:48.660 --> 00:38:53.190 and then you can right click on this and do Copy, 00:38:53.190 --> 00:38:56.565 Copy selector, and that copies the CSS selector. 00:38:56.565 --> 00:39:01.889 You can just paste that in there. 00:39:01.889 --> 00:39:04.240 John DeCrey: That's really cool. 00:39:11.179 --> 00:39:15.735 Jerik: So if we left it like this and ran it, 00:39:15.735 --> 00:39:21.105 this hour 24 variable would be the Selenium element, 00:39:21.105 --> 00:39:24.480 but we need to get the text from that element, 00:39:24.480 --> 00:39:26.680 so to do that, 00:39:28.550 --> 00:39:32.760 we're going to use .get_attribute, 00:39:32.760 --> 00:39:38.740 and then here you do innerHTML. 00:39:58.909 --> 00:40:01.140 John DeCrey: And by the way, for anybody 00:40:01.140 --> 00:40:03.240 that might be having problems 00:40:03.240 --> 00:40:08.715 getting that whole CSS selector stuff, 00:40:08.715 --> 00:40:11.040 I do have it posted in the assignment 00:40:11.040 --> 00:40:14.025 in Canvas that you can copy. 00:40:14.025 --> 00:40:17.130 I encourage you to try to follow 00:40:17.130 --> 00:40:19.740 Jerik getting it live from the website, 00:40:19.740 --> 00:40:21.675 but just as a backup plan, 00:40:21.675 --> 00:40:25.630 you can get it from the Canvas page too. 00:40:27.799 --> 00:40:33.310 Jerik: Next we want the 48-hour total.
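The navigate, sleep, find, and read steps for the 24-hour value can be sketched roughly as below. This is an illustration under assumptions, not the script from the video: `load_page` and `read_inner_html` are hypothetical helper names, the `.strip()` call is an addition, and the URL and selector in the usage comment are made up (copy the real selector from DevTools).

```python
import time

CSS_SELECTOR = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def load_page(driver, url, wait_seconds=5):
    """Go to the page, then sleep so it can finish loading (the simple approach)."""
    driver.get(url)
    time.sleep(wait_seconds)


def read_inner_html(driver, selector):
    """Find one element by CSS selector and return its innerHTML, stripped."""
    element = driver.find_element(CSS_SELECTOR, selector)
    return element.get_attribute("innerHTML").strip()


# Usage with a real Selenium driver (URL and selector are placeholders):
# load_page(driver, "https://example.com/snow-report")
# hour_24 = read_inner_html(driver, "#resort-row .snow-24")
```

Without the `get_attribute("innerHTML")` call, the variable would hold the Selenium element object rather than the snowfall text.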
00:40:36.440 --> 00:40:38.880 This is going to be really similar 00:40:38.880 --> 00:40:41.040 just a different selector, 00:40:41.040 --> 00:40:47.770 so driver.find_element by CSS selector, 00:40:47.930 --> 00:40:52.590 and I'm just going to add the get_attribute, 00:40:52.590 --> 00:40:54.880 innerHTML here, 00:40:57.560 --> 00:41:02.040 then do the same thing to get 00:41:02.040 --> 00:41:06.165 that selector, click on that, 00:41:06.165 --> 00:41:10.840 right click, copy selector, 00:41:14.570 --> 00:41:17.980 and put that in there. 00:41:26.089 --> 00:41:28.590 John DeCrey: Do it once for the 24 hour 00:41:28.590 --> 00:41:30.599 and then twice for the 48. 00:41:30.599 --> 00:41:34.990 Jerik: Then, same thing for the base. 00:42:08.260 --> 00:42:13.265 Now this should have those values there, the 19. 00:42:13.265 --> 00:42:16.580 Now we just want to print it out so we can make 00:42:16.580 --> 00:42:21.600 a report variable and build out a report. 00:42:22.450 --> 00:42:28.260 Let's see, 24-hour snowfall. 00:43:00.640 --> 00:43:04.170 Then the base, 00:43:12.910 --> 00:43:17.600 and then we can print out the report. 00:43:28.109 --> 00:43:33.690 John DeCrey: Just to explain like line 25 00:43:33.690 --> 00:43:40.340 using concatenation and like the slash being there. 00:43:40.340 --> 00:43:42.335 We have talked about that. 00:43:42.335 --> 00:43:45.680 Another kind of cool way we could do it would be using 00:43:45.680 --> 00:43:48.980 the string interpolation and 00:43:48.980 --> 00:43:56.730 then the triple quotes rather to get the literal. 00:43:57.640 --> 00:44:02.160 Anyway, that's all that's doing. 00:44:02.649 --> 00:44:04.940 Jerik: I'm going to try this now. 00:44:04.940 --> 00:44:10.310 I am running it, this 00:44:10.310 --> 00:44:11.660 is going to be a boring one because it's 00:44:11.660 --> 00:44:13.425 just bringing up this web page.
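The alternative John mentions, string interpolation inside triple quotes, might look like the sketch below. The report wording and the `build_report` name are assumptions, since the on-screen text is not in the transcript; the value 19 is the figure mentioned in the video, and using it for all three fields follows the remark that the numbers would be the same after a first snowfall.

```python
def build_report(hour_24, hour_48, base):
    """Build the snow report with an f-string inside triple quotes."""
    return f"""Snowbird snow report
24-hour snowfall: {hour_24}
48-hour snowfall: {hour_48}
Base: {base}"""


# Sample values only; a real run would pass in the scraped text.
print(build_report("19", "19", "19"))
```

The triple-quoted string keeps its line breaks as written, so no concatenation or newline escapes are needed.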
00:44:13.425 --> 00:44:17.140 I noticed even though that box is there, 00:44:17.140 --> 00:44:19.030 the page in the background still 00:44:19.030 --> 00:44:21.564 loaded everything. It doesn't matter. 00:44:21.564 --> 00:44:23.630 John DeCrey: That's cool. 00:44:23.769 --> 00:44:29.225 Jerik: If we did need to click on things for our script, 00:44:29.225 --> 00:44:32.160 I bet we would have had to get rid of that. 00:44:32.440 --> 00:44:36.485 This anticipated opening dates thing. 00:44:36.485 --> 00:44:38.420 That's just things that you have to deal 00:44:38.420 --> 00:44:42.030 with writing scripts like this. 00:44:42.130 --> 00:44:45.169 That is that script all done. 00:44:45.169 --> 00:44:49.370 John DeCrey: I was just 00:44:49.370 --> 00:44:50.810 going to say you can change it to 00:44:50.810 --> 00:44:54.199 headless so we can see the difference there. 00:44:54.199 --> 00:44:56.180 Jerik: Yeah, it's just not going to bring up 00:44:56.180 --> 00:44:57.620 the browser window this time 00:44:57.620 --> 00:45:01.530 and it should just post the data there. 00:45:04.389 --> 00:45:06.900 John DeCrey: There we go. 00:45:14.620 --> 00:45:16.460 This is 00:45:16.460 --> 00:45:18.575 one of the code-along assignments. 00:45:18.575 --> 00:45:20.780 If you get it running and you're good with 00:45:20.780 --> 00:45:23.375 this before we move on, 00:45:23.375 --> 00:45:26.690 you can submit your script in Canvas, 00:45:26.690 --> 00:45:30.390 and then, we can move on to the next one. 00:45:33.009 --> 00:45:41.000 Jerik: The next one. Where is 00:45:41.000 --> 00:45:44.430 the deals page? That was weird.
00:45:45.610 --> 00:45:49.370 This one is going to go to the Amazon Deal of 00:45:49.370 --> 00:45:52.910 the Day page and this script is going to 00:45:52.910 --> 00:46:02.150 get these deal titles along with the link to the deal, 00:46:02.150 --> 00:46:05.150 and it's going to just print them out, 00:46:05.150 --> 00:46:07.070 or we're going to save them 00:46:07.070 --> 00:46:10.110 as a dictionary and then print them out. 00:46:23.320 --> 00:46:26.940 Same thing with this one. 00:46:27.160 --> 00:46:31.175 You can copy it over from the last script. 00:46:31.175 --> 00:46:36.310 This is just stuff to get everything set up and 00:46:36.310 --> 00:46:43.640 then we're going to 00:46:43.640 --> 00:46:47.340 do driver.get to go to that page. 00:46:52.240 --> 00:46:55.580 Instead of using this URL, 00:46:55.580 --> 00:46:58.865 I'm just going to use this. 00:46:58.865 --> 00:47:02.760 This seems like it will work better in the future. 00:47:05.530 --> 00:47:10.350 It's Amazon.com/deal. 00:47:12.970 --> 00:47:15.200 Then same thing. 00:47:15.200 --> 00:47:17.060 We're going to let it sleep for 00:47:17.060 --> 00:47:21.690 five seconds to make sure everything is loaded. 00:47:27.250 --> 00:47:32.480 This script is going to be a little more 00:47:32.480 --> 00:47:37.370 complicated and here's the thought process 00:47:37.370 --> 00:47:40.170 on why we need to do it this way. 00:47:40.630 --> 00:47:44.360 For this one or for the last script, 00:47:44.360 --> 00:47:47.870 we just got a single element from 00:47:47.870 --> 00:47:52.940 Selenium using that find-by-CSS thing. 00:47:52.940 --> 00:47:56.540 But this time we're going to want to get every single one 00:47:56.540 --> 00:47:59.510 of these titles and links. 00:47:59.510 --> 00:48:08.270 I kind of see here in the source how these are set up. 00:48:08.270 --> 00:48:11.690 They're each in their own little div. 00:48:11.690 --> 00:48:17.285 We're going to find a selector that gets all the divs.
00:48:17.285 --> 00:48:24.125 If I click here and then look in the source code, 00:48:24.125 --> 00:48:26.030 you can hover over these, 00:48:26.030 --> 00:48:34.970 and what I'm looking for now is something that is unique, 00:48:34.970 --> 00:48:39.300 that I can hook to with the CSS selector. 00:48:51.490 --> 00:48:53.930 Sorry, there's a lot here. I want to make sure I 00:48:53.930 --> 00:48:56.880 find the right one. 00:49:04.150 --> 00:49:06.665 I'm going to find a selector that finds 00:49:06.665 --> 00:49:08.570 this div that's highlighted, 00:49:08.570 --> 00:49:13.400 and then we can 00:49:13.400 --> 00:49:17.250 drill down and get the title and links from that. 00:49:19.450 --> 00:49:26.340 We'll make a variable called deal_divs, 00:49:27.790 --> 00:49:31.470 it'll be similar to the last one. 00:49:32.890 --> 00:49:35.660 This time though, this is really 00:49:35.660 --> 00:49:38.300 important and in the last classes, 00:49:38.300 --> 00:49:40.055 a few people always miss this. 00:49:40.055 --> 00:49:42.290 You want to make sure this one is plural. 00:49:42.290 --> 00:49:44.224 It's find_elements. 00:49:44.224 --> 00:49:51.035 John DeCrey: Element with an S. Easy to get confused. 00:49:51.035 --> 00:49:54.110 If at the end 00:49:54.110 --> 00:49:56.690 you have some issues and it's not running as expected, 00:49:56.690 --> 00:49:58.130 go back to this line, 00:49:58.130 --> 00:49:59.750 to the line that you have, and make sure 00:49:59.750 --> 00:50:02.400 that it's elements with an S, not element. 00:50:09.639 --> 00:50:13.565 Jerik: Going back to this CSS, 00:50:13.565 --> 00:50:16.200 I can search for this class. 00:50:16.630 --> 00:50:18.845 I'll show you another cool thing. 00:50:18.845 --> 00:50:21.005 When you have Chrome DevTools open, 00:50:21.005 --> 00:50:23.450 you can press Control-F and that brings up 00:50:23.450 --> 00:50:27.650 this search bar where you can search for selectors.
00:50:27.650 --> 00:50:29.555 I use this a lot to 00:50:29.555 --> 00:50:34.910 test selectors to make sure I'm getting the right thing. 00:50:34.910 --> 00:50:37.730 I'm just going to type in the selector I 00:50:37.730 --> 00:50:40.710 am thinking I want to use here. 00:50:45.820 --> 00:50:50.040 These selectors will be posted too, right, John? 00:50:50.079 --> 00:50:53.270 John DeCrey: We can post it. Your mouse cursor is 00:50:53.270 --> 00:50:56.790 kind of covering what you're typing there. 00:50:57.779 --> 00:51:02.365 Jerik: This isn't too important to the lesson. 00:51:02.365 --> 00:51:05.665 Just in case you're interested in 00:51:05.665 --> 00:51:10.070 like my thought process of finding selectors. 00:51:11.740 --> 00:51:17.480 Then it's that deal grid item module. 00:51:17.480 --> 00:51:21.575 This star equals is searching for a wildcard, 00:51:21.575 --> 00:51:25.110 so it's going to do any class containing this. 00:51:31.920 --> 00:51:34.975 We can see that it's 00:51:34.975 --> 00:51:40.670 finding these divs, which is good. 00:51:41.940 --> 00:51:49.075 Then this found them all. 00:51:49.075 --> 00:51:52.630 I'm going to go one more down though. 00:51:52.630 --> 00:51:57.070 To do that, let's do greater than, 00:51:57.070 --> 00:52:03.170 and then div then it's going down one more level. 00:52:03.210 --> 00:52:06.790 The reason I like this search thing to test 00:52:06.790 --> 00:52:10.759 selectors is now I can go and make sure. 00:52:10.759 --> 00:52:14.830 John DeCrey: Narrow it down and confirm 00:52:14.830 --> 00:52:17.090 that you got the right one. 00:52:18.899 --> 00:52:21.385 Jerik: Now when this runs, 00:52:21.385 --> 00:52:27.700 instead of finding one of these, 00:52:27.700 --> 00:52:32.930 this is setting all of these to that variable. 00:52:35.820 --> 00:52:38.530 So that is that. 00:52:38.530 --> 00:52:41.350 Now we'll make an empty list for 00:52:41.350 --> 00:52:51.770 the titles and an empty list for the links too.
00:52:54.000 --> 00:53:01.615 Then we can iterate through 00:53:01.615 --> 00:53:06.190 the list of those div elements and get 00:53:06.190 --> 00:53:10.930 out what we need and append it to these lists. 00:53:10.930 --> 00:53:19.040 So to do that, we'll just do for deal_div in deal_divs. 00:53:21.659 --> 00:53:23.920 John DeCrey: Just pay attention 00:53:23.920 --> 00:53:26.245 to the one that is without the S, 00:53:26.245 --> 00:53:27.880 the one right after the for 00:53:27.880 --> 00:53:29.650 is your variable in 00:53:29.650 --> 00:53:34.250 that for loop, just to not get confused there. 00:53:39.419 --> 00:53:47.275 Jerik: We'll go deal_titles. 00:53:47.275 --> 00:53:54.840 I wanted to make that [inaudible]. 00:53:54.840 --> 00:53:58.210 deal_titles.append. 00:54:03.910 --> 00:54:08.710 This deal_div now when it gets to this point is going to 00:54:08.710 --> 00:54:12.955 be a single div. 00:54:12.955 --> 00:54:14.410 So the first time it iterates through it, 00:54:14.410 --> 00:54:18.020 it's going to be this first one. 00:54:21.360 --> 00:54:24.505 I'm going to expand this because we want 00:54:24.505 --> 00:54:30.470 to get the title. 00:54:43.530 --> 00:54:46.760 Let me do this. 00:54:49.380 --> 00:54:51.640 This text right here. 00:54:51.640 --> 00:54:54.145 These smart DIY and home tools, 00:54:54.145 --> 00:54:56.245 that's what we're trying to get. 00:54:56.245 --> 00:55:01.075 This class looks like something we could hook to. 00:55:01.075 --> 00:55:03.640 We could do another wildcard search so we don't have 00:55:03.640 --> 00:55:07.280 to get this random. 00:55:08.370 --> 00:55:15.445 We'll do deal_div.find_element. 00:55:15.445 --> 00:55:16.690 This is going to be singular 00:55:16.690 --> 00:55:20.829 now because we're finding a single element now. 00:55:20.829 --> 00:55:22.990 John DeCrey: Yes. Make sure it's find_element, 00:55:22.990 --> 00:55:25.314 not find_elements. 00:55:25.314 --> 00:55:26.410 Jerik: Yes.
00:55:26.410 --> 00:55:47.965 Then okay, 00:55:47.965 --> 00:55:56.230 so from what was that first one? 00:55:56.230 --> 00:56:00.115 We found this one and then we drill down to that. 00:56:00.115 --> 00:56:07.250 So from there we need to go down to an a tag, 00:56:12.030 --> 00:56:15.710 so we need to go to this a tag. 00:56:20.520 --> 00:56:28.790 So to do that, we'll do a.a-text-normal. 00:56:31.100 --> 00:56:38.815 That will bring us down into this class. 00:56:38.815 --> 00:56:40.690 Then we need to go down one 00:56:40.690 --> 00:56:45.140 more into this deal content module thing. 00:56:49.710 --> 00:56:52.810 We can do a wildcard search on that just 00:56:52.810 --> 00:56:57.560 to make sure we're getting the right thing. 00:57:18.930 --> 00:57:26.230 Then before that last closing parenthesis do .text, 00:57:26.230 --> 00:57:28.970 this will get the text from it. 00:57:30.600 --> 00:57:34.075 This .text is actually doing the same thing 00:57:34.075 --> 00:57:38.800 as the get_attribute innerHTML. 00:57:38.800 --> 00:57:40.540 For some reason, .text would 00:57:40.540 --> 00:57:42.595 not work on this website though. 00:57:42.595 --> 00:57:44.725 I have no idea why. 00:57:44.725 --> 00:57:48.025 I would do a breakpoint and 00:57:48.025 --> 00:57:51.445 this hour 24 would have the text in there, 00:57:51.445 --> 00:57:52.915 but for some reason when I would 00:57:52.915 --> 00:57:55.165 do hour 24.text it wouldn't work. 00:57:55.165 --> 00:57:57.320 I have no idea why. 00:58:03.600 --> 00:58:06.010 Then now 00:58:06.010 --> 00:58:09.910 we just need to get the link, 00:58:09.910 --> 00:58:14.290 deal_links.append, 00:58:14.290 --> 00:58:16.780 and then using 00:58:16.780 --> 00:58:23.695 that same deal_div find_element, singular. 00:58:23.695 --> 00:58:40.195 Then this one is right here. 00:58:40.195 --> 00:58:42.340 So we just need to drill down once 00:58:42.340 --> 00:58:50.990 from that original div.
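The loop being built here might look something like the sketch below: plural `find_elements` on the driver to get every deal div, then singular `find_element` on each div for the title text and the link's `href` attribute. The three selectors are placeholders, since Amazon's real class names are not spelled out in the transcript and change often, and `collect_deals` is a hypothetical function name.

```python
CSS_SELECTOR = "css selector"  # same string value as selenium's By.CSS_SELECTOR

# Placeholder selectors (assumptions): test your own in the DevTools search bar.
DEAL_DIVS_SELECTOR = '[class*="DealGridItem-module"] > div'
TITLE_SELECTOR = '[class*="DealContent-module"]'
LINK_SELECTOR = "a"


def collect_deals(driver):
    """Return (titles, links) lists with one entry per deal div on the page."""
    deal_titles = []
    deal_links = []
    # Plural find_elements returns a list of every matching element.
    deal_divs = driver.find_elements(CSS_SELECTOR, DEAL_DIVS_SELECTOR)
    for deal_div in deal_divs:
        # Singular find_element searches only inside this one div.
        deal_titles.append(deal_div.find_element(CSS_SELECTOR, TITLE_SELECTOR).text)
        deal_links.append(
            deal_div.find_element(CSS_SELECTOR, LINK_SELECTOR).get_attribute("href")
        )
    return deal_titles, deal_links
```

In `[class*="..."]`, the `*=` operator matches any class attribute containing that substring, and `> div` steps down to a direct child div. Selenium's `.text` returns only the text that is visibly rendered, which may be one reason it came back empty on the ski page, though the transcript does not confirm the cause.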
00:58:52.620 --> 00:59:02.530 Then this one we're 00:59:02.530 --> 00:59:04.750 going to do get_attribute. 00:59:04.750 --> 00:59:14.600 The attribute we want is this href that has the link to it. 00:59:22.050 --> 00:59:31.180 That's that. This is 00:59:31.180 --> 00:59:34.195 going to be a dictionary in the end. 00:59:34.195 --> 00:59:40.610 Title with links equals. 00:59:40.980 --> 00:59:48.955 This dictionary zip thing is just something I found 00:59:48.955 --> 00:59:51.025 where you can take two lists and 00:59:51.025 --> 00:59:54.530 this merges them into a dictionary, 00:59:56.370 --> 01:00:05.830 passing the deal_titles and deal_links, 01:00:05.830 --> 01:00:14.349 and then we can print the title with the links. 01:00:14.349 --> 01:00:16.555 John DeCrey: So ultimately that 01:00:16.555 --> 01:00:19.539 becomes a dictionary title? 01:00:19.539 --> 01:00:21.670 Jerik: Yep. This title with links is 01:00:21.670 --> 01:00:24.680 a dictionary that has both of those. 01:00:39.630 --> 01:00:41.529 There it is. 01:00:41.529 --> 01:00:42.744 John DeCrey: Hey, look at that. 01:00:42.744 --> 01:00:44.810 Jerik: Sweet. 01:00:56.469 --> 01:00:59.135 John DeCrey: See, can I get the price too? 01:00:59.135 --> 01:01:03.420 Was that in there? 01:01:06.549 --> 01:01:09.470 Jerik: They used to have like one price per item, 01:01:09.470 --> 01:01:12.230 but now these 01:01:12.230 --> 01:01:14.044 are going to be a bunch of deals in there. 01:01:14.044 --> 01:01:16.114 John DeCrey: Fifteen percent off. 01:01:16.114 --> 01:01:17.269 Jerik: Yes. 01:01:17.269 --> 01:01:18.960 John DeCrey: Interesting. 01:01:20.139 --> 01:01:23.610 Jerik: Yes, there is that script. 01:01:25.480 --> 01:01:29.700 Should we go ahead and move on to the KSL one? 01:01:30.309 --> 01:01:33.420 John DeCrey: Let me talk about this for a minute. 01:01:36.400 --> 01:01:39.200 There's actually business models that 01:01:39.200 --> 01:01:41.675 are around this thing.
01:01:41.675 --> 01:01:45.410 There's companies that screen 01:01:45.410 --> 01:01:49.340 scrape Amazon's site and show the trends of 01:01:49.340 --> 01:01:56.975 pricing over time because sometimes you can look at 01:01:56.975 --> 01:02:01.340 certain products that might be cheaper 01:02:01.340 --> 01:02:06.665 at some times of the year than others. 01:02:06.665 --> 01:02:10.640 Where you can maybe make a wise, 01:02:10.640 --> 01:02:12.860 maybe a smarter decision on when to buy something, 01:02:12.860 --> 01:02:14.750 especially on the cost. 01:02:14.750 --> 01:02:20.790 But even like business opportunities are there as well. 01:02:20.790 --> 01:02:26.450 I do want to also caution a little bit with ethics 01:02:26.450 --> 01:02:30.530 because knowledge is 01:02:30.530 --> 01:02:34.130 power and with it comes responsibility. 01:02:34.130 --> 01:02:37.085 There is the ethics of things, 01:02:37.085 --> 01:02:40.340 especially screen scraping and 01:02:40.340 --> 01:02:42.920 other people's content and stuff like that, 01:02:42.920 --> 01:02:45.020 so just something else to be aware of 01:02:45.020 --> 01:02:49.895 and consider on certain projects. 01:02:49.895 --> 01:02:53.760 But I think that's all I have to say about that. 01:02:54.310 --> 01:02:57.635 Actually at one time I did have 01:02:57.635 --> 01:03:02.720 some code fragment to append to this to take 01:03:02.720 --> 01:03:06.575 all the findings and put it into a local database like 01:03:06.575 --> 01:03:12.125 SQLite but I think we'll skip that for now. 01:03:12.125 --> 01:03:16.400 But we could easily extend this script and save 01:03:16.400 --> 01:03:21.840 it to a database or even to a CSV file or something. 01:03:22.300 --> 01:03:26.360 That's pretty much true for all the scripts that we're giving. 01:03:26.360 --> 01:03:29.520 We could easily save off. 01:03:30.490 --> 01:03:33.634 You want to move on to the next one? 01:03:33.634 --> 01:03:34.704 Jerik: Yes.
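The "dictionary zip thing" and the CSV idea mentioned for the Amazon script can be sketched together as below. The sample titles and links are made up to stand in for the scraped lists, and `deals_to_csv` is a hypothetical helper rather than the code fragment John refers to.

```python
import csv
import io

# Made-up sample data standing in for the lists the loop fills in.
deal_titles = ["Smart DIY and home tools", "Kitchen gadgets"]
deal_links = ["https://example.com/deal/1", "https://example.com/deal/2"]

# zip pairs the lists item by item; dict turns the pairs into title -> link.
title_with_links = dict(zip(deal_titles, deal_links))
print(title_with_links)


def deals_to_csv(deals):
    """Return the title -> link dictionary as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title", "link"])
    writer.writerows(deals.items())
    return buffer.getvalue()


print(deals_to_csv(title_with_links))
```

Two things worth knowing about this approach: if two deals share the same title, the later one overwrites the earlier one in the dictionary, and `zip` stops at the shorter list if the two lists differ in length. To write a real file instead of a string, the same writer calls work on `open("deals.csv", "w", newline="")`.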
01:03:34.704 --> 01:03:36.200 John DeCrey: Go ahead and if you 01:03:36.200 --> 01:03:37.820 want to submit this one into 01:03:37.820 --> 01:03:43.770 Canvas in the Amazon, I believe it's just called 01:03:45.520 --> 01:03:48.155 Amazon Web Scrape Lab, 01:03:48.155 --> 01:03:51.350 so you can submit that into that assignment in Canvas and 01:03:51.350 --> 01:03:55.350 then we'll go to the next one. 01:03:57.249 --> 01:04:01.770 Jerik: This one is 01:04:09.250 --> 01:04:12.560 the one. In my testing I 01:04:12.560 --> 01:04:15.230 found that furniture gets posted the most often. 01:04:15.230 --> 01:04:19.580 [NOISE] What this is 01:04:19.580 --> 01:04:23.540 going to do is it's going to go to this page, 01:04:23.540 --> 01:04:25.715 like we're searching for furniture to buy, 01:04:25.715 --> 01:04:29.285 it's going to get this first listing 01:04:29.285 --> 01:04:31.850 then it's going to save that off 01:04:31.850 --> 01:04:34.790 and check again every 15 seconds. 01:04:34.790 --> 01:04:38.720 Once this changes, then we know a new ad has been 01:04:38.720 --> 01:04:41.435 posted and it's going to tell us 01:04:41.435 --> 01:04:45.695 the newest ad that was listed. 01:04:45.695 --> 01:04:47.990 One thing to note on this is 01:04:47.990 --> 01:04:53.690 these first three ads are like sponsored spots. 01:04:53.690 --> 01:04:56.180 They were posted days ago, 01:04:56.180 --> 01:05:00.110 so when we do our CSS selectors, 01:05:00.110 --> 01:05:02.435 we're going to find this fourth one. 01:05:02.435 --> 01:05:06.710 But anyway, so do 01:05:06.710 --> 01:05:15.600 the new file and then same thing as the other scripts, 01:05:17.200 --> 01:05:20.340 this stuff is just setup. 01:05:24.070 --> 01:05:26.905 Then [NOISE] I'm going to 01:05:26.905 --> 01:05:30.010 put this settings section at the top. 01:05:30.010 --> 01:05:32.770 These are things that you might want to change.
01:05:32.770 --> 01:05:35.650 If you want to actually use this as a deal finder, 01:05:35.650 --> 01:05:36.970 you can put in 01:05:36.970 --> 01:05:41.360 different search criteria 01:05:41.360 --> 01:05:43.950 or you can have it check more often. 01:05:45.910 --> 01:05:52.265 In my experience, checking every 15 seconds is a lot. 01:05:52.265 --> 01:05:55.190 I think you'll get blacklisted 01:05:55.190 --> 01:05:57.545 pretty quickly if you did that. 01:05:57.545 --> 01:05:59.570 When I was running 01:05:59.570 --> 01:06:03.380 this deal finder for 01:06:03.380 --> 01:06:06.290 its intended purpose, to actually try and find deals, 01:06:06.290 --> 01:06:08.090 I would run it through 01:06:08.090 --> 01:06:12.540 a VPN too, just so I didn't get my IP banned again. 01:06:12.969 --> 01:06:15.110 John DeCrey: What's the recommendation? 01:06:15.110 --> 01:06:16.609 Maybe every minute? 01:06:16.609 --> 01:06:20.495 Jerik: I think I was doing every two minutes, 01:06:20.495 --> 01:06:23.045 and you could even go 01:06:23.045 --> 01:06:25.850 bigger than that depending on what you're searching for. 01:06:25.850 --> 01:06:30.935 If you're searching for something really niche, 01:06:30.935 --> 01:06:34.594 you could even do like once every 30 minutes, probably. 01:06:34.594 --> 01:06:35.434 John DeCrey: Yes. 01:06:35.434 --> 01:06:36.919 Jerik: It just depends on how much. 01:06:36.919 --> 01:06:39.080 John DeCrey: I agree too, every 15 seconds 01:06:39.080 --> 01:06:40.280 is pretty excessive. 01:06:40.280 --> 01:06:46.920 I'd probably avoid anything under 30 seconds. 01:06:47.559 --> 01:06:49.800 Jerik: Yes. 01:06:50.919 --> 01:06:55.020 John DeCrey: Even every minute would be adequate. 01:07:02.740 --> 01:07:06.815 I was just thinking you could use a random generator. 01:07:06.815 --> 01:07:08.029 Random number generator. 01:07:08.029 --> 01:07:13.730 Jerik: Yes. That's actually a thing.
01:07:13.730 --> 01:07:15.200 When I was working at that job where we were 01:07:15.200 --> 01:07:17.555 scraping the funeral home websites, 01:07:17.555 --> 01:07:19.940 there were a bunch of other things we were 01:07:19.940 --> 01:07:22.730 scraping for too, but it was 01:07:22.730 --> 01:07:25.115 always like a cat and mouse game between us 01:07:25.115 --> 01:07:28.865 and the people not wanting us to scrape their stuff. 01:07:28.865 --> 01:07:30.620 But one of the things we did 01:07:30.620 --> 01:07:32.060 was add random timeouts, 01:07:32.060 --> 01:07:38.490 so it seemed more like a human was browsing their site. 01:07:40.479 --> 01:07:43.370 John DeCrey: Jerick and I used to work together 01:07:43.370 --> 01:07:50.945 in digital forensics, 01:07:50.945 --> 01:07:53.690 specifically mobile, and we were 01:07:53.690 --> 01:07:58.340 always looking for and working with hackers that 01:07:58.340 --> 01:08:03.860 had workarounds on mobile security to 01:08:03.860 --> 01:08:09.380 get to the data for criminal investigations and whatnot. 01:08:09.380 --> 01:08:11.090 But same thing, 01:08:11.090 --> 01:08:13.530 it was always cat and mouse. 01:08:13.570 --> 01:08:16.700 The manufacturers would always close 01:08:16.700 --> 01:08:18.020 the security holes and then 01:08:18.020 --> 01:08:19.880 the hackers would find new ones, 01:08:19.880 --> 01:08:22.880 and the forensic industry 01:08:22.880 --> 01:08:25.910 would expose them and put them in their products. 01:08:25.910 --> 01:08:30.800 [LAUGHTER] 01:08:30.800 --> 01:08:32.030 Jerik: Just for this exercise, 01:08:32.030 --> 01:08:33.320 I'm going to put 15 seconds 01:08:33.320 --> 01:08:35.675 here just so it doesn't take too long. 01:08:35.675 --> 01:08:39.410 This furniture section gets posted to all the time. 01:08:39.410 --> 01:08:40.790 I'll be surprised if it doesn't 01:08:40.790 --> 01:08:43.324 find one in the first 15 seconds. 01:08:43.324 --> 01:08:47.630 John DeCrey: Yes.
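The random timeout idea discussed above can be sketched with Python's built-in random module. The function name random_delay and the 60 to 120 second range are assumptions chosen to match the "every minute or two" suggestion in the conversation, not values from the actual script.

```python
import random
import time


def random_delay(min_seconds=60, max_seconds=120):
    """Return a random wait time (in seconds) between the two bounds."""
    return random.uniform(min_seconds, max_seconds)


def polite_sleep(min_seconds=60, max_seconds=120):
    """Sleep for a randomly chosen interval so checks are not evenly spaced."""
    time.sleep(random_delay(min_seconds, max_seconds))
```

In a polling script, calling polite_sleep() in place of a fixed time.sleep(15) would space the checks out irregularly.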
[NOISE] 01:08:47.630 --> 01:08:49.700 Jerik: Now I'm going to make a function that's going to 01:08:49.700 --> 01:08:53.730 get the first listing's info. 01:08:54.010 --> 01:09:02.100 This listing's info, call it 01:09:03.820 --> 01:09:11.000 get_first_listing_info, and this is 01:09:11.000 --> 01:09:13.800 where we'll have our first driver.get. 01:09:14.650 --> 01:09:18.900 This is going to go to that classified link. 01:09:19.840 --> 01:09:22.310 Then like the other scripts, 01:09:22.310 --> 01:09:25.440 we'll let it sleep for five seconds. 01:09:34.540 --> 01:09:38.520 Now we want to get the link: 01:09:39.300 --> 01:09:47.690 driver, find element by CSS selector. 01:09:50.130 --> 01:09:53.390 I think on this one, 01:10:01.930 --> 01:10:06.270 let's see if we want the link. 01:10:07.900 --> 01:10:11.490 This is what we want. 01:10:15.000 --> 01:10:22.100 Let me see if I can get it by this item info title link. 01:10:22.930 --> 01:10:26.810 Except we are going to want it from 01:10:26.810 --> 01:10:30.980 this fourth listing, so we can't do it by that. 01:10:30.980 --> 01:10:42.220 [NOISE] 01:10:42.220 --> 01:10:48.730 Let's see, we've got search results. 01:10:48.730 --> 01:10:53.945 We'll want to go from there to this div, 01:10:53.945 --> 01:11:01.190 to that class. 01:11:01.190 --> 01:11:02.540 I'm looking for where it goes to 01:11:02.540 --> 01:11:12.450 this first ad and then we can drill down from there. 01:11:13.960 --> 01:11:17.990 These will be posted somewhere too, right, John? 01:11:17.990 --> 01:11:20.880 This will be a long one. 01:11:22.119 --> 01:11:25.355 John DeCrey: If you want to post it in the chat, 01:11:25.355 --> 01:11:28.369 then I'll put it in Canvas. 01:11:28.369 --> 01:11:29.940 Jerik: Okay. 01:11:43.300 --> 01:11:44.960 I'll send it to you 01:11:44.960 --> 01:11:47.670 after the recording is done.
01:11:51.160 --> 01:12:01.410 This can be a div to a section to a div, 01:12:02.740 --> 01:12:09.200 to another div, and by the way, 01:12:09.200 --> 01:12:11.550 this is going from 01:12:12.130 --> 01:12:16.950 the search result to the div, section, div. 01:12:19.950 --> 01:12:30.140 Then here I had to do a div 01:12:30.140 --> 01:12:34.235 and then this nth-child, 01:12:34.235 --> 01:12:38.790 this is saying, get the first one. 01:12:47.190 --> 01:12:50.260 That's just saying, get this first div after 01:12:50.260 --> 01:12:52.480 this div, so this is getting 01:12:52.480 --> 01:12:54.760 that, and then we can do 01:12:54.760 --> 01:12:58.255 another nth-child here to get the fourth section, 01:12:58.255 --> 01:13:01.010 which is the one that we want. 01:13:02.700 --> 01:13:10.345 After that, the section, 01:13:10.345 --> 01:13:14.930 nth-child, the fourth one. 01:13:15.690 --> 01:13:22.180 Then from there we go 01:13:22.180 --> 01:13:30.505 to another div, 01:13:30.505 --> 01:13:33.640 the 2dh div, and 01:13:33.640 --> 01:13:42.890 then h2, div, and the link. It's a long one. 01:14:05.010 --> 01:14:09.880 It looks like there's an a tag. 01:14:09.880 --> 01:14:12.580 Then we'll do the same thing we did on 01:14:12.580 --> 01:14:16.664 the Amazon one: we'll get the attribute href. 01:14:16.664 --> 01:14:18.630 John DeCrey: Okay. 01:14:29.709 --> 01:14:32.285 Jerik: I hope I typed that all in right, 01:14:32.285 --> 01:14:37.050 but we'll see. Then the title. 01:14:48.430 --> 01:14:50.885 Then this is going to be 01:14:50.885 --> 01:15:02.000 similar. Oh, 01:15:02.000 --> 01:15:10.710 wait, this is really similar. This is the same thing. 01:15:20.920 --> 01:15:23.210 Because here's the title of this brand 01:15:23.210 --> 01:15:24.725 new velvet ivory couch. 01:15:24.725 --> 01:15:26.075 It's the same selector. 01:15:26.075 --> 01:15:32.550 We're just getting the text from it this time. 01:15:35.769 --> 01:15:39.589 John DeCrey: I don't know if I want a velvet couch.
01:15:39.589 --> 01:15:43.890 Jerik: I know. It looks pretty though. 01:15:43.960 --> 01:15:46.130 That's one of those couches that's 01:15:46.130 --> 01:15:48.244 just a show couch. You don't actually use it. 01:15:48.244 --> 01:15:49.920 John DeCrey: Yeah. 01:15:51.069 --> 01:15:53.090 Jerik: This function will return 01:15:53.090 --> 01:15:54.529 the link and the title. 01:15:54.529 --> 01:15:59.240 John DeCrey: When you return two values like that, 01:15:59.240 --> 01:16:03.109 it's returning them as a tuple, is that right? 01:16:03.109 --> 01:16:07.775 Jerik: Yeah. Now, 01:16:07.775 --> 01:16:12.480 we can call this function and save it to listing info. 01:16:22.180 --> 01:16:29.090 Then we're going to save off the link, 01:16:29.090 --> 01:16:34.249 so we can check it again in 15 seconds. 01:16:34.249 --> 01:16:37.204 John DeCrey: That's our delta. 01:16:37.204 --> 01:16:38.569 Jerik: Yep. 01:16:38.569 --> 01:16:40.489 John DeCrey: Sort of thing. 01:16:40.489 --> 01:16:43.490 Jerik: The first listing link 01:16:43.490 --> 01:16:49.530 temp equals listing info, 01:16:49.530 --> 01:16:55.050 and it's the first thing in there, the link. 01:16:57.700 --> 01:17:08.640 The title is going to be the second thing. 01:17:12.850 --> 01:17:16.310 With scripts like this, too, I like to print 01:17:16.310 --> 01:17:17.450 out what's happening 01:17:17.450 --> 01:17:19.504 if it's going to be running for a while. 01:17:19.504 --> 01:17:21.589 John DeCrey: That is for sure. 01:17:21.589 --> 01:17:23.390 Jerik: We'll print this out 01:17:23.390 --> 01:17:25.115 just to log what's going on. 01:17:25.115 --> 01:17:28.260 First listing title. 01:17:28.810 --> 01:17:32.310 It's going to be the title, 01:17:33.310 --> 01:17:42.470 and then we'll print out the link too. 01:17:42.470 --> 01:17:51.500 I'm going to add 01:17:51.500 --> 01:17:56.310 a counter so we can print out how many times this has run.
01:18:00.760 --> 01:18:06.620 Then now, I'm going to make a while loop and this 01:18:06.620 --> 01:18:12.185 is where it's just going to run until it finds a new ad, 01:18:12.185 --> 01:18:14.870 so it'd be while True, 01:18:14.870 --> 01:18:16.760 which means it's just going to run 01:18:16.760 --> 01:18:19.500 forever until I tell it to break. 01:18:21.670 --> 01:18:29.510 We'll increase the check count and 01:18:29.510 --> 01:18:37.069 then we'll sleep for the time we specified up here. 01:18:37.069 --> 01:18:37.730 John DeCrey: Okay. 01:18:37.730 --> 01:18:43.410 Jerik: We'll print out another log message. 01:18:58.420 --> 01:19:02.630 This will print out that it's checking 01:19:02.630 --> 01:19:08.100 and then we'll get a new listing info. 01:19:09.850 --> 01:19:13.760 That is, we'll get it from the 01:19:13.760 --> 01:19:17.790 get_first_listing_info function we made. 01:19:19.270 --> 01:19:25.790 Then first listing link 01:19:25.790 --> 01:19:34.670 can be listing info[0]. 01:19:34.670 --> 01:19:39.919 John DeCrey: So info[0] was the title, or no? 01:19:39.919 --> 01:19:44.329 Jerik: Zero is the link and then 1 is the title. 01:19:44.329 --> 01:19:46.500 John DeCrey: Okay. 01:19:47.289 --> 01:19:55.220 Jerik: One, and then 01:19:55.220 --> 01:19:56.510 this is where we'll check to see 01:19:56.510 --> 01:19:58.010 if it's different or not. 01:19:58.010 --> 01:20:02.150 If it's different we know it's a new ad, so if 01:20:02.150 --> 01:20:06.935 first listing link temp 01:20:06.935 --> 01:20:12.155 does not equal first listing link, 01:20:12.155 --> 01:20:14.539 this is when we know it's different. 01:20:14.539 --> 01:20:16.654 John DeCrey: That's checking our delta? 01:20:16.654 --> 01:20:18.210 Jerik: Yep. 01:20:20.079 --> 01:20:23.190 John DeCrey: There's the new ad. 01:20:37.949 --> 01:20:42.654 Jerik: Man, this break will stop the program. 01:20:42.654 --> 01:20:44.780 John DeCrey: Okay. 01:20:49.469 --> 01:20:53.084 Jerik: We'll see if this works.
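The loop just described (save the first link, sleep, fetch again, compare, break on a change) can be sketched roughly as follows. This is not the code typed in the recording: the name wait_for_new_listing, the sleep parameter, and the log wording are assumptions, and the scraping function is passed in as an argument so the sketch runs without Selenium. It only assumes that function returns a (link, title) tuple, as discussed above.

```python
import time


def wait_for_new_listing(get_first_listing_info, check_interval=15, sleep=time.sleep):
    """Poll until the first listing's link changes, then return the new ad.

    get_first_listing_info: callable returning a (link, title) tuple.
    check_interval: seconds to wait between checks.
    sleep: injectable sleep function (hypothetical hook, handy for testing).
    """
    # Remember the link we saw first; this is the value we compare against.
    first_listing_link_temp, title = get_first_listing_info()
    print("First listing title:", title)
    print("First listing link:", first_listing_link_temp)

    check_count = 0
    while True:  # runs until we explicitly break
        check_count += 1
        sleep(check_interval)
        print(f"Check #{check_count}: looking for a new ad...")

        first_listing_link, title = get_first_listing_info()
        if first_listing_link_temp != first_listing_link:
            print("New ad found:", title)
            print(first_listing_link)
            break

    return first_listing_link, title, check_count
```

Passing the fetch function in keeps the comparison logic separate from the browser code, so the same loop works whether the listing comes from Selenium or from a stub.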
01:20:53.084 --> 01:20:56.750 John DeCrey: There are different options that you can put in here. 01:20:56.750 --> 01:20:58.280 If you were to actually use this for 01:20:58.280 --> 01:21:05.630 yourself, looking for something, 01:21:05.630 --> 01:21:10.190 then once it found something that matches your criteria, 01:21:10.190 --> 01:21:12.410 you can have it email, 01:21:12.410 --> 01:21:14.080 send an email to you, and 01:21:14.080 --> 01:21:16.724 you could probably even do a text message, couldn't you? 01:21:16.724 --> 01:21:18.610 Jerik: Probably, there's probably 01:21:18.610 --> 01:21:20.965 a module for that, to be honest. 01:21:20.965 --> 01:21:22.990 I did email. 01:21:22.990 --> 01:21:25.330 It was pretty easy to set up email. 01:21:25.330 --> 01:21:29.630 There's an SMTP module that I used. 01:21:30.459 --> 01:21:36.139 John DeCrey: What's the error that we got there on the bottom? 01:21:36.139 --> 01:21:39.365 Jerik: That was that stuff. 01:21:39.365 --> 01:21:40.955 I'm not sure what's logging that, 01:21:40.955 --> 01:21:43.369 whether it's Selenium or Chrome. 01:21:43.369 --> 01:21:45.124 John DeCrey: We can ignore that stuff? 01:21:45.124 --> 01:21:45.679 Jerik: Yeah. 01:21:45.679 --> 01:21:47.144 John DeCrey: Okay. 01:21:47.144 --> 01:21:48.924 Jerik: I need to. 01:21:48.924 --> 01:21:49.734 John DeCrey: Objection. 01:21:49.734 --> 01:21:50.634 Jerik: A copy of that. 01:21:50.634 --> 01:21:53.370 John DeCrey: This is attempt number one. 01:21:53.370 --> 01:21:57.845 In 15 seconds we'll check to see if there's a new ad. 01:21:57.845 --> 01:22:04.399 Number two. Have there been new ads posted? 01:22:04.399 --> 01:22:05.914 Jerik: One just got posted. 01:22:05.914 --> 01:22:07.414 John DeCrey: Sweet. 01:22:07.414 --> 01:22:09.470 Jerik: You can go cop yourself a 01:22:09.470 --> 01:22:11.794 Freeman Style grandfather clock. 01:22:11.794 --> 01:22:14.329 John DeCrey: Let's look at that URL. 01:22:14.329 --> 01:22:17.520 Jerik: Nice.
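As a rough illustration of the email option mentioned above, here is a sketch built on Python's standard smtplib and email.message modules. The recording only says "an SMTP module", so treating it as smtplib is an assumption, and the function names, addresses, host, and port are hypothetical placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_alert(title, link, sender, recipient):
    """Build an email message announcing a newly found ad."""
    msg = EmailMessage()
    msg["Subject"] = f"New ad: {title}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"{title}\n{link}")
    return msg


def send_alert(msg, host, port, username, password):
    """Send the message over SMTP with SSL.

    host, port, and credentials depend on your mail provider; nothing here
    is specific to the setup used in the recording.
    """
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(msg)
```

In a deal finder, the message would be built and sent right where the new ad is detected, just before the loop breaks.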
01:22:23.079 --> 01:22:28.099 John DeCrey: Brand new ad that we just found within seconds. 01:22:28.099 --> 01:22:29.344 Jerik: Yep. 01:22:29.344 --> 01:22:31.860 John DeCrey: Very cool. 01:22:33.629 --> 01:22:35.709 Jerik: That is it. 01:22:35.709 --> 01:22:38.750 John DeCrey: Then we can change it to headless, 01:22:39.240 --> 01:22:44.600 so browsers are not showing while your script is running. 01:22:47.709 --> 01:22:51.150 Jerik: That is all I have. 01:22:52.270 --> 01:22:55.040 I'll turn it back to you. Usually I'd ask for questions, 01:22:55.040 --> 01:22:58.380 but back to you, John. 01:22:58.689 --> 01:23:04.265 John DeCrey: Many thanks. This concludes that lab. 01:23:04.265 --> 01:23:06.290 If you would submit this into 01:23:06.290 --> 01:23:13.580 the Canvas assignment that's called KSL, 01:23:13.580 --> 01:23:16.110 it's just called KSL Lab. 01:23:17.290 --> 01:23:19.820 Many thanks. Jerick, 01:23:19.820 --> 01:23:21.560 I appreciate all the time that you 01:23:21.560 --> 01:23:24.905 have spent on this, and it's always 01:23:24.905 --> 01:23:27.530 enjoyable talking with you and hearing 01:23:27.530 --> 01:23:31.534 your stories, so I really appreciate it. 01:23:31.534 --> 01:23:34.800 Jerik: No problem. Thanks for having me.