-
John DeCrey: Welcome to week 10.
-
We're going to introduce and
talk a little bit more about
-
third party modules and
-
what that means and
how to install it.
-
Then we've got
several code-alongs
-
coming up that I
think will be fun.
-
I'd like to introduce my guest.
-
This is Jerik.
-
We've worked together
for several years now.
-
When did we first meet, was it 2014?
-
Jerik: Yeah. That sounds right.
-
John DeCrey: Anyway, Jerik lives
-
and breathes Python
and he's done a lot
-
of his own personal
Python scripts
-
that I'll let him talk about
and some of them are funny.
-
Then he's an avid motorcyclist,
-
and he'll talk a little
bit about that too.
-
We'll say, how does the saying go,
-
is it bad decisions
lead to good stories?
-
Jerik: Yes.
-
John DeCrey: Unfortunately
I have a lot of stories.
-
Anyway, go ahead, Jerik.
-
Jerik: Yes. Just to
introduce myself,
-
I've been working in
the software industry
-
for almost 12 years.
-
I've mostly been on the QA side
-
of things, that's testing
and finding bugs.
-
A lot of my job though has
been creating automated tests.
-
A lot of that has been
through Python 2.
-
Our code-along exercises
are going to go
-
through some of the things
I've learned doing that.
-
I have some slides
to go through at
-
the beginning though,
let me just start this.
-
This is third party Python
modules. I'll just read this.
-
Modular programming refers
to the process of breaking
-
a large, unwieldy programming
task into separate,
-
smaller, more manageable
subtasks or modules.
-
Individual modules can then be
-
cobbled together like
building blocks to create
-
a larger application.
Just as an example.
-
You guys are probably
familiar with this when
-
you do these import
statements, like this one for time.
-
This is a module that's
built into Python.
-
It's a way that you
can bring this into
-
your program and utilize
-
everything that's been built
out around this module.
-
It's just a way of not
-
having to redo work that
someone else has already done.
-
John DeCrey: We have done
imports throughout the course.
-
The from is probably
-
a little bit different
than anyone has seen it.
-
Can you talk about
that a little bit?
-
Jerik: Yeah. Let me
go back to that.
-
These from statements are
-
similar to the
import statements,
-
but it's digging more
into the module itself.
-
John DeCrey: So you're importing a
specific item from the module.
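As a small illustration of the two forms being compared here, the sketch below uses only standard-library modules. The choice of `datetime` and the sample date are assumptions made for the example, not something shown in the recording.

```python
# Plain import: the whole module is bound to the name "time",
# and everything in it is reached through that name.
import time

# from-import: pulls one specific item out of a module
# and binds it directly in the current namespace.
from datetime import date

start = time.time()          # accessed as module.function
sample = date(2000, 1, 31)   # accessed directly, no "datetime." prefix

print(type(start).__name__)
print(sample.isoformat())
```

Both forms load the same module; the difference is only in which names end up available in your script.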
-
Jerik: Yeah. Exactly.
The advantages
-
of modular programming are,
-
that it's simpler, it's
maintainable and reusable.
-
When you import
that time module,
-
you don't have to do any
maintenance on that.
-
The people who run the
Python project are going to
-
be adding methods to that
or fixing something.
-
If we changed our time format,
-
they would be the ones going
-
and making sure
that still worked
-
with all the existing
Python programs out there.
-
In Python, that time module
is one that's built in.
-
There's a bunch that
Python has built in.
-
Then there's also a bunch
of third party modules.
-
These are examples of ones
that are built into Python.
-
It's just the Python
module index.
-
Then these are some common
third party modules.
-
I guarantee
-
not every one of them
is listed here
-
because I would guess there's
hundreds of thousands.
-
The thing I love about Python
-
is any problem that
you're looking to solve,
-
someone on the
Internet has probably
-
solved it and created
a module for it.
-
You can just go and
import that module,
-
and get up and
-
running a lot faster than
having to do it yourself.
-
John DeCrey: Chances are high.
-
There's a module
out there for it.
-
Like the saying, there's an app for
-
that; in Python, there's
a module for that.
-
Jerik: Exactly. You guys
-
have probably already
done this too.
-
John DeCrey: We actually
have not used pip yet.
-
I'm trying to think,
-
but I don't think that we have.
-
As far as actually installing.
-
Jerik: To install a
third party module.
-
The syntax is pip install
and then the package name.
-
I'll show an example of that
-
later on when we do the
code along exercises.
-
John DeCrey: We type that
in at the Python console.
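One practical note on where the command goes: pip is a command-line program, so `pip install <package>` is normally typed at a system terminal (in PyCharm, the Terminal tab) rather than at the Python `>>>` prompt. The sketch below is one hedged way to check from Python which packages still need installing. The package list is only an example, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Maps the name used in "pip install <package>" to the name used in
# "import <module>". The two are often the same, but not always.
PACKAGES = {
    "requests": "requests",
    "pytesseract": "pytesseract",
    "webdriver-manager": "webdriver_manager",
}


def missing_packages(packages):
    """Return the pip package names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in packages.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    for pip_name in missing_packages(PACKAGES):
        # Run the printed command in a terminal, not at the >>> prompt.
        print(f"pip install {pip_name}")
```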
-
Jerik: Now I want to just
show some examples of
-
different modules that I've used
-
like throughout work and
personal projects and stuff.
-
This is only three
lines of code,
-
but it's going to do
something really cool.
-
This is a module
called pytesseract.
-
What it is, it's an image,
-
it's an OCR engine.
-
You can get a picture.
-
This is just a
picture of a receipt.
-
You can get all the
text data from it,
-
which is crazy:
-
three lines of code,
and it can do that.
-
The reason I needed this
is I worked at a company
-
where we would scrape
the web for obituaries.
-
Then we sold
the obituaries to
-
a well known Utah based
ancestry website.
-
There's this website.
-
John DeCrey: What company
you're talking about?
-
Jerik: Yeah. [LAUGHTER] There's
-
this obituary website in Germany
-
that for some reason
on their web page
-
instead of pasting the
obituary like as text they had
-
generated like a PNG
or a picture file
-
of the obituary and posted
that on the website.
-
We had our scraper go
and get those images,
-
and it would use pytesseract
to get the text out of
-
those images, to
get the obituaries.
-
But I can show this running.
-
We have this picture.
-
John DeCrey: That's a PNG file.
-
Jerik: All this is going
to do is open that
-
and get the text from
it and print it out.
-
John DeCrey: We're invoking
the image_to_string function,
-
and then passing in the
image to that function.
-
Jerik: Yeah, so this
image_to_string function is
-
something that's built into
this pytesseract module.
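The script on screen is not reproduced in the transcript, so the following is only a sketch of what an open-image, image_to_string, print flow might look like. It assumes the `pytesseract` and `pillow` packages plus the Tesseract program itself are installed. The file name `receipt.png` and the `tidy` helper are placeholders added for illustration.

```python
def image_to_text(image_path):
    """Open an image file and return the text Tesseract recognises in it."""
    # Third-party imports live inside the function so this file still loads
    # on a machine where they are not installed yet.
    from PIL import Image    # pip install pillow
    import pytesseract       # pip install pytesseract

    return pytesseract.image_to_string(Image.open(image_path))


def tidy(text):
    """Drop blank lines and surrounding whitespace from OCR output."""
    return "\n".join(line.strip() for line in text.splitlines() if line.strip())


if __name__ == "__main__":
    # "receipt.png" is a placeholder file name.
    print(tidy(image_to_text("receipt.png")))
```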
-
John DeCrey: Perfect.
-
Jerik: So running.
-
John DeCrey: Well, look at
that. Yeah, that's cool.
-
That was a fuzzy picture too.
-
It wasn't like, crisp, clean.
-
Jerik: Yeah, it actually
does pretty good.
-
I was playing around
with other images but.
-
John DeCrey: Yeah.
It did a good job.
-
Jerik: To me, it was just crazy.
-
Like that huge task of
getting obituary data
-
from picture files
on the Internet
-
was reduced to basically
three lines of code.
-
John DeCrey: Yeah, seriously.
-
Jerik: It did not take me
hours upon hours to do that.
-
[LAUGHTER] [OVERLAPPING]
-
John DeCrey: There's
already, stuff out there,
-
but if you wanted to design
your own, like for example,
-
you take a picture of receipts
and have it parse out,
-
saving it to your own data.
-
There's just lots of.
-
Jerik: Yeah, the sky's
the limit. [LAUGHTER]
-
John DeCrey: Of cool
stuff you can do with that.
-
Actually, if I can
talk a little bit
-
about that for a minute.
-
We did a project at
the U of U Hospital.
-
The hospital uses Epic,
-
which is a huge EMR system,
Electronic Medical Records.
-
Every now and then,
like nurses and
-
the medical personnel that use
-
it will get an error message.
-
Sometimes the error, the screen,
-
it's like it's a message box.
-
They may not be sure what
the recovery is or what
-
the procedure is or process
-
is when you encounter
certain errors.
-
[NOISE] We developed a thing
-
where the medical personnel
can take a picture from
-
the screen of that error and
-
it translates it
to text and does
-
a look-up from a database,
-
finds the error code,
-
the error number, and stuff,
-
and then brings up the
documentation right
-
there on the handheld.
-
Jerik: That's awesome.
-
John DeCrey: Just to give more
information on the error
-
and the best practices moving
forward from that error.
-
Lots of really great
opportunities.
-
Like you said,
-
the sky is the limit there.
-
Jerik: Here's another example.
-
Where is it? This is another
thing from that same job.
-
John DeCrey: Other
funeral homes?
-
Jerik: Yeah. [LAUGHTER] We
wanted to make sure that we had
-
all the funeral
home websites that
-
have the obituary data
in the US and Canada.
-
This isn't the script itself,
-
but the same idea.
-
A script that would
go on Google Maps,
-
search for a funeral home in
-
every zip code and
-
return a list of all
the funeral homes.
-
And then I would
go and add it to
-
a master list
-
that would check for like
duplicates and stuff.
-
John DeCrey: Let's
talk about that.
-
Just those couple lines of
code and what they're doing.
-
We've got, on Line 4,
-
requests.get
and the Google URL,
-
along with an appended
search string,
-
which is how Google
searches work.
-
Line five, however,
looks very cryptic.
-
I know what that is. I
do my best to avoid it.
-
[LAUGHTER]
-
Jerik: I probably knew
what this meant at
-
one point too. But
I don't anymore.
-
John DeCrey: This is called
regular expressions.
-
Whoever invented the syntax
-
for regular [LAUGHTER]
expressions,
-
I don't know, should maybe
-
go find something
else to do instead.
-
[LAUGHTER]
-
Jerik: I agree.
-
John DeCrey: I
suppose if you spend
-
enough time in it,
it becomes memory.
-
But anytime you
have something that
-
you have to have a guide and
a cheat sheet and stuff,
-
and ledgers and
whatever you look up,
-
I would say that it failed.
-
Regular expressions,
-
that's what that stuff
on Line 5 is. Go ahead.
-
Jerik: This is the
third-party module
-
that I'm using in this one.
-
This requests, it
basically just gets
-
the entire HTML
from this request,
-
and then this regex pulls
URLs out of the HTML.
-
John DeCrey: Just
for clarification,
-
that's just shorthand for
-
regular expressions
when you hear
-
regex, same thing.
-
Jerik: Then Google
obviously has some of
-
their own URLs in there,
-
so I filter those out.
-
I'll run this, you can see
-
what [OVERLAPPING]
it prints out.
-
These are all what
-
shows up when I search
funeral homes in my zip code.
-
[LAUGHTER]
-
John DeCrey: That's right.
Then you specified a zip code?
-
Jerik: Yeah. When
I ran this script,
-
like for my job at that company,
-
I think I just passed it in
-
all the zip codes and just
let it run overnight.
-
[LAUGHTER] Then in the morning
had a list of like 20,000.
-
John DeCrey: In Line 4,
-
I just noticed that you're
-
converting your zip
code to a string.
-
Why wouldn't you just wrap
it inside of a string?
-
Is there a particular reason?
-
Jerik: I probably copied
this from my real script and
-
I was probably passing in
-
a variable of [OVERLAPPING]
the zip codes there.
-
I was just lazy
-
when I brought it over
to this example one.
-
[LAUGHTER]
-
John DeCrey: It's pretty cool
that you can do that too.
-
When that returns, it looks like
-
the URLs on Line 5,
-
all the results are
coming into that urls,
-
that looks like a
list. Is that right?
-
Jerik: Yeah.
-
John DeCrey: Okay.
-
Jerik: I can even do this.
-
John DeCrey: Oh, I see.
Then in your for loop,
-
you're going through that list,
-
and then also making sure
-
that anything that's
in the bad URLs is
-
not in the URLs
list. That's cool.
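The on-screen script is not in the transcript, so here is a rough sketch of the same idea: fetch a results page, pull URLs out of the HTML with a regular expression, and skip anything that belongs to the search engine. The pattern, the `BAD_URL_PARTS` list, and the function names are assumptions for illustration, and the regex is deliberately simpler than a production one.

```python
import re

# Hosts that belong to the search engine itself and should be dropped.
# Illustrative list, not the one from the recording.
BAD_URL_PARTS = ["google.", "gstatic.", "schema.org", "w3.org"]

# Deliberately simple pattern: "http://" or "https://" followed by
# characters that are not whitespace, quotes, angle brackets or "&".
URL_PATTERN = re.compile(r"https?://[^\s\"'<>&]+")


def extract_urls(html, bad_parts=BAD_URL_PARTS):
    """Return the unique URLs found in html, skipping any with a bad part."""
    urls = []
    for url in URL_PATTERN.findall(html):
        if any(bad in url for bad in bad_parts):
            continue
        if url not in urls:
            urls.append(url)
    return urls


def search_funeral_homes(zip_code):
    """Fetch a search results page for one zip code and pull out the URLs."""
    import requests  # third-party: pip install requests

    # str() lets the caller pass the zip code as either an int or a string.
    res = requests.get(
        "https://www.google.com/search?q=funeral+homes+" + str(zip_code)
    )
    return extract_urls(res.text)
```

Keeping the parsing in its own function means it can be tried on a saved chunk of HTML without making any web requests.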
-
Jerik: I have a
breakpoint right here.
-
Now, this res
variable will have,
-
this text is going to be
the entire HTML from that,
-
I don't know, there's a way
to copy the whole thing.
-
But anyway, it's like
-
a big jumble of HTML
that gets returned.
-
John DeCrey: Very cool.
-
Jerik: This next one is a
personal project that is dumb,
-
but awesome too because
-
it's been [OVERLAPPING]
paying out.
-
[LAUGHTER] I call this
one the Bing Rewarder.
-
Bing, the search engine,
has this dumb program
-
where it's called Bing Rewards.
-
It gives you these points
-
every day for just
searching on Bing.
-
I think you can do
like 20 searches
-
on desktop and then like 15
-
on mobile Bing and
you get points for it.
-
I've made this script that
just searches Bing every day.
-
John DeCrey: Then you can redeem
the points for goods, right?
-
Jerik: Yeah. You can get
gift cards and stuff.
-
I use it for like
Xbox game pass on PC,
-
which is like a game service.
-
John DeCrey: What does
that typically cost?
-
Jerik: I have no
idea. [LAUGHTER] I
-
bet it's like 20 bucks a
month or 15 bucks a month,
-
probably. Not too expensive.
-
John DeCrey: I have to say
more like 50 bucks a year.
-
But if you're saying
25 [LAUGHTER]
-
a month, that's
pretty expensive.
-
Jerik: This script
has been working
-
for like six years now.
-
[OVERLAPPING] I thought it would
-
have been shut down forever.
-
John DeCrey: You've
never paid for it.
-
Because you've got the
Bing Rewarder.
-
Jerik: Yeah, exactly.
-
[LAUGHTER] I'll talk through
the code a little bit.
-
These functions, actually,
-
I'll run through it, and
then I'll go through it.
-
What this is going to do
is just going to open
-
Bing in desktop mode,
-
search it three times,
-
and then open it in mobile mode,
-
and search it three
times. We'll run that.
-
These are just random words.
-
John DeCrey: From
your generator there.
-
Jerik: I haven't thought
about this before.
-
This is actually
-
a good example to bring up.
-
I have these generated words,
-
but there was also another
third party module
-
that was for just
entering in real
-
words from the dictionary.
-
I was using that for a while,
-
but it had to get it from
-
their online database
or whatever and
-
sometimes it didn't work
and so I went back to this.
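A locally generated word list avoids the dependency on an online word service. The generator used in the recording is not shown, so this is only a guess at the approach, built from the standard library. The function names and length limits are made up for the example.

```python
import random
import string


def random_word(min_len=4, max_len=9, rng=random):
    """Build one lowercase 'word' from random letters; no network needed."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def search_terms(count, rng=random):
    """Return count random words to use as search terms."""
    return [random_word(rng=rng) for _ in range(count)]


if __name__ == "__main__":
    print(search_terms(3))
```

The trade-off is the one described above: the words are nonsense, but nothing here can fail because a remote service is down.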
-
John DeCrey: I see.
-
Jerik: That's just
something you have to
-
deal with with third
party modules.
-
Sometimes they don't
work the best, I guess.
-
John DeCrey: Yeah, we saw it run
-
three times for the
desktop and three times on
-
the mobile and you just
-
run this automatically
daily, right?
-
Jerik: Yeah.
-
John DeCrey: It's awesome.
-
Jerik: I call this the KSL
deal finder. It's a script.
-
When you search for
something you want,
-
like dirt bikes for example,
-
you have your list
of search results.
-
The top ad is going
to be the newest.
-
This script gets the top
ad and then it keeps
-
checking for new ads to show up.
-
If a new ad is
posted, my script,
-
at least, sends an email
-
to myself that
there was a new ad.
-
It's a way to be the
first person to message
-
the person who posts
what they're selling.
-
John DeCrey: Yeah, especially
if it's a great deal
-
and I love the story.
-
If you wouldn't mind telling us
-
the great deal you got on
one of your motorcycles.
-
Jerik: I was in the market
for a new dirt bike
-
and I had this script
running and I was up in
-
Heber and it was like
-
eight o'clock at night
and I was looking at
-
a bike and didn't want it and
-
then at eight o'clock I get an
email from the deal finder.
-
John DeCrey: Eight
o'clock at night?
-
Jerik: Yes. Of this
incredible deal.
-
Like crazy deal.
-
But it was down in Grand
Junction, Colorado.
-
John DeCrey: You're out
of where? Where are you at?
-
Jerik: Yeah, I'm in Orem.
-
I think it was like a
three or four hour drive.
-
But anyways, I'm
like, dang it, now
-
I have to drive to Grand
Junction, Colorado tonight.
-
Yeah, I called the guy up.
-
I'm like, I can drive down
-
now but I won't be
there till like
-
midnight or 01:00 AM
and he said that was
-
fine and so I start driving down
-
there and it was a blizzard
-
in Spanish Fork Canyon
of all places, too.
-
I hate that road, but
I almost turned back.
-
John DeCrey: Oh, is that right?
-
Jerik: Probably like
three or four times?
-
Yeah. I even pulled over once.
-
John DeCrey: Oh, really?
-
Jerik: Yeah, I almost just
because the deal was so good,
-
I thought it had to
have been a scam.
-
John DeCrey: Oh, yeah,
that's true, the whole way
-
down you're thinking,
please don't be a scam.
-
Jerik: But yeah, I ended
up picking up the bike.
-
Drove back. Yeah, I still
have that dirt bike.
-
I could still sell it for
-
1,000 more than I bought it for.
-
John DeCrey: Wow.
-
Jerik: It was when I was at
AccessData, so over six years ago.
-
John DeCrey: Wow, so
because of your script,
-
you got an alert to
-
the ad immediately as
soon as it came online.
-
Jerik: Yeah, so I was
definitely the first caller.
-
But I've had like
buddies use this script
-
too and find deals on stuff too.
-
John DeCrey: Then you still
went to work that same day?
-
Jerik: Oh, yeah I got back
-
at 04:00 AM and I still went to
-
work and there's absolutely
no way I could do that now.
-
I think just getting over 30,
-
I'm retired from doing
stuff like that.
-
John DeCrey: Until the next
great deal comes along.
-
Jerik: Oh yeah, probably.
-
John DeCrey: By the way,
-
for the listeners here,
-
actually two things for anybody
-
that's out of state
not familiar with KSL,
-
that KSL is a local
station here in Utah
-
and they have online
classified listing.
-
It is probably one
of the most common
-
and popular in the state.
-
In fact, other people in other
states use it frequently.
-
It's not as famous as eBay.
-
Well, it may be
as an auction too
-
but online ads and stuff.
-
We're actually
going to do one of
-
our labs using the
KSL deal finder.
-
Then just want to do a caution
-
because any time that
-
this is also referred to
as like screen scraping,
-
anytime that you're doing that,
-
you want to make sure that
you don't flag yourself as
-
being a bot that could
-
potentially get banned
from a service.
-
Jerik, I think
-
didn't you get banned
at one time from KSL?
-
Jerik: Yeah, they banned my IP.
-
John DeCrey: Yeah, so they
block your IP address.
-
You can't use it anymore?
-
I think we have the timer.
-
This goes into a timer setting,
-
loops and there's
a sleep function
-
that we can use to sleep for
-
so many seconds and
so he's got it set to
-
15 which is I'm assuming
is that 15 seconds?
-
Jerik: Yeah.
-
John DeCrey: That's
probably well adequate.
-
The thing is, you
don't want to flag
-
yourself and lower that
down to maybe even
-
like 10 or five seconds
because chances are,
-
a lot of these systems
including KSL,
-
they do have bot detectors and
-
you'll definitely get
yourself put on the radar.
-
That's just a word of caution
when using these scripts.
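The deal finder's source is not shown, so the following is only a sketch of the check-then-sleep loop described above. `fetch_top_ad` and `notify` are hypothetical callables supplied by the caller (one returns an identifier for the newest listing, the other sends the alert), and the 15-second default mirrors the interval mentioned in the discussion.

```python
import time


def watch_top_ad(fetch_top_ad, notify, interval_seconds=15,
                 max_checks=None, sleep=time.sleep):
    """Poll fetch_top_ad() and call notify(ad) whenever the newest ad changes.

    max_checks=None means run until interrupted; a number stops the loop
    after that many checks, which is handy when trying the function out.
    """
    last_seen = fetch_top_ad()        # remember what is on top right now
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval_seconds)       # pause between requests to stay polite
        current = fetch_top_ad()
        if current != last_seen:
            notify(current)           # for example, send yourself an email
            last_seen = current
        checks += 1
```

Passing `sleep` in as a parameter is a design choice for testing: the loop can be exercised instantly with a stand-in, while real runs use `time.sleep`.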
-
Take it away, Jerik.
-
Jerik: I don't have any
examples but there's
-
a couple of other projects that
-
I did that I want to talk about.
-
There's one I called
the info wall.
-
It was me and my buddies.
-
We did like a Bluetooth
speaker startup thing that
-
we sold on Amazon
-
and we wanted a pretty display
-
that showed how many
sales we got that day.
-
I made a little
script that would go
-
and look at our Amazon portal
-
and just get the
sales for that day.
-
Now that I think about
it, there's probably
-
an API I could have looked
into to get that data,
-
but this is what I'm
used to so I did that.
-
John DeCrey: It's fun
to do your own anyway
-
then you have more control of
-
getting exactly what you want.
-
Jerik: Unfortunately,
the info wall
-
didn't have very good numbers.
-
We didn't sell very
many speakers,
-
so that was retired.
-
Another project,
-
this is probably the
dumbest project I've done.
-
I called it NBA bet buddy.
-
I used to bet on
NBA games a lot.
-
I tried an experiment
where I had made
-
a script that every morning
-
it would go and get the
Vegas odds of games
-
and then compare it against
-
expert picks for
games that night.
-
Then anytime there was
a mismatch between
-
the odds and the expert picks,
-
it would send those to me
-
because those are
supposedly supposed to
-
be like good ones to bet on.
-
End of the story, they were not
-
so that project is retired too.
-
John DeCrey: Still
a lot of risk?
-
Jerik: Yes. Vegas knows how
to make their money still.
-
John DeCrey: That's funny.
-
Jerik: All right. We're good
-
to start the code-along
exercises now?
-
John DeCrey: Yeah,
let's jump into it.
-
What's our first code-along?
-
Jerik: This one is going
to be a simple one.
-
I'll show you what
this is going to do.
-
John DeCrey: We're going
to do this ski report.
-
Jerik: Yeah. What this
script is going to do,
-
it's going to go
to this web page,
-
SkiUtah.com snow report.
-
This is a good day to do it.
-
This is October 23.
-
We just had some snow.
-
John DeCrey: Actually got
some snow last 24 hours.
-
Jerik: We got some
data to extract.
-
This is what our
script, it's just going
-
to get for Snowbird,
-
the 24 hour snowfall,
-
48 and then the base
-
and since this is
the first snowfall,
-
these are going to be the same.
-
But that is all right.
-
Let me make a new file.
-
John DeCrey: You're just
creating a new .py file?
-
Jerik: Yeah. Then let's see.
-
This is where we're going to use
-
the pip install syntax
-
to install the
modules that we need.
-
The first one we need
is called selenium,
-
so you just type pip install
selenium into your terminal.
-
John DeCrey: Most
people, I don't know
-
if anyone's using
the IDE you're in,
-
most of us are in PyCharm.
-
If you open up the
Python console,
-
which is in PyCharm,
-
down at the bottom you'll see
-
a terminal and there
will be Python console.
-
You can also do it
from the terminal,
-
but you still have
to be in Python.
-
But you go and there's
a pip install.
-
Jerik: Yeah, then this
-
will say that I already
have it installed.
-
But if you don't,
it will install it.
-
John DeCrey: See how
do you spell that?
-
Jerik: S-E-L-E-N-I-U-M.
-
John DeCrey: Okay.
-
John DeCrey: Mine
says the same thing.
-
Already satisfied.
-
Jerik: So Selenium.
That's the module that
-
you're using to hook
to the web page. You
-
can navigate web pages with
it, like click on buttons
-
or enter text in search
-
bars like it did for
the Bing Rewarder,
-
or you can also get
text from it too,
-
so that's what we're going
to use for this script.
-
We do need another module,
-
this one's called
webdriver-manager,
-
so you do pip install
webdriver-manager.
-
John DeCrey: Webdriver
and then hyphen,
-
is there a space?
-
Jerik: No space,
just dash manager.
-
John DeCrey: Dash manager.
-
Jerik: So when I was
getting this lesson
-
ready yesterday, I noticed that.
-
When I was running the scripts,
-
I had to add one more package,
-
I don't know why,
but just in case,
-
I might as well install it,
-
it's called packaging,
-
so you do pip install packaging,
-
and now we should be good to go.
-
John DeCrey: Install packaging?
-
Jerik: Yeah.
-
John DeCrey: Requirement
already satisfied. Cool.
-
Jerik: Then for each of
these Selenium scripts,
-
I'm just going to copy
-
some stuff that gets
everything set up.
-
I don't know if I
should leave it on
-
the screen for a while, John.
-
I'm good to keep going.
-
John DeCrey: Actually,
if you wouldn't mind
-
just bumping your
font up a little bit,
-
I've been using 18.
-
Jerik: How is that?
-
John DeCrey: There you
go. That looks great.
-
What are the instructions?
-
Instructions are to just
-
copy what you got
there on the screen?
-
Jerik: You can
pause now and copy.
-
John DeCrey: People
watching this,
-
you can just pause the video,
-
go ahead and enter
all that stuff
-
in your script and
then go ahead, Jerik.
-
Do you want to explain?
-
Jerik: Yes, so I'll just
explain some of these.
-
This driver object it's
-
basically the browser that
-
we're going to be
interacting with,
-
so you can pass it in
a bunch of arguments,
-
one of them is headless,
-
what headless does
is it makes it
-
so it doesn't bring
up the browser.
-
It makes the script run faster,
-
because it doesn't have to
load images and stuff.
-
John DeCrey: It's
still running,
-
but you don't see it.
-
It's running as a
back end process,
-
so there's no user interface.
-
Jerik: When I'm
building scripts,
-
I'll usually comment
this out so I can see
-
what's going on as it runs.
-
Then this, I just had to add
-
because I don't know if
it's Chrome or Selenium,
-
just logs, or these two lines I
-
guess make it so it doesn't spit
-
out logs that look scary.
-
This is all setting
up this driver,
-
which is the web driver
-
or the browser that we're
going to interact with.
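The setup block being copied is not captured in the transcript. The sketch below shows one common way to build such a driver with Selenium 4 and webdriver-manager, and it is an assumption that the recording used these exact calls. In particular, the two log-quieting settings here (`--log-level=3` and excluding `enable-logging`) are a guess at what "these two lines" were, and `chrome_arguments` and `make_driver` are hypothetical helper names.

```python
def chrome_arguments(headless=True):
    """Return the command-line switches to hand to Chrome."""
    args = ["--log-level=3"]          # ask Chrome to keep its own logging quiet
    if headless:
        args.append("--headless")     # run without opening a browser window
    return args


def make_driver(headless=True):
    """Start Chrome through Selenium and return the driver object."""
    # Third-party imports: pip install selenium webdriver-manager
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    # Suppresses the extra console logging Chrome prints on start-up.
    options.add_experimental_option("excludeSwitches", ["enable-logging"])

    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Calling `make_driver(headless=False)` while building a script gives the same effect as commenting out the headless line: the browser window appears so you can watch each step.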
-
John DeCrey: The driver
is basically
-
what's driving the
screen scraping.
-
That's what's loading
the web browser thing.
-
Jerik: To navigate to a
web page using Selenium,
-
you do driver.get,
-
and then in this case
-
we're going to go
to this web page,
-
and then there's better
ways to do this,
-
but this makes it a lot simpler.
-
I'm just going to
put a sleep there,
-
so this is going to
go to that web page
-
and sleep for five seconds,
-
this sleep is just
going to make sure that
-
everything's loaded before
we try and do stuff with it.
-
John DeCrey: Then
five means we're
-
sleeping it for five seconds.
-
Jerik: The first thing
we want to get is
-
this 24 hour snowfall.
-
To do that we can set
it as a variable,
-
so I'll just do hour_24 equals,
-
and then you do everything
through the driver.
-
driver.find_element.
-
All these things on the page,
-
they're called elements,
-
and then with Selenium,
-
there's different
ways you can find
-
or you can hook to the elements,
-
CSS selectors is a
really popular one,
-
but I've used different
ones like XPath,
-
I've had to use a few times,
-
it just depends on
what works or not,
-
so we'll do CSS selector.
-
Then just a quick overview
on CSS selectors,
-
so this is the page source,
-
over on the right, you can
click this little button,
-
it's called the element picker.
-
John DeCrey: How did
you tell [OVERLAPPING]?
-
Jerik: Sorry.
-
John DeCrey: How
you even got there?
-
Jerik: So when you
have Chrome open,
-
you can press F12 and
it'll bring this up,
-
it's called the DevTools.
-
John DeCrey: And for mac users?
-
Jerik: I think it's F12
on Mac too, isn't it?
-
John DeCrey: There's
a menu option too
-
from the Chrome menu.
-
Jerik: Inspect, I think.
-
John DeCrey: Inspect.
-
Jerik: Excuse me. So
this element picker.
-
John DeCrey: So you
click that little icon
-
in the far left?
-
Jerik: Yeah.
-
John DeCrey: That's cool.
Then you can hover over
-
the web page and it goes
right into the elements?
-
Jerik: Yeah. For this exercise,
-
I'm going to show you
a really easy way to
-
get the selector for
whatever element you want.
-
You click on what you
-
want and it highlights
it here in the source,
-
and then you can right
click on this and do copy,
-
copy selector, and that
copies the CSS selector.
-
You can just paste
that in there.
-
John DeCrey: That's really cool.
-
Jerik: So if we left it
like this and ran it,
-
this hour_24 variable would
be the Selenium element,
-
but we need to get the
text from that element,
-
so to do that,
-
we're going to use .get_attribute,
-
and then here you do innerHTML.
-
John DeCrey: And by
the way, for anybody
-
that might be having problems
-
getting that whole
CSS selector stuff,
-
I do have it posted
in the assignment
-
in Canvas that you can copy.
-
I encourage you to try to follow
-
Jerik getting it live
from the website,
-
but just as a backup plan,
-
you can get it from
the Canvas page too.
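Putting the steps so far together (go to the page, sleep, find one element by CSS selector, read its innerHTML), a sketch might look like the following. It assumes the Selenium 4 style call `find_element(by, selector)`. The helper names are made up, and the URL and selector are left to the caller because the real ones come from the page itself via Copy selector.

```python
import time


def element_html(driver, by, selector):
    """Find one element and return its inner HTML with whitespace trimmed."""
    element = driver.find_element(by, selector)   # singular: first match only
    return element.get_attribute("innerHTML").strip()


def read_snow_value(driver, by, url, selector, wait_seconds=5,
                    sleep=time.sleep):
    """Open url, give the page time to load, then read one value from it."""
    driver.get(url)
    # Crude but simple; Selenium's WebDriverWait is the sturdier option.
    sleep(wait_seconds)
    return element_html(driver, by, selector)
```

With a real driver you would import `By` from `selenium.webdriver.common.by` and pass `By.CSS_SELECTOR` as the `by` argument, along with the selector string pasted from DevTools.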
-
Jerik: Next we want
48 hour total.
-
This is going to
be really similar
-
just a different selector,
-
so driver.find_element
by CSS selector,
-
and I'm just going to
add the get_attribute,
-
innerHTML here,
-
then do the same thing to get
-
that selector, click on that,
-
right click, copy selector,
-
and put that in there.
-
John DeCrey: Do it
once for the 24 hour
-
and then twice for the 48.
-
Jerik: Then, same
thing for the base.
-
Now this should have those
values there, the 19.
-
Now we just want to print
it out so we can make
-
a report variable and
build out a report.
-
Let's see, 24-hour snowfall.
-
Then the base,
-
and then we can print
out the report.
-
John DeCrey: Just to
explain, Line 25 is
-
using concatenation, and
the backslash being there.
-
We have talked about that.
-
Another kind of cool way we
could do it would be using
-
string interpolation and
-
then the triple quotes,
rather, to get the literal.
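To make the two options concrete, here is a small sketch that builds the same report both ways. The label wording and function names are assumptions, since the on-screen report string is not in the transcript.

```python
def report_concat(hour_24, hour_48, base):
    """Build the report with + and explicit newline characters."""
    return ("24-hour snowfall: " + hour_24 + "\n"
            + "48-hour snowfall: " + hour_48 + "\n"
            + "Base: " + base)


def report_fstring(hour_24, hour_48, base):
    """Build the same report with a triple-quoted f-string."""
    return f"""24-hour snowfall: {hour_24}
48-hour snowfall: {hour_48}
Base: {base}"""


if __name__ == "__main__":
    print(report_fstring("19", "19", "19"))
```

The triple-quoted version keeps the line breaks exactly as typed, so no newline escapes or continuation characters are needed.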
-
Anyway, that's all that's doing.
-
Jerik: I'm going
to try this now.
-
I am running it, this
-
is going to be a boring
one because it's
-
just bringing up this web page.
-
I noticed even though
that box is there,
-
the page in the
background has still
-
loaded everything.
It doesn't matter.
-
John DeCrey: That's cool.
-
Jerik: If we did need to click
on things for our script,
-
I bet we would have had
to get rid of that.
-
This anticipated
opening dates thing.
-
That's just things
that you have to deal
-
with writing scripts like this.
-
That is that script all done.
-
John DeCrey: I was just
-
going to say you
can change it to
-
headless so we can see
the difference there.
-
Jerik: Yeah, it's just
not going to bring up
-
the browser window this time
-
and it should just
post the data there.
-
John DeCrey: There we go.
-
This is
-
one of the code
along assignments.
-
If you get it running
and you're good with
-
this before we move on,
-
you can submit your
script in Canvas,
-
and then, we can move
on to the next one.
-
Jerik: The next one. Where is
-
the deals page? That was weird.
-
This one is going to go
to the Amazon Deal of
-
the Day page and this
script is going to
-
get these deal titles along
with the link to the deal,
-
and it's going to
just print them out,
-
or we're going to save them
-
as a dictionary and
then print them out.
-
Same thing with this one.
-
You can copy it over
from the last script.
-
This is just stuff to get
everything set up and
-
then we're going to
-
do driver.get to
go to that page.
-
Instead of using this URL,
-
I'm just going to use this.
-
This seems like it will
work in the future better.
-
It's Amazon.com/deal.
-
Then same thing.
-
We're going to let it sleep for
-
five seconds to make sure
everything is loaded.
-
This script is going
to be a little more
-
complicated and here's
the thought process
-
to why we need to
do it this way.
-
For this one or for
the last script,
-
we just got a
single element from
-
Selenium using that
find-by-CSS thing.
-
But this we're going to want
to get every single one
-
of these titles and links.
-
I kind of see here in the
source how these are set up.
-
They're each in their
own little div.
-
We're going to find a selector
that gets all the divs.
-
If I click here and then
look in the source code,
-
you can hover over these,
-
and what I'm looking for now
is something that is unique,
-
that I can hook to
with the CSS selector.
-
Sorry, there's a lot here.
I want to make sure I
-
find the right one.
-
I'm going to find a
selector that finds
-
this div that's highlighted,
-
and then we can
-
drill down and get the
title and links from that.
-
We'll make a variable
called deal_divs,
-
similar to the last one.
-
This time though is really
-
important and in
the last classes,
-
a few people always miss this.
-
You want to make sure
this one is plural.
-
It's find_elements.
-
John DeCrey: Elements with
an S. Easy to get confused.
-
If at the end you
-
have some issues and it's
not running as expected,
-
go back to this line,
-
to the line that you
have, and make sure
-
that it's elements
with an S, not element.
-
Jerik: Going back to this CSS,
-
I can search for this class.
-
I'll show you
another cool thing.
-
When you have Chrome
dev tools open,
-
you can press Control
F that brings up
-
this search bar that you
can search for selectors.
-
I use this a lot to
-
test selectors to make sure
I'm getting the right thing.
-
I'm just going to type
in the selector I
-
am thinking I want to use here.
-
These selectors will be
posted too, right, John?
-
John DeCrey: We can post
it. Your mouse cursor is
-
kind of covering what
you're typing there.
-
Jerik: This isn't too
important to the lessons.
-
Just in case you're
interested in
-
like my thought process
of finding selectors.
-
Then it's that deal
grid item module.
-
This star equals is
searching for a wild card,
-
so it's going to do any
class containing this.
-
We can see that it's
-
finding these divs,
which is good.
-
Then this found them all.
-
I'm going to go one
more down though.
-
To do that, let's do greater than,
-
and then div, then it's
going down one more level.
-
The reason I like this
search thing to test
-
selectors is now I
can go and make sure.
-
John DeCrey: Narrow
it down and confirm
-
that you got the right one.
-
Jerik: Now when this runs,
-
instead of finding one of these,
-
this is setting all of
these to that variable.
-
So that is that.
-
Now we'll make an empty list for
-
the titles and an empty
list for the links too.
-
Then we can iterate through
-
the list of those
div elements and get
-
out what we need and
append it to these lists.
-
So to do that, we'll just do
for deal_div in deal_divs.
-
John DeCrey: Just pay attention
-
to the one that is without the S.
-
The one right after the for
-
is your loop variable
inside
-
that loop, just so you don't
get confused there.
-
Jerik: We'll go deal_titles.
-
I wanted to make
that [inaudible].
-
deal_titles.append.
-
This deal_div now when it gets
to this point is going to
-
be a single div.
-
So the first time it
iterates through it,
-
it's going to be this first one.
-
I'm going to expand
this because we want
-
to get the title.
-
Let me do this.
-
This text right here.
-
These smart DIY and home tools,
-
that's what we're trying to get.
-
This class looks like
something we could hook to.
-
We could do another wildcard
search so we don't have
-
to match this random part.
-
We'll do deal_div.find_element.
-
This is going to be singular
-
now because we're finding
a single element now.
-
John DeCrey: Yes.
Make sure it's
-
find_element, not find_elements.
-
Jerik: Yes.
-
Then okay,
-
so from what was that first one?
-
We found this one and then
we drill down to that.
-
So from there we need
to go down to an a tag,
-
so we need to go to this a tag.
-
So to do that, we'll
do a.a-text-normal.
-
That will bring us
down into this class.
-
Then we need to go down one
-
more into this deal
content module thing.
-
We can do a wildcard
search on that just
-
to make sure we're
getting the right thing.
-
Then before that last
closing parenthesis, do .text;
-
this will get the text from it.
-
This .text is actually
doing roughly the same thing
-
as the get_attribute innerHTML.
-
For some reason, text would
-
not work on this website though.
-
I have no idea why.
-
I would set a breakpoint and
-
this hour 24 would have
the text in there,
-
but for some reason when I would
-
do hour 24 .text,
it wouldn't work.
-
I have no idea why.
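-
One way to cope with a site where `.text` comes back empty is to fall back to the `innerHTML` attribute. This is only a sketch: `element_text` is a hypothetical helper, and it assumes the element exposes Selenium's `.text` property and `get_attribute` method.

```python
def element_text(element):
    """Return the element's visible text, or its innerHTML if .text is empty."""
    text = element.text
    if text:
        return text.strip()
    # get_attribute("innerHTML") returns the markup inside the element,
    # so the result may contain tags as well as plain text.
    inner = element.get_attribute("innerHTML")
    return (inner or "").strip()
```

The two are not identical: `.text` gives only what is rendered on screen, which is one possible reason it can be empty when the markup is not.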
-
Now
-
we just need to
get the link:
-
deal_links.append,
-
and then using
-
that same deal_div
find_element, singular.
-
Then this one is right here.
-
So we just need to
drill down once
-
from that original div.
-
Then for this one we're
-
going to do get_attribute.
-
The attribute we want is this
href that has the link in it.
-
That's that. This is
-
going to be a
dictionary in the end.
-
title_with_links equals.
-
This dict zip thing is
just something I found
-
where you can take two lists and
-
this merges them
into a dictionary,
-
passing the deal_titles
and deal_links,
-
and then we can print the
title with the links.
-
John DeCrey: So ultimately that
-
becomes a dictionary title?
-
Jerik: Yep. This
title_with_links is
-
a dictionary that
has both of those.
-
There it is.
-
John DeCrey: Hey, look at that.
-
Jerik: Sweet.
-
John DeCrey: See, can
I get the price too?
-
Was that in there?
-
Jerik: They used to have
one item per price,
-
but now
-
there are going to be a bunch
of deals in there.
-
John DeCrey: Fifteen
percent off.
-
Jerik: Yes.
-
John DeCrey: Interesting.
-
Jerik: Yes, there
is that script.
-
Should we go ahead and
move on to the KSL one?
-
John DeCrey: Let me talk
about this for a minute.
-
There are actually
business models that
-
are built around this thing.
-
There are companies
that screen
-
scrape Amazon's site and
show the trends of
-
pricing over time, because
sometimes you can look at
-
certain products that
might be cheaper
-
at some times of
the year than others.
-
That way you can maybe make a wiser,
-
maybe a smarter decision
on when to buy something,
-
especially on the cost.
-
But even like business
opportunities are there as well.
-
I do also want to caution
a little bit about ethics,
-
because knowledge is
-
power, and with it comes
responsibility.
-
There is the ethics of things,
-
especially screen scraping
-
other people's content
and stuff like that,
-
so it's just something
else to be aware of
-
and consider on
certain projects.
-
But I think that's all I
have to say about that.
-
Actually, at one time I did have
-
a code fragment to
append to this to take
-
all the findings and put them
into a local database like
-
SQLite, but I think
we'll skip that for now.
-
But we could easily extend
this script and save
-
it to a database or even to
a CSV file or something.
-
That goes for pretty much all the
scripts that we're giving.
-
We could easily save the results off.
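-
This is not the code fragment mentioned above, which was skipped in the session. It is a small sketch of how the dictionary of titles and links might be saved with the standard `csv` and `sqlite3` modules. The function names, the `deals` table, and its columns are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def save_to_csv(title_with_links, file_obj):
    """Write one header row, then one row per title and link."""
    writer = csv.writer(file_obj)
    writer.writerow(["title", "link"])
    for title, link in title_with_links.items():
        writer.writerow([title, link])


def save_to_sqlite(title_with_links, connection):
    """Insert each title and link into a 'deals' table, creating it if needed."""
    connection.execute("CREATE TABLE IF NOT EXISTS deals (title TEXT, link TEXT)")
    connection.executemany(
        "INSERT INTO deals (title, link) VALUES (?, ?)",
        list(title_with_links.items()),
    )
    connection.commit()


# Demonstration with an in-memory buffer and an in-memory database,
# so nothing is written to disk.
sample = {"Smart DIY and home tools": "https://example.com/deal/1"}
buffer = io.StringIO()
save_to_csv(sample, buffer)
database = sqlite3.connect(":memory:")
save_to_sqlite(sample, database)
```

To write real files, pass `open("deals.csv", "w", newline="")` and `sqlite3.connect("deals.db")` instead of the in-memory objects.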
-
You want to move on
to the next one?
-
Jerik: Yes.
-
John DeCrey: Go ahead, if you
-
want, and submit this one into
-
Canvas. I
believe it's just called
-
Amazon Web Scrape Lab,
-
so you can submit that into
that assignment in Canvas and
-
then we'll go to the next one.
-
Jerik: For this one,
-
in my testing I
-
found that furniture gets
posted the most often.
-
[NOISE] What this is
-
going to do is it's going
to go to this page,
-
like we're searching
for furniture to buy,
-
it's going to get
this first listing
-
then it's going to save that off
-
and check again
every 15 seconds.
-
Once this changes, then
we know a new ad has been
-
posted and it's going to tell us
-
the newest ad that was listed.
-
One thing to note on this is
-
these first three ads are
like sponsored spots.
-
They were posted days ago,
-
so when we do our CSS selectors,
-
we're going to find
this fourth one.
-
But anyway, let's make
-
a new file, and then, same
as with the other scripts,
-
this stuff is just setup.
-
Then [NOISE] I'm going to
-
put this settings
section at the top.
-
These are things that you
might want to change.
-
If you want to actually
use this as a deal finder,
-
you can put in
-
different search criteria,
-
or you can have it
check more often.
-
In my experience, checking
every 15 seconds is a lot.
-
I think you'd get blacklisted
-
pretty quickly if you did that.
-
When I was running
-
this deal finder for
-
its intended purpose to
actually try and find deals,
-
I would run it through
-
a VPN too, just so I didn't
get my IP banned again.
-
John DeCrey: What's
the recommendation?
-
Maybe every minute?
-
Jerik: I think I was
doing every two minutes
-
and you could even go
-
bigger than that depending on
what you're searching for.
-
If you're searching for
something really niche,
-
you could even do like once
every 30 minutes probably?
-
John DeCrey: Yes.
-
Jerik: It just
depends on how much.
-
John DeCrey: I agree
too, every 15 seconds
-
is pretty excessive.
-
I'd probably avoid anything
under 30 seconds.
-
Jerik: Yes.
-
John DeCrey: Even every
minute would be adequate.
-
I was just thinking you could
use a random generator.
-
Random number generator.
-
Jerik: Yes. That's
actually a thing.
-
When I was working at
that job where we were
-
scraping the funeral
home websites,
-
we did like a bunch of
other things we were
-
scraping for too but it was
-
always like a cat and
mouse game between us
-
and the people not wanting
us to scrape their stuff.
-
But that was one of
the things we did
-
was add random timeouts,
-
so it seemed more like a human
was browsing their site.
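-
The random timeout idea can be sketched with the standard `random` and `time` modules. The helper name `random_wait` and the default range of 60 to 120 seconds are assumptions chosen to match the "every minute or two" suggestion, not values from the original scripts.

```python
import random
import time


def random_wait(min_seconds=60.0, max_seconds=120.0, sleep=time.sleep):
    """Sleep for a random duration in the given range and return that duration."""
    # random.uniform picks a float between the two bounds, so the gap
    # between checks is different every time.
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

Inside a polling loop, calling `random_wait()` would take the place of a fixed `time.sleep(...)`. The `sleep` parameter exists only so the helper can be tried without actually waiting.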
-
John DeCrey: Jerick and
I used to work together
-
in digital forensics and
-
specifically mobile but we were
-
always looking for and
working with hackers that
-
have workarounds on
mobile security to
-
get to the data for criminal
investigations and whatnot.
-
But same thing,
-
it was always a cat and mouse.
-
The manufacturers
would always close
-
the security holes and then
-
the hackers would find new ones,
-
and the forensic industry
-
would expose and put
it in their product.
-
[LAUGHTER]
-
Jerik: Just for this exercise,
-
I'm going to put 15 seconds
-
here just so it
doesn't take too long.
-
This furniture section gets
posted to all the time.
-
I'll be surprised if it doesn't
-
find one in the
first 15 seconds.
-
John DeCrey: Yes. [NOISE]
-
Jerik: Now I'm going to make
a function that's going to
-
get the first listing's info.
-
I'll call it
-
get_first_listing_info,
and this is
-
where we'll have our
first driver.get.
-
This is going to go to
that classifieds link.
-
Then, like the other scripts,
-
we'll let it sleep
for five seconds.
-
Now we want to get the link:
-
driver.find_element, by CSS.
-
I think on this one,
-
let's see if we want the link.
-
This is what we want.
-
Let me see if I can get it by
this item info title link.
-
Except we are going
to want it from
-
this fourth listing so
we can't do it by that.
-
[NOISE]
-
Let's see, we got
search results.
-
We'll want to go from
there to this Div,
-
to that class.
-
I'm looking for when it goes to
-
this first ad and then we
can drill down from there.
-
These will be posted
somewhere too, right John?
-
This will be a long one.
-
John DeCrey: If you want
to post in the chat
-
then I'll put it in the Canvas.
-
Jerik: Okay.
-
I'll send it to you
-
after the recording is done.
-
This can be a div to
a section to a div,
-
to another div. By the way,
-
this is going from
-
the search result to
the div, section, div.
-
Then here I had to do a div
-
and then this nth-child;
-
this is saying,
get the first one.
-
That's just saying, get
this first div after
-
this div, so this is getting
-
that, and then we can do
-
another nth-child here to
get the fourth section,
-
which is the one that we want.
-
After that, section
-
nth-child, the fourth one.
-
Then from there we go
-
to another div,
-
the 2dh div, and
-
then h2, div, and the
link. It's a long one.
-
It looks like it's an a tag.
-
Then we'll do the
same thing we did on
-
the Amazon one: we'll get
the href attribute.
-
John DeCrey: Okay.
-
Jerik: I hope I typed
that all in right,
-
but we'll see, then the title.
-
Then this is going to be
-
similar. Oh,
-
wait, this is really similar.
This is the same thing.
-
Because here's the
title of this brand
-
new velvet ivory couch.
-
It's the same selector.
-
We're just getting the
text from it this time.
-
John DeCrey: I don't know
if I want a velvet couch.
-
Jerik: I know. It
looks pretty though.
-
That's one of those
couches that it's
-
just a show couch. You
don't actually use it.
-
John DeCrey: Yeah.
-
Jerik: This function will return
-
the link and the title.
-
John DeCrey: When you do
two returns like that,
-
it's returning as a
tuple, is that right?
-
Jerik: Yeah. Now,
-
we can call this function
and save it to listing_info.
-
Then we're going to
save off the link,
-
so we can check it
again in 15 seconds.
-
John DeCrey: That's our Delta.
-
Jerik: Yep.
-
John DeCrey: Sort of thing.
-
Jerik: first_listing_link_temp
-
equals listing_info[0];
-
it's the first thing
in there, the link.
-
The title is going to
be the second thing.
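-
A rough sketch of a function shaped like the one described, assuming Selenium 4's `find_element(by, value)` call. The URL and the selector are hypothetical placeholders, since the long selector from the session is not reproduced here, and the `sleep` parameter is added only so the function can be tried without waiting.

```python
import time

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"

# Hypothetical placeholders, not the real search URL or selector.
CLASSIFIEDS_URL = "https://example.com/classifieds/search?keyword=furniture"
# section:nth-child(4) matches a section that is the fourth child of its
# parent, which skips the three sponsored spots ahead of it.
FIRST_AD_LINK = "section:nth-child(4) h2 a"


def get_first_listing_info(driver, sleep=time.sleep):
    """Load the search page and return (link, title) for the first real ad."""
    driver.get(CLASSIFIEDS_URL)
    sleep(5)  # give the page time to load, as in the other scripts
    element = driver.find_element(CSS, FIRST_AD_LINK)
    link = element.get_attribute("href")
    title = element.text
    # Returning two comma-separated values packs them into one tuple.
    return link, title
```

The result can be indexed, as in `listing_info[0]` for the link and `listing_info[1]` for the title, or unpacked in one step with `link, title = get_first_listing_info(driver)`.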
-
With scripts like this too,
I like to print
-
out what's happening
-
if it's going to be
running for a while.
-
John DeCrey: That is for sure.
-
Jerik: We'll print this out
-
just to log what's going on.
-
First listing title:
-
it's going to be the title,
-
and then we'll print
out the link too.
-
I'm going to add
-
a counter so we can print out
how many times this has run.
-
Now I'm going to
make a while loop, and this
-
is where it's just going to
run until it finds a new ad.
-
So it'll be while True;
-
that means it's
just going to run
-
forever until I
tell it to break.
-
We'll increase the
check count and
-
then we'll sleep for the
time we specified up here.
-
John DeCrey: Okay.
-
Jerik: We'll print out
another log message.
-
This will print out
that it's checking,
-
and then we'll get
a new listing_info.
-
That is, we'll get
it from the
-
get_first_listing_info
function we made.
-
Then first_listing_link
-
will be listing_info[0].
-
John DeCrey: So info[0] was
the title, or no?
-
Jerik: Zero is the link
and then 1 is the title.
-
John DeCrey: Okay.
-
Jerik: One, and then
-
this is where we'll check to see
-
if it's different or not.
-
If it's different, we
know it's a new ad, so if
-
first_listing_link_temp
-
does not equal
first_listing_link,
-
this is when we know
it's different.
-
John DeCrey: That's
checking our delta?
-
Jerik: Yep.
-
John DeCrey: There's the new ad.
-
Jerik: Then this break
will stop the program.
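-
The polling loop walked through above could look roughly like this. It is a sketch, not the script from the session: `wait_for_new_listing` is a hypothetical wrapper, and it takes the lookup function and the sleep function as parameters so it can be run without a browser.

```python
import time

CHECK_INTERVAL_SECONDS = 15  # short for the exercise; use a longer gap in practice


def wait_for_new_listing(get_info, interval=CHECK_INTERVAL_SECONDS, sleep=time.sleep):
    """Poll get_info() until the first listing's link changes.

    get_info must return a (link, title) tuple. Returns the new link,
    the new title, and the number of checks that were made.
    """
    first_listing_link_temp, first_title = get_info()
    print(f"First listing title: {first_title}")
    print(f"First listing link: {first_listing_link_temp}")

    check_count = 0
    while True:
        check_count += 1
        sleep(interval)
        print(f"Check {check_count}: looking for a new ad...")
        first_listing_link, title = get_info()
        # A different link in the first slot means a new ad was posted.
        if first_listing_link_temp != first_listing_link:
            print(f"New ad: {title} ({first_listing_link})")
            break  # leave the loop; the function then returns

    return first_listing_link, title, check_count
```

Strictly speaking, `break` only exits the loop. The script ends at that point because nothing else follows the loop.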
-
John DeCrey: Okay.
-
Jerik: We'll see if this works.
-
John DeCrey: There's different
options that you can put in here.
-
If you were to
actually use this for
-
your own self and
looking for something,
-
and once it found something
that matches your criteria,
-
you can have it
-
send an email to you, and
-
you could probably even do a text
message, couldn't you?
-
Jerik: Probably,
there's probably
-
a module for that, to be honest.
-
I did email.
-
It was pretty easy
to set up the email.
-
There's an SMTP module that I used.
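-
The recording does not show which SMTP module was used. One plausible option is the standard library's `smtplib`, paired with `email.message.EmailMessage`. The function names, addresses, and server details below are all hypothetical.

```python
import smtplib
from email.message import EmailMessage


def build_alert(title, link, sender, recipient):
    """Build an email message announcing a new ad."""
    message = EmailMessage()
    message["Subject"] = f"New ad: {title}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(f"{title}\n{link}\n")
    return message


def send_alert(message, host, port, username, password):
    """Send the message through an SMTP server that supports STARTTLS.

    Not called in this sketch, because it needs a real server and
    real credentials.
    """
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(message)


alert = build_alert(
    "Grandfather clock",
    "https://example.com/ad/1",
    "me@example.com",
    "me@example.com",
)
```

In the deal finder, the message would be built and sent at the point where the new ad is detected, just before the loop breaks.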
-
John DeCrey: What's the error
that we got there on the bottom?
-
Jerik: That's just that logging stuff.
-
I'm not sure
-
whether it's Selenium or Chrome.
-
John DeCrey: We can
ignore that stuff?
-
Jerik: Yeah.
-
John DeCrey: Okay.
-
Jerik: I need to.
-
John DeCrey: Objection.
-
Jerik: A copy of that.
-
John DeCrey: This is
attempt number one.
-
In 15 seconds we'll check
to see if there's a new ad.
-
Number two. Have there been
new ads posted?
-
Jerik: One just got posted.
-
John DeCrey: Sweet.
-
Jerik: You can go
cop yourself a
-
Freeman Style grandfather clock.
-
John DeCrey: Let's
look at that URL.
-
Jerik: Nice.
-
John DeCrey: Brand new ad that
we just found within seconds.
-
Jerik: Yep.
-
John DeCrey: Very cool.
-
Jerik: That is it.
-
John DeCrey: Then we can
change it to headless,
-
so browsers are not showing
while your script is running.
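-
Switching to headless mode is usually a matter of passing one extra argument to Chrome. This sketch assumes Selenium 4 and a recent Chrome, where the flag is `--headless=new` (older Chrome versions use plain `--headless`). The helper names and the window size are illustrative choices.

```python
def chrome_arguments(headless=True):
    """Return the command-line arguments to pass to Chrome."""
    arguments = ["--window-size=1280,1024"]
    if headless:
        # No browser window is shown while the script runs.
        arguments.append("--headless=new")
    return arguments


def make_driver(headless=True):
    """Create a Chrome driver, optionally headless."""
    # Imported here so chrome_arguments works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments(headless):
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

While developing selectors it helps to keep `headless=False` so the page is visible, then switch it on once the script works.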
-
Jerik: That is all I have.
-
I'll turn it back to you.
Usually, ask questions,
-
but back to you John.
-
John DeCrey: Many thanks.
This concludes that lab.
-
If you would, submit this into
-
the Canvas assignment;
-
it's just called KSL Lab.
-
Many thanks, Jerick.
-
I appreciate all
the time that you
-
have spent on this. It's always
-
enjoyable talking
with you and hearing
-
your stories, so I
really appreciate it.
-
Jerik: No problem.
Thanks for having me.