Six thousand miles of road,
600 miles of subway track,
400 miles of bike lanes,
and half a mile of tram track,
if you've ever been to Roosevelt Island.
These are the numbers that make up
the infrastructure of NYC,
these are the statistics
of our infrastructure.
They're the kind of numbers
released in reports by city agencies.
For example, the Department
of Transportation will probably tell you
how many miles of road they maintain.
The MTA will boast how many miles
of subway track there are.
But most city agencies give us statistics.
This is from a report this year
from the Taxi & Limousine Commission,
where we've learned that there are
about 13,500 taxis here in NYC.
Pretty interesting, right?
But did you ever think about
where these numbers came from?
Because for these numbers to exist
somebody at the city agency
has to stop and say, "Hmm, here's a number
that somebody might want to know.
Here's a number
that our citizens want to know."
So they go back to their raw data,
they count, they add, they calculate,
and then they put out reports.
And those reports
will have numbers like this.
The problem is, how do they know
all of our questions?
We have lots of questions.
In fact, in some ways there's literally
an infinite number of questions
that we can ask about our city.
So the agencies can never keep up.
So the paradigm isn't exactly working
and I think our policy makers realize that
because in 2012, Mayor Bloomberg
signed into law what he called
the most ambitious and comprehensive
open data legislation in the country.
In a lot of ways he's right.
In the last two years the city's released
1,000 data sets on our open data portal
and it's pretty awesome.
You look at data like this,
and instead of counting
the number of cabs,
we can start to ask different questions.
So I had a question:
When is rush hour in NYC?
It can be pretty bothersome.
When is rush hour exactly?
And I thought to myself,
these cabs aren't just numbers,
these are GPS recorders driving around
in our city's streets recording
each and every ride they take.
There's data there.
And I looked at that data
and I made a plot
of the average speed of taxis in NYC
throughout the day.
You can see that from around midnight
to around 5:18 AM, speed increases,
and at that point, things turn around.
They get slower, slower and slower
until about 8:35 AM
when they end up at 11.5 mph.
The average taxi is going at 11.5 mph
in our city streets,
and it turns out it stays that way
for the entire day.
(Laughter)
So I said to myself, I guess
there's no rush hour in NYC,
there's just a "rush day."
(Laughter)
Makes sense.
This is important
for a couple of reasons.
If you are a transportation planner,
this might be pretty interesting to know.
But if you want to get somewhere quickly,
you now know to set your alarm
for 4:45 AM and you're all set.
New York, right?
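The speed curve described above boils down to one aggregation: compute each trip's average speed, then average those speeds within time-of-day buckets. Here is a minimal Python sketch of that idea, not the code behind the actual plot. It assumes each trip record is a (pickup time, dropoff time, miles) tuple, a hypothetical layout rather than the TLC's real schema, and it groups trips by pickup time in 15-minute slots. The sample trips are made-up toy values.

```python
from collections import defaultdict
from datetime import datetime


def average_speed_by_time_of_day(trips, bucket_minutes=15):
    """Mean per-trip speed (mph) for each time-of-day bucket, keyed by pickup time.

    Assumes `trips` yields (pickup, dropoff, miles) tuples with datetime
    pickup/dropoff values; this layout is hypothetical, not the TLC schema.
    """
    speeds = defaultdict(list)
    for pickup, dropoff, miles in trips:
        hours = (dropoff - pickup).total_seconds() / 3600
        if hours <= 0 or miles <= 0:
            continue  # skip zero-duration or zero-distance records
        minute_of_day = pickup.hour * 60 + pickup.minute
        bucket = minute_of_day - minute_of_day % bucket_minutes
        speeds[bucket].append(miles / hours)
    # Label each bucket by its start time ("HH:MM"), in chronological order.
    return {
        f"{b // 60:02d}:{b % 60:02d}": sum(v) / len(v)
        for b, v in sorted(speeds.items())
    }


if __name__ == "__main__":
    # Made-up toy trips, not real taxi data.
    toy_trips = [
        (datetime(2013, 1, 7, 5, 20), datetime(2013, 1, 7, 5, 35), 5.0),  # 20 mph
        (datetime(2013, 1, 7, 5, 25), datetime(2013, 1, 7, 5, 55), 6.0),  # 12 mph
        (datetime(2013, 1, 7, 8, 40), datetime(2013, 1, 7, 9, 10), 2.0),  # 4 mph
    ]
    print(average_speed_by_time_of_day(toy_trips))
    # prints {'05:15': 16.0, '08:30': 4.0}
```

Averaging per-trip speeds is one choice among several; dividing total miles by total hours within each bucket would weight long trips more heavily and could produce a somewhat different curve.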
But there's a story behind this data:
as it turns out, it wasn't just available.
It actually came from something called
a Freedom of Information Law Request,
or a FOIL Request.
This is a form you can find on
the Taxi & Limousine Commission website.
In order to access this data,
you need to go get this form,
fill it out, and they will notify you.
And a guy named Chris Whong
did exactly that.
Chris went down and they told him,
"Just bring a brand new hard drive
to our office,
leave it here for 5 hours,
we'll copy the data and you take it back."
And that's where this data came from.
Now, Chris is the kind of guy
that wants to make the data public,
so it ended up online for all to use
and that's where this graph came from.
And the fact that it exists is amazing.
These GPS recorders - really cool!
But the fact that we have citizens
walking around with hard drives,
picking up data from city agencies
to make it public...
It was already kind of public;
you could get to it,
but it was "public," not truly public.
And we can do better than that as a city,
we don't need our citizens
walking around with hard drives.
Now, not every dataset
is behind a FOIL request.
Here's a map I made with
the most dangerous intersections in NYC
based on cyclist accidents.
So the red areas are more dangerous.
What it shows, first, is that
the East Side of Manhattan,
especially Lower Manhattan,
has more cycling accidents.
That might make sense,
because there are more cyclists
coming off the bridges over there.
But there are other hotspots worth studying.
There's Williamsburg.
There's Roosevelt Avenue in Queens.
This is exactly the type of data
we need for Vision Zero.
This is exactly what we're looking for.
But there's a story
behind this data as well.
This data didn't just appear.
How many of you guys know this logo?
Yeah, I see some shakes.
Have you ever tried to copy
and paste data out of a PDF
and make sense of it?
I see more shakes.
More of you have tried copying and pasting
than knew the logo. I like that.
What happened is, the data
that you just saw was actually in a PDF.
In fact, hundreds and hundreds
of pages of PDFs put out by our own NYPD,
and in order to access it,
you either have to copy and paste
for hundreds and hundreds of hours,
or you could be John Krauss.
John Krauss is like,
"I'm not going to copy and paste this data,
I'm going to write a program."
It's called the NYPD Crash Data Band-Aid.
It would go to the NYPD's website
and download PDFs.
Every day it would search;
if it found a PDF, it would download it,
and it would run
some PDF-scraping program,
and out would come the text,
and it would go on the Internet,
and people could make maps like that.
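The daily search-and-download loop described above hinges on one step: spotting PDF links that haven't been fetched yet. The Python sketch below is a hypothetical illustration, not the actual NYPD Crash Data Band-Aid code. It parses a page's HTML with the standard library's `html.parser`, collects links ending in `.pdf`, and drops any already downloaded. Fetching the page, downloading the files and extracting their text are deliberately left out, and any URL you test it with (such as `example.org`) is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links that point at .pdf files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.split("?")[0].lower().endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def new_pdf_links(html, base_url, already_downloaded):
    """PDF links found in `html` that are not in `already_downloaded`.

    Duplicate links on the page are collapsed, keeping first-seen order.
    Fetching pages and extracting text from PDFs are out of scope here.
    """
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return [url for url in dict.fromkeys(parser.links) if url not in already_downloaded]
```

Run once a day, a job built this way would download whatever `new_pdf_links` returns, pass each file to a PDF text extractor, and add those URLs to `already_downloaded` so they are skipped next time.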
And the fact that the data is here,
that we can have access to it -
every accident, by the way, is a row
on this table.
You can imagine how many PDFs that is.
The fact that we
have access to that is great.
But let's not release it in PDF form.
Because then we're having our citizens
write PDF scrapers.
It's not the best use
of our citizens' time,
and we, as a city,
can do better than that.
The good news is that
the de Blasio Administration
actually released this data
a few months ago,
so now, we can have access to it.
But there's a lot of data
still entombed in PDFs.
For example, our crime data
is still only available in PDF.
And not just our crime data,
our own city budget.
Our city budget is only
readable right now in PDF form.
And it's not just us
that can't analyze it -
our own legislators,
who vote for the budget,
also only get it in PDF.
So our legislators cannot analyze
the budget that they are voting for.
And I think as a city we can do
a little better than that as well.
Now, there's a lot of data
that's not hidden in PDFs.
This is an example of a map I made.
This one shows the dirtiest waterways in NYC.
How do I measure dirty?
Well, it's kind of a little weird,
but I looked at the level
of fecal coliform,
which is a measurement of fecal matter
in each of our waterways.
The larger the circle,
the dirtier the water.
The large circles are dirty waters,
the smaller circles are cleaner.
What you see is inland waterways.
This is all data that was sampled
by the city over the last 5 years.
And inland waterways are,
in general, dirtier.
That makes sense, right?
And I learned a few things from this.
Number 1: never swim in anything
that ends in "creek" or "canal."
Number 2: I also found
the dirtiest waterway in New York City,
by this measure, one measure:
Coney Island Creek,
which, luckily, is not the Coney Island you swim in;
it's on the other side.
In Coney Island Creek, 94% of samples
taken over the last 5 years
have had fecal levels so high
that it would be against state law
to swim in the water.
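A figure like "94% of samples" is a simple ratio: the samples at a site that exceed a limit, divided by all samples at that site. The Python sketch below assumes each sample is a (site, fecal coliform reading) pair and takes the limit as a parameter. It does not encode the actual state standard, and the site names and readings in the usage are invented for illustration.

```python
from collections import defaultdict


def share_over_limit(samples, limit):
    """Fraction of each site's samples whose reading is strictly above `limit`.

    `samples` yields (site, reading) pairs. `limit` is a placeholder
    parameter; this sketch does not encode the actual state standard.
    """
    totals = defaultdict(int)
    over = defaultdict(int)
    for site, reading in samples:
        totals[site] += 1
        if reading > limit:
            over[site] += 1
    return {site: over[site] / totals[site] for site in totals}


def dirtiest(samples, limit):
    """Sites ranked from the highest to the lowest share of samples over `limit`."""
    shares = share_over_limit(samples, limit)
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented sites and readings, for illustration only.
    toy = [("Creek A", 5000), ("Creek A", 3000), ("Creek A", 2400),
           ("Creek A", 100), ("Harbor B", 50), ("Harbor B", 300)]
    print(dirtiest(toy, limit=1000))
    # prints [('Creek A', 0.75), ('Harbor B', 0.0)]
```

Whether a reading exactly at the limit counts as a violation depends on how the standard is written; this sketch assumes "strictly above."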
And this is not the kind of fact
that you're going to see
boasted in a city report
or on the front page of nyc.gov.
You're not going to see it there,
but the fact that we can
get to that data is awesome.
Once again, it wasn't super easy,
because this data was not
on the open data portal.
If you were to go to the open data portal,
you'd see just a snippet of it,
a year or a few months.
It was actually on the Department
of Environmental Protection's website.
Each one of these links is an Excel sheet,
and each Excel sheet is different.
Every heading is different:
you copy, paste, reorganize.
When you do, you can make maps
and that's great, but once again,
we can do better than that as a city,
we can normalize things.
We're getting there because
there's this website that Socrata makes,
called the Open Data Portal NYC.
This is where 1,100 data sets live
that don't suffer
from the problems I just told you about,
and that number is growing,
and that's great.
You can download data in any format,
be it CSV or PDF or Excel document.
Whatever you want,
you can download the data that way.
The problem is, once you do,
you'll find that each agency
codes their addresses differently.
So one is "street name,"
"intersection street,"
"street," "borough," "address building,"
"building," "address."
So, once again, you're spending time,
even when we have this portal,
you're spending time
normalizing our address field.
I think that's not the best use
of our citizens' time,
we can do better than that as a city.
We can standardize our addresses.
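Standardizing addresses mostly means two things: mapping each dataset's column names onto one shared set of fields, and cleaning the street text so that, say, "E. 86th St." and "e 86th street" come out identical. The Python sketch below shows both steps under stated assumptions: the two per-agency column maps and the short suffix table are hypothetical examples, not any agency's real schema and not an official addressing standard.

```python
import re

# Hypothetical column maps: each source's own field names -> shared field names.
# Real agency datasets use their own, more varied column names.
COLUMN_MAPS = {
    "agency_a": {"address building": "house_number", "street name": "street", "borough": "borough"},
    "agency_b": {"building": "house_number", "street": "street", "borough": "borough"},
}

# Illustrative subset of street-suffix abbreviations; not an official standard.
SUFFIXES = {"ST": "STREET", "AVE": "AVENUE", "AV": "AVENUE",
            "BLVD": "BOULEVARD", "PL": "PLACE", "RD": "ROAD"}


def normalize_street(name):
    """Uppercase, turn punctuation into spaces, collapse whitespace, expand the suffix.

    Only the last word is treated as a suffix, so "St. Marks Pl" keeps its
    leading "ST" and becomes "ST MARKS PLACE".
    """
    words = re.sub(r"[^\w\s]", " ", name.upper()).split()
    if words:
        words[-1] = SUFFIXES.get(words[-1], words[-1])
    return " ".join(words)


def to_standard(record, source):
    """Rename a record's fields via the source's column map, then clean the values."""
    mapping = COLUMN_MAPS[source]
    out = {std: str(record.get(raw, "")).strip() for raw, std in mapping.items()}
    out["street"] = normalize_street(out["street"])
    out["borough"] = out["borough"].upper()
    return out
```

Fed `{"address building": "123", "street name": "E. 86th St.", "borough": "Manhattan"}` from one hypothetical source and `{"building": " 123 ", "street": "e 86th street", "borough": "MANHATTAN"}` from the other, `to_standard` returns the same three fields and values for both, which is what lets them land on the same point on a map.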
If we do, we can get more maps like this.
This is a map of fire hydrants
in New York City.
But not just any fire hydrant.
These are the top 250
grossing fire hydrants
in terms of parking tickets.
(Laughter)
So I learned a few things from this map.
Number 1: just don't park
on the Upper East Side.
Just don't. No matter where you park,
you will get a hydrant ticket.
Number 2: I found the two highest
grossing hydrants in all of New York City.
They are on the Lower East Side,
and they are bringing in over
55,000 dollars a year in parking tickets.
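Ranking the "top-grossing" hydrants is a group-and-sum followed by a sort. The Python sketch below assumes each ticket is a dict with hypothetical `violation`, `location` and `fine` keys, and that tickets have already been matched to a hydrant location (that matching step is not shown). It illustrates the ranking idea only and is not the analysis behind the map; the violation labels and amounts in the usage are made up.

```python
from collections import Counter


def top_grossing_locations(tickets, violation, n=250):
    """Total fines per location for one violation type, highest first, top `n`.

    Assumes each ticket is a dict with hypothetical "violation", "location"
    and "fine" keys, already matched to a hydrant location.
    """
    revenue = Counter()
    for ticket in tickets:
        if ticket["violation"] == violation:
            revenue[ticket["location"]] += ticket["fine"]
    return revenue.most_common(n)


if __name__ == "__main__":
    # Made-up labels and amounts, for illustration only.
    toy = [
        {"violation": "HYDRANT", "location": "Hydrant 1", "fine": 100},
        {"violation": "HYDRANT", "location": "Hydrant 1", "fine": 100},
        {"violation": "HYDRANT", "location": "Hydrant 2", "fine": 100},
        {"violation": "METER", "location": "Hydrant 2", "fine": 50},
    ]
    print(top_grossing_locations(toy, "HYDRANT", n=2))
    # prints [('Hydrant 1', 200), ('Hydrant 2', 100)]
```

`Counter.most_common(n)` does the sorting, so the default `n=250` mirrors the "top 250" cut described in the talk.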
And that seemed a little strange to me
when I noticed it,
so I did a little digging,
and it turns out
what you had is a hydrant
and something called a curb extension,
which is like a seven-foot space
to walk on,
and then a parking spot.
So these cars came along, saw the hydrant,
and said, "It's all the way over there, I'm fine,"
and there was actually a parking spot
painted there beautifully for them.
They would park there, and the NYPD
disagreed with the designation
and would ticket them.
And it wasn't just me
who found a parking ticket.
This is the Google Street View car
driving by, finding the same parking ticket.
So I wrote about this
on my blog, on I Quant NY,
and the DOT responded and they said,
"While the DOT has not received
any complaints about this location,
we will review the roadway markings
and make any appropriate alterations."
I thought to myself, you know,
typical government response,
all right, and moved on with my life.
But then, a few weeks later,
something incredible happened.
They repainted the spot.
And for a second I thought
I saw the future of open data
because think about what happened here.
For five years, this spot
was being ticketed, and it was confusing.
And then a citizen found something,
they told the city and within a few weeks,
the problem was fixed. It's amazing.
A lot of people see open data
as being a watchdog. It's not.
It's about being a partner.
We can empower our citizens to be
better partners for government,
and it's not that hard.
All we need are a few changes.
If you're an agency
seeing your data
being FOILed over and over again,
that's a sign that it should be made public,
so let's release it to the public.
And if you're a government agency
releasing a PDF,
let's pass legislation that requires you
to post it with your underlying data,
because that data
is coming from somewhere.
I don't know where,
but you can release it with the PDF.
And let's adopt and share
some open data standards.
Let's start with our addresses
here in New York City.
Let's just start
normalizing our addresses.
Because New York is a leader in open data.
Despite all this, we're absolutely
a leader in open data,
and if we start normalizing things,
and set an open data standard,
others will follow.
The state will follow,
maybe the federal government,
other countries could follow,
and we're not that far off from a time
where you can write one program
and map information from 100 countries.
It's not science fiction,
we're actually quite close.
And by the way, who are we
empowering with this?
Because it's not just John Krauss,
it's not just Chris Whong.
There are hundreds of meetups
going on in New York City right now,
active meetups.
There are thousands of people
attending these meetups.
These people are going after work
and on weekends,
and they're attending these meetups
to look at open data,
and make our city a better place.
Groups like BetaNYC, who just last week
released something called citygram.nyc
that allows you to subscribe
to 311 complaints
around your own home,
or around your office.
You put in your address,
you get local complaints.
And it's not just the tech community
that are after these things.
It's urban planners like the students
I teach at Pratt.
It's policy advocates, it's everyone,
it's citizens from a diverse
set of backgrounds.
And with some small incremental changes,
we can unlock the passion
and the ability of our citizens
to harness open data
and make our city even better,
whether it's one data set
or one parking spot at a time.
Thank you.
(Applause)