Machine Learning in Ecological Science and Environmental Management


It’s a great pleasure today for me to introduce
Tom Dietterich. I’ve known Tom for quite some time. He… for which I brought a prop. He
actually was an author on my first machine learning textbook. And I got to meet him when
I was a graduate student at Berkeley. And I guess this was not long after Tom started
his first faculty position after getting his Ph.D. at Stanford. He came down to visit Berkeley
and I enjoyed getting to meet him then. And even then as you see with this textbook he
was playing a very important role in shaping the machine learning community. And he’s
gone on since then to continue to play a role in shaping the machine learning community.
So he is an AAAI Fellow and an ACM Fellow. He’s been program chair for AAAI,
program chair for NIPS. He’s been very active in the International Machine Learning Society,
and really a mentor in the field to a lot of young people. And Tom is one of the few
people to this day who really sees the entire field of machine learning. And as the fields
have become increasingly specialized, it’s rare to find people who can appreciate the
whole field and take it all in. And that’s one of the many great things that Tom
is known for. And today he’s going to be telling us about a very important application
of machine learning, which is to computational ecology and environmental management. Thank you very much Ron. So, the work I’m
going to be describing today is obviously very collaborative and interdisciplinary, and
the collaborators in particular that I want to mention are my graduate student Ethan Dereszynski,
two post-docs, Rebecca Hutchinson and Dan Sheldon, and then colleagues Weng-Keen Wong, who’s in
Computer Science, a machine learning person, and Claire Montgomery, who’s a forest ecologist.
And then several folks at the Cornell Lab of Ornithology. So, if we look at the earth’s ecosystems
or the biosphere, it’s a very complex system. And I think we can agree that in many ways
we have not managed it in a sustainable way. And so I thought I would start the talk by
asking about why is that so, and is there anything that computer science can do to help?
And I think, I mean everybody had their own views of why this is so. But I think maybe
there are three reasons. First of all we don’t understand the system very well. So, it’s
very hard to manage a system when it’s behaving very unpredictably. And there was a very thought
provoking article by a group of authors, first author Doak, in 2008, where they ask
the question: are ecological surprises inevitable? Or are the dynamics of ecosystems
so complex that we will never really be able to predict the behavior of these systems reliably?
And to support this thesis they go through, I don’t know, fifteen or twenty
different examples of situations where either something completely surprising happened like
the population of a species in the Gulf of Alaska suddenly exploded and then five years
later disappeared again, with no one knowing why. Or examples where we attempted an intervention
in an ecosystem and the outcome was very different from what we
had intended. And one example that is very current right now in the Pacific Northwest
is the Northern Spotted Owl. So, during the late 80’s and 1990’s we had what we call
the owl wars in Oregon, because there’s this species that was listed as an endangered
species, the Northern Spotted Owl, and its preferred habitat is old-growth forests.
Most of the old-growth forests on private land had already been cut, and
so now there was a lot of logging in the national forests in the public lands and the conservation
community wanted to shut down all that logging. And obviously the Forest Products Industry
which was a very important part of the Oregon economy was dependent on it to a large extent.
And it took, you know, the President coming to the state to bring everybody together.
And they came up with this plan called the Northwest Forest Plan, which by and large
did stop logging of forests on federal lands, which had a devastating impact on the economy.
And the hope was that this would help the spotted owl recover. But spotted owl numbers
have continued to decline since then. And partly that’s because there was another
species that has come in from the North. The Canadian Invader, which is known as the Barred
Owl. And it turns out it is more reproductively successful and more aggressive. And it seems
to be pushing out the spotted owl. So, that’s another example of an ecological
surprise, and it’s one of the reasons managing the ecosystems is so difficult. I think another reason that we’ve had trouble
managing ecosystems is that we’ve often focused on only a small part of a very large
system, because the system is so complicated, and we’ve focused on only one piece of it.
So, that could be a single species like the Northern Spotted Owl might be an example of
that. And we’ve often also ignored some of the larger contexts. There’s a colleague
of mine, Heidi Jo Albers who has studied things like creating forest reserves in tropical
forests. And when you design these reserves, you need to consider what the native people
might be using that forest for. In her case, taking that into account meant creating
large buffer zones around the actual reserve; if you don’t, you end up with those people
making incursions into your bioreserve and degrading it in
one way or another. So, having to consider the spatial aspects,
the interactions among multiple species, these are things that are often ignored in a lot
of ecology and ecosystem management. And finally, I think particularly if you look in agriculture,
we often deliberately manipulate a system to simplify it in order to try to manage it.
So, in crop agriculture for example we try to remove all of the other species so we only
have to worry about one species. But as a consequence we have to provide a lot of the
support for that species that would normally be provided by other species, like fertilizers
and pest management and so on. We have to provide those as exogenous inputs. And many
of those inputs, like some of the nutrients that we’re providing now, are becoming expensive.
And this is not a sustainable way of managing those systems. Well, I’m sure you could go on and list
many other things. What can Computer Science have to offer? I mean the reason I’m here
is because I think there are several things. First of all if we look at the question of
our lack of knowledge of the function and structure of these systems, we now have
a couple of ways that we can contribute. First of all, you know, we and our colleagues in
nanotechnology and electrical engineering are producing all kinds of novel sensors,
so we have wireless sensor networks. We can create thousands of sensors, put them
into these systems, and be able to monitor them much better. And of course the machine learning community
and computational statistics community have been working on building modeling techniques
that can scale up to much larger systems. Of course it’s still a challenge,
but much more is possible than, say, twenty years ago. When it comes to this question
about focusing on subsystems, some of the same story. Obviously with our modeling tools
we can now look at the larger system in which the smaller system is embedded. But I think
we also now have tools in say mechanism design to look at the interactions of different parties
that might be competing for a resource or tools in modern optimization that let us find
good solutions to very large and complex optimization problems. And again when we come to agriculture it’s
a different combination of these three things. But better sensing, better modeling, and better
optimization all have a role to play in allowing us to model these systems and manage them
better. So, this general field that we’re calling
computational sustainability, is one of the big things in my group: jointly with
Carla Gomes at Cornell, we have one of the NSF Expeditions in Computing projects.
So, a ten million dollar grant to try to boldly go where no computer scientist has gone before,
and in particular to look at computational methods that can contribute to sustainable
management of (unintelligible) systems. And so as a machine learning person I tend to
think about the computational challenges that are here in terms of a pipeline, from data
to models to policies. And so what I’m going to do in this talk is first talk about what
I see as some of the work that’s going on in each of these areas outside of my group
briefly, and then drill down on three specific things that we’re doing in my group that
contribute to this area. And so I’m hoping you’ll get a sense of the range of challenge
problems that are here and some of the opportunities from a Computer Science perspective. So, the first thing I want to talk about is
sensor placement. And Andreas Krause and his students have been doing some really
exciting things there. So, this particular example is a case where they’re (which I’m
not supposed to point with this, I point with this) where this is a city’s water network.
And they want to know where should we place sensors in this network in order to detect
pollutants or maybe an attack, but a chemical that’s introduced into the system. And their
main tool that they use is something called submodularity, which is the idea that
you have a function of a set, in this case the set of places where you have put your
sensors, and it exhibits a diminishing-returns property: once you’ve placed K sensors,
the (K+1)st one is going to give you less benefit than the Kth one, and so on. If your
objective function is submodular, then the greedy algorithm, and various sophisticated
variants of it, gives a performance that is within a constant fraction of optimal.
So, you can get very good results, and in fact they won some competitions
on water quality monitoring. And they’ve looked at many other problems as
well. So, that’s sensor placement, and of course this has a lot of relationship to the huge
literature on experiment design.
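To make the diminishing-returns idea concrete, here is a minimal sketch of that greedy loop on a toy coverage objective, which is monotone submodular. This is not Krause's water-network objective or code; the junction locations, detection radius, and budget are all made up for illustration, and the constant-factor guarantee mentioned above applies to greedy maximization of exactly this kind of function.

```python
# Toy greedy sensor placement for a monotone submodular objective.
# Illustrative sketch only: the "network" is a set of random junction locations,
# and f(S) counts junctions within detection range of at least one placed sensor
# (a classic coverage function, which is monotone submodular).
import random

random.seed(0)
junctions = [(random.random(), random.random()) for _ in range(200)]  # hypothetical junctions
RANGE = 0.15   # hypothetical detection radius
BUDGET = 5     # hypothetical number of sensors we can afford


def covered(placements):
    """f(S): number of junctions within RANGE of at least one placed sensor."""
    count = 0
    for (jx, jy) in junctions:
        if any((jx - sx) ** 2 + (jy - sy) ** 2 <= RANGE ** 2 for (sx, sy) in placements):
            count += 1
    return count


def greedy_placement(candidates, budget):
    """Repeatedly add the site with the largest marginal gain.
    For monotone submodular f, greedy achieves a constant fraction (1 - 1/e) of optimal."""
    chosen = []
    for _ in range(budget):
        base = covered(chosen)
        gains = [(covered(chosen + [c]) - base, c) for c in candidates if c not in chosen]
        best_gain, best_site = max(gains)
        if best_gain == 0:
            break  # diminishing returns: nothing left to gain
        chosen.append(best_site)
    return chosen


sensors = greedy_placement(junctions, BUDGET)
print("chosen sites:", sensors)
print("coverage:", covered(sensors), "of", len(junctions), "junctions")
```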
The second thing that comes up is what I call data interpretation, for lack of a better word.
Often the raw data you get from your sensors is not at the level you want for your modeling effort. And this is particularly
true for image data. So, for the last eight years I’ve been running a project that we
call the BugID Project, where we take photos of moths, soil arthropods, and freshwater
larvae. And we want to identify them to the genus level and ideally to the species level.
And this might be, for instance, input to building a model of the distribution of species in space,
or to tracking invasive species, or even to water quality monitoring, where you want this histogram
by species of how many individuals you had in a given stream. So, this particular picture
here is from a collaborator of mine, Qing Yao, who’s looking at rice pests, and they
put out these light traps at night. And moths wonderfully trap themselves in these traps.
And then they spread them out on a glass table, photograph them from above and below and then
they want to count and identify them to the species level. The third problem I call data integration.
I guess that’s an established term. The problem is with a lot of ecological modeling
challenges you have data coming from a wide variety of sources, and a wide variety of
scales in time and in space. And you need to somehow pull all this together in order
to then fit a model to the data. And so in what we’re doing for instance on bird migration
modeling, we’re dealing with data. Everything from stuff that basically never changes like
a digital elevation model of the terrain to things that are maybe changing on a fifteen
minute time scale, like the temperature or the weather and having to integrate all of
these things. And then we come to the part that you know
is really my core competence, which is model fitting and machine learning. And so there
are of course a wide range of models in ecology that people would like to fit. We’ve been
looking really at just three kinds of models. The first are what are known as species distribution
models. And the question there is can we create a map of where a species is found in the landscape.
And so that’s very close to sort of the core machine learning supervised learning
problem. You’re given a site with some set of features describing it, and then
either the species is present there or absent there. Another kind of model is something called
a Meta-Population Model. And here we imagine that we have a set of patches arranged in
space. And a patch may be occupied by a species or not. And over time the species may reproduce.
It may spread to other patches; it may go locally extinct and then get re-colonized.
So, that’s sort of focusing on space and looking at what comes in and out of a cell.
And then the other kind is migration or dispersal models, where you follow
the organism instead. So, you want to model the trajectory say that a bird follows or
the timing of movement. And so there’s work in machine learning
on all of these. One I want to show is what’s called a STEM Model that was developed by
Daniel Fink at the Cornell Lab of Ornithology. And so at the Lab of Ornithology they have
a big project called eBird, where if you’re a birder you can go out observing
in the morning say and then fill out a checklist on their webpage and say here’s what I saw
and I didn’t see anything else. You can click a button for that and then upload it.
There are a lot of avid birders out there. So, we’re now getting like a million data
points a month from people uploading. And they exceeded three million points in May,
sort of the peak of the breeding season. And so there’s a lot of data. Unfortunately
it’s completely uncontrolled. Right? So, you have lots of variation expertise. You
have no control over where people go. But you can still do some interesting things.
And what Daniel does is fit ensembles of decision trees to try to predict whether this species,
in this case the Indigo Bunting, will be present or absent at a particular place and time.
And so I’m going to show you this movie, but it’s important to realize this is a
series of snapshots. There’s no dynamical model here. But this species winters down
in Central America. And you’ll see the orange colors. That’s the species is predicted
to be present first along the Mississippi Valley and then sort of spread out through
the entire eastern U.S. And then as we move into September, there’s sort of a clock ticking
along the bottom, you see the species go back down the Mississippi Valley and disappear from
the U.S. And so this is really, I think, a very nice model. And it was used as part of
something called the State of the Birds Report to try to estimate what fraction of
habitat for each of the something like two hundred species of birds is publicly owned
versus privately owned. And this report came out late last year.
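As a rough illustration of that kind of presence/absence prediction, here is a generic ensemble-of-trees sketch on simulated checklist-style data. It is not Daniel Fink's STEM method, which averages many local models fit in overlapping spatiotemporal blocks; the features, the simulated species, and all the numbers below are invented.

```python
# Toy presence/absence model from checklist-style data using an ensemble of trees.
# NOT the STEM model; just a generic stand-in on simulated data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 5000
lat = rng.uniform(25, 50, n)            # degrees north
lon = rng.uniform(-95, -70, n)          # degrees east (roughly the eastern U.S.)
day = rng.integers(1, 366, n)           # day of year of the checklist
effort = rng.uniform(0.5, 4.0, n)       # hours of birding effort

# Invented "truth": present mainly in summer; detection improves with effort.
logit = np.cos((day - 180) / 58.0) * 4 - 1 - 0.05 * np.abs(lat - 38)
p_present = 1 / (1 + np.exp(-logit))
detected = rng.binomial(1, p_present * (1 - np.exp(-effort)))

X = np.column_stack([lat, lon, day, effort])
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, detected)

# A "snapshot" prediction for one date, at fixed effort, over a coarse grid of locations.
grid_lat, grid_lon = np.meshgrid(np.linspace(25, 50, 6), np.linspace(-95, -70, 6))
query = np.column_stack([grid_lat.ravel(), grid_lon.ravel(),
                         np.full(grid_lat.size, 170), np.full(grid_lat.size, 2.0)])
print(model.predict_proba(query)[:, 1].reshape(6, 6).round(2))
```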
So, once we have built a model like this, then it’s time to say, well, it’s great that we have this model of birds, but what can we
use, how can we use that to make policy decisions to manage the ecosystem. And I don’t have
a good example for management with birds but with fish John Leathwick who does excellent
work in New Zealand, so I don’t know if you can tell, but see these gray things over
there, these are the islands of New Zealand. And these blue dots and red dots
correspond to places where fishing trawlers did not find, or did find and harvest, the red ones
are positive, a particular species of fish, the Mora moro. And the blue line
around the outside is the exclusive economic zone of New Zealand. And so using this data,
he fit a species distribution model similar to the one that I was just describing except
that instead of estimating presence or absence he’s estimating the catch in kilograms,
so the biomass of the fish. And so these are his estimates. The blue areas there are no
fish at all and then you can see this pattern. And then what he wanted to do was then use
that to prioritize regions for their conservation value in terms of supporting that, allowing
this population to grow. And the left plot is prioritizing them if we ignore the fishing
industry and just say what would be the places that would best encourage the growth of the
species. But of course you really need to consider these within an economic context.
And so the right diagram re-prioritizes them now taking into account the cost of the fishing
industry. And you can see, I mean the main lesson here I think is that there’s still
a lot of places that we can conserve and yet also still have the benefit of fishing. So, this is a kind of spatial optimization
problem to solve. And I’ll be talking about some more of those. So, finally we have
the problem of policy execution; there’s usually, of course, a chasm to go from a designed policy
to one that we can convince people to actually adopt. And you know at the simplest level
we just have a policy where at each time step we observe the state of the system. And then
we choose the action that our policy tells us to choose. And we go ahead and act. But
in practice we’re often called upon to act in a lot of ecosystem management problems,
before we have a very good model of what’s going on. And so they’re really what we
would call a partially observable Markov decision process, or worse, where we don’t have a
complete understanding of the system we’re trying to model. I think a challenge here
is that, this means that our policy in our early actions should be designed not only
to achieve the ecosystem goal, but also to help us gather more information about the
system so that we can improve our model. So, we have dual objectives. And these are very
difficult to optimize. And one of the big concerns, I think, particularly
in light of these ecological surprises, is can we design policies that are robust to
our lack of knowledge. Both to the known unknowns, the things where we know that we’re uncertain
and we can model our uncertainty, and also to the unknown unknowns, the factors that
we forgot to include in the model. And I think that’s one of the most interesting intellectual
questions. I don’t have an answer for it, but I think that there are some things we
might be able to do. Okay, so that’s the review of the sort of
pipeline. And now I’d like to look at, talk about three specific projects at Oregon State.
So, and these will be in data interpretation and model fitting and in policy optimization.
So, the first project is the dissertation project of my student Ethan Dereszynski. And
he’s going to be graduating soon, so he’s looking for a job. And what he works on is
automated data cleaning in sensor networks. So, Oregon State University operates something
called the H.J. Andrews Long-Term Ecological Research site. NSF funds a collection of these
study sites that are committed to collecting data over long periods
of time and doing long-term experiments. So, one of my colleagues, Mark Harmon, for instance,
has started an experiment that is going to last two hundred years that’s called the
Roth Experiment. It’s about trees and how long it takes them to decay. But you know
it takes forever to get tenure in this field. Anyway, in this case we’re looking at these
weather stations that are there. And I’m going to talk mostly about four thermometers.
So, this is a weather tower here and these little L-shaped things coming off have
a thermometer on them. And they’re allegedly at one and a half, two and a half, three and
a half, and four and a half meters above the ground. And we get data from them that looks
something like this. So, every fifteen minutes we get a temperature reading. And you can
see on these curves the up and down motion; those are the daily cycle, the diurnal
cycle. So, it’s warming up in the daytime and cooling off at night. And it’s kind
of fun, because the thermometer that’s nearest the ground, which is the black line,
is the one that’s coldest at night and hottest in the day. So, they flip back and forth
like this. And the problem is that these sensors are out in the world and bad things happen
to them. And so someone has to do data quality assurance on these, on the sensor data and
clean it up before we try to do any analysis on it. Now traditionally in the Andrews
Forest, we’ve got three of these towers, and then there are many more, but the
three main ones have been in operation since the 80’s. And with twelve thermometers
it’s not really much of a burden for someone to go check this data. They just eyeball it
and cluster it in various ways and look for outliers. But of course now
we’ve got Wi-Fi over the entire forest and we want to put out huge networks of things.
And if we have a thousand thermometers this human data cleaning becomes infeasible, unless
we can figure out how to make a CAPTCHA out of it and maybe get people to do it. So, the kinds of things that go wrong, like
for instance here this is an instance of what’s called a broken sun shield. And so, the air
temperature sensor is now measuring actually the surface skin temperature of the thermometer
with the sun directly beating down on it. And so you can see in the daytime it spikes
way high, as many as ten degrees higher than the true air temperature. At night it’s
a perfectly good air temperature sensor, but in the daytime, particularly sunny days, not
so good. Can anyone guess what’s going on in the
bottom case here? Our 1.5 meter sensor is flatlining for a while. Yes. So, the problem here is this is week
three. So, that means it’s right about now. But it was in 1996. We had a big snowstorm.
And so this is now a snow temperature sensor, instead of an air temperature sensor. In some
sense the thermometer is still functioning correctly, it’s just that the metadata is
wrong. But there’s a lot more going on here. So, you notice that the 4.5 meter thermometer
is still bouncing up and down rather nicely. I mean obviously over here it’s quite cold
these days, even the nights, even in the daytime it’s just barely getting above freezing.
But then what’s happening over here? It really warmed up. I mean it’s
almost in the fifties at the top of the thermometer tower. And right around 3500
here it’s starting to rain. And so the snow temperature moves up to sort of the triple
point of water for a while. And now the snow is melting and we’re having…and the university
is closed right around 4500 because we had such a huge flood that you couldn’t get
to campus. So, this is how you get a big flood in Oregon: you have what’s
called a rain-on-snow event. And this was one of them. So, we like to detect these things also, you
know because they’re interesting, but we don’t want to assume that this thermometer
is measuring air temperature during this entire period. So, how can we do this? Well, we’d like
a data cleaning system to do really two functions. The first is we’d like it to mark every
data value that we think is anomalous. And so in this case, this is a different set of
data, but we’ve put what they call a rug, a little red tick,
underneath each data point that our model predicts is incorrect, that something is wrong. And then the other thing you’d like it to
do is to impute, or predict, the missing values: what the thermometer should
have been reading if it had been working correctly. And we’re going
to do both of these things using a probabilistic model. So, the basic probabilistic model we’re
going to use, though, these are, you know, Bayesian networks or probabilistic graphical models,
is the following. We’re going to have one node here for each of our variables of interest,
and the one that is gray, that’s an observed node. So, this is the observed temperature
at time t. And then there is a hidden node, which is our true temperature that we wish
we could observe directly. Then up here is our sensor state variable. And I’ve made
it a box to indicate that it’s discrete, whereas these are continuous variables. And
the idea is a very simple sensor model that says when the sensor state is one, that is
normal or working, then the observed temperature has a Gaussian distribution whose
mean is the true temperature x but with some small variance around that. But when the thermometer
is broken, and so the state is zero, then the observed temperature has a mean of zero and
a gigantic variance. So, basically we’re saying it’s completely unrelated to the true
temperature. So, this is a very simple model, and why do we adopt this kind of model? Well
you could try to think about this kind of data as if it were a diagnosis problem that
the sensor has various fault modes and failure modes and you want to predict what they are.
And so you could do a kind of Bayesian diagnosis where you could say well given the sensor
readings and my expectations it looks like it’s a broken sunshield or it looks like
it’s a flat line because of a communications failure or something like this. But the trouble
is we were not confident that we could enumerate in advance all the ways a sensor could fail.
We wanted to have an open ended set. So, the idea here is to treat it more as an anomaly
detection problem where we model the normal behavior of the sensor as accurately as we
can. And then anything that is a serious departure from normal, the normal model
will give very low likelihood, and it’ll instead get picked
up by this sort of very generic failure model. So, that’s the idea here. So, we can do anomaly detection then by doing
probabilistic inference. We ask the query: what is the most likely value of
the state of this sensor at time t? And that’s just the argmax over the possible states of
the probability of the state given the observation. And we can also do imputation by asking instead
what’s the most likely true temperature given the observed temperature. So, basic probabilistic
inference techniques work just fine.
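Here is a small numerical sketch of those two queries for a single time step of the single-sensor model just described. The seasonal baseline, the prior probability that the sensor is working, and the two noise variances are invented stand-ins; the real system learns its baselines from historical data.

```python
# Minimal sketch of the one-step sensor model: S in {working, broken},
# X ~ N(mu0, tau^2) prior on the true temperature, and
#   O | X, S=working ~ N(X, sigma_ok^2)    (small variance)
#   O | S=broken     ~ N(0, sigma_bad^2)   (huge variance, unrelated to X)
# All numbers below are made-up illustrations, not the Andrews Forest values.
from scipy.stats import norm

mu0, tau = 10.0, 4.0            # hypothetical seasonal baseline for the true temperature (deg C)
sigma_ok, sigma_bad = 0.5, 50.0
p_working = 0.95                # hypothetical prior that the sensor is working


def diagnose(o):
    # Marginal likelihood of the observation under each sensor state
    # (the hidden X is integrated out analytically in the working case).
    lik_working = norm.pdf(o, loc=mu0, scale=(tau**2 + sigma_ok**2) ** 0.5)
    lik_broken = norm.pdf(o, loc=0.0, scale=sigma_bad)
    post_working = (p_working * lik_working) / (
        p_working * lik_working + (1 - p_working) * lik_broken)

    # Imputation: posterior mean of the true temperature X.
    # If working, Gaussian conditioning pulls the baseline toward the observation;
    # if broken, the observation tells us nothing and we fall back on the baseline.
    k = tau**2 / (tau**2 + sigma_ok**2)
    x_hat = post_working * (mu0 + k * (o - mu0)) + (1 - post_working) * mu0
    return post_working, x_hat


for obs in [9.2, 31.0]:   # a plausible reading and a suspicious spike
    p, x = diagnose(obs)
    print(f"obs={obs:5.1f}  P(working|obs)={p:.3f}  imputed true temp={x:5.1f}")
```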
But of course this is a very bad model of the sensor here. So, the next thing we want to do is add some sort of Markov model so that we can
look at the history of the sensor. Because we’d like to say, well, if the sensor was
working fifteen minutes ago, it’s probably still working now. And if it was broken fifteen
minutes ago, it’s very likely it’s still broken now. So, we’d like to do that. And
similarly of course the actual real temperature doesn’t change that drastically either.
So, we’d like to have some model of how the true temperature changes over time. So this
gives us now a Markov version of the model. And now we can ask a query like what’s the most
likely state of this sensor at this time given the entire observation history. And that also
can be reasonably calculated. But we can go even further than this if we have multiple
sensors as we do on these towers. We could build a separate copy of the model for each
of them and then couple those somehow. So we could say that you know if we know the
temperature of the sensor at the bottom of the tower then we should be able to predict
with reasonable accuracy the sensor next up on the tower. And so this is the kind of thing
we do. In general we learn a sparse joint Gaussian distribution among all of the true-temperature variables,
so that we have a connected model. Unfortunately probabilistic inference in these
models starts to become intractable. So, even in the single-sensor model, with
the Markovian independence you would think that would not be a problem, but it is,
because of the discrete sensor-state variable. If all the variables were discrete
then we could solve it very easily; a simple message-passing algorithm will do it. But
because our other variables are continuous, so there are conditional Gaussians, when you marginalize
away the history it gives you a mixture of Gaussians that grows exponentially with
the number of time steps. And so it becomes impractical to do, you know, more than just
a few time steps; beyond that, it won’t work. So, what we do is basically a forward
filtering process where at each time step we ask what’s the most likely state of my
sensor. And then we say okay, we’ll believe it. We’ll adopt that state and treat it
as evidence, and then at time two we’ll ask okay, what’s the most likely state at time
two given that I already committed to the state at time one. And we do this. And
we also have to bound the variance on the true temperature, just because if you
have a long string of time steps where the sensor is bad, the true temperature becomes
extremely uncertain, and you can’t let that grow too far. Probabilistic inference is also infeasible in
the multiple-sensor model, even if you follow this step-by-step commitment strategy. And
so the solution we’re using right now, which seems to work best, is something we’re calling
Search MAP, where at each time step you start by assuming that all of the sensors are working.
And you score how well that accounts for the observations. And then you ask, can I improve
that score by breaking one of the sensors? And you do this in a greedy algorithm, basically
hill climbing, to try to find a MAP solution. You don’t always find the true maximum,
because there are local maxima. But even the simple greedy algorithm takes
a polynomial time that’s quite substantial in the number of sensors.
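Here is a rough sketch of that greedy hill-climbing idea for one time step, with a made-up joint Gaussian over the true temperatures of four tower thermometers. The scoring function, prior failure probability, and covariance are stand-ins for illustration, not the actual Search MAP implementation.

```python
# Sketch of a greedy "Search MAP" style search over which sensors are broken
# at one time step. The joint Gaussian over true temperatures, the noise levels,
# and the per-sensor failure prior are all invented for illustration.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([4.0, 5.0, 6.0, 7.0])              # hypothetical means for 4 tower thermometers
pos = np.arange(4)
cov = 4.0 * np.exp(-0.5 * np.abs(np.subtract.outer(pos, pos)))  # nearby sensors correlate
sigma_ok, sigma_bad, p_break = 0.5, 50.0, 0.05


def log_score(obs, broken):
    """Log probability of the observations given a hypothesized set of broken sensors."""
    working = [i for i in range(len(obs)) if i not in broken]
    score = len(broken) * np.log(p_break) + len(working) * np.log(1 - p_break)
    if working:
        idx = np.ix_(working, working)
        score += multivariate_normal.logpdf(
            obs[working], mean=mu[working],
            cov=cov[idx] + sigma_ok**2 * np.eye(len(working)))
    for i in broken:  # broken sensors: generic wide Gaussian, unrelated to the truth
        score += norm.logpdf(obs[i], loc=0.0, scale=sigma_bad)
    return score


def search_map(obs):
    broken = set()
    while True:  # greedily "break" the sensor that most improves the score, if any
        current = log_score(obs, broken)
        candidates = [i for i in range(len(obs)) if i not in broken]
        if not candidates:
            return broken
        gains = [(log_score(obs, broken | {i}) - current, i) for i in candidates]
        best_gain, best_i = max(gains)
        if best_gain <= 0:
            return broken
        broken.add(best_i)


obs = np.array([4.3, 5.1, 28.0, 7.2])   # sensor 2 looks like a broken sun shield spike
print("sensors flagged as broken:", search_map(obs))
```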
Yeah? (Unintelligible) …working even if in the previous time step’s commitment you decided one of them was broken? That’s what we’re doing right now. But
we could start with our MAP guess from the previous time step too. And you can
also consider a variation where, having broken one sensor, you might reconsider your previous
decision, in which case you can do what’s sometimes called a floating backward, you
know, floating greedy algorithm, which takes even longer but gives you better solutions.
And we’ve tried a whole bunch of other things you know, various kinds of expectation propagation
and the whole bag of tricks in the machine learning probabilistic modeling area, but…actually
one thing we haven’t tried yet is particle filters. He’s working on that right now.
(Unintelligible). Rob (unintelligible). Well here are single sensor results. So, on
the broken sunshield you can see that the bottom plot is the data again,
and the top plot is the predicted temperature of just the one thermometer,
the one that’s closest to the ground. And then along the periphery, we color
code it: our domain experts wanted us to not just have broken or working but to
actually have four levels of performance, from very good, good, bad, to very bad. So,
very bad would be black and there are just a couple of spots at the peaks of these days
when there are some black spots there. But otherwise it’s mostly marked things as red
for bad. And at night, of course, it’s still a very good sensor. So, we’re
able to do this using just a single-sensor model; and there’s a lot more in the single-sensor
model: we build a baseline expectation based on previous years so that we know
what week six looks like in general. And then for the Multi Sensor Case, Ethan
did an internship at EPFL in Switzerland and there they put out these short term deployments
of sensor networks, and he learned, well in this case a conditional Gaussian
Bayesian network over the true temperatures, and then fit that combined model. And so these
are the results. And you can see it’s doing quite well in some cases. It’s picking out
a lot of these things where we have, like, extremely bad spiky sensors. But in these
long flat lines it’s doing okay, except sometimes, the dashed line here is the imputed
value, when the predicted value happens to coincide with the flat line it says oh, the
sensor’s working again. So, this is a case where we probably really should have a flat-line
model, because these flat lines happen when the data link is lost. Okay. And there are many other challenges.
I mean we’re working at a single time scale, but of course it really should be multiple
scales. And we’re also working on integrating more heterogeneous sensors than just temperature. Okay. Well, so that’s an example of this
automated data cleaning work. The next problem is model fitting with an explicit detection
model. And this is work by a post-doc of mine, Rebecca Hutchinson, who’s wrapping
up her post-doc later this spring. And I already talked about species distribution
modeling. Often, particularly with birds and wildlife in general, when you go out and do
a wildlife survey the species could be there, but you just fail to detect it. And this is
a well-known problem in ecology. So, imagine that there’s some landscape and we’ve
chosen some set of these black squares that we’re going to go survey, but when we go
out there it turns out some of the birds are in the vegetation and we don’t see them.
So, although there were every one of those squares was occupied by our species we only
see it twice. What can we do about that? Well, one solution is to make repeated visits that
are close enough together in time that you think the birds have not moved around, like
when they’re sitting on their nests or something, but far enough apart in time that
you think you’re getting independent measurements of their hiding behavior. So, if
we go back another day maybe you know now we see the bird from the first cell, but the
bird in the second cell is hiding. The third one we still think is unoccupied, because
that bird was hiding the whole time and so on. So, this is one strategy that you can
use. And if you look at the kind of data you get, you get what are called detection histories.
So, suppose we have four different sites. Three of them are in forests, and one is in
grassland. And suppose that there is this true occupancy, which we say is a latent or
hidden variable, right. And the first three sites are occupied and the fourth one is unoccupied.
But we don’t know that. That’s hidden from us. So, on the first day we go out and
it turns out it’s a rainy day and we’re going out at lunch time, and we don’t see
any birds. So, we have all zeros here. Now another day, we go out early in the morning.
It’s a very good time to go birding and it’s a clear day. And we detect the birds
in the first two sites, but we don’t detect this guy here in site three, and of course
we don’t detect anything at site four. So, we’re going to assume no false detections
here, no hallucinations. Although, that’s not always a safe assumption. And then the
third day, it’s a clear day, but we’re a little late getting out. So, we only see
the, we only detect the bird in the first site. So, a thing like 0, 1, 1 or 0,
1, 0 is called a detection history. And from the detection histories you can estimate,
if you assume these are independent trials of your detection ability, a naïve estimate
of your detection probability. So, in this case we know from our data that
sites A and B are occupied by the species. And we know we had six opportunities to detect
the birds, three at each site. We did, we succeeded three times. So, our naïve estimate
of our detection probability would be point five. But in fact we really had nine chances
to observe this species, and we only saw it three times. So, our true detection probability,
or at least the maximum likelihood estimate thereof, would be point three, or one third. So, the big challenge is how can we tell the
difference between an all-zeros history that is due to our failure to detect
versus an all-zeros history that’s due to the fact that the site is unoccupied. And
the answer of course is to build a probabilistic model. And so this is a plate-style model.
And for those of you who aren’t familiar with the notation, think of these dashed boxes
as being for-loops. So, we have a loop where we iterate over the sites, so i indexes the
site. And x_i is some set of features that describes the site, like it’s
a forest and it’s at three hundred meters of elevation. And at each site then, based
on its features or its properties, there’s going to be some occupancy probability o_i.
And we’re going to assume that birds toss a coin with probability of heads o_i
to decide whether to occupy a site. And z_i is the true occupancy status
of that site, either a zero or a one. Now the variable t is going to index over our visits
to that site when we go observing. So, w_it is some description of, say, it was 6 a.m.
and it was sunny, things that might influence or account for our detection probability.
And so then y_it is the actual report, the data that we get. So, we actually observe
x, w, and y, when we really want z. So, we’d like to extract out of this the
species distribution model, the probability of the site being occupied given the properties
of that site. And I’m going to name that function f, so f of x_i is going to be the
occupancy probability. And we’d love to plot that on a map. But then we have this
nuisance model, which is our observation model, and we’ll let d_it be the value of this function
g, which is our detection probability. And so we can say the probability of reporting a 1,
that we saw the bird, is the product of z_i, which will be 1 if the bird is there,
and d_it, which is the probability that we detect it. So, that’s the model. And this
was developed by a group, MacKenzie et al., from the USGS, and
it is a very nice and well-established model.
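To make the generative story concrete, here is a small sketch of the marginal likelihood of one site's detection history under this occupancy-detection model, with logistic forms for f and g. The coefficients, site features, and visit covariates are invented; fitting would maximize the sum of these log-likelihoods over sites, and (as discussed below) the parametric forms can be swapped for more flexible ones.

```python
# Sketch of the occupancy-detection model for one site i:
#   o_i  = f(x_i)       occupancy probability,   z_i ~ Bernoulli(o_i)
#   d_it = g(w_it)      detection probability on visit t
#   y_it ~ Bernoulli(z_i * d_it)                 (no false detections)
# Here f and g are logistic regressions with made-up coefficients, and the
# site features x, visit covariates W, and histories y are invented.
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

alpha = np.array([0.2, 1.5])        # hypothetical occupancy coefficients
beta = np.array([-0.5, 2.0])        # hypothetical detection coefficients

def site_likelihood(x, W, y):
    """Marginal likelihood P(y | x, W), summing out the latent z in {0, 1}."""
    o = logistic(x @ alpha)                        # occupancy probability o_i
    d = logistic(W @ beta)                         # detection probability per visit
    lik_if_occupied = np.prod(d**y * (1 - d)**(1 - y))
    lik_if_empty = 1.0 if not y.any() else 0.0     # an occupied-looking history rules out z=0
    return o * lik_if_occupied + (1 - o) * lik_if_empty

x = np.array([1.0, 0.7])                           # e.g. intercept + scaled elevation
W = np.array([[1.0, 0.1], [1.0, 0.9], [1.0, 0.5]]) # per-visit covariates, e.g. time of day
for y in (np.array([0, 1, 0]), np.array([0, 0, 0])):
    print(y, "likelihood =", round(site_likelihood(x, W, y), 4))
```

The all-zeros history gets probability mass from both explanations, occupied-but-missed and truly empty, which is exactly the ambiguity the latent variable resolves during fitting.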
But I’m a machine learning person. And you know in machine learning there are sort of two parallel communities. There’s the community
that loves probabilistic models and there’s the community that loves non-parametric kind
of decision models like support vector machines and decision trees. And these two communities,
well, there are people like me who have one foot in both camps, but they really have very
different outlooks. Why do we like probabilistic graphical models?
Well, it’s a terrific language for expressing our models. And we have wonderful machinery
using probabilistic inference for reasoning about them. So, we know what the semantics
of the models are at least what they’re intended to be. And we can also write down
models that have hidden variables, latent variables that describe some hidden process
that we’re trying to make inferences about. So, probabilistic graphical models are kind
of like the declarative representation of machine learning. But there are some disadvantages,
particularly when you’re exploring in a new domain and you don’t understand the
system well. Because you as the designer have to choose the parametric form of each of the
probability distributions in the model, and you need to decide if you think there are
interactions among the variables and you need to include those interactions in the model.
The data typically have to be pretreated, scaled and so on; if you assume linearity
in your model you may need to transform your data so that it will have a linear
relationship. And one of the most important things we’ve learned in machine learning
is the importance of adapting the complexity of your model to the complexity of the data.
And it’s difficult to adapt the complexity of a parametric model. I mean there’s some
things you can do with regularization, but it’s not as flexible as using these sorts of flexible
machine learning models. So, you know back at that very first machine learning workshop
from which that book came out Ross Quinlan gave a talk about a classification tree method
that he was developing. And it was about a couple of years later that Leo Breiman and
company published the book on CART. So, classification and regression trees are
a very powerful kind of exploratory non-parametric method. And one of the beauties is that you
can just use them off the shelf. Right? You don’t have to design your model. You don’t
have to pre-process or transform your data. They automatically discover interactions
if they’re there, and sometimes even if they’re not there. And they can achieve
higher accuracy if you use them in ensembles. So, boosting and bagging and random forest
type techniques. And then of course since then support vector
machine kind of revolution has swept through machine learning. And these still require
the same data preprocessing and transformation steps, but by using kernels you can introduce
the nonlinearities in an extremely flexible way. And there are very powerful ways of tuning
the model complexity to match the complexity of the problem. So, they work remarkably well
also without a lot of careful design work. So, a challenge is can we have our cake and
eat it too? Can we write down probabilistic graphical models with latent variables in
them that describe processes we care about and yet also have the benefits of these non-parametric
methods? And this is a major open problem in machine learning. And there are several
efforts. There’s been a lot of work recently in the SBM family. There’s Bayesian nonparametrics
that use mixture models. The approach we’re exploring is boosted regression trees. So, I don’t really have a lot of time to
describe boosted regression trees, but they grew out of boosting work in machine learning.
And then first Mason, and then Jerry Friedman at Stanford, noticed that
these could really be viewed as part of a generic algorithm schema where you’re going
to fit a weighted sum of regression trees to data. And so he developed this thing called
boosted tree regression, or tree boosting. So, the standard approach in these occupancy
models is to represent these functions f and g as log-linear models, logistic
regressions. What we’re going to do is replace those functions f and g with non-parametric
flexible models, boosted regression trees. And this can be done using this algorithm
schema called functional gradient descent, or you could actually do functional EM also.
And we had a paper at AAAI last summer that describes the method.
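As a rough illustration of the schema, here is a minimal sketch of functional gradient descent in its plain supervised form: gradient tree boosting for a logistic loss on synthetic data. This is not the latent-variable OD fitting from the paper; there, the same schema is applied to f and g inside the occupancy-detection likelihood. The data and settings below are invented.

```python
# Minimal functional gradient descent (gradient tree boosting) for a logistic loss.
# This shows the generic schema only, on ordinary supervised synthetic data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
p_true = 1 / (1 + np.exp(-(np.sin(2 * X[:, 0]) + X[:, 1]**2 - 1)))  # nonlinear "true" probability
y = rng.binomial(1, p_true)

def boost(X, y, n_trees=100, learning_rate=0.1, depth=2):
    F = np.zeros(len(y))                    # current estimate of the log-odds function
    trees = []
    for _ in range(n_trees):
        p = 1 / (1 + np.exp(-F))
        residual = y - p                    # negative gradient of the logistic loss w.r.t. F
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        F += learning_rate * tree.predict(X)    # take a step in function space
        trees.append(tree)
    return trees

def predict_proba(trees, X, learning_rate=0.1):
    F = sum(learning_rate * t.predict(X) for t in trees)
    return 1 / (1 + np.exp(-F))

trees = boost(X, y)
print("mean |p_hat - p_true| =", np.abs(predict_proba(trees, X) - p_true).mean().round(3))
```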
So, I’ll just give you a little flavor for the results. Of course there are methodological problems
for studying latent variable models. And that is that you don’t know the true values of
those variables. They’re hidden from you. So, how do you know whether you’re doing
well? And so I’m going to describe results for one synthetic bird species where we simulate
a species using real data but faked occupancy and faked things. So, we made this model additive,
but non-linear. And this is a scatter plot showing on the horizontal axis the true occupancy
probabilities for this simulated species. And on the vertical axis what different families
of models predict. So, the left column is models that are trained without latent variables
treating it as a supervised learning problem. And you can see that they systematically underestimate
the true occupancy probabilities because they assume the only positive examples they saw
were the cases when you actually detected the bird, which is obviously an underestimate
of what’s really going on. In the right hand column are ones that are using this latent
variable model, the occupancy-detection model, the OD model. And then the top row is where
we’re using logistic regression as our parameterization. And you can see that on the top right, it’s
more or less unbiased. So, the true probabilities and the predicted ones more or less lie on
that diagonal line which is where they should be. But there’s a lot of scatter and that’s
because the true model is non-linear and we’re fitting a linear model. Whereas if we use
the boosted regression trees on the bottom we’re doing a lot better. We’re much closer
to the line. I’d like to omit a couple of the points that are far from the line. But
otherwise we’re pretty happy with that fit. And so, in general this is what we find is
that we can train these flexible boosted regression tree models within a graphical models framework
and get more accurate results. And so we’ve been applying this to data for several bird
species. So, it looks like I’m running tight on time
here. So, let me briefly just describe the final problem which is managing fire in Eastern
Oregon. Conveniently this is the problem where we don’t have any results yet. So, if I hadn’t
said anything, you wouldn’t notice. But, so this is now a policy problem, not
really a data problem. So, you know, since the late 1910’s, 1920’s,
the U.S. Forest Service has had a policy of suppressing essentially all fires. Part of the
political argument that was used to sell the creation of the Forest Service was
that we will prevent these terrible catastrophic wildfires. Of course it turns out you can’t
prevent them. You can only postpone them. And that’s now coming to pass. We believe
that the sort of natural state of forests, particularly in
eastern Oregon, should look something like this, where we have very large Ponderosa Pines,
and then what’s called an open understory, so just very small vegetation on the ground. I don’t have a picture for it, but what
we have right now is because fire has been suppressed for a long time we have all kinds
of vegetation on the forest floor. And we have small trees of all different sizes, lodgepole
pines in particular, that have grown up among these Ponderosa Pines. And so when you have
an open ground like that and a fire happens it burns through the ground and actually maintains
that openness. But the Ponderosa Pines have this big, thick fire resistant bark. And they’re
actually happy with this fire coming through and getting rid of some of their competitors.
But what’s happened, since that hasn’t been happening, is that now when a fire occurs it is able
to climb up the smaller vegetation, reach the crown, and actually destroy the forest,
kill all the trees. And you end up with these really very intense catastrophic fires. And
so one question is, is there anything we can do to manage this landscape? And so we have
a study area in eastern Oregon that’s divided up into about 4,000 cells. They’re
irregularly shaped; they’re based on homogeneity of the landscape there. And there are four
things you can do to each of these cells each year. You can do nothing. You can do what’s
called mechanical fuel treatment. So, you send people in and they cut down a lot of
that small vegetation and cart it out. You can do clear cutting where you harvest the
trees, but you leave behind a lot of debris, and while that gives you timber
value it actually increases fire risk. Or you can do clear cutting and fuel treatment
and then fire just can’t burn at all in that area at least for a few years. So, the question is how should we position
these treatments in the landscape if we want to say minimize the risk of big catastrophic
fires and maybe maximize the probability of these low intensity ground fires. Well we
can think about this as kind of a game against nature. In each time step we can observe the
current state of the landscape. Maybe this is like a fire risk map. And then we choose
an action. We have to choose an action, which is actually a vector of actions. One action
in each cell. So these are the actions; maybe we choose to treat these
particular cells. And then nature has its turn and it lights fires and burns them. And
then it’s our turn again. And so we can model this as a big Markov decision
process. But unfortunately it’s a Markov decision process with an exponentially large
state space. So, if each of these cells in my landscape has five tree ages and five fuel
levels, then I have twenty-five to the four-thousandth power possible states of the
landscape, which is not going to fit into memory very easily. And similarly, each time
I take an action, I have an action vector that has four thousand elements,
with four possibilities in each position. So, I have four to the four-thousandth possible
actions to consider. Even with all the cleverness of the reinforcement learning community and
approximate dynamic programming, we don’t know how to solve these problems.
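Just to give a feel for why these spaces blow up and how a policy gets evaluated, here is a toy sketch of the "game against nature" loop on a tiny grid. The fuel dynamics, ignition and spread probabilities, budget, and placeholder policy are all invented; the real work uses an actual fire simulator and far larger landscapes.

```python
# Toy "game against nature" for landscape management on a tiny grid.
# State: per-cell fuel level 0..4 (so 5^n_cells states); action: per-cell
# {0: do nothing, 1: fuel treatment} (the talk has 4 options, so 4^n_cells actions).
# The ignition/spread probabilities below are invented, not the real fire model.
import numpy as np

rng = np.random.default_rng(1)
N = 10                                  # 10 x 10 grid -> 5^100 possible fuel states
fuel = rng.integers(0, 5, size=(N, N))

def policy(fuel):
    """Placeholder policy: treat the cells with the heaviest fuel load, up to a budget."""
    action = np.zeros_like(fuel)
    heaviest = np.argsort(fuel, axis=None)[::-1][:5]   # hypothetical budget of 5 cells per year
    rows, cols = np.unravel_index(heaviest, fuel.shape)
    action[rows, cols] = 1
    return action

def nature_step(fuel, action):
    """Nature's turn: treated cells lose fuel, fires ignite and spread where fuel is heavy."""
    fuel = np.where(action == 1, 0, np.minimum(fuel + 1, 4))  # treatment vs. fuel accumulation
    burning = rng.random((N, N)) < 0.01 * fuel                # ignition more likely with more fuel
    for _ in range(3):                                        # a few rounds of spread
        neighbors = np.zeros_like(burning)
        neighbors[1:, :] |= burning[:-1, :]; neighbors[:-1, :] |= burning[1:, :]
        neighbors[:, 1:] |= burning[:, :-1]; neighbors[:, :-1] |= burning[:, 1:]
        burning |= neighbors & (rng.random((N, N)) < 0.2 * fuel)
    fuel = np.where(burning, 0, fuel)
    return fuel, burning.sum()

total_burned = 0
for year in range(100):                                       # a 100-year horizon rollout
    fuel, burned = nature_step(fuel, policy(fuel))
    total_burned += burned
print("cells burned over 100 simulated years:", int(total_burned))
```

Evaluating even one fixed policy requires simulating many such rollouts, which is part of why the full sequential optimization is so hard.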
There’s been a little bit of work. There was a paper by Wei et al. a couple of years ago where they looked at just a one-year planning
problem. So, if I just had one year to make treatments and then there’s going to be
fires in a hundred years, where should I put my treatments? And they were able to formulate
and solve a mixed integer program for this optimal one-shot solution. They were just
completely trying to prevent fire, which is really not the right problem. But in any case,
we’re trying now to see whether we can build on that work or come up with some method where
we can solve this MDP over a hundred year horizon. Okay. So, in summary I’ve talked about this
pipeline for the ways computation could help in addressing problems in ecology and ecosystem
management. I’ve talked about automated data cleaning, about fitting these flexible
models within a latent variable modeling framework. And then very briefly about policy optimization.
And as I mentioned this is part of our larger effort in what we call computational sustainability.
And there are many other opportunities to contribute to. You know I haven’t talked
about energy. I haven’t talked about sustainable development or smart cities or any of these
things. But there are lots of computational problems there as well. I’d like to point out that the Computing
Community Consortium I think the CCC is funding some travel grants and prizes for papers in
this area at several AI conferences. I know about ICML and AAAI, but I think
there are some other conferences where they’re doing this this year. So, there’s a special
track for that that you could submit to. And my joint grant with Cornell, we have created
something called the Institute for Computational Sustainability. And we have a website with
all kinds of information about what’s going on, not just in our own research, but throughout
the computer science community. And I’ll just thank the people that I mentioned
at the start of the project. On the fire project there are two other graduate students Rachel
Houtman and Sean McGregor who have been working there and of course the National Science Foundation
that has been very generous here. Well, thank you for your attention and I’ll
answer questions. So, how does this work local versus remote? So, what we usually do is give
the remote sites a chance to go first, because they might lose the connection later on. Okay. Remote sites? Go ahead. (Question being asked) Okay. Yeah. So, what they do is they run several
thousand fires, simulated fires. And try to calculate for each cell in their landscape
the probability that it will burn. And they decompose that into the probability that it
will burn because the fire ignited in that cell. Or the probability that it will burn
because fire propagated from one of its neighbors. So, they can basically build a sort of probabilistic
flow model that says the probability that this cell will burn conditioned on whether
its neighbors burned. And then they can model a fuel treatment, which they model simply
as if I treat this cell then no fire will be able to propagate through that cell. Okay,
And so with a couple of other approximations they can turn this into a flow problem:
basically we want to minimize the total flow, subject
to some budget constraint on how many cells we can afford to treat. And so they
basically then have one integer variable for each cell, and they have an objective and
then they can solve it. I mean in our case there would be four thousand integer variables,
which would be a little bit scary. Their problem I think had more like nine hundred cells though.
So, it’s still quite substantial. But you know CPLEX is a wonderful thing. And so
it was able to find the solution to that. (Question being asked) Uh huh. Okay. Right. Well this was one-shot
as opposed to sequential decision making. So here we just get one, we just get one time
step at which we’re allowed to take actions. And then from there on out nature just gets
all the moves in the game. So, that’s the sense in which it’s a single decision, single
one-shot plan, totally upfront planning in other words. And there are a lot of problems
in ecology where we end up having to take that view that we’re just going to say we
want to buy all the following territory. So, we’ve looked at some, there’s an endangered
species called the Red Cockaded Woodpecker that I believe is here in North Carolina.
And my post doc Dan Sheldon did some very nice work where the question was there are
two pockets of this species, one I think at Camp Lejeune and the other in the Palmetto
Palm Reserve or something like this. And the question was could they buy a series of intermediate
sites to encourage those two populations to mix and have some gene flow between them. So,
it’s a problem of basically trying to encourage flow instead of trying to prevent flow. And
they were able to also formulate this and solve it for the one-shot case in terms of
building a network that would maximize flow subject to budget constraints. But the real problem, you can’t buy all
the property all at once. You don’t have the money and it isn’t all available. So,
you really need to be online and every year take some actions that you can afford
to take to keep moving toward that objective. So, turning that into a Markov decision problem
or what’s often called adaptive management in the, you know, environmental literature, that’s
still an open problem. We don’t know how to do that. (Question being asked) That, yeah that is a good question. And we
do wonder is there some way we could come up with some set of spatial basis
functions that would let us, for instance, represent…suppose that we had an optimal policy for laying out
treatments in a landscape, but we could only compute it for a particular fixed landscape,
could we somehow generalize from that to a more general policy, and maybe some kind of
set of spatial basis functions would allow us to do that. And the same is true for looking
at yeah the sort of structure of the landscape. There’s certainly a lot of work done in
atmospheric sciences and weather where they basically use PCA to create a set of basis
functions that they can use then to approximate a lot of things. So, it’s something we’d
like to explore more. (Question being asked) I’m sorry. Right. Well particularly here
we’re intervening in the system, and so, yeah, the trouble is that you have this
search space where if I take these actions then these fires will burn, and if I take those
actions something else will happen. And you end up having to do exponentially many simulations
just to simulate one set of scenarios. And so obviously we have to rely on some kind
of sampling, or some way of capturing the spatial scale beyond which we
can ignore the spatial components. It’s not clear really how to proceed. (Question being asked) Well that is a very good question. Right now
we’ve mostly been looking at just this one site. And we have the weather data and all
the data about the sites, which we need to be able to do the work. And it’s a good
question whether they are generalizable lessons that you could take away from this. One, I
also have some projects in invasive species management and we’re asking the same question
there. And often it’s kind of disturbing. I mean you get a solution like this big map
here wherever it was that says well these are the places that are the optimal places
for me, but how do you, is there any pattern to that? Is there any way that we could explain
that as a sort of a set of rules that we could apply to a different situation? How could
we generalize from this particular landscape? And we need, we need to do that just to explain
it to our domain experts. And obviously policy makers are not going to be happy just being
told well it’s optimal. Our algorithm said so. Particularly because we won’t be able
to say that. We’ll have to say it’s approximately optimal, but we don’t know how bad or something
like that. And so we’re really going to need to be able to give them some qualitative
understanding and let them play with it, and modify it, and explore, and understand
you know how good it is. And that’s a huge challenge to just explain you know once you’ve
done ten million simulations what lesson can you take away from it. Okay. So, I’ve got a question that maybe ducktails
with that. Uh huh. So, to what extent do you feel like these
techniques and the recommendations or policies that you’re producing using these techniques
are getting traction with the people who are actually implementing policy decisions and
you know is it something where you feel like you’re having impact now, you feel like
maybe it’ll be five years, ten years, how, you know what time scale are we talking about
here? Um I would guess in five to ten years. I mean
we’re very fortunate with the forest situation that we have some of the Forest Service people
on our team. And a lot of them are former students of Claire Montgomery who was on the
team. And so we have a nice working relationship with them. But the question is, and that
is a good question, whether they would ever be able to execute our particular policies.
I think one of the main things we’re trying to do is give them backup ammunition for being
able to support the actions that they are taking. Right now the idea that they might
want to treat the landscape in a particular way or in a related problem they might want
to let a fire burn instead of suppressing it, that’s an extremely controversial politically
difficult decision. If we could provide some analysis that shows that yes, under a wide
variety of scenarios it would be better to let this fire burn, or it’s
better to treat this area than those others, that might help them persuade their stakeholders
to go along with it. Of course another thing that would help them persuade their stakeholders
is if we could say well and for these small communities that have timber mills we can
also guarantee you a certain economic benefit from doing this. And so there’s a whole
set of economic objectives perhaps that we would like to have. We might maybe also
like to have a whole bunch of endangered species habitat objectives. So, the real problem you
know gets messier and messier. But we won’t be able to attack any of those unless we really
can come up with a methodology that works for these problems. What you just laid out is a hard scenario
for any algorithm to work you know, a procedure to optimize. But as it is it has to be optimized
by humans. I mean in other words there are people actually making decisions about whether
or not to let a fire burn. And they have to process all of it. So… Right. I mean. Well mostly they are not letting fires burn,
because it’s just too risky and plus the firefighting money doesn’t come out of their
budget. It’s somebody else’s budget. So, there’s not really an incentive for them.
For the fuel treatment though you’re right. Right now they are making some guesses about
where to treat, trying to balance all of these issues, and I would say they’re not very
happy with that. They would like some more rational basis for making those decisions. Yeah. I guess my point was you may not have
to get optimal. You may just have to do better than humans guessing. Right. Well, but we have to convince them
that we are doing better yeah. And that comes into a lot of this broader contextual thing
as well. Yes. You sort of apply the basic approach. Could
you just take the particular plans or policies that they are using or thinking of using as
a prior and then go from there and simplify your model because you’re
working from a targeted assumption… Uh huh. …base. Oh that’s an interesting idea, yeah, that would
be to see if we could in some sense model what they’re doing and then ask locally
how could we improve it, maybe without walking too far away from it so it doesn’t
look so strange or threatening. No, we hadn’t thought about that, but that’s an interesting
idea. Okay. Thanks. Well thank you very much. My pleasure.
