It’s a great pleasure today for me to introduce

Tom Dietterich. I’ve known Tom for quite some time; he actually was an author on my first machine learning textbook, for which I brought a prop. And I got to meet him when I was a graduate student at Berkeley. I guess this was not long after Tom started his first faculty position after getting his Ph.D. at Stanford. He came down to visit Berkeley

and I enjoyed getting to meet him then. And even then as you see with this textbook he

was playing a very important role in shaping the machine learning community. And he’s

gone on since then to continue to play a role in shaping the machine learning community.

So he is an AAAI Fellow and an ACM Fellow. He’s been program chair for AAAI,

program chair for NIPS. He’s been very active in the International Machine Learning Society,

and really a mentor in the field to a lot of young people. And Tom is one of the few

people to this day who really sees the entire field of machine learning. And as the fields

have become increasingly specialized, it’s rare to find people who can appreciate the

whole field and take it all in. And that’s one of the many great things that Tom

is known for. And today he’s going to be telling us about a very important application

of machine learning, which is to computational ecology and environmental management.

Thank you very much, Ron. So, the work I’m

going to be describing today is obviously a very collaborative interdisciplinary, and

the collaborators in particular that I want to mention are my graduate student Ethan Dereszynski, two post-docs, Rebecca Hutchinson and Dan Sheldon, and then colleagues Weng-Keen Wong, who’s a machine learning person in Computer Science, and Claire Montgomery, who’s a forest ecologist.

And then several folks at the Cornell Lab of Ornithology. So, if we look at the earth’s ecosystems

or the biosphere, it’s a very complex system. And I think we can agree that in many ways

we have not managed it in a sustainable way. And so I thought I would start the talk by

asking about why is that so, and is there anything that computer science can do to help?

And I think, I mean everybody has their own views of why this is so. But I think maybe

there are three reasons. First of all we don’t understand the system very well. So, it’s

very hard to manage a system when it’s behaving very unpredictably. And there was a very thought

provoking article by a group of authors, first author Doak, in 2008, where they ask the question: are ecological surprises inevitable? Or are the dynamics of ecosystems

so complex that we will never really be able to predict the behavior of the systems reliably.

And to sort of support this thesis they go through, I don’t know, fifteen or twenty

different examples of situations where either something completely surprising happened like

the population of a species in the Gulf of Alaska suddenly exploded and then five years

later disappeared again, with no one knowing why. Or examples where we attempted an intervention in an ecosystem and the outcome was very different from what we

had intended. And one example that is very current right now in the Pacific Northwest

is the Northern Spotted Owl. So, during the late 80’s and 1990’s we had what we call

the owl wars in Oregon, because there’s this species that was listed as an endangered

species, the Northern Spotted Owl, and its preferred habitat is old growth forests.

And most of the old growth forests on private land had already been cut, and

so now there was a lot of logging in the national forests in the public lands and the conservation

community wanted to shut down all that logging. And obviously the Forest Products Industry

which was a very important part of the Oregon economy was dependent on it to a large extent.

And it took, you know, the President coming to the state to bring everybody together.

And they came up with this plan called the Northwest Forest Plan, which by and large did stop logging in forests on federal lands, which had a devastating impact on the economy.

And the hope was that this would help the spotted owl recover. But spotted owl numbers

have continued to decline since then. And partly that’s because there was another

species that has come in from the North. The Canadian Invader, which is known as the Barred

Owl. And it turns out it is more reproductively successful and more aggressive. And it seems

to be pushing out the spotted owl. So, that’s another example of an ecological surprise, and it’s one of the reasons managing ecosystems is so difficult.

I think another reason that we’ve had trouble

managing ecosystems is that we’ve often focused on only a small part of a very large

system, because the system is so complicated, and we’ve focused on only one piece of it.

So, that could be a single species; the Northern Spotted Owl might be an example of

that. And we’ve often also ignored some of the larger contexts. There’s a colleague

of mine, Heidi Jo Albers who has studied things like creating forest reserves in tropical

forests. And often, when you design these reserves, you need to consider what the native people might be using that forest for; in her case that meant creating large buffer zones around the actual reserve. If you don’t take that into account, you end up with those people making incursions into your bioreserve and degrading it in

one way or another. So, having to consider the spatial aspects,

the interactions among multiple species, these are things that are often ignored in a lot

of ecology and ecosystem management. And finally, I think particularly if you look in agriculture,

we often deliberately manipulate a system to simplify it in order to try to manage it.

So, in crop agriculture for example we try to remove all of the other species so we only

have to worry about one species. But as a consequence we have to provide a lot of the

support for that species that would normally be provided by other species, like fertilizers

and pest management and so on. We have to provide those as exogenous inputs. And many

of those inputs, like some of the nutrients that we’re providing now, are becoming expensive. And this is not a sustainable way of managing those systems.

Well, I’m sure you could go on and list

many other things. What does Computer Science have to offer? I mean the reason I’m here

is because I think there are several things. First of all if we look at the question of

our lack of knowledge of the function and structure of these systems, we now have

a couple of ways that we can contribute. First of all you know, we and our colleagues in

nanotechnology and electrical engineering are producing all kinds of novel sensors, so we have wireless sensor networks. We can create thousands of sensors, put them

into these systems, and be able to monitor them much better. And of course the machine learning community

and computational statistics community have been working on building modeling techniques that can scale up to much larger systems. It’s still a challenge, of course, but much more is possible than, say, twenty years ago. When it comes to this question

about focusing on subsystems, some of the same story. Obviously with our modeling tools

we can now look at the larger system in which the smaller system is embedded. But I think

we also now have tools in say mechanism design to look at the interactions of different parties

that might be competing for a resource or tools in modern optimization that let us find

good solutions to very large and complex optimization problems. And again when we come to agriculture it’s

a different combination of these three things. But better sensing, better modeling, and better

optimization all have a role to play in allowing us to model these systems and manage them

better. So, this general field that we’re calling

computational sustainability: one of the big things in my group is that, jointly with Carla Gomes at Cornell, we have one of the NSF Expeditions in Computing projects. So, a ten-million-dollar grant to try to boldly go where no computer scientist has gone before,

and in particular to look at computational methods that can contribute to sustainable

management of (unintelligible) systems. And so as a machine learning person I tend to

think about the computational challenges that are here in terms of a pipeline, from data

to models to policies. And so what I’m going to do in this talk is first talk about what

I see as some of the work that’s going on in each of these areas outside of my group

briefly, and then drill down on three specific things that we’re doing in my group that

contribute to this area. And so I’m hoping you’ll get a sense of the range of challenge

problems that are here and some of the opportunities from a Computer Science perspective. So, the first thing I want to talk about is

sensor placements. And Andreas Krause and his students have been doing some really

exciting things there. So, this particular example is a case where they’re (which I’m

not supposed to point with this, I point with this) where this is a city’s water network.

And they want to know where should we place sensors in this network in order to detect

pollutants or maybe an attack, some chemical that’s introduced into the system. And their

main tool that they use is something called submodularity, which is the idea that you have a function of a set, in this case the set of places where you have put your sensors, and it exhibits a diminishing-returns property: once you’ve placed K sensors, the (K+1)st one is going to give you less additional benefit than the Kth one, and so on. If your objective function is submodular, then the greedy algorithm, and various sophisticated variants of it, gives a performance that is within a constant fraction (1 - 1/e) of optimal. So, you can get very good results, and in fact they won some competitions

for water quality monitoring. And they’ve looked at many other problems as

well. So, sensor placements and of course this has a lot of relationship to the huge

literature on experimental design. The second thing that comes up is what I call

data interpretation for lack of a better word, which is often the raw data you get from your

sensors, is not at the level you want for your modeling effort. And this is particularly

true for image data. So, for the last eight years I’ve been running a project that we

call the Bug ID Project, where we take photos of moths, soil arthropods, and freshwater larvae. And we want to identify them to the genus level and ideally to the species level. And this might be, for instance, input to building a model of the distribution of species in space, or to tracking invasive species, or even to water quality monitoring, where you want a histogram

by species of how many individuals you had in a given stream. So, this particular picture

here is from a collaborator of mine, Qing Yao, who’s looking at rice pests, and they

put out these light traps at night. And moths wonderfully trap themselves in these traps.

And then they spread them out on a glass table, photograph them from above and below and then

they want to count and identify them to the species level. The third problem I call data integration.

I guess that’s an established term. The problem is with a lot of ecological modeling

challenges you have data coming from a wide variety of sources, and a wide variety of

scales in time and in space. And you need to somehow pull all this together in order

to then fit a model to the data. And so in what we’re doing for instance on bird migration

modeling, we’re dealing with data on everything from stuff that basically never changes, like

a digital elevation model of the terrain to things that are maybe changing on a fifteen

minute time scale, like the temperature or the weather and having to integrate all of

these things. And then we come to the part that you know

is really my core competence, which is model fitting and machine learning. And so there

are of course a wide range of models in ecology that people would like to fit. We’ve been

looking really at just three kinds of models. The first are what are known as species distribution

models. And the question there is can we create a map of where a species is found in the landscape.

And so that’s very close to sort of the core machine learning supervised learning

problem. You’re given a site with some set of features describing it, and then

either the species is present there or absent there. Another kind of model is something called

a Meta-Population Model. And here we imagine that we have a set of patches arranged in

space. And a patch may be occupied by a species or not. And over time the species may reproduce.

It may spread to other patches; it may go locally extinct and then get re-colonized.

So, that’s sort of focusing on space and looking at what comes in and out of a cell.
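To make the metapopulation idea concrete, here is a toy stochastic patch-occupancy simulation. The ring layout and the extinction and colonization probabilities are illustrative assumptions for the sketch, not parameters from the talk:

```python
import random

def simulate(n_patches=40, steps=100, p_extinct=0.1, p_colonize=0.3, seed=0):
    """Toy stochastic patch-occupancy model on a ring of patches.
    Each step, an occupied patch goes locally extinct with prob p_extinct;
    an empty patch is colonized with prob p_colonize per occupied neighbor."""
    rng = random.Random(seed)
    occupied = [True] * n_patches          # start with every patch occupied
    for _ in range(steps):
        nxt = list(occupied)
        for i in range(n_patches):
            left, right = occupied[i - 1], occupied[(i + 1) % n_patches]
            if occupied[i]:
                if rng.random() < p_extinct:
                    nxt[i] = False         # local extinction
            else:
                for nb in (left, right):
                    if nb and rng.random() < p_colonize:
                        nxt[i] = True      # re-colonized from a neighbor
                        break
        occupied = nxt
    return sum(occupied) / n_patches

print(simulate())  # fraction of patches occupied after 100 steps
```

Varying `p_extinct` against `p_colonize` shows the classic metapopulation behavior: the species can persist regionally even though individual patches repeatedly wink in and out.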

And then the other (unintelligible) sort of migration or dispersal models where you follow

the organism instead. So, you want to model the trajectory say that a bird follows or

the timing of movement. And so there’s work in machine learning

on all of these. One I want to show is what’s called a STEM Model that was developed by

Daniel Fink at the Cornell Lab of Ornithology. And so at the Lab of Ornithology they have

a big project called eBird, where if you’re a birder you can go out observing

in the morning say and then fill out a checklist on their webpage and say here’s what I saw

and I didn’t see anything else. You can click a button for that and then upload it.

There are a lot of avid birders out there. So, we’re now getting like a million data

points a month from people uploading. And they exceeded three million points in May,

sort of the peak of the breeding season. And so there’s a lot of data. Unfortunately

it’s completely uncontrolled. Right? So, you have lots of variation in expertise. You

have no control over where people go. But you can still do some interesting things.

And what Daniel does is fit ensembles of decision trees to try to predict whether this species,

in this case the Indigo Bunting, will be present or absent at a particular place and time.
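Fink's actual STEM approach partitions space and time into overlapping blocks, fits a base model within each block, and averages the base models covering a query point. A heavily simplified, purely illustrative sketch of that averaging idea, using only day-of-year as a feature and a presence-rate base model (both assumptions for the example, not the real system):

```python
import random

def stem_ensemble(train, n_models=200, window=60, seed=0):
    """Toy STEM-style ensemble: each base model is just the observed
    presence rate inside a random day-of-year window; a prediction
    averages the base models whose windows cover that day.
    train: list of (day_of_year, present) checklist records."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        start = rng.uniform(0, 365 - window)
        inside = [p for d, p in train if start <= d < start + window]
        if inside:
            models.append((start, sum(inside) / len(inside)))

    def predict(day):
        rates = [r for s, r in models if s <= day < s + window]
        return sum(rates) / len(rates) if rates else 0.0

    return predict

# Hypothetical species present mostly in summer (days 150-250).
train = [(d, 1) for d in range(150, 250, 5)] + \
        [(d, 0) for d in list(range(0, 150, 5)) + list(range(250, 365, 5))]
predict = stem_ensemble(train)
print(predict(200) > predict(20))  # summer probability exceeds winter
```

The real STEM ensemble uses decision trees over many environmental covariates within each spatiotemporal block, but the averaging-over-overlapping-local-models structure is the same.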

And so I’m going to show you this movie, but it’s important to realize this is a

series of snapshots. There’s no dynamical model here. But this species winters down

in Central America. And you’ll see the orange colors. That’s the species is predicted

to be present first along the Mississippi Valley and then sort of spread out through

the entire eastern U.S. And then as we move into September, this is sort of a clock ticking

along the bottom, you see the species goes back down the Mississippi Valley and disappears from

the U.S. And so this is a really I think a very nice model. And it was used as part of

something called the State of the Birds Report to try to estimate what fraction of

habitat for each of the something like two hundred species of birds is publicly owned

versus privately owned. And this report came out late last year. So, once we have built a model like this then

it’s time to say, well, it’s great that we have this model of birds, but how can we use it to make policy decisions to manage the ecosystem? And I don’t have

a good example for management with birds, but with fish there’s John Leathwick, who does excellent work in New Zealand. I don’t know if you can tell, but these gray things over there are the islands of New Zealand. And these blue and red dots correspond to places where fishing trawlers did not or did harvest a particular species of fish, Mora moro; the red ones are positive. And the blue line

around the outside is the exclusive economic zone of New Zealand. And so using this data,

he fit a species distribution model similar to the one that I was just describing except

that instead of estimating presence or absence he’s estimating the catch in kilograms,

so the biomass of the fish. And so these are his estimates. The blue areas there are no

fish at all and then you can see this pattern. And then what he wanted to do was then use

that to prioritize regions for their conservation value in terms of supporting that, allowing

this population to grow. And the left plot is prioritizing them if we ignore the fishing

industry and just say what would be the places that would best encourage the growth of the

species. But of course you really need to consider these within an economic context.

And so the right diagram re-prioritizes them, now taking into account the cost to the fishing

industry. And you can see, I mean the main lesson here I think is that there’s still

a lot of places that we can conserve and yet also still have the benefit of fishing. But this is a kind of spatial optimization problem to solve, and I’ll be talking about some more of those. So, finally, we have the problem of policy execution; of course there is usually a chasm to go from a designed policy to one that we can convince people to actually adopt. And you know, at the simplest level

we just have a policy where at each time step we observe the state of the system. And then

we choose the action that our policy tells us to choose. And we go ahead and act. But

in practice we’re often called upon to act in a lot of ecosystem management problems,

before we have a very good model of what’s going on. And so they’re really what we

would call a partially observable Markov decision process or worse, where we don’t have a

complete understanding of the system we’re trying to model. I think a challenge here

is that our policy, in our early actions, should be designed not only

to achieve the ecosystem goal, but also to help us gather more information about the

system so that we can improve our model. So, we have dual objectives. And these are very

difficult to optimize. And one of the big concerns, I think, particularly

in light of these ecological surprises is can we design policies that are robust to

our lack of knowledge. Both to the known unknowns, the things where we know that we’re uncertain and we can model our uncertainty, and also to the unknown unknowns, the factors that

we forgot to include in the model. And I think that’s one of the most interesting intellectual

questions. I don’t have an answer for it, but I think that there are some things we

might be able to do. Okay, so that’s the review of the sort of

pipeline. And now I’d like to talk about three specific projects at Oregon State. These will be in data interpretation, in model fitting, and in policy optimization.

So, the first project is the dissertation project of my student Ethan Dereszynski. And

he’s going to be graduating soon, so he’s looking for a job. And what he works on is

automated data cleaning in sensor networks. So, Oregon State University operates something called the H.J. Andrews Experimental Forest, a Long-Term Ecological Research site. NSF funds a collection of these study sites, where they’re committed to collecting data over long periods

of time and doing long term experiments. So, one of my colleagues Mark Harmon for instance

has started an experiment that is going to last two hundred years that’s called the

Roth Experiment. It’s about trees and how long it takes them to decay. But you know

it takes forever to get tenure in this field. Anyway, in this case we’re looking at these

weather stations that are there. And I’m going to talk mostly about four thermometers.

So, this is a weather tower here, and these little L-shaped things coming off each have

a thermometer on them. And they’re allegedly at one and a half, two and a half, three and

a half, and four and a half meters above the ground. And we get data from them that looks

something like this. So, every fifteen minutes we get a temperature reading. And you can

see on these curves the up and down motion; those are the daily cycle, the diurnal cycle. So, it’s warming up in the daytime and cooling off at night. And it’s kind

of fun, because the thermometer that’s nearest the ground, which is the black line in the plot, is the one that’s coldest at night and hottest in the day. So, they flip back and forth

like this. And the problem is that these sensors are out in the world and bad things happen

to them. And so someone has to do data quality assurance on the sensor data and

clean it up before we try to do any analysis on it. Now traditionally in the Andrews

Forest, we’ve got three of these towers and then there are many more. But there are

three main ones that have been in operation since the 80’s. And with twelve thermometers

it’s not really much of a burden for someone to go check this data. They just eyeball it

and cluster it in various ways and look for outliers. But of course we’ve now got Wi-Fi over the entire forest and we want to put out huge networks of things.

And if we have a thousand thermometers this human data cleaning becomes infeasible, unless

we can figure out how to make a CAPTCHA out of it and maybe get people to do it. So, the kinds of things that go wrong: for instance, here is an instance of what’s called a broken sun shield. And so, the air

temperature sensor is now measuring actually the surface skin temperature of the thermometer

with the sun directly beating down on it. And so you can see in the daytime it spikes

way high, as much as ten degrees higher than the true air temperature. At night it’s

a perfectly good air temperature sensor, but in the daytime, particularly sunny days, not

so good. Can anyone guess what’s going on in the

bottom case here? Our 1.5 meter sensor is flatlining for a while. Yes. So, the problem here is that this is week

three. So, that means it’s right about now. But it was in 1996. We had a big snowstorm.

And so this is now a snow temperature sensor, instead of an air temperature sensor. In some

sense the thermometer is still functioning correctly, it’s just that the metadata is

wrong. But there’s a lot more going on here. So, you notice that the 4.5 meter thermometer

is still bouncing up and down rather nicely. I mean, obviously over here it’s quite cold these days; even in the daytime it’s just barely getting above freezing. But then what’s happening over here? It really warmed up. I mean, it’s almost in the fifties at the top of the thermometer tower. And right around 3500

here it’s starting to rain. And so the snow temperature moves up to sort of the triple point of water for a while. And now the snow is melting, and the university is closed right around 4500 because we had such a huge flood that you couldn’t get to campus. So, this is how you get a big flood in Oregon: you have what’s

called a rain-on-snow event. And this was one of them. So, we’d like to detect these things also, you

know because they’re interesting, but we don’t want to assume that this thermometer

is measuring air temperature during this entire period. So, how can we do this? Well, we’d like

a data cleaning system to do really two functions. The first is we’d like it to mark every

data value that we think is anomalous. And so in this case, this is a different set of data, but we’ve put what they call a rug, a little red tick, underneath each data point that our model predicts is incorrect. And then the other thing you’d like it to

do is to impute, or predict, the missing values: what the thermometer should have been reading if it had been working correctly. And we’re going to do both of these things using a probabilistic model. So, the basic probabilistic model we’re

going to use, and these are, you know, Bayesian networks or probabilistic graphical models, is the following. We’re going to have one node here for each of our variables of interest, and the one that is gray, that’s an observed node. So, this is the observed temperature at time t. And then there is a hidden node, which is our true temperature that we wish

we could observe directly. Then up here is our sensor state variable. And I’ve made

it a box to indicate that it’s discrete, whereas these are continuous variables. And the idea is a very simple sensor model that says: when the sensor state is one, that is, normal or working, then the observed temperature has a Gaussian distribution whose mean is the true temperature x but with some small variance around that. But when the thermometer is broken, and so the state is zero, then the observed temperature has a mean of zero and a gigantic variance. So, basically what we’re saying is that it’s completely unrelated to the true temperature. So, this is a very simple model, and why do we adopt this kind of model? Well,

you could try to think about this kind of data as if it were a diagnosis problem: the sensor has various fault modes and failure modes, and you want to predict what they are.
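The simple working/broken observation model just described already supports the key computation: the posterior probability that the sensor is broken, given a reading and the temperature the model expects. A minimal sketch, where the variances and the prior are made-up illustrative values, not the system's actual parameters:

```python
import math

def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) at y."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def p_broken(y_obs, x_true, prior_broken=0.05,
             var_ok=0.25, broken_mean=0.0, var_broken=400.0):
    """Posterior P(state = broken | observed reading), by Bayes' rule.
    Working sensor: y ~ N(x_true, var_ok); broken: y ~ N(0, var_broken)."""
    like_ok = gaussian_pdf(y_obs, x_true, var_ok)
    like_bad = gaussian_pdf(y_obs, broken_mean, var_broken)
    num = prior_broken * like_bad
    return num / (num + (1 - prior_broken) * like_ok)

# A reading close to the expected true temperature looks fine,
# while a reading ten degrees off is almost certainly a fault.
print(p_broken(12.3, 12.0))
print(p_broken(22.0, 12.0))
```

The huge broken-state variance is exactly the "generic failure model" role: any reading far from the expected true temperature becomes more probable under the broken state than under the tight working-state Gaussian.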

And so you could do a kind of Bayesian diagnosis where you could say well given the sensor

readings and my expectations it looks like it’s a broken sunshield or it looks like

it’s a flat line because of a communications failure or something like this. But the trouble

is we were not confident that we could enumerate in advance all the ways a sensor could fail.

We wanted to have an open ended set. So, the idea here is to treat it more as an anomaly

detection problem where we model the normal behavior of the sensor as accurately as we

can. And then anything that is a serious departure from normal, the normal model will give it very low likelihood, and it’ll instead get picked up by this sort of very generic failure model. So, that’s the idea here. So, we can do anomaly detection then by doing

probabilistic inference. We ask the query you know what is the most likely value of

the state of this sensor at time t. And that’s just the argmax over the possible states of

the probability of the states given the observation. And we can also do imputation by asking instead

what’s the most likely temperature given the observed temperature. So, basic probabilistic

inference techniques work just fine. But of course this is a very bad model of the sensor

here. So, the next thing we want to do is add some sort of Markov Model so that we can

look at the history of the sensor. Because we’d like to say well sensors, if it was

working fifteen minutes ago, it’s probably still working now. And if it was broken fifteen

minutes ago, it’s very likely it’s still broken now. So, we’d like to do that. And

similarly of course the actual real temperature doesn’t change that drastically either.

So, we’d like to have some model of how the true temperature changes over time. So this

gives us now a Markov version of this. And now we can ask a query like what’s the most

likely state of this sensor at this time given the entire observation history. And that also

can be reasonably calculated. But we can go even further than this if we have multiple

sensors as we do on these towers. We could build a separate copy of the model for each

of them and then couple those somehow. So we could say that you know if we know the

temperature of the sensor at the bottom of the tower then we should be able to predict

with reasonable accuracy the sensor next up on the tower. And so this is the kind of thing

we do. In general we learn a sparse joint Gaussian distribution among all of the true-temperature variables, so that we have a connected model. Unfortunately, probabilistic inference in these

models starts to become intractable. So, even in the single sensor model, which, with the Markovian independence, you would think would not be a problem, it is, because of our observed variable. If all the variables were discrete then we could solve it very easily; a simple message passing algorithm will do it. But because our variables are continuous conditional Gaussians, when you marginalize away the history it gives you a mixture of Gaussians that grows exponentially with the number of time steps. And so it becomes impractical to do, you know, more than just a few time steps. That won’t work. So, what we do is basically a forward

filtering process where at each time step we ask, what’s the most likely state of my sensor? And then we say, okay, we’ll believe it. We’ll adopt that state and treat it as evidence, and then at time two we’ll ask, okay, what’s the most likely state at time two given that I already committed to the state at time one? And we also have to bound the variance on the true temperature, because if you have a whole long stretch where the sensor is bad, the true temperature becomes extremely uncertain, and you can’t let that grow too far. Probabilistic inference is also infeasible in

the Multiple Sensor Model, even if you follow this step-by-step commitment strategy. And so the solution we’re using right now, which seems to work best, is something we’re calling Search MAP, where at each time step you start by assuming that all of the sensors are working. And you score how well that accounts for the observations. And then you ask, can I improve that score by breaking one of the sensors? And you do this in a greedy algorithm, basically hill climbing, to try to find a MAP solution. You don’t always find the true maximum, because there are local optima. But even the simple greedy algorithm takes polynomial time that’s quite substantial in the number of sensors. Yeah? (Unintelligible) working even if in the previous

time’s commitment you decided one of them was broken? That’s what we’re doing right now. But

we could start with yeah with our map guess from the previous time step too. And you can

also consider a variation where having broken one sensor you might reconsider your previous

decision, in which case you can do what’s sometimes called a floating greedy algorithm, which takes even longer but gives you better solutions.
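A minimal sketch of this greedy Search-MAP idea. The scoring details, variances, and broken-state prior here are illustrative assumptions; the real system scores states under the coupled multi-sensor Gaussian model:

```python
import math

def log_n(y, mean, var):
    """Log density of N(mean, var) at y."""
    return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)

def search_map(obs, pred, var_ok=0.25, var_broken=400.0, log_prior_broken=-3.0):
    """Greedy hill-climbing MAP over sensor working/broken states.
    obs[i]: reading of sensor i; pred[i]: temperature the model expects.
    Start with every sensor 'working'; switch a sensor to 'broken' whenever
    that improves the penalized log-likelihood; stop at a local maximum."""
    broken = set()

    def score(b):
        s = 0.0
        for i, (y, m) in enumerate(zip(obs, pred)):
            if i in b:
                s += log_n(y, 0.0, var_broken) + log_prior_broken
            else:
                s += log_n(y, m, var_ok)
        return s

    best = score(broken)
    improved = True
    while improved:
        improved = False
        for i in range(len(obs)):
            if i in broken:
                continue
            cand = broken | {i}
            s = score(cand)
            if s > best:
                best, broken, improved = s, cand, True
    return broken

# Sensor 2 reads 30 degrees while the model expects about 12: flagged as broken.
print(search_map(obs=[12.1, 11.8, 30.0, 12.4], pred=[12.0, 12.0, 12.0, 12.0]))
```

The floating variant mentioned above would additionally try un-breaking already-flagged sensors after each flip, trading extra score evaluations for a chance to escape some local optima.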

And we’ve tried a whole bunch of other things you know, various kinds of expectation propagation

and the whole bag of tricks in the machine learning probabilistic modeling area, but…actually

one thing we haven’t tried yet is particle filters. He’s working on that right now.

(Unintelligible). Rob (unintelligible). Well here are single sensor results. So, on

the broken sunshield, the bottom plot is the data again, and the top plot is the predicted temperature of just the thermometer that’s closest to the ground. And then along the periphery we color code it: our domain experts wanted us to not just have broken or working but to actually have four levels of performance from very good, good, bad, and very bad. So,

very bad would be black, and there are just a couple of black spots at the peaks of these days. But otherwise it’s mostly marked things as red

for bad. And at night, of course, it’s still a very good sensor. So, we’re able to do this using just a single sensor model. And there’s a lot more in the single sensor model: we build a baseline expectation based on previous years, so that we know

what week six looks like in general. And then for the Multi Sensor Case, Ethan

did an internship at EPFL in Switzerland, where they put out these short-term deployments of sensor networks. And he learned a conditional Gaussian Bayesian network over the true temperatures and then fit that combined model. And so these are the results. And you can see it's doing quite well in some cases. It's picking out a lot of these things where we have extremely bad spiky sensors. But on these long flat lines it's doing okay, except sometimes, when the dashed line here is the imputed value, when the predicted value happens to coincide with the flat line, it says oh, the sensor's working again. So, this is a case where we probably really should have a flat-line model, because these flat lines happen when the data link is lost. Okay. And there are many other challenges.
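That flat-line failure mode can also be screened for directly. A minimal sketch; the function name, the span representation, and the thresholds here are made up for illustration:

```python
def flat_line_spans(readings, min_len=4, tol=0.0):
    """Return half-open (start, end) index spans where a sensor reports an
    (almost) constant value for at least min_len consecutive samples --
    the signature of a lost data link rather than a real temperature."""
    spans, start = [], 0
    for i in range(1, len(readings) + 1):
        # close the current run at end-of-data or when the value moves
        if i == len(readings) or abs(readings[i] - readings[start]) > tol:
            if i - start >= min_len:
                spans.append((start, i))
            start = i
    return spans
```

A real deployment would tune `min_len` and `tol` to the sensor's sampling rate and noise floor, since genuinely stable night-time temperatures should not be flagged.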

I mean, we're working with a single time step, but of course it really should be multiple scales. And we're also working on integrating more heterogeneous sensors than just temperature. Okay. Well, so that's an example of this

automated data cleaning work. The next problem is model fitting with an explicit detection model. And this is work by a post-doc of mine, Rebecca Hutchinson, who's wrapping up her post-doc later this spring. And I already talked about species distribution modeling. Often, particularly with birds and wildlife in general, when you go out and do a wildlife survey the species could be there, but you just fail to detect it. And this is a well-known problem in ecology. So, imagine that there's some landscape and we've chosen some set of these black squares that we're going to go survey, but when we go out there it turns out some of the birds are in the vegetation and we don't see them. So, although every one of those squares was occupied by our species, we only see it twice. What can we do about that? Well, one solution is to make repeated visits that

are close enough together in time that you think the birds have not moved around. Like

during, when they’re sitting on their nests or something. But far enough in time that

you think you’re getting independent measurements of this, of their hiding behavior. So, if

we go back another day, maybe now we see the bird in the first cell, but the

bird in the second cell is hiding. The third one we still think is unoccupied, because

that bird was hiding the whole time and so on. So, this is one strategy that you can

use. And if you look at the kind of data you get, you get what are called detection histories.

So, suppose we have four different sites. Three of them are in forests, and one is in

grassland. And suppose that there is this true occupancy, which we say is a latent or

hidden variable, right. And the first three sites are occupied and the fourth one is unoccupied.

But we don’t know that. That’s hidden from us. So, on the first day we go out and

it turns out it’s a rainy day and we’re going out at lunch time, and we don’t see

any birds. So, we have all zeros here. Now, another day, we go out early in the morning.

It’s a very good time to go birding and it’s a clear day. And we detect the birds

in the first two sites, but we don’t detect this guy here in site three, and of course

we don’t detect anything at site four. So, we’re going to assume no false detections

here, no hallucinations. Although, that’s not always a safe assumption. And then the

third day, it’s a clear day, but we’re a little late getting out. So, we only see

we only detect the bird in the first site. So, a string like 0, 1, 1 or 0, 1, 0 is called a detection history. And from the detection histories you can estimate,

if you assume these are independent trials of your detection ability, a naïve estimate of your detection probability. So, in this case we know from our data that sites A and B are occupied by the species. And we know we had six opportunities to detect the birds, three at each site, and we succeeded three times. So, our naïve estimate of our detection probability would be point five. But in fact we really had nine chances to observe this species, and we only saw it three times. So, our true detection probability, or at least the maximum likelihood estimate thereof, would be one third. So, the big challenge is how can we tell the

difference between an all-zeros history that is due to our failure to detect versus an all-zeros history that's due to the fact that the site is unoccupied. And the answer of course is to build a probabilistic model. And so this is a plate-style model.

And for those of you who aren't familiar with the notation, think of these dashed boxes as being for loops. So, we have a loop where we iterate over the sites; i indexes the sites. And x_i is some set of features that describes the site, like it's a forest and it's at three hundred meters of elevation. And at each site, based on its features or its properties, there's going to be some occupancy probability, call it o_i. And we're going to assume that birds toss a coin with probability of heads o_i to decide whether to occupy a site. And z_i is the true occupancy status of that site, either a zero or a one. Now the variable t is going to index over our visits to that site when we go observing. So, w_it is some description, like that it was 6 a.m. and it was sunny, of factors that might influence or account for our detection probability. And then y_it is the actual report, the data that we get. So, we actually observe x, w, and y, when we really want z. So, we'd like to extract out of this z_i, which gives the species distribution model: the probability of the site being occupied given the properties of that site. I'm going to name that function f. So, f of x_i is going to be the occupancy probability, and we'd love to plot that on a map. But then we have this nuisance model, which is our observation model, and we'll let d_it be the value of the function g that is our detection probability. And so we can say the probability of reporting a 1, that we saw the bird, is the product of z_i, which will be 1 if the bird is there, and d_it, which is the probability of detection.
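As an illustration (not the authors' code), the generative story just described, and the posterior that separates "unoccupied" from "occupied but never detected," can be written out directly, using o_i for the occupancy probability f(x_i) and d_it for the detection probability g(w_it):

```python
import random

def simulate_site(o_i, d, rng):
    """Generative story for one site: latent occupancy z_i ~ Bernoulli(o_i),
    and visit t reports y_it = 1 with probability z_i * d[t]
    (so there are no false detections)."""
    z = 1 if rng.random() < o_i else 0
    y = [1 if z == 1 and rng.random() < d_it else 0 for d_it in d]
    return z, y

def posterior_occupancy(o_i, d, y):
    """P(z_i = 1 | detection history y): distinguishes an all-zeros history
    caused by failure to detect from one caused by an unoccupied site."""
    if any(y):
        return 1.0                  # any detection proves occupancy
    miss = 1.0
    for d_it in d:
        miss *= 1.0 - d_it          # probability of missing on every visit
    return o_i * miss / (o_i * miss + (1.0 - o_i))
```

For example, with o_i = 0.5 and three visits at detection probability one third, an all-zero history drops the occupancy posterior from 0.5 to 8/35, about 0.23.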

So, that's the model. And this was developed by MacKenzie and colleagues at the USGS, and it is a very nice and well-established model. But I'm a machine learning person. And you

know in machine learning there are sort of two parallel communities. There's the community that loves probabilistic models and there's the community that loves non-parametric kinds of decision models like support vector machines and decision trees. And these two communities, well, there are people like me who have one foot in both camps, but they really have very different outlooks. Why do we like probabilistic graphical models?

Well, it’s a terrific language for expressing our models. And we have wonderful machinery

using probabilistic inference for reasoning about them. So, we know what the semantics

of the models are, or at least what they're intended to be. And we can also write down

models that have hidden variables, latent variables that describe some hidden process

that we’re trying to make inferences about. So, probabilistic graphical models are kind

of like the declarative representation of machine learning. But there are some disadvantages,

particularly when you’re exploring in a new domain and you don’t understand the

system well. Because you, as the designer, have to choose the parametric form of each of the probability distributions in the model, and you need to decide whether you think there are interactions among the variables, and you need to include those interactions in the model. The data typically have to be pretreated, scaled and so on; if you assumed linearity in your model, you may need to transform your data so that it will have a linear relationship. And one of the most important things we've learned in machine learning

is the importance of adapting the complexity of your model to the complexity of the data.

And it's difficult to adapt the complexity of a parametric model. I mean, there are some things you can do with regularization, but it's not as flexible as using these flexible machine learning models. So, you know, back at that very first machine learning workshop, from which that book came out, Ross Quinlan gave a talk about a classification tree method that he was developing. And it was about a couple of years later that Leo Breiman and company published the book on CART. So, classification and regression trees are

a very powerful kind of exploratory non-parametric method. And one of the beauties is that you can just use them off the shelf. Right? You don't have to design your model. You don't have to pre-process or transform your data. They automatically discover interactions if they're there, and sometimes even if they're not there. And they can achieve higher accuracy if you use them in ensembles, so boosting and bagging and random forest-type techniques. And then of course since then the support vector

machine kind of revolution has swept through machine learning. And these still require

the same data preprocessing and transformation steps, but by using kernels you can introduce

non-linearities in an extremely flexible way. And there are very powerful ways of tuning the model complexity to match the complexity of the problem. So, they work remarkably well, also without a lot of careful design work. So, a challenge is: can we have our cake and

eat it too? Can we write down probabilistic graphical models with latent variables in

them that describe processes we care about and yet also have the benefits of these non-parametric

methods? And this is a major open problem in machine learning. And there are several

efforts. There's been a lot of work recently in the SVM family. There's Bayesian non-parametrics that use mixture models. The approach we're exploring is boosted regression trees. So, I don't really have a lot of time to

describe boosted regression trees. But they grew out of boosting work in machine learning. And then first Mason and then Jerry Friedman at Stanford noticed that these could really be viewed as part of a generic algorithm schema, where you're going to fit a weighted sum of regression trees to data. And so he developed this thing called boosted tree regression, or tree boosting. So, the standard approach in these occupancy

models is to represent these functions f and g as log-linear models, or logistic

regressions. What we’re going to do is replace those functions f and g with non-parametric

flexible models, boosted regression trees. And this can be done using this algorithm

schema called functional gradient descent, or you could actually also do functional EM.
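To give a flavor of the boosted-regression-tree building block, here is a toy squared-error version of functional gradient descent with depth-1 trees (stumps) on one feature. This is only the generic tree-boosting schema, not the occupancy-model fitting itself, which also has to handle the latent z variables; all names are illustrative.

```python
def fit_stump(xs, ys):
    """Depth-1 regression tree: exhaustive search for the split that
    minimizes squared error -- the basic CART-style growing step.
    Assumes at least two distinct x values."""
    best = (float("inf"), None, 0.0, 0.0)
    for s in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= s]
        right = [y for x, y in zip(xs, ys) if x > s]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = (sum((y - ml) ** 2 for y in left)
               + sum((y - mr) ** 2 for y in right))
        if err < best[0]:
            best = (err, s, ml, mr)
    return best[1:]                      # (threshold, left_mean, right_mean)

def boost(xs, ys, rounds=50, lr=0.1):
    """Gradient boosting for squared loss: each round fits a stump to the
    current residuals (the negative functional gradient) and adds a small
    step of it to the running prediction."""
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        t, ml, mr = fit_stump(xs, resid)
        stumps.append((t, ml, mr))
        pred = [p + lr * (ml if x <= t else mr) for x, p in zip(xs, pred)]
    return stumps, pred
```

The residuals shrink geometrically with the learning rate, which is why many small trees can match a non-linear target without any hand-designed transformation of the inputs.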

And we had a paper at AAAI last summer that describes the method. So, I'll just give you a little flavor of the results. Of course there are methodological problems in studying latent variable models, namely that you don't know the true values of those variables; they're hidden from you. So, how do you know whether you're doing well? And so I'm going to describe results for one synthetic bird species, where we simulate a species using real site data but fake occupancy and detection functions. So, we made this model additive,

but non-linear. And this is a scatter plot showing on the horizontal axis the true occupancy

probabilities for this simulated species. And on the vertical axis what different families

of models predict. So, the left column is models that are trained without latent variables

treating it as a supervised learning problem. And you can see that they systematically underestimate

the true occupancy probabilities because they assume the only positive examples they saw

were the cases when you actually detected the bird, which is obviously an underestimate

of what’s really going on. In the right hand column are ones that are using this latent

variable model, the Occupancy-Detection model, the OD model. And then the top row is where we're using logistic regression as our parameterization. And you can see that on the top right, it's

more or less unbiased. So, the true probabilities and the predicted ones more or less lie on

that diagonal line which is where they should be. But there’s a lot of scatter and that’s

because the true model is non-linear and we’re fitting a linear model. Whereas if we use

the boosted regression trees on the bottom we're doing a lot better. We're much closer

to the line. I’d like to omit a couple of the points that are far from the line. But

otherwise we’re pretty happy with that fit. And so, in general this is what we find is

that we can train these flexible boosted regression tree models within a graphical models framework

and get more accurate results. And so we’ve been applying this to several bird species

data. So, it looks like I'm running tight on time here. So, let me briefly describe the final problem, which is managing fire in eastern

Oregon. Conveniently, this is the problem where we don't have any results yet. If I hadn't said anything, you wouldn't have noticed. But, so this is now a policy problem, not

really a data problem. So, you know since the late 1910’s, 1920’s

the U.S. Forest Service had a policy of suppressing all fires essentially. It was part of the

kind of political argument that was used to sell the creation of the Forest Service was

that we will prevent these terrible catastrophic wildfires. Of course it turns out you can’t

prevent them. You can only postpone them. And that’s now coming to pass that our forests

are filled with, we believe that the sort of natural state of forests particularly in

eastern Oregon, we should look something like this where we have very large Ponderosa Pines,

and then what’s called an open understory, so just very small vegetation on the ground. I don’t have a picture for it, but what

we have right now, because fire has been suppressed for a long time, is all kinds of vegetation on the forest floor. And we have small trees of all different sizes, lodgepole pines in particular, that have grown up among these ponderosa pines. When you have open ground like that and a fire happens, it burns through the ground and actually maintains that openness. The ponderosa pines have this big, thick, fire-resistant bark, and they're actually happy with this fire coming through and getting rid of some of their competitors. But since that hasn't been happening, now when a fire happens it is able to climb up the smaller vegetation, reach the crown, and actually destroy the forest, kill all the trees. And you end up with these really very intense catastrophic fires. And so one question is: is there anything we can do to manage this landscape? And so we have

a study area in eastern Oregon that's divided up into about 4,000 cells. They're irregularly shaped; they're based on homogeneity of the landscape there. And there are four things you can do to each of these cells each year. You can do nothing. You can do what's called mechanical fuel treatment, so you send people in and they cut down a lot of that small vegetation and cart it out. You can do clear-cutting, where you harvest the trees but leave behind a lot of debris, which, while it gives you timber value, actually increases fire risk. Or you can do clear-cutting and fuel treatment, and then fire just can't burn at all in that area, at least for a few years. So, the question is how should we position

these treatments in the landscape if we want to say minimize the risk of big catastrophic

fires and maybe maximize the probability of these low intensity ground fires. Well we

can think about this as kind of a game against nature. In each time step we can observe the

current state of the landscape. Maybe this is like a fire risk map. And then we choose

an action. We have to choose an action, which is actually a vector of actions. One action

in each cell. So, these are the actions: maybe we choose to treat these particular cells. And then nature has its turn, and it lights fires and burns them. And then it's our turn again. And so we can model this as a big Markov decision

process. But unfortunately it's a Markov decision process with an exponentially large state space. So, if each of these cells in my landscape has five tree ages and five fuel levels, then I have twenty-five to the four-thousandth power possible states of the landscape, which is not going to fit into memory very easily. And similarly, each time I take an action, I have an action vector with four thousand elements and four possibilities in each position. So, I have four to the four-thousandth possible actions to consider. Even with all the cleverness of the reinforcement learning community and

approximate dynamic programming we don’t know how to solve these problems. There’s been a little bit of work. There

was a paper by Wei et al. a couple of years ago where they looked at just a one-year planning

problem. So, if I just had one year to make treatments and then there’s going to be

fires in a hundred years, where should I put my treatments? And they were able to formulate

and solve a mixed integer program for this optimal one-shot solution. They were just

completely trying to prevent fire, which is really not the right problem. But in any case,

we’re trying now to see whether we can build on that work or come up with some method where

we can solve this MDP over a hundred-year horizon. Okay. So, in summary, I've talked about this

pipeline for the ways computation could help in addressing problems in ecology and ecosystem

management. I’ve talked about automated data cleaning, about fitting these flexible

models within a latent variable modeling framework. And then very briefly about policy optimization.

And as I mentioned this is part of our larger effort in what we call computational sustainability.

And there are many other opportunities to contribute. You know, I haven't talked

about energy. I haven’t talked about sustainable development or smart cities or any of these

things. But there are lots of computational problems there as well. I’d like to point out that the Computing

Community Consortium I think the CCC is funding some travel grants and prizes for papers in

this area at several AI conferences. I know about ICML and AAAI, but I think

there are some other conferences where they’re doing this this year. So, there’s a special

track for that that you could submit to. And through my joint grant with Cornell, we have created

something called the Institute for Computational Sustainability. And we have a website with

all kinds of information about what’s going on, not just in our own research, but throughout

the computer science community. And I’ll just thank the people that I mentioned

at the start of the talk. On the fire project there are two other graduate students, Rachel

Houtman and Sean McGregor who have been working there and of course the National Science Foundation

that has been very generous here. Well, thank you for your attention and I’ll

answer questions. So, how does this work local versus remote? So, what we usually do is give

the remote sites a chance to go first, because they might lose the connection later on. Okay. Remote sites? Go ahead. (Question being asked) Okay. Yeah. So, what they do is they run several

thousand fires, simulated fires. And try to calculate for each cell in their landscape

the probability that it will burn. And they decompose that into the probability that it

will burn because the fire ignited in that cell. Or the probability that it will burn

because fire propagated from one of its neighbors. So, they can basically build a sort of probabilistic

flow model that says the probability that this cell will burn conditioned on whether

its neighbors burned. And then they can model a fuel treatment, which they model simply

as: if I treat this cell, then no fire will be able to propagate through that cell. Okay.

And so with a couple of other approximations they can turn this into a flow problem, basically: we want to minimize the total flow, subject to some budget constraint about how many cells we can afford to treat. And so they

basically then have one integer variable for each cell and an objective, and then they can solve it. I mean, in our case there would be four thousand integer variables, which would be a little bit scary. Their problem I think had more like nine hundred cells though.
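A toy version of that flow idea can be sketched on a one-dimensional strip of cells, where a treated cell blocks all propagation and a greedy search stands in for solving the integer program exactly. All names and probabilities here are illustrative, not the Wei et al. formulation:

```python
def burn_probs(n, p_ignite, p_spread, treated):
    """Per-cell burn probability when a cell burns if fire ignites locally
    or arrives from its left neighbor; treated cells can neither burn
    nor propagate fire."""
    probs = []
    for i in range(n):
        if i in treated:
            probs.append(0.0)
            continue
        arrive = probs[i - 1] * p_spread if i > 0 else 0.0
        probs.append(1.0 - (1.0 - p_ignite) * (1.0 - arrive))
    return probs

def greedy_treat(n, p_ignite, p_spread, budget):
    """Under a budget, repeatedly treat the cell whose treatment most
    reduces the total expected number of burned cells."""
    treated = set()
    for _ in range(budget):
        base = sum(burn_probs(n, p_ignite, p_spread, treated))
        best_cell, best_gain = None, 0.0
        for c in range(n):
            if c in treated:
                continue
            gain = base - sum(burn_probs(n, p_ignite, p_spread, treated | {c}))
            if gain > best_gain:
                best_cell, best_gain = c, gain
        if best_cell is None:
            break
        treated.add(best_cell)
    return treated
```

The real formulation works on an irregular 2-D landscape graph and solves for all treatments jointly, which is exactly where the mixed-integer programming machinery earns its keep.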

So, it's still quite substantial. But you know, CPLEX is a wonderful thing. And so it was able to find the solution to that. (Question being asked) Uh huh. Okay. Right. Well, this was one-shot

as opposed to sequential decision making. So here we just get one time step at which we're allowed to take actions, and then from there on out nature gets all the moves in the game. So, that's the sense in which it's a single decision, a one-shot plan, totally up-front planning in other words. And there are a lot of problems in ecology where we end up having to take that view, where we're just going to say we want to buy all the following territory. So, we've looked at some; there's an endangered

species called the Red-cockaded Woodpecker that I believe is here in North Carolina. And my post-doc Dan Sheldon did some very nice work where the question was: there are two pockets of this species, one I think at Camp Lejeune and the other in the Palmetto Palm Reserve or something like this. And the question was could they buy a series of intermediate sites to encourage those two populations to mix and have some genetic flow between them. So,

it’s a problem of basically trying to encourage flow instead of trying to prevent flow. And

they were able to also formulate this and solve it for the one-shot case in terms of

building a network that would maximize flow subject to budget constraints. But the real problem, you can’t buy all

the property all at once. You don’t have the money and it isn’t all available. So,

you really need to be online and every year take some actions that you can afford to take to keep moving toward that objective. So, turning that into a Markov decision problem, or what's often called adaptive management in the environmental literature, that's

still an open problem. We don’t know how to do that. (Question being asked) That, yeah that is a good question. And we

do wonder if there is some way we could come up with some set of spatial basis functions. Suppose that we had an optimal policy for laying out treatments in a landscape, but we could only compute it for a particular fixed landscape; could we somehow generalize from that to a more general policy? Maybe some kind of set of spatial basis functions would allow us to do that. And the same is true for looking at the sort of structure of the landscape. There's certainly a lot of work done in

atmospheric sciences and weather where they basically use PCA to create a set of basis

functions that they can use then to approximate a lot of things. So, it’s something we’d

like to explore more. (Question being asked) I’m sorry. Right. Well particularly here

we're intervening in the system, and so the trouble is that you have this search space where if I take these actions then these fires will burn, and if I take these other actions something else will happen. And you end up having to do exponentially many simulations just to evaluate one set of scenarios. And so obviously we have to rely on some kind of sampling, or some way of finding the spatial scale beyond which we can ignore the spatial components. It's not clear really how to proceed. (Question being asked) Well, that is a very good question. Right now

we’ve mostly been looking at just this one site. And we have the weather data and all

the data about the sites, which we need to be able to do the work. And it’s a good

question whether there are generalizable lessons that you could take away from this. I also have some projects in invasive species management, and we're asking the same question

there. And often it's kind of disturbing. I mean, you get a solution like this big map, wherever it was, that says, well, these are the optimal places to treat. But is there any pattern to that? Is there any way that we could explain it as a sort of set of rules that we could apply to a different situation? How could we generalize from this particular landscape? And we need to do that just to explain

it to our domain experts. And obviously policy makers are not going to be happy just being

told well it’s optimal. Our algorithm said so. Particularly because we won’t be able

to say that. We’ll have to say it’s approximately optimal, but we don’t know how bad or something

like that. And so we're really going to need to be able to give them some qualitative understanding and let them play with it, and modify it, and explore, and understand how good it is. And that's a huge challenge: once you've done ten million simulations, what lesson can you take away from it? Okay. So, I've got a question that maybe dovetails

with that. Uh huh. So, to what extent do you feel like these

techniques and the recommendations or policies that you’re producing using these techniques

are getting traction with the people who are actually implementing policy decisions and

you know is it something where you feel like you’re having impact now, you feel like

maybe it’ll be five years, ten years, how, you know what time scale are we talking about

here? Um I would guess in five to ten years. I mean

we’re very fortunate with the forest situation that we have some of the Forest Service people

on our team. And a lot of them are former students of Claire Montgomery who was on the

team. And so we have a nice working relationship with them. But it is a good question whether they would ever be able to execute our particular policies.

I think one of the main things we’re trying to do is give them backup ammunition for being

able to support the actions that they are taking. Right now the idea that they might

want to treat the landscape in a particular way or in a related problem they might want

to let a fire burn instead of suppressing it, that’s an extremely controversial politically

difficult decision. If we could provide some analysis that shows that yes, under a wide

variety of scenarios it would be better to let this fire burn, or better to treat this than those other things, that might help them persuade their stakeholders

to go along with it. Of course another thing that would help them persuade their stakeholders

is if we could say well and for these small communities that have timber mills we can

also guarantee you a certain economic benefit from doing this. And so there’s a whole

set of economic objectives perhaps that we would like to have. We'd maybe also like to have a whole bunch of endangered species habitat objectives. So, the real problem you

know gets messier and messier. But we won’t be able to attack any of those unless we really

can come up with a methodology that works for these problems. What you just laid out is a hard scenario

for any algorithm, you know, any procedure, to optimize. But as it is, it has to be optimized by humans. I mean, in other words, there are people actually making decisions about whether or not to let a fire burn, and they have to process all of it. So… Right. I mean. Well, mostly they are not letting fires burn,

because it’s just too risky and plus the firefighting money doesn’t come out of their

budget. It’s somebody else’s budget. So, there’s not really an incentive for them.

For the fuel treatment though you’re right. Right now they are making some guesses about

where to treat, trying to balance all of these issues, and I would say they’re not very

happy with that. They would like some more rational basis for making those decisions. Yeah. I guess my point was you may not have

to get optimal. You may just have to do better than humans guessing. Right. Well, but we have to convince them

that we are doing better, yeah. And that comes into a lot of this broader contextual thing

as well. Yes. You could sort of apply the basic approach. Could you just take the particular plans or policies that they are using, or thinking of using, as a prior and then go from there and simplify your model, because you're working from a targeted assumption base… Uh huh. Oh, that's an interesting idea, yeah. It would

be to see if we could in some sense model what they’re doing and then ask locally

how we could improve it, maybe without walking too far away from it so it doesn't look so strange or threatening. No, we hadn't thought about that, but that's an interesting

idea. Okay. Thanks. Well thank you very much. My pleasure.