danalwyn: (Default)
[personal profile] danalwyn
I've been to a lot of conferences lately, in the past four months, and I'm not sure I like it.  So far, none of them have been Physics conferences.  My suspicion that I'm being turned into a computer specialist is pretty much confirmed.  Anyway, I've now come back from Indianapolis, and am not particularly happy with the outcome (as it seems a little vague) but I suppose it was as much as could be expected.

So I thought, just to torment you, I would actually bother to tell you what I'm doing.

There are three types of computers in the world of science: dedicated computers, grid computers, and private computers.  This is how work gets done.

I'm sure you've noticed over the years that computers have gotten more and more important.  In science, they are now almighty.  Everything that we do is tested on a computer first (unless the experimenter is intentionally courting disaster), from simulations to control over experimental equipment.  Computers allow us not only to sort data rapidly, but also to predict in advance what is going to happen and what we should be looking for when it does occur.  This has resulted in massive computing divisions that do nothing but provide code and administrator support for scientific endeavors, frequently tearing out their hair in the meantime.

For the past couple of years, everything has depended on dedicated computers.  Dedicated computers are what scientists buy out of their grant money.  For a fairly standard half million dollar grant (after the university takes its cut), a professor might invest $100,000 in computers, usually half for hardware (the actual computers) and half for operational expenses (power, air conditioning, etc.).  Larger projects, like the Human Genome Project, have thousands of computers.  These are machines owned by a project and "dedicated" toward working on that project's goals.  They are available for sorting data (as in the Human Genome Project), for running simulations (as in CDF's distributed CAFs), or for running analysis over that data (everyone does this).

The problem is that you always need more, and you never use them correctly.  CDF just bought 240 new nodes, meaning 240 boxes that get stuck in a rack at the computer center.  Each node has four dual-core Opteron processors in it.  To a computing person this is a lot of computing power, but now, just two months after installation, we already need more.  This is because, as with traffic, jobs expand to fill the CPUs available to run them.  That is only during peak hours, though; there are times when we actually have empty space.  For instance, few people submit jobs to the cluster between 8pm Sunday and 7am Monday, Fermilab time (depending on what the Liverpool people want out of us).  This isn't something easy to control; it's just the way that people are.  So, even though we are full a lot of the time, there are stretches where we have computing power to spare.  It's like parking spaces: not enough when you need them, too many when you don't.

And it's really expensive to run all these computers.  Generally it takes about a megawatt of power to run the computers in our computing center, and another megawatt to run the air conditioning.  In 2003, electricity companies delivered approximately 1.3 trillion kWh of electricity to America's approximately 110 million households, or about 11,800 kWh per household per year.  By that standard, Fermilab computing uses the same amount of energy in about 6 hours that an average American household uses in a year.  Various schemes to reduce the cost have been tried, including upgrading the computers and building more efficient air conditioning, but so far nothing has had much of an effect.  It takes a lot of money to keep running these things.
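For anyone who wants to check my math, here is the household comparison as a few lines of Python.  The inputs are just the round numbers quoted above (one megawatt for the machines, one for the cooling), and the variable names are mine, so treat it as a back-of-the-envelope sketch rather than an official Fermilab figure.

```python
# Rough figures quoted in the post; all of them are approximate.
TOTAL_RESIDENTIAL_KWH = 1.3e12   # kWh delivered to US households in 2003
US_HOUSEHOLDS = 110e6            # approximate number of households

# Assumed draw of the computing center: ~1 MW computers + ~1 MW cooling.
CENTER_POWER_KW = (1 + 1) * 1000

# Energy one average household uses in a year.
kwh_per_household = TOTAL_RESIDENTIAL_KWH / US_HOUSEHOLDS

# Hours the computing center needs to burn through that much energy.
hours_to_match = kwh_per_household / CENTER_POWER_KW

print(f"{kwh_per_household:,.0f} kWh per household per year")
print(f"computing center matches that in {hours_to_match:.1f} hours")
```

The result lands just under six hours, which is where the figure above comes from.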

So, in recent years, we've tried to reduce costs by scattering our infrastructure and moving from dedicated computers to grid, and ultimately to private resources.

To explain private resources, I need to enlist the help of my friend James.  James, get out here.  James?  You're on.  James?

Just a second here...

This is James.  He's sort of shy and he's a bit uncomfortable with appearing in public.  He says hi.  He also says that he would be a lot less uncomfortable if I were not twisting his arm behind his back, but I don't listen to him very often.  Anyway, James thinks he has a bit of an idea.  He fed family history, genetic information, and some other unrelated information about people into a computer program, and came up with an estimate, accurate to within five percent, of whether a given person would develop breast cancer, and, to within two years, when they would develop it.  Unfortunately, an explanation for the success is not forthcoming, which means that he can't explain, or even believe in, his own success.  It might just be a fluke.  On the off chance that it's not a fluke, we'll have to take a closer look.  To figure this out, he will need a lot more data.

The problem is that computing time is a scarce resource, and a project like this requires a great big chunk of it.  It will take a lot of processor time to sort through millions of family histories, draw the estimates, and plot out the results.  Since James can't explain what he did to create this, he's having a lot of trouble convincing the funding agencies that they should give him the money to build his own computing cluster.  It's a worthy pursuit, but science is full of worthy pursuits, and dedicated resources are rare.  So James has a compromise solution.  He wants to borrow your computer.

Yes.  Your computer.  Don't look for other people in the room.  It's the big thing with the screen that you're looking at.  Now, most everyone who has ever used Windows has looked at the Task Manager.  You can bring it up now by pressing Ctrl-Alt-Delete (don't push it twice though).  On the bottom (at least for XP) is a little counter showing you how much of your CPU you are using.  Wait a moment for the usage from the Task Manager itself to die down, and you should get a fair idea of how much you use when you are browsing the web.  For most of us, that number is less than fifty percent.  The rest of us are busy rendering porn in the background.  Anyway, James wants to borrow that fifty percent from those of us who have it.

Which isn't such a bad idea.  After all, what are you doing with it?  Would you donate the fifty percent you don't use to a worthy cause?  SETI@Home already figured out how to borrow the computing people don't use when they leave their computer on (it says a great deal about SETI@Home that its value as an infrastructure project is much greater than any scientific value that it has).  And what's wrong with that?  Everyone could have a little program that runs in the background of their computer when they are not using much of the CPU, working away at solving James's problem, or mapping global warming, or searching for gravity waves, or doing a number of other good deeds.  This would not give James much in the way of results, but there are millions of computers across the world.  A little bit from each one would add up in no time.

There is no James, of course, and even if there were, the computing he would need would not be ready for another couple of months while BOINC gets its act together.  Running on a private computer is difficult; machines are prone to be rebooted without notice.  Someone can start using their computer, forcing the process to grind to a halt.  The Internet connection may drop, and an important job may sit stranded on someone's hard drive for days.  Plus, James needs more help.  James would have to hire Mack, who would have the job of trying to get people to sign up for his little project.  James would also have to hire Cindy and Jane to actually run the computing farm, manage the flow of jobs, and deal with problems that crop up in the database.  This is a full-time endeavor with today's bleeding edge technology.
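To give a feel for why flaky home machines make this a full-time job, here is a toy sketch of the bookkeeping involved: hand out a work unit, and if the result never comes back, put the unit back in the queue for somebody else.  This is not how BOINC actually schedules anything; the function name `run_work_units`, the `volunteer_returns` callback, and the `max_handouts` cap are all made up for illustration.

```python
from collections import deque


def run_work_units(units, volunteer_returns, max_handouts=100):
    """Hand out work units one at a time, re-queuing any that get lost.

    units: list of work-unit ids.
    volunteer_returns: hypothetical callback (unit, attempt) -> bool that
        says whether the volunteer's machine sent a result back (False
        stands in for a reboot, a busy owner, or a dropped connection).
    max_handouts: safety cap so a hopeless unit cannot loop forever.
    """
    pending = deque(units)
    attempts = {unit: 0 for unit in units}
    done = []
    handouts = 0
    while pending and handouts < max_handouts:
        handouts += 1
        unit = pending.popleft()
        attempts[unit] += 1
        if volunteer_returns(unit, attempts[unit]):
            done.append(unit)       # result came back, unit is finished
        else:
            pending.append(unit)    # lost somewhere: back of the queue
    return done, attempts


# Pretend every even-numbered unit is lost on its first try.
def flaky(unit, attempt):
    return not (unit % 2 == 0 and attempt == 1)


done, attempts = run_work_units([0, 1, 2, 3], flaky)
print(done)      # order in which results finally arrived
print(attempts)  # how many times each unit had to be handed out
```

Even in this toy version half the work gets done twice, and somebody has to notice, re-issue it, and keep the database straight.  That somebody is Cindy and Jane.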

In the interim, before that technology is available, we have grid computers.  Grid computing puts clusters of computers in large centers across the world, and essentially shares them between different users.  For instance, the Compact Muon Solenoid (CMS) owns a lot of computers in computer centers in Florida and Wisconsin.  But they don't use them all the time, so when those machines are not busy, they let us take a piece of them.  I can "borrow" unused chunks from other grid sites all over the world.  And even better, if a short lived project is done with their resources, they can leave them in the pool, resulting in a continuous availability of resources to scientists everywhere.  Grid computers are "dedicated" in the sense that they are owned and operated by one group, but they can be made open to everybody in the world who wants to come along.  James could sign up and could be running (in theory) within a few hours.

Of course, it's not that easy.  We spend most of our time trying to make the damn thing work, either fighting with the people who run it, or fighting with their sites.  We are so far out on the bleeding edge that we're drowning, and the entire system totters on the verge of collapse.  Security infrastructure keeps anything useful from being done; usability requirements keep everything too vague to be actually useful.  Sites go up and down faster than the stock market; admins curse and swear and rewrite their site policies twice a day.  Still, if we could get it to work...

Well, there isn't a James now.  But one day there will be, and hopefully we'll be ready for him.

Experience tells me that only about three people read that, but I'll live.  I don't promise anything entertaining in the future though.

(no subject)

Date: 2006-05-18 03:02 pm (UTC)
From: [identity profile] aries-ascendant.livejournal.com
Experience tells me that only about three people read that, but I'll live. I don't promise anything entertaining in the future though.

Hey! I read it. Does that mean your number goes up to four?

The computer problem isn't so bad here. But large corporations are better equipped to handle the cost of running them. Just institute a hiring freeze and hire contractors that you can treat like shit! *grumbles*

I get really annoyed though when I'm using the robots and everything freezes. More often than not it's because the CPU is up around 90-100% *shrugs* I don't know if it's a problem with the network or with the computers themselves, but it really slows me down and I am BUSY!

(P.S. - You were in Indy? Ah man, we could have gotten coffee or something and lamented the horrible, awful, cold weather ;P )

(no subject)

Date: 2006-05-19 03:47 am (UTC)
From: [identity profile] danalwyn.livejournal.com
Sounds like you need your tech people to have a look at your system. Scream, loudly. That works sometimes. Although if top doesn't tell you what's causing the system freeze, chances are that you might just want to re-install your kernel. Or at least find some graduate student to hire to fix the damn thing.

The weather was fairly cold, but it wasn't too bad. Probably just as well you didn't meet up with me. People in Real Life tell me I'm weird. Then they take out restraining orders on me.

(no subject)

Date: 2006-05-19 05:03 pm (UTC)
From: [identity profile] aries-ascendant.livejournal.com
Hee, I don't have to scream loudly to get what I want. The Automation Dept. is composed of men and I'm a blonde, reasonably attractive woman in my early twenties. I just have to smile. :P

Of course, getting them to fix the actual problem, instead of the imaginary error that they insist is messing up the robot, is an entirely different matter. It took me an entire week to convince a few of them that I actually knew what I was talking about. Finally, I just threw a fit and refused to work until it got fixed.

People tell you that you're weird? That's just mean. I guess it fits with a "mad scientist" persona. I've gotten pretty adept at hiding my weirdness in Real Life encounters.

(no subject)

Date: 2006-05-18 05:10 pm (UTC)
From: [identity profile] silverjackal.livejournal.com
...the entire system totters on the verge of collapse. Security infrastructure keeps anything useful from being done; usability requirements keep everything too vague to be actually useful. Sites go up and down faster than the stock market; admins curse and swear and rewrite their site policies twice a day.

Since when do we work in the same office?

The difference is that we're chronically behind everyone else. Our admin once liquidated all of the computers planned for a department-wide update because they had become obsolete before they got around to rolling them out.

(no subject)

Date: 2006-05-19 03:42 am (UTC)
From: [identity profile] danalwyn.livejournal.com
We can't afford to get behind. We've got eight hundred physicists staring down our backs if we don't get their results out on time. Even when we do, they still yell.

Besides, we can't even afford to keep the old ones running.
