Sunday, October 07, 2007

Notice: An Update for your Perlang FPGA Cluster is Available

Ok, I've been bit by Tim Bray's Wide Finder meme.

I noticed the conversation swarm as it bubbled up, but didn't pay too much attention. Mark Masterson's article It's Time to Stop Calling Circuits "Hardware" caught my attention, as I have pondered the plasticity of the boundary between hardware and software in a previous life.

So I've been digesting the conversation swarm. It's one heck of an interesting read.

Tim presents a problem case that frames a fundamental shift occurring in modern CPU/system architectures. The shift is moving us away from ever-increasing CPU speeds towards ever-increasing CPU counts. Certain classes of problems are extremely well suited to the shift to multi-CPU/multi-core architectures. Other problems gain no direct benefit, particularly if they are migrated without change. Tim uses the problem of summarizing log file data as an example of this latter case.
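For reference, the task itself is tiny: tally which articles got fetched, straight out of an access log. A sequential perl sketch of roughly what Tim's example does (I'm assuming an Apache-style log and borrowing the /ongoing/ URL pattern from his post; adjust for your own logs) looks something like this:

    #!/usr/bin/perl
    # Sequential baseline: count fetches of /ongoing/ articles in an
    # Apache-style access log and print the ten most popular.
    use strict;
    use warnings;

    my %count;
    while (<>) {
        $count{$1}++ if m{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) };
    }

    for my $key ( ( sort { $count{$b} <=> $count{$a} } keys %count )[0 .. 9] ) {
        last unless defined $key;
        printf "%6d: %s\n", $count{$key}, $key;
    }

The point isn't this little script; it's that code like this is everywhere, and none of it gets any faster just because the cores multiply.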

Without brainpower focused on this aspect of the problem, the techniques being employed to increase aggregate compute capacity will not provide much benefit for many of the common tasks performed in IT shops.

There are three interesting aspects to Tim's conversation swarm. Two are explicit. The third is implicit.

The first aspect consists of all the solutions for the stated goal - how to leverage the latest trend in processor/system architectures for the seemingly mundane task of processing log data.

For what it's worth, here are my first thoughts on the problem of leveraging multiple cpus for the task of processing log data. My preference leans towards use of existing technology, most likely to be implemented by the people most likely to feel the pain.

Divide and conquer: (the sysadmin in me)

  • Coerce the logging engine(s) to dump into multiple log files (to multiple disks or disk channels if necessary).
  • Run a pile of processes to process the log files independently.
  • Consolidate the data - either as post processing or incrementally via some form of IPC.
  • The choice of language is immaterial, but history would probably vote for perl or shell goop.
This smacks of the type of solution Tim sees in most IT shops. It's blunt. It's probably sufficient to allow us to move on to the next point of pain. It's almost completely devoid of interest from a software engineering perspective. A rough sketch of what it might look like is below.
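Here's a perl sketch of the divide and conquer approach, assuming the logging engine has already been coerced into writing several files. The per-file worker reuses the same article-counting regex as above, and everything else (one fork per file, pipe-based consolidation) is just one plausible way to wire it up:

    #!/usr/bin/perl
    # Divide and conquer sketch: one worker process per log file, each
    # summarizing its own file, with the parent consolidating the partial
    # counts over a pipe.
    use strict;
    use warnings;

    my @logs = @ARGV;    # e.g. access.log.0 access.log.1 ...
    my (%total, @readers);

    for my $log (@logs) {
        pipe(my $read, my $write) or die "pipe: $!";
        my $pid = fork();
        die "fork: $!" unless defined $pid;

        if ($pid == 0) {                 # child: summarize one file
            close $read;
            open my $fh, '<', $log or die "$log: $!";
            my %count;
            while (<$fh>) {
                $count{$1}++ if m{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) };
            }
            print {$write} "$count{$_} $_\n" for keys %count;
            exit 0;
        }

        close $write;                    # parent keeps only the read end
        push @readers, $read;
    }

    for my $read (@readers) {            # consolidate the partial counts
        while (<$read>) {
            chomp;
            my ($n, $key) = split ' ', $_, 2;
            $total{$key} += $n;
        }
    }
    wait() for @readers;

    for my $key ( ( sort { $total{$b} <=> $total{$a} } keys %total )[0 .. 9] ) {
        last unless defined $key;
        printf "%6d: %s\n", $total{$key}, $key;
    }

Run it with the log files as arguments and the kernel will happily spread the workers across whatever cores are lying around - which is exactly the sort of blunt instrument I mean.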

Streams and Triggers: (mentioned in the conversation comments)
  • Hook into the log stream(s)
  • Spawn readers for the various data collection functions
  • Send events from the log stream(s) to the readers, processing the data as it's received
This is generally a solved problem using any one of a variety of existing programming languages/tools; a rough sketch follows.
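Again in perl, this time assuming the stream shows up on stdin (piped from tail -F, say) and that the "readers" are just callbacks inside a single process; a real implementation might give each reader its own process and feed it events over a queue:

    #!/usr/bin/perl
    # Stream-and-trigger sketch: each log line is an event, dispatched to a
    # set of reader callbacks that maintain their own running summaries.
    # The two readers here are illustrative placeholders.
    use strict;
    use warnings;

    my %hits;
    my $bytes = 0;

    my @readers = (
        sub {    # reader 1: article hit counts
            my ($line) = @_;
            $hits{$1}++ if $line =~ m{GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) };
        },
        sub {    # reader 2: total bytes transferred (last numeric field)
            my ($line) = @_;
            $bytes += $1 if $line =~ /\s(\d+)\s*$/;
        },
    );

    while (my $line = <STDIN>) {         # one event per log line
        $_->($line) for @readers;
    }

    printf "%d distinct articles, %d bytes served\n", scalar(keys %hits), $bytes;

Something like tail -F access_log | ./readers.pl wires it up; the interesting engineering is in what happens when a reader can't keep up with the stream.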

Neither of these solutions is particularly interesting, but I imagine they are the most likely to be implemented in the wild.

My final offering is more of a meta solution.
  • Formulate a red herring idea
  • Pose it to a bunch of brainy people
  • Watch them chew on it
  • Gain new insight
Oh wait. I'm getting that deja vu feeling. :-)

The second interesting aspect of the conversation swarm is the rumination over the relationship between computer languages and the shift in cpu/system architectures.

One participant (sorry, can't recall the link) offered the suggestion that it's probably easier to improve a language like Erlang than it is to modify the mainstream languages to provide the capabilities inherent in Erlang.

I don't disagree with this point of view, but Tim's observation regarding the widespread use of perl/awk/etc points to a fundamental fact in IT shops - the tool must be wickedly effective at getting the job done. Optimal performance is often optional.

So how to effectively use 64-1024 CPU machines?

First off, who says our current technologies are effectively using the existing architectures? Follow things from the hardware up the application stack - it staggers the mind.

The reality is we seldom go back and fix. We come up with clever ways to incrementally capitalize on architectural changes. We reframe existing code in ways that take advantage of changes in architectures. I'm overgeneralizing somewhat, but no matter.

At the risk of sounding like a pessimist, I think we'll end up with thousands of little SOA web services engines. Each one handling a single piece. Each one with its own HTTP stack. Each one using PHP/Perl/Ruby/etc to implement the service functions. Each one sitting on top of a tiny little mysql database. Eeeep! I just scared myself - better drop this line of thought. I'll have nightmares for weeks.

The third interesting aspect of the conversation is how it shows some of the most important characteristics of the modern concept of networks vs. groups. It's decentralized, it's unlikely to be swayed by an alpha geek, it creates a variety of unanticipated results, it's a bit messy, and it provides fertile ground for exploring the topic at some point in the future.

Good stuff!

2 comments:

mastermark said...

"... thousands of little SOA web services engines. Each one handling a single piece. Each one with its own HTTP stack. Each one using PHP/Perl/Ruby/etc to implement the service functions. Each one sitting on top of a tiny little mysql database."

"... decentralized ... it creates a variety of unanticipated results, it's a bit messy, and it provides fertile ground for exploring the topic at some point in the future."

Aloof, look closely at those 2 quotes (emphasis is mine). I say: isomorphic. I think the great challenge for us is how to find a way to enable and allow that paradigm, as a welcome and valid part of our EA -- how can we build systems this way without resulting in you having nichtmares? I think we're past the point where we can pretend that this paradigm is not, and will not be, the predominant one. I think that all efforts to "return" to a centralized, command-and-control paradigm ("return" in quotes, 'cause I think this was always an illusion) are a clear waste of time.

Aloof Schipperke said...

Your comment prompted me to think about it a bit more.

You're probably right.

It might be interesting to combine concepts from cluster computing and embedded computing.

Cluster computing has a strong tendency to view machines in sets. Embedded computing has a strong appreciation for requisite smallness.

I might post a regular blog article with a bit more elaboration, since this little comment edit box is annoying me...