dimanche 2 novembre 2008

Map / reduce

Reading this article about the functional aspects of Python made me ponder the meaning of the 'map' function.

I can understand the instruction 'map this function over a list', meaning: apply this function to every entry of the list:
In [1]: result = map(lambda x : x + 1, [1,2,3,4,5])

In [2]: print result
[2, 3, 4, 5, 6]

What I failed to comprehend is how this relates to the 'map' part of the map/reduce algorithm (Hadoop).
There, mapping a list means splitting it into smaller buckets, then applying an algorithm to this list of lists.

Wikipedia to the rescue: I was relieved to read that the semantics attached to the infamous Google algorithm are not the same as the original functional programming concept.
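
To make the difference concrete, here is a toy word-count sketch in plain Java (no Hadoop API involved; class and method names are mine): the MapReduce 'map' turns each input record into a list of (key, value) pairs, and 'reduce' folds together all the values that share a key - quite different from simply applying a function to each element of one list.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy word count, no Hadoop API involved; all names are illustrative.
public class ToyMapReduce {

    // The MapReduce "map": turn one input record into a list of (key, value) pairs.
    static List<String[]> map(String line) {
        List<String[]> pairs = new ArrayList<String[]>();
        for (String word : line.split("\\s+"))
            pairs.add(new String[] { word, "1" });
        return pairs;
    }

    // The MapReduce "reduce": fold together all the values sharing one key.
    static int reduce(String key, List<String> values) {
        int sum = 0;
        for (String value : values)
            sum += Integer.parseInt(value);
        return sum;
    }

    public static void main(String[] args) {
        String[] records = { "the quick brown fox", "the lazy dog" };

        // The "shuffle" step: group emitted values by key before reducing.
        Map<String, List<String>> grouped = new TreeMap<String, List<String>>();
        for (String record : records) {
            for (String[] pair : map(record)) {
                if (!grouped.containsKey(pair[0]))
                    grouped.put(pair[0], new ArrayList<String>());
                grouped.get(pair[0]).add(pair[1]);
            }
        }

        for (Map.Entry<String, List<String>> entry : grouped.entrySet())
            System.out.println(entry.getKey() + " : " + reduce(entry.getKey(), entry.getValue()));
    }
}

Running it prints each word with its total count, e.g. "the : 2".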

lundi 4 août 2008

The future of science.

For the most hurried of you, this is a summary of an article by Michael Nielsen.

Part I: Toward a more open scientific culture

How can the internet benefit science?


Why were Hooke, Newton, and their contemporaries so secretive? In fact, up until that time, discoveries were routinely kept secret. [...] A secretive culture of discovery was a natural consequence of a society in which there was often little personal gain in sharing discoveries.

The great scientific advances of the time motivated wealthy patrons such as the government to begin subsidizing science as a profession. Much of the motivation came from the public benefit delivered by scientific discovery, and that benefit was strongest if discoveries were shared. The result was a scientific culture which to this day rewards the sharing of discoveries with jobs and prestige for the discoverer.

The journal system is perhaps the most open system for the transmission of knowledge that could be built with 17th century media. The adoption of the journal system was achieved by subsidizing scientists who published their discoveries in journals. This same subsidy now inhibits the adoption of more effective technologies, because it continues to incentivize scientists to share their work in conventional journals, and not in more modern media.

  • Observation #1: A failure of science online: online comment sites.
Nature Magazine's final report on its unsuccessful trial of open commentary:
There was a significant level of expressed interest in open peer review… A small majority of those authors who did participate received comments, but typically very few, despite significant web traffic. Most comments were not technically substantive. Feedback suggests that there is a marked reluctance among researchers to offer open comments.
(This reluctance may be related to:
- fear that the comments would offend the author, who might also be an anonymous referee in a position to scuttle your next paper or grant application;
- lack of incentive to write such reviews, when you could be working on something more "useful", like writing a paper or a grant.)

  • Observation #2: A failure of science online: Wikipedia

Some scientists will object that contributing to Wikipedia isn’t really science. And, of course, it is not, if you take a narrow view of what science is, if you’ve bought into the current game, and take it for granted that science is only about publishing in specialized scientific journals. But if you take a broader view, if you believe science is about discovering how the world works, and sharing that understanding with the rest of humanity, then the lack of early scientific support for Wikipedia looks like a lost opportunity.

Nowadays, Wikipedia’s success has to some extent legitimized contribution within the scientific community. But how strange that the modern day Library of Alexandria had to come from outside academia.



=> Action

We should aim to create an open scientific culture where as much information as possible is moved out of people’s heads and labs, onto the network, and into tools which can help us structure and filter the information.

Ideally, we’ll achieve a kind of extreme openness. This means: making many more types of content available than just scientific papers; allowing creative reuse and modification of existing work through more open licensing and community norms; making all information not just human readable but also machine readable; providing open APIs to enable the building of additional services on top of the scientific literature, and possibly even multiple layers of increasingly powerful services. Such extreme openness is the ultimate expression of the idea that others may build upon and extend the work of individual scientists in ways they themselves would never have conceived.


To create an open scientific culture that embraces new online tools, two challenging tasks must be achieved: (1) build superb online tools; and (2) cause the cultural changes necessary for those tools to be accepted.

Examples of this change happening: arXiv and SPIRES
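
For a concrete taste of the "open API, machine-readable" idea, here is a small sketch of mine (the class name and the query are my own choices; the endpoint is the public arXiv API, which returns Atom XML over plain HTTP):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Illustrative sketch: the arXiv API serves machine-readable Atom XML over
// plain HTTP, so third-party services can be built on top of the literature.
public class ArxivApiExample {
    public static void main(String[] args) throws Exception {
        // search_query and max_results are standard arXiv API parameters
        URL url = new URL("http://export.arxiv.org/api/query?search_query=all:electron&max_results=1");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        String line;
        while ((line = in.readLine()) != null)
            System.out.println(line);  // dumps the Atom feed describing the matching paper
        in.close();
    }
}

Any program, not just a web browser, can consume that feed - which is exactly the point.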

Part II: Collaboration Markets: building a collective working memory for science

The problem of collaboration


For most scientists, research projects spontaneously give rise to problems in areas in which one isn't expert. A scientist then needs to ask a fellow scientist for assistance. Unfortunately, expert attention, the ultimate scarce resource in science, is very inefficiently allocated under existing practices for collaboration.


An extremely demanding creative culture already exists, which shows that a collaboration market is feasible - the culture of free and open source software. Scientists browsing for the first time through the development forums of open source programming projects are often shocked at the high level of the discussion. They find professional programmers routinely sharing their questions and ideas, helping solve each other’s problems. Some of the world’s best programmers hang out in these forums, swapping tips, answering questions, and participating in the conversation.

How can scientists collaborate efficiently?
Two examples: FriendFeed and Innocentive.
An efficient collaboration market would enable two scientists, say Alice and Bob, to find a common interest and exchange their know-how, in much the same way eBay and craigslist enable people to exchange goods and services.
An ideal collaboration market will enable just such an exchange of questions and ideas. It will bake in metrics of contribution so participants can demonstrate the impact their work is having. Contributions will be archived, timestamped, and signed, so it’s clear who said what, and when. Combined with high quality filtering and search tools, the result will be an open culture of trust which gives scientists a real incentive to outsource problems, and contribute in areas where they have a great comparative advantage. This will change science.

Links:
Reproducible research at EPFL

Machine-readable Open Access scientific publishing

One big lab
University of Cambridge

mercredi 30 juillet 2008

Four harmful Java idioms, and how to fix them.

A nice discussion of OO design and of some debatable conventions.


In conclusion

I have argued in this article that four common Java idioms should be modified. The ultimate justification for such changes is that they will make code demonstrably easier to read, understand, and use -- and, in the process, they will exhibit more compassion for the mental experience of the reader. In the case of immutability and packaging style, they will also nudge you in the direction of improved design.

In summary, I suggest that the following idioms should be favored as the preferred style:

  • Use a naming convention to distinguish three kinds of data, not two: local variables, fields, and method arguments.
  • Prefer the package-by-feature style over package-by-layer.
  • Prefer immutable objects over JavaBeans.
  • Order items in a class in terms of decreasing scope, with private items appearing last.
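
To make the immutability point concrete, here is a small sketch of my own (not code taken from the article), which also follows two of the other idioms: a naming convention that distinguishes fields from method arguments, and private items last.

// Illustrative sketch: an immutable value object instead of a mutable JavaBean
// with setters; 'f' marks fields, 'a' marks arguments, private items come last.
public final class Money {

    public Money(String aCurrency, long aAmountCents) {
        fCurrency = aCurrency;
        fAmountCents = aAmountCents;
    }

    public String getCurrency() { return fCurrency; }

    public long getAmountCents() { return fAmountCents; }

    // Operations return a new object instead of mutating this one.
    public Money add(Money aOther) {
        if (!fCurrency.equals(aOther.fCurrency))
            throw new IllegalArgumentException("currency mismatch");
        return new Money(fCurrency, fAmountCents + aOther.fAmountCents);
    }

    private final String fCurrency;
    private final long fAmountCents;
}

Because a Money can never change once built, it can be shared between threads and used as a map key without defensive copies.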

mardi 29 juillet 2008

Unit testing within a container.

Problem: how can we test code that relies on services provided by a JEE container?

  • Write stubs which provide a fake version of these services
Drawback: writing these extra classes - as well as plugging them in - requires extra work. Moreover, the tested behaviour will not be the same as the code running in production. (A minimal sketch of this approach appears after this list.)

  • A better solution?
Embed your unit tests in Tomcat:
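
For reference, here is a minimal sketch of the first option, the stub approach (MailService and the other names are hypothetical, not part of any JEE API): the container-provided service is hidden behind an interface, and the test hands the class under test a fake implementation.

// Illustrative sketch of the stub approach; all names are hypothetical.
interface MailService {
    void send(String aRecipient, String aBody);
}

// Production code depends on the interface, not on the container-provided implementation.
class OrderConfirmation {
    OrderConfirmation(MailService aMail) {
        fMail = aMail;
    }

    void confirm(String aCustomer) {
        fMail.send(aCustomer, "Your order has been received.");
    }

    private final MailService fMail;
}

// The stub: a fake MailService that just records what it was asked to do.
class RecordingMailService implements MailService {
    String lastRecipient;

    public void send(String aRecipient, String aBody) {
        lastRecipient = aRecipient;
    }
}

// A unit test that runs outside any container.
class OrderConfirmationTest {
    public static void main(String[] args) {
        RecordingMailService stub = new RecordingMailService();
        new OrderConfirmation(stub).confirm("alice@example.org");
        if (!"alice@example.org".equals(stub.lastRecipient))
            throw new AssertionError("mail was not sent to the customer");
        System.out.println("OK");
    }
}

The drawback from the first bullet is visible here: the interface, the stub, and the wiring are all extra code, and the fake service does not behave like the real, container-provided one.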


vendredi 25 juillet 2008

Python URL Handling (HTTP)

Now this is another example of the expressiveness you get from a scripting language:

Get some HTTP resource using Python urllib:
import urllib
my_url = 'http://diveintomark.org/xml/atom.xml'
data = urllib.urlopen(my_url).read()
print data

Same thing in Java:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

public class HttpClient {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://diveintomark.org/xml/atom.xml");
        URLConnection con = url.openConnection();
        con.connect();
        HttpURLConnection httpConnex = (HttpURLConnection) con;
        InputStream inputStream = httpConnex.getInputStream();
        BufferedReader b = new BufferedReader(new InputStreamReader(inputStream));
        String l;
        while ((l = b.readLine()) != null)
            System.out.println(l);
    }
}

Scratch - An educational programming language.

Scratch is a new programming language that makes it easy to create your own interactive stories, animations, games, music, and art -- and share your creations on the web.

http://scratch.mit.edu/

Think of it as a YouTube where the videos are replaced by game sequences.