dimanche 2 novembre 2008

Map / reduce

Reading this article about the functional aspects of Python made me ponder the meaning of the 'map' function.

I can understand the instruction 'map this function over a list', meaning: apply this function to every entry of the list:
In [1]: result = map(lambda x : x + 1, [1,2,3,4,5])

In [2]: print result
[2, 3, 4, 5, 6]

What I failed to comprehend is how this relates to the 'map' part of the map/reduce algorithm (Hadoop).
There, mapping a list means splitting it into smaller buckets, then applying an algorithm to this list of lists.

Wikipedia to the rescue: I was relieved to read that the semantics attached to the infamous Google algorithm are not the same as the original functional programming concept.
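
To make the difference concrete, here is a toy word-count sketch in plain Java (no Hadoop API involved; class and method names are mine): the MapReduce 'map' turns each input record into a list of (key, value) pairs, and 'reduce' folds together all the values that share a key - quite different from simply applying a function to each element of one list.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy word count, no Hadoop API involved; all names are illustrative.
public class ToyMapReduce {

    // The MapReduce "map": turn one input record into a list of (key, value) pairs.
    static List<String[]> map(String line) {
        List<String[]> pairs = new ArrayList<String[]>();
        for (String word : line.split("\\s+"))
            pairs.add(new String[] { word, "1" });
        return pairs;
    }

    // The MapReduce "reduce": fold together all the values sharing one key.
    static int reduce(String key, List<String> values) {
        int sum = 0;
        for (String value : values)
            sum += Integer.parseInt(value);
        return sum;
    }

    public static void main(String[] args) {
        String[] records = { "the quick brown fox", "the lazy dog" };

        // The "shuffle" step: group emitted values by key before reducing.
        Map<String, List<String>> grouped = new TreeMap<String, List<String>>();
        for (String record : records) {
            for (String[] pair : map(record)) {
                if (!grouped.containsKey(pair[0]))
                    grouped.put(pair[0], new ArrayList<String>());
                grouped.get(pair[0]).add(pair[1]);
            }
        }

        for (Map.Entry<String, List<String>> entry : grouped.entrySet())
            System.out.println(entry.getKey() + " : " + reduce(entry.getKey(), entry.getValue()));
    }
}

Running it prints each word with its total count, e.g. "the : 2".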

lundi 4 août 2008

The future of science.

For the most hurried of you, this is a summary of an article by Michael Nielsen.

Part I: Toward a more open scientific culture

How can the internet benefit science?


Why were Hooke, Newton, and their contemporaries so secretive? In fact, up until that time, discoveries were routinely kept secret. [...] A secretive culture of discovery was a natural consequence of a society in which there was often little personal gain in sharing discoveries.

The great scientific advances of the time motivated wealthy patrons such as the government to begin subsidizing science as a profession. Much of the motivation came from the public benefit delivered by scientific discovery, and that benefit was strongest if discoveries were shared. The result was a scientific culture which to this day rewards the sharing of discoveries with jobs and prestige for the discoverer.

The journal system is perhaps the most open system for the transmission of knowledge that could be built with 17th century media. The adoption of the journal system was achieved by subsidizing scientists who published their discoveries in journals. This same subsidy now inhibits the adoption of more effective technologies, because it continues to incentivize scientists to share their work in conventional journals, and not in more modern media.

  • Observation #1: A failure of science online: online comment sites.
Nature Magazine's final report on its unsuccessful trial of open commentary:
There was a significant level of expressed interest in open peer review… A small majority of those authors who did participate received comments, but typically very few, despite significant web traffic. Most comments were not technically substantive. Feedback suggests that there is a marked reluctance among researchers to offer open comments.
(This reluctance may be related to:
- fear that the comments would offend the author, who might also be an anonymous referee in a position to scuttle your next paper or grant application;
- lack of incentive to write such reviews, when you could be working on something more "useful", like writing a paper or a grant.)

  • Observation #2: A failure of science online: Wikipedia

Some scientists will object that contributing to Wikipedia isn’t really science. And, of course, it is not, if you take a narrow view of what science is, if you’ve bought into the current game, and take it for granted that science is only about publishing in specialized scientific journals. But if you take a broader view, if you believe science is about discovering how the world works, and sharing that understanding with the rest of humanity, then the lack of early scientific support for Wikipedia looks like a lost opportunity.

Nowadays, Wikipedia’s success has to some extent legitimized contribution within the scientific community. But how strange that the modern day Library of Alexandria had to come from outside academia.



=> Action

We should aim to create an open scientific culture where as much information as possible is moved out of people’s heads and labs, onto the network, and into tools which can help us structure and filter the information.

Ideally, we’ll achieve a kind of extreme openness. This means: making many more types of content available than just scientific papers; allowing creative reuse and modification of existing work through more open licensing and community norms; making all information not just human readable but also machine readable; providing open APIs to enable the building of additional services on top of the scientific literature, and possibly even multiple layers of increasingly powerful services. Such extreme openness is the ultimate expression of the idea that others may build upon and extend the work of individual scientists in ways they themselves would never have conceived.


To create an open scientific culture that embraces new online tools, two challenging tasks must be achieved: (1) build superb online tools; and (2) cause the cultural changes necessary for those tools to be accepted.

Examples of this change happening: arXiv and SPIRES
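
For a concrete taste of the "open API, machine-readable" idea, here is a small sketch of mine (the class name and the query are my own choices; the endpoint is the public arXiv API, which returns Atom XML over plain HTTP):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

// Illustrative sketch: the arXiv API serves machine-readable Atom XML over
// plain HTTP, so third-party services can be built on top of the literature.
public class ArxivApiExample {
    public static void main(String[] args) throws Exception {
        // search_query and max_results are standard arXiv API parameters
        URL url = new URL("http://export.arxiv.org/api/query?search_query=all:electron&max_results=1");
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        String line;
        while ((line = in.readLine()) != null)
            System.out.println(line);  // dumps the Atom feed describing the matching paper
        in.close();
    }
}

Any program, not just a web browser, can consume that feed - which is exactly the point.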

Part II: Collaboration Markets: building a collective working memory for science

The problem of collaboration


For most scientists, research projects spontaneously give rise to problems in areas in which one isn't expert. A scientist then needs to ask a fellow scientist for assistance. Unfortunately, expert attention, the ultimate scarce resource in science, is very inefficiently allocated under existing practices for collaboration.


An extremely demanding creative culture already exists, which shows that a collaboration market is feasible - the culture of free and open source software. Scientists browsing for the first time through the development forums of open source programming projects are often shocked at the high level of the discussion. They find professional programmers routinely sharing their questions and ideas, helping solve each other’s problems. Some of the world’s best programmers hang out in these forums, swapping tips, answering questions, and participating in the conversation.

How can scientists collaborate efficiently?
Two examples: FriendFeed and Innocentive.
An efficient collaboration market would enable two scientists, say Alice and Bob, to find a common interest and exchange their know-how, in much the same way eBay and craigslist enable people to exchange goods and services.
An ideal collaboration market will enable just such an exchange of questions and ideas. It will bake in metrics of contribution so participants can demonstrate the impact their work is having. Contributions will be archived, timestamped, and signed, so it’s clear who said what, and when. Combined with high quality filtering and search tools, the result will be an open culture of trust which gives scientists a real incentive to outsource problems, and contribute in areas where they have a great comparative advantage. This will change science.

Links:
Reproducible research at EPFL

Machine-readable Open Access scientific publishing

One big lab
University of Cambridge

mercredi 30 juillet 2008

Four harmful Java idioms, and how to fix them.

A nice discussion of OO design and of some debatable conventions.


In conclusion

I have argued in this article that four common Java idioms should be modified. The ultimate justification for such changes is that they will make code demonstrably easier to read, understand, and use -- and, in the process, they will exhibit more compassion for the mental experience of the reader. In the case of immutability and packaging style, they will also nudge you in the direction of improved design.

In summary, I suggest that the following idioms should be favored as the preferred style:

  • Use a naming convention to distinguish three kinds of data, not two: local variables, fields, and method arguments.
  • Prefer the package-by-feature style over package-by-layer.
  • Prefer immutable objects over JavaBeans.
  • Order items in a class in terms of decreasing scope, with private items appearing last.
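
To make the immutability point concrete, here is a small sketch of my own (not code taken from the article), which also follows two of the other idioms: a naming convention that distinguishes fields from method arguments, and private items last.

// Illustrative sketch: an immutable value object instead of a mutable JavaBean
// with setters; 'f' marks fields, 'a' marks arguments, private items come last.
public final class Money {

    public Money(String aCurrency, long aAmountCents) {
        fCurrency = aCurrency;
        fAmountCents = aAmountCents;
    }

    public String getCurrency() { return fCurrency; }

    public long getAmountCents() { return fAmountCents; }

    // Operations return a new object instead of mutating this one.
    public Money add(Money aOther) {
        if (!fCurrency.equals(aOther.fCurrency))
            throw new IllegalArgumentException("currency mismatch");
        return new Money(fCurrency, fAmountCents + aOther.fAmountCents);
    }

    private final String fCurrency;
    private final long fAmountCents;
}

Because a Money can never change once built, it can be shared between threads and used as a map key without defensive copies.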

mardi 29 juillet 2008

Unit testing within a container.

Problem: how can we test code that relies on services provided by a JEE container?

  • Write stubs which provide a fake version of these services
Drawback: writing these extra classes - as well as plugging them in - requires extra work. Moreover, the tested behaviour will not be the same as the code running in production. (A minimal sketch of this approach appears after this list.)

  • A better solution?
Embed your unit tests in Tomcat:
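
For reference, here is a minimal sketch of the first option, the stub approach (MailService and the other names are hypothetical, not part of any JEE API): the container-provided service is hidden behind an interface, and the test hands the class under test a fake implementation.

// Illustrative sketch of the stub approach; all names are hypothetical.
interface MailService {
    void send(String aRecipient, String aBody);
}

// Production code depends on the interface, not on the container-provided implementation.
class OrderConfirmation {
    OrderConfirmation(MailService aMail) {
        fMail = aMail;
    }

    void confirm(String aCustomer) {
        fMail.send(aCustomer, "Your order has been received.");
    }

    private final MailService fMail;
}

// The stub: a fake MailService that just records what it was asked to do.
class RecordingMailService implements MailService {
    String lastRecipient;

    public void send(String aRecipient, String aBody) {
        lastRecipient = aRecipient;
    }
}

// A unit test that runs outside any container.
class OrderConfirmationTest {
    public static void main(String[] args) {
        RecordingMailService stub = new RecordingMailService();
        new OrderConfirmation(stub).confirm("alice@example.org");
        if (!"alice@example.org".equals(stub.lastRecipient))
            throw new AssertionError("mail was not sent to the customer");
        System.out.println("OK");
    }
}

The drawback from the first bullet is visible here: the interface, the stub, and the wiring are all extra code, and the fake service does not behave like the real, container-provided one.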


vendredi 25 juillet 2008

Python URL Handling (HTTP)

Now this is another example of the expressiveness you get from a scripting language:

Get some HTTP resource using Python urllib:
import urllib
my_url = 'http://diveintomark.org/xml/atom.xml'
data = urllib.urlopen(my_url).read()
print data

Same thing in Java:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

public class HttpClient {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://diveintomark.org/xml/atom.xml");
        URLConnection con = url.openConnection();
        con.connect();
        HttpURLConnection httpConnex = (HttpURLConnection) con;
        InputStream inputStream = httpConnex.getInputStream();
        BufferedReader b = new BufferedReader(new InputStreamReader(inputStream));
        String l;
        while ((l = b.readLine()) != null)
            System.out.println(l);
    }
}

Scratch - An educational programming language.

Scratch is a new programming language that makes it easy to create your own interactive stories, animations, games, music, and art -- and share your creations on the web.

http://scratch.mit.edu/

Think of it as a YouTube where the videos are replaced by game sequences.