Monday, November 24, 2003
Bayesian Categorisers and Preference Maps

Jon Udell writes: "There's been some discussion in the blog world about using a Bayesian categorizer to enable a person to discriminate along various interest/non-interest axes. I took a run at this recently and, although my experiments haven't been wildly successful, I want to report them because I think the idea may have merit...We know that autocategorization succeeds in the narrow domain of spam filtering. Whether it can succeed more generally -- for example, by helping blog authors and readers manage flows of items -- is yet unclear. The raw tools are available, but until they're well integrated into authoring and reading software, it will be hard to get a good sense of what's possible."

Some additional thoughts from Udell:


First, from the perspective of a blog author who already categorizes content (as many do), the question is: can effort that's already being invested pay more dividends? An automated review of things that have already been categorized can help you sharpen your sense of the structure you are building. A prediction about how to categorize a newly-written item can be interesting and helpful too. As I worked through the exercise, I could (at times) imagine the software to be acting like a person you'd bounce an idea off of. "I can see why you choose that category," we can imagine it saying, "but for what it's worth, it has a lot in common with these items in this other category."

The second and even more speculative idea would be to create subscribable filters. Consider the set of items that I write myself, and categorize under, say, web_services. Some other set of items out there in the blogosphere, written by other folks, will tend to cluster with mine. Could we say that those other items have some affinity for "Jon's take on Web services"? And if so, by subscribing to my text-frequency database for that category could you use it to create one view of your own inbound feeds, or to suggest ones you're not reading?
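
A minimal sketch of the kind of categorizer Udell describes might look like the following. It is not his code: it assumes a plain multinomial naive Bayes model with add-one smoothing, and the class name `NaiveBayesCategorizer`, the `tokenize` helper and the `export_category` method are hypothetical names chosen for illustration. The exported word-frequency dictionary stands in for the "text-frequency database" that someone else could subscribe to.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesCategorizer:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> number of items

    def train(self, category, text):
        self.word_counts[category].update(tokenize(text))
        self.doc_counts[category] += 1

    def scores(self, text):
        """Return a log-probability score per category (higher is better)."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        result = {}
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[category] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            result[category] = score
        return result

    def predict(self, text):
        scores = self.scores(text)
        return max(scores, key=scores.get)

    def export_category(self, category):
        """Return a plain dict that could be published as a subscribable filter."""
        return dict(self.word_counts[category])


# Toy training data, invented for this example.
nb = NaiveBayesCategorizer()
nb.train("web_services", "SOAP and WSDL describe web services endpoints")
nb.train("cooking", "simmer the onions and garlic in olive oil")
print(nb.predict("a new WSDL toolkit for SOAP services"))
```

With only two training items this is a toy, but it shows the two uses discussed above: `predict` suggests a category for a new item, and `export_category` produces the per-category word table that another reader could load to score their own inbound feeds.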


Matt Mower follows it up with an interesting thought: "What might be interesting is if people could "share" and "subscribe to" preference maps. As a new user of the system you might not really know who is relevant on any particular topic. But imagine you worked with David Weinberger, Phil Wolff, or Dan Gillmor. If you knew them and trusted their judgement you could pick one of their preference maps as a starting point and immediately gain a useful insight into the data as it is structured by topic. You might even switch between personalities to get more perspective!"

Recipe Web

Les Orchard has some interesting ideas on building out a microcontent client for recipes, based on RecipeML: "The real strength in a recipe web would come from cooking bloggers. Supply them with tools to generate RecipeML, post them on a blog server, and index them in an RSS feed. Then, geeks get to work building the recipe aggregators...Since I'd really like to play with some RDF concepts, maybe I'll write some adaptors to munge RecipeML and MealMaster into RDF recipe data. Cross that with FOAF and other RDF whackyness, and build an empire of recipe data."

A response from Troy Hakala:


We (Recipezaar) wrote a natural language recipe parser to make this possible and it's a difficult job.

Imagine a world of XML recipes distributed around the web on weblogs. An aggregator would need to aggregate millions of weblogs just to cull together a few hundred or thousand recipes. Now imagine millions of aggregator users doing this daily or hourly the way they do this today for weblogs. And if a weblogger had 1,000 recipes on their weblog archives, they wouldn't want millions of aggregators eating their bandwidth every day to maintain the database for each individual using an aggregator (webloggers today already complain about aggregators costing them too much money in bandwidth costs). Additionally, 99.999% of people who create recipes are unlikely to have a weblog to post their XML recipes so you'd lose the majority of the potential content.

A centralized repository provides a place for regular users to post their recipes and get them seen by the most number of people. And a centralized repository provides an easy way to search for recipes, browse for recipes, review & rate recipes, discuss recipes, etc. And let's talk numbers.... today, Recipezaar has 73,000 recipes in the database and, while it's the largest database of recipes on the internet, people still can't find a particular recipe because there is an infinite number of possible recipes that can be created. Having a few hundred or a few thousand recipes is not a useful database to people. More is better. And acquiring more via an aggregator is a big and expensive job.

Distributed databases are useful in some contexts and centralized databases are useful in other contexts. Each has their own advantages and disadvantages, but like auctions, recipes are best stored centrally where everyone has access to them.


Adds Orchard:

One option, if the people behind RecipeZaar like the idea, is to borrow their parser via web service for use in my hypothetical MovableType plugin. This could also be used for any number of other blogging tools. On the upside, we get the benefit of all the work done by Troy and company, and they get to pull in more recipes. On the downside, we’re dependent on a web service not under our control for the basic functionality of this plugin.

I’m excited to see more varieties of micro-content shared between the people of the web, but the thing I see least talked about is how this stuff will be authored. I read about data formats and all that, but in terms of user interface, we haven’t progressed much past the HTML textarea. Also, I often see handwaving and assumptions that the content is really pretty simple -- but as Troy Hakala would tell you, not even something as “simple” as a recipe is a slam dunk in terms of digestion by a machine. There needs to be some happy medium between a natural human expression of information, and the rigorous structuring required by a machine, mediated by good user interface.


As I read all this, I couldn't help thinking that what we need is an Information Marketplace. I think I have to speed up the thinking and just get it done. There are many areas I can now think of applying it: for SMEs to find each other, an IndiaMirror and now recipes.

Enterprise Blogging and RSS Ideas

From Robert Scoble...


For instance, I have a vision of a day when every single Microsoft employee will have a weblog. Now, what happens when you have 55,000 people weblogging inside of a corporation? Well, for one, I want to see weblogs in different ways. Why shouldn't it be possible to see results from a search engine in order of where you are on the org chart, for instance? So, how can you match RSS data up with your domain data that's stored in Exchange and/or other corporate data stores?

How about seeing data from corporate webloggers based on revenues? Or other metrics?

Also, one thing I miss is being able to tell readers what I think are my most important items. Look at the function of a newspaper designer. That guy plays a huge amount of value. Look at your average newspaper. You know that the biggest and top-most headline is what the newspaper has decided is the most important story. But, in weblogging we don't have that ability. You get my 60 posts and you have no idea which ones of those 60 that I think are most important.

In fact, you not only don't have any idea which ones I find are most important, but you have no idea which ones my readers think are most important. The only clue you have is how many comments, or how many links a certain article has (and discovering how many links a certain article has is very tough unless I enable trackback which I haven't done cause it slowed down my page loads and had other problems).


...and Mitch Ratcliffe:

Robert is very articulate -- one has to be inside Microsoft, the institutional equivalent of a Darwinian pool -- about how the ability to discover what content is new is one of the key features of blogging. It doesn't exist in other Web page layouts or within corporate applications where many people may be performing the same queries and need to know about similar interests/concerns visually; this is the heart of all the talk about the semantic Web. It's simple in blogging to find what is new and, through trackback, what's capturing attention: either the new content is at the top of the page or it is in the most recent RSS feed. That's probably the most important benefit of what blogs have done, making it easy to author, share and debate information; it will obviously migrate into other applications, which is where the leading edge will be when everyone "gets" blogging as it is today.

In a page layout, which is how most people and organizations demonstrate what information is most important, there are structural, design and semantic elements we understand: Important information is placed at the top of the page, yet a story may stay "important" longer after its initial publication, a characteristic lost in blogging, which replaces the last "top story" with another based on chronological posting; the size and word choice in headlines convey a great deal of information, which is lost in an RSS feed.

So, we were speculating about the need for an RSS 3.0 that adds those features, including page placement metadata, so that the simplicity of blogging can be combined with the cues we're used to in page layout. Imagine a page layout where a new or changed story blinked or glowed momentarily after a page loaded to indicate that it is new, yet the page still looked like a newspaper, report or other standard page.

RSS 3.0 would need to include an interpreter that processed changes, like a wiki page does diffs; a page would, essentially, need to read its own RSS feed. The result would be a dramatically richer Web, not better blogging or a better browser in and of itself. Since desktop publishing has gone through this kind of evolution, not to mention the management of versioning in code, so that groups can share information in context, this seems like a natural direction to go. The simplicity and discoverability of blogs should migrate into harder to use applications.

It could also include trackback analysis to display what is being linked to most. Positive and negative sentiment could be recorded, too.
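
The "page reads its own feed" idea can be sketched as a diff between two snapshots of a feed. This is only an illustration under assumptions: no RSS 3.0 with placement metadata exists in the text above, so the `placement` field, the `Item` class and the `diff_feeds` and `layout` functions are all hypothetical names invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    guid: str       # stable identifier for the story
    title: str
    body: str
    placement: int  # hypothetical layout hint: 1 = top of the page


def diff_feeds(old_items, new_items):
    """Classify each item in new_items as 'new', 'changed' or 'unchanged'."""
    old_by_guid = {item.guid: item for item in old_items}
    status = {}
    for item in new_items:
        previous = old_by_guid.get(item.guid)
        if previous is None:
            status[item.guid] = "new"
        elif (previous.title, previous.body) != (item.title, item.body):
            status[item.guid] = "changed"
        else:
            status[item.guid] = "unchanged"
    return status


def layout(new_items, status):
    """Order by editorial placement, not chronology, and flag items to highlight."""
    ordered = sorted(new_items, key=lambda item: item.placement)
    return [(item.title, status[item.guid] != "unchanged") for item in ordered]


# Two invented snapshots of the same feed.
yesterday = [Item("a", "Budget approved", "first draft", 2)]
today = [
    Item("a", "Budget approved", "revised figures", 2),
    Item("b", "Launch delayed", "details to follow", 1),
]
status = diff_feeds(yesterday, today)
for title, highlight in layout(today, status):
    print(title, "(highlight)" if highlight else "")
```

The point of the sketch is the separation of concerns: placement decides where a story sits, while the diff decides whether it should "blink or glow" as new or changed.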


In fact, I think Traction would suit well - it has a nice feature which lets you create the equivalent of a Front Page for every user.

UNCTAD E-Commerce and Development Report

[via Smart Mobs] Here. "This new edition analyses, from a development perspective, recent trends and advances in information and communication technologies (ICT), such as e-commerce and e-business, and examines their applications in developing countries. The report proposes strategic options to assist developing countries in designing national policies to take advantage of ICT."

Modifying Information Offline

Adam Bosworth continues his description of how to build a web services browser in an intermittently connected world:

this new browser I'm imagining doesn't navigate across pages found on the server addressed by URL's. It navigates across cached data retrieved from Web Services. It separates the presentation - which consists of an XML document made up of a set of XHTML templates and metadata and signed script - from the content which is XML. You subscribe to a URL which points to the presentation. This causes the XML presentation document to be brought down, the UI to be rendered, and it starts the process of requesting data from the web services. As this data is fetched, it will be cached on the client. This fetching of the data normally will run in the background just as mail and calendar on the Blackberry fetch the latest changes to my mail and calendar in the background. The data the user initially sees will be the cached data. Other more recent or complete information, as it comes in from the Internet, will dynamically "refresh" the running page or, if the page is no longer visible, will refresh the cache.

I recommend that the model is that, in general, data isn't directly modified. Instead, requests to modify it (or requests for a service) are created. For example, if you want to book a restaurant, create a booking request. If you want to remove a patient from a clinical trial, create a request to do so. If you want to approve an expense report, create a request to approve it. Then relate these requests to the item that they would modify (or create) and show, in some iconographical manner, one of 4 statuses:
1) A request has been made to alter the data but it hasn't even been sent to the internet.
2) A request has been sent to the Internet, but no reply has come back yet.
3) The request has been approved
4) The request has been denied.

the important thing is that it works really well even when the connection is poor because all changes respond immediately by adding requests, thus letting the user continue working, browsing, or inspecting other related data. By turning all requests to alter data into data packets with the request, the user interface can also decide whether to show these overtly (as special outboxes for example or a unified outbox) or just to show them implicitly by showing that the altered data isn't yet "final" or even not to alter any local data at all until the requests are approved.
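
Bosworth's request model can be sketched as a small local outbox. This is an illustration under assumptions, not his design: the names `Status`, `Request` and `Outbox` are hypothetical, and the `send` callback stands in for whatever transport actually delivers a request to a web service.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import count


class Status(Enum):
    PENDING = "not yet sent"
    SENT = "sent, awaiting reply"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class Request:
    request_id: int
    target: str   # identifier of the item the request would modify or create
    action: str
    status: Status = Status.PENDING


class Outbox:
    """Holds change requests locally so the user can keep working offline."""

    def __init__(self):
        self._ids = count(1)
        self.requests = {}

    def create(self, target, action):
        req = Request(next(self._ids), target, action)
        self.requests[req.request_id] = req
        return req

    def flush(self, send):
        """When connected, hand every pending request to `send` and mark it sent."""
        for req in self.requests.values():
            if req.status is Status.PENDING:
                send(req)
                req.status = Status.SENT

    def resolve(self, request_id, approved):
        req = self.requests[request_id]
        if req.status is not Status.SENT:
            raise ValueError("only a sent request can be approved or denied")
        req.status = Status.APPROVED if approved else Status.DENIED

    def status_of(self, target):
        """Statuses of all requests related to one item, for the UI to show."""
        return [r.status for r in self.requests.values() if r.target == target]


# Invented example: book a restaurant while offline, then sync.
outbox = Outbox()
booking = outbox.create("restaurant-42", "book table for two")
delivered = []
outbox.flush(delivered.append)   # connection is back: request goes out
outbox.resolve(booking.request_id, approved=True)
print(outbox.status_of("restaurant-42"))
```

Because the underlying data is never edited directly, the interface can choose how to present these statuses: as an explicit outbox, as a marker on the affected item, or not at all until a request is approved.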

TECH TALK: An Entrepreneur’s Attributes: Experimentation – Trying New Things

An entrepreneur must be an experimenter, constantly trying out different things and exploring alternate avenues. Many of the experiments may fail, but out of these will arise learnings. Experimentation is what leads to innovation.

Inc has a review of a new book by Stefan Thomke on this very topic: “Experimentation Matters”. Inc summarises the six principles outlined by Thomke on managing the experimentation process:


1. Anticipate and exploit early information through "front-loaded" innovation processes. Thomke explains how there is much value in finding potential failures as early as possible. Considering the vast expense of late-stage failures, whether they are in drug experiments, software development, automobile crash simulations, or aircraft development, using new technologies early in R&D projects helps teams avoid potential problems downstream. Examples from Microsoft, Boeing and Toyota show how millions of dollars can be saved through early experimentation.

2. Experiment frequently but do not overload your organization. Although many early tests can minimize problem-solving delays and costs of redesign, organizations must be ready to handle the increasing amount of information that the experimentation will bring. Thomke uses an extensive and detailed case study from BMW to highlight this principle.

3. Integrate new and traditional technologies to unlock performance. New technologies can create impressive results, but they are not perfect and are not stand-alone techniques. Thomke writes, "To unlock their potential, a company must understand not only how new and traditional technologies can coexist within such a process but also how they enhance and complement each other."

4. Organize for rapid experimentation. The ability to experiment quickly is an important component to effective learning. Since virtual experimentation brings organizations information earlier, managers are able to use results to guide their decisions about the use of major resources and avoid reworking bad designs after a company has committed itself to them. Thomke shows how rapid experimentation helped BMW learn how to make cars safer.

5. Fail early and often but avoid "mistakes." New ideas are bound to fail, so early failures help to eliminate unfavorable options quickly and facilitate learning. Failures can produce new and useful information.

6. Manage projects as experiments. Leaders should have a portfolio of experimental projects from which they can learn that are managed with the same seriousness that is applied to other business processes. Using a project as a learning experiment and an agent of change can help a company investigate diverse concepts.


For me, experimentation is another word for entrepreneurship. Let me give a personal example. During my IndiaWorld days, we created 13 India-centric websites – 9 of these did not work, but 4 of them did (Samachar, Khoj, Khel and Bawarchi). When we started, little did I know which ones would work and which would not. The approach we took was to try out our new ideas, and keep the cost of experimentation low, till we got preliminary feedback from our readers. We were willing to fail, and that is why we succeeded.

Tomorrow: Value-Added Aggregation, Knowledge

© Rajesh Jain