Real-time, Distributed Conversations: Some Thoughts on the Salmon Protocol
november 2009 by cloudseer
Last week John Panzer, who works on Blogger at Google, wrote about some of the work
he’s been doing on creating a protocol for syndicating comments associated with activity
streams in his post The
Salmon Protocol: Introducing the Salmon Project. Key parts of his post are excerpted
below:
A few days ago, at the Real
Time Web Summit, we had a session about Salmon,
a protocol for re-aggregated distributed conversations around web content. I
was hoping for some feedback and to generate some interest, and I was overwhelmed
by the positive reactions, especially after Louis Gray's post "Proposed
Salmon Protocol aims to unify Conversations on the Web". Adina Levin's "Salmon
- Re-assembling distributed conversations" is a good, insightful
review as well. There's clearly a great deal of interest in this, and so I've gone
ahead and expanded Salmon's home at salmon-protocol.org with
an open source project, salmon-protocol.googlecode.com,
and a mailing list, groups.google.com/group/salmon-protocol.
Louis Gray’s post on the topic includes an embedded presentation which captures the
essence of the protocol.
Before talking about the technical details of the protocol it is a good idea to understand
the end user problem the protocol solves. For me, it solves a problem I have in the
way that RSS
Bandit integrates with Facebook. Although there is a way to get regular updates
on changes to the user’s news feed by polling Facebook’s stream and getting
data back in the Activity Stream format, there isn’t a mechanism today to get updates
on the comments on items in the feed. In practice, this means that once
an item rolls off the news feed, there is no way to keep its comments up to date
in RSS Bandit.
The Salmon Protocol aims to address this problem by piggybacking on PubSubHubbub as
a way for applications to get real-time updates on comments on items in an activity
stream, not just updates on new activities.
There have also been several mentions of Salmon being a way to aggregate distributed
conversations on an item (e.g. this blog post is syndicated to FriendFeed and
there are comments there as well as in the comments on my blog) but I am less clear
on those scenarios or whether Salmon is enough to solve the various tough problems
that need to be solved to make that work end to end.
Any API for posting comments to a site needs to solve two problems: identity and
comment spam. I decided to take a look at the Salmon
Protocol Summary to see how it addresses these problems.
The meat of the Salmon Protocol format is excerpted below:
A source provides an RSS/Atom feed of content. It includes a Salmon link in its
feed:
<link rel="salmon" href="http://example.org/salmon-endpoint"/>
An aggregator reads the feed (ideally via a push mechanism such as PubSubHubbub),
and sees from the link that it is Salmon-enabled. It remembers the endpoint URL for
later use.
When an aggregator's user leaves a comment on a feed item, the aggregator stores
the comment as usual, and then also POSTs a salmon version of it to the source's Salmon
endpoint:
POST /salmon-endpoint HTTP/1.1
Host: example.org
Content-Type: application/atom+xml

<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns='http://www.w3.org/2005/Atom'>
  <author>
    <name>John Doe</name>
    <uri>acct:johndoe@aggregator-example.com</uri>
  </author>
  <content>Yes, but what about the llamas?</content>
  <id>tag:aggregator-example.com,2009:cmt-441071406174557701</id>
  <updated>2009-09-28T18:30:02Z</updated>
  <thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0'
    ref='tag:example.org,1999:id-22717401685551851865'/>
  <sal:signature xmlns:sal='http://salmonprotocol.org/ns/1.0'>
    e55bee08b4c643bc8aedf122f606f804269b7bc7
  </sal:signature>
  <title/>
</entry>
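To make the excerpt concrete, here is a minimal Python sketch of the aggregator's side: finding the Salmon link in a feed and building the Atom entry for a comment. The function names `find_salmon_endpoint` and `build_salmon_entry` are my own, not part of the protocol, and the sketch deliberately leaves out the sal:signature element because the summary quoted here does not say how the signature is computed.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
THR = "http://purl.org/syndication/thread/1.0"

# Serialize Atom as the default namespace and threading as "thr",
# matching the prefixes used in the excerpt.
ET.register_namespace("", ATOM)
ET.register_namespace("thr", THR)


def find_salmon_endpoint(feed_xml):
    """Return the href of the first <link rel="salmon"> in a feed, or None.

    Hypothetical helper: it matches on the local element name so it works
    for both namespaced Atom feeds and un-namespaced RSS feeds.
    """
    root = ET.fromstring(feed_xml)
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "link" and element.get("rel") == "salmon":
            return element.get("href")
    return None


def build_salmon_entry(author_name, author_uri, text,
                       comment_id, parent_id, updated):
    """Build the Atom entry for a comment, without a signature.

    Hypothetical helper: the signature step is omitted because the
    algorithm is not described in the excerpt.
    """
    entry = ET.Element(f"{{{ATOM}}}entry")
    author = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author, f"{{{ATOM}}}name").text = author_name
    ET.SubElement(author, f"{{{ATOM}}}uri").text = author_uri
    ET.SubElement(entry, f"{{{ATOM}}}content").text = text
    ET.SubElement(entry, f"{{{ATOM}}}id").text = comment_id
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    # thr:in-reply-to points at the feed item being commented on.
    ET.SubElement(entry, f"{{{THR}}}in-reply-to", {"ref": parent_id})
    return ET.tostring(entry, encoding="unicode")
```

The aggregator would then POST the returned string to the discovered endpoint with a Content-Type of application/atom+xml, as in the request shown above.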
The commenter is identified in the published comment using the atom:uri
element. How this author is authenticated in situations other than public comments
on a blog, such as RSS Bandit posting a comment to Facebook on my behalf, isn’t really
discussed. I noticed an offhand reference to OAuth headers, which seems to imply
that the publishing application should also send authentication headers
when publishing the comment. How these authentication headers would flow through the
systems involved is unclear to me, especially given the approach Salmon has taken to
spam prevention.
The workflow for dealing with spam comments is described as follows:
A major concern with this type of distributed protocol is how to prevent spam
and abuse. Salmon provides building blocks to allow in-depth defense against
attacks. Specifically, every salmon has a verifiable author and user agent.
The basic security flow when salmon swims upstream looks like this:
1. aggregator-example.com: "Here is a salmon, authored and signed by
'acct:johndoe@aggregator-example.com'; please accept it."
2. Recipient: "I know that this is really aggregator-example.com due
to its OAuth headers, and it has a good reputation, but I do not trust it completely;
I will do a double check."
3. Recipient: Uses Webfinger/XRD to discover salmon validation service for
acct:johndoe@aggregator-example.com, which turns out to be hosted by aggregator-example.com.
4. Recipient: "Given that johndoe has delegated Salmon validation to
aggregator-example, and I know I'm talking to aggregator-example already, I'll skip
the actual check." (Returns HTTP 200 to aggregator-example.com)
The flow can get more complicated, especially if the aggregator is not also providing
identity services for the user. In the most general case, the recipient needs
to take the salmon, discover a salmon validator service for the author via XRD discovery
on the author's URI, and POST the salmon to the validator service. The validator service
does an integrity / signature check against the salmon and returns 200 if the salmon
checks out, 400 if not. The signature check means that the given author (johndoe
in this case) signed the salmon with the given id, parent id, and timestamp.
It does not attempt to do a full, XML-DSig style verification, though such a service
is another reasonable extension.
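As I read it, the recipient's decision logic in this flow could be sketched roughly as follows. This is only my interpretation, written in Python: `accept_salmon`, `discover_validator` and `post_to_validator` are hypothetical names, the Webfinger/XRD lookup and the HTTP call are passed in as plain functions rather than implemented, and rejecting a salmon when no validator can be found is my own assumption rather than anything the summary states.

```python
from urllib.parse import urlparse


def accept_salmon(sender_host, author_uri, salmon_xml,
                  discover_validator, post_to_validator):
    """Decide whether the recipient should accept an incoming salmon.

    sender_host        -- host the recipient already authenticated,
                          e.g. via the sender's OAuth headers
    discover_validator -- hypothetical callable standing in for the
                          Webfinger/XRD lookup; returns a validator URL
                          for author_uri, or None
    post_to_validator  -- hypothetical callable that POSTs the salmon to
                          the validator URL and returns the HTTP status
    """
    validator_url = discover_validator(author_uri)
    if validator_url is None:
        # Assumption: with no validator to ask, reject the salmon.
        return False

    validator_host = urlparse(validator_url).hostname
    if validator_host == sender_host.lower():
        # The author delegated validation to the service we are already
        # talking to, so the round trip is skipped.
        return True

    # General case: ask the third-party validator. 200 means the
    # signature checks out; anything else (e.g. 400) is a rejection.
    return post_to_validator(validator_url, salmon_xml) == 200
```

Written this way, it is easy to see that in the simple case the recipient ends up trusting whatever the sending aggregator says about its own users.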
This flow seems weird, and it is unclear to me that it actually solves the problems
involved in distributed commenting. Let’s say I post a comment to Facebook from
RSS Bandit. In step 3 above, Facebook is now supposed to use WebFinger to
look up my email address provider and determine which service I use for digitally signing
comments. Then it asks that service whether the comment looks like it was from me.
Hmm, this looks like a user authentication workflow disguised as a comment validation
workflow. Shouldn’t the service receiving the comment (i.e. Facebook in the example
above) be responsible for validating my identity, not some third-party service? Maybe
this protocol wasn’t meant for sites like Facebook?
Let’s say this protocol is really meant for situations where the comment recipient
doesn’t intend to be the sole identity provider, such as commenting on Robert
Scoble's blog, where he allows comments from anyone with just an email address
and an optional web page URL as identifiers. So each commenter needs to provide an
email address from an email service provider that supports WebFinger and validates
digital signatures specifically for the Salmon protocol? Sounds
like boiling the ocean. I wonder why this can’t work with OpenID validation or some
other authentication protocol that has already been validated by developers and is
seeing some adoption?
At the end of the day, I think the problem Salmon attempts to solve is one that needs
solving as activity streams become a more popular and intrinsic feature across the
Web. However, in its current form, it’s hard for me to see how it actually solves the
real problems that exist today in a practical way.
Of course, this may just be my misunderstanding of the protocol documents currently
published and I look forward to being corrected by one of the protocol gurus if that
is the case.