NNSquad - Network Neutrality Squad


[ NNSquad ] SSL vs. "Referers": Friend or Foe?



                   SSL vs. "Referers": Friend or Foe?

             http://lauren.vortex.com/archive/000895.html


In a recent posting in some other venues, I noted with pleasure that
Google is now testing the use of "SSL by default" for Google Search
( http://j.mp/nRuYTG [NNSquad] ).

In passing, I very briefly touched on the implications of SSL for
"referer" data that is traditionally passed along to Web sites when a
user clicks a link.

I received a surprisingly high level of diametrically opposed
reactions.  On one side, people were saying, "Good riddance!  Referers
are privacy invasive and never should have been implemented in the
first place!"

On the other side, I also got many messages with claims along the
lines of, "This is just Google's attempt to ruin my analytics -- they
don't really care about privacy."

The latter assertion is the easier to address.  I've been talking to
Google folks for years about SSL issues, and there has been a
consistent desire to move their services toward this protection on a
default basis (as they've already done with Gmail and Google+).  The
collateral impact on referers has been an issue of concern all along,
and possible workarounds such as enhanced Webmaster Tools data and
other techniques have always been part of the discussions.

But the still largely prevailing status quo of "postcard security" for
data on the Internet, where any entity -- commercial, government, or
other -- that has access to a data stream can read most information in
the clear,
has become intolerable, and securing these paths to the extent
practicable must be viewed as an important priority.  For now, SSL is
a practical means to that end.

The "Good Riddance" reaction probably needs a bit more exploration.

Let's remember what "referers" (typically misspelled in this manner
due to an original misspelling in the HTTP specifications) really do.

When a user views info on a Web site, the associated site's logs will
typically record a variety of data regarding the connection, including
source IP address, various browser-related configuration information,
and other information -- most notably for our discussion the referer.

The referer is the URL of the page that contained the link that the
user clicked to reach the destination site -- the page that "referred"
the user.  In the case of a search results page, that referer will
usually include the user's search query as embedded in the URL
itself.
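
As a rough illustration of how a search query rides along inside a
referer, here is a minimal Python sketch.  The function name, the
sample URL, and the assumption that the query lives in a "q" parameter
are all hypothetical choices for the example; real search engines vary
in how they build their URLs.

```python
from urllib.parse import urlsplit, parse_qs


def search_query_from_referer(referer, param="q"):
    """Return the search terms embedded in a referer URL, or None.

    The "q" parameter name is an assumption for illustration only.
    """
    if not referer:
        return None
    # Split off the query string, then decode it into a dict of lists.
    query_string = urlsplit(referer).query
    values = parse_qs(query_string).get(param)
    return values[0] if values else None


# Hypothetical referer as it might appear in a site's access log.
example = "http://www.google.com/search?q=network+neutrality&hl=en"
print(search_query_from_referer(example))
```

A referer with no query string, or no referer at all, simply yields
None here.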

However, when a user click arrives via a page that was viewed through
SSL, the information that would otherwise normally have been relayed
(like the referer) will usually no longer appear, particularly when
the destination site is not itself using SSL.  Note however that
the IP address of the user will still be present.
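
The conventional browser behavior behind this can be sketched in a few
lines of Python.  This assumes the guidance in the HTTP/1.1
specification that a referer should not be included in a non-secure
request when the referring page was transferred securely; the function
name is hypothetical, and actual browsers, extensions, and sites may
apply additional rules of their own.

```python
from urllib.parse import urlsplit


def referer_would_be_sent(referring_url, destination_url):
    """Model the basic secure-to-non-secure referer suppression rule.

    This is a simplification: it looks only at the URL schemes and
    ignores any browser settings or extensions that block referers.
    """
    from_scheme = urlsplit(referring_url).scheme.lower()
    to_scheme = urlsplit(destination_url).scheme.lower()
    # A page fetched over SSL linking to a plain HTTP page: no referer.
    return not (from_scheme == "https" and to_scheme == "http")


# Hypothetical URLs for illustration.
print(referer_would_be_sent("http://search.example/?q=ssl", "http://blog.example/"))
print(referer_would_be_sent("https://search.example/?q=ssl", "http://blog.example/"))
```

Either way, nothing in this rule affects the source IP address, which
the destination site sees from the connection itself.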

The passing of referer information is a function not only of the sites
involved but also of the user's browser.  Various browser extensions
and plugins have long existed that allow users to optionally block
referers if they wish.

There are various reasons why referers were originally implemented.
One important one was to aid in session sequencing, since knowing the
full URL of the previous page -- that referring page -- could be
useful to maintaining session transactional states, especially in the
absence of more advanced methodologies that would further evolve
later.

Some critics of referers make the claim that only "snooping
businesses" are interested in such data, and so cutting it off would
harm nobody of real merit.

But this really is not true.  I believe if you took a poll, you'd find
that the vast majority of Web site operators -- including nonprofits,
individuals, and so on, not just commercial enterprises -- use referer
data to better understand what people find to be of interest on their
sites, and to have some sense of how their sites are being referenced
by the broader world.

I know that I find this data to be of significant interest, and I
don't run any ads or other monetizing elements on my blog.  While
there are other ways to discover relevant links over time, being able
to see immediately when there's a "flood" of hits referring from a
particular site (e.g., a Slashdot posting!) can be very important not
just as a point of knowledge but from a site management standpoint as
well.  Visible search terms in referers tell me what issues from my
postings are of particular worth to readers, and help me determine
followups and future emphasis.
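
To give a sense of how simple this sort of real-time check can be,
here is a minimal Python sketch that tallies referring hosts and flags
any that cross a threshold.  It assumes the referer values have
already been pulled out of the log lines; the function names, the
sample data, and the threshold are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_referring_hosts(referers):
    """Count hits per referring host, skipping empty or "-" entries."""
    counts = Counter()
    for referer in referers:
        host = urlsplit(referer).hostname
        if host:  # None when the log recorded no usable referer
            counts[host] += 1
    return counts


def flooding_hosts(referers, threshold):
    """Return hosts with at least `threshold` hits, busiest first."""
    counts = count_referring_hosts(referers)
    return [host for host, hits in counts.most_common() if hits >= threshold]


# Hypothetical referer values from a short window of log entries.
recent = ["http://slashdot.org/story/1"] * 3 + ["http://example.org/a", "-", ""]
print(flooding_hosts(recent, threshold=3))
```

Run over a sliding window of recent log entries, a check like this is
enough to notice a sudden surge from a single referring site.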

Could I continue posting new items if all log referers suddenly
vanished?  Sure.  It would mean switching to more limited tools that
were less real-time in nature, like retrospective searching and such,
to try to understand the dynamics of users viewing my site, but the
fundamental ability to run my blog would of course not be
significantly undermined.

But there would be a notable diminishing of the "value proposition"
between readers and the site.

While you may never have thought of them in this way, referers can be
viewed as something of an "equalizing" agent between large and small
Web sites.

When you conduct a search on a search engine, that site obviously
knows your query, so that they can provide you with a list of results.
You then usually visit sites based on that list, and (hopefully)
obtain the information of interest.  This transaction -- that
typically occurs without your being charged any fee by either party --
still has real value.

Questions: Is it unreasonable for the site that actually provides the
information that answers your query to see the same data (the search
query itself) that the search engine had?  The search engine
must have the query to process your request, and can use this
information to improve its search results over time.  Is it reasonable
to argue that the actual content site should have the same opportunity
to improve its services through the use of this data?

These questions can certainly be argued either way.  I personally come
down on the side of best possible use of data in a responsible and
egalitarian manner whenever possible.

In any case, the increasingly routine and default use of SSL, with the
many important benefits it brings, is likely moving the era of
traditional referers toward a gradual diminution and ultimately an
effective closure in many respects.  Other analytical mechanisms
(either existing or yet to be developed and deployed) will likely take
up some of the slack, and in some cases provide even greater insights.

But perhaps of even greater importance in the long run is the reality
that questions surrounding the collection and use of transactional
data, even related to relatively routine operations on the Internet,
can be much more complex than they might appear at first glance, and
that seemingly obvious "simple" solutions (such as blanket
restrictions) may actually create or exacerbate far more problems than
they might solve.

This is true regardless of who is referering to ... I mean referring
to ... uh, *talking about* these issues!

--Lauren--
Lauren Weinstein (lauren@vortex.com): http://www.vortex.com/lauren
Co-Founder: People For Internet Responsibility: http://www.pfir.org
Founder:
 - Network Neutrality Squad: http://www.nnsquad.org
 - Global Coalition for Transparent Internet Performance: http://www.gctip.org
 - PRIVACY Forum: http://www.vortex.com
Member: ACM Committee on Computers and Public Policy
Blog: http://lauren.vortex.com
Google+: http://vortex.com/g+lauren
Twitter: https://twitter.com/laurenweinstein 
Tel: +1 (818) 225-2800 / Skype: vortex.com