Open Peer Review by a Selected-Papers Network

Update 2: we are working on implementing this, initially as an open peer review site for arXiv.  See the forum discussion here, and the code repository here.  You are invited to join this effort!

Update: this paper is now published, as a substantially shorter version (that’s a good thing!).  I suggest you read the published version; if you have comments or feedback you are welcome to post comments here.

Abstract

A Selected-Papers (SP) Network is a network in which researchers who read, write and review articles subscribe to each other based on common interests. Instead of reviewing a manuscript in secret for the Editor of a journal, each reviewer simply publishes his review (typically of a paper he wishes to recommend) on the SP network, which automatically forwards it to his subscribers. I present a three-phase plan for building a basic SP network, discovering and measuring the detailed structure of research communities, and transforming the SP network itself into an effective publisher of research articles in areas that are not well supported by existing journals. I show how the SP network provides a new way of measuring impact, catalyzes the emergence of new subfields, and accelerates discovery in existing fields, by giving each reader a fine-grained filter for high-impact work.

Introduction

What is a Selected-Papers Network?

I wish to begin by immediately presenting the basic proposal of a Selected-Papers (SP) network, and then relate it to existing institutions and literature. I will briefly outline the basic idea and how it would work, in three distinct phases. I will present its benefits for two key constituencies, readers and referees. Later, I will analyze in depth the concept of impact, and show how the SP network provides a new way of measuring impact, catalyzes the emergence of new subfields, and accelerates discovery in existing fields, by giving each reader a fine-grained filter for high-impact work. Finally, I analyze its benefits for the production of new research, in other words for its most important constituency: authors, the producers of all new research.

Conceptually, I propose only an exceedingly simple change: instead of reviewing a manuscript in secret for the Editor of a journal, a referee simply publishes his review (typically of a paper he wishes to recommend) on an open Selected Papers network, which automatically forwards his review to readers who have subscribed to his selected papers list because they feel his interests match their own, and trust his judgment. Note that this model works equally well for papers that have already been published by a traditional journal (in which case the SP network functions as a social network in which recommendations flow efficiently among people with overlapping interests), as well as for new manuscripts (in which case the author could choose to use the SP network as the official publisher of the paper). From the referee’s point of view, the only change is that he reviews papers in “his own name” (in public, where he can earn reputation and influence for this work) rather than in secret for a journal.

I propose implementation of this concept in three distinct phases:

  • Phase I: the basic SP network. Building a place where reviewers can enter paper selections and post reviews, readers can search and subscribe to reviewers’ selections, and papers’ diffusion through research communities is automatically measured.
  • Phase II: discovering and measuring the detailed structure of scientific networks. The SP network will produce an unprecedented dataset consisting not only of the evolution of the subscription network (revealing sub-communities of people who share a common interest as shown by cliques who subscribe to each other), but also the exact path of how each paper spreads through the network. Together with a wide range of automatic measurements of each reader’s interest in a paper, these data constitute a golden opportunity for rigorous research on knowledge networks and social networks (e.g. statistical methods for discovering the creation of new sub-fields directly from the network structure). Properly developed, this dataset will enable new research and produce a wide variety of new algorithms (e.g. Netflix-style prediction of a paper’s level of interest for any given reader) and new metrics (e.g. how big is a reviewer or author’s influence within his field? How accurately does he predict what papers will be of interest to his field, or their validity? How far “ahead of the curve” is a given reviewer or author?). Note that the SP network needs only to capture the data that enables such research; it is the research community that will actually do this research. But the SP network then benefits, because it can put all these algorithms and metrics to work for its readers, reviewers and authors. For example, it will be able to create publishing “channels” for new sub-fields as soon as new cliques are detected within the SP network structure.
  • Phase III: A better platform for scientific publishing. During this phase, the SP network will give authors the option of publishing their paper directly via the SP network (i.e. by simply asking SP reviewers to “select” their manuscript, instead of first getting it published in a traditional journal). To make this an attractive publishing option, it will give authors powerful tools for quickly locating the audience(s) for a paper, and it will give reviewers powerful tools for pooling expertise to assess its validity, in collaboration with the authors. All of this is driven by the SP network’s ability to target specialized audiences far more accurately, flexibly and quickly than traditional journals. One way of saying this is that the SP network automatically creates a “new journal” (list of subscribers) optimized for each individual paper, and that this is done in the most direct, natural way possible (i.e. by each reviewer deciding whether or not to recommend the paper to his subscribers). Once the total number of SP network subscribers from a given subfield matches the average number of readers from that subfield in a typical journal, publishing via the SP network becomes an equally valid option (i.e. achieves the same total readership). Note that this strategy aims not at supplanting traditional journals but complementing them. This alternative path will be especially valuable for specialized subfields that are not well served by existing journals, for newly emerging fields, and for interdisciplinary research (which tends to “fall between the cracks” of traditional journal categories).

Phase I: Building a Selected Papers Network

The Essential Ingredients

Technically, the initial deployment requires only a few basic elements:

  • a mechanism for adding reviewers (“Selected-Paper Reviewer” or SPR): the SP network restricts reviewers in a field simply to those who have published peer-reviewed papers in that field (typically as corresponding author). Initially it will focus on building (by invitation) a reasonably comprehensive group of reviewers within certain fields. In general, any published author from any field can add themselves as a reviewer by linking their email address to one of their published papers (which usually include the corresponding author’s email address). Note that the barrier to entry need not be very high, since the only privilege this confers is the right to present one’s personal recommendations in a public forum (no different than starting a personal blog, which anyone can do). Note also that the initial “field definitions” can be very broad (e.g. “Computational Biology”), since the purpose of the SP network is to enable sub-field definitions to emerge naturally from the structure of the network itself.
  • a mechanism for publishing reviews: Peer reviews represent an important contribution and should be credited as such. Concretely, substantive reviews should be published, so that researchers can read them when considering the associated paper; and they should be citable like any other publication. Accordingly, the SP network will create an online journal, Critical Reviews, that will publish submitted reviews. The original paper’s authors will be invited to check that a submitted review follows basic guidelines (i.e. is substantive, on-topic, and contains no inappropriate language or material), and to post a response if desired. Note that this also triggers inviting the paper’s corresponding author to become an SP reviewer (by virtue of having published in this field). Reviews may be submitted as Recommendations (i.e. the reviewer is selecting the paper for forwarding to his subscribers), Comments (neutral: the review is attached to the paper but not forwarded to subscribers), or Critiques (negative: a warning about serious concerns, which the reviewer can opt to forward to his subscribers). Recommendations should be written in “News & Views” style, as that is their function (to alert readers to a potentially important new finding or approach). Comments and Critiques can be submitted in standard “Referee Report” style. Additional categories could be added at will: e.g. Mini Reviews, which cover multiple papers relevant to a specific topic (for an excellent example, see the blog This Week’s Finds in Mathematical Physics :cite:`BaezTWF`); Classic Papers, which identify must-read papers for understanding a specific field; etc. Note also that the SP network can give reviewers multiple options for how to submit reviews: via the Science Select website (the default); via Google Docs; via their personal blog; etc. For example, a reviewer who has already written “News & Views” pieces or mini-reviews on his personal blog could simply give the SP network the RSS URL for his blog. He would then use the SP network’s tools (on its website) to select the specific post(s) he wants to publish to his subscribers, and to resolve any ambiguities (e.g. about the exact paper(s) that his review concerns).
  • a subscription system: it’s trivial to set up a website where anyone can register, search for SPRs in their field, and receive recommendations automatically from their subscriptions. Subscribers could opt to receive them as individual emails; as weekly or monthly email summaries; as an RSS feed plugged into their favorite browser; as a feed for Google Reader or another preferred news service; etc. Invitations will emphasize the unique value of the SP network, namely that it provides the subscriber reviews of important new papers specifically in his area (whereas traditionally review comments are hidden from readers). The real question is how to bootstrap the system to grow successfully. An obvious answer is to focus initially on the reviewers themselves as the initial subscriber base. The fields where the SP network has the most reviewers also represent the areas where it offers the most value. Concretely, reviewers will choose to subscribe to other reviewers whose work and interests are highly relevant to their own. The initial seeding of the network will make this easy by grouping reviewers by area, so that each reviewer will immediately see a list of relevant reviewers they could choose to subscribe to. To further assist this bootstrap process, all subscribers will be asked to identify their own most important paper from the last three years, and the three most important papers for their field from the last three years (not their own). (Note that if a subscriber supplies a publication, then they are eligible to become a reviewer as well!) All subscribers will be asked for a list of keywords representing their research interests, with auto-completion so that users will preferentially use an existing keyword if one is already present. Also, every reviewer will be asked for a list of people in their field to invite to subscribe to their list. As long as some fraction of people suggest someone else to invite, the subscriber base in a given field will grow exponentially until it saturates the set of early adopters who might be interested in trying a new system.
  • an automatic history-tracking system: each paper link sent to an individual subscriber will be a unique URL, so that when s/he accesses that URL, the system will record that s/he viewed the paper, as well as the precise path of recommenders via which the paper reached this reader. In other words, whereas the stable internal ID for a paper will consist of its DOI (or arXiv or other database ID), the SP network will send this to a subscriber as a URL like http://doc.scienceselect.net/Tase3DE6w21… that is a unique hash code indicating a specific paper for a specific subscriber, from a specific recommender. Clicking the title of the paper will access this URL, enabling the system to record that this user actually viewed this paper (the system will forward the user to the journal website for viewing the paper in the usual way). If this subscriber then recommends the paper to his own subscribers, the system sends out a new set of unique links and the process begins again. This enables the system to track the exact path by which the paper reached each reader, while at the same time working with whatever sources the user must access to actually read any given paper (a minimal code sketch of this mechanism appears after this list). Of course, the SP network will take every possible measure to prevent exposure or misuse of these data; for a detailed discussion of privacy issues, see the FAQ at the end of the paper.
  • an automatic interest-measuring system: click-through rates are a standard measure of audience response in online advertising. The SP network will automatically measure audience interest via click-through rates, in the following simple ways (a scoring sketch appears after this list):
    • The system will show (send) a user one or more paper titles. The system then measures whether the user clicks to view the abstract or review.
    • The system displays the abstract or review, with links to click for more information, e.g. from the review, to view the paper abstract or full paper. Each successive click-through layer (title, review, abstract, full paper) provides a stronger measure of interest.
    • The system provides many options for the user to express further interest, e.g. by forwarding the paper to someone else; “stashing” it in their personal cubbyhole for later viewing; rating it; reviewing or recommending it on their SP list, etc.
  • a paper submission mechanism: while reviewers are encouraged to post reviews on their own initiative, the SP network will also give authors a way to invite reviews from a targeted set of reviewers. Authors may do this either for a published paper (to increase its audience by getting “selected” by one or more SPRs, and spreading through the SP subscriber network), or for a preprint. Either way, authors must supply a preprint that will be archived on the SP network (unless they have already done so on a standard repository such as arXiv). This both ensures that all reviewers can freely access it, and guarantees Open Access to the paper (the so-called “Green Road” to open access). (Note that over 90% of journals explicitly permit authors to self-archive their paper in this way :cite:`ecs10209`.) Authors use the standard SP subscriber tools to search for relevant reviewers, and choose up to 10 reviewers to send the paper link to. Automatic click-through measurements (see the interest-measuring system above) will immediately assess whether each reviewer is interested in the paper: ignoring the paper is a negative review (“insufficient interest for me”); actually proceeding to read the paper (“whoa! I gotta read this!”) triggers an invitation to review the paper. These automatic interest metrics should be complete within a few days. For reviewers who exhibit interest in a paper, the authors follow up with them directly. As always, each reviewer decides at their sole discretion whether or not to recommend the paper to their subscribers. As in traditional review, a reviewer could demand further experiments, analysis, or revisions as a condition for recommending the paper. While each reviewer makes an independent decision, all reviewers considering a paper would see all communications with the authors, and could chime in with their opinions during any part of that discussion. It is interesting to contrast SP reviewer invitations with the constant stream of review requests that we all receive from journals. While SP reviewers could in principle receive a larger number of “paper title invitations”, this imposes no burden on them; i.e. no one is asking them to review anything unless it is of burning interest to them. There is no nagging demand for a response; indeed, reviewers will be expressly instructed to ignore anything that doesn’t grab their interest!
  • an automatic “Reviewer Alert” mechanism: the SP network will give journal referees a unique mechanism that lets them submit their review immediately to the SP network, but keep it “on hold” until the paper is actually published. They simply include the paper’s author list, title, and abstract; the SP network automatically detects when a matching paper is published (even if significantly revised), and notifies the reviewer, including a “diff-view” showing what changes, if any, have been made (a matching sketch appears after this list). The reviewer is then given the option to finalize and publish their review. This serves a valuable function, since journals generally do not inform reviewers when their “confidentiality embargo” actually ends (and they in fact have no way to do so if the paper is rejected and then published by a different journal).
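
To make the history-tracking idea concrete, here is a minimal sketch in Python. Everything here is hypothetical illustration (the function names, the in-memory dictionaries standing in for a database, and the placeholder IDs); only the doc.scienceselect.net URL scheme comes from the description above.

```python
import hashlib
import secrets

# In-memory stand-ins for persistent storage (a real system would use a database).
links = {}   # token -> (paper_doi, subscriber_id, recommender_id)
views = []   # recorded (subscriber_id, paper_doi, recommender_id) view events

def make_tracking_url(paper_doi, subscriber_id, recommender_id):
    """Mint a unique link for one paper, sent to one subscriber, from one recommender."""
    raw = f"{paper_doi}|{subscriber_id}|{recommender_id}|{secrets.token_hex(8)}"
    token = hashlib.sha256(raw.encode()).hexdigest()[:16]
    links[token] = (paper_doi, subscriber_id, recommender_id)
    return f"http://doc.scienceselect.net/{token}"

def record_click(token):
    """Log that the subscriber viewed the paper, then return the canonical ID
    so the web layer can forward the user to the journal's own website."""
    paper_doi, subscriber_id, recommender_id = links[token]
    views.append((subscriber_id, paper_doi, recommender_id))
    return paper_doi

# If this subscriber later recommends the paper, fresh tokens are minted for
# each of *their* subscribers, so the recorded events reconstruct the exact
# recommendation path by which the paper reached every reader.
url = make_tracking_url("10.1371/example.doi", "reader42", "reviewer7")  # placeholder IDs
```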
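
The layered click-through measurement could then be scored roughly as follows. This is a sketch under stated assumptions: the layer names, and the idea of reporting each layer as a fraction of titles shown, are my illustration rather than a specification.

```python
from collections import Counter

# Access layers, ordered from weakest to strongest signal of interest.
LAYERS = ("title_sent", "review_viewed", "abstract_viewed",
          "fullpaper_viewed", "forwarded", "stashed", "recommended")

events = Counter()  # (paper_id, layer) -> count of users reaching that layer

def log_event(paper_id, layer):
    assert layer in LAYERS
    events[(paper_id, layer)] += 1

def click_through_rate(paper_id, layer):
    """Fraction of users shown the title who went on to reach the given layer;
    deeper layers reached by a user indicate progressively stronger interest."""
    shown = events[(paper_id, "title_sent")]
    return events[(paper_id, layer)] / shown if shown else 0.0
```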
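
And the “Reviewer Alert” matching step might start from nothing more than standard-library fuzzy matching, as in this sketch (the 0.8 similarity threshold is an illustrative guess that would need tuning against real revision histories):

```python
import difflib

def matches_held_review(held_title, held_abstract, pub_title, pub_abstract,
                        threshold=0.8):
    """Detect whether a newly published paper matches a review held 'on hold',
    tolerating the revisions papers typically undergo before publication."""
    title_sim = difflib.SequenceMatcher(
        None, held_title.lower(), pub_title.lower()).ratio()
    abstract_sim = difflib.SequenceMatcher(
        None, held_abstract.lower(), pub_abstract.lower()).ratio()
    return title_sim >= threshold or abstract_sim >= threshold

def diff_view(held_abstract, pub_abstract):
    """A crude 'diff-view' showing what changed between submission and publication."""
    return "\n".join(difflib.unified_diff(
        held_abstract.splitlines(), pub_abstract.splitlines(), lineterm=""))
```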

Phase I Goals

The essential goal of Phase I is to keep things simple enough to quickly deliver a usable basic system, and to perform rapid experiments in carefully targeted fields that teach us which factors make the system successful and compelling, and which make it impractical or not worth the trouble. Some concrete goals should include:

  • partner with at least one journal: the SP network complements rather than competes with existing journals. Its message to journals is positive: our members want to recommend your papers and thereby increase your readership. Moreover, it is in no way dependent on their cooperation; it does not need to access their content (it simply forwards users to their websites); nor does it wish to persuade them to change their review or publishing practices. Given this complementarity and flexibility, it could partner with journals on specific collaborations that benefit both parties, for example enabling journals to link to its reviews of their papers, or providing links for users to recommend a paper to their SP subscribers. Of course, some journals such as the Nature family wish to lock their readers into a “walled garden” that will be a “complete solution” fully controlled by Nature’s publisher. On the other hand, journals such as the Public Library of Science family might see the SP network as a natural tie-in to their public mission. Building the SP network as part of, or in close partnership with, a journal family like PLoS could be beneficial.
  • target at least one area where preprints represent an important category of publication: in many areas such as math and physics, preprint distribution represents a major share of the impact (i.e. readership) for many papers and subfields. The SP network could offer important benefits to these subfields. First, it provides a mechanism for formal review of preprints, which addresses the primary criticism of preprint distribution (i.e. that the papers have not been peer-reviewed). Second, it provides a powerful “distribution channel” for preprints to circulate naturally to the readers who will find them interesting. Preprint databases like arXiv make it possible to perform literature searches on preprints (fulfilling a function analogous to PubMed for published papers). Beyond this, however, preprint distribution consists of little more than authors emailing a PDF to a few of their friends. Given that such areas by definition represent subfields that are not well served by traditional journals (if they were, authors would publish their work in a journal rather than just circulating preprints), they should be fertile ground for developing the SP network and making it work well for its users.
  • achieve 50% subscription rates relative to typical journal readerships within one or more targeted fields: in other words, to obtain a total number of subscribers within that field that is one half of the average number of downloads per paper for typical journals within that field. For example, I published a paper on HIV genetics / evolution in PLoS ONE in 2007, whose PDF has been downloaded approximately 190 times (1700 HTML page views). If this were taken as an “average paper” in this subfield, that would imply a target of 95 subscribers specifically within this subfield.
  • achieve 25% market share within one or more targeted fields: here we compare the number of subscribers who actually viewed a paper via its SP network link, as a fraction of the total number who downloaded it from its journal site. Note that since readers find papers via many other paths (e.g. PubMed; Google; the journal website; journal-generated spam emails; word of mouth; etc.), a 25% market share might well be one of the largest single shares.

Benefits for Readers

The core logic of the SP network idea flows from inherent inefficiencies in the existing system.

For readers, journals no longer represent an efficient way to find papers that match their specific interests. In paper-and-ink publishing, the only way to make distribution cost-effective was to rely on economies of scale, in which each journal must have a large audience of subscribers, and delivers to every subscriber a uniform list of papers that are supposedly all of interest to them. In reality, most papers in any given journal are simply not of direct interest to (i.e. specifically relevant to the work of) each reader. For example, in my own field the journal Bioinformatics publishes a very large number and variety of papers. The probability that any one of these papers is of real interest to my work is low — like calling random people from the UCLA Medical Center phonebook. For this reason, readers no longer find papers predominantly by “reading a journal” from beginning to end (or even just its table of contents). Instead, they have shifted to finding papers mainly from literature searches (PubMed :cite:`pubmed`, Google, Google Scholar :cite:`googleScholar` etc.) and word of mouth. Note that the latter is just an informal “Selected Papers network”.

For readers, an SP network offers the following compelling advantages:

  • Higher relevance. Instead of dividing attention between a number of journals, each of which publishes only a small fraction of directly relevant papers, a reader subscribes (for free) to the Selected Papers lists of peers whose work matches his interests, and whose judgment he trusts. Note that since most researchers have multiple interests, you typically subscribe specifically to just the recommendations from a given SPR that are in your defined areas of interest. The advantage is fundamental: whereas journals lump together papers from many divergent subfields, the SP network enables readers to find matches to their interests at the finest granularity — the individuals whose work matches their own interests. For comparison, consider the large volume of email I receive from journals sending me lists of their tables of contents. These emails are simply spam; essentially all the paper titles are of zero interest to me, so now I don’t even bother to look at them. The subscription model only makes sense if it is specific to the subscriber’s interests (otherwise he is better off just running a literature search). And in this day and age of highly specialized research, that means identifying individual authorities whose work matches your own.
  • Real metrics. A key function of the SP network is to record all information about how each paper spreads through the community and to measure interest and opinions throughout this process. This will give readers detailed metrics about both reviewers (e.g. assessing their ability to predict what others will find interesting and important, ahead of the curve) and about papers (e.g. assessing not only their readership and impact but also how their level of interest spreads over different communities, and the community consensus on them, i.e. whether they are ultimately incorporated into the literature, via ongoing citations, or forgotten).
  • Higher quality. Note first that the SPRs are simply the same referees that journals rely on, so the baseline reliability of their judgments is the same in either context. But the SP network aims for a higher level of quality and relevance — it only reports papers that are specially selected by referees as being of high interest to a particular subfield. “Ordinary research” (i.e. work that follows the pattern of work in its field) is typically judged by a standard “checklist” of technical expectations within its field. Unfortunately, a substantial fraction of such papers are technically competent but do not provide important new insights. The sad fact is that the average paper is only cited 1 – 3 times (over two years, even including self-citations), and this distribution is highly skewed: the vast majority of papers have zero or very few citations, and only a small fraction have substantial numbers of citations :cite:`IMU2008`. For a large fraction of papers, the verdict of history is that almost nobody would be affected if these papers had not been published; even their own authors rarely get around to citing them! Since the SP network is driven solely by individual interest (i.e. an SPR getting excited enough about a manuscript to recommend it to his subscribers), it is axiomatic that it will filter out papers that are not of interest to anyone. Since such papers unfortunately constitute a substantial fraction of publications, this is a highly valuable service. A more charitable (but scarier) interpretation is that some fraction of these papers would actually be of interest to someone, but due to the inefficiencies of the journal system as a method for matching papers to readers, they simply never find their proper audience. The SP network could “rescue” such papers, because it provides a fine-grained mechanism for small, specialized interest groups to find each other and share their discoveries.
  • Better information. In a traditional journal, a great deal of effort is expended to critically review each manuscript, but when the paper is published, all of that information is discarded; readers are not permitted to see it. By contrast, in the SP network the review process is open and visible to all readers; the concerns, critiques and key tests of the paper’s claims are all made available, giving readers a much more complete understanding of the questions involved. Indeed, one good use for the SP network would be for reviewers and / or authors to make public the reviews and responses for papers published in traditional journals.
  • Speed. When a new area of research emerges, it takes time for new journals to cover the new area. By contrast, the SP network can cover a new field from the very day that reviewers in its network start declaring that field in their list of interests. Similarly, the actual decision of a reviewer to recommend a paper can be fast: if they feel confident of their opinion, they can do so immediately without anyone else’s approval.
  • Long-term evaluation. In a traditional journal, the critical review process ends weeks to months before the paper is published. In the SP network, that process continues as long as someone has something to say (e.g. new questions, new data) about that paper. The SP network provides a standard platform for everyone to enter their reviews, issues, and data, on papers at every stage of the life cycle.

Benefits for Referees

Referees get all the disadvantages and none of the benefits of their own work in the current system. Journals ask referees to do all the actual work of evaluating manuscripts (for free), but keep all the benefit for themselves. That is, if the referee does a good job of evaluating a manuscript, it is the journal’s reputation that benefits. This is sometimes justified by arguing that every scientist has an inherent obligation to review others’ work, and that failure to do so (for example, for a manuscript that has no interest to the referee) injures the cooperative enterprise of science. This is puzzling. Why should a referee ever review a paper except because of its direct relevance to his own work? If the authors (and the journal) cannot find anyone who actually wants to read the paper, what is the purpose of publishing it?


Reviewing manuscripts is an important contribution and should be credited as such. The SP network would rectify this in three ways:

  • Liberate referees to focus on their interests. The SP network would urge referees to refuse to review anything that doesn’t grab their interest, for the simple reason that it is both inefficient and counter-productive to do so. If a paper is not of interest to the referee, it is probably also not of interest to his subscribers (who chose his list because his interests match theirs). Note that the SP network expects authors to “submit” their manuscript simultaneously to multiple reviewers seeking an “audience” that is interested in their paper. If the authors literally cannot find anyone who wants to read the paper, it should not be published. Note that if referees simply follow their own interests, this principle is enforced automatically.
  • Referees earn reputation and influence through their reviews. Manuscript reviews are a valuable contribution to the research community, and they should be treated and valued as such. By establishing a record of fair, insightful reviews, and recommending important new papers “ahead of the curve”, a referee will attract a large audience of subscribers. This fact alone should be treated as an important metric for professional evaluation. Moreover, the power to communicate directly with a substantial audience in your field itself constitutes influence, and is an important professional advantage. For example, a referee by default will have the right to communicate his own papers to his subscribers; thus, through his earned reputation and influence, a referee builds an audience for his own work.
  • Eliminate the politics of refereeing. Note that a traditional journal does not provide referees these benefits because their role is fraught with the political consequences of acting as the journal’s “agent”, i.e. the power to confer or deny the right of publication, so crucial for academics. These political costs are reckoned so serious that journals shroud their referees in secrecy to protect them from revenge. Unfortunately, this political role incurs many other serious costs (see for example the problem of “prestige battles” analyzed in section 4.4). These problems largely vanish in an SP network, for the simple reason that each referee represents no one but himself, and is not given arbitrary power to block publication of anyone’s work. In many traditional journals, if one reviewer says “I don’t like this paper”, it will be rejected and the authors must start over again from scratch (since they are permitted to submit to only one journal at a time, and the paper must typically be re-written, or at a minimum reformatted, for submission to another journal). By contrast, in an SP network authors submit their paper simultaneously to multiple referees; if one referee declines to recommend it, that has no effect on the other referees. The referee has not “taken anything away” from the authors, and has no power to block the paper from being selected by other referees. Moreover, the very nature of the “Selected Papers” idea is positive; that is, it highlights papers of especial interest for a given community. Being “selected” is a privilege and not a right, and is intended to reflect each referee’s idiosyncratic interests. Declining to select a paper is not necessarily a criticism; it might simply mean that the paper is not well-matched to that reviewer’s personal interests. Since most people in a field will also themselves be reviewers, they will understand that objecting to someone else’s personal selections is morally incompatible with preserving their own freedom to make personal choices. Note that standard etiquette will be that authors may submit a paper to as many referees as they like, but at the same time referees are not obligated to respond. As a result, outright rejection will be rare; in most cases, authors will only hear from referees who are interested in their paper. Of course, in certain cases a referee may feel that important concerns have been ignored, and will raise them by publishing a negative review on his SP list. We believe that referees will feel free to express such concerns in this open setting, for the same reasons that scientists often speak out with such concerns at public talks (e.g. at conferences). That is, they are simply expressing their personal opinions in an open, public forum where everyone can judge the arguments on their merits. They are only claiming rights equal to the authors’ (i.e. the right to argue for their position in a public forum). What creates conflict in peer review by traditional journals is the fact that the journal gives the referee arbitrary power over the authors’ work — specifically, to suppress the authors’ right to present their work in a public forum. This power is made absolute in the sense that it is exercised in secret; the merits of the referees’ arguments are not subject to public scrutiny; and referees have no accountability for whether their assertions prove valid or not. All of these serious problems are eliminated by the SP network, and replaced by the benefits of openness, transparency and accountability.


Precedents

This proposal is hardly original; it merely synthesizes what many scientists have argued for in a wide variety of forums :cite:`Hitchcock2002` :cite:`nielsen2008` :cite:`FOSP2009` :cite:`neylonBlog` :cite:`Smith2009` :cite:`vonMuhlen2011`. As usual with such things, the main barrier to realizing the benefits of a new system is simply the entrenchment of the old system. In my view, the advantage of this proposal is that it provides a seamless bridge between the old and new, by working equally well with either. In the context of the old system, it is a social network in which everyone’s recommendations of published papers can flow efficiently. But the very act of using such a network creates a new context, in which every user becomes in a sense as important a “publication channel” as an established journal (at least for his subscribers).

There is powerful precedent for both a public publishing service, and for a recommendations-based distribution system. For example, arXiv is the preeminent preprint server for math, physics and computer science :cite:`arXiv`. A huge ecology of researchers is using it as a de facto publishing system; it provides the real substance of publishing (lots of papers get posted there, and lots of people read them) without the official imprimatur of a journal. Second, an immense number of researchers are using blogs to discuss and review their latest finds in the literature, and some of them are extremely influential (e.g. Terry Tao :cite:`TTaoBlog`, and John Baez / n-category cafe :cite:`BaezTWF` :cite:`nCatCafe`, to cite two examples). Note that the SP network benefits from and synergizes with these existing public resources. It does not compete with them, because it is simply an open platform for making it easy for users to share all of these different resources. For example, a scientist who posts reviews to his blog could deliver them to his SP network subscribers simply by using its interface to indicate which of his blog posts are reviews, and what papers they recommend.

PLoS ONE :cite:`plosONE` represents an interesting precedent for the SP network. In terms of its “back-end”, PLoS ONE resembles some aspects of the SP network. For example, its massive list of “Academic Editors” who each have authority to accept any submitted paper is somewhat similar to the “liberal” definition of SPRs that allows any SPR to recommend a paper to his subscribers. However, on its “front-end” PLoS ONE operates like a traditional journal: reviews are secret; no effort is made to search for a paper’s audience(s); and above all there is no network structure for papers to spread naturally through a community.

Biology Direct :cite:`BiologyDirect` is another interesting precedent. It employs a conventional (relatively small) editorial board list. However, like the SP network, it asks authors to contact possible reviewers from this list directly, and reviewers are encouraged to decline a request if the paper doesn’t interest them. Moreover, reviews are made public when a paper is published. Again, however, Biology Direct’s front-end is that of a conventional journal, with no network structure.

Nature Publishing Group’s Connotea.org service offers users a way to save citations and share them on the web. Connotea does not operate as an SP network style subscription service, but appears to be used mainly as an online “citation manager” for individual users. It appears to be controlled completely by NPG, but bills itself as a .org site. It was developed in Perl and its source code is open source. Connotea provides an API for retrieving its user, citation, and tag data programmatically. Thus in principle an SP network could integrate with Connotea on a number of different levels, e.g. treating Connotea as a “feed” enabling SP network users to seamlessly subscribe to Connotea users’ recommendations. But it’s unclear how useful this would be. A quick random sampling of Connotea user listings found that most were spam (ads for unrelated products), completely empty, or both. To my mind this highlights the importance of linking each SPR to a corresponding-author email address in one of their published papers.

In my view, it is very important that the SP network be developed as an open source, community project rather than as a commercial venture. To the extent that they become valuable, commercial sites tend to become “walled gardens” in which the community is encouraged to donate content for free, which then becomes the property of the company. That is, it both controls how that content can be used, and uses that content strictly for its own benefit rather than that of the community. The SP network would provide enormous benefits to the community, but from the viewpoint of a publishing company (e.g. NPG) it might simply look like a threat to their business. The SP network cannot be developed as a walled garden, because its data belong to the community and must be used for the community’s benefit. It must be developed “of the people, by the people, for the people”, or it will never come to be in the first place.

Phase II: Analysis and Metrics for Scientific Networks

What Analysis Will the SP Network Enable?

The primary value of the SP network is that it “unpacks” inefficient aggregations (i.e. huge numbers of unrelated papers lumped together in the same journal) into the distinct interest groups that naturally form when people are allowed to choose who they want to subscribe to. And because this is all done online, it automatically records the precise details of how each paper spreads through the public community. Over time, these data will transform scientific publishing by making it possible for researchers to find and communicate with others who share their specialized or interdisciplinary interests, with unprecedented ease, specificity, power, speed, and efficiency. The real foundation for doing this is that the SP network will help enable the emerging new era of data-driven research (scientometrics) on key goals for scientific publishing:

  • automatic discovery of subfields and newly emerging fields from analysis of the SP network structure. Intuitively, when a group of researchers all subscribe to each other, that reveals a specific research interest that unites them (a minimal clique-detection sketch appears after this list).
  • rigorously controlled and validated methodologies for automatic measurement of reader interest. The basic SP network approach of dividing content into “access layers” (e.g. title; review; abstract; full-paper; etc.) and measuring click-through rates provides a foundation for automatic measurement of interest in a paper within specific audiences. However, there are many questions about how best to “control” for various sources of noise to produce a robust, uniformly normalized measure of interest. These are research questions and should be answered by experimentally testing different “control” methods and directly validating their results. As a trivial example, click-through rates can be artifactually depressed if an unusually large fraction of the target audience is “offline”, e.g. during holidays or a major conference in the discipline. Such artifacts can be eliminated by measuring interest relative to a consistent control, i.e. by including multiple titles in any test mailing, one of which would be a “control”. Different papers for a given audience would be measured relative to the same control during any given time frame. Optimal signal-to-noise requires a control with a moderate interest level (neither too high nor too low), raising many interesting research questions about optimizing and automating these methods (see the control-normalization sketch after this list).
  • standardized measures of comparative interest for all papers. Currently, the universal standard metric is simply the name of the journal in which the paper was published (i.e. “Nature” >> “Nucleic Acids Research” >> “unpublished preprint”). Many studies have shown that this “metric” is fatally flawed by huge variations in impact among papers published in the same journal :cite:`IMU2008`. Another standard metric, citation impact, cannot be measured until two calendar years after publication, and thus is not useful during the period when readers need an interest metric (i.e. to guide their choice of what to read among recently published papers). Using its rigorous foundation of immediate interest metrics measured in real-time, the SP network can supply an important market need for a standardized measure of comparative interest that readers will intuitively understand. For example, since the SP network measures interest for all papers in the same, consistent way, it could report each paper’s interest level in terms of its “journal equivalent” by comparing the paper’s interest-metric vs. the median interest-metric for papers in a well-known journal. Note that by this measure some Nature papers might be reported as having an interest level equivalent only to an average Nucleic Acids Research paper, whereas some Nucleic Acids Research papers would be reported as having interest as high as an average Nature paper.
  • total vs. field-specific measures of interest. A fundamental problem with standard impact measures (e.g. total readership, or citation count) is that they are not normalized by the relative sizes of different fields, which vary enormously. For example, highly specialized fields such as math inherently produce small citation counts (because each subfield tends to be small), whereas large fields (such as popular areas in biology) inherently produce much larger average citation counts. Note that this is a difficult problem, because the structure of subfields is largely invisible to these metrics, and trivial solutions are not reliable (e.g. some subfields of math are large, while some subfields of biology are small). Because the SP network directly analyzes the fine-grained structure of subfields (through the graph structure of its subscriber network), it can automatically produce both total interest metrics and field-specific interest metrics within the relevant subfield(s) for a paper. Both are useful, for different applications. For example, for choosing a new faculty hire, a department might use total interest (to seek the largest impact for its new hire), whereas for faculty promotion decisions it might use field-specific metrics to focus on the faculty member’s impact within his field.
  • automatic prediction of what papers will be of greatest interest to any individual researcher, using “Netflix-style” prediction algorithms. Although the SP network subscription system itself is designed to greatly boost the relevance of papers it delivers to any subscriber (relative to what they would find in any journal), its detailed interest data measured over all papers and all readers should make it possible to predict interest even more accurately (a matrix-factorization sketch appears after this list).
  • automatic “audience search” to identify the set of distinct audience(s) that would be interested in a specific new paper. For a completely new paper, the system can predict its level of interest for different audiences, but its confidence intervals might be poor. By quickly measuring the actual interest in the most promising audiences (i.e. by showing the title to random samples of individuals from the target audience(s) and measuring click-through rates) it can both get more confident estimates for these audiences, and updated predictions for other audiences / individuals who are likely to be interested. Multiple cycles of this process can be run automatically over a timeframe of a few days, for example to give authors a validated list of target audience(s), among whom they could then ask reviewers to consider their paper. Note that such methods would enable the SP network to auto-generate a “virtual journal” (unique list of subscribers) optimized for each specific paper. Whereas traditional journals function as purely “passive containers” with essentially static audiences, the SP network would gradually transform itself into an “active matrix” that uses rapid cycles of interest-prediction and online test-marketing to actively seek out the true audience(s) for each paper.
  • automatic analysis of paper “life-cycle” stages and diffusion patterns, e.g. “specialist”, “interdisciplinary”, “broad interest”. Because the SP network both tracks in real-time how each paper spreads through the network of readers, and also can directly see the fine-grained structure of distinct audiences (in the form of the subscriber graph structure), it can directly measure the extent to which a paper’s interest and spread is confined to a single audience, occurs at a specific intersection between audiences (i.e. individuals who share the same specific combination of interests), or extends across multiple distinct audiences. Furthermore, using epidemiological models of disease spread, it can characterize a paper’s current activity as bootstrap phase (still seeking an initial audience); breakout (entering a new audience where it appears to have strong growth potential); exponential phase (rapid expansion); reservoir phase (occasional new readers but no resulting breakout); or extinct (no new readership); a toy classifier sketch appears after this list. These data can themselves lead to important further analyses: since the vast majority of papers have only a single growth phase (i.e. they spread through a single specialist community), detecting multiple growth phases indicates that a paper has unusual cross-disciplinary or long-term interest.
  • referee metrics: these could range from the trivial (e.g. number of subscribers) to the more sophisticated, e.g. average audience size (the average number of downstream readers of his recommendations, including all recommendations that resulted from his recommendation), expressed either as a total or as a percentage of each paper’s total readership; subscriber-impact (i.e. what fraction of his subscribers read or recommend the papers that he recommended to them); “early-bird” rank (what fraction of each paper’s readership occurred after his recommendation); etc.
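
To illustrate the first item above, candidate subfields could be extracted from the subscription graph with off-the-shelf tools. This is a minimal sketch using networkx; treating only mutual subscriptions as edges, and taking maximal cliques as subfield candidates, are simplifying assumptions of mine:

```python
import networkx as nx

def mutual_subscription_graph(subscriptions):
    """subscriptions: iterable of (subscriber, reviewer) pairs.
    Keep an undirected edge only where both parties subscribe to each other."""
    directed = nx.DiGraph()
    directed.add_edges_from(subscriptions)
    mutual = nx.Graph()
    mutual.add_edges_from((u, v) for u, v in directed.edges()
                          if directed.has_edge(v, u))
    return mutual

def candidate_subfields(subscriptions, min_size=4):
    """Maximal cliques of mutual subscribers, as candidate subfield definitions."""
    g = mutual_subscription_graph(subscriptions)
    return [clique for clique in nx.find_cliques(g) if len(clique) >= min_size]
```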
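
For the control-normalization question above, the simplest possible scheme is a ratio against a control title mailed to the same audience in the same window. This sketch is one of many conceivable normalizations, not a settled method:

```python
def relative_interest(paper_clicks, paper_shown, control_clicks, control_shown):
    """Click-through rate of a paper relative to a control title sent to the
    same audience in the same time window; values > 1.0 mean above-control
    interest. Because audience-wide effects (holidays, conferences) depress
    both rates roughly equally, they cancel out of the ratio."""
    return (paper_clicks / paper_shown) / (control_clicks / control_shown)

# e.g. a paper clicked by 10 of 100 recipients, against a control clicked by
# 5 of 100, scores 2.0 whether or not it happens to be a holiday week.
assert abs(relative_interest(10, 100, 5, 100) - 2.0) < 1e-12
```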
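
The “Netflix-style” prediction mentioned above could start from plain matrix factorization over observed (reader, paper, interest) triples, where the interest values come from the layered click-through scores of Phase I. A minimal stochastic-gradient-descent sketch follows; the hyperparameters are illustrative, not tuned:

```python
import numpy as np

def factorize(triples, n_readers, n_papers, k=8, steps=200, lr=0.01, reg=0.1):
    """Learn latent factors from observed (reader, paper, interest) triples
    by stochastic gradient descent; unobserved pairs are simply skipped."""
    rng = np.random.default_rng(0)
    R = rng.normal(scale=0.1, size=(n_readers, k))   # reader factors
    P = rng.normal(scale=0.1, size=(n_papers, k))    # paper factors
    for _ in range(steps):
        for reader, paper, interest in triples:
            err = interest - R[reader] @ P[paper]
            r_old = R[reader].copy()
            R[reader] += lr * (err * P[paper] - reg * R[reader])
            P[paper] += lr * (err * r_old - reg * P[paper])
    return R, P

def predict_interest(R, P, reader, paper):
    """Predicted interest of a reader in a paper they have not yet seen."""
    return R[reader] @ P[paper]
```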
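
And the life-cycle stages in the diffusion item could be classified from each paper's weekly new-reader counts. The thresholds below are illustrative guesses; properly detecting “breakout” would additionally require the audience labels derived from the subscriber graph, which this toy version takes as a flag:

```python
def lifecycle_phase(weekly_new_readers, entered_new_audience=False):
    """Classify a paper's current activity from its weekly new-reader counts
    (oldest first). entered_new_audience would come from the subscriber graph."""
    recent = weekly_new_readers[-4:]
    if sum(weekly_new_readers) == 0:
        return "bootstrap"      # still seeking an initial audience
    if sum(recent) == 0:
        return "extinct"        # no new readership at all
    if entered_new_audience:
        return "breakout"       # spreading into a new audience
    if recent[0] > 0 and recent[-1] >= 2 * recent[0]:
        return "exponential"    # rapid expansion
    return "reservoir"          # occasional new readers, no breakout
```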

Who Will Do All This Research?

The goal of Phase II is to transform scientific publishing into a science, by providing a powerful data collection system that researchers can use to ask and answer questions about all the above goals. Note that there already exists a very exciting and rapidly growing field of scientometrics research (see for example the journal Scientometrics). I propose that the SP network be made available as a “research instrument” to the entire scientometrics research community, for individual researchers to perform specific research projects, similar to recent “Open Access Scientometrics” proposals :cite:`ecs13804`. These divide into two categories:

  • data analysis and validation studies: the SP network dataset represents an unprecedented opportunity for analyzing how research communities really work, how to measure interest and impact, and how to identify subfields. Research using this dataset is limited only by researchers’ ability to analyze the data (e.g. funding for compute time), and implementation of important ethical controls (e.g. researchers must work with strictly de-identified data containing no personal identifiers; focused case-studies must be performed on data completely unrelated to the researcher’s field of research, etc.).
  • experimental tests: to directly test hypotheses, researchers can test different protocols on random samples of real SP network users. This will of course require an approval process by a committee that reviews research proposals and allocates “time on the instrument”.

Benefits

  • changing the metric can change what authors publish: under the current system, the most widely used metric is simply the nominal publication count with a vague “premium” for prestigious journals and “discount” for lesser-known journals. Roughly speaking, this metric rewards authors in proportion to the number of papers they publish. In particular, publications in a “competent research” journal count the same regardless of whether they are of strong interest (contain novel insights) or not. It should come as no surprise, then, that a large fraction of published papers have apparently little or no interest to their fields (i.e. a citation impact of zero or very few counts, even including self-citations). Authors are providing what the metric measures, i.e. larger publication counts, and often not providing what it fails to measure, i.e. interest to other researchers. By contrast, in an SP network such “minimum publishable units” are likely to count for very little, for the simple reason that a paper will only be selected by a reviewer if it personally excites him. By definition, the SP network filters out papers that are not of strong interest to anyone. Such papers will score close to zero, removing any incentive for authors to produce and publish them. Instead, authors will be liberated to do valuable categories of work that are poorly served by the existing system, for example:
    • exploratory innovation: work that incorporates higher-level innovations is correspondingly harder for most workers in a field to understand at first. A signature of such work is that while the average reviewer may not understand it (making it hard to publish in a traditional journal), reviewers with unusual expertise or sophistication will rate it as high-interest, and interest will spread and grow over time (instead of declining over time, as for a typical specialist paper). SP network metrics will reward such a paper with a high long-term interest score, even though it might have been rated as “unpublishable” by traditional journals.
    • interdisciplinary research: interdisciplinary work is widely recognized as one of the most important opportunities for new discoveries (by combining insights and tools from different fields). It is also honored more in the breach than in the observance. While universities and grant agencies endlessly commend the need for more interdisciplinary research, actual interdisciplinary research results face many barriers to publication in traditional journals. Fundamentally, journals target established fields, and their review process is ill-equipped for papers whose content goes outside the expertise of any given reviewer. This is often fatal for interdisciplinary papers at a traditional journal, but poses no serious problem in an SP network because of its fundamentally different review process (we will address this specific issue in much more detail in the Phase III section). Interdisciplinary papers face a scenario similar to the previous category: while they are initially less likely to be understood by the average reviewer, reviewers with unusual expertise or sophistication will rate them as high-interest, and interest will spread and grow over time. Again, SP network metrics will reward such papers with high interest scores.

No journal has ever had such a detailed view of research impact, reviewer influence, or of distinct communities representing emerging subfields. Such a comprehensive and detailed view can only be produced by the community itself, for its own benefit, and under its control.

Phase III: The SP Network as Publisher

The capabilities developed in phases I and II provide a strong foundation for giving authors the choice of publishing their work directly via the SP network (rather than in a traditional journal). To do this, the SP network will make these capabilities available as a powerful suite of tools: (1) for authors to search for the audience(s) that are interested in their work; (2) for authors and referees to combine their different expertises (in synthesis rather than opposition) to identify and address key issues for the paper’s impact and validity; and (3) for long-term evaluation after a paper’s publication, to enable the community to raise new issues, data, or resolutions. This will be particularly useful for newly emerging fields (which lack journals) or subfields that are not well served by existing journals.

However, it must be emphasized that this is not an attempt to compete with or replace traditional journals. Instead, the SP network complements the strengths of traditional journals, and its suite of tools could be useful for journals as well. Concretely, the SP network will develop its tools as an open-source project, and will make its software and services freely available to journals as well as to the community at large. For example, journals could use the SP network’s services as their submission and review mechanism, to gain the many advantages it offers over the very limited tools of traditional review (which consist of little more than an ACCEPT/REJECT checkbox for the Editor, and a text box for feedback to the authors).

I first define the goals of the Phase III publishing system in terms of how to measure success, then describe the details of the three “toolsets” outlined above, and finally analyze how this system can improve peer review and publishing effectiveness.

Measures of Success

I wish to emphasize three basic measures of success for scientific publishing in general, and for the Phase III SP network tools in particular:

  • readership: the effective value of any publishing channel is just the coverage that it achieves for each paper, i.e. what fraction of that paper’s total potential audience actually reads the paper via the channel.
  • publication cost ratio: the effective cost of publishing a paper in a given channel, in terms of the total time, effort or money spent to do so. For most first-world authors, the time and effort greatly outweigh the actual publication fees (i.e. the monetary value of that time and effort greatly exceeds the fees). To “normalize” this cost for a given paper, I define the publication cost ratio as this total cost divided by the cost of writing the manuscript (not including the actual research effort); see the formula after this list. For manuscripts that require only modest revisions, I will consider publication cost-effective if the publication cost ratio is less than or equal to one (i.e. the work of getting the completed manuscript published is no more than the work of writing the manuscript in the first place).
  • synthesis of relevant expertise: the purpose of publication is to connect a paper with its total potential audience. Frequently this audience includes different expertise than the original authors’. Thus “making the connection” requires not merely searching for interest across different fields, but actually modifying the paper so that people from these different backgrounds can easily understand it, and addressing their questions and concerns about it. This involves a process of synthesis during the period of review and revision, in which representatives of each relevant expertise share their questions and concerns, then work together with the authors and each other to resolve them. The value of such synthesis is absolutely critical; it literally opens the door for the paper’s potential audience to understand it (or, if such synthesis fails, slams the door shut). That said, how can we measure a concept as open-ended as “synthesis”? I propose that we measure it simply in terms of outcomes: that is, the goal of synthesis is to maximize the perceived value of the paper for each member of its potential audience, specifically its “value-added” relative to the existing published literature. If synthesis succeeds, each reader will have a clear and accurate sense of exactly what advance(s) the paper contributes beyond the previous literature. (This includes the possibility that the study is fatally flawed, in which case sending the authors “back to the drawing board” is the right decision.) On the other hand, if synthesis fails, readers will be left with considerable uncertainty about whether the paper is a real advance. Thus the operational measure of synthesis (taking the paper’s core data as a fixed constant) is the degree to which it maximizes the “value-added” of the paper to its potential audience, relative to the existing published literature. For clarity, I will refer to this measure consistently as the “recognized added value” (RAV) of a paper.
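
Stated compactly (this is just a restatement of the cost definition above in symbols; the notation is mine):

$$\mathrm{PCR} \;=\; \frac{C_{\mathrm{publish}}}{C_{\mathrm{write}}}, \qquad \text{cost-effective} \iff \mathrm{PCR} \le 1,$$

where $C_{\mathrm{publish}}$ is the total time, effort and money spent getting the completed manuscript published, and $C_{\mathrm{write}}$ is the cost of writing the manuscript in the first place.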

Taken together, these metrics assess how well a channel reaches a paper’s audience, how well its review process clarifies the paper’s value to that audience, and how much this process “costs” in terms of time and effort.

What Assumptions Will the SP Network Change?

The assumptions of traditional peer review reflect the conditions in which it developed. For example, both manuscripts and review comments had to be sent by regular mail, often with reviewers on another continent, a process that took weeks or months. The SP network will operate under a completely different set of conditions, and it is worthwhile to ask which assumptions are simply no longer relevant. I briefly identify six assumptions of traditional peer review that the SP network will modify.

  • Expert Peer Review (EPR): this means the assumption that the referee is expert in all aspects of the paper. In other words, the referee is not just a peer (who knows some aspects of what the authors know, but not others) but instead has universal expertise about everything relevant to the paper. I contrast this with multi-expertise peer review (MEPR), in which a referee is not expert in all aspects of a paper (because the paper combines multiple expertises).
  • Shoot first, ask questions later: this means a referee returns a recommended decision before considering the authors’ responses. Traditional peer review enforces this by making referees submit an ACCEPT/REJECT recommendation as their first response. The only way a referee conceivably could consider author responses before making this decision would be to contact the authors directly (in violation of most journals’ anonymous review process).
Note that this assumes there is no need for synthesis, i.e. the reviewer does not need to first combine his knowledge with that of the authors and other referees in order to make his decision. Instead, it is implicitly assumed that he knows everything needed for making a decision about the paper (i.e. EPR). Of course, the very best referees get around this assumption by honestly admitting “I don’t know” or even “I was wrong, you were right” and changing their decision when presented with convincing evidence. However, the psychology of the system discourages this. Admitting that one lacks the required expertise doesn’t appeal to most referees, since under EPR this is like confessing incompetence, and losing face in front of the most important authority of all, the Editor.
  • The “pigeon-hole” model of separate audiences: this means the assumption that any given audience is a “separate box” containing a uniform expertise. In other words, each researcher is a member of a single audience and shares roughly the same interests and expertise as other members of that audience. Therefore any competent reviewer from this audience can judge both impact (because their interests match) and validity (because everyone in the audience, including the paper’s authors, shares the same expertise). Indeed, this would appear to be the essential foundation for the EPR assumption (see above). Furthermore, since each member is representative, obtaining feedback about a manuscript from just two or three members is adequate.
By contrast, the SP network views each audience as a Venn diagram of overlapping interests and expertises. That is, each member combines multiple expertises; different members have different combinations of expertise and interests; and as a group they come together around one common interest (e.g. a problem they all want to solve).
  • delegated review: this means assuming that a referee can be “delegated” to predict a paper’s interest even if it is not of direct interest to him. In extreme cases (e.g. conference proceedings) submissions are simply divided among a set of committee members, each delegated to review their “share” of papers regardless of personal interest. In other words they have to guess whether each paper will be of interest to other people. By contrast, the SP network emphasizes impact-driven review, in which each referee is explicitly instructed to refuse to review manuscripts that are not of direct interest to him, thereby directly measuring the paper’s interest to different audiences (i.e. different sets of referees who are shown the paper’s title).
  • false-positive screening: this means assuming that the primary challenge of the review process is screening out “false-positives”, i.e. that there is a large excess of submissions relative to the publication’s “channel capacity”, so we should adopt increasingly conservative criteria to filter out this excess. This reflects on the one hand the dominance of publication counts as the measure of productivity (which pressures researchers to publish as many papers as possible), and on the other hand the economics of paper-and-ink publishing, where publishing a paper incurs large costs of printing and distribution. Note that systems tuned for low “false-positive” rates correspondingly can incur high “false-negative” rates (in which innovative or important papers are rejected) and that journals do not even consider this problem. Indeed, high rejection rates are considered to be an inherently good thing, i.e. “higher rejection rate” = “better journal”.
The SP network adopts a very different approach, because it considers the economics of publishing not from the point of view of a journal, but from the point of view of the research community as a whole. For the community, publication costs are largely a fixed cost, in the sense that it has only a fixed amount of resources (money and human time) for writing, reviewing and reading papers. (For most first-world researchers, the monetary value of human time spent on these activities greatly outweighs actual publication fees.) If researchers were allowed to publish five times more papers, these costs would not increase by a factor of five. Instead, the bulk of the papers would simply be pushed into “cheaper” publication channels (both in terms of “consuming” less community attention and of very low distribution costs). In this sense, “false-positives” have little effect on the total budget.
These economics suggest a different concern: since only a minor fraction of papers in any field offer truly new insights (as opposed to yet-another-application of the field’s current thinking), the community’s productivity is maximized by optimizing the amount of attention these important advances receive out of the fixed total “budget”. (For example, under a power-law model perhaps just 20% of papers would contain 80% of the most important advances.) Since innovative thinking is rare, we wish to amplify it as much as possible; when it does occur, we above all do not want to throw it away. From this point of view, false-negative errors are of the greatest concern. (I analyze this issue in depth in Section 5.)
  • value metric = predicted impact: promotion committees measure a scientist’s publications primarily by what journal they were published in. Thus a paper’s value-metric is locked in at the moment of its acceptance by the journal. The review process thus predicts its value to the community, and locks in that prediction before the community is allowed to read the paper. By contrast, the SP network seeks to measure each paper’s impact directly, automatically, and systematically (see Sections 3 and 5), both at the time of review and long-term.

The SP Network Publication Process

The SP network will provide tools for “market research” (i.e. finding the audience(s) that are interested in a given paper) and for synthesis (integrating multiple expertises to maximize the paper’s value for its audience(s)), culminating in publication of a final version of the paper (by being selected by one or more SPRs). I will divide this into three “release stages”: alpha (market research); beta (synthesis); post-publication (long-term evaluation). These are analogous to the alpha-testing, beta-testing and post-release support stages that are universal in the software industry. The alpha release cycle identifies a specific audience that is excited enough about the paper to work on reviewing it. The beta release cycle draws out questions and discussion from all the relevant expertise needed to evaluate the paper and optimize it for its target audience(s). The reviewers and authors work together to raise issues and resolve them. Individual reviewers can demand new data or changes as pre-conditions for recommending the paper on their SP list. On the one hand, the authors decide when the paper is “done” (i.e. to declare it as the final, public version of the paper). On the other hand, each reviewer decides whether or not to “select” the paper for their SP list. On this basis, authors and reviewers negotiate throughout the beta period what will go into the final release. As long as one SPR elects to recommend the paper to his subscribers, the authors have the option of publishing the paper officially in the SP network’s journal (e.g. Selected Papers in Biology). Regardless of how the paper is published, the same tools for synthesis (mainly an issue tracking system) will enable the entire research community to continue to raise and resolve issues, and to review the published paper (i.e. additional SPRs may choose to “select” the paper).
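
As a rough sketch of this lifecycle (the names are illustrative, not a specification), the release stages could be modeled as a simple state machine:

.. code-block:: python

    from enum import Enum, auto

    class ReleaseStage(Enum):
        """Hypothetical model of the release stages described above."""
        ALPHA = auto()             # market research: measure audience interest
        BETA = auto()              # synthesis: issue tracking, review, revision
        PUBLISHED = auto()         # final version, selected by at least one SPR
        POST_PUBLICATION = auto()  # long-term evaluation and further selections

    def advance(stage, interested_sprs, author_declares_done):
        """Advance a paper one step under the rules sketched above."""
        if stage is ReleaseStage.ALPHA and interested_sprs > 0:
            return ReleaseStage.BETA        # an audience exists: begin synthesis
        if stage is ReleaseStage.BETA and author_declares_done and interested_sprs >= 1:
            return ReleaseStage.PUBLISHED   # authors release; at least one SPR selects
        if stage is ReleaseStage.PUBLISHED:
            return ReleaseStage.POST_PUBLICATION
        return stage                        # otherwise, no transition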

Alpha Release Tools

For alpha, the tools already provided by Phase I and II are sufficient: see for example the paper submission mechanism (Phase I); methods for measuring reader interest (Phase II); and audience search methods (Phase II). Here I will simply contrast it with traditional peer review.

  • assess impact, not validity: I wish to emphasize that alpha focuses entirely on measuring the paper’s impact (interest level) over its possible audiences. It does not attempt to evaluate the paper’s validity (which by contrast tends to dominate the bulk of referee feedback in traditional peer review). There are three reasons. First, impact is the key criterion for the SP network: if no SPR is excited about the paper, there is no point wasting time assessing its validity. Second, for papers that combine multiple expertises, the impact might lie within one field, yet the paper might use methods from another field. In that case, a referee who is expert in evaluating the validity of the methodology would not be able to assess the paper’s impact (which lies outside his field). Therefore in MEPR impact must often be evaluated separately. Third, the SP network is very concerned about failing to detect papers with truly novel approaches. Such papers are both less common and harder for the average referee to understand in their entirety. This makes it more likely that referees will feel doubt about a novel approach’s validity. To avoid this serious risk of false-negatives, the SP network first searches for SPRs who are excited about a paper’s potential impact, completely separate from assessing its validity.
  • impact-driven review, not delegated review: traditional journals do essentially nothing to help authors find their real target audience, for the simple reason that journals have no tools to do this. Exploring the space of possible audiences requires far more than a single, small sample (2-3 reviews). It requires efficiently measuring the interest level from a meaningful sample for each audience. The key is that the SP network directly measures interest (see the metrics described in Phase I and Phase II) over multiple audiences. By contrast, delegated review tends to produce high false-negative rates, because people are not good at predicting the interest-level of papers that they themselves are not interested in. Being unaware of a paper’s interest for a problem outside your knowledge, and being unaware that another group of people is interested in that problem, tend to go together.
  • speed: because alpha requires no validity review, it can be fast and automatic. The SP network’s click-through metrics can be measured for 10 – 100 people over the course of just a few days; advertisers (e.g. Google) measure such rates over vastly larger audiences every day.
  • journal recommendation system: whenever a researcher expands the scope of his work into a new area, he initially will be unsure where to publish. The SP network can automatically suggest appropriate journals, by using its interest measurement data. Simplistically, it can simply relate the set of SPRs who expressed strong interest in the paper to the set of journals which published papers recommended by those same SPRs.
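
A minimal sketch of this journal-recommendation heuristic (the data structures are hypothetical; a real implementation would weight by interest strength):

.. code-block:: python

    from collections import Counter

    def suggest_journals(interested_sprs, recommendations, journal_of_paper):
        """Rank journals by how many papers recommended by the interested
        SPRs each journal has published.  `recommendations` maps an SPR
        to the set of papers on his SP list; `journal_of_paper` maps a
        paper to the journal that published it."""
        counts = Counter()
        for spr in interested_sprs:
            for paper in recommendations.get(spr, set()):
                if paper in journal_of_paper:
                    counts[journal_of_paper[paper]] += 1
        return counts.most_common()
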
Beta Release Tools

Beta consists of several steps:

  • Q & A: This means that reviewers with different relevant expertise raise questions about the paper, and work with the authors to resolve them, using an online issue tracker that makes it easy to see what issues have already been raised, their status, and detailed discussion. Such systems provide great flexibility for synthesizing a consensus that draws on multiple expertises. For example, one referee may resolve another referee’s issue. (A methodology reviewer might raise the concern that the authors did not follow one of the standard assumptions of his field; a reviewer who works with the data source analyzed in the paper might respond that this assumption actually is not valid for these data.) Powerful issue tracking systems are used universally in commercial and open-source software projects, because they absolutely need such synthesis (to find and fix all their bugs). Using a system that actually supports synthesis changes how people operate, because the system makes it obvious they are all working towards a shared goal. Note that such a system is like a structured wiki or “threaded” discussion in that it provides an open forum for anyone to discuss the issues raised by the paper.
The purpose of this phase is to allow referees to ask all the questions they have in a non-judgmental way (a conversation with the authors, and with the other referees) before they even enter the Validity Assessment phase. This should clearly distinguish several types of questions:
    • False positive: Might result / interpretation X be due to some other explanation, e.g. random chance; bias; etc.? Indicate a specific test for the hypothetical problem.
    • False negative: is it possible your analysis missed some additional results due to problem Y? Indicate a specific test for the hypothetical problem.
    • Overlap: how does your work overlap previous study X (citation), and in what ways is it distinct?
    • Clarification / elaboration: I didn’t understand X. Please explain.
    • Addition: I suggest that idea X is relevant to your paper (citation). Could that be a useful addition?

    Each referee can post as many questions as he wants, and also can “second” other referees’ questions. Authors can immediately answer individual questions, by text or by adding new data / analyses. Referees can ask new questions about these responses and data. Such discussion is important for synthesis (combining the expertise of all the referees and the authors) and for definitive clarification. It should leave no important question unanswered. (A minimal sketch of the issue-tracker data model appears after this list.)

  • validity assessment: eventually, these discussions culminate in each reviewer deciding whether there are serious doubts about the validity of the paper’s data or conclusions. While each reviewer decides independently (in the sense that only he decides what to recommend on his SP list), they will inevitably influence each other through their discussions.
  • improving the paper’s value for its audience: once the critical validity (false-positive) issues are resolved, referees and authors should consider the remaining issues to improve the manuscript, by clarifying points that confused readers, and adding material to address their questions. To take an extreme example, if reviewers feel that the paper’s value is obscured by poor English, they might demand that the authors hire a technical writer to polish or rewrite parts of it. Of course, paper versions will be explicitly tracked through the whole process using standard software (e.g. Git :cite:`gitSCM`); this can even be used to enable authors to add corrections if necessary after publication.
  • public release version: the authors decide when to end this process, and release a final version of the paper. Of course, this is closely tied to what the reviewers demand as conditions for recommending the paper.
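
Here is a minimal sketch of the issue-tracker data model implied by the Q & A step above; every name is hypothetical, and a real system would add persistence, permissions, and notifications:

.. code-block:: python

    from dataclasses import dataclass, field
    from enum import Enum, auto

    class QuestionType(Enum):
        """The question taxonomy proposed for the Q & A step."""
        FALSE_POSITIVE = auto()  # could result X have some other explanation?
        FALSE_NEGATIVE = auto()  # could the analysis have missed results?
        OVERLAP = auto()         # overlap with / distinction from prior work
        CLARIFICATION = auto()   # "I didn't understand X; please explain"
        ADDITION = auto()        # suggested relevant idea or citation

    class Status(Enum):
        OPEN = auto()
        RESOLVED = auto()

    @dataclass
    class Issue:
        """One tracked question about a paper under beta review."""
        raised_by: str
        kind: QuestionType
        text: str
        status: Status = Status.OPEN
        seconds: set = field(default_factory=set)       # referees who "second" it
        discussion: list = field(default_factory=list)  # (who, response) pairs

        def second(self, referee):
            self.seconds.add(referee)

        def resolve(self, who, response):
            """An author or another referee may resolve an issue."""
            self.discussion.append((who, response))
            self.status = Status.RESOLVED
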
Publication

Journals and conference program committees may elect to participate in this process, for papers that fit their focus. A journal editor can simply join the beta process for such a paper; like the other SPRs, he decides (based on the complete synthesis of issues and resolutions in the issue tracker) whether he wishes to “select” the paper. The only difference is that he is offering the authors publication in his journal, whereas the other referees are offering a recommendation on their SP lists. Of course, the paper will typically have to be re-formatted somewhat to follow the journal’s style guidelines, but that is a minor issue; extra material that does not fit its size limits can be posted as an online Supplement.

Note that this process offers many advantages to the journal. It does not need to do any work for the actual review process (i.e. to find referees, nag them to turn in reviews on time etc.). More importantly, it gets all of the SP network’s impact measurements for the paper, allowing it to see exactly what the paper’s level of interest is. Indeed, the journal can get a “free-ride” on the SP network’s ability to market the paper, by simply choosing papers that multiple (or influential) SPRs have decided to recommend to their subscribers. If the journal decides to publish such a paper, all that traffic will come to its website (remember that the SP network just forwards readers to wherever the paper is published). For a journal, the SP network is a gold mine of improved review process and improved marketing — all provided to the journal for free.

Authors can use the SP network alpha and beta processes to demonstrate their paper’s impact and validity, and then invite a journal editor to consider their paper on that basis. It is of course conceivable that more than one journal might offer to publish a paper. In that case, the authors decide which offer to accept.

However, the real value of the SP network review system is for areas that are not well-served by journals. If an SPR selects a paper for recommendation to his subscribers, the authors can opt to officially publish the paper in the SP network’s associated journal. Note that this serves mainly to get the paper indexed by search engines such as PubMed, and to give the paper an “official” publication status. After all, the real substance of publication is readership, and being recommended on the SP network already provides that directly.

Benefits

I now consider the benefits of the SP network review and publication system in terms of readership, cost, and synthesis. These benefits arise from addressing fundamental inefficiencies: first, how poorly traditional journals fit the highly specialized character of research and the emergence of new fields; and second, how journals have implemented peer review. Criticisms of this peer review system are legion and, most tellingly, come from inside the system, from Editors and reviewers (see for example :cite:`Smith2009`). While assessment of its performance is generally blocked by secrecy, the studies that have been done are alarming. For example, re-submission of 12 previously published articles was not detected by reviewers in 9 out of 12 cases (showing that reviewers were not familiar with the relevant literature), and 8 of the 9 papers were rejected (showing a nearly total lack of concordance with the previous set of reviewers who published these articles) :cite:`Peters82`. While we each can hope that reviewers in our own field would do better, there is evidently a systemic problem: the system itself produces a high rate of errors. I now argue that the SP network can help systematically address some of these errors.

Readership

In my view, the main purpose of the SP network is to provide a channel for innovative work that is blocked in the current system, e.g. because its specialized audience does not “have its own journal”, or because it is “too innovative” or “too interdisciplinary” to fare well in EPR. Let’s consider the case of a paper that introduces a novel combination of two previously separate expertises. In a traditional journal, the paper would be “delegated” to two or three referees who have not been chosen on the basis of a personal interest in its topic. So the probability that they can understand its significance for its target audience is low. For each of these referees, approximately half of the paper goes outside their expertise, and may well not follow the assumptions of their own field. Since they lack the technical knowledge to even evaluate its validity, the probability that they will feel confident in its validity is low. Even if the authors get lucky, and one referee ranks it as both interesting and valid, traditional “false-positive” screening requires that all three reviewers recommend it. Multiplying three poor probabilities yields a negligible probability of success. In practice, this conservative criterion leads to conservative results: it selects what “everybody agrees is acceptable”. It rewards staying in the average referee’s comfort zone, and penalizes innovation.

By contrast, the SP network explicitly searches for interest in the paper, over a far larger number of possible referees (say 10 – 50), using fast, automatic click-through metrics. Obviously, if no one is interested, the process just ends. But if the paper is truly innovative, the savviest people in the field will likely be intrigued. Next, the interested reviewers question the authors about points of confusion, prior to stating any judgment about its validity. Instead of requiring all referees to recommend the paper for publication, the SP network will “publish” a paper if just one referee chooses to recommend it. (Of course in that case it will start out with a smaller audience, but can grow over time if any of those subscribers in turn recommend it). A truly innovative, sound paper is likely to get multiple recommendations in this system (out of the 10 or more SPRs to whom it was initially shown). By contrast with traditional publishing, it is optimized for a low false-negative error rate, because it selects what at least one expert says is extraordinary (and allocates a larger audience in proportion to the number of experts who say so).
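
The arithmetic behind this contrast is easy to check. Assuming, purely for illustration, that each interested referee independently recommends an innovative paper with probability q, unanimity among three referees succeeds with probability q^3, whereas the SP network needs only one of N interested reviewers to select the paper:

.. code-block:: python

    def p_traditional(q, referees=3):
        """Unanimity: all referees must recommend the paper."""
        return q ** referees

    def p_sp_network(q, reviewers=10):
        """SP network: at least one of N interested reviewers selects it."""
        return 1 - (1 - q) ** reviewers

    # Illustrative: each interested reviewer of an innovative paper
    # independently recommends it with probability 0.3.
    print(p_traditional(0.3))     # 0.027 -> ~3% under three-referee unanimity
    print(p_sp_network(0.3, 10))  # ~0.97 -> near-certain if any one of 10 can select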

By definition, for the entire category of false-negative errors (papers whose potential audience would rate them highly, if only journals enabled it to read them!), the effective coverage (fraction of potential audience reached) is zero. Any reduction the SP network makes in this false-negative rate will produce a dramatic increase in coverage for these papers.

The only remaining question is whether the SP network can increase readership for papers that are published by journals. We have already addressed this in Phase I and II, which were focused entirely on this goal.

Cost

The SP network reduces the costs of publishing to the community (in terms of human time and effort) in several ways:

  • it eliminates the costs of “serial reviewing” and the “non-compete clause”: markets work efficiently only to the extent they actually function as free markets, i.e. via competition. It is worth noting that while papers compete to get into each journal, journals do not compete with each other for each paper. Journals enforce this directly via a “non-compete clause” that simply forbids authors from submitting to more than one journal at a time, and indirectly via incompatible submission systems and incompatible format requirements (even though there is little point applying such requirements until after the journal has decided to accept the paper). In practice an author must “start over from scratch” by re-writing and re-submitting his paper to another journal. Note that this multiplies the publication cost ratio for a paper by the number of times it must be submitted. It is not uncommon for this to double or triple the publication cost ratio.
From the viewpoint of the SP network, these “serial review” strictures are wasteful and illogical. The idea that a paper can only be considered by different channels one at a time simply degrades every aspect of review, compared with the “unified parallel review” process of the SP network. On the one hand, it means that each editor gets only a small slice of the total review information (since the different reviews are kept separate, rather than pooled). On the other hand, it wastes an immense amount of time re-reviewing the same paper over and over. Indeed one could view serial review as a very crude and extraordinarily inefficient attempt at “audience search” (i.e. spend months re-writing the paper for different journals and fighting with different sets of referees). In the SP network, audience-search is automatic, measures interest among a much larger number of readers, and takes just a few days. Measuring impact quickly is the key first step, because in the absence of a clear audience (in which the paper has strong impact) the serious effort of validity-evaluation is pointless. Unfortunately, journals treat impact and validity evaluation as a single merged process, which is inevitably slow. Finally, the SP network pools the parallel review efforts of all interested SPRs in a single unified process. Each SPR sees the complete picture of information from all SPRs, but makes his own independent decision.
Let’s consider the publication cost ratios for different cases. For a paper that is not of strong interest to any audience, traditional journal review typically involves months of serial review. By contrast, the SP network will simply return the negative result in a few days (“no interested audiences found”). Thus, the SP network reduces the publication cost ratio in this case by at least a factor of ten. For papers that require extensive audience-search (either because they’re in a specialized subfield, or because they contain “too much innovation” or “too many kinds of expertise”), the authors again are likely to fall into the trap of serial review, consuming months and possibly yielding no publication. In the SP network, the authors should be able to find their audience (possibly small) within days, and then go through a single review process leading to publication by one or more SPRs. Because serial review is avoided, the publication cost ratio should be two to three times lower.
Finally, for papers with an obvious (easy to find) audience, the SP network still offers some advantages, basically because it guarantees a single round of review with a very low false-negative risk. By contrast, traditional peer review requires unanimity. This unavoidably causes a significant false-negative rate. Under the law of serial review, this means a certain fraction of good papers waste time on multiple rounds of review. For this category overall, we expect the publication cost ratio of traditional publishing to be 1-2 times that of the SP network.
  • it eliminates the costs of “gambling for readership”: when researchers discover a major innovation or connection between fields, they become ambitious. They want their discovery published to the largest possible audience. Under the non-compete clause, this means they must take a gamble, by submitting to a journal with a large readership and correspondingly high rejection ratio. Often they start at the top (e.g. a Nature or Nature Genetics level journal) and work their way down until the paper finally gets accepted. Summed over the entire research community, this law of serial-review imposes a vast cost with no productive benefit, i.e. the paper gets published regardless. The SP network completely avoids this waste, by providing an efficient way for a paper’s readership to grow naturally, as an automatic consequence of its interest to readers. Neither authors nor referees have to “gamble” on predictions of how much readership the paper should be “allocated”. Instead, the paper is simply released into the network, where it will gradually spread, in direct proportion to how many readers it interests.
  • it eliminates the costs of “prestige battles”: referees for traditional journals play two roles. They explicitly assess the technical validity of a paper, but they also (often implicitly) judge whether it is “prestigious enough” for the journal. Often referees decide to reject a paper based on prestige, but rather than expressing this subjective judgment (“I want to prevent this paper from being published here”), they justify their position via apparently objective criticisms of technical validity details. The authors doggedly answer these criticisms (often by generating new data). If the response is compelling, referees will commonly re-justify their position simply by finding new technical criticisms. Unfortunately, this often doubles or triples the length of the review process, and is unproductive, first because the referee’s decision is already set, and second because the “technical criticisms” are just red herrings; answering them does not address the referee’s real concern. Even if the paper is somehow accepted (e.g. the editor intervenes), this will double or triple the publication cost ratio.
By contrast, in the SP network this issue does not even arise. This problem is a pathology of delegated review, i.e. asking referees to review a paper they are not personally interested in. In the SP network, there is no “prestige factor” for referees to consider at all (first because the SP network simply measures impact long-term, and second because that metric has little dependency on what any individual reviewer decides). Indeed, the only decision a referee needs to make initially is whether they’re personally interested in the paper or not. And that decision is measured instantly (via click-through metrics), rather than dragged out through weeks or months of arguments with the authors.
  • it eliminates the costs of delegated review: currently, researchers are called upon to waste significant amounts of time reviewing papers that are not of direct interest to their own work. By definition, this time constitutes a cost with no associated gain. By contrast, time spent reviewing a paper that is of vital interest for the referee’s research gives him immediate benefit, i.e. early access to an important advance for his own work.
Synthesis

Is there a need for synthesis of multiple expertises beyond that provided by traditional peer review? The widespread importance of small conferences, symposia and seminars demonstrates that there is. Scientists view their value as not merely hearing presentations (which after all is like reading papers), but asking questions, hearing dialog between different experts, and having long discussions. By definition, these activities are synthesis. I have already described in detail how the SP network provides tools for enhanced synthesis during review. I now briefly summarize how this could serve as a platform for long-term synthesis, in which the “review site” for each highlighted paper becomes in effect an “online symposium” where everyone in a field discusses the issues it raises.

  • the SP network makes reviews public: in traditional journals, peer reviews are kept secret. The SP network exposes them, and all the issues they raise, to the entire research community.
  • the SP network makes review distribution the main channel for discovering papers: some journals (e.g. PLoS ONE) provide a way for readers to comment on papers. However, in practice this is a minor frill that everybody ignores. By contrast, the SP network uses reviews (i.e. paper recommendations) as the main mechanism for distributing papers to readers in the first place. Most readers are going to see the review, probably before they read the paper. Thus synthesis is placed front-and-center, rather than as a minor footnote that nobody sees.
  • the SP network makes review an ongoing process: in a traditional journal, review is done long before publication, and then forgotten. In a setting like PLoS ONE, there is little reason to write review comments — the paper is already published! By contrast, in the SP network, “discovering” important (published) papers and recommending them is the SP network’s core mission. If an SPR finds a paper interesting, he recommends it, regardless of how long ago it was published, because his subscribers will probably find it interesting too. In a sense, every new recommendation “re-publishes” the paper, providing a compelling reason for writing a review at that moment: concretely, the SPR has to explain why he’s excited about this paper, and what his readers will get from it.
  • the SP network opens reviewing a paper to everyone: any reader who gets excited about a paper is encouraged to recommend it. Equally well, anyone who perceives a serious concern is encouraged to raise it in the issue tracker (which is the actual platform for all the reviews of a paper).

What We Need: a Focus on Impact

We have now defined the concrete details of what the SP network would do. I wish to conclude by analyzing in much more depth the role it can play in identifying high-impact research, and in accelerating the emergence of new fields. I begin by asking what “measuring impact” really means.

The Metric Problem: Not All Citations Are Created Equal

Many researchers have pointed out a basic problem with existing impact (citation) metrics: they treat all citations as having equal value, but in actual fact people cite papers for very different reasons, with very different inherent “values”. By convention, a citation is assumed to reflect credit to the original paper, i.e. to acknowledge the importance of its findings or ideas for subsequent work. In reality, however, sometimes a paper is cited because new data contradict its claims or assumptions; in this case the paper is cited to discredit it. We obviously would like to count these two types of citations very differently, but existing citation metrics offer no way to do that.

More generally, I wish to distinguish several very different reasons for reading or citing a paper:

  • must-read (MR): this means a paper that is essential reading for work in a specific subfield. Its signature: we actually need to read these papers to do our work. If we see the title, we feel we must read the abstract. If we read the abstract, we feel we must download and read the full paper. If we read the paper, we feel we must alert our colleagues by recommending it to them. This is a more demanding standard. We generally don’t feel this way about most papers published in our field, or even necessarily about most papers we cite. Thus MR represents the greatest impact a paper can have for an individual reader. For measuring impact, we don’t want to mix up this category with other (lesser) forms of impact! However, standard citation count methods provide no obvious way to achieve this.
  • relevant-reference (RR): this means work that is in some way relevant (but not a key contribution) to a researcher’s own work. The crucial difference is that whereas he must thoroughly read and understand the MR papers in order to do his own work, he may not need to actually read RR papers, but instead may treat them as “reference works” that he can simply look up when he needs them (via a literature search). For example, when writing a paper about a new method he has developed, professional etiquette requires that he look up and cite papers about competing methods, or papers that present data illustrating the need for his new method. Its signature: if a researcher accesses an RR paper, he is unlikely to recommend it to his colleagues.
  • general-interest (GI): this means work that a researcher reads “recreationally”, for its general interest rather than because of a direct value for his own work.
  • browsing (B): this means a paper that a researcher views (usually just the abstract) in the course of searching for relevant papers (usually during a literature search).
  • negative-reference (NR): this means a paper that a researcher cites in order to raise negative concerns, i.e. to cast doubt on its claims, assumptions or methodology.

It should be emphasized that each of these classifications is specific to an individual reader; that is, a paper might be MR for one person, but only RR for another. Thus, this measure of impact is entirely orthogonal to the question of audience size. Given that most research is highly specialized, we should expect that typically a paper could be MR for only a small number of readers (i.e. people within that specialized subfield). Thus a paper could have very high impact within its field, but only a small audience. Conversely, a paper could have a very large audience (or citation count), but actually a low impact (i.e. RR, not MR). For example, one problem that has often been noted for standard citation count metrics is that they tend to give their highest rankings to “standard methodology” papers (e.g. the Lowry protein assay). Such citations are strictly RR; in most cases the researcher has not even read the paper, but simply copied the citation from another paper or from a literature search. To be specific, when these papers were first published they undoubtedly were MR for researchers developing similar methodologies, but at present they are simply part of the standard “template” of research. Everybody may cite them, but almost nobody actually reads them anymore.

This classification suggests a basic ranking of levels of interest in a paper:

  • below RR: concretely, papers that nobody chooses to read (e.g. if shown the abstract) or to cite.
  • RR only: papers that nobody chooses to recommend (after reading it).
  • MR in one subfield: papers that people within one subfield choose to recommend.
  • MR in multiple fields: papers that people from different fields choose to recommend.

Indeed, these classifications can even be specified relative to an individual project. In that case MR means papers which the researcher(s) had to read in order to be able to do that project; RR means papers that were relevant background for that project; etc.

Retooling Metrics to Focus on Impact

How does this affect impact measurement? First, I note that this is a serious problem, in that MR citations probably constitute only a minor fraction of total citations in any given paper. Of course, the definition of MR is in the eye of the beholder. A survey of bibliographies from my own recent papers found typically 2 – 3 MR citations per paper, out of 20 – 30 total citations per paper. I could not have done the work without having read these MR papers; by contrast, the other citations just provided useful background. In many cases I looked up these secondary citations after having done the actual work, e.g. to cite relevant material for an Introduction or Discussion. Thus only 10% – 20% of citations were MR; the remaining 80% – 90% were RR (for me — they each probably had other readers for whom they were MR). These numbers will undoubtedly vary from field to field and author to author. However, the very nature of the MR concept suggests that out of the large number of relevant background citations for any paper, in general only a fraction will be its crucial, direct antecedents.

Objective data support this conclusion. Analysis of misprints in citations found that 60-90% of citations are copied from other citations rather than from the original paper, with an overall value of approximately 80% :cite:`Simkin03` :cite:`Simkin05`. Since the copied source of these citations was not even the original paper, they are likely to be RR, not MR (indeed, this pattern is widely interpreted as evidence that the citing authors did not read the original paper). Moreover, for the remaining 20% of citations that are copied from the original paper, we have no reason to assume that all of them are MR. Overall, these data indicate that less than 20% of citations are MR. (Note: nearly all discussion of this pattern assumes that it is inherently “bad”, a form of cheating. But when we recognize the existence of RR as a valid citation category (i.e. not every citation must be MR), we see that it may simply be appropriate and efficient processing of RR citations. After all, must every user of a protein assay kit based on Lowry actually go back and read the original Lowry paper? No — only if that paper is MR for their project.)
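
A back-of-the-envelope check of this bound, under the stated assumptions (copied citations are RR, and only some of the citations copied from the original paper are MR):

.. code-block:: python

    copied_fraction = 0.8   # Simkin et al.: ~80% of citations are copied
    read_fraction = 1 - copied_fraction

    # Even if every citation copied from the original paper were MR
    # (surely an overestimate), MR citations would still be at most
    # 20% of the total.
    print(read_fraction)    # 0.2, an upper bound on the MR fraction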

Unlike standard citation counts, the SP network provides a way to directly measure a paper’s MR impact, separate from its RR or other readership. Specifically, the SP network measures each reader’s response to a paper, usually through multiple rounds (e.g. if shown the title, does the reader click to see the abstract? If shown the abstract, does he click to download the full paper? If he reads the paper, does he recommend it to his subscribers or colleagues?). By now, it must be obvious that the whole purpose of the SP network is that it functions as an MR detector. Reviewers will be constantly reminded to invest their precious time only on reviews of papers they consider must-read (MR). (Of course they can also publish negative or neutral reviews, but those ratings will be explicitly captured at the time of submission). Note that this measurement is automatic for every reader regardless of whether he’s a reviewer or not (if not, he is still given a link for recommending / forwarding the paper link to anyone he specifies). Furthermore, since the SP network maps the detailed structure of subfields, including what subfield(s) each reader belongs to, it can measure the MR interest of a paper not only for each reader, but for each distinct audience (subfield).
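
A minimal sketch of this funnel measurement follows; the field and function names are hypothetical. It maps each reader’s deepest response to a paper onto the interest levels defined earlier:

.. code-block:: python

    from enum import IntEnum

    class Interest(IntEnum):
        """Deepest action a reader took, ordered by increasing MR evidence."""
        IGNORED = 0      # shown the title, did not click
        BROWSED = 1      # viewed the abstract (B)
        READ = 2         # downloaded / read the full paper (RR-level interest)
        RECOMMENDED = 3  # selected or forwarded it: the MR signal

    def interest_level(clicked_abstract, read_paper, recommended):
        if recommended:
            return Interest.RECOMMENDED
        if read_paper:
            return Interest.READ
        if clicked_abstract:
            return Interest.BROWSED
        return Interest.IGNORED

    def mr_rate(responses):
        """Fraction of readers shown the paper who reached the MR signal;
        computed per audience (subfield), this is the paper's MR impact."""
        if not responses:
            return 0.0
        return sum(r is Interest.RECOMMENDED for r in responses) / len(responses)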

Authoritative vs. Search-based Distribution Systems

Next, let’s consider the problem of distribution. I define an “authoritative” distribution channel as one that claims to filter papers for high impact. For example, a peer-reviewed journal claims to publish only papers with important results for a given field. Similarly, word-of-mouth communication between colleagues focuses primarily on papers that one researcher believes will be of especial interest to his colleague(s). I define a “search-based” distribution channel as one that filters papers based on keyword search criteria (but not explicitly for impact). For example, PubMed or Google.

These definitions can help us understand the realities that now challenge scientific publishing:

  • Search engines have replaced journals as the primary path for finding relevant-references (RR). Typing a keyword query and obtaining relevance-ranked results from the whole literature is far more efficient than reading individual journals from cover to cover (or trying to run the same keyword searches over and over on different journal websites; or reading all the spam emails that journals send you; etc.). Journals simply do not represent an efficient distribution channel for finding RR.
  • Neither journals nor search engines offer an effective mechanism for finding MR papers. This is due to a series of problems:
    • a large fraction of papers published in sub-median journals are simply not MR for anybody. In other words, there is no one who feels they “must read” them, as reflected by the fact that no one later bothers to cite them (i.e. zero or very few citations, even including self-citations). Note that even for those that are cited, a large fraction of these are RR citations.
    • More to the point, for any individual researcher the fraction of MR papers in any journal is small. Taken together with the previous point, this means that even if one could define a query that identified papers in one’s desired subfield perfectly, the search engine results would give a poor yield of MR papers. (Moreover, this ignores the fact that a search query only works if you already know exactly what to look for. But the very nature of research is that we do not know exactly where the key insights (even for our own work) will come from).
    • the “prestige paradox”. In the traditional worldview of journal publishing, the answer to this problem is assumed to be “read better (i.e. more prestigious) journals”. After all, “better” journals filter strongly for higher impact. However, there is a paradox built into the competitive market economics of conventional publishing, which follows the same model as other mass media such as newspapers or television. That is, the market value of a journal is essentially the size of the audience it can deliver. Authors compete to get into the “best” journal (largest audience) they can, to secure the largest possible readership for their paper. Journals compete to get the highest impact papers they can, in order to increase their readership.
    But how can a journal increase its audience size? Each subfield has a fixed audience size; the only way to boost readership is to try to appeal to larger and larger numbers of distinct audiences (subfields). In the days of ink-on-paper subscriptions, this produced the effect that journals desired, namely that more and more people (and more and more university libraries) would feel obligated to subscribe to that journal. But it paradoxically has the effect of diluting the fraction of papers in that journal that are MR for any individual reader. The larger the total readership, the larger the number of subfields the journal lumps together in its pages, and the smaller its fraction of papers from any one subfield. Under this inexorable logic, “better” (bigger audience) actually means worse (increased dilution of MR papers for any reader).
    It also leads to widespread confusion that equates “size of readership” with “impact”. For example, Nature’s high impact-factor is uncritically accepted as a measure of its true impact, when in actual fact it mixes together two different effects: the actual impact of its papers (i.e. whether they are MR for a given reader), and the total size of its readership. Say a Nature paper receives ten times the readership of a paper in a small journal. Even assuming they have the exact same impact, we would expect that the Nature paper would correspondingly be cited ten times more, and that (as usual) most of these citations would be RR, not MR. Indeed, it is certainly my experience that top journals like Nature, and high-end “specialist” journals (e.g. Nature Genetics or PLoS Computational Biology) have a low fraction of MR papers for my specific interests. A survey of bibliographies from my recent publications indeed found a large fraction of citations to high-end journals, but essentially all of these citations were actually RR, not MR.

    From the point of view of each reader, ideally journals would each focus on a single subfield, so that a large fraction of its papers would be MR for that audience. Unfortunately, the economics don’t work. As I outlined above, economies of scale push journals in the opposite direction, towards larger readerships and “building the brand”. Fundamentally, there is a mismatch between journal size (in terms of number of readers) and the actual size of research subfields (in terms of numbers of researchers). Journals want to be big (or at least big enough to survive); but most research subfields are small. Research today is simply too highly specialized for journals to match.

The SP Network as an Authoritative Channel

The SP network solves this problem of mismatch of scales, because it uses the finest granularity possible — the recommendations of an individual reviewer. We can define its “maximum focus ratio” as how much more finely it can focus, compared with a journal. We can estimate this from the number of distinct referees a journal uses per year. For example, if a journal uses three referees per paper, and the average referee reviews about one and a half papers per year for that journal, the number of distinct referees (and hence the maximum focus ratio) is approximately two times the number of papers the journal publishes per year. Depending on the journal, this might be two to three orders of magnitude. We can also define its effective focus ratio for a specific subscriber as the total number of reviewers he ends up receiving recommendations from (via one or more subscription “links” away from him), divided by the total number of distinct referees used by all the journals these papers came from.
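
Stated as arithmetic (all numbers purely illustrative):

.. code-block:: python

    def max_focus_ratio(papers_per_year, referees_per_paper=3.0,
                        papers_per_referee_per_year=1.5):
        """Approximate number of distinct referees a journal draws on per
        year: the finest granularity the SP network could focus to,
        compared with the journal's single editorial focus."""
        return papers_per_year * referees_per_paper / papers_per_referee_per_year

    # A journal publishing 500 papers/year with 3 referees per paper, each
    # referee reviewing ~1.5 papers/year, draws on ~1000 distinct referees:
    print(max_focus_ratio(500))   # 1000.0, i.e. twice the papers published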

Note that the SP network’s structure automatically adjusts this focus ratio according to each subscriber’s interests. If he works in a small subfield, he will tend to receive recommendations only from within that small community, and his effective focus ratio will be very high. By contrast if he is part of a large field (or is interested in several different fields), he will receive recommendations from a much larger set of reviewers, and his effective focus ratio may be lower. The point is that the SP network naturally delivers an individualized focus level that is appropriate for each subscriber. By contrast, the editor of a journal must choose a single focus level (“our journal will cover the following list of topics…”), which then is imposed uniformly on all readers of that journal.

Focus is only useful if you actually have the ability to focus on the right things, in this case, MR papers for each subscriber. Fortunately, the SP network does exactly that: it both distinguishes MR interest vs. RR interest in its automatic measurements, and by definition it only relays MR papers to subscribers. That is, reviewers are only asked to review papers that they personally consider MR, that grab their interest, that they simply can’t stop themselves from reading. In the SP network the journal editor’s eternal problem (“I can’t get anybody to review this paper…”) becomes a powerful positive principle: lazy referees who can’t be bothered to review anything that’s not vital to them = good reviewers, and strong filtering for MR papers.


It is useful to ask how much better the SP network can discover MR papers relative to other methods. We can state this as a principle: the SP network is equivalent to all other methods to the N-th power, where N is the number of reviewers in a given subfield. In other words, the SP network is not a different, competing way of finding MR; it is just a way of propagating what each individual finds. The SP network does not care how a reviewer found a given MR paper (i.e. literature search; journal website; word-of-mouth; etc.); it simply relays whatever he finds. In this sense, for a single reviewer it is simply equivalent to the sum of all methods he uses for finding MR. But then there is the network effect. Without the SP net, each individual only has access to what he himself finds. With the SP net, he automatically gets what everyone in his community finds (using all the methods at their disposal). It’s like changing from a “little antenna” to a “huge antenna”, where the multiplier is the size of the entire community. In probabilistic terms, if the independent probability that any given reviewer fails to discover one MR paper is p, then the probability that the SP network fails to discover it decreases exponentially as p^N.
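
A quick check of this principle in code (the miss probability p is purely illustrative):

.. code-block:: python

    def p_network_misses(p_individual_miss, n_reviewers):
        """If each of N reviewers independently misses an MR paper with
        probability p, the whole subfield misses it with probability p**N."""
        return p_individual_miss ** n_reviewers

    print(p_network_misses(0.5, 1))   # 0.5   -> a lone reader misses half of them
    print(p_network_misses(0.5, 20))  # ~1e-6 -> a 20-reviewer subfield misses almost none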

Let’s make this completely concrete: comparing journals vs. the SP net, what is the end game for an individual researcher to identify a good MR channel? For journals, there is no direct end-game, only vague directions that one could try. For example, a researcher could look at back issues of different journals, or look at their editorial board member lists, or run literature searches, etc. For the SP net, the end-game is immediate: list colleagues and competitors who do the same work as you, and subscribe to them. The SP network works for two reasons: 1. because everyone in a subfield can immediately identify at least a few other people in that subfield; 2. because the size of the connected subgraph grows exponentially, roughly as s^r, where s is the average number of subscriptions per person and r is the “radius” of recommendation-steps between any two members of the subfield. Thus, even if someone only knows a few other members of their subfield, everyone in that subfield will be only a few links away in the network. We say the SP network is “effective” as an MR distribution system because it gives each person an end-game for getting what they want.

This exponential growth also gives the SP network speed for disseminating MR papers throughout a community. Since “opinion leaders” within a community are likely to subscribe to each other, once a paper is recommended by one of these reviewers, it can quickly spread to all of them and through them to the entire community. This process could take as little as a few days — after all, the SP network runs on Internet-time. In general, the time required to reach the entire community grows only as the log of the size of the community.
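
Both claims follow from the same growth law. A rough sketch (real subscription graphs are not uniform trees, so these figures are idealized):

.. code-block:: python

    import math

    def reachable(subscribers_per_person, radius):
        """Approximate audience within `radius` recommendation-steps, if
        each recommendation is relayed to s subscribers (idealized tree)."""
        s = subscribers_per_person
        return sum(s ** r for r in range(1, radius + 1))

    def hops_to_cover(community_size, subscribers_per_person):
        """Recommendation-steps needed to reach a whole community: ~log_s(size)."""
        return math.ceil(math.log(community_size) / math.log(subscribers_per_person))

    print(reachable(5, 3))           # 155 people within three steps
    print(hops_to_cover(10_000, 5))  # 6 steps cover a 10,000-person community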

How Impact Drives Research Productivity and the Growth of New Subfields

Most analyses of impact consider it purely in terms of retrospective assessment, that is, how to “grade” researchers, journals etc. However, the ultimate interest of a scientific publication network is to make research more productive, by accelerating key connections between researchers. I now briefly present a model of how impact drives research productivity, and of how an SP network can make an important contribution.

By definition, MR papers include the key innovations in any field (i.e. workers in the field will eventually consider these innovations “must-read”). Thus, the rate at which such MR papers spread through the community, and their total coverage (percentage of the community that finds each one), constitute “rate-limiting” factors for the pace of key innovations in the field. Other things being equal, progress on a research opportunity will be proportional to the number of researchers working on it, the range of different expertise they bring to it, and the speed with which they can share new findings, questions, and insights. All of these are affected by the MR rate and coverage. Concretely, we must consider the main ways in which MR papers drive research:

  • MR papers recruit new researchers to a research opportunity they previously were unaware of. The lifecycle of a research problem always begins with definition, in which someone first formulates a question as an important research opportunity. Such a publication is by definition MR for subsequent work on that question, and it is self-evident that progress on this question depends on the speed and coverage of the paper’s distribution to its potential audience.
  • MR papers introduce novel connections between different expertises that help solve the problem. Thus they both directly contribute part of the solution (by identifying the specific combination of tools for cracking it), and recruit people with diverse expertise, by showing that their expertise really is useful for this problem.

MR papers play an essential role in the first two stages of the life-cycle of a new field:

  • nucleation: MR papers lay the key foundations for a new field, by defining a new research question, and attracting researchers with the right mixture of expertises to work on it. Note that this audience might be diverse (in the sense of coming from different expertises) and disorganized (in the sense of there existing no organized channel for communicating with members of this audience). Thus MR papers play a crucial, nucleating role for “growing” a research community focused on a new question.
  • growth: once there exists a community of researchers working on a shared set of questions, they will produce a progression of MR papers that gradually solve key problems for answering these questions. As a result the field grows both in ability to solve problems and in number of researchers.

Barriers to MR Paper Distribution

It is therefore crucial to consider what factors limit the speed and coverage with which an MR paper can find its potential audience.

  • no existing channel: journals focus on specific, existing fields. A journal can only serve as an effective channel for a specific research question if it both reaches a good fraction of the potential audience for that question, and also will reliably match such a paper to referees from this audience (who can assess its impact and validity). If this audience does not closely match an existing audience (subfield) that a journal targets, there will commonly not be any journal that meets both these criteria. Note that even if an MR paper could be published in this case, it would have little impact (because it would reach only a negligible portion of its target audience). This problem blocks the recruitment of researchers to a new research question during the nucleation stage of a new subfield.
  • the innovation paradox: the more a paper deviates from the standard assumptions of a field, the lower the fraction of people in the field who will understand it if asked to referee it. For example, for a paper that follows a standard approach, say that 90% of potential referees understand that approach (allowing for some variation in referee backgrounds). Then the probability of drawing three referees who all understand the approach is 0.9^3 ≈ 73%. By contrast, if a paper introduces a novel approach that only 50% of referees will readily understand, the chance of drawing three referees who understand it falls to 0.5^3 ≈ 13%.
  • the interdisciplinary paradox: this problem gets worse if an MR paper combines different kinds of expertise. The probability that a referee will feel comfortable about his understanding of a paper’s methodology will fall to zero if the paper requires material from a discipline outside his expertise. As the previous example shows, a paper becomes effectively unpublishable if it requires a combination of expertise that fewer than half of reviewers possess; unfortunately, interdisciplinary MR papers typically fall far below this threshold (the arithmetic is sketched in code after this list).
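
For concreteness, the arithmetic behind the last two bullets is just the understanding fraction raised to the power of the number of referees. Here is a minimal sketch in Python; the 25% figure for an interdisciplinary combination is purely an illustrative assumption, not a measured value:

```python
def all_referees_understand(p, n=3):
    """Probability that all n independently drawn referees understand the paper,
    where p is the fraction of potential referees who understand its approach."""
    return p ** n

for p in (0.9, 0.5, 0.25):
    print("p = %.2f -> chance all 3 referees understand: %4.1f%%"
          % (p, 100 * all_referees_understand(p)))
# p = 0.90 -> 72.9%   (the "standard approach" case above)
# p = 0.50 -> 12.5%   (the "novel approach" case above)
# p = 0.25 ->  1.6%   (an illustrative interdisciplinary combination)
```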

Note that in both of the previous cases, the very characteristics that will eventually make a paper MR to its target audience (major innovations; new combinations of expertise that solve a problem) can initially make it unpublishable. In that case the “speed and coverage” of distribution of such MR papers fall to zero. Alternatively, these characteristics can simply reduce the speed and coverage of distribution many-fold relative to papers that lack these MR characteristics. Since MR papers constitute the key events that drive progress in a subfield and the emergence of new subfields, this reduction in MR speed and coverage correspondingly slows down the pace of discovery in toto for those subfields. These problems block both the nucleation and growth of new subfields.

I wish to stress in the strongest terms that this analysis in no way implies that successfully (or prominently) published papers lack innovation, are not interdisciplinary, or are not MR! The key question for successful publication of MR papers is whether there is a pre-existing constituency (with its own journals and referees) that closely matches a paper’s approach and combination of expertises.

How the SP Network Can Accelerate the Creation and Growth of New Fields

These problems suggest an important opportunity for accelerating the pace of discovery, by giving researchers better tools for finding the true target audience of any paper. Whereas publication in a traditional journal can be blocked by even a modest reduction (e.g. to 50%) in the probability that a given reviewer’s expertise matches the paper, the SP network can search rapidly and automatically for the specific audience for which a paper is MR. The ability to match each paper to its MR target audience (if any such audience exists!) enables the SP network to play a useful new role in accelerating the nucleation of new subfields, and their growth via major innovations and new combinations of expertise from different fields — steps which are especially hard to communicate in the traditional publication system.
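
One plausible way to picture this automatic search is as a breadth-first expansion through the subscription graph, in which a paper propagates onward only from readers who show measurable interest. The sketch below is my own illustration rather than a specification; `subscribers_of` and `measured_interest` are hypothetical callables that the network would supply:

```python
from collections import deque

def find_mr_audience(paper, seed_reviewers, subscribers_of, measured_interest,
                     threshold=0.5):
    """Breadth-first search for the audience that finds `paper` must-read.

    Starting from the reviewers who first recommended the paper, follow
    subscription links, but continue the expansion only through readers
    whose measured interest (click-through, download, re-recommendation)
    exceeds `threshold`.  Returns the set of interested readers found."""
    audience = set()
    queue = deque(seed_reviewers)
    seen = set(seed_reviewers)
    while queue:
        reader = queue.popleft()
        if measured_interest(reader, paper) < threshold:
            continue            # not interested: the paper stops spreading here
        audience.add(reader)
        for sub in subscribers_of(reader):
            if sub not in seen:
                seen.add(sub)
                queue.append(sub)
    return audience
```

The key design point is that the search costs readers almost nothing: the paper spreads exactly as far as real interest carries it, instead of stopping at the first three referees.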

The SP Network FAQ

It sounds like you’re proposing a lot more referees look at each paper. Isn’t that a lot more work?

A paper will initially be shown to a larger number of readers, simply to measure their level of interest. However, they do not have to do anything (i.e. write a review): the SP network simply measures their click-through rates. Readers will be explicitly instructed to simply ignore any title that does not interest them. If a given reader shows strong interest in possibly recommending a paper, he will be invited to join the beta review phase, which is closer to review in a traditional journal (i.e. you actually write a review). Note that this is likely to happen less frequently than traditional journals’ review requests, simply because most papers published in a field do not achieve this high level of interest for a given reader. Remember that the SP network will never ask anyone to review a paper that is not of burning interest to them. Thus, a typical referee’s total burden will probably go down, and his benefit from reviewing will go way up.
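
As a rough sketch of the mechanics: the network records lightweight signals and only issues a review invitation when they add up to strong interest. The event names, weights and threshold below are my own illustrative assumptions, not part of the proposal:

```python
# Hypothetical per-reader interest scoring.
SIGNAL_WEIGHTS = {
    "title_click": 1,      # clicked through from the title
    "abstract_view": 2,    # read the abstract
    "pdf_download": 5,     # downloaded the full paper
    "recommended": 10,     # recommended it to their own subscribers
}
INVITE_THRESHOLD = 8

def interest_score(events):
    """Sum weighted click-through events for one (reader, paper) pair."""
    return sum(SIGNAL_WEIGHTS.get(e, 0) for e in events)

def should_invite_to_beta_review(events):
    """Invite a reader to actually write a review only if interest is strong."""
    return interest_score(events) >= INVITE_THRESHOLD

# e.g. a reader who clicked the title, read the abstract and downloaded the
# PDF (score 1+2+5 = 8) would be invited; a mere title click would not.
```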

This sounds ambitious. Who’s going to pay for all this?

The first thing to bear in mind is that the SP network represents a net cost savings, not a net expense, because it shifts expensive functions (i.e. those that consume human time) to cheap, efficient infrastructure (e.g. automated click-through metrics). It will reduce net costs for authors, readers, reviewers, and journals. That said, let’s consider its capital requirements. These consist of software development, web services, and management costs.

  • software: All of the mechanisms proposed in Phase I can be implemented using existing (free) open source software, with a small amount of custom software written to integrate them and to create an attractive look-and-feel. Concretely, this could all be done with less than $50k of contract development, but in practice would be better off done as an open source project, in which interested developers participate without salary. Getting the right people excited about the project is far more important than money.
  • servers: relative to common research activities such as collecting and analyzing next-gen sequencing data, the IT requirements of the SP network are trivial. Specifically: 1. when an SPR recommends a paper, send an email to his subscribers (a short list of people); 2. when a reader clicks a URL to access an abstract, review or paper, record the request and forward the reader to the external website serving the paper (usually a journal). The entire traffic of SP network subscriptions and history-tracking could probably be run for the first few years from a handful of Linux servers. Indeed, if necessary, it could be served for free in a highly scalable way using Google App Engine (a minimal sketch of these two operations follows this list).
  • management: the SP network only requires human management for new initiatives, rather than day-to-day management. Unlike a journal, it has no need for editorial staff to find reviewers, nag them, make editorial decisions etc. The authors and SPRs simply manage all this themselves, because the SP network is built on giving each SPR autonomy to make his own decisions. Moreover it is designed to allow management tasks to be distributed over the entire research community that uses it. For example, different people in different fields can each invest effort to identify and invite SPRs within their fields to join the SP network. From this point of view I see the SP network being managed very much like a professional society, i.e. by volunteers from the research community who see it as important for their community.
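
To show just how small these two server operations are, here is a minimal sketch. Flask and the URL layout are my own assumed choices, and `EXTERNAL_URLS` / `SUBSCRIBERS` are stub stores standing in for a real database:

```python
import time
from flask import Flask, redirect

app = Flask(__name__)
click_log = []                    # in production: a database table
EXTERNAL_URLS = {}                # paper_id -> journal URL (stub store)
SUBSCRIBERS = {}                  # reviewer_id -> list of subscriber emails

@app.route("/go/<paper_id>/<reader_id>")
def track_and_forward(paper_id, reader_id):
    """Operation 2: record the access event, then forward the reader to
    the external website serving the paper (usually a journal)."""
    click_log.append((time.time(), reader_id, paper_id))
    return redirect(EXTERNAL_URLS.get(paper_id, "https://example.org/"))

def notify_subscribers(reviewer_id, paper_id, send_email):
    """Operation 1: when an SPR recommends a paper, email each subscriber
    a link that routes through the tracking URL above."""
    for email in SUBSCRIBERS.get(reviewer_id, []):
        url = "https://spnet.example/go/%s/%s" % (paper_id, email)
        send_email(email, "New selected paper", url)
```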

Finally, the wide array of research topics proposed in Phase II presumably will be paid for in the same way that all research projects are paid for: researchers who want to use the SP network as a “research instrument” for answering these questions will simply do that research using whatever funds they already have, and by writing new grant proposals. However, none of this represents a capital requirement for the SP network itself.

Ideally, the SP network idea would be developed by (or in partnership with) an organization that has a strong stake in open access publishing, such as PLoS, arXiv, or PubMed / the National Library of Medicine. The costs outlined above would be minor for such an organization. However, it is also feasible to develop the SP network as an open source project without such an organization.

What’s to stop an author from “bribing” a referee?

For people used to the traditional system, the openness of the SP network may seem to make it especially vulnerable to improper activity. After all, if “anyone” in a field can act as referee for a paper, what’s to stop “anyone” from doing something improper? For example, what if an author “bribes” a referee or editor to recommend his paper? We compare the payoffs vs. penalties for this behavior under the two systems.

In the context of a traditional journal this will seem like a very serious corruption of standards, because it directly translates into a large reward for the author, namely getting his paper published in the journal. Note that since the standard metric of a paper’s value in traditional publishing is simply the prestige of the journal it was published in, the author gets as high a reward for his act of bribery — regardless of his paper’s actual level of interest to readers — as he could have obtained from publishing a paper of high interest to the journal’s readers.

Now consider the reward the author could hope to obtain by “bribing” an SP network reviewer to recommend his paper. Since the SP network “value metric” for the paper is based on directly measuring readers’ interest in the paper, getting one phony recommendation will boost the paper’s score very little. Concretely, a few more people (that specific reviewer’s subscribers) will be shown the paper’s title, but if the paper is not of interest to them, little follow-up activity will result (i.e. they will not take the time to download and read the paper; recommend it to their own subscribers etc.). The crucial difference is that in the traditional journal system, getting a paper “published” is an end in itself (and constitutes its primary value for the author’s career), whereas in the SP network, getting recommended by a given reviewer has little value in itself; the real value lies in the paper’s actual interest to readers, because that is what the SP network measures. Thus, while authors are motivated by a very high reward to build close personal and political relationships with journal editors and likely reviewers (and some take every opportunity to do so), in an SP network such forms of improper influence confer very little benefit.

Next, let’s consider the penalties in the two systems which might discourage such behaviors. Unfortunately, in a traditional journal the total lack of transparency of the review process virtually guarantees that no one will be allowed to see any evidence of the misbehavior, and that the guilty parties face no possible disadvantage or censure. The referees’ names are kept secret (even from each other), and in many journals even the name of the responsible editor is not published. The actual reviews are of course also kept secret. Even if some discerning readers independently notice serious problems in a paper (“another embarrassingly inaccurate paper in PNAS?!?”), the chain of evidence has been wiped clean and there is no way to hold anyone accountable. Moreover, because the paper is mixed into a very large pool (i.e. all the papers published by the journal), one bad paper will have little effect on the overall quality of the journal and little or no effect on its reputation. In short, there are no penalties.

By contrast, in an SP network the situation is completely different. First, the review process is completely transparent. The reviewer’s identity is not merely visible but foremost in view: instead of hiding behind the facade of a journal, he publishes his recommendations in his own name. Furthermore, he must publish his actual review. This is the “smoking gun” of phony review. Either he must justify his recommendation by publishing outright falsehoods or write an empty review (“plausible deniability”) that offers praise but no substance. Either way he commits to the public record a written proof of his own cupidity. His relationship with his subscribers is personal; they individually chose to subscribe to his recommendations because they trusted him. If he violates that trust, his individual readers are going to notice. The fact is that reputations are built slowly but lost quickly. If a reviewer does this repeatedly or in some kind of noticeable pattern, he will soon have few subscribers left. Indeed, recommending even a single article that does not interest his subscribers will immediately reduce his “influence” metric — because the SP network measures these interest levels in real-time. Since an individual reviewer typically recommends only a small number of papers (compared with the large number of papers published by a journal), a phony recommendation will have a relatively large effect on his influence metric. The SP network is all about matching your subscribers’ interests. Deviating from this fundamental mission produces immediate penalties.
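
To make that “relatively large effect” concrete, suppose (purely as an illustration; the proposal leaves the exact metric open) that a reviewer’s influence is the mean measured interest across his recommendations:

```python
def influence(interest_scores):
    """One illustrative metric: mean measured reader interest (0..1)
    across all of a reviewer's recommendations."""
    return sum(interest_scores) / len(interest_scores)

honest = [0.8, 0.7, 0.9, 0.75, 0.85]            # five well-received picks
print("%.3f" % influence(honest))                # 0.800
print("%.3f" % influence(honest + [0.05]))       # 0.675: one phony pick
# One dud among six recommendations cuts this reviewer's metric by ~15%;
# one dud among a journal's 500 papers per year would barely register.
```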

Of course, no system can eliminate these human behaviors. It is to be expected in either system that some fraction of reviewers will favor their buddies’ papers out of proportion to their actual interest. However, the very different balance of “payoffs” vs. penalties in the SP network will shift the balance of attitudes and behaviors. This effect is universally understood in political science and economics: transparency favors accountability and a culture of adherence to community standards; secrecy eliminates accountability and rewards the attitude of “whatever you can get away with”.

What’s to stop an author from only suggesting his buddies as referees?

If an author wishes to reduce his article’s readership by artificially restricting the set of referees he invites to view it, then in this system he hurts only himself. Of course, one could apply standard restrictions such as excluding referees from the same institution or recent collaborators. But this seems unnecessary, since in the SP network the impact metric reported for a paper is its actual readership and level of interest to readers. One could even develop “incest” metrics that explicitly measure whether a paper is recommended only by collaborators or by people whom the authors have themselves recommended.
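
Such an “incest” metric could be as simple as the fraction of a paper’s recommenders who have a prior tie to its authors. A minimal sketch, where the relationship test `are_related` is a caller-supplied assumption (coauthorship, shared institution, past mutual recommendations):

```python
def incest_score(recommenders, authors, are_related):
    """Fraction of a paper's recommenders with a prior tie to any author.
    `are_related(r, a)` is a caller-supplied predicate."""
    if not recommenders:
        return 0.0
    tied = sum(1 for r in recommenders
               if any(are_related(r, a) for a in authors))
    return tied / len(recommenders)

# A score near 1.0 flags a paper recommended only by the authors' circle,
# which readers (and influence metrics) could then discount.
```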

What about controversial areas like climate-change research?

Research areas that are under attack for external, non-scientific reasons will undoubtedly require vetting of SPRs seeking to review and publish within such a field, to ensure that they have actually published peer-reviewed research in that field. In general, the SP network’s rigorous impact and influence metrics should help make clear whether an individual is a respected contributor in a field, or not. Moreover, the SP network will follow membership guidelines in keeping with its purpose. It is a public forum for scientific research, and membership must be earned, both by recognized contribution to that purpose, and by adherence to professional guidelines for that purpose. In addition to the usual “inappropriate language” guidelines (no flame-wars etc.), this must include a strict commitment to deciding questions based on data rather than preconceptions.

Doesn’t all this tracking violate users’ privacy?

If we compare the SP network proposal with common services that most people use, we find that it poses considerably fewer privacy risks than standard services such as email and web query engines (e.g. Google). That is, using an email service means in principle that your service provider (i.e. your university; or Google; etc.) “knows” every word in the emails you’ve sent or received: your data are not only stored by the service provider, but also their content is systematically scanned by the service provider (e.g. spam filtering). For most people, the fact that their emails are stored and scanned is not in itself a major concern; the real question is how the service provider protects the user’s data from exposure or misuse. Moreover, the service provider’s access to the data is intrinsic to the service it provides you (i.e. it has to receive and store your emails, in order to give you access to them), as opposed to an artificially imposed attempt to “steal” information from you.

Similarly, the SP network provides a service (superficially, subscriptions to “selected paper” lists; more fundamentally, a system for connecting authors to the audience(s) that find their papers to be of vital interest). Intrinsic to performing that service, it must obtain and store the information that makes the service possible (superficially, subscription information; more fundamentally, measurements of article interest and how papers spread through the network). Its actual data (essentially, paper readership) pose dramatically fewer privacy risks than standard services such as email, because they involve no personal or other sensitive information. Of course, the SP network will take every possible measure to prevent exposure or misuse of even these non-personal data (e.g. de-identifying both papers and readers in any dataset to be used for algorithmic or metrics research). Finally, the SP network is the collective property (and will be under the control) of the research community it serves, so the community will always be able to enforce its preferences for the handling of its data. This poses far fewer privacy concerns than the serious conflicts of interest that arise when for-profit advertising monopolies (e.g. Google; Facebook) record and gain control over huge amounts of private information from their users.

Concretely, user data are handled in the following ways:

  • public data: published reviews and published metrics (e.g. a paper’s readership or interest level, or a reviewer’s influence / impact). The whole purpose of publishing “selected paper” recommendations is to make them public, and these metrics are no different from the paper statistics (e.g. number of PDF downloads; number of HTML page views) readily available on many journal websites today.
  • protected data: subscription information; readership information. These data will not be made public anywhere, and even for purposes of research analysis will be strictly de-identified in research datasets (a sketch of such de-identification follows this list). Note that this is a far stronger privacy protection than most scientists enjoy today; e.g. at conferences, other people can see which talks you choose to attend, and most scientists would not consider this a significant breach of privacy.
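
De-identification could be done with a keyed hash, so that research analyses see stable but meaningless identifiers. A minimal sketch using Python’s standard hmac module; the key handling shown is an assumption, and in practice the secret would be held only by the SP network:

```python
import hmac, hashlib

SECRET_KEY = b"held only by the SP network, never shared with researchers"

def de_identify(real_id):
    """Replace a real user or paper ID with a stable pseudonym: the same
    reader always maps to the same token, but the mapping cannot be
    inverted without the secret key."""
    return hmac.new(SECRET_KEY, real_id.encode(), hashlib.sha256).hexdigest()[:12]

# A readership record (reader, paper, timestamp) would become
# (de_identify(reader), de_identify(paper), coarsened timestamp)
# before entering any research dataset.
```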

Finally, researchers can always opt out of using the SP network for specific topics that they consider sensitive. For example, if you really don’t want some computer to record that you keep reading papers about the sex lives of small furry animals, you should access those papers directly from the journal rather than through the SP network (of course, the journal itself may track your access events).

10 Responses to “Open Peer Review by a Selected-Papers Network”

  1. Salvatore McDonagh Says:

    Hi Christopher (took some digging to find out your name – not on your blog post or your about page!)

    This is an excellent concept that really needs to be pushed hard to overturn the archaic and often unfair (to both reviewer and reviewed author – as you point out in your article) system – that requires scientists to pay for the privilege of publishing their work, or provide free reviews, or both. And having to then purchase at a premium price the same publication which is profiting from their work.

    The few, big name, high impact traditional journals have provided a much needed service in filtering and quality control, yet they also have conflicting needs – to publish a certain number of pages, on schedule, to sell advertising, and to turn a profit. For publicly traded journal owning corporations, the shareholders care mainly about profits, share prices and dividends – not the advancement of science. As such, the main goal of the commercial journals is not to educate, entertain or inform – it is to sell more copies of the publication. Granted, they no longer have the market cornered, with open journals, small niche journals, and the universal accessibility of online publication eating at their market share. Yet they still dominate the university publications purchasing budgets.

    It seems to me (and please correct me here – I’m only new to science) that there is an embarrassing lack of business acumen amongst scientists in general – practices that would not be tolerated for long in other industries seem ingrained in the scientific tradition, and it is almost with pride that scientists live a tenuous vocational existence. Many (perhaps most) good scientists are reduced to becoming grant application writers, paper resubmission editors, and slaves to the whims and fashions of government education policies, having to uproot themselves and their families every few years to pursue research in another institute that will fund what they are interested in doing (or that they can convince themselves that they might be interested in doing, just to stay in science).

    It is high time that individual scientists had more control of their own funding, with a direct relationship between their work and the people who pay for it, and this looks like a good place to start. Democratizing science – I’ll vote for that!

    Thanks and regards,

    Salvatore

  2. Golan Yona Says:

    Chris, this is a very impressive and well thought plan. I think it is
    an excellent proposal. It won’t be easy to change old traditions, but I
    think that with the prevalence of social networks, the ground is ready
    for this new approach to peer review.

  3. Serge Says:

    What about people who don’t publish papers but work in industry?
    I’m working in computer vision and would be happy to review some papers in computer vision, optimization and compressive sensing, especially those claiming practical applications, but I don’t have any papers in computer vision.

  4. More on Peer Review 2.0 « Researcher's Blog Says:

    [...] another interesting and very detailed proposal on an alternative peer review model is Open Peer Review by a Selected-Papers Network by Chris Lee. [...]

  5. Researcher Says:

    One thing I didn’t quite catch is this: can the reviewers choose to be anonymous (but still get the points for the reviews)?

  6. leec Says:

    I don’t see why not…

  7. leec Says:

    I think a model similar to arXiv can work: to become a member of arXiv, you either have to be at (have an email address from) a university, or be “invited” by someone who already is a member. Membership in this is intended to be open, restricted only to prevent abuse (such as spam).

  8. Travis Korte: Connecting The Dots: Lessons in Rebellion From the Math Network | USA Press Says:

    [...] system being developed to leverage the power of such professional relationships is Dr. Christopher Lee’s Selected-Papers Network (“SP net”), a sort of Pinterest for peer-review in which researchers with common [...]

  9. Sofia airport Says:

    Old habits die hard and it will be very difficult to change the status quo, especially in this field. As someone commented above “Democratizing science – I’ll vote for that!”. Good work, keep it up and the current system will change, slowly but change is inevitable.

  10. e-lab-book.com Better Peer-Review,… and less money gauging by Journals | e-lab-book.com Says:

    [...] quickly skimmed a proposal here http://thinking.bioinformatics.ucla.edu/2011/07/02/open-peer-review-by-a-selected-papers-network/, while Googling for something totally unrelated (distracted once [...]
