
P2P and Radio Play

p2pnet.net News:- P2P isn’t going away any time soon and, “The music industry must adapt or risk being swamped by technical progress,” says Waynn Lue in his fascinating new paper, Peer-to-Peer Networks and Radio Play: An Unexplored Link. “Technology cannot be held back once developed, and trying to stamp it out through litigation is impossible.”

Lue told us he’s a CS/Econ double major at Stanford, “starting my Master’s in Computer Science.


“I’ve been interested in p2p networks since the original Napster, so I was hoping to combine a few areas of expertise when I wrote this thesis.”


I. Introduction and History

Peer-to-peer networks (P2P) have become ubiquitous in the last six or seven years. In general, P2P networks consist of software that allows users to trade files with each other. This software provides search functionality so that users can find specific files, then facilitates a connection between someone who has the file and the person who wants to download it. These files can consist of anything from copyrighted music to movies to open source software distributions.

Napster, the first major peer-to-peer (P2P) network, was introduced in 1999. Designed to allow people to share and download music in MP3 format, it was based on a centralized server that kept an index of all currently shared songs. Word of mouth and easy access to music caused the network to grow to over 13 million users by early 2001, and many other programs sprang up both to take advantage of Napster functionality and to compete directly with it. Wrapster, an add-on written by a third party, allowed multiple file types to be shared (the original Napster allowed only MP3 files). But while various other P2P programs like Scour, Audiogalaxy, iMesh, Freenet and many others gained popularity in the same time frame, Napster remained the dominant network. Its popularity led to legal action by the recording industry, which won an injunction against the service in mid-2001, effectively shutting it down.

With Napster’s death, the other P2P networks continued to provide similar services, growing in importance and public exposure. KaZaA, which utilized the FastTrack protocol, provided a semi-centralized system, hoping to avoid the same legal problems that Napster had. Without a central server or single point of failure, the company could not control what content passed through its network, and so could not assume legal responsibility. As the most popular P2P network today, it had over three million simultaneous users and over half a billion shared files as of 2003. It also is the biggest target for the music industry, and its legality is currently being challenged. On March 29, 2005, oral arguments in MGM vs. Grokster were heard in the Supreme Court; Grokster uses the same FastTrack protocol as KaZaA. The outcome of this case will determine whether KaZaA’s technological choices will prove a sound defense.

But even if the FastTrack clients are ruled illegal, there are still other P2P networks. Another popular P2P network is Gnutella, developed by Nullsoft, a division of America Online. Gnutella is a protocol that uses a completely decentralized system, so no one company can be targeted. Even if Limewire or Bearshare (different clients on the Gnutella network) goes out of existence, the network will still exist. Each client connects by getting a list of people in the network from other clients, which means there is no single point of failure. A comparison of different architectures for P2P networks can be found in Minar 2001, and a technical analysis of those that use the networks can be found in Saroiu et al. 2003.

Regardless of the type of P2P network, though, there are problems that they all face, one of which is freeloaders. A freeloader is someone who downloads files but does not share any files for other people to download.

Various papers have been written to address this problem, treating a digital file as a shared resource, and framing the analysis around the tragedy of the commons. While Adar and Huberman 2000 set the number of freeloaders on the Gnutella network at 70%, Saroiu et al. 2003 refine the analysis much further by doing a crawling search across all peers in the network periodically, hitting around 40-60% of the network per crawl. Once that information is stored, each peer is queried against a list of files. Since Saroiu et al. search a fairly large sample of the network, their measurement that around 25% of users do not share any files should be fairly accurate. Their numbers for Napster were taken at the height of its use (given that the network was more or less dead by 2003, when they published their results), and so should be representative of Napster’s users.

Others, such as Shirky 2000, have countered with their own papers, arguing that a digital good can be copied infinitely and so cannot be a commonly owned good. If a sheep eats a mouthful of grass (the most common example of the tragedy of the commons), the overall level of grass available decreases. But if a person downloads a song, the overall level of available songs either remains the same or increases. If that person chooses not to share the file, the level is the same, and if he or she does share the file, the level increases. Golle et al. 2001 provide another interesting approach to the free-rider problem, using a game-theoretic model to analyze user behavior.

Recent innovations in P2P networks have resulted in new systems designed to combat this problem. BitTorrent has a tit-for-tat algorithm that ensures everyone contributes by setting a client’s download speed (often referred to as downstream) proportional to the client’s upload speed. Each client therefore has incentive to contribute to the distribution of the file, which makes BitTorrent a good system for popular releases, as it scales very well. Bram Cohen, the creator of BitTorrent, provides a more technical description of the protocol and its model of how users can be motivated to help amortize bandwidth costs in his paper Incentives Build Robustness in BitTorrent.
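
The proportional incentive described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Cohen's actual algorithm: the function name allocate_upload, the peer names, and the rates are all hypothetical, and a real client makes discrete decisions about which peers to serve rather than dividing bandwidth continuously.

```python
def allocate_upload(received_rates, upload_capacity):
    """Split our upload capacity among peers in proportion to how fast
    each peer has been uploading to us (a simplified tit-for-tat).

    received_rates: dict mapping peer name -> KB/s received from that peer
    upload_capacity: total KB/s we are willing to upload
    """
    total = sum(received_rates.values())
    if total == 0:
        # Nobody has sent us anything yet: share evenly so transfers can start.
        count = len(received_rates)
        return {peer: upload_capacity / count for peer in received_rates}
    return {
        peer: upload_capacity * rate / total
        for peer, rate in received_rates.items()
    }


# Hypothetical peers: "carol" uploads nothing to us, so she gets nothing back.
shares = allocate_upload({"alice": 30.0, "bob": 10.0, "carol": 0.0}, 40.0)
print(shares)
```

Under this rule a freeloader's download speed falls to zero as soon as other peers have something better to do with their bandwidth, which is the incentive the paragraph above describes.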

As P2P networks have become more popular, the controversy surrounding their use has also grown. The Recording Industry Association of America (RIAA) has filed lawsuits, making the claim that P2P downloads negatively affect CD sales. Total shipments in the US have declined from over $14 billion in 1999 to under $12 billion in 2003 (the latest year for which the RIAA has publicly available figures in its Consumer Profile), corresponding to the same period in which P2P grew in popularity. Given these numbers, the RIAA makes the argument that downloads displace CD sales, or in other words, that people choose to download music instead of buying CDs.

II. RIAA Lawsuits

The RIAA began filing lawsuits against individual users on September 8, 2003, as reported by Dean 2003. 261 users on P2P networks each sharing around 1,000 files were sued, and settlements in the range of $3,000 were reported. As the months passed, the RIAA continued its legal campaign. The following table provides a brief timeline of the lawsuits filed to date.

This series of lawsuits represents one of the first times digital infringement has been dealt with at this level, with over 10,000 people sued to date. While most other prosecutions in the past have involved infringing companies (the current lawsuit against Grokster, for example), the RIAA’s campaign focuses on preventing individual users from sharing files, a move they hope will choke off any incentive to use P2P networks. Their strategy relies on the widely-reported figure that a small fraction of the users on a network provide the vast majority of the files shared; in fact, Adar and Huberman 2000 argue that 20% of users share 98% of all files on a network.

Barker 2004 analyzes the RIAA lawsuits from a legal perspective, arguing that the damages the RIAA has suffered are far smaller than what copyright law allows ($750 per copyrighted work). Even if one download results in one lost album sale (which assumes that everyone who downloads would have otherwise bought the album, an assumption that clearly does not hold), the actual economic harm can be no more than $20. If that is the case, the punitive amount of $750 is more than 30 times the actual damage ($750 / $20 = 37.5), which could be seen as grossly excessive. Barker cites Campbell: “in practice, few awards exceeding a single-digit ratio between punitive and compensatory damages, to a significant degree, will satisfy due process.”

The Motion Picture Association of America (MPAA) has also started following a similar strategy, as Borland 2004b reports. On November 16, 2004, the MPAA filed lawsuits against an unspecified number of people in hopes of curbing movie piracy. While this form of piracy is not a focus of my thesis, I will note that the movie industry would do well to learn from the lessons of the music industry. As bandwidth costs decrease and storage space increases, movies will become as easily traded as music files currently are. Already BitTorrent sites provide access to popular movies, sometimes before they even air in theaters.

While some papers claim that the lawsuits have been effective in reducing P2P activity (some are referenced later), Karagiannis et al. 2004 present a fairly detailed technical analysis showing that P2P traffic did not decrease over the period from April 2002 to January 2004, a period that includes the filing of the first lawsuits. They analyze traffic at the link layer, capturing packets and looking at their payload to determine whether each one belongs to a P2P network. This analysis relies on two factors. The first is port numbers, since P2P applications sometimes have well-defined ports that they operate on. For example, BitTorrent by default uses 6881-6889, though this is configurable. The second is the content of the packets themselves, which they search for recognizable sub-sequences. Once a particular string is identified, that traffic can be classified as P2P activity.
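
The two-factor classification described above can be sketched as follows. This is a minimal illustration, not the classifier Karagiannis et al. actually used: the function name classify_packet and the small signature table are assumptions made for the example. The two byte strings are the openings of the BitTorrent handshake and the Gnutella connection request, and the port range is the BitTorrent default mentioned in the text.

```python
# Default BitTorrent port range (6881-6889); users can change it.
BITTORRENT_PORTS = range(6881, 6890)

# Illustrative payload signatures: byte strings that open a connection.
PAYLOAD_SIGNATURES = {
    b"\x13BitTorrent protocol": "BitTorrent",
    b"GNUTELLA CONNECT": "Gnutella",
}


def classify_packet(src_port, dst_port, payload):
    """Return the P2P protocol a packet appears to belong to, or None."""
    # Factor 1: a well-known port on either end of the connection.
    if src_port in BITTORRENT_PORTS or dst_port in BITTORRENT_PORTS:
        return "BitTorrent"
    # Factor 2: a recognizable sub-sequence inside the payload.
    for signature, protocol in PAYLOAD_SIGNATURES.items():
        if signature in payload:
            return protocol
    return None


print(classify_packet(51000, 6881, b""))
print(classify_packet(51000, 80, b"GNUTELLA CONNECT/0.6\r\n"))
print(classify_packet(51000, 80, b"GET / HTTP/1.1\r\n"))
```

The payload check is what catches traffic on non-default ports, which matters because, as the study notes, applications increasingly move away from their well-known ports.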

Using traces from three random days over the period from 2002 to 2004, they find that there is no decline in P2P usage, and possibly an increase. They attribute other studies that measure a decrease in P2P usage to two factors. First, people are using different P2P networks, branching out and distributing their traffic across multiple programs. As such, while traffic on a specific network may decrease (and some studies have shown that), P2P traffic as a whole has not been affected. Second, the traffic is becoming harder to identify, as P2P applications employ multiple ports and try to hide their traffic. They note that much of the traffic that is now reported as Unidentified is actually P2P traffic. Some more studies are cited below.

III. Survey of Past Literature

The academic landscape is filled with papers addressing the question of whether or not P2P downloads affect CD sales. While I originally intended to write yet another paper on that topic, I quickly became convinced that it has become saturated. However, as it currently is the most recognized topic involving P2P systems, and has the most significance in terms of current events, I provide a detailed look at and criticisms of literature involving this topic.

I begin with a discussion of possible economic factors influencing record sales, provided by Liebowitz 2004. In this paper, Liebowitz analyzes a variety of factors that could affect CD sales, then arrives at the conclusion that downloads are hurting CD sales. He sets the ratio of downloads to lost sales at 5:1 or 6:1 by comparing MP3 downloads and CD sales.

There are a few issues to consider. First, the publicly available RIAA data uses a derived list price as its value. Liebowitz dismisses any effect that this has on the change in value, showing that income is not responsible for any decrease in record sales. Next, he also dismisses price, format changes, and substitute forms of entertainment. On the last claim, he uses two sources of information. The first finds the correlation between video game sales and CD sales. The second is from the US Statistical Abstract, used to derive the amount of time an average person spends in a day on listening to recorded music (45 minutes), going to the movies (2 minutes), watching prerecorded movies (9 minutes), and playing video games (7 minutes). Here I disagree most with his analysis.

Liebowitz 2004 chooses to look at video game sales from the period of 1990 to 2000. He finds a positive correlation between the two, which he takes to mean that there is no substitution, since an increase in video game sales does not cause a decrease in CD sales. He justifies dropping 2001-2003 from his analysis by arguing that he wants to avoid having to control for any possible external effect of MP3s on CD sales, and so excludes all years after the introduction of Napster. But if the entire period is included, the correlation between the two is -0.16, implying that an increase in video game sales decreases CD sales, a fact that Liebowitz hides in a footnote. Yet it can be argued that it was the period after 2000 when the video game industry really started to mature, and when these games started to take consumers away from music consumption.

A series of articles in late 2003 addressed the decline in TV ratings, an example of which is Donaldson-Evans 2003. They attribute this decline in part to video games and the advent of electronic media. It is only a short intuitive leap to apply the same logic to CD sales. In fact, a recent study done in 2005 by Phoenix Marketing International found that households containing at least one game system played video games 13 hours a week, or a little less than two hours a day.

In addition, neither the Phoenix Marketing International survey nor the Statistical Abstract differentiates between age groups when reporting the amount of time people spend playing video games. If people generally agree that downloading differs based on age (as Liebowitz does later in his paper), the analysis of game playing behavior should make the same distinction when classifying how people spend their time. For example, it seems clear that the breakdown for playing games in the average household is not evenly spread out among the 4.5 members of the household, but is heavily biased towards the younger audience, those that are accused of downloading the most.

Liebowitz 2005 takes this issue up again, showing a positive correlation between videogame revenue per capita and change in per capita record sales. But again he uses data only from 1991 to 1999, ignoring the increases since then. The videogame playing time is revised upwards to ten minutes in the 2005 Statistical Abstract, still below the two hours per household number cited above, and again not broken down by age. He also makes the observation that playing videogames does not preclude listening to recorded music, an assumption which may or may not hold, depending on what type of video game is being played.

Blackburn 2004 comes to a similar conclusion that downloads hurt sales, by using data from both BigChampagne and SoundScan. In his paper, he separates the effects of file-sharing on both well-known and lesser-known artists in an attempt to isolate two conflicting effects. The first effect is the straightforward substitution argument, that downloads are a substitute for CDs and therefore directly negatively impact CD sales. The second effect is the informational effect, in that obscure artists gain more exposure from being on a network.

He comes to the conclusion that the first effect is stronger for well-known artists, and the second effect is stronger for lesser-known ones. However, since well-known artists make up a much larger part of CD sales, there is an overall decrease in CD sales. If available files online were reduced by 30%, he estimates that sales would have increased by 10% in 2003. He also concludes that the RIAA strategy of lawsuits targeting individual users has had a positive effect on sales, increasing them by 2.9% over a 23 week period. This conclusion corresponds with Madden and Lenhart 2004, although there are problems with that study. Madden and Lenhart show that the share of downloaders has fallen to 14%, but also say that 20% still share files, which seems a strange juxtaposition. In addition, they fail to take into account changing technology, only measuring the same four P2P networks and not looking at new systems like BitTorrent. Cachelogic 2004 places BitTorrent usage at 53% of all P2P usage, and also shows that P2P usage has increased, not decreased.

Blackburn bases his analysis on the assumption that songs are most popular the week they come out, and then follow a quadratic decay over time. His Figure 3 illustrates the trend between number of weeks since album release and average implied mean album utility, showing how this decay closely matches the actual data. Einav 2004 discusses in more detail seasonality issues in the motion picture market, which Blackburn notes could apply just as well to the recorded music market.

Zentner 2003 takes a different approach to the question, instead using data from surveys of Europeans. To solve the simultaneity problem, he uses the speed of the internet connection and internet sophistication as instruments, arguing that faster connections should make it easier to download without being correlated to music tastes. But his second instrument seems like an odd choice. Zentner argues that a high level of internet sophistication is needed to download MP3s. Yet programs like Napster and KaZaA are extremely easy to use, and require a much lower level of internet experience than publishing a web page or even using email. Issues with using a survey methodology are discussed later. His conclusion using these instruments is that downloading reduces the probability of buying music by 30%, and without filesharing, sales in 2002 would have been around 7.8% higher.

Rob and Waldfogel 2004 focus their analysis on US college students, conducting surveys of expenditure and downloading habits. While they did ask the standard questions about broadband access, album collections, and various other demographics, they also had a more unusual angle, asking respondents how much they value certain albums. These questions were intended to establish ex ante and ex post valuations, an important consideration when analyzing experience goods (discussed in a little more detail later). To control for the simultaneity problem, they use an individual’s broadband access as one instrumental variable, and the school the students attended as another. The schools that were included in this survey varied widely in broadband coverage, with University of Pennsylvania students leading the way.

One argument they make is that downloading can help reduce deadweight loss. On average, their respondents downloaded music that they valued a third to a half less than music they purchased. This statistic implies that people download music they otherwise would not buy, an assumption that seems fairly straightforward. As such, for those people, those downloads reduce deadweight loss without any loss in sales.

However, there still is a decline in CD sales, and they calculate that an additional download reduces sales by 0.1 to 0.2 units. As such, for the individuals they surveyed, a conservative estimate was that downloading reduced their expenditure by 10%. Of course, as Rob and Waldfogel point out, these individuals are not representative of the population as a whole, and as such this result cannot be generalized.

Rob and Waldfogel note one problem with using broadband adoption or bandwidth as an instrumental variable. This information can be used as an instrument only if an individual’s choice of getting a fast connection is exogenous to whatever variable the study is trying to instrument for. For Rob and Waldfogel’s analysis, if people purchase high-speed internet access because of their interest in music for the purpose of downloading files, that would be a poor instrument to use. This problem applies to any analysis that uses internet access as an instrumental variable. In fact, as broadband companies advertise how their high-speed connections can be used to download files faster, using broadband access as an instrument seems less and less viable.

There has also been research supporting the opposing view that filesharing has either no effect on CD sales, or a positive effect. Oberholzer and Strumpf 2004 create a dataset from downloads by viewing traffic on a P2P network, both searches as well as specific files being transferred. In addition, they look at Nielsen SoundScan data for CD sales, getting genre-specific data.

Oberholzer and Strumpf then use a series of instruments for their regressions. They first choose album-specific instruments like album average and minimum track length. The justification is that as songs get longer, so too do their digital counterparts. Some have argued that this relationship does not necessarily hold, as quality also determines file size. I do not believe this argument is true, though, for a few reasons. First, while file size definitely does depend on the quality of an MP3, the quality of an MP3 does not depend on the actual song being ripped. In other words, there should be a proportionally equivalent distribution of quality for any given song. People choose a quality when they rip an album (that is, convert it from CD to MP3 format), and this choice is independent of the album or song itself.

Second, all major P2P networks return information about the bitrate of a song in addition to its existence whenever a search is conducted. Since a downloader knows the bitrate of the song, he also knows approximately how long it will take to download it. In general, most downloaders choose to download the same bitrate across all files (the most common is 128 kbit/s). Any possible variation can be attributed to either different valuations or availability. While some people might choose to download a higher quality song for archival purposes, those transfers will definitely be outliers. And the availability issue goes back to the distribution of files on the network: there should be no systematic difference in quality between songs. Of course, the previous complaint about bandwidth as an instrument still holds here.

Oberholzer and Strumpf also use a few other instruments like German school holidays and network congestion. After using all of these instruments, they find that with their most conservative of estimates, more than 5000 downloads are needed to displace one album sale. But as Liebowitz 2005 points out, the instruments they use to control for an upward bias seem to cause their results to actually be even higher, not solving the simultaneity problem at all. Finding a better instrumental variable might improve the analysis.

Blackburn 2004 provides his own analysis of Oberholzer and Strumpf’s paper as well, concurring that there are no distinctions when taking albums as a whole, but not when differentiated by artist popularity. However, Oberholzer and Strumpf explicitly state that their instrumented regression actually predicts an increase in album sales for more popular albums, saying that in the top quartile of CD sales, 150 downloads increase sales by one copy.

Hong 2004 uses information from the Consumer Expenditure Survey along with survey data on downloads to find that approximately 20% of the total sales decline in CDs can be attributed to Napster. The bulk of that downloading activity came from households with children aged 6-17. This conclusion seems odd, though, given the findings of Rob and Waldfogel that college students also made up some of the decline. Given that the age of college students is 18 and over, the discrepancy seems out of place.

Cho 2004 uses downloads data from BigChampagne and sales data from Hits Magazine, applying a fixed effects model to find that for established artists there is no effect of downloads on CDs, and there is a positive effect for new artists. Unfortunately, since the paper is only privately available, I cannot discuss the models used in any more detail.

The United States is not the only country on which P2P economic research has focused. Tanaka 2004 looks at Winny, a file-sharing program extremely popular in Japan. He gathers two sources of data. First, the Winny protocol has an unusual property: the network measures the total number of bytes of a specific file that have been transferred (by comparing hash values). So it is possible to calculate the number of times a file has been downloaded by dividing the total amount transferred by the size of the file, both of which are public information. This property avoids an issue that is addressed later, that a third party does not have the capability to observe a network transfer. Tanaka takes the number of downloads and divides it by total sales to get a ratio, then compares that ratio with songs that never show up on the charts. He uses this ratio to come to the conclusion that downloads help drive CD sales, instead of hurting them, because of the distribution of the values.
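
The calculation described above amounts to two divisions, sketched here in Python. The function names and every number in the example are hypothetical placeholders chosen for illustration; none of them come from Tanaka's data.

```python
def estimated_downloads(total_bytes_transferred, file_size_bytes):
    """Estimate how many times a file was downloaded from the total
    number of bytes of it that the network reports as transferred."""
    if file_size_bytes <= 0:
        raise ValueError("file size must be positive")
    return total_bytes_transferred / file_size_bytes


def downloads_per_sale(downloads, units_sold):
    """Ratio of estimated downloads to CD sales for one title."""
    if units_sold <= 0:
        raise ValueError("units sold must be positive")
    return downloads / units_sold


# Hypothetical title: a 4,000,000-byte file with 2,000,000,000 bytes
# transferred in total, and 10,000 CDs sold.
downloads = estimated_downloads(2_000_000_000, 4_000_000)
ratio = downloads_per_sale(downloads, 10_000)
print(downloads, ratio)
```

Because both inputs to the first function are public on the network, the estimate needs no direct observation of individual transfers, which is the advantage the paragraph above points out.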

Second, he also conducts a panel survey of over 500 students, asking about CD purchases and downloading activity. These are university students, connected to the Internet through high-speed access, most of whom have computers. He finds that file-sharing has a positive effect on CD sales, by finding a positive (and significant) value for a dummy variable that indicates when a student started to use P2P software.

The major flaw with Tanaka’s study, as he himself points out, is the lack of a good instrumental variable. There is a simultaneity problem here, as popular songs will be downloaded more often and their CDs will be bought more often. Other researchers have controlled for this problem by using variables such as bandwidth, while Tanaka tries to omit non-downloaded albums and CD titles without previous sales in other regressions. In addition, there are the same problems with the survey methodology that will be addressed later.

Geist 2005 examines the same issue, but for the Canadian music industry. He cites an internal study (as reported by an October 2004 Economist article) commissioned by a major music label which finds that between two-thirds and three-fourths of the decline in CD sales had nothing to do with Internet music downloads. He also cites a few other reasons (many of which are discussed here in more depth) for the decline in CD sales.

First, DVD sales increased from C$0 in 1999 to over C$170 million in the period 2000-2004. Geist argues that not only are DVDs substitutes for CDs in the entertainment market, but also that the increase in DVD popularity caused CD shelf space to decrease in retail stores across Canada, also contributing to declining CD sales. Another contributing factor to declining shelf space is the increasing dominance of big retail chains in CD sales. In Canada, Wal-Mart and Costco combined account for 25% of the music retail marketplace, while in the US, Wal-Mart, Target, and BestBuy sell over 50% of all CDs. These large retail chains generally stock only new CDs as they come out, which results in a decreasing availability of older titles. As Geist points out, for an industry that traditionally depends on catalog sales for 40% of their revenue, this decline in shelf space can have an adverse effect.

Geist uses statistics and research provided by both the music industry and the government to come to the conclusion that P2P downloading has only a slight effect on CD sales, and that any lost sales are more than made up for by a levy that the Copyright Board of Canada has established on blank recordable media and equipment like MP3 players.

Other research in the area has focused on the nature of digital media, providing a more theoretical framework for this topic. Discussion about digital music as an experience good, and therefore a quasi-public good, is found in Gopal et al. (forthcoming). While they also conduct their own surveys of students, the most interesting part of their paper is the economic model they build around sampling and its effects on purchases. They come to the fairly intuitive conclusion that lower sampling costs have a positive effect on the consumer surplus of samplers, but the effect of sales depends on the true intrinsic value of the music item.

IV. The Nature of Digital Media

Digital media has many properties that differentiate it from physical items. Central to any argument about digital media is bandwidth, the capability of a computer to download or upload a file. Traditional modem connections download at 4 or 5 kilobytes a second, broadband connections vary between 50 and 200 kilobytes a second, and university or corporate connections can be even higher. For a sense of size, a typical MP3 file is about four megabytes, which means it can be downloaded in one or two minutes on a fast connection. As broadband connections get more and more prevalent, bandwidth becomes less and less of an issue.

Bandwidth by definition is time-independent. Using bandwidth one second has no impact on its speed in the next second. Since most plans in the United States are monthly and unlimited, there is no marginal cost to using bandwidth except that anything else a consumer is doing at the exact same time will be slower. I therefore make the assumption that bandwidth does not play a part in a consumer’s decision to download a song, only the consumer’s preferences. After all, when songs range from 4 MB to 10 MB, the time to download each song ranges from 1 to 3 minutes, a more or less trivial amount of time.
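
The download times quoted above follow from dividing file size by connection speed. The short Python sketch below works through that arithmetic using the file sizes and speeds mentioned in this section; the function name download_seconds is hypothetical, and the conversion of 1 MB to 1,000 KB is an assumption made to keep the numbers round.

```python
def download_seconds(file_size_mb, speed_kb_per_s):
    """Time in seconds to download a file, assuming 1 MB = 1,000 KB
    and a connection that sustains its full speed throughout."""
    return file_size_mb * 1000 / speed_kb_per_s


# File sizes (MB) and connection speeds (KB/s) taken from the text.
for size_mb in (4, 10):
    for label, speed in (("modem", 5), ("slow broadband", 50), ("fast broadband", 200)):
        seconds = download_seconds(size_mb, speed)
        print(f"{size_mb} MB over {label} ({speed} KB/s): {seconds / 60:.1f} minutes")
```

At broadband speeds the results run from about 20 seconds to a little over three minutes, roughly in line with the range given above, while the same files take 13 minutes or more over a modem.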

In fact, it is the nature of digital goods to be non-rivalrous. One person’s consumption (downloading) of a digital file only affects other people’s consumption if the person uploading the file has saturated his upstream. When there are multiple sources for a file, as is the case of most popular music, this effect is negligible. It does not use up the good in a traditional sense, since a file can be copied infinitely without preventing its consumption by anyone else. Shirky 2000 argues that the tragedy of the commons does not apply here, and presents a few other relevant points about digital versus physical goods.

There is another interesting difference between experience goods and physical goods. In the physical world, alternate means of transport are substitutes, competing with each other. Transporting goods by trains prevents anyone from transporting them via trucks. But with music delivery, alternate distribution media are complements. People often listen to music on the radio, then buy it on CD. Indeed, the concept of payola (asking DJs to play songs in exchange for cash or bribes) is built around the belief that the more people hear a song, the more likely they will buy the CD. Katunich 2002 discusses payola in more detail, talking about its history and eventual outlawing in 1960. Even if payola has been banned, it still continues in other forms. See both Katunich and Boehlert 2001 for a discussion of how record companies currently exploit legal loopholes by employing middlemen to pay radio DJs in return for airtime.

More generally, hearing something on one medium prompts one to experience it on other media as well. The sampling hypothesis argues that people download music, and then go out and buy the corresponding CDs. Liebowitz 2004 disagrees with the sampling argument. He first compares a CD to a candy bar, following the same model that Hirshleifer 1971 uses for light bulbs, and then argues that an equivalent hypothesis is that sampling reduces the risk associated with the purchase of a CD, thus allowing higher expected utility with each CD. His conclusion is that while the utility is higher, the satiation point remains the same, so consumption can actually decrease.

Yet while candy bars may provide the same satiation point regardless of size, CDs do not necessarily have a fixed satiation threshold. Good music is good music to the listener regardless of quantity, and it is rare to find consumers who would balk at listening to good music, claiming that they have had enough. Again, music is an experience good, not a physical one.

In fact, the sampling hypothesis does not necessarily only apply to reducing the risk of purchase of a specific good (in this case, a CD). Many have made the argument that sampling allows people to find artists that they never would have otherwise. If the marginal cost of sampling is more or less negligible, that reduces the opportunity cost of finding new artists, not just of getting a specific album.

Blackburn 2004 makes a similar argument, saying that consumers are not aware of all the albums that they can purchase. People can download songs and share them with other people, or play them in the presence of other people. Both actions help educate people who otherwise would never consider buying that album. Blackburn points out that this is very similar to a network effect, except instead of raising the valuation for an individual consumer, it raises the valuation of the average consumer, since increasing the number of listeners of a song increases the share of consumers who are aware of it.

With an increase of awareness, there might also be an increase in the number of niche markets, because the entire internet is now both the supply and demand. People can now download songs from lesser known artists (and as Blackburn finds, actually do), with very little cost. As such, they are much more willing to take a risk on artists they have never heard of since they no longer have to spend $15 on an album. So the market could be moving into one of specialization, where all consumers can fall into clearly defined niches. Consumers are no longer limited to just hearing top-40 hits that the radio stations in their area choose to play, but can now find songs across the nation, and even internationally.

In fact, at the extreme, niches are broken down to the level of individuals. There is no longer a need for certain genres, because each individual has access to whatever music he or she wants, at any time. Instead, users can custom-tailor their playlists to their liking. The only limiting factor is the time a consumer needs to listen to a song and see if he or she likes it.

That limitation might soon be gone, as various services have sprung up to help users find songs that might be to their liking. Some sites are passive, like Audioscrobbler, which installs a plug-in into a person's media player of choice, such as Winamp or iTunes, and then reports all played files to a global server. This information is stored centrally, and anyone can search it to see which artists are commonly found together in people's playlists.

Last.fm, a related service, allows a little more personalization. Users can actively skip or approve songs for their playlists, which increases the amount of information the website can use in making recommendations. Finally, UpTo11.net actively scans file collections on various P2P networks to summarize information for recommendations. All these services share the same basic idea: helping users discover new artists they otherwise would not find.

Peitz and Waelbroeck 2004 argue the sampling issue from another angle, showing that sampling can help the music industry in two ways. First, it can directly increase music sales, as stated above, which the authors view as a matching problem. Sampling can lead to a better match between a product and a buyer, which increases the buyer's willingness to pay. This argument is more or less the same as the one stated above.

Their second argument combines both the sampling hypothesis and the nature of digital music. Sampling provides an alternative channel of information, which is equivalent to marketing and promotion costs for music labels. So not only do downloads increase the exposure of an artist, they also save the label money by providing a form of advertising. In this situation, even if total revenues decrease, costs could decrease even more, leading to increased profits for the music industry.

One other factor that might help the sampling hypothesis is the quality issue. Most downloads are of lower quality than their physical counterparts; that is, since an MP3 is created using lossy compression, it usually is of lower quality than the original CD audio or WAV file from which it was created. In fact, there are a few differences between CDs and downloads.

One set of differences revolves around the physical nature of CDs. Buying CDs requires shipping and handling costs compared to the bandwidth costs of downloads. The downloads market is easier to access since it is not limited by geographic region, while a CD can only be shipped from certain distribution points. This need for shipping means consumers have a limited selection of CDs they can purchase, even with the advent of the Internet. Any CD can be made into MP3s, which means the downloads market is a superset of the CD market.

The quality issue addresses another set of differences in multiple ways. Since MP3 (like most other popular formats, including the open source Ogg Vorbis, Microsoft's WMA, and Apple's AAC) uses lossy compression, its quality depends largely on the choices of the person who does the actual encoding, from the bitrate to the type of encoder used. Bitrates can vary between 32 and 320 kbit/s, although the most common choices for constant bitrate (CBR) are 128 and 192 kbit/s, and variable bitrate (VBR) is becoming more and more popular. In addition, there are multiple MP3 encoders, like Blade, Xing, Fraunhofer, and LAME. With all these possible combinations, the file that someone grabs online might not be the quality that person wants.

In addition, the RIAA and other industry groups have started adding spoofed files to the networks, which are named to look like popular hits but only consist of snippets from the song, or static. Companies like MediaDefender help distribute these files, hoping to overwhelm the network with fake files, eventually driving away any potential downloaders. In essence, they are raising the search costs and hoping that will turn potential customers away from P2P networks.

For now, the best guaranteed level of quality a consumer can get is by buying a CD. In addition to the music being higher quality, CDs also come with various other extras that provide additional value for the consumer, like biographies, track lists, lyrics, and oftentimes special promotions. Research has often used this conclusion to model downloads as being of lower quality and value to consumers, and the sampling hypothesis holds that people will buy CDs after sampling the music more cheaply.

But as always, technology provides its own solutions to these problems. Programs like KaZaA often provide information about the bitrate of files, as well as providing a rating system to allow users to mark the integrity of certain files as good or not. The more users contribute, the harder it becomes to add spoofed files to the network, as they get rated poor quality. As more and more people learn about different encoding formats, labeling of correct albums will start becoming a priority.

And as technology advances, lossy compression will be replaced with lossless compression, which means digital files will have the exact same quality as their original CD counterparts. Various lossless encoders are already available, like Monkey's Audio and FLAC, which allow perfect compression and decompression, similar to ZIP and RAR files. The only drawback is the file size: most lossless encoders achieve a little less than 50% compression (the encoded file is a little over half the original file size), while MP3 can get more than 75%, depending on what settings are used. Once people are willing to use their bandwidth and disk space to store these higher quality files, they will switch over.

Nor is the packaging for a download necessarily worse than that of a CD. With a download often come links to many different places on the web that have biographical information, song lyrics, or other information relevant to the download. MP3s often come with ID3 tags that provide information for the listener, or are linked from webpages that provide additional value, more than a physical CD could contain.

An analog to radio play versus downloads is comparing streaming versus ownership. A song played on the radio is a stream while a download is, in some sense, a permanent fixture. In this case, though, the download seems like a clearly better good. Not only are downloads usually of higher quality than both radio and online streams, they also provide a time-shifting capability. That is, the consumer can choose to listen to the song at any time, instead of waiting for a song to come on randomly. VCRs and TiVo are comparisons in the television market. The only disadvantage of a download is the requirement for more bandwidth, but the cost is amortized over the number of streams a consumer would otherwise have.

But the theory does not explain what happens in practice. Why is radio still popular if people can download all their hits? It cannot just be the search costs associated with finding a source to download a file from. Perhaps it is the randomness factor, hearing favorite songs out of the blue and receiving some utility from validation of a consumer's musical tastes. After all, if it is being played on the radio, multiple people must desire to hear it. Or it might also be the informational factor mentioned above, like when DJs provide biographical information for artists.

The legal online market has also been segmented into two similar, although not perfectly analogous, camps. One group is led by Apple's iTunes, which sells digital downloads on a per-song basis. Most songs are priced at ninety-nine cents, and once bought, can be replayed at any time through iTunes or on an iPod. Apple currently uses a form of digital rights management (DRM) called FairPlay that prevents other MP3 players (both software and hardware) from playing any songs downloaded from iTunes. The other camp is occupied by companies like Rhapsody and, to a certain extent, the new Napster, which let customers listen to all the music in their catalogs as long as the customer remains subscribed to the service. These companies generally also allow customers to buy tracks for a fixed amount of money, taking a cue from iTunes.

The first group is equivalent to downloads, in that once a purchase is made, it is owned forever and can be listened to at any time. The second is more similar to streaming, both in terms of technology and access. While most of the companies in the second camp provide the same time-shifting capability that downloads do, they also require an active Internet connection in order to listen to anything, instead of a one-time download cost.

Some have made the argument that all P2P music is effectively streamed. That is, people download songs, listen to them a few times, and then promptly forget that they ever owned them. This behavior could be typical of the hit-mainstream category, in that people go and download the newest Britney Spears song, and then listen to it only until the next hit comes along. If that is true, then the flow of shared files would be much more important than the stock. The number of files shared in any particular week would have no relevance except to help measure the change since the previous week.

Saroiu et al. peg the average number of shared files for a client at around 30 or so in the Gnutella and old Napster networks. This value could corroborate the above theory, as we would expect many more stored files if people were hoarding them. For a more recent set of numbers, a quick search on KaZaA shows 2,540,216 users, 536,897,935 files, and 33,590,272 GB worth of files shared. Those numbers translate to 211 files/user, and 64 MB/file. These numbers are by no means accurate, as the number of users includes people who are sharing no files, and the files shared are not just music files, but also movies, programs, and other digital files. They are provided merely as a reference point.
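The per-user and per-file averages quoted above can be reproduced with a few lines of arithmetic. This is only a quick check of the figures in the text; the choice of 1024 MB per GB is an assumption, since the text does not say which convention was used.

```python
# Figures quoted in the text from a KaZaA search.
users = 2_540_216
files = 536_897_935
shared_gb = 33_590_272

files_per_user = files / users

# Assumption: 1 GB = 1024 MB. With 1000 MB per GB the result is closer to 63.
mb_per_file = shared_gb * 1024 / files

print(f"{files_per_user:.0f} files/user, {mb_per_file:.0f} MB/file")
```

An average of roughly 64 MB per file is far larger than a typical MP3, which is consistent with the caveat that the shared files include movies and programs as well as music.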

If this behavior accurately reflects how users view downloaded files, then an additional factor is whether or not they choose to delete the files. If only 30 files are shared, it seems much more likely that people are choosing to delete files. But given that the cost of hard drive storage space is rapidly decreasing (prices have dropped to less than $1/GB, and a GB can store well over 200 MP3s) and most computers these days come with hard drives of at least 60 GB, there would seem to be no direct incentive to delete these files other than to preserve bandwidth or perhaps avoid legal liability. And as mentioned before, with broadband adoption rising over time, the first cost will become almost negligible as well.

Other topics that have not been covered in depth in previous literature involve looking at the effects of digital rights management (DRM) on CD sales. Some albums are copy-protected, playable only in CD players and not computers or some car stereos. Is the negative impact of reduced portability outweighed by the expected increase of sales as digital files are harder to come by?

But my thesis chooses to look at these issues in a new light. Instead of focusing on the effects of P2P downloads on CD sales, I look at radio spins, analyzing a few separate issues that have not yet been covered in the literature. First, to my knowledge, there has not been any analysis of the relationship between radio and music downloads. The underlying theory can hold in two directions. In one, a consumer hears a song on the radio, likes it, and as a result downloads the song. This could be analogous to going out to buy the CD; in other words, radio is a kind of information medium, exposing people to music and getting them interested in it. The causation could also flow the other way: people could download a song, and then request it on the radio after being exposed to it. The sampling argument makes a similar point, only with CD sales as opposed to radio requests. Finding a relationship between P2P downloads and radio spins could prove invaluable for both P2P networks and radio programming.

Second, this thesis also looks at the nature of a hit in the digital market. How does a song become popular online? Is the growth model for popular songs different from that of unpopular songs? How long do songs stay popular? With the inconclusive evidence of P2P effects on CD sales, knowing how digital hits are made could help add another piece of information to the analysis.

The issue of hits in entertainment media is not a new one. Blackburn and Einav both address hits in the recorded music and movie industry, respectively. However, the nature of hits in the digital market is less established. While they may follow similar trends to other media, a more empirical analysis is needed. This thesis tries to provide that missing piece.

V. Data

The data for my thesis comes from BigChampagne.com, a web site that tracks peer-to-peer (P2P) traffic over a variety of protocols. These protocols include all FastTrack clients (KaZaA, KaZaA Lite, Grokster) and all Gnutella clients (Grokster, iMesh, Limewire, Bearshare, Morpheus, etc.). It is here that I make my first simplifying technical assumption.

While BigChampagne tracks a variable called downloads in its data, it is technically infeasible to track the number of downloads in a network without controlling it entirely. Given that BigChampagne is drawing its information from various P2P networks that other people control, it must be doing something else. Since it uses a proprietary formula to calculate downloads as well as its own Index variable, I have to make a few assumptions here. An example best illustrates this. Say that user A transfers a file to user B. On most networks, a third party C cannot see that transfer occur, since a direct connection is established between A and B. What C can see is what files are shared on the network. As such, what BigChampagne is able to track is some proportion of the number of files shared on the network at any given point in time.

However, even that gives an incomplete picture, because a user can only see some portion of the network at any given time. Gnutella, for example, is by default configured only to show people connected a fixed number of levels away from a specific client. Even without this limitation, there is only so far someone can go in the network before hitting network latency and other issues outside of their control. So here I assume that the proportion of files viewed by any client accurately reflects the general trends in the network at large. In addition, BigChampagne must employ multiple connections to try to ensure as broad coverage as possible, which reduces how strong that assumption needs to be. With a larger sample size, the true population is more accurately observed.

Number of files shared alone does not provide the full picture. While any download requires a source, so there is a correlation between number of files shared and number of downloads, there are still other issues. Consider a file that has 1000 people sharing it. If the next week, there are again the same 1000 people sharing it, that file is not being traded often or at all. So the change between weeks is another important factor in considering downloads. This problem is addressed in a little more detail below, when BigChampagne's Index variable is discussed.

In addition, I have to assume a strong correlation exists between shared files and downloads that holds true across all genres and markets. If it turns out that people who download classical music are more likely to share their music than people who download alternative music (possibly because they are less technologically savvy, and clients like KaZaA have sharing turned on by default), that is another factor I have to take into account. Or perhaps younger age groups want to publicize their tastes with other people, so they are more likely to share files. I avoid this problem by focusing only on the hit-mainstream market, but future studies should be aware of it.

BigChampagne breaks down its data by genre and by geographic market, also providing radio playlists through Mediabase for every radio station in the country. A playlist is defined as the list of songs a radio station played, along with the number of spins for each song. As a caveat, not all market/genre combinations exist, so while the coverage is comprehensive, it is not complete.

Here is a listing of the relevant variables in BigChampagne's spreadsheets.

1. Artist: Artist name
2. Title: Title of song
3. Index TW: Current week's online popularity on a 100-point scale
4. Cume Aud: The number of people who are sharing this song
5. Spins TW: Number of times this week that the song has been played

Most of the columns are self-explanatory, with a few exceptions like the Index variables. This index is calculated not just by how many people are currently sharing the song, but also by how much it is being transferred. As such, a song that is present on many different people's computers but no longer sought after will have a lower ranking.

In addition, trends over time are another factor to take into consideration. As stated above, the same number of people sharing a file over time is an indication that the song is not actively being traded. The same applies to searches—if there is a decrease in searches for a particular song, that song probably is not as popular in comparison to other songs on the network.

After all these factors are added up, BigChampagne applies a proprietary formula to get a rating. The ratings are normalized, with 100 being the highest score, and all other songs a fraction thereof. While it may not be ideal, this methodology does seem like a good one for tracking the popularity of a song.
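The normalization step described above can be illustrated with a short sketch. BigChampagne's actual formula is proprietary, so the raw scores, the song names, and the function name `normalize_to_index` below are purely hypothetical; only the final rescaling (top song gets 100, every other song a fraction of that) comes from the text.

```python
def normalize_to_index(raw_scores):
    """Rescale raw popularity scores so that the top song scores 100."""
    top = max(raw_scores.values())
    return {song: 100 * score / top for song, score in raw_scores.items()}


# Made-up raw scores for one week; the real inputs are not public.
raw = {"Song A": 840.0, "Song B": 420.0, "Song C": 210.0}
index_tw = normalize_to_index(raw)
print(index_tw)
```

Because each week is rescaled against that week's top song, an index value is only comparable to other songs in the same week, which is why it serves as a relative measure of popularity.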

I use another simplifying assumption here. While BigChampagne does provide playlists for all stations in a given market, their download ranking charts take the number of spins (and subsequently spins ranking) from specific stations that they classify in the same genre. In other words, taking Los Angeles as an example, while KYSR, KOST, KBIG, and KIIS probably all will play songs that can be found on the hit-mainstream genre (for example U2's Vertigo and John Mayer's Daughters), only the spins from KIIS will count towards the number of spins when I look at a particular song since KIIS is classified as the hit-mainstream station in Los Angeles.

I make this assumption for a few reasons. First, this simplifies data analysis. Second, I assume that the proportion of songs played will be the same, and since I use number of weeks in my data points, one station (the main station in the genre in that geographic market) is representative of the general trend. The number of spins should have the same general pattern, with the same growth to the peak value. And if one of my measurements is the growth to the peak for radio spins, looking at one station instead of summing up all their spins should give the same data. Finally, the people who download songs in the hit-mainstream genre probably also listen to the main radio station in that genre in that market. Any flows of causation that involve that genre will primarily involve those people. In other words, someone looking at the hit-mainstream market for downloads probably wants to find people who listen to hit-mainstream music, not classical or country music.

BigChampagne makes a similar assumption when presenting its data on downloads, as seen in the Spins TW variable, which would only be the spins on KIIS, if I again look at Los Angeles. In the geographic markets where multiple radio stations are classified in the same genre, BigChampagne takes the sum total of all those radio stations to get the total number of spins for that week, so that is what I use to calculate any variables based on spins.

The biggest advantage of having access to BigChampagne's data is that all this data is behavioral, not survey-based. As such, it tells me what people are doing, not what they say they are doing. The problems with surveys are numerous, and as Liebowitz 2004 and Oberholzer and Strumpf 2004 point out, they are only as reliable as their respondents. When faced with a topic as controversial as file-sharing, and in light of the recent RIAA lawsuits, they can be quite inaccurate, as people may have incentives to be untruthful. Most surveys tend to ignore consumer behavior in the absence of downloading. Some conclude that people who download are less willing to buy, ignoring the fact that they might have a lower tendency to buy albums in the first place. There is also a bias in terms of self-selection, in that people who agree to take these surveys or have their Internet habits monitored are not always representative of the population as a whole.

Instead, BigChampagne can see exactly what files people are choosing to share by doing searches for specific files in the network. The number of results returned indicates how many copies of that file are being shared. In addition, BigChampagne also looks at what searches are being conducted. In fact, that is the methodology that Oberholzer and Strumpf employed, looking at what searches were being conducted on an OpenNap network to determine popularity. As a side note, the OpenNap protocol displays what files are being transferred, which avoids the problem discussed above.

VI. Model

Before getting into detail on my model, a little history is needed. I follow the analysis of Griliches 1957 in determining how to build a growth model. In his thesis on hybrid corn, Griliches analyzes the rate of adoption in various districts in the US. He defines three dependent variables in his model: origins, growth rates (or slopes, as he names them), and ceilings. Origin is defined as the date at which an area began to plant ten percent of its ceiling acreage with hybrid seed. Ceilings are the long-run equilibrium percentages of the corn acreage which will be planted to hybrid seed. Growth rate is defined as a value that calculates how fast hybrid seed was adopted between the origin and ceiling. Griliches's example for growth rate is that a value of 1.0 implies a sequence (12, 27, 50, 73, 88), or that the distance from 12 to 88 per cent was covered in 4 years. A value of 0.5 implies (12, 18, 27, 38, 50, 62, 73, 82, 88), or a growth rate that was twice as slow, finishing in 8 years.
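The two example sequences can be reproduced from a standard logistic curve, P(t) = 100 / (1 + exp(-(a + b*t))), where b is the growth rate. The sketch below is only an illustration: the starting point a = -2 (about 12 per cent) and the helper name `logistic_path` are assumptions chosen to match the quoted numbers, not something taken from Griliches's paper.

```python
import math


def logistic_path(slope, start_logit=-2.0, end_logit=2.0):
    """Yearly adoption percentages along a logistic curve with the given slope.

    start_logit and end_logit are assumed values; -2 and +2 correspond to
    roughly 12 and 88 per cent adoption.
    """
    years = round((end_logit - start_logit) / slope)
    return [
        round(100 / (1 + math.exp(-(start_logit + slope * t))))
        for t in range(years + 1)
    ]


print(logistic_path(1.0))  # [12, 27, 50, 73, 88]
print(logistic_path(0.5))  # [12, 18, 27, 38, 50, 62, 73, 82, 88]
```

Halving the slope doubles the number of yearly steps needed to cover the same 12 to 88 per cent range, which is the sense in which a value of 0.5 is twice as slow.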

These three variables determine a logistic growth function, the typical S-curve of technological adoption. Griliches used each of those three variables in a separate regression for the dependent variables, and found other data that he deemed interesting for the independent variables. These independent variables included districts, various region and state (larger geographical area) breakdowns, and other measurements of productivity like acres of corn per farmer and average yield.

He justified his choice of the logistic in his footnote 6:

It may be worthwhile to indicate why it is reasonable that the development should have followed an S-shaped growth curve. The dependent variable can vary only between 0 and 100 per cent. If we consider the development to be an adjustment process, the simplest reasonable time-path between 0 and 100 per cent is an ogive [any continuous cumulative frequency curve (MathWorld)]. While the supply of seed can increase exponentially, the market for seed is limited by the total amount of corn planted, and that will act as a damping factor. Also, if we interpret the behavior of farmers in the face of this new, uncertain development as if they were engaged in sequential decision making, the ASN curve will be bell-shaped, and the cumulative will again be S-shaped. (503)

For my purposes, the regression will look a little different. First of all, each song will probably have a bell-curve distribution, rather than logistic. Instead of the long-run equilibrium of adoption that Griliches defined as a ceiling, there is a clear growth and decay pattern to each song. The logistic model could be applied if I took percentages of the peak number of downloads, as opposed to absolute numbers, but as Blackburn 2004 points out, there is a clear difference between popular and unpopular artists, and losing that information can introduce an unfair bias into the data. In addition, these percentages could only be applied to the growth pattern, and a separate regression would have to be run for the decay portion. Another problem is that determining an appropriate growth rate can be very difficult, as the parameters to the logistic growth function can change the actual values drastically and defining what percentages constitute the correct value for introduction and saturation can be difficult.

In addition, Griliches was looking at a rate of technological diffusion for a specific technology. The logistic curve has been proposed by many economists to model technological adoption, which is different from popularity. Songs do not always follow a traditional distribution as they sometimes have no normal growth pattern to the peak and have very noisy data along the way. Instead of looking at one type of technology adoption, it makes more sense to say that each song has its own model. Some songs have a nice bell-curve, while some peak multiple times, often far apart in weeks. Other songs drop off the charts completely, only to resurface and hit their peak weeks later.

Since curves cannot be neatly fitted to the data I have, I chose instead to measure the number of weeks from introduction of the song to when it peaks in downloads. Using this variable as a measurement of growth avoids all the problems of curve-fitting without losing the essence of what I am trying to find out, the growth rate of songs. By preserving the value of the peak downloads for each field, I also do not have to rely on just percentages and relative growth for my data.

VII. Variables

The dependent variables I chose all measured some aspect of digital song growth, both within the digital market and also in relation to radio spins.

Y1 is the number of weeks between the introduction of a song and when it first hits its peak in downloads. The introduction of a song is set as the first week it ever shows up on any of the download networks, and makes a large enough impact to show up on BigChampagne's charts. Since BigChampagne's coverage is fairly thorough, the first time it shows up on the networks probably is the first time it shows up online in general. The range of Y1 is from 0 to 54, the span of my data set. A value of 0 would indicate that it peaked in the week it was introduced. This variable is the basic growth rate measurement. The larger it is, the longer it took for a song to get popular.
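As a rough illustration of how Y1 could be computed, the sketch below assumes a song's weekly download counts are held in a list whose first entry is the introduction week. The function name and the sample numbers are hypothetical.

```python
def weeks_to_peak(weekly_downloads):
    """Y1: weeks from introduction (index 0) to the first week at peak downloads."""
    peak = max(weekly_downloads)
    # list.index returns the first occurrence, so a later tie is ignored.
    return weekly_downloads.index(peak)


downloads = [200, 450, 800, 1000, 950, 1000, 600]  # made-up weekly counts
y1 = weeks_to_peak(downloads)
print(y1)  # 3
```

A song whose first charted week is also its best week gets a value of 0, matching the lower end of the range described above.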

Y2 is the number of weeks a song spent above 80% of its peak in downloads. For example, if a song peaked at 1000 downloads, this variable measures the number of weeks it spent above 800 downloads. This variable is constructed by finding the week in which it breaks 80% going down, and subtracting the week in which it breaks 80% going up. While the values can break that 80% of max value multiple times, I choose the ones closest to the peak. As such, it is always an integer value greater than or equal to 1. While the description can be confusing, I provide an example later that hopefully clarifies how it is calculated. This variable provides a measurement of how long a song maintains its popularity, as well as smoothing out some of the small fluctuations at the peak.
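One possible reading of this construction is sketched below. It assumes weekly download counts in a list, treats "above 80%" as strictly greater than the cutoff, and picks the crossings closest to the peak by walking outward from the peak week. These details, along with the function name and sample data, are assumptions for illustration only.

```python
def weeks_above_threshold(weekly_downloads, fraction=0.8):
    """Y2: length of the run of weeks around the peak that stay above the cutoff."""
    peak = max(weekly_downloads)
    peak_week = weekly_downloads.index(peak)
    cutoff = fraction * peak

    # Walk left from the peak to the week the song broke the cutoff going up.
    up_week = peak_week
    while up_week > 0 and weekly_downloads[up_week - 1] > cutoff:
        up_week -= 1

    # Walk right to the first week at or below the cutoff (or the end of the data).
    down_week = peak_week + 1
    while down_week < len(weekly_downloads) and weekly_downloads[down_week] > cutoff:
        down_week += 1

    return down_week - up_week


downloads = [200, 850, 700, 900, 1000, 950, 780, 820, 300]  # made-up counts
y2 = weeks_above_threshold(downloads)
print(y2)  # 3
```

In the sample data the song also crosses 800 in weeks 1 and 7, but those crossings are separated from the peak by weeks below the cutoff, so only the run of weeks 3 through 5 is counted. Since the peak week itself always counts, the result is never less than 1.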

Y3 measures the ratio of spins in the week the song breaks 50% of its peak in downloads to peak radio spins. Again, assuming a song peaked at 1000 downloads, if in the week in which it breaks 500 downloads it had 10 spins, and the max spins it ever achieved was 30, the value would be 1/3. As a ratio, it varies between 0 and 1, inclusive. This variable compares the growth rate in the digital market to the radio market. A song that takes off more quickly in the internet world than the radio world should see a smaller ratio, less than 0.5.
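The worked example in this paragraph can be written out as a short sketch. It assumes downloads and spins are parallel weekly lists, that "breaks 50%" means the first week at or above half the peak, and that the song received at least one spin. The function name and data are hypothetical.

```python
def spins_ratio_at_half_peak(weekly_downloads, weekly_spins, fraction=0.5):
    """Y3: spins in the week downloads first reach the cutoff, over peak spins."""
    cutoff = fraction * max(weekly_downloads)
    # First week in which downloads reach the cutoff (assumed meaning of "breaks").
    break_week = next(
        week for week, count in enumerate(weekly_downloads) if count >= cutoff
    )
    return weekly_spins[break_week] / max(weekly_spins)


downloads = [100, 300, 600, 1000, 700]  # peaks at 1000, breaks 500 in week 2
spins = [2, 5, 10, 20, 30]              # 10 spins in week 2, peak of 30
y3 = spins_ratio_at_half_peak(downloads, spins)
print(y3)
```

With these numbers the ratio is 10/30, the same one-third value as the example in the text.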

Y4 measures the difference between the peak of the song in the digital market, and the peak of the song in the radio market. I calculate it by finding the week in which radio spins hit their peak, and subtracting the week in which downloads hit their peak. A positive value means that radio spins peaked after downloads peaked, while a negative value means radio spins peaked first. This variable should provide basic lead/lag analysis between radio spins and downloads.
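A minimal sketch of this lead/lag calculation follows, assuming downloads and spins are parallel weekly lists and that ties are resolved by taking the first peak week. The function name and data are hypothetical.

```python
def radio_lag_weeks(weekly_downloads, weekly_spins):
    """Y4: week of peak radio spins minus week of peak downloads."""
    download_peak_week = weekly_downloads.index(max(weekly_downloads))
    spins_peak_week = weekly_spins.index(max(weekly_spins))
    return spins_peak_week - download_peak_week


downloads = [100, 300, 600, 1000, 700]  # downloads peak in week 3
spins = [2, 5, 10, 20, 30]              # spins peak in week 4
y4 = radio_lag_weeks(downloads, spins)
print(y4)  # 1
```

Here the result is positive because radio spins peaked one week after downloads did.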

While the downloads peak is important, I also chose to capture the same information for the index peak. The downloads peak involves a calculation that measures the actual number of files transferred, through a formula of BigChampagne's. This variable therefore provides some form of absolute scale in terms of a song's popularity. However, as Blackburn points out, the relative ranking of a song is important as well. So I created four more Y variables (Y5-Y8) that measure the exact same numbers as above, but substituting peak in terms of index instead of peak in terms of downloads. Since index is calculated based on all the other songs in that week, it provides a relative measure of popularity.

With these dependent variables in mind, I set out to create a series of independent variables that were interesting to me, and relevant to my analysis.

I first created a set of dummy variables for genres. These genres are rap/hip-hop, R&B, pop, alternative/rock, world, country, and techno. Classification of songs into genre often involves a subjective analysis, though research in artificial intelligence using various methods including neural networks to analyze wav files has shown promising signs. In addition, many community sites like freedb are building music databases that provide metadata for CDs and downloads.

In order to remove some subjectivity from the classification, I used the website of Rolling Stone (a respected magazine focused on the music industry) to search for specific artists. In its biography for each artist, the site classifies artists into a few separate genres, and I chose the one that seemed most appropriate.

Genre was interesting to me because it provided a source of external variation that might see different growth patterns for different audiences. While I limited the analysis of my dataset to only hit-mainstream, there is still a lot of genre variation within that classification, which includes artists from Yellowcard (alternative rock) to Vanessa Carlton (pop).

Next I generated a set of dummy variables for the cities I tracked. When generating my data points for a song/artist/city combination, I tried to choose the same cities for each artist, so that there would be some cross-city variation along with cross-artist variation. The sampling included two large cities, one medium city, and two small cities. Unfortunately, since every song did not show up in every market, the same five cities were not chosen for each song, though I tried to have as much overlap as possible. The city dummies are used to control for possible exogenous variation between cities. For example, Philadelphia is considering a proposal to set up wireless connections across the entire city, which would affect downloads. In addition, demographics have a definite effect on wealth, type of preferred music, and other factors that relate to the dependent variables I have defined.

The final set of dummies was for each song, since I wanted to make sure that any across-song variation would be taken care of.

In addition to the dummies, I created a few other independent variables. X1 and X2 are max index and max index^2, respectively. These values are used to determine the popularity of a song, which Blackburn argues to be crucial to determining growth rates. Any song that has a max index of 100 has been the most popular song across the genre in a week at any time. I use both the regular and squared values since the distribution of songs in a week often does not follow a linear pattern. The difference in popularity between the highest song and the second-highest song is usually much larger than the difference in popularity between the lowest and second-lowest song.

X3 is max radio spins, the most spins in any given week that a song had in the specified geographic market. This variable provides another way to measure the link between downloads and radio spins, by providing an absolute value on the radio side.

X4 and X5 are the number of rounds of RIAA lawsuits and its square. This value was calculated by finding the number of lawsuits that had been filed as of the week of introduction for a song. I use both the variable and its square because I argue that the shock value of the lawsuits declines as time goes on, which would imply a negative coefficient on the squared variable. The first lawsuit probably had the largest impact, and subsequent lawsuits made less and less of an impression on the public. In fact, when researching the history of the lawsuits, it was difficult to find any articles written about the later lawsuits, since they had become so common.

I chose to use the number of rounds instead of the actual number of lawsuits because I think people pay more attention to the rounds of lawsuits filed so far, which is generally what the press covers, rather than the total number of lawsuits. In addition, the number of lawsuits per round does not show much variation after the first few rounds were filed.

X6 is a dummy variable that is on if the song had no radio play in that market, and off otherwise. By getting a sample of songs that grew popular solely through the internet, I hope to control for any radio influence on the dependent variables I defined. I was inspired by DJ Danger Mouse, an artist who grew popular primarily through the internet. While he eventually was picked up by the radio, the bulk of his advertising and marketing was through the internet. Since it was too difficult and subjective to identify all songs that were similar (and the sample size was fairly small), I chose to use this dummy variable instead.
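The independent variables described above can be sketched as a single row of regressors. This is only an illustration: the function name, the dictionary keys, and the rule that "no radio play" means a maximum of zero spins are assumptions, and the city and song dummies are left out for brevity. The hip-hop dummy is dropped as the baseline, which matches the omitted genre named later in the results.

```python
# Genres listed in the text; hip-hop is treated as the omitted baseline.
GENRES = ["rap/hip-hop", "R&B", "pop", "alternative/rock",
          "world", "country", "techno"]
BASELINE_GENRE = "rap/hip-hop"


def build_regressors(genre, max_index, max_spins, lawsuit_rounds):
    """Return one row of X variables for a song/market observation.

    All key names are hypothetical labels chosen for this sketch.
    """
    row = {}
    # One 0/1 dummy per genre, skipping the baseline to avoid collinearity.
    for g in GENRES:
        if g != BASELINE_GENRE:
            row["genre_" + g] = 1 if genre == g else 0
    row["X1_max_index"] = max_index
    row["X2_max_index_sq"] = max_index ** 2
    row["X3_max_spins"] = max_spins
    row["X4_rounds"] = lawsuit_rounds
    row["X5_rounds_sq"] = lawsuit_rounds ** 2
    # Assumption: "no radio play" is read as zero spins in every week.
    row["X6_no_radio_play"] = 1 if max_spins == 0 else 0
    return row
```

Including both a variable and its square lets the fitted effect flatten out or reverse, which is how the declining shock value of the lawsuits would show up as a negative coefficient on the squared term.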

With these variables defined, there are a few issues involved in generating the data. First, there is a window of analysis problem involved here, as I do have data for songs that either peaked before I started collecting data, or very near the beginning. Since 50% and 80% of the peak are values that I need for my analysis, if those values do not exist, I throw the sheet out entirely. In other words, if the peak downloads is 10000 in week 2, but week 1 had 6000, then I do not use that in my data calculations because the value never dropped below 50% of 10000. Figure 2 shows the cutoff line as a red horizontal line at 500. If that song begins above the cutoff line, then the observation is discarded. The same holds for the 80% calculation. If the peak is 10000 in week 54, but the value in week 55 is 8000, I also throw out the sheet because the value never dropped below 80% of 10000 on the descending side. Here is an example calculation. If the values are truncated and do not exist between the two black vertical lines I have drawn at week 14 and 18, the observation is discarded.
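The discard rule can be expressed as a small check on a weekly series. The sketch below is one reading of that rule, with a hypothetical function name; it assumes "dropped below" means strictly less than the threshold, and it compares integers rather than floats so that a value exactly at 80% of the peak is handled as the text describes.

```python
def is_usable(series, rise_pct=50, fall_pct=80):
    """Decide whether a weekly series can be used in the analysis.

    series: weekly values in chronological order, as observed in the
    data window. The observation is kept only if it starts below
    rise_pct of its peak and later falls below fall_pct of its peak.
    """
    peak = max(series)
    peak_pos = series.index(peak)  # first week that reaches the peak
    # Integer arithmetic: v < pct% of peak  <=>  v * 100 < pct * peak.
    starts_below = series[0] * 100 < rise_pct * peak
    falls_after = any(v * 100 < fall_pct * peak
                      for v in series[peak_pos + 1:])
    return starts_below and falls_after
```

With the numbers from the text, a series that opens at 6000 and peaks at 10000 is rejected, and so is one that peaks at 10000 and ends at 8000, because 8000 is not below 80% of the peak.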

In addition, using 50% of the peak overestimates the actual date where 50% is broken, for two reasons related to the scope of my measurement. First, BigChampagne provides data on a weekly basis, which decreases the level of granularity. If the 50% happened on a Tuesday, it is only reflected for the entire week. Second, in a related problem, I choose the week in which the values break 50% of the peak as the week in which I measure the spins to peak spins ratio. There is another granularity problem here. Again assume the peak is 10000. If I see 4500, and then 6000, I still have to take the week in which it hits 6000 as the 50% week. As such, I cannot reliably extrapolate between the two points since I do not know the functional form, and I have to overestimate the date. A similar problem holds for the 80% measurements, but it is mitigated by the fact that the growth side competes with the decline side, hopefully canceling each other out. Empirically, this effect is most obvious when the 50% week corresponds to the peak week, because the week before the peak week is less than 50% of the actual peak value. That is, assume the peak is 10000, and the week before is 4500. While I would calculate the peak week and 50% week to be the same, this equality obviously should not hold, especially for what I am trying to show.

There are also songs that peak, drop out completely for a week, and then show up again after that week. For the purposes of my analysis, I assume that if the song does not show up in a week at all, both the index and number of downloads is 0. So if the peak is 10000 in week 15, the song drops off the charts in the next week (week 16), then comes back at 9000 in week 17, I measure week 16 as the week it drops off on the right side of the 80% measurement. To illustrate the nuances involved in analyzing the data, I have Table 2, which includes all the data points from the graph above.

In this sample calculation, the week of introduction is 10, the week of downloads peak is 15, and the week of radio spins peak is 16. Therefore, Y1 (the number of weeks between the introduction of a song and its peak) is 15 - 10 = 5. The song breaks 800 (which is 80% of the peak value) in week 14 going up, and breaks 800 in week 18 going down. Therefore, Y2 (number of weeks spent above 80% of peak downloads) is 18 - 14 = 4. The song breaks 500 (50% of the peak value) in week 12, which means that Y3 = 3 / 7 (the number of radio spins in week 12 / the peak number of radio spins in any week). Y4 (number of weeks between peak in downloads and peak in radio spins) is 16 - 15 = 1. Again, positive numbers mean that downloads peaked before radio spins did, and negative means the opposite.

If we were to change week 16 to have 0 downloads, all the above variables would remain the same except Y2. Week 16 would be the week in which the song broke 800 (80% of its peak) going down, so the new value for Y2 would be 16 - 14 = 2. Or if the song started in week 1 instead of 10, and with a value of 800, I would throw the observation out because it started above 500 (50% of its peak).
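The sample calculation can be reproduced with a short sketch. The weekly numbers below are invented for illustration and are not the values from Table 2; they were chosen only so that the peak is 1000 in week 15 and the thresholds are crossed in the weeks named above. The function name, the tie-breaking rule (the first week that reaches a maximum counts as the peak), and the use of "at or above" for breaking a threshold are assumptions.

```python
# Hypothetical weekly series (week -> value), not the thesis data.
EXAMPLE_DOWNLOADS = {10: 100, 11: 300, 12: 550, 13: 700, 14: 850, 15: 1000,
                     16: 950, 17: 900, 18: 700, 19: 400, 20: 200}
EXAMPLE_SPINS = {10: 0, 11: 1, 12: 3, 13: 4, 14: 5, 15: 6,
                 16: 7, 17: 6, 18: 4, 19: 2, 20: 1}


def dependent_variables(downloads, spins):
    """Compute Y1-Y4 for one song/market, or None if it is discarded."""
    weeks = list(range(min(downloads), max(downloads) + 1))
    # A week in which the song does not appear counts as zero.
    d = {w: downloads.get(w, 0) for w in weeks}
    s = {w: spins.get(w, 0) for w in weeks}

    peak = max(d.values())
    peak_week = next(w for w in weeks if d[w] == peak)
    intro_week = weeks[0]

    # Discard if the song starts at or above 50% of its peak.
    if d[intro_week] * 100 >= 50 * peak:
        return None
    # Discard if it never falls below 80% of its peak after peaking.
    fall80 = next((w for w in weeks
                   if w > peak_week and d[w] * 100 < 80 * peak), None)
    if fall80 is None:
        return None

    rise80 = next(w for w in weeks if d[w] * 100 >= 80 * peak)
    rise50 = next(w for w in weeks if d[w] * 100 >= 50 * peak)

    max_spins = max(s.values())
    if max_spins == 0:
        y3 = y4 = None  # no radio play: treated as missing
    else:
        spin_peak_week = next(w for w in weeks if s[w] == max_spins)
        y3 = s[rise50] / max_spins
        y4 = spin_peak_week - peak_week
    return {"Y1": peak_week - intro_week, "Y2": fall80 - rise80,
            "Y3": y3, "Y4": y4}
```

Calling `dependent_variables(EXAMPLE_DOWNLOADS, EXAMPLE_SPINS)` gives Y1 = 5, Y2 = 4, Y3 = 3/7, and Y4 = 1. Removing week 16 from the downloads series, so that it counts as zero, changes only Y2, which becomes 2.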

With these variables chosen, I analyzed each artist-song combination in a specific market, for example Britney Spears' Toxic in Los Angeles. I first generated the artist and songs by taking all the artists and songs that showed up in weeks 1, 30, and 55. I felt that since there is obviously a huge amount of redundancy between weeks (an artist popular in one week will almost surely show up in the next week), weeks 1, 30, and 55 would have a proper coverage of the data that I had collected, without having to check every week.

After selecting these artists and songs, I searched across all 55 weeks of my data to generate every instance where that artist/song combination showed up in a specific market. For a more technical description of how I collected this information, please see Appendix A. I then chose to condense each of those searches to one data point.

Empirically, the problems with the data I noted above tend to show up more in the data for downloads than for spins. One possible explanation is the way this data is gathered. Not only have radio numbers been reported much longer than downloads, but they are also clear in their definitions (a spin is a spin) and do not rely on a formula that calculates a number from multiple variables. As such, the methodology is more developed. Another is radio consolidation. Dicola and Thomson 2002 argue that ten parent companies control two-thirds of both revenues and listeners nationwide, with Clear Channel and Viacom controlling 42 percent of listeners and 45 percent of industry revenues. This kind of consolidation leads to more controlled growth, as a handful of people can influence the spins for particular songs, so a clearer peak and less noisy data are seen. Considering the other effects is beyond the scope of this paper; please see Dicola and Thomson for a more detailed discussion.

Table 3 has some of the summary statistics for the variables I defined. The values reported are averaged across songs, not across observations. Since not every song was evenly represented in the sample, I had to weight all the observations by the number of times that song was present in the sample. So to find the mean, for example, I first took the mean within a song, and then took the mean of all the means across songs. The rest of the values were similarly calculated for all 101 songs in the sample.

The * indicates a special calculation for these variables. There is a truncation problem here, as my dataset includes songs that were not just introduced after week 1. Since I have songs that were present prior to week 1, it is not accurate to just find the mean across all observations. Instead, I chose to calculate the mean by throwing out all observations in which the song was introduced in week 1, because it is impossible to tell whether it is because the song actually debuted in week 1 or it is a limitation of my data set.

Again, one has to keep in mind that these statistics are generated treating all songs evenly. The reason why min and max sometimes have decimal values is that the observations were averaged by songs. So for example if a song had the values of 1, 1, 1, 2, and 2 for the week of introduction, the average would be 1.4.
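The song-level averaging can be sketched as a mean of per-song means. The function name and the (song, value) input format are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def song_weighted_mean(observations):
    """Average a variable so that every song counts once.

    observations: iterable of (song, value) pairs, one per observation.
    Returns (mean of the per-song means, dict of per-song means).
    """
    by_song = defaultdict(list)
    for song, value in observations:
        by_song[song].append(value)
    per_song = {song: mean(values) for song, values in by_song.items()}
    return mean(per_song.values()), per_song
```

For a song observed five times with values 1, 1, 1, 2, and 2, the per-song mean is 1.4, which is why the reported minimum and maximum can be fractional. A song with many observations then carries no more weight in the overall mean than a song observed once.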

There are a few values in the above table worth noting. First, while there is a large variation, the number of weeks between peak in downloads/index and peak in radio spins is negative, which means that songs in general peak in the radio market before the downloads market. Also, the 80% peak variable is higher for downloads than index, which means that downloads seem to change less quickly than index, with more staying power between weeks.

One variable that does not provide much information is the ratio of spins to peak spins variable. I initially defined this variable so that I could see how quickly spins took off in relation to downloads, whether the growth rate was faster. Unfortunately, if the song peaked in the radio market before the downloads market, the ratio does not measure the growth rate. The problem is that this variable assumes the peaks happen at the same time—if they happen at different times, it is not clear what the definition is. In addition, the granularity issues above are also difficult to overcome, so this variable was not included in my analysis.

I did generate a few summary statistics for those observations in which the peaks happen in the same week. There were 40 observations in which the week of the index peak is equal to the week of the radio spins peak, and 30 for the downloads peak. The average of the ratio of spins to peak spins for the 40 observations with the same index peak is .728, with a standard deviation of .312. For the downloads peak the average is .693 with a standard deviation of .340. These results seem to indicate a tentative conclusion that radio spins grow more quickly than downloads (if they were equal you would expect the ratio to be 0.5), but the small sample size and granularity issues caution against a definitive answer.

The entire data set contains 589 observations. Not all fields are complete, however. Some of the observations never had a radio spin at all, and so the value for Y4 and Y8 (# weeks between index/download peak and radio spins peak) was coded as a missing value there. The related variables were affected the same way.

To take care of the truncation problem mentioned above, I ran a truncated regression on Y1 (the number of weeks between the introduction of a song and its peak in downloads) with all the X variables defined above. The upper limit for the truncated regression was the week of the peak in downloads - 1, since Y1 can never go above that value. The - 1 part is to account for the fact that the week number starts at 1, not 0.
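The thesis does not say which software or estimator was used. As a rough illustration, assuming the standard truncated normal regression model, each observation contributes the normal density rescaled by the probability mass below its upper limit (here, the week of the peak minus 1). The function name and arguments below are hypothetical.

```python
from math import log
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def truncated_loglik(y, xb, sigma, upper):
    """Log-likelihood of one observation under upper truncation.

    y: observed dependent variable (for example Y1)
    xb: fitted linear predictor for this observation
    sigma: standard deviation of the error term
    upper: largest value y could take for this observation
    """
    z = (y - xb) / sigma
    density = _STD_NORMAL.pdf(z) / sigma
    # Probability that an untruncated draw would fall below the limit.
    mass_below_limit = _STD_NORMAL.cdf((upper - xb) / sigma)
    return log(density) - log(mass_below_limit)
```

Summing this quantity over all observations and maximizing over the coefficients and sigma would give the estimates. When the limit is far above the data, the correction term vanishes and the expression reduces to the ordinary normal log-likelihood.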

This first regression ended up with a constant factor of over 56, with the most significant variables being the rounds of RIAA lawsuits. But the economic interpretation of those variables is difficult, since the measurements are all relative ones. That is, an additional round of lawsuits reducing the number of weeks it takes a song to peak does not really provide much economic insight, especially if we do not take into account what the actual peak is.

I therefore dropped those values from the regression and re-calculated it, again with the same upper limit. Table 4 provides the results (each table is labeled with the dependent variable that was regressed on).

The variable for no radio play shows a fundamental difference between songs that get popular through the internet and songs that get popular through traditional radio venues, although it is not quite statistically significant. A song with no radio play hits its peak more than three weeks in advance of a song with radio play, even controlling for index. That implies a shorter growth period for internet songs, which helps validate the hypothesis that the internet provides its informational value much more quickly than traditional radio. To further bolster this hypothesis, the Max Radio Spins variable has a positive, statistically significant coefficient, which implies that the more a song is played on the radio, the longer it takes to hit its peak online popularity. One explanation is that the cost of providing information to someone else via the internet is so low that almost everyone can take part in it, while in the radio market only select people have influence.

The genre variables are also suggestive, as R&B, Pop, and World songs seem to take the longest to peak, while more niche markets like Alternative, Country, and Hip-Hop (the omitted genre dummy) peak much more quickly. In fact, the Pop genre dummy indicates that a pop song takes over twelve weeks longer than a hip-hop song to reach its peak. This effect can be attributed to two factors. First, pop music is more likely to hit the markets sooner, as it is more easily found and shared before most other music. Second, niche market music has a smaller audience, which means its digital life probably is shorter, which has an effect on the growth rate. The turnover rate in these niche markets probably is higher, which means songs come and go much more quickly. This conclusion might seem counterintuitive, as many believe that Pop music is much more likely to have one-hit wonders that disappear, with no real staying power. However, the Pop genre by definition implies that there is some universal appeal, so those songs are more likely to propagate throughout the network.

In addition, there is a negative value on Max Index, and although it is not statistically significant, it suggests that the more popular a song is, the quicker it hits its peak. The most popular songs rocket to the top, rather than a proportional growth rate applying to all types of songs. If the song had a Max Index of 100 (the most popular song in any given week), it would peak 8 weeks earlier than a song that never broke the charts. The positive coefficient on Max Index^2 shows that the effect is bounded at a certain point, which was what my model originally predicted.

The constant factor predicts that a hip-hop song with only minimal radio play and P2P activity takes a little more than three weeks to hit its peak online popularity.

Also, some of the city dummies were statistically significant at the 5% level.

Unfortunately these values do not seem to show much. Albuquerque and New Orleans are significantly larger than Albany, while Chattanooga is about the same size, and Gainesville is significantly smaller by population. I think geographic locality and number of people connected to the internet might account for a larger difference, rather than city size.

I then did the same regression for peak downloads instead of index, and generated Table 6.

Table 6 shows an interesting switch from the previous regression. While the variables are still not statistically significant, the signs on Max Index and Max Index squared have now flipped, showing that the more a song is downloaded, the longer it takes to hit a peak. This conclusion shows the difference between relative and absolute measurements, in that more popular songs race to the peak more quickly, but those with real staying power in the form of downloads take longer. In addition, not having any radio play reduces the number of weeks by more than five now, showing the same negative relationship between the two.

The genre dummies have different coefficients here, although they show the same general ranking. This behavior is expected, since one component of index is downloads, which means they are correlated. The different coefficients are probably the result of more variation in downloads, which is usually measured in 1000s, not 10s as index is.

The constant factor is significantly larger here than the previous regression, more than double the previous constant. Some of this difference is reflected in the summary statistics table, which showed that songs on average took two weeks more to hit their downloads peak after hitting their index peak. The other portion of the difference can probably be attributed to the increased negative coefficient on No Radio Play.

Some of the city dummies were significant in this regression as well, but again no clear pattern could be found. This trend continued for the rest of the regressions, so I do not report any of the city dummy coefficients after Table 7.

I next ran two more truncated regressions, this time on Y2 (the number of weeks a song spent above 80% of its peak downloads) and Y6 (the number of weeks a song spent above 80% of its peak index). Instead of an upper limit, I imposed a lower limit of 1, since the granularity problem meant that no matter what the values were, the value of Y2 and Y6 had to be at least 1. Table 8 provides the results.

Some of the genre dummies were omitted in the truncated regression, since the sample size was small. This regression proves to be one of the more interesting ones, since the values in the table above are all statistically significant at the 5% level.

The coefficients on Max Index and Max Index Squared are again as expected. The more popular a song is online, the longer it spends above 80% of its peak. In other words, popular songs remain popular for a longer amount of time, while unpopular songs tend to die out more quickly. This observation seems analogous to a network effect, similar to the sampling hypothesis stated before. The more people listen to a song, the more other people know about that song and therefore listen to it, leading to a positive reinforcement cycle. The negative coefficient on Max Index Squared follows the prediction that the effect is bounded.

The RIAA lawsuits variables show the expected signs as well. Each additional round causes the songs to remain popular for a shorter amount of time, but it is bounded as well, so the effect is decreasing as more rounds of lawsuits are filed. In fact, if we plug in the value of 17 for both RIAA Lawsuit variables (17 rounds of lawsuits have been filed so far) and multiply by their coefficients, we get (17 * -.57246 + 17^2 * .0264 ) = .361, which implies that the usefulness of the lawsuits has come to an end. While I obviously do not believe that more RIAA lawsuits will increase the amount of time a song spends above 80% of its peak, the coefficients suggest that most of the shock value has been lost, and its effectiveness at preventing downloads has more or less waned.

The radio variables also have interesting interpretations. The negative coefficient on Max Radio Spins implies that the more a song is heard on the radio, the smaller its base of popularity is on P2P networks, which could imply a negative relationship between radio and downloads. This relationship could reveal a fundamental difference between songs that get popular by radio, and songs that get popular by downloads. Or it could mean that people have less incentive to download a song if they already hear it often enough on the radio. However, the dummy for No Radio Play seems to indicate that online songs have a larger growth base in general, as songs that are not played on the radio at all spend a week more above 80% of their peak than songs that are played on the radio.

Indeed, if as stated above the internet provides informational value more quickly than traditional radio, then it could definitely hold that the songs also stay popular longer. The decreased cost of providing information online also allows many users to help an artist build a base of support, causing the song to remain popular longer. Instead of relying on radio DJs, artists who rely on the public through viral marketing can find that their songs have longer staying power.

Table 9 provides the results for downloads instead of index.

The Max Index variables have the same signs as for the previous regression, as do the RIAA Lawsuits variables, but the interesting variable here is Max Radio Spins, which now has a positive value instead of a negative value. These two regressions taken together seem to imply that radio spins help drive and maintain downloads, but not enough to sustain popularity. So the absolute effect is a positive one, in that more radio spins leads to more downloads, but the relative effect is that these songs do not stay as popular as long.

Finally, I ran two ordinary least squares regressions on Y4 (the number of weeks between index peak and radio spins peak) and Y8 (the number of weeks between downloads peak and radio spins peak). I took out the RIAA lawsuit variables for the same reasons I did not include them in the Y1 and Y5 regressions. It is difficult to provide an economic explanation for why RIAA lawsuits would affect the difference between the peak in radio spins and peak in downloads, since it is a relative measure. All the observations in which there were no radio plays were also dropped, given that there is no clearly defined radio spins peak as a result. In addition, the No Radio Play variable was also excluded because by its construction (it is on only for those that have no radio peak) it has a value of 0 for every single one of the observations used in these two regressions. With 31 observations like that, the total number of observations used was 589 - 31 = 558.

As a reminder, a positive value for Y4 means that the song peaked first in the digital market before peaking in the radio market, and a negative value means the opposite. I was initially most hopeful about this variable, as it seems to most directly measure the lag effects between the digital and radio markets. However, most of the variables are not statistically significant, even at the ten percent level. In general, the more popular a song gets online, the more likely it is to peak online before it peaks in the radio market. Conversely, the more radio spins a song gets, the more likely it is to peak in the radio market before the digital market. The genre dummies could provide some clue as to what kinds of songs get popular online, but the large standard errors preclude any real conclusions. Indeed, the large constant hides most of the possible variation.

The table for Y8 (# weeks between peak downloads and peak radio) contains more of the same data. Again, most of the variables are not statistically significant, and there is an even larger constant in this regression.

VIII. Conclusions

This paper has established that a negative relationship exists between downloads and radio spins. In general, internet-only songs peak more quickly and have wider bases of support. However, more traditional songs that are both played on the radio and downloaded generally peak in the radio market before the downloads market.

The importance of this issue means it will not be going away any time soon. In fact, just as I was finishing my thesis, I noticed an announcement from BigChampagne.

Nielsen Entertainment, the leading provider of Actionable Entertainment Intelligence (AEI) and BigChampagne Online Media Measurement, have entered a strategic relationship to link airplay monitoring and Peer to Peer download data, it was announced today. This relationship will link Nielsen Entertainment's BDSRadio.com, to BigChampagne's Peer to Peer (P2P) charts, combining radio airplay spin data with Top Swaps. This combined analysis of Nielsen Entertainment's BDSradio.com resources; digital, terrestrial, and satellite airplay data and BigChampagne's P2P measurement, will provide both the radio and record industries with a unique matrix of music consumption measurement.

BigChampagne is combining with Nielsen Entertainment to provide precisely the type of information that I have been tracking for the last year or so. Radio industry executives already use BigChampagne's information to help determine their play lists, as Howe 2003 shows, and more data mining is not far off. The P2P market has become the music industry's biggest focus group, and while its effect on CD sales is inconclusive, its use as a valuable research tool is undeniable.

In general, the relationship between radio and P2P activity is an interesting issue because no one believes that radio plays are big substitutes for downloads or vice versa. Even with the advent of MP3 players that can play in the car, like the iTrip add-on to Apple's iPod, they are still considered mostly complements. The informational value outweighs any substitution effect, leading to a mostly positive relationship between the two.

Future research can exploit this information, using sites like Audioscrobbler, last.fm, and upto11.net to track P2P activity and usage. This information could be combined with data from companies like BigChampagne—for example, the most played files listed on freedb could provide a good proxy for popularity online.

Other research in this area could focus more on the genre analysis. BigChampagne tracks multiple categories, of which I only analyzed hit-mainstream. Finding the growth rates for genres like Disney songs or adult contemporary could provide some insight into how songs in those genres should be advertised, whether a viral marketing strategy is viable.

In order to more clearly illustrate the link between radio and P2P networks, one could also look at markets where radio stations close down. Is there a spike in downloads after a radio station is gone, as people rush to find alternate sources for the songs they want to hear? Is there a corresponding dip when a radio station starts broadcasting in a certain format, and how long does it take?

Another interesting issue that I feel has not been explored in detail is the lag between when a song is available online and when it is available in stores. For example, Eminem's CD Encore was released on November 16, 2004 but had already been ripped by November 3, 2004. In the two weeks between, the only place a consumer could find that song was either the radio or the downloads market, not the CD store. With daily download statistics, seeing how the general release of a CD affects the markets could have some interesting conclusions.

The P2P market is not going away any time soon. Biddle et al. 2002 coined the term darknet to describe a collection of networks and technologies used to share and distribute digital content. They conclude that darknets will always exist in some form, regardless of what legal actions corporations take. Instead of vast public networks like KaZaA or Gnutella, users will start congregating in private servers, backed by technology that is designed to allow small groups of friends to share files. In the past, programs like Hotline and FTP servers have provided this functionality, allowing for user verification and limiting access to only certain groups of people.

Currently, I see a few trends in P2P networks. First, more efficient protocols like BitTorrent are coming along that provide features like data integrity and distribution of bandwidth. These new programs will successively eliminate many of the problems of previous P2P programs, like search costs, bandwidth distribution, free riders, etc.

Second, P2P networks will grow both big and small. Bigger public networks like KaZaA will scale reliably into the millions of users, allowing a huge number of files to be shared as hard drive sizes increase exponentially and bandwidth becomes ever cheaper. But smaller private networks that allow groups of friends to exchange files privately will also be developed. Examples include Direct Connect and WASTE, a file-sharing program developed by Nullsoft, the company who made Gnutella. WASTE only allows between 10 and 50 users concurrently, a far cry from the 3 million users on KaZaA.

Yet while these networks grow in both directions, they will be linked by anonymity, privacy, and decentralization of servers. Use of proxies to disguise traffic, multiple points of failure to prevent targeted attacks (both legal and technical), port switching, and various other technical tools will be employed to help achieve these goals.

The music industry must adapt or risk being swamped by technical progress. Technology cannot be held back once developed, and trying to stamp it out through litigation is impossible.

=================

Acknowledgements:

I would like to thank my thesis advisor Professor Timothy Bresnahan for his advice and guidance through this entire project, my advisor Alex Gould for his insight on the radio industry, Adam Toll at BigChampagne for the invaluable access to the data, and Winnie Chen for analyzing the vast majority of my dataset. I'm also grateful to Andrew Lee and the rest of my friends for their last-minute proofreading help. All remaining mistakes are of course my responsibility.

Waynn Lue
Advisor: Timothy Bresnahan

Appendix A: Data Collection

The data collection portion of my thesis is detailed enough to warrant a discussion in and of itself.

After collecting 55 weeks' worth of data, I tried building my data set manually before realizing that it would take too much time. I automated the process by writing a few shell scripts that, given the artist and song combinations, ran grep on a specific text pattern to generate a data set recording every time a song showed up in a market.
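
As a rough illustration of this search step, the same idea can be sketched in Python instead of shell. This is not the original script: the directory layout (one folder per week, one CSV file per city), the function name find_song_rows, and the case-insensitive substring match are all assumptions made for the example.

```python
from pathlib import Path


def find_song_rows(root, artist, song):
    """Return (week, filename, line) for every chart line naming the artist and song.

    Assumes a hypothetical layout root/<week>/<city>.csv. The real files
    were also grouped by genre, which is left out here for brevity.
    """
    artist, song = artist.lower(), song.lower()
    hits = []
    for path in sorted(Path(root).glob("*/*.csv")):
        week = path.parent.name  # the folder name stands in for the week number
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                text = line.lower()
                # Keep the line only if both the artist and the song appear in it.
                if artist in text and song in text:
                    hits.append((week, path.name, line.rstrip("\n")))
    return hits
```

Each hit keeps the week and the filename alongside the matching line, which mirrors what a recursive grep prints.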

Once these grep/findstr queries had completed, I had to delete extraneous information from each file. The files were organized by genre, then by week, and within each week's folder each of the 100 cities had its own spreadsheet. Grep would return a line that began with the directory name (the week number), which I wanted, followed by the filename, which I did not.

With that in mind, I used a Perl script to delete the filename from every line in all the files, adding a comma separator between the week and the rest of the columns in order to keep the files in CSV format. I then used a tcsh shell script to add the column headings (Week, TW, LW, etc.) to every file. Once those had run, I used Excel's R1C1 reference style to generate SUM and MAX formulas that would load automatically on each spreadsheet, instead of writing them in by hand every time.
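
The cleanup pass can be sketched the same way. The version below is a Python stand-in for the Perl and tcsh scripts, not a reproduction of them. It assumes grep printed each hit as "week/filename:line", and the names GREP_LINE and clean_grep_output are hypothetical. The header is passed in by the caller because only the first few column names (Week, TW, LW) are given above.

```python
import re

# Matches "<week>/<file>:<rest>" as printed by a recursive grep (assumed format).
# A backslash is also accepted as the separator, as findstr would print on Windows.
GREP_LINE = re.compile(r"^(?P<week>[^/\\:]+)[/\\](?P<file>[^:]+):(?P<rest>.*)$")


def clean_grep_output(lines, header):
    """Drop the filename from each grep hit and prepend a CSV header row."""
    out = [",".join(header)]
    for line in lines:
        m = GREP_LINE.match(line.rstrip("\n"))
        if m is None:
            continue  # skip lines that do not look like grep hits
        # Keep the week, discard the filename, and join with a comma for CSV.
        out.append(f"{m.group('week')},{m.group('rest')}")
    return out
```

For example, clean_grep_output(["week01/Boston.csv:1,2"], ["Week", "TW", "LW"]) yields the header row followed by "week01,1,2".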

Once that was complete, I had a set of folders that had every artist/song combination, and within each folder every market in which that song had shown up. From there I looked at each of the files and started gathering the variables defined previously.

Sources Cited

Adar, Eytan and Bernardo A Huberman. Free Riding on Gnutella, 2000. http://www.firstmonday.org/issues/issue5_10/adar/index.html, (accessed 4-4-05).

Audioscrobbler. http://www.audioscrobbler.com (accessed 5-1-05).

Barker, J. Cam. Grossly Excessive Penalties in the Battle Against Illegal File-Sharing: The Troubling Effects of Aggregating Minimum Statutory Damages for Copyright Infringement, Texas Law Review, Vol. 83, No. 2 83 Texas L. Rev. 525, 2004. Available online at http://ssrn.com/abstract=660601 (accessed 5-7-05).

Bellis, Mary. The History of MP3: Fraunhofer Gesellschaft and MP3, http://inventors.about.com/od/mstartinventions/a/MPThree.htm (accessed 4-3-05).

Biddle, Peter, Paul England, Marcus Peinado, and Bryan Willman. The Darknet and the Future of Content Distribution, Microsoft Corporation Abstract, 2002 ACM Workshop on Digital Rights Management, November 18, 2002. Available online at http://crypto.stanford.edu/DRM2002/darknet5.doc (accessed 4-21-05).

Blackburn, David. On-line Piracy and Recorded Music Sales, Job Market Paper. December 30, 2004. Available online at http://www.economics.harvard.edu/~dblackbu/papers/blackburn_fs.pdf (accessed 4-20-05).

Boehlert, Eric. Pay to Play, Salon. http://dir.salon.com/ent/feature/2001/03/14/payola/index.html, March 14, 2001 (accessed 4-17-05).

Borland, John. MP3 losing steam? CNET News.com. http://news.com.com/MP3+losing+steam/2100-1027_3-5409604.html?tag=nl, October 15, 2004a (accessed 4-28-05).

Borland, John. MPAA touts lawsuits, new P2P-fighting software, CNET News.com. http://news.com.com/MPAA+touts+lawsuits%2C+new+P2P-fighting+software/2100-1025_3-5454939.html?tag=nl, November 16, 2004b (accessed 4-27-05).

CacheLogic, online presentation, The True Picture of Peer-to-Peer Filesharing, http://www.cachelogic.com/press/CacheLogic_Press_and_Analyst_Presentation_July2004.pdf, July 2004. (accessed 5-1-05).

Cho, Ann. An Empirical Study of the Effect of Downloads on CD Album Sales, Stanford Economics Honors Thesis, June 2004.

Cohen, Bram. Incentives Build Robustness in BitTorrent, 05-22-2003. http://www.bittorrent.com/bittorrentecon.pdf (accessed 4-4-05).

DiCola, Peter and Kristin Thomson. Radio Deregulation: Has It Served Citizens and Musicians? A Report on the Effects of Radio Ownership Consolidation following the 1996 Telecommunications Act, Future of Music Coalition, 11-18-2002. Available online at http://www.futureofmusic.org/images/FMCradiostudy.pdf (accessed 4-23-05).

Donaldson-Evans, Catherine. Network TV's Case of the Missing Men, Fox News, 11-21-03.

The Economist. Music's Bright Future, October 28, 2004. Available online at http://www.economist.com/displaystory.cfm?story_id=S%27%29%28%20%2ERA%3F%25%23P%20T%0A (accessed 4-26-05, subscription required).

Einav, Liran. Gross Seasonality and Underlying Seasonality: Evidence from the U.S. Motion Picture Industry, SIEPR Discussion Paper No. 02-36, forthcoming in RAND Journal of Economics, 2003. Available online at http://www.stanford.edu/~leinav/Seasonality.pdf (accessed 4-21-05).

Geist, Michael. Piercing the peer-to-peer myths: An examination of the Canadian experience, First Monday, volume 10, number 4, April 2005. Available online at http://firstmonday.org/issues/issue10_4/geist/index.html (accessed 4-26-05).

Golle, Philippe, Kevin Leyton-Brown, and Ilya Mironov. Incentives for Sharing in Peer-to-Peer Networks, Proceedings of the 3rd ACM conference on Electronic Commerce, 264-267, 2001. Available online at http://research.microsoft.com/users/mironov/papers/p2p.pdf (accessed 4-30-05).

Gopal, Ram D., Sudip Bhattacharjee, and G. Lawrence Sanders. Do Artists Benefit from Online Music Sharing? Journal of Business, Forthcoming http://ssrn.com/abstract=527324 (accessed 5-1-05).

Griliches, Zvi. Hybrid Corn: An Exploration in the Economies of Technological Change, Econometrica, Vol. 25, No. 4, pp. 501-522, October 1957.

Hirshleifer, Jack. Suppression of Inventions, The Journal of Political Economy, March 1971, Vol. 79, No. 2, 382-83.

Howe, Jeff. BigChampagne is Watching You, Wired, Issue 11.10, October 2003. Available online at http://www.wired.com/wired/archive/11.10/fileshare_pr.html (accessed 5-9-05).

Karagiannis, Thomas et al. Is P2P dying or just hiding? Globecom, 2004. Available online at http://www.caida.org/outreach/papers/2004/p2p-dying/p2p-dying.pdf (accessed 5-1-05).

Katunich, Lauren J. Time to quit paying the payola piper: why music industry abuse demands a complete system overhaul, Loyola Law School Entertainment Law Review, Volume 22, #3, 2002. Available online at http://elr.lls.edu/issues/v22-issue3/katunich.pdf (accessed 4-18-05).

Last.Fm. http://www.last.fm (accessed 5-1-05).

Liebowitz, Stan J. Pitfalls in Measuring the Impact of File-sharing. CESifo Economic Studies, July 2005. Available online at http://www.utdallas.edu/~liebowit/intprop/pitfalls.pdf (accessed 5-1-05).

Liebowitz, Stan J. Will MP3 downloads Annihilate the Record Industry? The Evidence so Far, Advances in the Study of Entrepreneurship, Innovation, and Economic Growth, V. 15, pp. 229-260, 2004. Available online at http://www.utdallas.edu/~liebowit/intprop/records.pdf (accessed 5-1-05).

Maguire, James. The Chaotic History of Music File Sharing, http://top40.about.com/library/weekly/aamp3historyp2p.htm (accessed 3-31-05).

McManus, Sean. A short history of file sharing, http://www.sean.co.uk/a/musicjournalism/var/historyoffilesharing.shtm (accessed 3-31-05).

Napster, Wikipedia. http://en.wikipedia.org/wiki/Napster (accessed 3-31-05).

Madden, Mary and Lenhart, Amanda. Sharp decline in music file swappers: Data memo from PIP and comScore Media Metrix, Pew Internet & American Life Project, http://www.pewinternet.org/report_display.asp?r=109, 1-4-04 (accessed 4-24-05).

Minar, Nelson. Distributed Systems Topologies, http://www.openp2p.com/pub/a/p2p/2001/12/14/topologies_one.html, 12-14-01 (accessed 4-4-05).

Peitz, Martin and Waelbroeck, Patrick. File-Sharing, Sampling, and Music Distribution, International University in Germany Working Paper No. 26/2004, December 2004. Available online at http://ssrn.com/abstract=652743 (accessed 5-7-05).

PyMusique. http://fuware.nanocrew.net/pymusique/ (accessed 5-1-05).

RIAA, 2003 Consumer Profile. http://www.riaa.com/news/marketingdata/pdf/2003consumerprofile.pdf (accessed 4-5-05).

Rob, Rafael and Waldfogel, Joel. Piracy On The High C's: Music Downloading, Sales Displacement, And Social Welfare In A Sample Of College Students, NBER Working Paper No. 10874, October 2004. Available online at http://www.nber.org/papers/w10874 (accessed 4-21-05).

RollingStone. http://www.rollingstone.com/ (accessed 5-7-05).

Saroiu, Stefan, Krishna P. Gummadi, and Steven D. Gribble. Measuring and analyzing the characteristics of Napster and Gnutella hosts, Multimedia Systems 9: 170-184, 2003. Online at http://www.cs.washington.edu/homes/gribble/papers/msj.pdf (accessed 4-4-05).

Shirky, Clay. In Praise of Freeloaders, 12-01-2000. http://www.openp2p.com/pub/a/p2p/2000/12/01/shirky_freeloading.html (accessed 4-4-05).

Tanaka, Tatsuo. Does file sharing reduce music CD sales?: A case of Japan, Institute of Innovation Research Working Paper WP #05-08, Conference on IT Innovation, December 13, 2004.

UpTo11, http://www.upto11.net (accessed 5-1-05).

Vorbis, http://www.vorbis.com (accessed 5-1-05).

WASTE, http://waste.sourceforge.net/ (accessed 5-9-05).

Weisstein, Eric W. Ogive. From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/Ogive.html (accessed 5-5-05).

Zentner, Alejandro. Measuring the Effect of Music Downloads on Music Purchases. Manuscript, April 2004. Available online at http://home.uchicago.edu/~alezentn/musicindustrynew.pdf (accessed 4-21-05).
