The Internet has been around for more than a couple of decades now, and over that time a number of services have been carefully archiving it. One of the most popular services for browsing the yesteryears of the World Wide Web is the Wayback Machine. It has archived more than 445 billion web pages, yet, oddly, it has never published an inventory of the websites it archives or of the algorithms it uses to decide what to capture and when.
Now that the Internet is old enough to be a subject of institutional research, these archives matter more than ever. Even with 445 billion web pages stored on the Wayback Machine, there are still plenty of gaps. For instance, the BBC's archive starts in 1996, but properly aligned images appear only from 2012 onward. The website on which the Wayback Machine publishes its stored pages also works a little differently: it lists only pages from the top 1 million websites in 70 major countries, as ranked by Alexa.
“The WayBack Machine is used by hundreds of thousands of people every day, presenting snapshots, back in time, from more than 1.5 billion websites,” says Mark Graham, director of the Wayback Machine.
SOLUTION TO ERROR PAGES
Another feature of the Wayback Machine is its Chrome extension, which recognizes when you hit a 404 or another error page while browsing your favorite sites. It then checks whether an archived version of that page exists. So whether a page has been suspiciously removed from the Internet or a site has simply decayed past the point of working, the Wayback Machine can often supply an archived copy for you to investigate. In simpler terms, it is a way of fighting link rot.
The Internet Archive has a much nobler ambition for this new product, though. According to reports, almost 83% of information documents from the Obama administration and 49% of all Supreme Court records are missing from the Internet. This is the problem the Wayback Machine aims to solve. Link rot is a growing concern, and online archives are vital for preserving a vast amount of important data.
In an interview with Entrepreneur Magazine, director Mark Graham shared a striking example of how the service has been used.
“On July 17, 2014, Igor (Strelkov) Girkin, a Ukrainian separatist leader, claimed responsibility online for the downing of what he thought was a Ukrainian military transport plane near the rebel-held Ukrainian city of Donetsk. When reports [emerged] that Malaysian Airlines Flight MH17, with 295 passengers, had been shot down in the same area, his post was removed. But not before it had been preserved several times by the Wayback Machine, where it is available today.”
USP AND THE FUTURE
The Wayback Machine's biggest distinguishing feature is the way it crawls billions of web pages to capture snapshots. Its inventory of more than half a trillion web captures is not the result of a single continuous crawl but of millions of separate crawls, defined by thousands of people over the years. The organization aims to build the ultimate database of the entire Internet, permanently available to anyone curious enough to want access.
In practice, you can use the Wayback Machine both to view archived or cached web pages and to save a web page as proof that it appeared on the Internet by a particular date.
WAYBACK MACHINE CHROME EXTENSION
The Wayback Machine has released an excellent browser extension that helps you get past annoying 404 pages. The extension detects error codes 404, 408, 410, 451, 500, 502, 503, 504, 509, 520, 521, 523, 524, 525, and 526 and offers to display the archived version of the page.
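The article does not say how the extension is built. As a rough illustration of the same idea, here is a minimal Python sketch, assuming the Internet Archive's public Availability API at `https://archive.org/wayback/available`, which returns JSON whose `archived_snapshots.closest` entry points to the nearest snapshot. The function names are hypothetical; only the status-code list comes from the paragraph above.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Error codes the article lists for the Wayback Machine Chrome extension.
ERROR_CODES = frozenset({404, 408, 410, 451, 500, 502, 503, 504, 509,
                         520, 521, 523, 524, 525, 526})

# Assumption: the Internet Archive's public Availability API endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def should_offer_archive(status_code: int) -> bool:
    """Return True if the HTTP status is one the extension reacts to."""
    return status_code in ERROR_CODES


def availability_url(page_url: str) -> str:
    """Build the Availability API query URL for a page (URL-encoded)."""
    return f"{AVAILABILITY_ENDPOINT}?{urlencode({'url': page_url})}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Extract the closest available snapshot URL from an API response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_archived_copy(page_url: str, status_code: int) -> Optional[str]:
    """Hypothetical helper: query the archive only when the live page failed."""
    if not should_offer_archive(status_code):
        return None  # page loaded fine, nothing to look up
    with urlopen(availability_url(page_url), timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

For example, `find_archived_copy("example.com", 404)` would make one network request and return a `web.archive.org` snapshot URL if one exists, while a `200` response skips the lookup entirely. Keeping the parsing in its own function means the logic can be checked without touching the network.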