Skip to main content

Home/ Future of the Web/ Group items tagged data

Rss Feed Group items tagged

Paul Merrell

Facebook's Marketplace Faces Antitrust Probes in EU, U.K. - WSJ - 1 views

  • The European Union and the U.K. opened formal antitrust investigations into Facebook Inc.’s FB -0.86% classified-ads service Marketplace, ramping up regulatory scrutiny for the company in Europe. Both the European Commission—the EU’s top antitrust enforcer—and the U.K.’s Competition and Markets Authority said Friday they are investigating whether Facebook repurposes data it gathers from advertisers who buy ads in order to give illegal advantages to its own services, including its Marketplace online flea market. The U.K. added that it is also investigating whether Facebook uses advertiser data to give similar advantages to its online-dating service. The two competition watchdogs said they would coordinate their investigations.
  • Separately on Friday, Germany’s competition regulator announced that it is opening an investigation into Google’s News Showcase, in which the tech company pays to license certain content from news publishers. That probe, which is based on new powers Germany had granted the regulator, will look among other things at whether Google is imposing unfair conditions on publishers and how it selects participants, the Federal Cartel Office said.
  • The three newly opened cases are part of a new wave of antitrust enforcement in Europe. The European Commission filed formal charges last month against Apple Inc. for allegedly abusing its control over the distribution of music-streaming apps, including Spotify Technology SA . In November, it filed formal charges against Amazon.com Inc. for allegedly using nonpublic data it gathers from third-party sellers to unfairly compete against them. Both companies denied wrongdoing. At the same time, the U.K.’s CMA has opened investigations into Google’s announcement that it will retire third-party cookies, a technology advertisers use to track web users, and whether Apple imposes anticompetitive conditions on some app developers, including the use of Apple’s in-app payment system, which is also the subject of a lawsuit in the U.S. In the EU, the European Commission has been investigating Facebook for more than a year on multiple fronts. Facebook and the Commission have squabbled over access to internal documents as part of those investigations.
  • ...1 more annotation...
  • New York State Attorney General Letitia James outlined in December a sweeping antitrust suit against Facebook by the Federal Trade Commission and a bipartisan group of 46 state attorneys general, targeting the company’s tactics against competitors. Photo: Saul Loeb/AFP via Getty Images (Video from 12/9/20)
Gonzalo San Gil, PhD.

Who owns your data? | ITworld - 0 views

  •  
    "By Adam Ruttenberg, CIO | Big Data Add a comment May 30, 2014, 10:10 AM - People may disagree about the meaning of the term "big data," but they agree that businesses can gain advantages by analyzing large data sets in the cloud. The trick is getting the right data."
Paul Merrell

Sun's Advanced Datacenter (Santa Clara, CA) - System News - 0 views

  • To run Sun’s award-winning data centers, a modular design containing many "pods" was implemented to save power and time. The modular design aids the building of any sized datacenter. Inside of each pod, there are 24 racks. Each of these 24 racks has a common cooling system as does every other modular building block. The number of pods is limited by the size of the datacenters. Large and small datacenters can benefit from using the pod approach. The module design makes it easy to configure a datacenter to meet a client's requirements. As the datacenter grows over time, adding pods is convenient. The module and pod designs make it easy to adapt to new technology such as blade servers. Some of the ways that Sun’s datacenter modules are designed with the future in mind are as follows:
  • To run Sun’s award-winning data centers, a modular design containing many "pods" was implemented to save power and time. The modular design aids the building of any sized datacenter. Inside of each pod, there are 24 racks. Each of these 24 racks has a common cooling system as does every other modular building block. The number of pods is limited by the size of the datacenters. Large and small datacenters can benefit from using the pod approach. The module design makes it easy to configure a datacenter to meet a client's requirements. As the datacenter grows over time, adding pods is convenient. The module and pod designs make it easy to adapt to new technology such as blade servers.
  • An updated 58-page Sun BluePrint covers Sun's approach to designing datacenters. (Authors - Dean Nelson, Michael Ryan, Serena DeVito, Ramesh KV, Petr Vlasaty, Brett Rucker, and Brian Day): ENERGY EFFICIENT DATACENTERS: THE ROLE OF MODULARITY IN DATACENTER DESIGN. More Information Sun saves $1 million/year with new datacenter Take a Virtual Tour
  • ...3 more annotations...
  • An updated 58-page Sun BluePrint covers Sun's approach to designing datacenters. (Authors - Dean Nelson, Michael Ryan, Serena DeVito, Ramesh KV, Petr Vlasaty, Brett Rucker, and Brian Day): ENERGY EFFICIENT DATACENTERS: THE ROLE OF MODULARITY IN DATACENTER DESIGN.
  • Take a Virtual Tour
  • Other articles in the Hardware section of Volume 125, Issue 1: Sun's Advanced Datacenter (Santa Clara, CA) Modular Approach Is Key to Datacenter Design for Sun Sun Datacenter Switch 3x24 See all archived articles in the
  •  
    This page seems to be the hub for information about the Sun containerized data centers. I've highlighted links as well as text, but not all the text on the page. Info gathered in the process of surfing the linked pages: [i] the 3x24 data switch page recomends redundant Solaris instances; [ii] x64 blade servers are the design target; [iii] there is specific mention of other Sun-managed data centers being erected in Indiana and in Bangalore, India; [iv] the whiff is that Sun might not only be supplying the data centers for the Microsoft cloud but also managing them; and [v] the visual tour is very impressive; clearly some very brilliant people put a lot of hard and creative work into this.
Paul Merrell

Obama to propose legislation to protect firms that share cyberthreat data - The Washing... - 0 views

  • President Obama plans to announce legislation Tuesday that would shield companies from lawsuits for sharing computer threat data with the government in an effort to prevent cyber­attacks. On the heels of a destructive attack at Sony Pictures Entertainment and major breaches at JPMorgan Chase and retail chains, Obama is intent on capitalizing on the heightened sense of urgency to improve the security of the nation’s networks, officials said. “He’s been doing everything he can within his executive authority to move the ball on this,” said a senior administration official who spoke on the condition of anonymity to discuss legislation that has not yet been released. “We’ve got to get something in place that allows both industry and government to work more closely together.”
  • The legislation is part of a broader package, to be sent to Capitol Hill on Tuesday, that includes measures to help protect consumers and students against ­cyberattacks and to give law enforcement greater authority to combat cybercrime. The provision’s goal is to “enshrine in law liability protection for the private sector for them to share specific information — cyberthreat indicators — with the government,” the official said. Some analysts questioned the need for such legislation, saying there are adequate measures in place to enable sharing between companies and the government and among companies.
  • “We think the current information-sharing regime is adequate,” said Mark Jaycox, legislative analyst at the Electronic Frontier Foundation, a privacy group. “More companies need to use it, but the idea of broad legal immunity isn’t needed right now.” The administration official disagreed. The lack of such immunity is what prevents many companies from greater sharing of data with the government, the official said. “We have heard that time and time again,” the official said. The proposal, which builds on a 2011 administration bill, grants liability protection to companies that provide indicators of cyberattacks and threats to the Department of Homeland Security.
  • ...5 more annotations...
  • But in a provision likely to raise concerns from privacy advocates, the administration wants to require DHS to share that information “in as near real time as possible” with other government agencies that have a cybersecurity mission, the official said. Those include the National Security Agency, the Pentagon’s ­Cyber Command, the FBI and the Secret Service. “DHS needs to take an active lead role in ensuring that unnecessary personal information is not shared with intelligence authorities,” Jaycox said. The debates over government surveillance prompted by disclosures from former NSA contractor Edward Snowden have shown that “the agencies already have a tremendous amount of unnecessary information,” he said.
  • It would reaffirm that federal racketeering law applies to cybercrimes and amends the Computer Fraud and Abuse Act by ensuring that “insignificant conduct” does not fall within the scope of the statute. A third element of the package is legislation Obama proposed Monday to help protect consumers and students against cyberattacks. The theft of personal financial information “is a direct threat to the economic security of American families, and we’ve got to stop it,” Obama said. The plan, unveiled in a speech at the Federal Trade Commission, would require companies to notify customers within 30 days after the theft of personal information is discovered. Right now, data breaches are handled under a patchwork of state laws that the president said are confusing and costly to enforce. Obama’s plan would streamline those into one clear federal standard and bolster requirements for companies to notify customers. Obama is proposing closing loopholes to make it easier to track down cybercriminals overseas who steal and sell identities. “The more we do to protect consumer information and privacy, the harder it is for hackers to damage our businesses and hurt our economy,” he said.
  • Efforts to pass information-sharing legislation have stalled in the past five years, blocked primarily by privacy concerns. The package also contains provisions that would allow prosecution for the sale of botnets or access to armies of compromised computers that can be used to spread malware, would criminalize the overseas sale of stolen U.S. credit card and bank account numbers, would expand federal law enforcement authority to deter the sale of spyware used to stalk people or commit identity theft, and would give courts the authority to shut down botnets being used for criminal activity, such as denial-of-service attacks.
  • The administration official stressed that the legislation will require companies to remove unnecessary personal information before furnishing it to the government in order to qualify for liability protection. It also will impose limits on the use of the data for cybersecurity crimes and instances in which there is a threat of death or bodily harm, such as kidnapping, the official said. And it will require DHS and the attorney general to develop guidelines for the federal government’s use and retention of the data. It will not authorize a company to take offensive cyber-measures to defend itself, such as “hacking back” into a server or computer outside its own network to track a breach. The bill also will provide liability protection to companies that share data with private-sector-developed organizations set up specifically for that purpose. Called information sharing and analysis organizations, these groups often are set up by particular industries, such as banking, to facilitate the exchange of data and best practices.
  • In October, Obama signed an order to protect consumers from identity theft by strengthening security features in credit cards and the terminals that process them. Marc Rotenberg, executive director of the Electronic Privacy Information Center, said there is concern that a federal standard would “preempt stronger state laws” about how and when companies have to notify consumers. The Student Digital Privacy Act would ensure that data entered would be used only for educational purposes. It would prohibit companies from selling student data to third-party companies for purposes other than education. Obama also plans to introduce a Consumer Privacy Bill of Rights. And the White House will host a summit on cybersecurity and consumer protection on Feb. 13 at Stanford University.
Paul Merrell

Proposed changes to US data collection fall short of NSA reformers' goals | US news | T... - 0 views

  • The US intelligence community has delivered a limited list of tweaks to how long it can hold information on ordinary citizens and hide secret trawls for data, responding to Barack Obama’s call for reform of its surveillance practices in the wake of revelations about NSA practices. Published by the office of the director of national intelligence, James Clapper, just six days before a recently announced visit to Washington by the German chancellor, Angela Merkel, the report is the culmination of a year-long effort to respond to revelations by whistleblower Edward Snowden.
  • But the report does not appear to address the role of telecommunications companies in collecting metadata and the use of encryption to prevent hacking, and privacy critics were quick to pounce on a year of promises with little reform to show. “It’s hard to see much ‘there’ there,” Senator Ron Wyden said in a statement. “When it comes to reforming intelligence programs and protecting Americans’ privacy, there is much, much more work to be done.” The outline from the intelligence community also appears to fall short of the legislative changes attempted by campaigners in Congress, focusing instead on measures to tighten internal guidelines and provide foreigners with some of the protections allowed for US citizens. These measures include:
  • Other measures outlined in the new report include steps to clarify the protection given to whistleblowers if they follow internal rules and a requirement that “any significant compliance incident involving personal information, regardless of the person’s nationality” be reported to Clapper.
  • ...3 more annotations...
  • Limiting how long personal data gathered from non-US citizens can be held to five years, so long as it is deemed not relevant to ongoing intelligence investigations. Asking Congress to provide some foreign nationals access to legal redress if their private information has been wilfully disclosed by US intelligence agencies. Limiting to three years how long the FBI can prevent disclosure of its surveillance activities using so-called national security letters, unless a special agent deems otherwise.
  • The official results of Obama’s call for surveillance reform also appear to have failed to address encryption. The FBI director, James Comey, and other officials have been highly critical of the use of encryption by tech companies such as Apple to protect their users’ information. Comey has argued that stronger encryption, baked in to some technology after the Snowden revelations, will aid criminals and terrorists and shut out law enforcement.
  • The intelligence report itself acknowledges that further reforms called for by the president, such as ending the collection of bulk data by the government, have not been implemented, possibly due to stalled legislative efforts in Congress.
Paul Merrell

ISPs say the "massive cost" of Snooper's Charter will push up UK broadband bills | Ars ... - 0 views

  • How much extra will you have to pay for the privilege of being spied on?
  • UK ISPs have warned MPs that the costs of implementing the Investigatory Powers Bill (aka the Snooper's Charter) will be much greater than the £175 million the UK government has allotted for the task, and that broadband bills will need to rise as a result. Representatives from ISPs and software companies told the House of Commons Science and Technology Committee that the legislation greatly underestimates the "sheer quantity" of data generated by Internet users these days. They also pointed out that distinguishing content from metadata is a far harder task than the government seems to assume. Matthew Hare, the chief executive of ISP Gigaclear, said with "a typical 1 gigabit connection to someone's home, over 50 terabytes of data per year [are] passing over it. If you say that a proportion of that is going to be the communications data—the record of who you communicate with, when you communicate or what you communicate—there would be the most massive and enormous amount of data that in future an access provider would be expected to keep. The indiscriminate collection of mass data across effectively every user of the Internet in this country is going to have a massive cost."
  • Moreover, the larger the cache of stored data, the more worthwhile it will be for criminals and state-backed actors to gain access and download that highly-revealing personal information for fraud and blackmail. John Shaw, the vice president of product management at British security firm Sophos, told the MPs: "There would be a huge amount of very sensitive personal data that could be used by bad guys.
  • ...2 more annotations...
  • The ISPs also challenged the government's breezy assumption that separating the data from the (equally revealing) metadata would be simple, not least because an Internet connection is typically being used for multiple services simultaneously, with data packets mixed together in a completely contingent way. Hare described a typical usage scenario for a teenager on their computer at home, where they are playing a game communicating with their friends using Steam; they are broadcasting the game using Twitch; and they may also be making a voice call at the same time too. "All those applications are running simultaneously," Hare said. "They are different applications using different servers with different services and different protocols. They are all running concurrently on that one machine." Even accessing a Web page is much more complicated than the government seems to believe, Hare pointed out. "As a webpage is loading, you will see that that webpage is made up of tens, or many tens, of individual sessions that have been created across the Internet just to load a single webpage. Bluntly, if you want to find out what someone is doing you need to be tracking all of that data all the time."
  • Hare raised another major issue. "If I was a software business ... I would be very worried that my customers would not buy my software any more if it had anything to do with security at all. I would be worried that a backdoor was built into the software by the [Investigatory Powers] Bill that would allow the UK government to find out what information was on that system at any point they wanted in the future." As Ars reported last week, the ability to demand that backdoors are added to systems, and a legal requirement not to reveal that fact under any circumstances, are two of the most contentious aspects of the new Investigatory Powers Bill. The latest comments from industry experts add to concerns that the latest version of the Snooper's Charter would inflict great harm on civil liberties in the UK, and also make security research well-nigh impossible here. To those fears can now be added undermining the UK software industry, as well as forcing the UK public to pay for the privilege of having their ISP carry out suspicionless surveillance.
Paul Merrell

Facebook plans to build huge Utah data center | Fox Business - 0 views

  • acebook plans to build a data center in Utah on a 490-acre site, the social media company and government officials announced on Wednesday.Continue Reading BelowThe Eagle Mountain, Utah, facility will be 970,000 square feet and located about 15 miles south of a National Security Agency data center in Bluffdale, Utah.
  •  
    15 miles from the NSA's Bluffdale facility. An amazing coincidence or something more sinister?
Paul Merrell

Zuckerberg set up fraudulent scheme to 'weaponise' data, court case alleges | Technolog... - 1 views

  • Mark Zuckerberg faces allegations that he developed a “malicious and fraudulent scheme” to exploit vast amounts of private data to earn Facebook billions and force rivals out of business. A company suing Facebook in a California court claims the social network’s chief executive “weaponised” the ability to access data from any user’s network of friends – the feature at the heart of the Cambridge Analytica scandal. A legal motion filed last week in the superior court of San Mateo draws upon extensive confidential emails and messages between Facebook senior executives including Mark Zuckerberg. He is named individually in the case and, it is claimed, had personal oversight of the scheme. Facebook rejects all claims, and has made a motion to have the case dismissed using a free speech defence.
  • It claims the first amendment protects its right to make “editorial decisions” as it sees fit. Zuckerberg and other senior executives have asserted that Facebook is a platform not a publisher, most recently in testimony to Congress.
  • Heather Whitney, a legal scholar who has written about social media companies for the Knight First Amendment Institute at Columbia University, said, in her opinion, this exposed a potential tension for Facebook. “Facebook’s claims in court that it is an editor for first amendment purposes and thus free to censor and alter the content available on its site is in tension with their, especially recent, claims before the public and US Congress to be neutral platforms.” The company that has filed the case, a former startup called Six4Three, is now trying to stop Facebook from having the case thrown out and has submitted legal arguments that draw on thousands of emails, the details of which are currently redacted. Facebook has until next Tuesday to file a motion requesting that the evidence remains sealed, otherwise the documents will be made public.
Paul Merrell

Is Apple an Illegal Monopoly? | OneZero - 0 views

  • That’s not a bug. It’s a function of Apple policy. With some exceptions, the company doesn’t let users pay app makers directly for their apps or digital services. They can only pay Apple, which takes a 30% cut of all revenue and then passes 70% to the developer. (For subscription services, which account for the majority of App Store revenues, that 30% cut drops to 15% after the first year.) To tighten its grip, Apple prohibits the affected apps from even telling users how they can pay their creators directly.In 2018, unwilling to continue paying the “Apple tax,” Netflix followed Spotify and Amazon’s Kindle books app in pulling in-app purchases from its iOS app. Users must now sign up elsewhere, such as on the company’s website, in order for the app to become usable. Of course, these brands are big enough to expect that many users will seek them out anyway.
  • Smaller app developers, meanwhile, have little choice but to play by Apple’s rules. That’s true even when they’re competing with Apple’s own apps, which pay no such fees and often enjoy deeper access to users’ devices and information.Now, a handful of developers are speaking out about it — and government regulators are beginning to listen. David Heinemeier Hansson, the co-founder of the project management software company Basecamp, told members of the U.S. House antitrust subcommittee in January that navigating the App Store’s fees, rules, and review processes can feel like a “Kafka-esque nightmare.”One of the world’s most beloved companies, Apple has long enjoyed a reputation for user-friendly products, and it has cultivated an image as a high-minded protector of users’ privacy. The App Store, launched in 2008, stands as one of its most underrated inventions; it has powered the success of the iPhone—perhaps the most profitable product in human history. The concept was that Apple and developers could share in one another’s success with the iPhone user as the ultimate beneficiary.
  • But critics say that gauzy success tale belies the reality of a company that now wields its enormous market power to bully, extort, and sometimes even destroy rivals and business partners alike. The iOS App Store, in their telling, is a case study in anti-competitive corporate behavior. And they’re fighting to change that — by breaking its choke hold on the Apple ecosystem.
  • ...4 more annotations...
  • Whether Apple customers have a real choice in mobile platforms, once they’ve bought into the company’s ecosystem, is another question. In theory, they could trade in their pricey hardware for devices that run Android, which offers equivalents of many iOS features and apps. In reality, Apple has built its empire on customer lock-in: making its own gadgets and services work seamlessly with one another, but not with those of rival companies. Tasks as simple as texting your friends can become a migraine-inducing mess when you switch from iOS to Android. The more Apple products you buy, the more onerous it becomes to abandon ship.
  • The case against Apple goes beyond iOS. At a time when Apple is trying to reinvent itself as a services company to offset plateauing hardware sales — pushing subscriptions to Apple Music, Apple TV+, Apple News+, and Apple Arcade, as well as its own credit card — the antitrust concerns are growing more urgent. Once a theoretical debate, the question of whether its App Store constitutes an illegal monopoly is now being actively litigated on multiple fronts.
  • The company faces an antitrust lawsuit from consumers; a separate antitrust lawsuit from developers; a formal antitrust complaint from Spotify in the European Union; investigations by the Federal Trade Commission and the Department of Justice; and an inquiry by the antitrust subcommittee of the U.S House of Representatives. At stake are not only Apple’s profits, but the future of mobile software.Apple insists that it isn’t a monopoly, and that it strives to make the app store a fair and level playing field even as its own apps compete on that field. But in the face of unprecedented scrutiny, there are signs that the famously stubborn company may be feeling the pressure to prove it.
  • Tile is hardly alone in its grievances. Apple’s penchant for copying key features of third-party apps and integrating them into its operating system is so well-known among developers that it has a name: “Sherlocking.” It’s a reference to the time—in the early 2000s—when Apple kneecapped a popular third-party web-search interface for Mac OS X, called Watson. Apple built virtually all of Watson’s functionality into its own feature, called Sherlock.In a 2006 blog post, Watson’s developer, Karelia Software, recalled how Apple’s then-CEO Steve Jobs responded when they complained about the company’s 2002 power play. “Here’s how I see it,” Jobs said, according to Karelia founder Dan Wood’s loose paraphrase. “You know those handcars, the little machines that people stand on and pump to move along on the train tracks? That’s Karelia. Apple is the steam train that owns the tracks.”From an antitrust standpoint, the metaphor is almost too perfect. It was the monopoly power of railroads in the late 19th century — and their ability to make or break the businesses that used their tracks — that spurred the first U.S. antitrust regulations.There’s another Jobs quote that’s relevant here. Referencing Picasso’s famous saying, “Good artists copy, great artists steal,” Jobs said of Apple in 2006. “We have always been shameless about stealing great ideas.” Company executives later tried to finesse the quote’s semantics, but there’s no denying that much of iOS today is built on ideas that were not originally Apple’s.
Paul Merrell

The People and Tech Behind the Panama Papers - Features - Source: An OpenNews project - 0 views

  • Then we put the data up, but the problem with Solr was it didn’t have a user interface, so we used Project Blacklight, which is open source software normally used by librarians. We used it for the journalists. It’s simple because it allows you to do faceted search—so, for example, you can facet by the folder structure of the leak, by years, by type of file. There were more complex things—it supports queries in regular expressions, so the more advanced users were able to search for documents with a certain pattern of numbers that, for example, passports use. You could also preview and download the documents. ICIJ open-sourced the code of our document processing chain, created by our web developer Matthew Caruana Galizia. We also developed a batch-searching feature. So say you were looking for politicians in your country—you just run it through the system, and you upload your list to Blacklight and you would get a CSV back saying yes, there are matches for these names—not only exact matches, but also matches based on proximity. So you would say “I want Mar Cabra proximity 2” and that would give you “Mar Cabra,” “Mar whatever Cabra,” “Cabra, Mar,”—so that was good, because very quickly journalists were able to see… I have this list of politicians and they are in the data!
  • Last Sunday, April 3, the first stories emerging from the leaked dataset known as the Panama Papers were published by a global partnership of news organizations working in coordination with the International Consortium of Investigative Journalists, or ICIJ. As we begin the second week of reporting on the leak, Iceland’s Prime Minister has been forced to resign, Germany has announced plans to end anonymous corporate ownership, governments around the world launched investigations into wealthy citizens’ participation in tax havens, the Russian government announced that the investigation was an anti-Putin propaganda operation, and the Chinese government banned mentions of the leak in Chinese media. As the ICIJ-led consortium prepares for its second major wave of reporting on the Panama Papers, we spoke with Mar Cabra, editor of ICIJ’s Data & Research unit and lead coordinator of the data analysis and infrastructure work behind the leak. In our conversation, Cabra reveals ICIJ’s years-long effort to build a series of secure communication and analysis platforms in support of genuinely global investigative reporting collaborations.
  • For communication, we have the Global I-Hub, which is a platform based on open source software called Oxwall. Oxwall is a social network, like Facebook, which has a wall when you log in with the latest in your network—it has forum topics, links, you can share files, and you can chat with people in real time.
  • ...3 more annotations...
  • We had the data in a relational database format in SQL, and thanks to ETL (Extract, Transform, and Load) software Talend, we were able to easily transform the data from SQL to Neo4j (the graph-database format we used). Once the data was transformed, it was just a matter of plugging it into Linkurious, and in a couple of minutes, you have it visualized—in a networked way, so anyone can log in from anywhere in the world. That was another reason we really liked Linkurious and Neo4j—they’re very quick when representing graph data, and the visualizations were easy to understand for everybody. The not-very-tech-savvy reporter could expand the docs like magic, and more technically expert reporters and programmers could use the Neo4j query language, Cypher, to do more complex queries, like show me everybody within two degrees of separation of this person, or show me all the connected dots…
  • We believe in open source technology and try to use it as much as possible. We used Apache Solr for the indexing and Apache Tika for document processing, and it’s great because it processes dozens of different formats and it’s very powerful. Tika interacts with Tesseract, so we did the OCRing on Tesseract. To OCR the images, we created an army of 30–40 temporary servers in Amazon that allowed us to process the documents in parallel and do parallel OCR-ing. If it was very slow, we’d increase the number of servers—if it was going fine, we would decrease because of course those servers have a cost.
  • For the visualization of the Mossack Fonseca internal database, we worked with another tool called Linkurious. It’s not open source, it’s licensed software, but we have an agreement with them, and they allowed us to work with it. It allows you to represent data in graphs. We had a version of Linkurious on our servers, so no one else had the data. It was pretty intuitive—journalists had to click on dots that expanded, basically, and could search the names.
Paul Merrell

Archiveteam - 0 views

  • HISTORY IS OUR FUTURE And we've been trashing our history Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: IT DOESN'T HAVE TO BE THIS WAY. This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction. Currently Active Projects (Get Involved Here!) Archive Team recruiting Want to code for Archive Team? Here's a starting point.
  • Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions - and done our best to save the history before it's lost forever. Along the way, we've gotten attention, resistance, press and discussion, but most importantly, we've gotten the message out: IT DOESN'T HAVE TO BE THIS WAY. This website is intended to be an offloading point and information depot for a number of archiving projects, all related to saving websites or data that is in danger of being lost. Besides serving as a hub for team-based pulling down and mirroring of data, this site will provide advice on managing your own data and rescuing it from the brink of destruction.
  • Who We Are and how you can join our cause! Deathwatch is where we keep track of sites that are sickly, dying or dead. Fire Drill is where we keep track of sites that seem fine but a lot depends on them. Projects is a comprehensive list of AT endeavors. Philosophy describes the ideas underpinning our work. Some Starting Points The Introduction is an overview of basic archiving methods. Why Back Up? Because they don't care about you. Back Up your Facebook Data Learn how to liberate your personal data from Facebook. Software will assist you in regaining control of your data by providing tools for information backup, archiving and distribution. Formats will familiarise you with the various data formats, and how to ensure your files will be readable in the future. Storage Media is about where to get it, what to get, and how to use it. Recommended Reading links to others sites for further information. Frequently Asked Questions is where we answer common questions.
  •  
    The Archive Team Warrior is a virtual archiving appliance. You can run it to help with the ArchiveTeam archiving efforts. It will download sites and upload them to our archive - and it's really easy to do! The warrior is a virtual machine, so there is no risk to your computer. The warrior will only use your bandwidth and some of your disk space. It will get tasks from and report progress to the Tracker. Basic usage The warrior runs on Windows, OS X and Linux using a virtual machine. You'll need one of: VirtualBox (recommended) VMware workstation/player (free-gratis for personal use) See below for alternative virtual machines Partners with and contributes lots of archives to the Wayback Machine. Here's how you can help by contributing some bandwidth if you run an always-on box with an internet connection.
Gonzalo San Gil, PhD.

Startup Crunches 100 Terabytes of Data in a Record 23 Minutes | WIRED - 0 views

  •  
    "There's a new record holder in the world of "big data." On Friday, Databricks-a startup spun out of the University California, Berkeley-announced that it has sorted 100 terabytes of data in a record 23 minutes using a number-crunching tool called Spark, eclipsing the previous record held by Yahoo and the popular big-data tool Hadoop."
Gary Edwards

» 21 Facts About NSA Snooping That Every American Should Know Alex Jones' Inf... - 0 views

  •  
    NSA-PRISM-Echelon in a nutshell.  The list below is a short sample.  Each fact is documented, and well worth the time reading. "The following are 21 facts about NSA snooping that every American should know…" #1 According to CNET, the NSA told Congress during a recent classified briefing that it does not need court authorization to listen to domestic phone calls… #2 According to U.S. Representative Loretta Sanchez, members of Congress learned "significantly more than what is out in the media today" about NSA snooping during that classified briefing. #3 The content of all of our phone calls is being recorded and stored.  The following is a from a transcript of an exchange between Erin Burnett of CNN and former FBI counterterrorism agent Tim Clemente which took place just last month… #4 The chief technology officer at the CIA, Gus Hunt, made the following statement back in March… "We fundamentally try to collect everything and hang onto it forever." #5 During a Senate Judiciary Oversight Committee hearing in March 2011, FBI Director Robert Mueller admitted that the intelligence community has the ability to access emails "as they come in"… #6 Back in 2007, Director of National Intelligence Michael McConnell told Congress that the president has the "constitutional authority" to authorize domestic spying without warrants no matter when the law says. #7 The Director Of National Intelligence James Clapper recently told Congress that the NSA was not collecting any information about American citizens.  When the media confronted him about his lie, he explained that he "responded in what I thought was the most truthful, or least untruthful manner". #8 The Washington Post is reporting that the NSA has four primary data collection systems… MAINWAY, MARINA, METADATA, PRISM #9 The NSA knows pretty much everything that you are doing on the Internet.  The following is a short excerpt from a recent Yahoo article… #10 The NSA is suppose
Paul Merrell

Google Wins Patent For Data Center In A Box; Trouble For Sun, Rackable, IBM? -- Data Ce... - 0 views

  • Google (NSDQ: GOOG) has obtained a broad patent for a data center in a container, which might put a kink in product plans for companies like Sun Microsystems (NSDQ: JAVA), Rackable Systems, and IBM (NYSE: IBM). The patent, granted Tuesday, covers "modular data centers with modular components that can implemented in numerous ways, including as a process, an apparatus, a system, a device, or a method."
  • The U.S. Patent Trademark Office site reveals patent number 7,278,273 as describing modules in intermodal shipping containers, or those that can be shipped by multiple carriers and systems. It also covers computing systems mounted within temperature-controlled containers, configured so they can ship easily, be factory built and deployed at data center sites.
  • If this sounds familiar, it might just be. Google's patent description resembles Sun Microsystems' data center in a box, called Project Blackbox. During its debut last year, Sun installed a Blackbox -- essentially a cargo container for 18-wheelers -- outside of Grand Central Station in New York City to show how easily one of their data centers could be installed. Google's patent description also has similarities to Rackable Systems' ICE Cube as well as IBM's Scalable Modular Data Center.
Gary Edwards

Official Google Webmaster Central Blog: Introducing Rich Snippets - 0 views

  •  
    Google "Rich Snippets" is a new presentation of HTML snippets that applies Google's algorithms to highlight structured data embedded in web pages. Rich Snippets give end-users convenient summary information about their search results at a glance. Google is currently supporting a very limited subset of data about reviews and people. When searching for a product or service, users can easily see reviews and ratings, and when searching for a person, they'll get help distinguishing between people with the same name. It's a simple change to the display of search results, yet our experiments have shown that users find the new data valuable. For this to work though, both Web-masters and Web-workers have to annotate thier pages with structured data in a standard format. Google snippets supports microformats and RDFa. Existing Web data can be wrapped with some additional tags to accomplish this. Notice that Google avoids mention of RDF and the W3C's vision of a "Semantic Web" where Web objects are fully described in machine readable semantics. Over at the WHATWG group, where work on HTML5 continues, Google's Ian Hickson has been fighting RDFa and the Semantic Web in what looks to be an effort to protect the infamous Google algorithms. RDFa provides a means for Web-workers, knowledge-workers, line-of-business managers and document generating end-users to enrich their HTML+ with machine semantics. The idea being that the document experts creating Web content can best describe to search engine and content management machines the objects-of-information used. The google algorithms provide a proprietary semantics of this same content. The best solution to the tsunami of conten the Web has wrought would be to combine end-user semantic expertise with Google algorithms. Let's hope Google stays the RDFa course and comes around to recognize the full potential of organizing the world's information with the input of content providers. One thing the world desperatel
Gary Edwards

Developing a Universal Markup Solution For Web Content - 0 views

  •  
    KODAXIL To Replace XML?

    File this one under the Universal Interoperability label. Very interesting. Especially since XML document formats have proven to fall short on the two primary expectations of users: interoperability and Web ready. Like HTML+ :) Maybe KODAXIL will work?

    The recent Web 2.0 Conference was filled with new web services , portals and wiki efforts trying their best to mash data into document objects. iCloud, MindTouch, AppLogic, 3Tera, Caspio and Gazoodle all deserve attention. although each took a rather different approach towards solving the problem. MindTouch in particular was excellent.

    "A Montreal-based software and research development company has developed a markup solution and language-neutral asset-descriptor that when fully developed, could result in a universal computer language for representing information in databases, web and document contents and business objects."

    "While still at a seminal stage of development, the company Gnoesis, aims to address the problem of data fragmentation caused by semantic differences between developers and users from different linguistic backgrounds."

    Gnoesis, the company that has developed the language called KODAXIL (Knowledge, Object, Data, Action, and eXtensible Interoperable Language), a data and information representation language, says the new language will replace the XML function of consolidating semantically identical data streams from different languages, by creating a common language to do this.

    The extensible semantic markup associated with this language will be understood worldwide and is three times shorter than XML.
Paul Merrell

LEAKED: Secret Negotiations to Let Big Brother Go Global | Wolf Street - 0 views

  • Much has been written, at least in the alternative media, about the Trans Pacific Partnership (TPP) and the Transatlantic Trade and Investment Partnership (TTIP), two multilateral trade treaties being negotiated between the representatives of dozens of national governments and armies of corporate lawyers and lobbyists (on which you can read more here, here and here). However, much less is known about the decidedly more secretive Trade in Services Act (TiSA), which involves more countries than either of the other two. At least until now, that is. Thanks to a leaked document jointly published by the Associated Whistleblowing Press and Filtrala, the potential ramifications of the treaty being hashed out behind hermetically sealed doors in Geneva are finally seeping out into the public arena.
  • If signed, the treaty would affect all services ranging from electronic transactions and data flow, to veterinary and architecture services. It would almost certainly open the floodgates to the final wave of privatization of public services, including the provision of healthcare, education and water. Meanwhile, already privatized companies would be prevented from a re-transfer to the public sector by a so-called barring “ratchet clause” – even if the privatization failed. More worrisome still, the proposal stipulates that no participating state can stop the use, storage and exchange of personal data relating to their territorial base. Here’s more from Rosa Pavanelli, general secretary of Public Services International (PSI):
  • The leaked documents confirm our worst fears that TiSA is being used to further the interests of some of the largest corporations on earth (…) Negotiation of unrestricted data movement, internet neutrality and how electronic signatures can be used strike at the heart of individuals’ rights. Governments must come clean about what they are negotiating in these secret trade deals. Fat chance of that, especially in light of the fact that the text is designed to be almost impossible to repeal, and is to be “considered confidential” for five years after being signed. What that effectively means is that the U.S. approach to data protection (read: virtually non-existent) could very soon become the norm across 50 countries spanning the breadth and depth of the industrial world.
  • ...1 more annotation...
  • The main players in the top-secret negotiations are the United States and all 28 members of the European Union. However, the broad scope of the treaty also includes Australia, Canada, Chile, Colombia, Costa Rica, Hong Kong, Iceland, Israel, Japan, Liechtenstein, Mexico, New Zealand, Norway, Pakistan, Panama, Paraguay, Peru, South Korea, Switzerland, Taiwan and Turkey. Combined they represent almost 70 percent of all trade in services worldwide. An explicit goal of the TiSA negotiations is to overcome the exceptions in GATS that protect certain non-tariff trade barriers, such as data protection. For example, the draft Financial Services Annex of TiSA, published by Wikileaks in June 2014, would allow financial institutions, such as banks, the free transfer of data, including personal data, from one country to another. As Ralf Bendrath, a senior policy advisor to the MEP Jan Philipp Albrecht, writes in State Watch, this would constitute a radical carve-out from current European data protection rules:
Gonzalo San Gil, PhD.

A basic encryption strategy for storing sensitive data | ITworld - 0 views

  •  
    "To safely store your data in a database, you'd start by generating a strong secret key value in a byte array. This is best generated programmatically. This single key can be used to encrypt all of the data you'd like to store. "
  •  
    "To safely store your data in a database, you'd start by generating a strong secret key value in a byte array. This is best generated programmatically. This single key can be used to encrypt all of the data you'd like to store. "
Gonzalo San Gil, PhD.

Cybercriminals ​boost sales through 'data laundering' | ZDNet - 0 views

  •  
    "There is a term for this: "Data laundering", which has been defined as "obscuring, removing, or fabricating the provenance of illegally obtained data such that it may be used for lawful purposes"."
  •  
    "There is a term for this: "Data laundering", which has been defined as "obscuring, removing, or fabricating the provenance of illegally obtained data such that it may be used for lawful purposes"."
Paul Merrell

Time to 'Break Facebook Up,' Sanders Says After Leaked Docs Show Social Media Giant 'Tr... - 0 views

  • After NBC News on Wednesday published a trove of leaked documents that show how Facebook "treated user data as a bargaining chip with external app developers," White House hopeful Sen. Bernie Sanders declared that it is time "to break Facebook up."
  • When British investigative journalist Duncan Campbell first shared the trove of documents with a handful of media outlets including NBC News in April, journalists Olivia Solon and Cyrus Farivar reported that "Facebook CEO Mark Zuckerberg oversaw plans to consolidate the social network's power and control competitors by treating its users' data as a bargaining chip, while publicly proclaiming to be protecting that data." With the publication Wednesday of nearly 7,000 pages of records—which include internal Facebook emails, web chats, notes, presentations, and spreadsheets—journalists and the public can now have a closer look at exactly how the company was using the vast amount of data it collects when it came to bargaining with third parties.
  • The document dump comes as Facebook and Zuckerberg are facing widespread criticism over the company's political advertising policy, which allows candidates for elected office to lie in the ads they pay to circulate on the platform. It also comes as 47 state attorneys general, led by Letitia James of New York, are investigating the social media giant for antitrust violations.
  • ...2 more annotations...
  • According to Solon and Farivar of NBC: Taken together, they show how Zuckerberg, along with his board and management team, found ways to tap Facebook users' data—including information about friends, relationships, and photos—as leverage over the companies it partnered with. In some cases, Facebook would reward partners by giving them preferential access to certain types of user data while denying the same access to rival companies. For example, Facebook gave Amazon special access to user data because it was spending money on Facebook advertising. In another case the messaging app MessageMe was cut off from access to data because it had grown too popular and could compete with Facebook.
  • The call from Sanders (I-Vt.) Wednesday to break up Facebook follows similar but less definitive statements from the senator. One of Sanders' rivals in the 2020 Democratic presidential primary race, Sen. Elizabeth Warren (D-Mass.), released her plan to "Break Up Big Tech" in March. Zuckerberg is among the opponents of Warren's proposal, which also targets other major technology companies like Amazon and Google.
« First ‹ Previous 41 - 60 of 592 Next › Last »
Showing 20 items per page