Open Web: Group items tagged developer-tool-interface

Paul Merrell

Last Call Working Draft -- W3C Authoring Tool Accessibility Guidelines (ATAG) 2.0 - 0 views

  • This is a Working Draft of the Authoring Tool Accessibility Guidelines (ATAG) version 2.0. This document includes recommendations for assisting authoring tool developers to make the authoring tools that they develop more accessible to people with disabilities, including blindness and low vision, deafness and hearing loss, learning disabilities, cognitive limitations, motor difficulties, speech difficulties, and others. Accessibility, from an authoring tool perspective, includes addressing the needs of two (potentially overlapping) user groups with disabilities: authors of web content, whose needs are met by ensuring that the authoring tool user interface itself is accessible (addressed by Part A of the guidelines), and end users of web content, whose needs are met by ensuring that all authors are enabled, supported, and guided towards producing accessible web content (addressed by Part B of the guidelines).
  • Examples of authoring tools: ATAG 2.0 applies to a wide variety of web content generating applications, including, but not limited to:
    - web page authoring tools (e.g., WYSIWYG HTML editors)
    - software for directly editing source code (see note below)
    - software for converting to web content technologies (e.g., "Save as HTML" features in office suites)
    - integrated development environments (e.g., for web application development)
    - software that generates web content on the basis of templates, scripts, command-line input or "wizard"-type processes
    - software for rapidly updating portions of web pages (e.g., blogging, wikis, online forums)
    - software for generating/managing entire web sites (e.g., content management systems, courseware tools, content aggregators)
    - email clients that send messages in web content technologies
    - multimedia authoring tools
    - debugging tools for web content
    - software for creating mobile web applications
  • Web-based and non-web-based: ATAG 2.0 applies equally to authoring tools of web content that are web-based, non-web-based or a combination (e.g., a non-web-based markup editor with a web-based help system, a web-based content management system with a non-web-based file uploader client).
    Real-time publishing: ATAG 2.0 applies to authoring tools with workflows that involve real-time publishing of web content (e.g., some collaborative tools). For these authoring tools, conformance to Part B of ATAG 2.0 may involve some combination of real-time accessibility supports and additional accessibility supports available after the real-time authoring session (e.g., the ability to add captions for audio that was initially published in real-time). For more information, see Implementing ATAG 2.0 - Appendix E: Real-time content production.
    Text Editors: ATAG 2.0 is not intended to apply to simple text editors that can be used to edit source content, but that include no support for the production of any particular web content technology. In contrast, ATAG 2.0 can apply to more sophisticated source content editors that support the production of specific web content technologies (e.g., with syntax checking, markup prediction, etc.).
  • The link is the latest-version link, so this page should update when the specification graduates to a W3C Recommendation.
Gary Edwards

Cloud file-sharing for enterprise users - 1 views

  • Quick review of different sync-share-store services, starting with Dropbox and ending with three open source services. Very interesting. Things have progressed since I last worked on the SurDocs project for Sursen. No mention in this review of file formats, conversion or viewing issues. I do know that CrocoDoc is used by nearly every sync-share-store service to convert documents to either PDF or HTML formats for viewing. No service, however, has been able to hit the "native document" sweet spot. Not even SurDocs - which was the whole purpose behind the project!!! "Native documents" means that the document is in its native/original application format. That format is needed for the round-tripping and reloading of the document. Although most sync-share-store services work with MS Office OOXML-formatted documents, only Microsoft provides a true "native" format viewer (Office 365). Office 365 enables direct edit, view and collaboration on native documents, which is an enormous advantage given that conversion of any sort is guaranteed to "break" a native document and disrupt any related business processes or round-tripping needs. It was here that SurDoc was to provide a break-through technology. Sadly, we're still waiting :(
    excerpt: The availability of cheap, easy-to-use and accessible cloud file-sharing services means users have more freedom and choice than ever before. Dropbox pioneered simplicity and ease of use, and so quickly picked up users inside the enterprise. Similar services have followed Dropbox's lead and now there are dozens, including well-known ones such as Google Drive, SkyDrive and Ubuntu One. Valdis Filks, research director at analyst firm Gartner, explained the appeal of cloud file-sharing services. Filks said: "Enterprise employees use Dropbox and Google because they are consumer products that are simple to use, can be purchased without officially requesting new infrastructure or budget expenditure, and can be installed qu
  • Odd that the reporter mentions the importance of security near the top of the article but gives the topic such short shrift in his evaluation of the services. For example, "secured by 256-bit AES encryption" is meaningless without discussing other factors such as: [i] who creates the encryption keys and on which side of the server/client divide; and [ii] the service's ability to decrypt the customer's content. Encryption and decryption must be done on the client side using unique keys that are unknown to the service; otherwise security is broken, and if the service does business in the U.S. or any of its territories or possessions, it is subject to gag orders to turn over the decrypted customer information. My wisdom so far is to avoid file sync services to the extent you can, boycott U.S. services until the spy agencies are encaged, and reward services that provide good security from nations with more respect for digital privacy, to give U.S.-based services an incentive to lobby *effectively* on behalf of their customers' privacy in Congress. The proof that they are not doing so is the complete absence of bills in Congress that would deal effectively with the abuse by U.S. spy agencies. From that standpoint, the Switzerland-based http://wuala.com/ file sync service is looking pretty good so far. I'm using it.
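A minimal sketch of the client-side encryption model argued for above, using the standard Web Crypto API: the key is generated in the browser and never sent to the service, so the service stores only ciphertext. The function names and parameters here are illustrative, not any particular service's implementation.

```typescript
// Client-side encryption sketch: the AES key is generated and kept locally,
// so the sync service only ever receives ciphertext.
async function encryptForUpload(
  plaintext: string,
): Promise<{ iv: Uint8Array; ciphertext: ArrayBuffer; key: CryptoKey }> {
  // Key is created on the client and never leaves it.
  const key = await crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    true, // extractable, so the user can back it up locally
    ["encrypt", "decrypt"],
  );
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh nonce per file
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );
  return { iv, ciphertext, key };
}

async function decryptAfterDownload(
  key: CryptoKey,
  iv: Uint8Array,
  ciphertext: ArrayBuffer,
): Promise<string> {
  const plain = await crypto.subtle.decrypt({ name: "AES-GCM", iv }, key, ciphertext);
  return new TextDecoder().decode(plain);
}
```

The trade-off in this model falls on the user: a gag order served on the service yields only ciphertext, but a lost key means the data is gone for good.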
Gary Edwards

Chrome Developer Tools: Remote Debugging - Google Chrome Developer Tools - Google Code - 0 views

  • Incredible. I'm wondering if either Jason or Florian has thought about using the Chrome JSON messaging layer to expose docx conversions to OTXML? Essentially, when Florian breaks down a .docx document, he only deals with the objects and how they are positioned (layout) on a page. Once captured and described, these xObjects could then be converted to JSON. The Chrome web client/web server port (9222) could then, theoretically, be used to observe the JSON xObjects. Interesting.
    intro: Under the hood, Chrome Developer Tools is a web application written in HTML, JavaScript and CSS. It has a special binding available at JavaScript runtime that allows interacting with Chrome pages and instrumenting them. The interaction protocol consists of commands that are sent to the page and events that the page generates. Although Chrome Developer Tools is the only client of this protocol, there are ways for third parties to bypass it and start instrumenting browser pages explicitly. We describe those ways below. Contents: Protocol; Debugging over the wire; Using the debugger extension API.
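A minimal sketch of driving that port directly, assuming Chrome was launched with --remote-debugging-port=9222. The /json target list and the id/method/params command shape are part of the remote debugging protocol described in the intro; the evaluated expression and the "ws" client package are illustrative choices.

```typescript
// Sketch: drive a Chrome tab over the remote debugging port (9222).
// Assumes Chrome was launched with: chrome --remote-debugging-port=9222
// Uses the "ws" package for the WebSocket client (npm install ws); Node 18+.
import WebSocket from "ws";

async function evaluateInFirstTab(expression: string): Promise<void> {
  // 1. List debuggable targets; each entry carries a webSocketDebuggerUrl.
  const targets = await (await fetch("http://localhost:9222/json")).json();
  const ws = new WebSocket(targets[0].webSocketDebuggerUrl);

  ws.on("open", () => {
    // 2. Commands are JSON objects with an id, a method, and params;
    //    the page answers with a message carrying the same id.
    ws.send(JSON.stringify({
      id: 1,
      method: "Runtime.evaluate",
      params: { expression },
    }));
  });

  ws.on("message", (data) => {
    console.log("reply:", data.toString());
    ws.close();
  });
}

evaluateInFirstTab("document.title");
```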
Paul Merrell

Project Summary - 3 views

  • Maqetta is an open source technology initiative at Dojo Foundation that provides WYSIWYG tooling in the cloud for HTML5 (desktop and mobile). Maqetta allows User Experience Designers (UXD) to perform drag/drop assembly of live UI mockups. One of Maqetta's key design goals is to create developer-ready UI mockups that promote efficient hand-off from designers to developers. The user interfaces created by Maqetta are real-life web applications that can be handed off to developers, who can then transform the application incrementally from UI mockup into final shipping application. While we expect that Maqetta-created mockups will often go through major code changes, Maqetta is designed to promote preservation of visual assets, particularly the CSS style sheets, across the development life cycle. As a result, the careful pixel-level styling efforts by the UI team will carry through into the final shipping application. To help with the designer/developer hand-off, Maqetta includes a "download into ZIP" feature to create a ZIP image that can be imported into a developer tool workspace (e.g., Eclipse IDE). For team development, Maqetta includes web-based review-and-commenting features with forum-style comments and on-canvas annotations.
  • Maqetta includes:
    - a WYSIWYG visual page editor for drawing out user interfaces
    - drag/drop mobile UI authoring within an exact-dimension device silhouette, such as the silhouette of an iPhone
    - simultaneous editing in either design or source views
    - deep support for CSS styling (the application includes a full CSS parser/modeler)
    - a mechanism for organizing a UI prototype into a series of "application states" (aka "screens" or "panels"), which allows a UI designer to define interactivity without programming
    - a web-based review and commenting feature where the author can submit a live UI mockup for review by his team members
    - a "wireframing" feature that allows UI designers to create UI proposals that have a hand-drawn look
    - a theme editor for customizing the visual styling of a collection of widgets
    - export options that allow for smooth hand-off of the UI mockups into leading developer tools such as Eclipse
    Maqetta's code base has a toolkit-independent architecture that allows for plugging in arbitrary widget libraries and CSS themes.
Gary Edwards

The Anatomy of an iPhone Site | Build Internet! - 0 views

  • In today's world the internet travels. Not just through laptops and wireless signal, but through a growing number of smart phones. The trick? Getting your site to travel just as well.
    Build to Touch: The iPhone did two things differently. The full browser was a good first, but the second changed the fundamentals of interaction in a new direction. The phone is driven by touch. The best applications and websites have navigations that complement this. Buttons are larger and more accommodating, and interfaces become more intuitive when they seem tactile. For the average web designer, you'll save yourself a significant amount of time and headache by simply giving the site some iPhone-sensitive browser design. Applications must be approved before going live, and can require extensive knowledge of development tools.
Gary Edwards

Adobe Edge beta brings Flash-style design to HTML5 - 2 views

  • While HTML5 developers are working directly with JavaScript, SVG, CSS, and other technologies, Flash designers enjoy a high-level environment with timelines, drawing tools, easy control of animation effects, and more. With Edge, released in beta Sunday, Adobe is striving to bring that same ease of use to HTML5 development. The user interface will be familiar to anyone who's used Flash or After Effects: a timeline allowing scrubbing and jumping to any point in an animation, properties panels to adjust objects, and a panel to show the actual animation. Behind the scenes, Edge uses standard HTML5. Scripting is provided by a combination of jQuery and Adobe's own scripts, and animation and styling use both scripts and CSS. Pages produced by Edge encode the actual animations using a convenient JSON format. Edge itself embeds the WebKit rendering engine - the same one used in Apple's Safari browser and Google's Chrome - to actually display the animations.
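To make the "animations as JSON" idea concrete, here is a sketch of a timeline-driven playback loop: a JSON description of keyframes that a small script plays back by interpolating CSS property values. The format shown is hypothetical, not Edge's actual schema.

```typescript
// Hypothetical timeline format (not Edge's real schema): each track names a
// DOM element, a CSS property, and keyframes as [timeMs, value] pairs.
interface Track { selector: string; property: string; unit: string; keyframes: [number, number][]; }

const timeline: Track[] = [
  // "#logo" is assumed to be an absolutely positioned element on the page.
  { selector: "#logo", property: "left",    unit: "px", keyframes: [[0, 0], [1000, 300]] },
  { selector: "#logo", property: "opacity", unit: "",   keyframes: [[0, 0], [500, 1]] },
];

function play(tracks: Track[]): void {
  const start = performance.now();
  function tick(now: number): void {
    const t = now - start;
    for (const track of tracks) {
      const el = document.querySelector<HTMLElement>(track.selector);
      if (!el) continue;
      // Find the surrounding keyframes and interpolate linearly between them.
      const frames = track.keyframes;
      const clamped = Math.min(t, frames[frames.length - 1][0]);
      let i = 0;
      while (i < frames.length - 1 && frames[i + 1][0] < clamped) i++;
      const [t0, v0] = frames[i];
      const [t1, v1] = frames[Math.min(i + 1, frames.length - 1)];
      const v = t1 === t0 ? v1 : v0 + ((clamped - t0) / (t1 - t0)) * (v1 - v0);
      el.style.setProperty(track.property, `${v}${track.unit}`);
    }
    // Keep animating until every track has played out its last keyframe.
    if (tracks.some((tr) => t < tr.keyframes[tr.keyframes.length - 1][0])) {
      requestAnimationFrame(tick);
    }
  }
  requestAnimationFrame(tick);
}

play(timeline);
```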
Paul Merrell

From Radio to Porn, British Spies Track Web Users' Online Identities - 0 views

  • THERE WAS A SIMPLE AIM at the heart of the top-secret program: Record the website browsing habits of “every visible user on the Internet.” Before long, billions of digital records about ordinary people’s online activities were being stored every day. Among them were details cataloging visits to porn, social media and news websites, search engines, chat forums, and blogs. The mass surveillance operation — code-named KARMA POLICE — was launched by British spies about seven years ago without any public debate or scrutiny. It was just one part of a giant global Internet spying apparatus built by the United Kingdom’s electronic eavesdropping agency, Government Communications Headquarters, or GCHQ. The revelations about the scope of the British agency’s surveillance are contained in documents obtained by The Intercept from National Security Agency whistleblower Edward Snowden. Previous reports based on the leaked files have exposed how GCHQ taps into Internet cables to monitor communications on a vast scale, but many details about what happens to the data after it has been vacuumed up have remained unclear.
  • Amid a renewed push from the U.K. government for more surveillance powers, more than two dozen documents being disclosed today by The Intercept reveal for the first time several major strands of GCHQ’s existing electronic eavesdropping capabilities.
  • The surveillance is underpinned by an opaque legal regime that has authorized GCHQ to sift through huge archives of metadata about the private phone calls, emails and Internet browsing logs of Brits, Americans, and any other citizens — all without a court order or judicial warrant.
  • ...17 more annotations...
  • A huge volume of the Internet data GCHQ collects flows directly into a massive repository named Black Hole, which is at the core of the agency’s online spying operations, storing raw logs of intercepted material before it has been subject to analysis. Black Hole contains data collected by GCHQ as part of bulk “unselected” surveillance, meaning it is not focused on particular “selected” targets and instead includes troves of data indiscriminately swept up about ordinary people’s online activities. Between August 2007 and March 2009, GCHQ documents say that Black Hole was used to store more than 1.1 trillion “events” — a term the agency uses to refer to metadata records — with about 10 billion new entries added every day. As of March 2009, the largest slice of data Black Hole held — 41 percent — was about people’s Internet browsing histories. The rest included a combination of email and instant messenger records, details about search engine queries, information about social media activity, logs related to hacking operations, and data on people’s use of tools to browse the Internet anonymously.
  • Throughout this period, as smartphone sales started to boom, the frequency of people’s Internet use was steadily increasing. In tandem, British spies were working frantically to bolster their spying capabilities, with plans afoot to expand the size of Black Hole and other repositories to handle an avalanche of new data. By 2010, according to the documents, GCHQ was logging 30 billion metadata records per day. By 2012, collection had increased to 50 billion per day, and work was underway to double capacity to 100 billion. The agency was developing “unprecedented” techniques to perform what it called “population-scale” data mining, monitoring all communications across entire countries in an effort to detect patterns or behaviors deemed suspicious. It was creating what it said would be, by 2013, “the world’s biggest” surveillance engine “to run cyber operations and to access better, more valued data for customers to make a real world difference.”
  • A document from the GCHQ target analysis center (GTAC) shows the Black Hole repository’s structure.
  • The data is searched by GCHQ analysts in a hunt for behavior online that could be connected to terrorism or other criminal activity. But it has also served a broader and more controversial purpose — helping the agency hack into European companies’ computer networks. In the lead up to its secret mission targeting Netherlands-based Gemalto, the largest SIM card manufacturer in the world, GCHQ used MUTANT BROTH in an effort to identify the company’s employees so it could hack into their computers. The system helped the agency analyze intercepted Facebook cookies it believed were associated with Gemalto staff located at offices in France and Poland. GCHQ later successfully infiltrated Gemalto’s internal networks, stealing encryption keys produced by the company that protect the privacy of cell phone communications.
  • Similarly, MUTANT BROTH proved integral to GCHQ’s hack of Belgian telecommunications provider Belgacom. The agency entered IP addresses associated with Belgacom into MUTANT BROTH to uncover information about the company’s employees. Cookies associated with the IPs revealed the Google, Yahoo, and LinkedIn accounts of three Belgacom engineers, whose computers were then targeted by the agency and infected with malware. The hacking operation resulted in GCHQ gaining deep access into the most sensitive parts of Belgacom’s internal systems, granting British spies the ability to intercept communications passing through the company’s networks.
  • In March, a U.K. parliamentary committee published the findings of an 18-month review of GCHQ’s operations and called for an overhaul of the laws that regulate the spying. The committee raised concerns about the agency gathering what it described as “bulk personal datasets” being held about “a wide range of people.” However, it censored the section of the report describing what these “datasets” contained, despite acknowledging that they “may be highly intrusive.” The Snowden documents shine light on some of the core GCHQ bulk data-gathering programs that the committee was likely referring to — pulling back the veil of secrecy that has shielded some of the agency’s most controversial surveillance operations from public scrutiny. KARMA POLICE and MUTANT BROTH are among the key bulk collection systems. But they do not operate in isolation — and the scope of GCHQ’s spying extends far beyond them.
  • The agency operates a bewildering array of other eavesdropping systems, each serving its own specific purpose and designated a unique code name, such as: SOCIAL ANTHROPOID, which is used to analyze metadata on emails, instant messenger chats, social media connections and conversations, plus “telephony” metadata about phone calls, cell phone locations, text and multimedia messages; MEMORY HOLE, which logs queries entered into search engines and associates each search with an IP address; MARBLED GECKO, which sifts through details about searches people have entered into Google Maps and Google Earth; and INFINITE MONKEYS, which analyzes data about the usage of online bulletin boards and forums. GCHQ has other programs that it uses to analyze the content of intercepted communications, such as the full written body of emails and the audio of phone calls. One of the most important content collection capabilities is TEMPORA, which mines vast amounts of emails, instant messages, voice calls and other communications and makes them accessible through a Google-style search tool named XKEYSCORE.
  • As of September 2012, TEMPORA was collecting “more than 40 billion pieces of content a day” and it was being used to spy on people across Europe, the Middle East, and North Africa, according to a top-secret memo outlining the scope of the program. The existence of TEMPORA was first revealed by The Guardian in June 2013. To analyze all of the communications it intercepts and to build a profile of the individuals it is monitoring, GCHQ uses a variety of different tools that can pull together all of the relevant information and make it accessible through a single interface. SAMUEL PEPYS is one such tool, built by the British spies to analyze both the content and metadata of emails, browsing sessions, and instant messages as they are being intercepted in real time. One screenshot of SAMUEL PEPYS in action shows the agency using it to monitor an individual in Sweden who visited a page about GCHQ on the U.S.-based anti-secrecy website Cryptome.
  • Partly due to the U.K.’s geographic location — situated between the United States and the western edge of continental Europe — a large amount of the world’s Internet traffic passes through its territory across international data cables. In 2010, GCHQ noted that what amounted to “25 percent of all Internet traffic” was transiting the U.K. through some 1,600 different cables. The agency said that it could “survey the majority of the 1,600” and “select the most valuable to switch into our processing systems.”
  • According to Joss Wright, a research fellow at the University of Oxford’s Internet Institute, tapping into the cables allows GCHQ to monitor a large portion of foreign communications. But the cables also transport masses of wholly domestic British emails and online chats, because when anyone in the U.K. sends an email or visits a website, their computer will routinely send and receive data from servers that are located overseas. “I could send a message from my computer here [in England] to my wife’s computer in the next room and on its way it could go through the U.S., France, and other countries,” Wright says. “That’s just the way the Internet is designed.” In other words, Wright adds, that means “a lot” of British data and communications transit across international cables daily, and are liable to be swept into GCHQ’s databases.
  • A map from a classified GCHQ presentation about intercepting communications from undersea cables.
    GCHQ is authorized to conduct dragnet surveillance of the international data cables through so-called external warrants that are signed off by a government minister. The external warrants permit the agency to monitor communications in foreign countries as well as British citizens’ international calls and emails — for example, a call from Islamabad to London. They prohibit GCHQ from reading or listening to the content of “internal” U.K. to U.K. emails and phone calls, which are supposed to be filtered out from GCHQ’s systems if they are inadvertently intercepted unless additional authorization is granted to scrutinize them. However, the same rules do not apply to metadata. A little-known loophole in the law allows GCHQ to use external warrants to collect and analyze bulk metadata about the emails, phone calls, and Internet browsing activities of British people, citizens of closely allied countries, and others, regardless of whether the data is derived from domestic U.K. to U.K. communications and browsing sessions or otherwise. In March, the existence of this loophole was quietly acknowledged by the U.K. parliamentary committee’s surveillance review, which stated in a section of its report that “special protection and additional safeguards” did not apply to metadata swept up using external warrants and that domestic British metadata could therefore be lawfully “returned as a result of searches” conducted by GCHQ.
  • Perhaps unsurprisingly, GCHQ appears to have readily exploited this obscure legal technicality. Secret policy guidance papers issued to the agency’s analysts instruct them that they can sift through huge troves of indiscriminately collected metadata records to spy on anyone regardless of their nationality. The guidance makes clear that there is no exemption or extra privacy protection for British people or citizens from countries that are members of the Five Eyes, a surveillance alliance that the U.K. is part of alongside the U.S., Canada, Australia, and New Zealand. “If you are searching a purely Events only database such as MUTANT BROTH, the issue of location does not occur,” states one internal GCHQ policy document, which is marked with a “last modified” date of July 2012. The document adds that analysts are free to search the databases for British metadata “without further authorization” by inputting a U.K. “selector,” meaning a unique identifier such as a person’s email or IP address, username, or phone number. Authorization is “not needed for individuals in the U.K.,” another GCHQ document explains, because metadata has been judged “less intrusive than communications content.” All the spies are required to do to mine the metadata troves is write a short “justification” or “reason” for each search they conduct and then click a button on their computer screen.
  • Intelligence GCHQ collects on British persons of interest is shared with domestic security agency MI5, which usually takes the lead on spying operations within the U.K. MI5 conducts its own extensive domestic surveillance as part of a program called DIGINT (digital intelligence).
  • GCHQ’s documents suggest that it typically retains metadata for periods of between 30 days and six months. It stores the content of communications for a shorter period of time, varying between three and 30 days. The retention periods can be extended if deemed necessary for “cyber defense.” One secret policy paper dated from January 2010 lists the wide range of information the agency classes as metadata — including location data that could be used to track your movements, your email, instant messenger, and social networking “buddy lists,” logs showing who you have communicated with by phone or email, the passwords you use to access “communications services” (such as an email account), and information about websites you have viewed.
  • Records showing the full website addresses you have visited — for instance, www.gchq.gov.uk/what_we_do — are treated as content. But the first part of an address you have visited — for instance, www.gchq.gov.uk — is treated as metadata. In isolation, a single metadata record of a phone call, email, or website visit may not reveal much about a person’s private life, according to Ethan Zuckerman, director of Massachusetts Institute of Technology’s Center for Civic Media. But if accumulated and analyzed over a period of weeks or months, these details would be “extremely personal,” he told The Intercept, because they could reveal a person’s movements, habits, religious beliefs, political views, relationships, and even sexual preferences. For Zuckerman, who has studied the social and political ramifications of surveillance, the most concerning aspect of large-scale government data collection is that it can be “corrosive towards democracy” — leading to a chilling effect on freedom of expression and communication. “Once we know there’s a reasonable chance that we are being watched in one fashion or another it’s hard for that not to have a ‘panopticon effect,’” he said, “where we think and behave differently based on the assumption that people may be watching and paying attention to what we are doing.”
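In code terms, the line drawn here corresponds roughly to splitting a visited URL into its origin (treated as metadata) and its full address (treated as content); a toy illustration:

```typescript
// Toy illustration of the content/metadata distinction described above:
// the full address is classed as content, the site alone as metadata.
function classify(visited: string): { metadata: string; content: string } {
  const url = new URL(visited);
  return {
    metadata: url.origin, // e.g. "https://www.gchq.gov.uk"
    content: visited,     // e.g. "https://www.gchq.gov.uk/what_we_do"
  };
}

console.log(classify("https://www.gchq.gov.uk/what_we_do"));
```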
  • When compared to surveillance rules in place in the U.S., GCHQ notes in one document that the U.K. has “a light oversight regime.” The more lax British spying regulations are reflected in secret internal rules that highlight greater restrictions on how NSA databases can be accessed. The NSA’s troves can be searched for data on British citizens, one document states, but they cannot be mined for information about Americans or other citizens from countries in the Five Eyes alliance. No such constraints are placed on GCHQ’s own databases, which can be sifted for records on the phone calls, emails, and Internet usage of Brits, Americans, and citizens from any other country. The scope of GCHQ’s surveillance powers explains in part why Snowden told The Guardian in June 2013 that U.K. surveillance is “worse than the U.S.” In an interview with Der Spiegel in July 2013, Snowden added that British Internet cables were “radioactive” and joked: “Even the Queen’s selfies to the pool boy get logged.”
  • In recent years, the biggest barrier to GCHQ’s mass collection of data does not appear to have come in the form of legal or policy restrictions. Rather, it is the increased use of encryption technology that protects the privacy of communications that has posed the biggest potential hindrance to the agency’s activities. “The spread of encryption … threatens our ability to do effective target discovery/development,” says a top-secret report co-authored by an official from the British agency and an NSA employee in 2011. “Pertinent metadata events will be locked within the encrypted channels and difficult, if not impossible, to prise out,” the report says, adding that the agencies were working on a plan that would “(hopefully) allow our Internet Exploitation strategy to prevail.”
Paul Merrell

IBM Launches Maqetta HTML5 Tool as Open-Source Answer to Flash, Silverlight - Applicati... - 0 views

  • IBM announced Maqetta, an HTML5 authoring tool for building desktop and mobile user interfaces, and also announced the contribution of the open-source technology to the Dojo Foundation.
Paul Merrell

The People and Tech Behind the Panama Papers - Features - Source: An OpenNews project - 0 views

  • Then we put the data up, but the problem with Solr was it didn’t have a user interface, so we used Project Blacklight, which is open source software normally used by librarians. We used it for the journalists. It’s simple because it allows you to do faceted search—so, for example, you can facet by the folder structure of the leak, by years, by type of file. There were more complex things—it supports queries in regular expressions, so the more advanced users were able to search for documents with a certain pattern of numbers that, for example, passports use. You could also preview and download the documents. ICIJ open-sourced the code of our document processing chain, created by our web developer Matthew Caruana Galizia. We also developed a batch-searching feature. So say you were looking for politicians in your country—you just run it through the system, and you upload your list to Blacklight and you would get a CSV back saying yes, there are matches for these names—not only exact matches, but also matches based on proximity. So you would say “I want Mar Cabra proximity 2” and that would give you “Mar Cabra,” “Mar whatever Cabra,” “Cabra, Mar,”—so that was good, because very quickly journalists were able to see… I have this list of politicians and they are in the data!
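The proximity matching described above maps naturally onto Solr's phrase-slop operator, where "Mar Cabra"~2 also matches "Cabra, Mar" or "Mar <something> Cabra". A minimal sketch of such a batch search, assuming a local Solr core named "leak" and a text field named "content" (both names are illustrative):

```typescript
// Sketch: batch-match a list of names against a Solr index using
// proximity (slop) queries. Core name ("leak") and field ("content")
// are assumptions for illustration. Node 18+ for global fetch.
async function batchSearch(names: string[]): Promise<void> {
  for (const name of names) {
    const q = `content:"${name}"~2`; // phrase query with a slop of 2
    const url = `http://localhost:8983/solr/leak/select?q=${encodeURIComponent(q)}&rows=0`;
    const res = await (await fetch(url)).json();
    // numFound tells us whether the name appears anywhere in the corpus.
    console.log(`${name}: ${res.response.numFound} matching documents`);
  }
}

batchSearch(["Mar Cabra", "John Doe"]);
```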
  • Last Sunday, April 3, the first stories emerging from the leaked dataset known as the Panama Papers were published by a global partnership of news organizations working in coordination with the International Consortium of Investigative Journalists, or ICIJ. As we begin the second week of reporting on the leak, Iceland’s Prime Minister has been forced to resign, Germany has announced plans to end anonymous corporate ownership, governments around the world launched investigations into wealthy citizens’ participation in tax havens, the Russian government announced that the investigation was an anti-Putin propaganda operation, and the Chinese government banned mentions of the leak in Chinese media. As the ICIJ-led consortium prepares for its second major wave of reporting on the Panama Papers, we spoke with Mar Cabra, editor of ICIJ’s Data & Research unit and lead coordinator of the data analysis and infrastructure work behind the leak. In our conversation, Cabra reveals ICIJ’s years-long effort to build a series of secure communication and analysis platforms in support of genuinely global investigative reporting collaborations.
  • For communication, we have the Global I-Hub, which is a platform based on open source software called Oxwall. Oxwall is a social network, like Facebook: when you log in, it has a wall with the latest from your network, plus forum topics and links, and you can share files and chat with people in real time.
  • ...3 more annotations...
  • We had the data in a relational database format in SQL, and thanks to ETL (Extract, Transform, and Load) software Talend, we were able to easily transform the data from SQL to Neo4j (the graph-database format we used). Once the data was transformed, it was just a matter of plugging it into Linkurious, and in a couple of minutes, you have it visualized—in a networked way, so anyone can log in from anywhere in the world. That was another reason we really liked Linkurious and Neo4j—they’re very quick when representing graph data, and the visualizations were easy to understand for everybody. The not-very-tech-savvy reporter could expand the dots like magic, and more technically expert reporters and programmers could use the Neo4j query language, Cypher, to do more complex queries, like show me everybody within two degrees of separation of this person, or show me all the connected dots…
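A minimal sketch of the "two degrees of separation" query mentioned above, issued through the official neo4j-driver package; the node label, property name, and connection details are assumptions for illustration.

```typescript
// Sketch: "show me everybody within two degrees of separation of this person"
// as a Cypher query, run through the official neo4j-driver package.
// Label (Person), property (name), and credentials are illustrative.
import neo4j from "neo4j-driver";

async function twoDegrees(name: string): Promise<void> {
  const driver = neo4j.driver("bolt://localhost:7687", neo4j.auth.basic("neo4j", "secret"));
  const session = driver.session();
  try {
    const result = await session.run(
      // The variable-length pattern [*1..2] walks one or two relationships out.
      `MATCH (p:Person {name: $name})-[*1..2]-(connected)
       RETURN DISTINCT connected.name AS name`,
      { name },
    );
    for (const record of result.records) {
      console.log(record.get("name"));
    }
  } finally {
    await session.close();
    await driver.close();
  }
}

twoDegrees("Mar Cabra");
```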
  • We believe in open source technology and try to use it as much as possible. We used Apache Solr for the indexing and Apache Tika for document processing, and it’s great because it processes dozens of different formats and it’s very powerful. Tika interacts with Tesseract, so we did the OCRing on Tesseract. To OCR the images, we created an army of 30–40 temporary servers in Amazon that allowed us to process the documents in parallel and do parallel OCR-ing. If it was very slow, we’d increase the number of servers—if it was going fine, we would decrease because of course those servers have a cost.
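For pipelines like this, Tika is commonly run as a standalone server (tika-server listens on port 9998 by default) that returns extracted text for whatever document you PUT at it, routing images through Tesseract when OCR is configured. A minimal sketch, with an illustrative file path:

```typescript
// Sketch: send a document to a running Tika server (default port 9998)
// and get back the extracted plain text. With Tesseract installed, Tika
// routes images through OCR the same way. File path is illustrative.
// Node 18+ for global fetch.
import { readFile } from "node:fs/promises";

async function extractText(path: string): Promise<string> {
  const body = await readFile(path);
  const res = await fetch("http://localhost:9998/tika", {
    method: "PUT",
    headers: { Accept: "text/plain" }, // ask Tika for plain-text output
    body,
  });
  return res.text();
}

extractText("./documents/passport-scan.png").then(console.log);
```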
  • For the visualization of the Mossack Fonseca internal database, we worked with another tool called Linkurious. It’s not open source, it’s licensed software, but we have an agreement with them, and they allowed us to work with it. It allows you to represent data in graphs. We had a version of Linkurious on our servers, so no one else had the data. It was pretty intuitive—journalists had to click on dots that expanded, basically, and could search the names.
Paul Merrell

RDFa API - 0 views

  • RDFa API: An API for extracting structured data from Web documents. W3C Working Draft, 08 June 2010.
  • RDFa [RDFA-CORE] enables authors to publish structured information that is both human- and machine-readable. Concepts that have traditionally been difficult for machines to detect, like people, places, events, music, movies, and recipes, are now easily marked up in Web documents. While publishing this data is vital to the growth of Linked Data, using the information to improve the collective utility of the Web for humankind is the true goal. To accomplish this goal, it must be simple for Web developers to extract and utilize structured information from a Web document. This document details such a mechanism; an RDFa Document Object Model Application Programming Interface (RDFa DOM API) that allows simple extraction and usage of structured information from a Web document.
  • This document is a detailed specification for an RDFa DOM API. The document is primarily intended for the following audiences:
    - User Agent developers that are providing a mechanism to programmatically extract RDF Triples from RDFa in a host language such as XHTML+RDFa [XHTML-RDFA], HTML+RDFa [HTML-RDFA] or SVG Tiny 1.2 [SVGTINY12],
    - DOM tool developers that want to provide a mechanism for extracting RDFa content via programming languages such as JavaScript, Python, Ruby, or Perl, and
    - Developers that want to understand the inner workings and design criteria for the RDFa DOM API.
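The draft defines its own DOM interfaces, but the underlying idea (walking the DOM for RDFa attributes and pulling out subject/property/value triples) can be sketched with plain DOM calls. This simplified version ignores prefix resolution, chaining, and most of the RDFa processing rules:

```typescript
// Simplified sketch of RDFa-style extraction using only standard DOM calls.
// Real RDFa processing handles prefixes, chaining, and datatypes; this
// only pulls out the simplest subject/property/value patterns.
interface Triple { subject: string; property: string; value: string; }

function extractSimpleTriples(root: Document): Triple[] {
  const triples: Triple[] = [];
  root.querySelectorAll("[property]").forEach((el) => {
    // The nearest ancestor with an "about" attribute names the subject;
    // otherwise the document itself is the subject.
    const subjectEl = el.closest("[about]");
    const subject = subjectEl?.getAttribute("about") ?? root.documentURI;
    const property = el.getAttribute("property")!;
    // A "content" attribute overrides the element's text as the literal value.
    const value = el.getAttribute("content") ?? el.textContent ?? "";
    triples.push({ subject, property, value });
  });
  return triples;
}

// e.g. <div about="#spiderman"><span property="foaf:name">Peter Parker</span></div>
console.log(extractSimpleTriples(document));
```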