Home/ Open Web/ Group items tagged Tagged-PDF

Gary Edwards

Pragmatic PDF: Structured Content: PDF to HTML - 1 views

  •  
    A while back I included the following as one of the areas of interest of the PDF/D Consortium: Structured Documents and Single Sourcing: improving round-trips to document software.
    What did I mean by Structured Documents? For years Solid Documents has been converting PDF files to Word documents with a focus on retaining format and layout to allow customers to repurpose the content. While this is a great solution for a large number of customers, it is not the only type of reconstruction that is interesting. PDF is by nature a "document" format: the layout is in the form of pages. Content also needs to exist in alternate formats like a continuously flowing stream. Use cases for continuously flowing content include:
      • conversion to HTML to reflow for form factors other than "pages"
      • conversion to content management systems where structure is more important than layout and formatting
      • conversion for alternate readers for people with disabilities (text to speech, etc.)
    Reconstruction for these use cases focuses more on the structure of the document than on the layout and formatting. For example, we need to take unstructured PDF files and recognize columns, tables, lists, headers and footers, etc. This allows us to organize the content in a logical structure. Ultimately, we'll recognize topics and sections too so that we can produce logical hierarchies from plain old non-tagged PDF files. One great example of where conventional PDF pages are not the most appropriate way to read a document is the small screen of a handheld device. For example, the typical Blackberry has a 3"x2" screen with a resolution something like 320x240 pixels.
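    The reflow use case above can be sketched in Python. This is only an illustration, not Solid Documents' method: it assumes the structure recognition has already happened and produced a list of (kind, text) pairs, and the kind names ("heading", "paragraph", "list_item") and the function name `blocks_to_html` are hypothetical.

```python
from html import escape

# Hypothetical block kinds that carry content; anything else
# (page headers, footers, page numbers) is treated as layout and dropped.
CONTENT_KINDS = {"heading", "paragraph", "list_item"}

def blocks_to_html(blocks):
    """Render recognized (kind, text) blocks as flowing HTML, ignoring pages."""
    parts = []
    in_list = False
    for kind, text in blocks:
        if kind not in CONTENT_KINDS:
            continue  # skip page furniture without breaking an open list
        if kind == "list_item":
            if not in_list:
                parts.append("<ul>")
                in_list = True
            parts.append(f"<li>{escape(text)}</li>")
            continue
        if in_list:
            parts.append("</ul>")
            in_list = False
        tag = "h2" if kind == "heading" else "p"
        parts.append(f"<{tag}>{escape(text)}</{tag}>")
    if in_list:
        parts.append("</ul>")
    return "\n".join(parts)

if __name__ == "__main__":
    sample = [
        ("heading", "Use cases"),
        ("list_item", "reflow for small screens"),
        ("page_footer", "Page 3"),
        ("list_item", "content management & search"),
        ("paragraph", "Structure matters more than layout here."),
    ]
    print(blocks_to_html(sample))
```

    Because the output carries structure rather than page geometry, a browser on a small screen can reflow it to whatever width is available.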
Gary Edwards

Death of The Document - CIO Central - CIO Network - Forbes - 0 views

  •  
    Well, not quite. More IBM happy talk about interoperability and easy document interchange. While I agree with the static versus interactive - collaborative document perspective, it's far more complicated. Today we have a world of "native" docs and "visual" docs. Native docs are bound to their authoring productivity environment, and are stubbornly NOT interchangeable. Even for ODF and OOXML formats. Visual documents are spun from natives, and they are highly interchangeable, but interactively limited. They lack the direct interaction of native authoring environments. The Visual document phenomenon starts with PDF and the virtual print driver. Any authoring application(s) in a productivity environment can print a PDF using the magic of the virtual print driver. In 2008, when ISO stamped PDF with "accessibility tags", a new, highly interactive version of PDF was officially recognized. We know this as "Tagged PDF". And it has led the sweeping revolution of wide implementation of the paperless transaction process. The Visual Document phenomenon doesn't stop there. The highly mobile WebKit revolution ushered in by the 2008 iPhone phenomenon led to wide acceptance of highly interactive and collaborative, but richly visual versions of SVG and HTML5-CSS3-JSON-JavaScript documents. Today we have SVG-HTML+ type visually immersive documents spun out of Server side publication presses such as FlipBoard, Cognito cComics, QWiki, Needle, Sports Illustrated, Push Pop Press, and TreeSaver to name but a few. Clearly the visually immersive category of documents is exploding, but not for business - productivity documents. Adobe has proposed a "CSS Regions" standard for richly immersive layout that might change that. But mostly I think the problem for business documents, reports and forms is that they are "compound documents" bound to desktop productivity environments and workgroups. The great transition from desktop/workgroup productivity environme
Paul Merrell

The best way to read Glenn Greenwald's 'No Place to Hide' - 0 views

  • Journalist Glenn Greenwald just dropped a pile of new secret National Security Agency documents onto the Internet. But this isn’t just some haphazard WikiLeaks-style dump. These documents, leaked to Greenwald last year by former NSA contractor Edward Snowden, are key supplemental reading material for his new book, No Place to Hide, which went on sale Tuesday. Now, you could just go buy the book in hardcover and read it like you would any other nonfiction tome. Thanks to all the additional source material, however, if any work should be read on an e-reader or computer, this is it. Here are all the links and instructions for getting the most out of No Place to Hide.
  • Greenwald has released two versions of the accompanying NSA docs: a compressed version and an uncompressed version. The only difference between these two is the quality of the PDFs. The uncompressed version clocks in at over 91MB, while the compressed version is just under 13MB. For simple reading purposes, just go with the compressed version and save yourself some storage space. Greenwald also released additional “notes” for the book, which are just citations. Unless you’re doing some scholarly research, you can skip this download.
  • No Place to Hide is, of course, available on a wide variety of ebook formats—all of which are a few dollars cheaper than the hardcover version, I might add. Pick your e-poison: Amazon, Nook, Kobo, iBooks. Flipping back and forth: Each page of the documents includes a corresponding page number for the book, to allow readers to easily flip between the book text and the supporting documents. If you use the Amazon Kindle version, you also have the option of reading Greenwald’s book directly on your computer using the Kindle for PC app or directly in your browser. Yes, that may be the worst way to read a book. In this case, however, it may be the easiest way to flip back and forth between the book text and the notes and supporting documents. Of course, you can do the same on your e-reader—though it can be a bit of a pain. Those of you who own a tablet are in luck, as they provide the best way to read both ebooks and PDF files. Simply download the book using the e-reader app of your choice, download the PDFs from Greenwald’s website, and dig in. If you own a Kindle, Nook, or other ereader, you may have to convert the PDFs into a format that works well with your device. The Internet is full of tools and how-to guides for how to do this. Here’s one:
  • Kindle users also have the option of using Amazon’s Whispernet service, which converts PDFs into a format that functions best on the company’s e-reader. That will cost you a small fee, however—$0.15 per megabyte, which means the compressed Greenwald docs will cost you a whopping $1.95.
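    The fee arithmetic quoted above can be checked with a short Python sketch. The function name `conversion_cost` is hypothetical, and the sizes are the article's rounded figures (just under 13MB compressed, over 91MB uncompressed), so the results are approximate bounds rather than exact charges.

```python
from decimal import Decimal

# Per-megabyte fee quoted in the article.
RATE_PER_MB = Decimal("0.15")

def conversion_cost(size_mb):
    """Return the conversion fee in dollars for a file of size_mb megabytes."""
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    return RATE_PER_MB * Decimal(str(size_mb))

if __name__ == "__main__":
    print(conversion_cost(13))  # compressed docs, rounded up to 13MB
    print(conversion_cost(91))  # uncompressed docs, rounded down to 91MB
```

    At the quoted rate, the roughly 13MB compressed download comes to about $1.95, which matches the article, while the 91MB uncompressed version would cost at least $13.65.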
Gary Edwards

HTML5, Cloud and Mobile Create 'Perfect Storm' for Major App Dev Shift - Application De... - 0 views

  •  
    Good discussion, but it really deserves a more in-depth thrashing. The basic concept is that a perfect storm of mobility, cloud-computing and HTML5-JavaScript has set the stage for a major, massive shift in application development. The shift from C++ to Java is now being replaced by a greater shift from Java and C++ to JavaScript-JSON-HTML5. Interesting, but I continue to insist that the greater "Perfect Storm" triggered in 2008 is causing a platform shift from client/server computing to full on, must have "cloud-computing". There are three major "waves", or platform shifts, in the history of computing at work here. The first wave was "Mainframe computing", otherwise known as server/terminal. The second wave was that of "client/server" computing, where the Windows desktop eventually came to totally dominate and control the "client" side of the client/server equation. The third wave began with the Internet, and the dominance of the WWW protocols, interfaces, methods and formats. The Web provides the foundation for the third great Wave of Cloud-Computing. The Perfect Storm of 2008 lit the fuse of the third Wave of computing. Key to the 2008 Perfect Storm is the world wide financial collapse that put enormous pressure on businesses to cut cost and improve productivity; to do more with less, or die. The survival maxim quickly became do more with fewer people - which is the most effective form of "productivity". The nature of the collapse itself, and the kind of centralized, all powerful bailout-fascist governments that rose during the financial collapse, guaranteed that labor costs would rise dramatically while also being "uncertain". Think government controlled healthcare. The other aspects of the 2008 Perfect Storm are mobility, HTML5, cloud-computing platform availability, and the ISO standardization of "tagged" PDF. The mobility bomb kicked off in late 2007, with the introduction of the Apple iPhone. No further explanation needed :) Th
Gary Edwards

OfficeDrop: Digital Filing System, Scanner Software - 0 views

  •  
    Tagged PDF, Scanning software that connects devices directly to the OfficeDrop Cloud, Sharing of folders and documents, attaching forms to documents, Intuit add-on, eMail addresses for folders, and, an automated mail system for sending out bills and payments.  Awesome!  Way beyond DropBox, but same idea ported over to Tagged PDF forms.
Gary Edwards

Bricolage Structured Prediction Algorithm - 0 views

  •  
    I was surprised to learn that Florian's native document parser is a JSON-like ripper of OpenXML visual objects. He doesn't wrestle with structured objects, but simply treats everything as a visual object. NOOXML might be closer to a virtual print driver than an OpenXML ripper. So this has me rethinking the OCR/Scan methods used to rip paper documents to create Tagged PDF "structured object" versions. Structured objects can easily be converted to interactive HTML-CSS or SVG. Today Google released an OCR enhanced Android gDOCS app. Not sure if it uses the Bricolage/Bento algorithm, but that would be an interesting approach. Excerpt: the Bricolage algorithm for transferring design and content between Web pages. Bricolage employs a novel, structured-prediction technique that learns to create coherent mappings between pages by training on human-generated exemplars. The produced mappings are then used to automatically transfer the content from one page into the style and layout of another. We show that Bricolage can learn to accurately reproduce human page mappings, and that it provides a general, efficient, and automatic technique for retargeting content between a variety of real Web pages.
Gary Edwards

RealObjects: Next Generation HTML-CSS Online Editor - 1 views

  •  
    Advanced XML, HTML5, XHTML CSS3 editing with conversion to PDF, PDF/A and SVG.  Excellent stuff.  Good Case Studies.  Lots of tools and document source code examples.
Gary Edwards

How To Win The Cloud Wars - Forbes - 0 views

  •  
    Byron Deeter is right, but perhaps he's holding back on his reasoning. Silicon Valley is all about platform, and platform plays only come about once every ten to twenty years. They come like great waves of change, not replacing the previous waves as much as taking away and running with the future. Cloud Computing is the fourth great wave. It will replace the PC and Network Computing waves as the future. It is the target of all developers and entrepreneurs. The four great waves are mainframe, workstation, pc and networked pc, and the Internet. Cloud Computing takes the Internet to such a high level of functionality that it will now replace the pc-networking wave. It's going to be enormous. Especially as enterprises move their business productivity and data / content apps from the desktop/workgroup to the Cloud. Enormous. The key was the perfect storm of 2008, where mobility (iPhone) converged with the standardization of tagged PDF, which converged with the Cloud Computing application and data model, which all happened at the time of the great financial collapse. The financial collapse of 2008 caused a tectonic shift in productivity. Survival meant doing more with less. Particularly less labor since cost of labor was and continues to be a great uncertainty. But that's also the definition of productivity and automation. To survive, companies were compelled to reduce labor and invest in software/hardware systems based productivity. The great leap to a new platform had its fuel: survival. Social applications and services are just the simplest manifestation of productivity through managed connectivity in the Cloud. Wait until this new breed of productivity reaches business apps! The platform wars have begun, and it's for all the marbles. One last thought. The Internet was always going to win as the next computing platform wave. It's the first time communications have been combined and integrated into content, and vast dat
Gary Edwards

Free desktop productivity tools that aren't OpenOffice - 0 views

  •  
    Good discussion and review of office suites, desktop publishing tools, word processors, time management, and drawing/illustration tools. Covers: AbiWord, Scribus, SeaMonkey, GIMP, Paint.net, InkScape, Dia, GTD-Free (Getting Things Done), and Task Coach. +1 for InkScape Tagged PDF editing and Paint.net as an Adobe Illustrator replacement.
Paul Merrell

Study: Surveillance will cost US tech sector more than $35B by 2016 | TheHill - 0 views

  • A new study says that the U.S. tech industry is likely to lose more than $35 billion from foreign customers by 2016 because of concerns over government surveillance. “In short, foreign customers are shunning U.S. companies,” the authors of a new study from the Information Technology and Innovation Foundation write. “The U.S. government’s failure to reform many of the NSA’s surveillance programs has damaged the competitiveness of the U.S. tech sector and cost it a portion of the global market share,” they said. The think tank’s report found that the cost to the tech sector associated with ongoing concerns over surveillance programs run out of the U.S. was likely to “far exceed” $35 billion by 2016, an earlier estimate set by the group.
  • The group said that lawmakers must enact additional reforms to surveillance policy if they wish to help the tech sector regain the trust of foreign customers. That includes opposing “backdoors,” which allow law enforcement to access otherwise encrypted data, and signing off on trade agreements, including the controversial Trans-Pacific Partnership, that “ban digital protectionism.” The study’s authors found that the revelations about broad U.S. surveillance programs acted as a justification for foreign policymakers to enact protectionist policies aimed at aiding their own domestic technology sectors. Foreign companies have also used the information about U.S. surveillance programs to their advantage. “Some European companies have begun to highlight where their digital services are hosted as an alternative to U.S. companies,” the authors write.
  • American companies, they found, have lost contracts to foreign competitors over fears about mass surveillance. Earlier this month, President Obama signed the USA Freedom Act, a bill that reformed the three Patriot Act provisions that authorized the bulk, warrantless collection of Americans’ phone records. The bill was widely supported by technology companies, including giants like Apple and Google.
Paul Merrell

Court upholds NSA snooping | TheHill - 0 views

  • A district court in California has issued a ruling in favor of the National Security Agency in a long-running case over the spy agency’s collection of Internet records. The challenge against the controversial Upstream program was tossed out because additional defense from the government would have required “impermissible disclosure of state secret information,” Judge Jeffrey White wrote in his decision. Under the program — details of which were revealed through leaks from Edward Snowden and others — the NSA taps into the fiber cables that make up the backbone of the Internet and gathers information about people's online and phone communications. The agency then filters out communications of U.S. citizens, whose data is protected with legal defenses not extended to foreigners, and searches for “selectors” tied to a terrorist or other target. In 2008, the Electronic Frontier Foundation (EFF) sued the government over the program on behalf of five AT&T customers, who said that the collection violated the constitutional protections to privacy and free speech.
  • But “substantial details” about the program still remain classified, White, an appointee under former President George W. Bush, wrote in his decision. Moving forward with the merits of a trial would risk “exceptionally grave damage to national security,” he added. The government has been “persuasive” in using its state secrets privilege, he continued, which allows it to withhold evidence from a case that could severely jeopardize national security. In addition to saying that the program appeared constitutional, the judge also found that the AT&T customers did not even have the standing to sue the NSA over its data gathering. While they may be AT&T customers, White wrote that the evidence presented to the court was “insufficient to establish that the Upstream collection process operates in the manner” that they say it does, which makes it impossible to tell if their information was indeed collected in the NSA program. The decision is a stinging rebuke to critics of the NSA, who have seen public interest in their cause slowly fade in the months since Snowden’s revelations.
  • The EFF on Tuesday evening said that it was considering next steps and noted that the court focused on just one program, not the totality of the NSA’s controversial operations. “It would be a travesty of justice if our clients are denied their day in court over the ‘secrecy’ of a program that has been front-page news for nearly a decade,” the group said in a statement. “We will continue to fight to end NSA mass surveillance.” The name of the case is Jewel v. NSA.
  •  
    The article should have mentioned that the decision was on cross-motions for *partial* summary judgment. The Jewel case will proceed on other plaintiff claims. 
Gary Edwards

Target Survey - the Open Siddur Project Development Wiki - 0 views

  •  
    The ultimate goals are to have a computer-viewable display format (XHTML) and at least one printable format. We may also want a post-processing editable format. Our farthest target as yet is XHTML, styled by CSS. For a printed format, one expects a complete target to be able to produce a document that has features which one would expect of any Siddur: page numbers, table of contents, footnotes, side notes, header/page title, etc. XHTML originated as a computer-display format, not a publishing format. Even when combined with CSS 2.1, it does not support some of the features above (with some hacking, side notes, a static header/footer, and page numbers are possible, but it is still missing vital features). CSS3 is more publishing friendly and, when implemented, will make life much easier. Until then, we will have to be a bit more creative. The following is a list of software libraries and formats that can help us increase the range of formats that we can target. XSLT or Java are the preferred languages, since the rest of our chain is in XSLT, and driven by Saxon, which is written in Java, allowing us to bundle the entire chain in a portable program, which can be distributed (with the added bonus of being able to be distributed within a web browser as an applet).
Gary Edwards

Adobe proposes standard for magazine-like Web | Deep Tech - CNET News - 0 views

  •  
    Adobe Systems has proposed a standard that could make it easier to create Web pages with fancy layouts seen more often in magazines. The company proposed a technology it calls CSS Regions (PDF) yesterday to the World Wide Web Consortium, which standardizes the Cascading Style Sheets technology widely used to control formatting on Web pages. Adobe also described the technology at a CSS Working Group meeting in Silicon Valley. "This proposal is intended to support sophisticated, magazine-style layouts using CSS," said Arno Gourdol, director of engineering for runtime foundation at Adobe, in a mailing list posting.
Gary Edwards

Mars:FAQ - Adobe Labs - 0 views

    • Gary Edwards
       
      Sounds like docubase "layers" to me.
  • auxiliary content
  • document assembly and disassembly b
    • Gary Edwards
       
      The Acrobat 8 Reader can read Tagged PDF, MARS and Flash.  Flash uses SWF-FLA, a proprietary version of SVG.  Funny they would use SVG (with namespace customization) for MARS.
  • Anyone over the age of 18, or minors with parental permission, can
  • ocument.
  • create a Mars d
    • Gary Edwards
       
      Wow, anyone can create a MARS document.  Even OpenOffice?  How about Florian's NOOXML Trellis?