Created by Marcel Ramaker
Modified by Marcel Ramaker, Mar 1, 2009 5:48 PM

I've configured the document converter as explained by Brent in his wiki. Whether started from the console or at boot time, soffice is bound to port 8100 and listening.
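
One way to confirm that soffice really accepts connections on that port is a small socket probe. This is only a sketch: the port 8100 comes from the setup described here, while the helper name `is_listening` and the default timeout are assumptions made for illustration.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts connections on the given address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_listening("localhost", 8100)` on the Teaming host should return True while soffice is running. A successful connection only shows that the port is open, not that conversions will succeed.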

When uploading or moving a document, the following error line is written to catalina.out:

2009-03-01 15:32:07,745 ERROR [http-8080-6] [org.kablink.teaming.module.folder.impl.DefaultFolderCoreProcessor] - org.kablink.teaming.docconverter.DocConverterException: com.sun.star.task.ErrorCodeIOException:

When using the [VIEW] link, the only thing that seems to happen is that a new, empty browser window opens. No activity is seen at the backend.

Attachments (1)
catalina.out, V1.0, Mar 1, 2009 2:55 PM, 16KB, Marcel Ramaker
Replies
Marcel Dekker
Hi Marcel, I have the same problem with a blank screen, but I don't have the error in Catalina.out.
Marcel Dekker
Just found out that it is indexing the file for searching, so OpenOffice.org can probably not be used to generate HTML views.
Brent McConnell
This has been filed as a bug.  We need to figure out a better solution when replacing Novell Teaming's Stellent converters with Open Office converters.
Thomas Jan Vennström

One idea would be to have a white list instead of the black list.

We have a lot of problems with people checking in all sorts of weird files that crash OO when being indexed/converted.

And that forces me to update the ssf.properties blacklist several times a week (as well as restarting OO).

If there were a white list instead, life would be a lot easier.

/ Thomas
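
The white-list idea above can be sketched in a few lines. This is not how Teaming behaves today (per the post above, ssf.properties carries a black list); the extension set and the function name `should_convert` are purely hypothetical.

```python
from pathlib import PurePath

# Hypothetical white list: only files with these extensions would be
# handed to the OpenOffice converter; everything else is skipped.
ALLOWED_EXTENSIONS = {".odt", ".ods", ".odp", ".doc", ".xls", ".ppt", ".txt"}


def should_convert(filename: str, allowed=ALLOWED_EXTENSIONS) -> bool:
    """Return True only if the file's extension is on the white list."""
    return PurePath(filename).suffix.lower() in allowed
```

With a white list, an unknown file type is skipped by default, so a new kind of problem file would not require another configuration change and converter restart.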

Dennis R Foster

Reference Bugzilla bug  480931 (https://bugzilla.novell.com/show_bug.cgi?id=480931)

- - - - -

The issue here is that the XHTML produced by the OpenOffice 3.x converters contains a reference to an invalid DTD.  The fix is, while parsing the XHTML, to replace references to the invalid DTD with references to a valid one.

The XHTML produced by OO 3.x contains a reference to the following DTD:

     http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd

In researching this (using Google, ...) based on the exception Teaming is getting when trying to parse the XHTML, I stumbled across the following web page:

     http://www.nabble.com/Error-message-is-not-helpful-td21048983.html

That page says to use the following DTD instead:

     http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd

The ultimate fix was, when parsing XHTML returned by the OO converters, to replace references to the bogus DTD with references to the valid one.

- - - - -

The only workaround I see without this coding change would be to revert to a 2.x version of OpenOffice for use as the converter.  The XHTML produced by OO 2.x does NOT refer to the bogus DTD.
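
The substitution described in this post can be illustrated with a short sketch. The actual change lives in Teaming's Java code and is not shown in this thread, so the function below is only an assumed, simplified equivalent; the two URLs are the ones quoted above, and the name `fix_dtd_reference` is made up.

```python
# DTD URL emitted by OO 3.x (invalid) and the replacement suggested above.
BOGUS_DTD = "http://www.w3.org/TR/MathML2/dtd/xhtml-math11-f.dtd"
VALID_DTD = "http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd"


def fix_dtd_reference(xhtml: str) -> str:
    """Swap the invalid DTD URL for the valid one before the XHTML is parsed."""
    return xhtml.replace(BOGUS_DTD, VALID_DTD)
```

The rewrite has to happen on the raw text before it reaches the XML parser, since the parser is what fails when it tries to resolve the DTD.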

 

Marcel Dekker
Hi, I am using OO.o version 2.3.0.12, but that will still give you a blank HTML page. It is indexing the contents though.
Marcel Dekker

Hi, just to be complete about the error when using OO.o 2.3. Below is the error I see in my logs when posting a PDF file.

2009-03-05 09:40:23,855 WARN  [http-192.168.10.24-80-6] [org.kablink.teaming.module.authentication.impl.AuthenticationModuleImpl] - Authentication failure for zone 1: org.springframework.security.userdetails.UsernameNotFoundException: No such user; nested exception is org.kablink.teaming.security.authentication.UserDoesNotExistException: Authentication failed: Unrecognized user [liferay.com,a]
2009-03-05 11:39:00,409 ERROR [http-192.168.10.24-80-6] [org.kablink.teaming.docconverter.impl.TextOpenOfficeConverter] - OpenOffice Text Converter, could not load file: file:///home/teamingdata/cachefilestore/liferay.com/0/44/folderEntry_6/ff8080821fd5c8a1011fd636ef310013/text/s123624953842830838.pdf
2009-03-05 11:39:00,423 WARN  [http-192.168.10.24-80-6] [org.kablink.teaming.docconverter.impl.TextOpenOfficeConverter] - Failed to convert file: BenB - Offerte iZyExtern.pdf in Binder: /werkruimtes/Persoonlijke werkruimtes/Admin (admin)/Bestandsmap
org.dom4j.DocumentException: Error on line -1 of document  : Premature end of file. Nested exception: Premature end of file.

And this one when trying to view as HTML

2009-03-05 12:10:41,620 ERROR [http-192.168.10.24-80-5] [org.kablink.teaming.docconverter.impl.HtmlOpenOfficeConverter] - OpenOffice Html Converter, could not load file: file:///home/teamingdata/cachefilestore/liferay.com/0/44/folderEntry_6/ff8080821fd5c8a1011fd636ef310013/html/s123625144158430850.pdf

Dennis R Foster

Ok, what you're seeing is NOT a manifestation of the problem I fixed (obviously :-)

In discussing this with others, our "best guess" is related to the fact that the Teaming 1.x system used a directory named /home/icecoredata.  With Teaming 2.0, this has changed to /home/teamingdata.  When you installed Teaming 2.0, did your install take care of this renaming properly?

It appears as though you have Teaming 1.x data in its default directories but that Teaming 2.x is referencing the new directory names.

Marcel Dekker

Where do you see the path to /home/icecoredata? The system is a fresh build of Kablink version 10600 that I downloaded this morning, and I have made sure the paths were created as you describe. With every build I use, I start with a clean system, so I don't think it's a problem with the path. The uploaded file is in the correct place (/home/teamingdata). To give you more information, I've attached my catalina.out file.

Attachments
catalina.out, V1.0, Mar 5, 2009 7:44 PM, 55KB, Marcel Dekker
Marcel Dekker
BTW, if I look at the place that is mentioned in the log I do see a file is being created (html folder), but it's 0 bytes in size.
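
To see how many conversions ended this way, the cache directory can be scanned for empty output files. A minimal sketch; the function name `find_empty_files` is an assumption, and the root to scan would be whatever cachefilestore path appears in your own log.

```python
from pathlib import Path
from typing import List


def find_empty_files(root: str) -> List[Path]:
    """Return every regular file under root whose size is 0 bytes."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_size == 0
    )
```

A 0-byte file in an html or text subfolder suggests the converter created the target but never wrote any output to it.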
Dennis R Foster

We've seen problems where somebody will have a database and file system populated from Teaming 1.x (using /home/icecoredata) and then make a clean install of Teaming 2.x (using /home/teamingdata) pointing to the Teaming 1.x data.

The problem in this scenario is that there are references to files being generated using /home/teamingdata to files that existed in /home/icecoredata.

Again, this was simply a guess.

Marcel Dekker
OK, thanks for looking into it. I have another freshly installed server to work with, so I will install Kablink on that to see whether the error resolves itself.
Marcel Ramaker
Modified by Marcel Ramaker, Mar 6, 2009 11:08 AM

Did some testing with the March 5th build. The DTD-error seems resolved for OO 3.x. Most files can be displayed as HTML with a few exceptions.

Visio .vsd file > Error: com.sun.star.lang.IllegalArgumentException: URL seems to be an unsupported one. > error displayed within new browser window.

Acrobat .pdf file > blank new browser window > no errors in catalina.out

Text .txt file > blank new browser window > could not load file in catalina.out > empty tree.txt.html in directory.

With each file I first tried to edit it (in-place editing works excellently!), then tried to view it, and as the last step reindexed the search index. I then searched for phrases from the edited docs and got no results, so the Lucene indexer apparently does not receive the converted data for indexing.

See the attached catalina.out for all errors generated during this process.  

Attachments
catalina.out, V1.0, Mar 6, 2009 10:51 AM, 76KB, Marcel Ramaker
Jong Kim
In other words, do you always get this error for any PDF file?
Marcel Ramaker
If you mean that I get no error at all with any PDF file, yes. When hitting [VIEW], a blank new window opens and no entry is written to catalina.out.
Marcel Dekker

Did a clean install and now most files do get converted. Like Marcel, I have problems with PDF files, but in my log it is throwing an exception. The error is something like "premature end of file". Text files are fine with my build. I don't know if edited files are re-indexed, but I will look into that as well.

Brent McConnell
Can OpenOffice convert and index PDF files?  My understanding was that it couldn't.  The Stellent converters do, however.  We need to start including a converter for PDF files in Kablink.  Anyone want to pick up this challenge?  I can point you to the right places to get started.  It will involve writing some Java wrapper code around an open source PDF converter.  Someone correct me if I'm wrong.
Andreas Lang

This is possible, and I've got it working (for smaller PDF documents...).

First you need to install the PDF import extension for OpenOffice, which can be found at http://extensions.services.openoffice.org/project/pdfimport

 

Then you need to edit the file ssf.properties, which is located in your Tomcat installation directory at webapps/ssf/WEB-INF/classes/config

There you set

(line 734) openoffice.convert.index.2.extensions=.odg,.pdf

 

Restart your Kablink server and also your OpenOffice background process; then indexing PDF files should work.

At least it does here...

Dennis R Foster

I've entered bug 665877 (https://bugzilla.novell.com/show_bug.cgi?id=665877) to document how to add PDF support to Kablink Vibe OnPrem.

Laurent Lacheny

Hi,

 

It works fine on my server, only if I stop soffice and then start it again with the boot parameters.

I'll continue to investigate this weird behaviour...

Cheers,

 

Laurent

Laurent Lacheny

Hey,

 

It works fine with OO started in the boot sequence if the pdf import plugin is installed for ALL users:

sh /opt/openoffice.org3/program/unopkg add --shared oracle-pdfimport.oxt

 

Enjoy,

 

Laurent

Shrenik Bhura

Herein is the error I encounter in the logs after having followed the steps given above. Have followed each step carefully.

Dennis R Foster

This really sounds like a permissions problem.

Are you running the OO converter process as the SAME user that you're running Vibe?  If not, both the user running OO and the user Vibe is running as MUST have full R/W/X permission to Vibe's data directories.
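
A quick way to test this is to run a small access check once as the user that runs OO and once as the user that runs Vibe. This is a sketch under assumptions: the helper name `has_full_access` is made up, and the path to check is your own Vibe data directory.

```python
import os


def has_full_access(path: str) -> bool:
    """True if the current user can read, write and traverse/execute path."""
    # os.access reports permissions for the user running this script,
    # so the result differs depending on which account executes it.
    return os.access(path, os.R_OK | os.W_OK | os.X_OK)
```

The check covers only the single path passed in; subdirectories with different ownership or modes would need to be checked individually.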

Shrenik Bhura

It is the same user, as is evident from this:

iaserver:~ # ll /opt/kablink/teaming/data/
total 28
drwxr-x---  3 iasysadmin users 4096 2010-04-22 22:23 cachefilestore
drwxr-x---  3 iasysadmin users 4096 2010-03-28 18:18 extensions
drwxr-x---  3 iasysadmin users 4096 2010-04-22 22:23 filerepository
drwxr-x---  5 iasysadmin users 4096 2011-01-17 23:04 lucene
drwxr-x---  3 iasysadmin users 4096 2010-06-04 19:37 mail
drwxr-x---  3 iasysadmin users 4096 2010-04-23 21:57 rss
drwxr-x--- 18 iasysadmin users 4096 2010-12-05 13:08 temp

and this:

Processes

Attachments
Screenshot-1.png, V1.0, Feb 9, 2011 7:47 PM, 145KB, Shrenik Bhura
Jong Kim
I have not run with the OpenOffice converter recently myself, but my understanding was that it could not handle PDF conversion. The comment by Dennis (in 130.4) led me to think that he had fixed a bug that prevented PDF conversion from working properly. But that might just have been my misinterpretation of his comment. Perhaps what he fixed was a different problem? Dennis, were you able to convert PDF documents?
Robin Jackson

I absolutely concur.  PDF conversion is a must, and no, it's not working in 3.0 beta 2 (as of today).  A quick search on the problem yields a few "out-of-the-box" possibilities:

 

In order to index PDF documents you need to first parse them to extract text that you want to index from them. Here are some PDF parsers that can help you with that:

PDFBox is a Java API from Ben Litchfield that will let you access the contents of a PDF document. It comes with integration classes for Lucene to translate a PDF into a Lucene document.

XPDF is an open source tool that is licensed under the GPL. It's not a Java tool, but there is a utility called pdftotext that can translate PDF files into text files on most platforms from the command line.

Based on xpdf, there is a utility called pdftohtml that can translate PDF files into HTML files. This is also not a Java application.

JPedal is a Java API for extracting text and images from PDF documents.

Simple Text Extractor Library for use with PDF documents. Relies on PDFBox.

 

I know I'm just jumping in; I've been a Teaming user (Novell) since 1.0.  I wanted to put Kablink up on an NPO site (kablink.dc406.com), and the number one reason I wanted it was to create a searchable repository of PDFs.

 

Regards

 

Rob

Dennis R Foster

Currently Teaming is implemented to use a single tool for document conversion (i.e., indexing, HTML viewing and, for Novell Teaming, thumbnail generation).  For Kablink Teaming, the tool used is OpenOffice.  For Novell Teaming, the tool used is Stellent (as developed by Oracle).  There are currently no provisions in Teaming to allow for different tools to be used for different file types/extensions.  E.g., there's currently no way to use OpenOffice for everything but PDFs and use some other tool for PDFs.

If/when OpenOffice is extended to support conversion of PDF files, with Kablink Teaming 3, adding support for this to Teaming would be a few simple changes to the configuration files.  But again, this would REQUIRE that OpenOffice support the PDF conversions required by Teaming.

Jong Kim

Actually, if anyone wants to take a crack at adding PDF support to their site without having to wait for OpenOffice to add PDF support, I think it can be achieved with a little bit of custom work as follows:

1. Copy the following section of bean definitions from applicationContext.xml into applicationContext-ext.xml.

<bean id="htmlOpenOfficeConverter" class="org.kablink.teaming.docconverter.impl.HtmlOpenOfficeConverter" parent="converterCommon">
    <property name="host"><value>localhost</value></property>
    <property name="port"><value>8100</value></property>
</bean>

<bean id="textOpenOfficeConverter" class="org.kablink.teaming.docconverter.impl.TextOpenOfficeConverter" parent="converterCommon">
    <property name="host"><value>localhost</value></property>
    <property name="port"><value>8100</value></property>
    <property name="nullTransform"><value>config/null.xslt</value></property>
    <property name="excludedExtensions"><value>jpg,jpeg,gif,tiff,png,exe,mpeg,mov,bmp,exe,wav,wma,mpa,mp3,mpg,mp4,swf,ogg,m4a,flv,bin</value></property>
</bean>

<bean id="imageOpenOfficeConverter" class="org.kablink.teaming.docconverter.impl.ImageOpenOfficeConverter" parent="converterCommon">
    <property name="host"><value>localhost</value></property>
    <property name="port"><value>8100</value></property>
</bean>

2. Replace the factory-shipped class names with the names of your own custom classes.

3. Write your own custom classes which will extend each of the three factory-shipped classes (or if you care about only one functionality, do it for just one class)

4. Override the public convert(...) method. Within the overridden method, check the file extension. If PDF, run your own logic (using PDFBox or whatever). If not, simply call super.
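
The control flow of steps 3 and 4 can be sketched as follows. The real classes are Java and the actual signature of convert(...) is not given in this thread, so this Python sketch only illustrates the pattern: `StockTextConverter` is a stand-in for a factory-shipped converter, and `PdfAwareTextConverter` and `convert_pdf` are hypothetical names.

```python
from pathlib import PurePath


class StockTextConverter:
    """Stand-in for a factory-shipped converter (not the real Kablink class)."""

    def convert(self, path: str) -> str:
        return f"stock conversion of {path}"


class PdfAwareTextConverter(StockTextConverter):
    """Custom subclass: handles PDFs itself, delegates everything else."""

    def convert(self, path: str) -> str:
        # Step 4: check the file extension first.
        if PurePath(path).suffix.lower() == ".pdf":
            return self.convert_pdf(path)
        # Not a PDF: fall back to the shipped behaviour via super.
        return super().convert(path)

    def convert_pdf(self, path: str) -> str:
        # Placeholder: a real implementation would call a PDF library here.
        return f"custom PDF conversion of {path}"
```

Because the subclass only intercepts one extension, every other file type keeps going through the shipped OpenOffice path unchanged.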

 

Varun Bhansaly

Hi Jong,
Following your suggestions, I have written custom classes which extend the factory-shipped classes for PDF indexing & HTML conversion. And PDF indexing works!
Though now I'm slightly confused as to how to package these classes & their dependencies so that they survive Vibe upgrades.
Of course, changes have been made to applicationContext-ext.xml & ssf-ext.properties as well, but I need not worry about these.
What I have been doing so far is packaging these files in a jar & placing this jar along with its dependencies in "apache-tomcat-6.0.18/lib/ext".
My questions: Is this a recommended approach? Does this qualify as an extension?

Looking forward to your response.

Jong Kim

Hi,

Guessing from the directory name you mentioned ("apache-tomcat-6.0.18/lib/ext"), it appears that you're still working with Teaming 2.*. With Teaming 2.*, unfortunately, the custom .jar files placed in the /lib/ext directory will not be preserved during upgrade. However, beginning with Vibe 3.0, we recommend that all custom .jar files be put into <tomcat>/lib/custom-ext directory, which will survive all subsequent Vibe updates/upgrades.

Regards,

Varun Bhansaly

I am using v3.x codebase, but did not update apache tomcat in my dev. env. Hence the name apache-tomcat-6.0.18

I have a separate setup of v3.x as well, noticed that tomcat directory no longer has a version, but could not locate directory lib/custom-ext. So if I'm using latest tomcat, do I need to create this ?

 

Thanks,

Jong Kim

Yes, the base product doesn't have this directory /lib/custom-ext automatically created. So create one manually.

Varun Bhansaly

Done, this works, thanks for the help !

Laurent Lacheny
Modified by Laurent Lacheny, Jan 19, 2011 9:27 PM

Hello,

 

I discovered an OO plugin to read PDF files. It works fine with OO as a standalone application.

I tried to install this plugin on my Teaming 3.0 server, but PDFs are still not viewable as HTML.

With the plugin, PDF documents are imported in Draw.

I'm not fluent with Teaming configuration files, so I certainly missed something.

 

http://extensions.services.openoffice.org/project/pdfimport

 

Note :

I am able to convert a PDF file to an HTML file with the attached Python script. Please note that I am using the filter 'draw_html_Export'.

(the script works only with absolute paths)
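
The attached script is not reproduced here, but the absolute-path restriction can be worked around by resolving the path first. The sketch below assumes the script ultimately needs a file:/// URL like the ones in the log lines earlier in this thread; the helper name `to_file_url` is made up.

```python
from pathlib import Path


def to_file_url(path: str) -> str:
    """Resolve a possibly relative path and return it as a file:// URL."""
    # resolve() makes the path absolute relative to the current working
    # directory; as_uri() then produces the file:/// form.
    return Path(path).resolve().as_uri()
```

Passing the result of `to_file_url("some.pdf")` instead of the raw argument would let a script accept relative paths as well.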

 

Cheers,

 

Laurent

Attachments
office.py, V1.0, Jan 19, 2011 9:27 PM, 2KB, Laurent Lacheny
Dennis R Foster
I don't think the fix I mentioned in 130.4 will have any impact on PDF conversions.  It was a simple change to handle a difference in the XHTML produced by newer versions of OO.