I'm currently working on a project in which we need to get some content from a Sharepoint instance into a SQL server database. The work is being done by another consultant using Microsoft SSIS. The simplest way to get the data from one side to the other would be to do a straight database to database data synchronization. The problem with this plan is that the Sharepoint database schema just plain sucks and that Microsoft doesn't recommend accessing it directly, not just because of the crappy schema, but also because it could change between Sharepoint versions.
So plan B is to use the abundantly available Sharepoint webservices (if you could call them webservices). Since SSIS has support for webservices we tried that and failed again. We found no way to get SSIS to understand the Sharepoint webservices, it just kept complaining about the WSDL and the XSD. So Googling gave us an alternate solution: create some stubs around the webservice and use those in a script task. This got use a bit further but then SSIS started complaining about the XML using multiple namespaces and it also couldn't handle the XHTML content in some tags.
But again we found a workaround. Using the following XSLT to preprocess the XML before using it made SSIS accept it. The XSL removes all namespaces and places the XHTML content of selected tags, by means of a xsl:param in CDATA sections. One last problem was that SSIS doesn't give you the means to provide an XSL with params, so you'll have to put them directly into the content of the xsl:param.
So plan B is to use the abundantly available Sharepoint webservices (if you could call them webservices). Since SSIS has support for webservices we tried that and failed again. We found no way to get SSIS to understand the Sharepoint webservices, it just kept complaining about the WSDL and the XSD. So Googling gave us an alternate solution: create some stubs around the webservice and use those in a script task. This got use a bit further but then SSIS started complaining about the XML using multiple namespaces and it also couldn't handle the XHTML content in some tags.
But again we found a workaround. Using the following XSLT to preprocess the XML before using it made SSIS accept it. The XSL removes all namespaces and places the XHTML content of selected tags, by means of a xsl:param in CDATA sections. One last problem was that SSIS doesn't give you the means to provide an XSL with params, so you'll have to put them directly into the content of the xsl:param.
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" />
<xsl:param name="rtfNodes">TagsName1~TagsName1</xsl:param>
<xsl:template match="*">
<xsl:choose>
<xsl:when test="contains($rtfNodes, local-name())">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*" />
<xsl:text disable-output-escaping="yes"><![CDATA[ <![CDATA[ ]]></xsl:text>
<xsl:copy-of select="node()"/>
<xsl:text disable-output-escaping="yes"><![CDATA[]]]]><![CDATA[>]]></xsl:text>
</xsl:element>
</xsl:when>
<xsl:otherwise>
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*" />
<xsl:apply-templates select="node()" />
</xsl:element>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
<xsl:template match="/">
<xsl:copy>
<xsl:apply-templates />
</xsl:copy>
</xsl:template>
<xsl:template match="comment() | processing-instruction() | text()">
<xsl:copy />
</xsl:template>
</xsl:stylesheet>
If you can use third-party solutions, check the commercial CozyRoc SharePoint integration. These are the relevant components:
ReplyDelete* SharePoint Source SSIS Data Flow Component – http://www.cozyroc.com/ssis/sharepoint-source
* SharePoint Destination SSIS Data Flow Component – http://www.cozyroc.com/ssis/sharepoint-destination
* SharePoint SSIS Connection Manager – http://www.cozyroc.com/ssis/sharepoint-connection
Batch insert and update is also supported.