Python: shapefile merger utility

This Python script is a commandline shapefile merger. Many options provided to perform quickly the desired merge.
Features:
- accepts wildcards
- recursive (walk through paths)
- list of shapefiles to exclude
- adds fields filled with the source filename and the source feature id (possibility to reuse existing)
- force output result to bi-dimensional geometry type
- dbf schemes union or intersection

Download shapemerger.py or copy the code below.

To run the script you need Python with Ogr/Gdal Python bindings.
On Windows, I suggest to install it from OsGeo4W setup:
- run the setup and choose Advanced Install
- step forward until Select Packages form
- from Commandline_Utilities select and install: gdal, python (meta package), shapelib
- from Libs select and install: gdal-python
- click next and finish installation

Open OsGeo4W shell, move where you downloaded shapemerger.py and type

python shapemerger.py

If everything is ok, it will return the command usage.

Examples:
  • python shapemerger.py -o test.shp *.shp
    - merge all the shapefiles of the current directory in test.shp
    - the reference geometry type is extracted from the first file.
    - union of dbf schemes (default)
  • python shapemerger.py -o test.shp *.shp -i -r -x *_railways.shp
    - merge all the shapefiles of the current directory and subdirectories in test.shp
    - exclude all the shapefiles named as ..._railways.shp
    - intersection of dbf schemes
  • python shapemerger.py -s SHAPENAME -f SHAPEFID -e zones_*.shp
    - merge all the zones_...shp in shapemerge.shp (default name of merge file)
    - rename source filenames field to: SHAPENAME, source feature ids field to: SHAPEFID
    - use existing values of fields SHAPENAME and SHAPEFID if found in the input shapefiles


#!/usr/bin/python
# shapemerger.py
# v.1.1 by toni @ http://furiousgis.blogspot.com
#
# shapefile merger utility 

import os, sys, glob
try:
    from osgeo import ogr, gdal
except:
    try:
        import ogr, gdal
    except:
        print 'Error: shapemerge.py cannot find python OGR and GDAL modules'
        print '       for handling geometries. OsGeo4W python is recommended.'
        sys.exit(1)

def Usage():
    print
    print 'Shapefile merger'
    print 'Usage: shapemerger.py\t[-o <output.shp>] [-n] [-i] [-s <fieldname>] [-f <fieldname>]'
    print '\t\t\t[-e] [-2] [-r] <shapefiles> [-x <exclude shapefile list>]'
    #print '\t\t\t[-e] [-2] [-r] [-t plg|arc|pt|plg2D|arc2D|pt2D|plg3D|arc3D|pt3D]'
    #print '\t\t\t<shapefiles> [-x <exclude shapefile list>]'
    print 
    print '  <shapefiles>: input shapefile list (accepts wildcards)'
    print
    print '  -o:\toutput shapefile name (default: shapemerge.shp)'
    print '  -n:\tdo not add fields for the source filenames and fids'
    print '  -i:\tdbf schemes intersection (default: union)'
    print '  -s:\tcustom name for the source filenames field (default: SOURCESHP)'
    print '  -f:\tcustom name for the source fids field (default: SOURCEFID)'
    print '  -e:\tuse existing source filenames and fids fields, if any'
    print '  -2:\tforce output geometry type to 2D'
    print '  -x:\tlist of shapefiles to exclude (accepts wildcards)'
    print '  -r:\trecursive (grabs every path of input files)'
    #print '  -t:\tfilter input shapefiles by geometry type'
    sys.exit(1)

strCurrentPath = os.path.abspath('')
os.chdir(strCurrentPath)

strOutDirname = ''
strMergeName = 'shapemerge'
strOutBasename = 'shapemerge.shp'

pOgrShpDriver = ogr.GetDriverByName('Esri Shapefile')

lsFilenames=[]
lsExcludeFilenames=[]
lsVerifiedFilenames=[]
lsMergeFieldsToRetain=[]
lsMergeFieldNames=[]
lsMergeFieldOgrDefns=[]
#lsGeomTypeFilters=['PLG','ARC','PT','PLG2D','ARC2D','PT2D','PLG3D','ARC3D','PT3D']

bFlagExclude = 0
bFlagAddSourcefields = 1
bFlagUseExistingSfields = 0
bFlagIntersection = 0
bFlagRecursive = 0
bFlagGeomTFilter = 0
bFlag2D = 0

bGeomTypeRefIsSet = None

strFilenameFieldname = 'SOURCESHP'
strFidFieldname = 'SOURCEFID'

i = 1

# parse commandline (first check for recursive flag)
while i < len(sys.argv): 
    arg = sys.argv[i]
    if arg.upper() == '-R':
        bFlagRecursive = 1
        del sys.argv[i]
        break
    i = i + 1

i = 1
    
while i < len(sys.argv): 
    arg = sys.argv[i]

    if arg.upper() == '-N':
        bFlagAddSourcefields = 0
        bFlagExclude = 0

    elif arg.upper() == '-I':
        try:
            strGdalVersion = int(gdal.VersionInfo()[:2])
        except:
            strGdalVersion = 0
        
        if (strGdalVersion < 19):
            print 'Warning: dbf schemes intersection supported only with gdal >= 1.9.0, ignoring.'
            bFlagIntersection = 0
        else:
            bFlagIntersection = 1
        
        bFlagExclude = 0

    elif arg.upper() == '-S':
        i = i + 1
        strFilenameFieldname = ''.join(sys.argv[i].split())[:10].upper() #upper, tronca al decimo, toglie spazi
        bFlagExclude = 0

    elif arg.upper() == '-F':
        i = i + 1
        strFidFieldname = ''.join(sys.argv[i].split())[:10].upper() #upper, tronca al decimo, toglie spazi
        bFlagExclude = 0
        
    elif arg.upper() == '-E':
        bFlagUseExistingSfields = 1
        bFlagExclude = 0

    elif arg.upper() == '-2':
        bFlag2D = 1
        bFlagExclude = 0

    #if arg.upper() == '-T':
    #    i = i + 1
    #    strGeomTFilter = sys.argv(i)
    #    
    #    if (not (strGeomTFilter in lsGeomTypeFilters)):
    #        print "Warning: invalid geometry type declared, ignoring." 
    #    else:
    #        bFlagGeomTFilter = 1
    #    
    #    bFlagExclude = 0
        
    elif arg.upper() == '-X':
        bFlagExclude = 1

    elif arg.upper() == '-O':
        i = i + 1
        bFlagExclude = 0

        strOutDirname, strOutBasename = os.path.split(sys.argv[i])
        strMergeName, strExtension = os.path.splitext(strOutBasename)
            
    else:
        # expands wildcards: exclude, input list/recursive input list
        if (bFlagRecursive):
            strInputShapePath, strInputShapeBasename = os.path.split(arg)
            strInputShapePath = strInputShapePath or '.'
            
            lsInputShapeRoots=[]
            lsInputShapeRoots.extend(glob.glob(strInputShapePath)) #expands paths with wildcards

        if (bFlagExclude and bFlagRecursive):
            for strRootPath in lsInputShapeRoots:
                for strRoot,strDirs,strFiles in os.walk(strRootPath):
                    lsExcludeFilenames.extend ( glob.glob(strRoot + os.sep + strInputShapeBasename) )
        
        elif (bFlagExclude and (not bFlagRecursive)):
            lsExcludeFilenames.extend ( glob.glob(arg) )
        
        elif (bFlagRecursive):
            for strRootPath in lsInputShapeRoots:
                for strRoot,strDirs,strFiles in os.walk(strRootPath):
                    lsFilenames.extend ( glob.glob(strRoot + os.sep + strInputShapeBasename) )
        else:
            lsFilenames.extend ( glob.glob(arg) )
                
    i = i + 1

for strItem in lsExcludeFilenames:
    try:
        lsFilenames.remove(strItem)
    except:
        pass

if len(lsFilenames) == 0:
    Usage()
    sys.exit(1)

if not os.path.isdir(strOutDirname or '.'):
    print strOutDirname + ': invalid directory, outputting here'
    strOutDirname = ''

if os.path.isfile( os.path.join(strOutDirname, strOutBasename) ):
    strSuffix = ''
    j = 0
    while os.path.isfile( os.path.join(strOutDirname,strMergeName + strSuffix + '.shp') ):
        j = j + 1
        strSuffix = '_' + str(j)
  
    if (strMergeName != 'shapemerge'):
        print strOutBasename + ': file exists, outputting in ' + strMergeName + strSuffix + '.shp'
  
    if (j > 0):
        strMergeName = strMergeName + strSuffix
        strOutBasename = strMergeName + '.shp'  
  
# geometry type check / "source attributes" renaming
strSuffix = ''
j = 0 
for i,strFile in enumerate(lsFilenames):
    try:
        pOgrDatasource = pOgrShpDriver.Open(strFile)
        pOgrLayer = pOgrDatasource.GetLayer()
        pOgrLayerSchema = pOgrLayer.GetLayerDefn()
        
        if (not bGeomTypeRefIsSet):
            # picks up the first geometry type as reference
            intGeometryTypeRef = pOgrLayer.GetGeomType()
            
            if (bFlag2D):
                # fetch first feature to flatten geometry and grab TypeRef
                pOgrFeature = pOgrLayer.GetNextFeature()
                pOgrFeatureGeometry = pOgrFeature.GetGeometryRef()
                pOgrClonedGeometry = pOgrFeatureGeometry.Clone()
                pOgrClonedGeometry.FlattenTo2D()
                intGeometryTypeRef = pOgrClonedGeometry.GetGeometryType() 
                #pOgrFeature.Destroy()
                #pOgrClonedGeometry
            
            bGeomTypeRefIsSet = i # register index of the reference shapefile
                
        #check if layers the picked geometry type (ignore 3D flag)
        if pOgrLayer.GetGeomType() != intGeometryTypeRef:
            if ((pOgrLayer.GetGeomType() == ogr.wkbPolygon) or (pOgrLayer.GetGeomType() == ogr.wkbPolygon25D)) and ((intGeometryTypeRef == ogr.wkbPolygon) or (intGeometryTypeRef == ogr.wkbPolygon25D)) \
                or ((pOgrLayer.GetGeomType() == ogr.wkbLineString) or (pOgrLayer.GetGeomType() == ogr.wkbLineString25D)) and ((intGeometryTypeRef == ogr.wkbLineString) or (intGeometryTypeRef == ogr.wkbLineString25D)) \
                or ((pOgrLayer.GetGeomType() == ogr.wkbPoint) or (pOgrLayer.GetGeomType() == ogr.wkbPoint25D)) and ((intGeometryTypeRef == ogr.wkbPoint) or (intGeometryTypeRef == ogr.wkbPoint25D)) :
                    pass
            else:
                #print 'Error: shapefiles must have the same geometry type.'
                #print '\t' + os.path.basename(lsFilenames[0]) + ': ' + ogr.GeometryTypeToName(intGeometryTypeRef).upper()
                #print '\t' + os.path.basename(lsFilenames[i]) + ': ' + ogr.GeometryTypeToName(pOgrLayer.GetGeomType()).upper() 
                #sys.exit(1)
                print 'Warning: ' + os.path.basename( strFile ) + ' type ' + ogr.GeometryTypeToName(pOgrLayer.GetGeomType()).upper() + ' unfits first file type ' + ogr.GeometryTypeToName(intGeometryTypeRef).upper() + ' (' + os.path.basename(lsFilenames[bGeomTypeRefIsSet]) +'), skipping.'
                continue
                
        if ( (not bFlagUseExistingSfields) and bFlagAddSourcefields ):
            # check for "source fields" in current shapefile to eventually rename them
            while ( (pOgrLayerSchema.GetFieldIndex(strFilenameFieldname) != -1) or (pOgrLayerSchema.GetFieldIndex(strFidFieldname) != -1) ):
                j = j + 1
                strSuffix = '_' + str(j)
                strFilenameFieldname = strFilenameFieldname[:(10 - len(strSuffix))] + strSuffix
                strFidFieldname      = strFidFieldname[:(10 - len(strSuffix))] + strSuffix

        lsVerifiedFilenames.append( strFile )
        pOgrDatasource.Destroy()
        
    except:
        print 'Warning: ' + strFile + ' invalid shapefile, skipping.'
        #continue

if (len(lsVerifiedFilenames) == 0):
    print 'Error: all the input shapefiles are invalid.'
    sys.exit(1)
else:
    lsFilenames = lsVerifiedFilenames

if (j >0):
    print "Warning: \"source fieldnames\" renamed to: " +  strFilenameFieldname + ", " + strFidFieldname

# building attributes
if (bFlagAddSourcefields):
    # "source fields" attributes
    lsMergeFieldOgrDefns.append( ogr.FieldDefn(strFilenameFieldname, ogr.OFTString) )
    lsMergeFieldOgrDefns[0].SetWidth(255)
    lsMergeFieldOgrDefns.append( ogr.FieldDefn(strFidFieldname, ogr.OFTInteger) )
    
    lsMergeFieldNames.append(strFilenameFieldname)
    lsMergeFieldNames.append(strFidFieldname)


for i,strFile in enumerate(lsFilenames):
    # shapefiles attributes
    
    if ( i>0 and bFlagIntersection and ( (bFlagAddSourcefields and (len(lsMergeFieldNames)==2)) or (len(lsMergeFieldNames)==0) ) ):
        # no more scheme intersection to do, exit loop 
        break
    
    strFullname = strFile
    strDirname, strBasename = os.path.split(strFullname)
    strShapename, strExtension = os.path.splitext(strBasename)
    
    pOgrShpDatasource = pOgrShpDriver.Open (strFullname)
    pOgrShpLayer = pOgrShpDatasource.GetLayer()
    pOgrShpLayerSchema = pOgrShpLayer.GetLayerDefn()
     
    for j in range( pOgrShpLayerSchema.GetFieldCount() ):
        # parse shapefile attributes
        pOgrShpFielddefn = pOgrShpLayerSchema.GetFieldDefn(j)
        strShpFieldname = pOgrShpFielddefn.GetNameRef().upper()
        
        try:
            iMergeListsFieldIdx = lsMergeFieldNames.index(strShpFieldname)
        except:
            iMergeListsFieldIdx = -1

        if (iMergeListsFieldIdx == -1):
            
            if ( (not bFlagIntersection) or (bFlagIntersection and (i == 0)) ):
                # field doesn't exist: create it. If flag intersection is on, parse only the first file
                pOgrFielddefn = ogr.FieldDefn( strShpFieldname, pOgrShpFielddefn.GetType() )
                pOgrFielddefn.SetWidth( pOgrShpFielddefn.GetWidth() )
                pOgrFielddefn.SetPrecision( pOgrShpFielddefn.GetPrecision() )                
                pOgrFielddefn.SetJustify( pOgrShpFielddefn.GetJustify() )                                
                
                lsMergeFieldOgrDefns.append( pOgrFielddefn )
                lsMergeFieldNames.append( strShpFieldname )
                
                # add field to the retain list for intersection
                #lsMergeFieldsToRetain.append( lsMergeFieldNames.index(strShpFieldname) )
                lsMergeFieldsToRetain.append( len(lsMergeFieldNames) - 1 )
        
        else:
            # field exist, check if properties should be enlarged
            pOgrFielddefn = lsMergeFieldOgrDefns[iMergeListsFieldIdx]
            
            if pOgrFielddefn.GetWidth() < pOgrShpFielddefn.GetWidth(): 
                pOgrFielddefn.SetWidth( pOgrShpFielddefn.GetWidth() )
            
            elif pOgrFielddefn.GetPrecision < pOgrShpFielddefn.GetPrecision:
                pOgrFielddefn.SetPrecision( pOgrShpFielddefn.GetPrecision() )
                                    
            if (bFlagIntersection):
                # add field to the retain list for intersection
                lsMergeFieldsToRetain.append(iMergeListsFieldIdx)
            
    if ( bFlagIntersection ):
        # performing dbf intersection
        
        if (bFlagAddSourcefields):
            lsMergeFieldsToRetain.append( lsMergeFieldNames.index(strFilenameFieldname) )
            lsMergeFieldsToRetain.append( lsMergeFieldNames.index(strFidFieldname) )
            
        lsMergeFieldsToDelete = range( len(lsMergeFieldNames)-1, -1, -1) # reverse: from last index to 0
        
        for iIdx in lsMergeFieldsToRetain:
            try:
                lsMergeFieldsToDelete.remove(iIdx)
            except:
                pass
       
        for iIdx in lsMergeFieldsToDelete:
            # add or remove fields items
            del lsMergeFieldOgrDefns[iIdx]
            del lsMergeFieldNames[iIdx]            

    pOgrShpDatasource.Destroy()
    lsMergeFieldsToRetain = []

# generate the merge layer
strMergeShpFullname = os.path.join(strOutDirname, strOutBasename)

pOgrMergeShpDatasource = pOgrShpDriver.CreateDataSource( strMergeShpFullname )
pOgrMergeShpLayer = pOgrMergeShpDatasource.CreateLayer( strMergeName, geom_type = intGeometryTypeRef )

for i in lsMergeFieldOgrDefns:
    # create the fields
    pOgrMergeShpLayer.CreateField(i,1)

pOgrMergeShpLayerSchema = pOgrMergeShpLayer.GetLayerDefn()

# Loop for populating the merge shapefile
for i,strFile in enumerate(lsFilenames):
    strFullname = strFile
    strDirname, strBasename = os.path.split(strFullname)
    strShapename, strExtension = os.path.splitext(strBasename)
    
    pOgrShpDatasource = pOgrShpDriver.Open (strFullname)
    pOgrShpLayer = pOgrShpDatasource.GetLayer()
    pOgrShpLayerSchema = pOgrShpLayer.GetLayerDefn()
            
    pOgrShpLayer.ResetReading()
    
    pOgrShpFeature = pOgrShpLayer.GetNextFeature()    
    
    bFlagExistsFilenameFld = pOgrShpLayerSchema.GetFieldIndex(strFilenameFieldname)
    bFlagExistsFidFld = pOgrShpLayerSchema.GetFieldIndex(strFidFieldname)
    
    while pOgrShpFeature:
        pOgrMergeShpfeature = ogr.Feature(pOgrMergeShpLayerSchema)
        pOgrMergeShpfeature.SetFrom(pOgrShpFeature,1)
        
        if (bFlag2D):
            # performing 2d conversion
            pOgrMergeGeometry2d = pOgrMergeShpfeature.GetGeometryRef().Clone()
            pOgrMergeGeometry2d.FlattenTo2D()
            pOgrMergeShpfeature.SetGeometryDirectly(pOgrMergeGeometry2d)
        
        if (bFlagAddSourcefields):
            # "source fields" values, if any and if they should be populated
            if ( (bFlagUseExistingSfields and bFlagExistsFilenameFld == -1) or (not bFlagUseExistingSfields) ):
                #pOgrMergeShpfeature.SetField( strFilenameFieldname, strBasename )
                pOgrMergeShpfeature.SetField( strFilenameFieldname, strFullname )
            if ( (bFlagUseExistingSfields and bFlagExistsFidFld == -1) or (not bFlagUseExistingSfields) ):
                pOgrMergeShpfeature.SetField( strFidFieldname, pOgrShpFeature.GetFID() )
            
        pOgrMergeShpLayer.CreateFeature(pOgrMergeShpfeature)
        
        pOgrMergeShpfeature.Destroy()
        pOgrShpFeature.Destroy()
        pOgrShpFeature = pOgrShpLayer.GetNextFeature()
            
    pOgrShpDatasource.Destroy()

# flush all to disk before quitting
pOgrMergeShpDatasource.SyncToDisk()

print
print strMergeShpFullname
sys.exit(0)

22 comments:

  1. Hi Toni,
    Thank you for sharing your script. I'm looking for something that can intersect 2 shape files. Seems like your script might be the solution. It is very fast but somehow the the "-i" option output is identical to the output without choosing "-i". In other words, choosing intersect or Union gives the same output. Any idea why that might be?

    Thanks,
    Albert.

    ReplyDelete
  2. Hi Albert, thanks for your interest.
    I'm sending you a private mail so maybe I can give you some help.

    Anyway, the purpose of the "intersect" option is referred to the attribute schemes, it is not a geometric intersection.

    Eg.:
    shapefile 1: 10 features, attributes A,B,C
    shapefile 2: 20 features, attributes B,C,D

    merge result with -i option will be:
    shapemerge: 30 features, attributes B,C

    I tested it three or four times and it seems to work.

    ReplyDelete
  3. Thank you for interesting solution.

    I have one problem with results from your script. I tried to open result with python module shapefile (http://pypi.python.org/pypi/pyshp#id4) and I get error with opening records:

    records = sf.records()
    File "D:\teren\programi\python_shape\shapefile.py", line 413, in records
    r = self.__record()
    File "D:\teren\programi\python_shape\shapefile.py", line 378, in __record
    value = int(value)
    ValueError: invalid literal for int() with base 10: '********'

    When I opened original data with shapefile module before using your script I don't get any error. I also don't have any problem with opening merged file with QGIS. Do have any idea what could go wrong?

    ReplyDelete
    Replies
    1. Hi there.
      I never used that module. I googled for a while and found this: http://code.google.com/p/pyshp/issues/detail?id=12
      can you please check it out? Maybe it is something related.

      However, to check the output shapefile integrity type
      ogrinfo -al -so shapefile.shp
      If no errors are raised, then means the file is ok.

      Anyway, try to add
      pOgrMergeShpDatasource.Destroy()
      just after the pOgrMergeShpDatasource.SyncToDisk() instruction at the end of the script.
      This ensures that the deallocation is properly executed.

      Delete
    2. Thank you for answer. I have corrected module shapefile with some additional if-clause control and now it works. I think that problem was in original data (I didn't check all 100 input files for merging) - there were values '********' and '**' instead od integer and script stoped at this situation.

      Thanks again. Best

      Delete
  4. I must have a package installed incorrectly because it errors:
    E:\temp>C:\Python26\ArcGIS10.0\python.exe shapemerger.py -o test.shp *shp
    'import site' failed; use -v for traceback
    Traceback (most recent call last):
    File "shapemerger.py", line 7, in
    import os, sys, glob
    File "C:\OSGeo4W\\apps\Python27\lib\os.py", line 398, in
    import UserDict
    File "C:\OSGeo4W\\apps\Python27\lib\UserDict.py", line 84, in
    _abcoll.MutableMapping.register(IterableUserDict)
    File "C:\OSGeo4W\\apps\Python27\lib\abc.py", line 109, in register
    if issubclass(subclass, cls):
    File "C:\OSGeo4W\\apps\Python27\lib\abc.py", line 184, in __subclasscheck__
    cls._abc_negative_cache.add(subclass)
    File "C:\OSGeo4W\\apps\Python27\lib\_weakrefset.py", line 84, in add
    self.data.add(ref(item, self._remove))
    TypeError: cannot create weak reference to 'classobj' object

    ReplyDelete
  5. Hi man,
    you should open a OSGeo4W shell and use "that" OSGEO python (check your PATH variable).

    Do you have arcgis installed? It's better to clean the PYTHONPATH variable, so type:

    SET PYTHONPATH=

    Finally, I can see a typo in your command:

    it is:
    shapemerger.py -o test.shp *.shp

    not:
    shapemerger.py -o test.shp *shp

    ReplyDelete
  6. any idea of how i might use this from another python program in linux? i tried subprocess but could not get it to work

    ReplyDelete
    Replies
    1. Unfortunately I didn't write a module, this is a simple script. You should slightly modify the code to import it. Why don't you extract only what you need?

      Delete
  7. Thanks for the script Tony. It's very neat and useful but I encountered two issues when trying it:

    No .prj file seems to be generated. I would very much like it to be created since many applications can't determine CRS if the prj file is missing.

    The shapefiles I merged had the same crs and identical prj files so I don't think there was some shortcoming in the source data that caused problems.

    ---
    Original data was in UTF-8 but the merged dbf file was in ISO-8859-1, resulting in a loss of information.

    I'm using Ubuntu Linux.
    'locale charmap' returns 'UTF-8'
    echo $LANG gives "en_US.UTF-8"

    ReplyDelete
    Replies
    1. Ok, don't worry about prj file. Of course you're merging files with the same crs, so what you have to do is just to copy one of your existing prj to a file whose basename is identical to the generated shapefile: eg.: cp myfirst.prj myunion.prj
      (in order to have myunion.shp/shx/dbf/prj all together)

      The real issue is the encoding. Try to do the following (untested):
      before launching set this env var:
      export SHAPE_ENCODING=UTF-8
      (should set the parser encoding)

      open the script and replace line 370 with this:
      pOgrMergeShpLayer = pOgrMergeShpDatasource.CreateLayer( strMergeName, geom_type = intGeometryTypeRef, options=["ENCODING=UTF-8"] )
      (should set the writer encoding)

      Delete
  8. Hello Toni and thanks for the quick reply.

    First, the encoding issue.
    I made these addition:

    ...
    # geometry type check / "source attributes" renaming
    os.environ['SHAPE_ENCODING'] = 'UTF-8' ##My addition
    strSuffix = ''
    j = 0
    for i,strFile in enumerate(lsFilenames):

    ...
    and as you suggested:
    pOgrMergeShpLayer = pOgrMergeShpDatasource.CreateLayer( strMergeName, geom_type = intGeometryTypeRef, options=["ENCODING=UTF-8"])

    After these two changes UTF-8 encoding was preserved, and the merge.shp even got a .cpg file :)

    You can see the full script here: http://pastebin.com/8jJTq9Hk
    ----

    If you want to improve the script I think it would be a good idea for it to be "encoding aware". The script could check for .cpg file in the input, and make output in UTF-8 by default (but this can be overrridden with an argument)

    ReplyDelete
    Replies
    1. Toni, I would like a prj file to be created automatically by the script so to save time and effort.

      Delete
    2. I really appreciate your efforts. Good job!
      Yes it's possible: prj file can be generated adding similar format specific options. check ogr class documentation

      Delete
    3. Hi Toni

      I have made a very simple fix to adding a projection for the merged shapefile. I simply added that the shapefile should use WGS84 as crs. No reprojection will automatically be done, so all input should also be in WGS84.

      The script can be seen here: http://pastebin.com/eu0qZdvH

      Changes:
      L9 from osgeo import osr, ogr, gdal

      L353-358
      #define a spatial reference for the merge shapefile
      outSpatialReference = osr.SpatialReference()
      outSpatialReference.ImportFromEPSG(4326)

      pOgrMergeShpDatasource = pOgrShpDriver.CreateDataSource( strMergeShpFullname )
      pOgrMergeShpLayer = pOgrMergeShpDatasource.CreateLayer( strMergeName, outSpatialReference, geom_type = intGeometryTypeRef, options=["ENCODING=UTF-8"])

      This fix is about as basic it can be, but it fixes the problem for me.

      Delete
  9. Toni,

    Fantastic work, exactly what I need, appreciate you publishing.

    Jon

    ReplyDelete
  10. Super handy! Thank you for sharing this tool!

    ReplyDelete
  11. Nice work! I'm trying to expand on this to get a merge for each subdirectory rather than a total combined file. I was thinking something like this:

    For \r %I in (*.shp) do python shapemerger.py -o out.shp -r *.shp %I

    I'm still working though the syntax. Is this the way to go or is there a better method?

    ReplyDelete
  12. The scripts works fast but it is not giving results similar to ArcGIS Union tool. For every feature in the output shapefile should have attribute information from each input layer. But the output from your script gives attribute information only for one input layer while the other columns are left empty for a given feature.

    Please suggest any solution which could be an alternate to ArcGIS union tool.

    thanks

    ReplyDelete
    Replies
    1. ArcGIS Union tool is doing two additional jobs as follows. Please suggest how do we implement the same through ogr/Gdal

      1. Cracks and clusters the features. Cracking inserts vertices at the intersection of feature edges; clustering snaps together vertices that are within the x,y tolerance.
      2. Discovers geometric relationships (overlap) between features from all feature classes.

      Delete
  13. Hello, thanks for the script

    Some anotations:
    Warning: ... 3D MEASURED POINT unfits first file type POINT ..., skipping.
    ERROR 1: Attempt to write non-point (MULTIPOINT) geometry to point shapefile.

    ReplyDelete
  14. Warning 1: Value 11172583.4344 of field Shape_Area of feature 38 not successfully written. Possibly due to too larger number with respect to field width

    ReplyDelete