DataFrame
_________

Get layer as pandas DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``LayersSet`` provides a method to create a pandas DataFrame from an ``osgeo.ogr.Layer``. Layer zero is the default layer number. The DataFrame:

* uses the feature ID as index
* has a special column named ``_GEOM_``, which contains all information about the layer.

.. note::

    If you intend to convert the DataFrame back to a layer:

    1. Do not remove the ``_GEOM_`` column.
    2. Do not rename the columns if the ogr field types are to be maintained.

.. code-block:: python

    from girs.feat.layers import LayersReader

    lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp')
    df = lrs.data_frame()
    print(df)

::

            _GEOM_  ID_0  ISO   NAME_0  ID_1             NAME_1  ID_2  \
    FID
    0      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
    1      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
    ...        ...   ...  ...      ...   ...                ...   ...
    11300  Polygon    86  DEU  Germany    16          Thüringen   402
    11301  Polygon    86  DEU  Germany    16          Thüringen   403

    [11302 rows x 17 columns]

Show layers as pandas DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The method ``show()`` creates a ``DataFrame`` object and sets some display properties:

.. code-block:: python

    from girs.feat.layers import LayersReader

    lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp')
    lrs.show(width=300, max_rows=6)

::

            _GEOM_  ID_0  ISO   NAME_0  ID_1             NAME_1  ID_2  \
    FID
    0      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
    1      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
    2      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
    3      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
    4      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
    ...        ...   ...  ...      ...   ...                ...   ...
    11297  Polygon    86  DEU  Germany    16          Thüringen   402
    11298  Polygon    86  DEU  Germany    16          Thüringen   402
    11299  Polygon    86  DEU  Germany    16          Thüringen   402
    11300  Polygon    86  DEU  Germany    16          Thüringen   402
    11301  Polygon    86  DEU  Germany    16          Thüringen   403

    [11302 rows x 17 columns]

The usual pandas methods can be used on the DataFrame, for example to filter rows and drop columns:

.. code-block:: python

    from girs.feat.layers import LayersReader

    lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp')
    df = lrs.data_frame()
    # Keep only the rows of one federal state
    df_nrw = df[df['NAME_1'] == 'Nordrhein-Westfalen']
    # Drop columns that are constant within the selection
    df_nrw = df_nrw.drop(['ID_0', 'ISO', 'NAME_0', 'ID_1'], axis=1)
    print(df_nrw)

::

           _GEOM_               NAME_1                NAME_3
    FID
    5946  Polygon  Nordrhein-Westfalen             Bielefeld
    5947  Polygon  Nordrhein-Westfalen                Bochum
    5948  Polygon  Nordrhein-Westfalen                  Bonn
    5949  Polygon  Nordrhein-Westfalen                 Ahaus
    5950  Polygon  Nordrhein-Westfalen               Bocholt
    ...       ...                  ...                   ...
    6337  Polygon  Nordrhein-Westfalen              Sonsbeck
    6338  Polygon  Nordrhein-Westfalen  Voerde (Niederrhein)
    6339  Polygon  Nordrhein-Westfalen                 Wesel
    6340  Polygon  Nordrhein-Westfalen                Xanten
    6341  Polygon  Nordrhein-Westfalen             Wuppertal

    [396 rows x 3 columns]

_GEOM_ column
^^^^^^^^^^^^^

The column ``_GEOM_`` is central to the DataFrame: each cell of this column holds a ``DataFrameFeature`` object, which contains the ``osgeo.ogr.Layer`` and the feature ID of the corresponding row. With that, any attribute or method described in http://gdal.org/python/osgeo.ogr.Geometry-class.html can be used::

    AddGeometry, AddGeometryDirectly, AddPoint, AddPointM, AddPointZM,
    AddPoint_2D, Area, AssignSpatialReference, Boundary, Buffer, Centroid,
    Clone, CloseRings, Contains, ConvexHull, CoordinateDimension, Crosses,
    DelaunayTriangulation, Destroy, Difference, Disjoint, Distance, Empty,
    Equal, Equals, ExportToGML, ExportToIsoWkb, ExportToIsoWkt, ExportToJson,
    ExportToKML, ExportToWkb, ExportToWkt, FlattenTo2D, GetArea, GetBoundary,
    GetCoordinateDimension, GetCurveGeometry, GetDimension, GetEnvelope,
    GetEnvelope3D, GetGeometryCount, GetGeometryName, GetGeometryRef,
    GetGeometryType, GetLinearGeometry, GetM, GetPoint, GetPointCount,
    GetPointZM, GetPoint_2D, GetPoints, GetSpatialReference, GetX, GetY, GetZ,
    HasCurveGeometry, Intersect, Intersection, Intersects, Is3D, IsEmpty,
    IsMeasured, IsRing, IsSimple, IsValid, Length, Overlaps, PointOnSurface,
    Segmentize, Set3D, SetCoordinateDimension, SetMeasured, SetPoint,
    SetPointM, SetPointZM, SetPoint_2D, Simplify, SimplifyPreserveTopology,
    SymDifference, SymmetricDifference, Touches, Transform, TransformTo,
    Union, UnionCascaded, Value, Within, WkbSize, next

The method ``DataFrameFeature.apply(method)`` applies the given ``method`` to the geometry behind a cell. Combined with pandas' ``Series.apply``, as in the examples below, it reaches all geometries in the column ``_GEOM_``.

**Example: area calculation**

The example below shows how to calculate areas in km²:

1. Read the layers with ``LayersReader``.
2. Transform the coordinate system from WGS84 to UTM zone 32 (EPSG:32632); the result is returned as a ``LayersWriter`` in memory.
3. Convert the transformed ``LayersWriter`` instance into a DataFrame.
4. Apply the method ``GetArea`` to each geometry ``g`` and divide the result by 1,000,000 to convert m² to km². The returned values are saved in a new DataFrame column ``area_km2``.
5. Print only the columns ``_GEOM_``, ``NAME_4``, and ``area_km2``.

.. code-block:: python
    :linenos:

    from girs.feat.layers import LayersReader

    lrs = LayersReader('D:/tmp/girs/DEU_adm_shp/DEU_adm4.shp')  # Read
    lrs = lrs.transform(epsg=32632)  # Transform to UTM zone 32; the result is held in memory (RAM)
    df = lrs.data_frame()  # To DataFrame
    # GetArea returns square metres in UTM coordinates; divide by 1e6 for square kilometres
    df['area_km2'] = df['_GEOM_'].apply(lambda g: g.apply('GetArea') / 1000000.0)
    print(df[['_GEOM_', 'NAME_4', 'area_km2']])

::

            _GEOM_             NAME_4   area_km2
    0      Polygon       Allmendingen  46.050899
    1      Polygon            Altheim   7.711316
    2      Polygon          Berghülen  26.007805
    3      Polygon         Blaubeuren  79.237158
    4      Polygon          Blaustein  54.932116
    ...        ...                ...        ...
    11297  Polygon      Sachsenhausen   4.862127
    11298  Polygon        Schwerstedt   6.889861
    11299  Polygon  Vippachedelhausen  10.378227
    11300  Polygon          Wohlsborn   4.077278
    11301  Polygon             Weimar  84.403305

    [11302 rows x 3 columns]

**Example: distance calculation**

The example below shows how to calculate the distance in km from each city to the centroid of the city of Wuppertal by applying the method ``Distance``:

.. code-block:: python

    from girs.feat.layers import LayersReader

    # Read, transform to UTM zone 32 (the result is held in memory), get the DataFrame and filter it
    lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp').transform(epsg=32632)
    df = lrs.data_frame()[['_GEOM_', 'NAME_4']]  # To DataFrame
    # Get the centroid of the city of Wuppertal as osgeo.ogr.Geometry
    geom_wuppertal = df[df['NAME_4'] == 'Wuppertal']['_GEOM_'].apply(lambda g: g.apply('Centroid')).iloc[0]
    # Apply osgeo.ogr.Geometry.Distance and convert the distance from m to km
    df['distWupper'] = df['_GEOM_'].apply(lambda g: g.apply('Distance', geom_wuppertal) / 1000.0)
    print(df[['_GEOM_', 'NAME_4', 'distWupper']])

::

            _GEOM_             NAME_4  distWupper
    0      Polygon       Allmendingen  367.195868
    1      Polygon            Altheim  372.542163
    2      Polygon          Berghülen  358.381501
    3      Polygon         Blaubeuren  361.697021
    4      Polygon          Blaustein  360.106787
    ...        ...                ...         ...
    11297  Polygon      Sachsenhausen  293.474305
    11298  Polygon        Schwerstedt  287.096263
    11299  Polygon  Vippachedelhausen  281.862430
    11300  Polygon          Wohlsborn  293.756043
    11301  Polygon             Weimar  285.625870

    [11302 rows x 3 columns]

.. _distwuppertal:

.. figure:: images/distwuppertal.jpg
    :width: 300pt

Save DataFrame as layers
^^^^^^^^^^^^^^^^^^^^^^^^

``data_frame_to_layer`` converts a DataFrame back into layers. Printing the field definitions before and after the conversion shows whether the ogr field types were maintained:

.. code-block:: python

    from girs.feat.layers import LayersReader, data_frame_to_layer

    lrs = LayersReader('D:/tmp/girs/DEU_adm_shp/DEU_adm4.shp')  # Read
    print(lrs.get_field_definitions_data_frame())  # Field definitions of the source layer
    lrs = data_frame_to_layer(lrs.data_frame())  # , 'D:/tmp/girs/DEU_adm_shp/DEU_adm4_from_df.shp')
    print(lrs.get_field_definitions_data_frame())  # Field definitions after the round trip
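
Before a DataFrame is converted back to layers, the two rules from the note at the top of this page (keep the ``_GEOM_`` column, keep the column names) can be checked up front. The sketch below is only an illustration: ``round_trip_problems`` is a hypothetical helper and not part of ``girs``, and it works on plain lists of column names, so it does not need pandas or GDAL to run.

.. code-block:: python

    GEOM_COLUMN = '_GEOM_'  # name of the special geometry column described above


    def round_trip_problems(original_columns, current_columns):
        """Hypothetical helper (not part of girs): list violations of the two rules.

        original_columns: column names right after the layer was read
        current_columns:  column names of the DataFrame about to be saved
        Returns a list of messages; the list is empty if nothing was found.
        """
        original = list(original_columns)
        current = list(current_columns)
        problems = []
        # Rule 1: the geometry column must still be present
        if GEOM_COLUMN not in current:
            problems.append("column '%s' is missing" % GEOM_COLUMN)
        # Rule 2: a column unknown to the source layer was renamed or newly added,
        # so its ogr field type cannot be taken from the source layer
        unknown = [c for c in current if c != GEOM_COLUMN and c not in original]
        if unknown:
            problems.append('columns unknown to the source layer: ' + ', '.join(unknown))
        return problems


    before = ['_GEOM_', 'ID_0', 'ISO', 'NAME_0']
    print(round_trip_problems(before, ['_GEOM_', 'ISO', 'NAME_0']))  # dropping columns is fine
    print(round_trip_problems(before, ['ISO', 'country']))           # reports both rules

With a real DataFrame, the same check could be fed with ``list(df.columns)`` taken once directly after ``data_frame()`` and once before saving. Note that columns added on purpose, such as ``area_km2`` in the example above, are reported as well, since the source layer has no field definition for them.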