DataFrame

Get layer as pandas DataFrame

LayersSet provides a method to create a pandas DataFrame from an osgeo.ogr.Layer. By default, layer number zero is used.

The DataFrame:

  • uses the feature ID as index
  • has a special column named _GEOM_, whose cells give access to the layer and to the geometry of the corresponding feature.

Note

If you intend to convert the DataFrame back to a layer:

  1. Do not remove the _GEOM_ column.
  2. Do not rename the columns if the ogr field types should be maintained.

from girs.feat.layers import LayersReader

lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp')
df = lrs.data_frame()  # layer zero by default
print(df)
        _GEOM_  ID_0  ISO   NAME_0  ID_1             NAME_1  ID_2  \
FID
0      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
1      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
...        ...   ...  ...      ...   ...                ...   ...
11300  Polygon    86  DEU  Germany    16          Thüringen   402
11301  Polygon    86  DEU  Germany    16          Thüringen   403
[11302 rows x 17 columns]
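
The two rules in the note above can be checked before a DataFrame is converted back to a layer. The following is a minimal sketch and not part of girs: the helper name `check_round_trip_columns` is hypothetical, and it works on plain lists of column names (for example `list(df.columns)`), so it does not depend on the library.

```python
GEOM_COLUMN = '_GEOM_'  # name of the special geometry column described above


def check_round_trip_columns(original_columns, current_columns):
    """Return a list of problems that could affect converting back to a layer.

    Hypothetical helper (not part of girs): it only compares column names.
    """
    problems = []
    if GEOM_COLUMN not in current_columns:
        problems.append('column %s was removed' % GEOM_COLUMN)
    # Columns missing from the original are new or renamed; per the note,
    # their ogr field types may not be maintained.
    unknown = [c for c in current_columns if c not in original_columns]
    if unknown:
        problems.append('new or renamed columns: %s' % ', '.join(unknown))
    return problems


original = ['_GEOM_', 'ID_0', 'ISO', 'NAME_0']
print(check_round_trip_columns(original, ['_GEOM_', 'ID_0', 'ISO']))  # no problems
print(check_round_trip_columns(original, ['ID_0', 'COUNTRY']))        # two problems
```

Such a check only looks at names; whether the field types actually survive the round trip still has to be confirmed with the library itself.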

Show layers as pandas DataFrame

The method show() creates a DataFrame object and displays it with the given display properties (here width and max_rows):

lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp')
lrs.show(width=300, max_rows=6)
        _GEOM_  ID_0  ISO   NAME_0  ID_1             NAME_1  ID_2  \
FID
0      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
1      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
2      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
3      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
4      Polygon    86  DEU  Germany     1  Baden-Württemberg     1
...        ...   ...  ...      ...   ...                ...   ...
11297  Polygon    86  DEU  Germany    16          Thüringen   402
11298  Polygon    86  DEU  Germany    16          Thüringen   402
11299  Polygon    86  DEU  Germany    16          Thüringen   402
11300  Polygon    86  DEU  Germany    16          Thüringen   402
11301  Polygon    86  DEU  Germany    16          Thüringen   403

[11302 rows x 17 columns]

The usual pandas methods can be used, for example to filter rows and drop columns:

lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp')
df = lrs.data_frame()
df_nrw = df[df['NAME_1'] == 'Nordrhein-Westfalen']  # keep only the rows of one federal state
df_nrw = df_nrw.drop(['ID_0', 'ISO', 'NAME_0', 'ID_1'], axis=1)  # drop columns
print(df_nrw)
       _GEOM_               NAME_1                NAME_3
FID
5946  Polygon  Nordrhein-Westfalen             Bielefeld
5947  Polygon  Nordrhein-Westfalen                Bochum
5948  Polygon  Nordrhein-Westfalen                  Bonn
5949  Polygon  Nordrhein-Westfalen                 Ahaus
5950  Polygon  Nordrhein-Westfalen               Bocholt
...       ...                  ...                   ...
6337  Polygon  Nordrhein-Westfalen              Sonsbeck
6338  Polygon  Nordrhein-Westfalen  Voerde (Niederrhein)
6339  Polygon  Nordrhein-Westfalen                 Wesel
6340  Polygon  Nordrhein-Westfalen                Xanten
6341  Polygon  Nordrhein-Westfalen             Wuppertal

[396 rows x 3 columns]

_GEOM_ column

The column _GEOM_ is central to the DataFrame: each cell of the column holds a DataFrameFeature object, which contains the osgeo.ogr.Layer and the feature ID of the corresponding row. Through this object, any attribute or method described in http://gdal.org/python/osgeo.ogr.Geometry-class.html can be used:

AddGeometry, AddGeometryDirectly, AddPoint, AddPointM, AddPointZM, AddPoint_2D, Area,
AssignSpatialReference, Boundary, Buffer, Centroid, Clone, CloseRings, Contains, ConvexHull,
CoordinateDimension, Crosses, DelaunayTriangulation, Destroy, Difference, Disjoint, Distance,
Empty, Equal, Equals, ExportToGML, ExportToIsoWkb, ExportToIsoWkt, ExportToJson, ExportToKML,
ExportToWkb, ExportToWkt, FlattenTo2D, GetArea, GetBoundary, GetCoordinateDimension,
GetCurveGeometry, GetDimension, GetEnvelope, GetEnvelope3D, GetGeometryCount, GetGeometryName,
GetGeometryRef, GetGeometryType, GetLinearGeometry, GetM, GetPoint, GetPointCount, GetPointZM,
GetPoint_2D, GetPoints, GetSpatialReference, GetX, GetY, GetZ, HasCurveGeometry, Intersect,
Intersection, Intersects, Is3D, IsEmpty, IsMeasured, IsRing, IsSimple, IsValid, Length, Overlaps,
PointOnSurface, Segmentize, Set3D, SetCoordinateDimension, SetMeasured, SetPoint, SetPointM,
SetPointZM, SetPoint_2D, Simplify, SimplifyPreserveTopology, SymDifference, SymmetricDifference,
Touches, Transform, TransformTo, Union, UnionCascaded, Value, Within, WkbSize, next

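The method DataFrameFeature.apply(method) calls the geometry method with the given name on the geometry of its feature; any further arguments are passed on to that method. Combined with the pandas apply on the column _GEOM_, the method can be applied to all geometries in the column.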
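
Calling a method by its name, as in `g.apply('GetArea')` in the examples that follow, can be pictured as a `getattr` lookup on the underlying geometry. The sketch below only illustrates that idea: `FeatureRef` and `FakeSquare` are made-up stand-ins, not the girs `DataFrameFeature` class or an `osgeo.ogr.Geometry`, and the real implementation may differ.

```python
class FakeSquare(object):
    """Made-up stand-in for a geometry object; not an osgeo.ogr.Geometry."""

    def __init__(self, side):
        self.side = side

    def GetArea(self):
        return self.side * self.side

    def Buffer(self, distance):
        # Grow the square by `distance` on every side (a simplification:
        # a real buffer would round the corners)
        return FakeSquare(self.side + 2 * distance)


class FeatureRef(object):
    """Made-up stand-in for one cell of the _GEOM_ column."""

    def __init__(self, geometry):
        self._geometry = geometry

    def apply(self, method, *args, **kwargs):
        # Look the method up by name and forward any extra arguments to it
        return getattr(self._geometry, method)(*args, **kwargs)


cells = [FeatureRef(FakeSquare(2.0)), FeatureRef(FakeSquare(3.0))]
areas = [g.apply('GetArea') for g in cells]         # same pattern as the lambda passed to pandas
buffered = cells[0].apply('Buffer', 0.5).GetArea()  # extra arguments are passed through
print(areas)
print(buffered)
```

Dispatching by name keeps the wrapper small: it does not need one wrapper method per geometry method in the list above.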

Example: area calculation

The example below shows how to calculate areas in km²:

  1. Read the layers with LayersReader.
  2. Transform the coordinate system from WGS84 to UTM zone 32 (EPSG:32632); the result is returned as a LayersWriter in memory.
  3. Convert the transformed LayersWriter instance into a DataFrame.
  4. Apply the method ‘GetArea’ to each geometry g and save the returned values, converted from m² to km², in a new DataFrame column ‘area_km2’.
  5. Print only the columns ‘_GEOM_’, ‘NAME_4’, and ‘area_km2’.
lrs = LayersReader('D:/tmp/girs/DEU_adm_shp/DEU_adm4.shp')  # Read
lrs = lrs.transform(epsg=32632)  # Transform to UTM zone 32 and return in 'Memory' (RAM)
df = lrs.data_frame()  # to DataFrame
# Areas are in square metres after the transformation; divide by 1,000,000 to get km2
df['area_km2'] = df['_GEOM_'].apply(lambda g: g.apply('GetArea') / 1000000.0)
print(df[['_GEOM_', 'NAME_4', 'area_km2']])
        _GEOM_             NAME_4   area_km2
0      Polygon       Allmendingen  46.050899
1      Polygon            Altheim   7.711316
2      Polygon          Berghülen  26.007805
3      Polygon         Blaubeuren  79.237158
4      Polygon          Blaustein  54.932116
...        ...                ...        ...
11297  Polygon      Sachsenhausen   4.862127
11298  Polygon        Schwerstedt   6.889861
11299  Polygon  Vippachedelhausen  10.378227
11300  Polygon          Wohlsborn   4.077278
11301  Polygon             Weimar  84.403305

[11302 rows x 3 columns]
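
The reprojection in step 2 matters because a planar area is expressed in the squared units of the coordinate system: square metres for UTM, but squared degrees for WGS84 longitude/latitude. The sketch below illustrates the unit conversion with a plain shoelace formula on made-up coordinates; `ring_area` is a hypothetical helper for illustration only and is not how girs or OGR is invoked.

```python
def ring_area(ring):
    """Planar area of a closed ring [(x, y), ...] using the shoelace formula."""
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


# Made-up 2000 m x 3000 m rectangle in projected coordinates (metres);
# the first point is repeated at the end to close the ring
ring_m = [(500000.0, 5600000.0), (502000.0, 5600000.0),
          (502000.0, 5603000.0), (500000.0, 5603000.0),
          (500000.0, 5600000.0)]

area_m2 = ring_area(ring_m)
area_km2 = area_m2 / 1000000.0  # 1 km2 = 1,000,000 m2, as in the example above
print(area_km2)
```

The same division applied to coordinates in degrees would give a number without a meaningful unit, which is why the layer is transformed first.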

Example: distance calculation

The example below calculates the distance in km from each city to the centroid of the city Wuppertal by applying the method Distance:

# Read, transform to UTM zone 32, return in 'Memory' (RAM), get DataFrame and filter it
lrs = LayersReader('D:/tmp/DEU_adm_shp/DEU_adm4.shp').transform(epsg=32632)
df = lrs.data_frame()[['_GEOM_', 'NAME_4']]  # to DataFrame
# Get the centroid of the city Wuppertal as ``osgeo.ogr.Geometry``
geom_wuppertal = df[df['NAME_4'] == 'Wuppertal']['_GEOM_'].apply(lambda g: g.apply('Centroid')).iloc[0]
# Apply the method ``osgeo.ogr.Geometry.Distance`` and convert the distance from m to km
df['distWupper'] = df['_GEOM_'].apply(lambda g: g.apply('Distance', geom_wuppertal) / 1000.0)
print(df[['_GEOM_', 'NAME_4', 'distWupper']])
        _GEOM_             NAME_4  distWupper
0      Polygon       Allmendingen  367.195868
1      Polygon            Altheim  372.542163
2      Polygon          Berghülen  358.381501
3      Polygon         Blaubeuren  361.697021
4      Polygon          Blaustein  360.106787
...        ...                ...         ...
11297  Polygon      Sachsenhausen  293.474305
11298  Polygon        Schwerstedt  287.096263
11299  Polygon  Vippachedelhausen  281.862430
11300  Polygon          Wohlsborn  293.756043
11301  Polygon             Weimar  285.625870

[11302 rows x 3 columns]

[Figure: ../_images/distwuppertal.jpg]
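
Because the layer was transformed to UTM, the distances come out in metres, and dividing by 1000 gives kilometres. As a rough illustration of that conversion only, the sketch below computes the straight-line distance between two made-up projected points; `planar_distance_km` is a hypothetical helper, and unlike `Distance` applied to polygons it handles points only.

```python
import math


def planar_distance_km(p, q):
    """Straight-line distance in km between two (x, y) points given in metres."""
    return math.hypot(q[0] - p[0], q[1] - p[1]) / 1000.0


# Made-up projected coordinates in metres (not real city positions)
a = (370000.0, 5680000.0)
b = (400000.0, 5720000.0)
print(planar_distance_km(a, b))
```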

Save DataFrame as layers

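The function data_frame_to_layer converts a DataFrame back into layers. The field definitions are printed before and after the conversion for comparison:

from girs.feat.layers import LayersReader, data_frame_to_layer

lrs = LayersReader('D:/tmp/girs/DEU_adm_shp/DEU_adm4.shp')  # Read
print(lrs.get_field_definitions_data_frame())  # field definitions of the original layer
lrs = data_frame_to_layer(lrs.data_frame())  # , 'D:/tmp/girs/DEU_adm_shp/DEU_adm4_from_df.shp')
print(lrs.get_field_definitions_data_frame())  # field definitions after the conversion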