Hi there 👋

StabRise - Document Processing Solutions

Our projects

PDF DataSource for Apache Spark


Source Code: https://github.com/StabRise/spark-pdf

Home page: https://stabrise.com/spark-pdf/

Quick Start Jupyter Notebook: https://github.com/StabRise/spark-pdf/blob/main/examples/PdfDataSource.ipynb


The project provides a custom data source for Apache Spark that lets you read PDF files into a Spark DataFrame.
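As a rough illustration of reading PDFs through a custom data source, here is a minimal PySpark-style sketch. It assumes the data source is registered under the short name `"pdf"` and that the spark-pdf package is already on the session's classpath. The helper name `read_pdf_dataframe` is hypothetical, and any reader options are passed through unchanged, so check the project home page for the names it actually supports.

```python
from typing import Any


def read_pdf_dataframe(spark: Any, path: str, **options: str) -> Any:
    """Load the PDF file(s) at `path` into a DataFrame.

    `spark` is expected to be a SparkSession that already has the
    spark-pdf package available. The format name "pdf" is an assumption
    to verify against the project's documentation.
    """
    reader = spark.read.format("pdf")  # assumed short name of the data source
    if options:
        # Forward caller-supplied reader options without interpreting them.
        reader = reader.options(**options)
    return reader.load(path)
```

With a real session, a call such as `read_pdf_dataframe(spark, "docs/*.pdf")` would return a DataFrame whose schema is defined by the data source itself.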

Key features:

ScaleDP


Source Code: https://github.com/StabRise/scaledp

Home page: https://stabrise.com/scaledp/

Quick Start Jupyter Notebook: https://github.com/StabRise/ScaleDP-Tutorials/blob/master/1.QuickStart.ipynb


ScaleDP is an open-source library for processing documents with Apache Spark.

Key features:

De-Identify

De-Identify is a tool for de-identifying and anonymizing data.

Supported formats