Connecting Azure SQL through PySpark
31 Mar 2017
- Download sqljdbc
- Extract files in a directory c:/spark
- Copy & paste c:/spark/sqljdbc_6.0/enu/auth/sqljdb_auth.dll to c:/windows/system32
- Create test-sql.py as below
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row
from pyspark import SparkConf, SparkContext
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf = conf)
query = "(SELECT Id from Matter) as mt"
sqlContext = SQLContext(sc)
df = sqlContext.read.format("jdbc").options(
url="jdbc:sqlserver://domain.database.windows.net;" +
"databaseName=collaborate-7;user=[USER_ID]];password=[Password]];Integrated Security=False",
driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
dbtable=query).load()
df.show()
- Run cmd with driver as spark-submit –driver-class-path c:/spark/sqljdbc_6.0/enu/jre8/sqljdbc42.jar test-sql.py
- Output
<class 'pyspark.sql.dataframe.DataFrame'>
+---+
| Id|
+---+
| 12|
| 13|
| 1|
| 7|
| 2|
| 8|
| 9|
| 10|
| 11|
| 3|
+---+
only showing top 10 rows
See more spark-pyspark-to-extract-from-sql-server