Connecting Azure SQL through PySpark

  1. Download the Microsoft JDBC Driver for SQL Server (sqljdbc)
  2. Extract the files into a directory, e.g. c:/spark
  3. Copy c:/spark/sqljdbc_6.0/enu/auth/sqljdbc_auth.dll to c:/windows/system32
  4. Create test-sql.py as below
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

# Run Spark locally
conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)

# A subquery used as dbtable must be parenthesised and given an alias
query = "(SELECT Id from Matter) as mt"

sqlContext = SQLContext(sc)

# Replace [USER_ID] and [Password] with your own credentials
df = sqlContext.read.format("jdbc").options(
  url="jdbc:sqlserver://domain.database.windows.net;" +
         "databaseName=collaborate-7;user=[USER_ID];password=[Password];integratedSecurity=false",
  driver="com.microsoft.sqlserver.jdbc.SQLServerDriver",
  dbtable=query).load()

# Print the type of the result, then the first rows
print(type(df))
df.show()
  5. Run it from cmd, passing the driver jar: spark-submit --driver-class-path c:/spark/sqljdbc_6.0/enu/jre8/sqljdbc42.jar test-sql.py
  6. Output
<class 'pyspark.sql.dataframe.DataFrame'>
+---+
| Id|
+---+
| 12|
| 13|
|  1|
|  7|
|  2|
|  8|
|  9|
| 10|
| 11|
|  3|
+---+
only showing top 10 rows

See more: spark-pyspark-to-extract-from-sql-server

Neural Network Basics

  • The softmax function turns a collection of real-valued scores into a probability distribution: each output is normalised to the 0-1 range, and the outputs within the collection sum to 1.

    See usage of softmax in classification at softmax-regression-related-logistic-regression.
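A minimal sketch of the normalisation described above, in plain Python. The function name `softmax` and the sample scores are illustrative assumptions; subtracting the maximum score is a common numerical-stability trick and does not change the result.

```python
import math

def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    # Shift by the maximum so exp() cannot overflow; the ratios are unchanged.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three classes
probs = softmax([1.0, 2.0, 3.0])
```

Larger scores get larger probabilities, and equal scores share the probability mass evenly.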

Lambda Calculus Pt. 2

Example of Free Variables

  • x is free in E if:
    • Rule 1 (E = x) ? yes, since there is no metavariable binding x
    • Rule 2 (E = λy.E1) ? yes, if y != x and x is free in E1
    • Rule 3 (E = E1 E2) ? yes, if x is free in either E1 or E2
      • x free in xλx.x ? Yes, it parses as x(λx.x), where x is free in E1
      • x free in (λx.xy)x ? Yes, since x is free in E2
      • x free in λx.yx ? No, it parses as λx.(yx), so x is bound by λx
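The three rules can be sketched as a short recursive function. The tuple encoding of terms (`("var", name)`, `("lam", name, body)`, `("app", left, right)`) and the names `free_vars` and `is_free` are assumptions made for this illustration, not part of the lecture.

```python
# Assumed encoding for this sketch:
#   ("var", name) | ("lam", name, body) | ("app", left, right)

def free_vars(term):
    """Return the set of variable names that occur free in term."""
    kind = term[0]
    if kind == "var":   # Rule 1: a lone variable is free
        return {term[1]}
    if kind == "lam":   # Rule 2: the abstraction binds its own metavariable
        return free_vars(term[2]) - {term[1]}
    if kind == "app":   # Rule 3: free in either side of the application
        return free_vars(term[1]) | free_vars(term[2])
    raise ValueError(f"unknown term: {term!r}")

def is_free(name, term):
    return name in free_vars(term)

x, y = ("var", "x"), ("var", "y")
e1 = ("app", x, ("lam", "x", x))               # x(λx.x)
e2 = ("app", ("lam", "x", ("app", x, y)), x)   # (λx.xy)x
e3 = ("lam", "x", ("app", y, x))               # λx.(yx)
```

With these definitions, `is_free("x", e1)` and `is_free("x", e2)` hold, while `is_free("x", e3)` does not.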

Combinators

An expression is a combinator if it doesn’t have any free variables

  • λx.λy.xyx combinator ? yes (the function takes x and y as parameters and applies x to y, then the result to x)
  • λx.x ? yes
  • λz.λy.xyz ? no, since x is free
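Checking the definition mechanically: an expression is a combinator exactly when its set of free variables is empty. The sketch below assumes a tuple encoding of terms (`("var", name)`, `("lam", name, body)`, `("app", left, right)`); the helper names are hypothetical.

```python
def free_vars(term):
    """Set of names that occur free in a tuple-encoded lambda term."""
    kind = term[0]
    if kind == "var":
        return {term[1]}
    if kind == "lam":
        return free_vars(term[2]) - {term[1]}
    return free_vars(term[1]) | free_vars(term[2])  # application

def is_combinator(term):
    """A combinator has no free variables."""
    return not free_vars(term)

x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
first = ("lam", "x", ("lam", "y", ("app", ("app", x, y), x)))   # λx.λy.xyx
second = ("lam", "x", x)                                         # λx.x
third = ("lam", "z", ("lam", "y", ("app", ("app", x, y), z)))   # λz.λy.xyz
```

Only `third` fails the check, because x is never bound in it.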

Bound Variables

  • If a variable is not free then it is bound
    • Bound Rule 1: If x is free in E then it is bound by λx in λx.E
    • Bound Rule 2: If x is bound by λx in E then x is bound by the same innermost λx in λz.E
      • Even if z == x
      • λx.λx.x
        • Which lambda expression binds x? The innermost λx
    • Bound Rule 3: If x is bound by λx in E1 then x is bound by the same abstraction λx in E1 E2 and in E2 E1. Application never changes an existing binding.

Examples

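  • (λx.x(λy.xyzy)x)xy
    • Inside the parentheses every x is bound by λx and every y by λy; z is free, and so are the trailing x and y
  • (λx.λy.xy)(λz.xz)
    • In xy, x is bound by λx and y by λy; in λz.xz, z is bound by λz and x is free
  • (λx.xλx.zx)
    • The first x is bound by the outer λx and the x in zx by the inner λx; z is free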

Equivalence

  • What does it mean for two functions to be equivalent?
    • λy.y = λx.x ? Same functions
    • λx.xy = λy.yx ? Not the same: y is free in the first and x is free in the second
    • λx.x = λx.x ? Same functions

α-Equivalence

When two functions vary only by the names of their bound variables: E1 =α E2

  • λx.xλy.xyz
    • Can we rename x to foo? yes
    • Can we rename y to bar? yes
    • Can we rename y to x? no, since the x in xyz would then be bound by the inner lambda, which changes the existing semantics
    • Can we rename z to x? no, since z is free; renaming it to x would turn it into a bound variable and change the existing semantics

Renaming Operation

E{y/x} = renaming x to y

  • x{y/x} = y
  • z{y/x} = z, if x != z
  • (E1 E2) {y/x} = (E1{y/x})(E2{y/x})
  • (λx.E){y/x} = (λy.E{y/x})
  • (λz.E){y/x} = (λz.E{y/x}), if x != z

Examples

  • (λx.x){foo/x} = (λfoo.(x){foo/x}) = (λfoo.(foo))

  • ((λx.x(λy.xyzy)x)xy){bar/x} = (λx.x(λy.xyzy)x){bar/x}(x){bar/x}(y){bar/x} … = (λbar.(x(λy.xyzy)x){bar/x}) bar y = (λbar.(bar(λy.(bar y z y))bar)) bar y
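The five renaming rules translate directly into a recursive function. This is a sketch under an assumed tuple encoding (`("var", name)`, `("lam", name, body)`, `("app", left, right)`); `rename` and `example` are hypothetical names. Like the rules above, it renames every occurrence and does not check for name clashes.

```python
def rename(term, new, old):
    """E{new/old}: rename old to new throughout term."""
    kind = term[0]
    if kind == "var":
        # x{y/x} = y, and z{y/x} = z if x != z
        return ("var", new) if term[1] == old else term
    if kind == "app":
        # (E1 E2){y/x} = (E1{y/x})(E2{y/x})
        return ("app", rename(term[1], new, old), rename(term[2], new, old))
    if kind == "lam":
        # (λx.E){y/x} = λy.E{y/x}, and (λz.E){y/x} = λz.E{y/x} if x != z
        param = new if term[1] == old else term[1]
        return ("lam", param, rename(term[2], new, old))
    raise ValueError(f"unknown term: {term!r}")

def example(name):
    """Build (λn.n(λy.n y z y)n) n y with the given name in place of n."""
    v, y, z = ("var", name), ("var", "y"), ("var", "z")
    inner = ("lam", "y", ("app", ("app", ("app", v, y), z), y))
    fn = ("lam", name, ("app", ("app", v, inner), v))
    return ("app", ("app", fn, v), y)

identity_foo = rename(("lam", "x", ("var", "x")), "foo", "x")   # (λx.x){foo/x}
renamed = rename(example("x"), "bar", "x")                       # second example
```

Here `identity_foo` is λfoo.foo, and `renamed` equals `example("bar")`, matching the two worked examples.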

α-Equivalence

  • For all expressions E and all variables y that do not occur in E
    • λx.E =α λy.(E{y/x})
  • λy.y = λx.x ? yes
  • ((λx.x(λy.xyzy)x)xy) = ((λy.y(λz.yzwz)y)yx) ? no
  • (λx.x(λy.xyzy)x) = (λy.y(λz.yzwz)y) ? no, bound variables can have different names, but a free variable such as z must not be renamed, since that changes the semantics
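One way to test α-equivalence without renaming anything is to compare binding structure: two bound occurrences match when they point at binders at the same nesting depth, and two free occurrences match only when their names are equal. The sketch below assumes a tuple encoding of terms; `alpha_eq` and `shape` are hypothetical names, and this technique is an illustration rather than the lecture's own method.

```python
def alpha_eq(a, b, env_a=None, env_b=None, depth=0):
    """True if a and b differ only in the names of bound variables."""
    env_a = env_a or {}
    env_b = env_b or {}
    if a[0] != b[0]:
        return False
    if a[0] == "var":
        bound_a, bound_b = a[1] in env_a, b[1] in env_b
        if bound_a or bound_b:
            # Both must be bound, and by binders at the same depth
            return bound_a and bound_b and env_a[a[1]] == env_b[b[1]]
        return a[1] == b[1]          # free variables must keep their names
    if a[0] == "lam":
        # The innermost binder shadows any outer binder of the same name
        return alpha_eq(a[2], b[2],
                        {**env_a, a[1]: depth}, {**env_b, b[1]: depth},
                        depth + 1)
    return (alpha_eq(a[1], b[1], env_a, env_b, depth)
            and alpha_eq(a[2], b[2], env_a, env_b, depth))

def shape(outer, inner, free):
    """Build λouter.outer(λinner.outer inner free inner)outer."""
    o, i, f = ("var", outer), ("var", inner), ("var", free)
    body = ("lam", inner, ("app", ("app", ("app", o, i), f), i))
    return ("lam", outer, ("app", ("app", o, body), o))

same = alpha_eq(("lam", "y", ("var", "y")), ("lam", "x", ("var", "x")))
differs = alpha_eq(shape("x", "y", "z"), shape("y", "z", "w"))
renamed_bound_only = alpha_eq(shape("x", "y", "z"), shape("a", "b", "z"))
```

`same` is true (λy.y against λx.x), `differs` is false because the free z became w, and `renamed_bound_only` is true because only the bound names changed.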

Substitution

(λx.+ x 1)2 -> (+ 2 1)

  • Can we use renaming? no
  • We need another operator, called substitution, to replace a variable by a lambda expression
    • E[x->N], where E and N are lambda expressions and x is a name
  • (+ x 1)[x->2] = (+ 2 1)
  • (λx.+ x 1)[x->2] = (λx.+ x 1) Nothing to change since the variable is bound
  • (λx.yx)[y->λz.xz] != (λx.(λz.xz)x), since the free x in λz.xz would be captured by λx; rename the bound x first, which gives (λw.(λz.xz)w)

See CSE 340 11-25-15 Lecture: “Lambda Calculus Pt. 2”
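The substitution operator E[x->N] can be sketched as a capture-avoiding recursive function. The tuple encoding of terms, the helper names, and the fresh-name scheme (`w0`, `w1`, ...) are assumptions for this illustration; the notes use the name w where this sketch generates w0. Constants such as + and 2 are encoded as plain variables here.

```python
import itertools

def free_vars(term):
    kind = term[0]
    if kind == "var":
        return {term[1]}
    if kind == "lam":
        return free_vars(term[2]) - {term[1]}
    return free_vars(term[1]) | free_vars(term[2])

def fresh(avoid):
    """Pick the first name w0, w1, ... that is not in avoid."""
    for i in itertools.count():
        name = f"w{i}"
        if name not in avoid:
            return name

def subst(term, x, n):
    """E[x->N]: replace free occurrences of x in term by n, avoiding capture."""
    kind = term[0]
    if kind == "var":
        return n if term[1] == x else term
    if kind == "app":
        return ("app", subst(term[1], x, n), subst(term[2], x, n))
    param, body = term[1], term[2]
    if param == x:
        return term                      # x is bound here: nothing to change
    if param in free_vars(n) and x in free_vars(body):
        # Rename the bound variable first so it cannot capture a free one in n
        new = fresh(free_vars(n) | free_vars(body) | {x})
        body = subst(body, param, ("var", new))
        param = new
    return ("lam", param, subst(body, x, n))

plus, one, two = ("var", "+"), ("var", "1"), ("var", "2")
x, y, z = ("var", "x"), ("var", "y"), ("var", "z")
plus_x_1 = ("app", ("app", plus, x), one)                        # + x 1

r1 = subst(plus_x_1, "x", two)                                   # (+ x 1)[x->2]
r2 = subst(("lam", "x", plus_x_1), "x", two)                     # (λx.+ x 1)[x->2]
r3 = subst(("lam", "x", ("app", y, x)), "y",
           ("lam", "z", ("app", x, z)))                          # (λx.yx)[y->λz.xz]
```

`r1` is + 2 1, `r2` is left unchanged, and `r3` is λw0.(λz.xz)w0, in which the x inside λz.xz stays free.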

Lambda Calculus Pt. 1

Lambda calculus is a language to express function application, which enables us to

  • Define functions (anonymous)
  • Apply functions

It is also the foundation of functional programming and, like other languages, consists of syntax (grammar) and semantics (meaning).

Syntax

There are only four syntactic rules in lambda calculus (E stands for expression)

  • Rule 1: E -> ID e.g. x
  • Rule 2: E -> λID.E e.g. λx.x
  • Rule 3: E -> E E e.g. foo λ bar . (foo (bar baz))
  • Rule 4: E -> (E)

This grammar is ambiguous, however: an expression such as λx.xy can be parsed in either order, i.e. Rule 2 then Rule 3 or Rule 3 then Rule 2. The ambiguity is resolved with two disambiguation rules

  • E -> E E is left associative e.g. xyz is (xy)z and wxyz is ((wx)y)z
  • λID.E extends as far to the right as possible e.g. λx.xy is λx.(xy) and λx.λx.x is λx.(λx.(x))

Examples

  • (λx.y)x != λx.yx (== λx.(yx))
  • λx.(x)y == λx.((x)y)
  • λa.λb.λc.abc == λa.(λb.(λc.((ab)c)))
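The two disambiguation rules can be checked with a small recursive-descent parser. This is a sketch that assumes single-character identifiers (so xyz is three variables) and a tuple encoding of the parse tree; the function name `parse` is hypothetical.

```python
def parse(src):
    """Parse a lambda expression with single-character identifiers."""
    tokens = [c for c in src if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "(":                        # Rule 4: E -> (E)
            inner = expr()
            if take() != ")":
                raise SyntaxError("expected )")
            return inner
        if tok == "λ":                        # Rule 2: E -> λID.E
            name = take()
            if take() != ".":
                raise SyntaxError("expected .")
            return ("lam", name, expr())      # body extends as far right as possible
        if tok in ").":
            raise SyntaxError(f"unexpected {tok}")
        return ("var", tok)                   # Rule 1: E -> ID

    def expr():
        left = atom()
        while peek() is not None and peek() != ")":
            left = ("app", left, atom())      # Rule 3, left associative
        return left

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected )")
    return tree

tree = parse("λx.xy")
```

Under these rules `parse("xyz")` and `parse("(xy)z")` produce the same tree, as do `parse("λx.xy")` and `parse("λx.(xy)")`.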

Semantics

Every ID in lambda calculus is called a “variable”

  • E -> λID.E is “abstraction”
    • ID is the variable of the abstraction
    • E is the body of the abstraction
  • E -> E E is “application” (Calling a function)

  • Semantic 1: E -> λID.E defines a new anonymous function
    • That’s why anonymous functions are called “lambda expressions” in programming languages
    • ID is the parameter of the function
    • E is the body of the function
  • Semantic 2: E -> E1 E2, is similar to calling function E1 and setting its parameter to be E2

Examples

  • Semantic 1: λx.+ x 1 (Taking x as parameter and summing up x and 1)
  • Semantic 2: (λx.+ x 1)2 (Calling function E1 setting its parameter as 2 i.e. + 2 1 = 3)

How can the + function be defined if abstractions only accept one parameter? Currying

  • Currying translates a function that takes multiple arguments into a sequence of functions that each take a single argument

  • λx.λy.((+ x) y)
  • (λx.λy.((+ x) y))1 = λy.((+ 1) y)
  • (λx.λy.((+ x) y)) 10 20 = (λy.((+ 10) y))20 = ((+ 10) 20) = 30
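The same idea in Python, as a small sketch: `add` mirrors λx.λy.((+ x) y), and `curry2` is a hypothetical helper that curries any two-argument function.

```python
def add(x):
    """λx.λy.((+ x) y): take x, return a function that takes y."""
    return lambda y: x + y

def curry2(f):
    """Turn a two-argument function into a chain of one-argument functions."""
    return lambda x: lambda y: f(x, y)

add_one = add(1)             # λy.((+ 1) y)
total = add(10)(20)          # ((+ 10) 20)
curried_max = curry2(max)    # illustrative use with a built-in function
```

Applying `add` to one argument yields another function, exactly as in the partial application (λx.λy.((+ x) y))1 above; `total` evaluates to 30.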

Free variable

A variable is free if it does not appear within the body of an abstraction whose metavariable has the same name (e.g. x in λx)

  • x free in λx.xyz ? no
  • y free in λx.xyz ? yes
  • x free in (λx.(+ x 1)) x ? yes
  • z free in λx.λy.λz.zyx ? no
  • x free in (λx.z foo)(λy.y x) ? yes

See CSE 340 11-23-15 Lecture: “Lambda Calculus Pt. 1”

Set Up Spark on Windows

You just need to point IntelliJ to wherever Spark is located, which you can do by adding the Spark jars as External Libraries. To do this:

  1. Go to the mynewproject folder we created in the video

  2. Right Click on it and go to Module Settings (Shortcut for this is just F4)

  3. On the left-hand side panel, under Project Settings, click on Libraries

  4. Click on the plus sign to add a new library, select Scala SDK, and then choose Browse

  5. Go to where Spark is installed on your computer (it should be under C://Spark if you installed it on Windows, or under a hidden folder like bin/Cellar for a Mac Homebrew install)

  6. Then go to Spark/jars/ and select all the .jar files there (you don’t actually have to pick them all, it’s just easier than manually selecting only the ones you need)

  7. Then hit Ok and Apply these new external libraries

  8. You should now be able to use the Spark libraries in IntelliJ. Keep in mind, though, that using Spark in Scala worksheets instead of a normal Scala script can be cumbersome or buggy: only one SparkContext can be open at a time, and the interactive worksheet may keep creating new SparkContexts, which leads to errors.

Here is a video on YouTube that walks through this (but note that it uses Maven):

https://www.youtube.com/watch?v=GGf6OqjaGw4

http://stackoverflow.com/questions/38127062/java-lang-nosuchmethoderror-scala-predef-arrowassocljava-lang-objectljava-l