Example Code

Initialization Code

Every time you run a job on Wukong, you’ll need to create an instance of the LocalCluster class as well as an instance of the Client class.

import dask.array as da
from wukong import LocalCluster, Client
local_cluster = LocalCluster(
    host = "10.0.88.131:8786",
    proxy_address = "10.0.88.131",
    proxy_port = 8989,
    num_lambda_invokers = 4,
    chunk_large_tasks = False,
    n_workers = 0,
    use_local_proxy = True,
    local_proxy_path = "/home/ec2-user/Wukong/KV Store Proxy/proxy.py",
    redis_endpoints = [("127.0.0.1", 6379)],
    use_fargate = False)
client = Client(local_cluster)

All of the following examples assume that you have already created the local cluster and client objects as shown above.
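Every example on this page ends with the same call, compute(scheduler = client.get). As a minimal sketch, that pattern could be wrapped in a small helper. The name run_on_wukong is hypothetical and is not part of Wukong; it relies only on the call pattern used in the examples on this page.

```python
def run_on_wukong(collection, client):
    """Compute a Dask collection through the given client's scheduler.

    Hypothetical convenience wrapper (not part of Wukong). It assumes
    `collection` has a `compute` method and `client` has a `get` method,
    which is the pattern used throughout the examples on this page.
    """
    return collection.compute(scheduler=client.get)
```

With such a helper, the last line of an example could be written as run_on_wukong(v, client) instead of v.compute(scheduler = client.get).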

Linear Algebra

Wukong supports many popular linear algebra operations such as Singular Value Decomposition (SVD) and TSQR (Tall-and-Skinny QR Reduction).

Singular Value Decomposition (SVD)

Tall-and-Skinny Matrix

Here, we compute the SVD of a 200,000 x 100 matrix that is partitioned into chunks of size 10,000 x 100.

X = da.random.random((200000, 100), chunks=(10000, 100)).persist()
u, s, v = da.linalg.svd(X)
v.compute(scheduler = client.get)
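To make the chunking arithmetic concrete, the sketch below computes how many blocks a regular chunking produces along each axis. The helper chunk_grid is a hypothetical name used only for illustration; it uses plain Python and does not call Dask or Wukong.

```python
import math


def chunk_grid(shape, chunks):
    """Return the number of blocks along each axis for a regular chunking.

    Hypothetical helper for illustration: divides each dimension by its
    chunk size and rounds up, so a partial trailing block still counts.
    """
    return tuple(math.ceil(dim / size) for dim, size in zip(shape, chunks))


# The tall-and-skinny example: 200,000 x 100 split into 10,000 x 100 chunks.
print(chunk_grid((200_000, 100), (10_000, 100)))  # (20, 1)
```

The result (20, 1) means 20 row blocks and a single column block, which is the layout that makes the matrix tall-and-skinny from the point of view of the blocked algorithm.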

Square Matrix

We can also compute the SVD of a square matrix. In this example, the input matrix is 10,000 x 10,000 and is partitioned into chunks of size 2,000 x 2,000. Note that the code uses svd_compressed with k=5, which computes an approximate rank-5 decomposition rather than the full SVD.

X = da.random.random((10000, 10000), chunks=(2000, 2000)).persist()
u, s, v = da.linalg.svd_compressed(X, k=5)
v.compute(scheduler = client.get)
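To illustrate what a rank-k decomposition returns, here is a small NumPy sketch, assuming NumPy is installed locally. It truncates an exact SVD to k components, which is a stand-in for illustration and not the randomized algorithm that svd_compressed uses. The matrix size and variable names are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((60, 40))  # small stand-in for the 10,000 x 10,000 input
k = 5

# Exact (thin) SVD, then keep only the k largest singular triplets.
u_full, s_full, vt_full = np.linalg.svd(a, full_matrices=False)
u, s, vt = u_full[:, :k], s_full[:k], vt_full[:k, :]

# Rank-k reconstruction and its Frobenius-norm error.
approx = (u * s) @ vt
error = np.linalg.norm(a - approx)

print(u.shape, s.shape, vt.shape)  # (60, 5) (5,) (5, 40)
```

For an exact truncated SVD, the Frobenius error equals the square root of the sum of the squared discarded singular values. A randomized method such as svd_compressed approximates these factors, so its error can be somewhat larger.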

QR Reduction

X = da.random.random((128, 128), chunks = (16, 16))
q, r = da.linalg.qr(X)
r.compute(scheduler = client.get)

Tall-and-Skinny QR Reduction (TSQR)

We can also compute the tall-and-skinny QR reduction of matrices using Wukong.

X = da.random.random((262_144, 128), chunks = (8192, 128))
q, r = da.linalg.tsqr(X)
r.compute(scheduler = client.get)
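The idea behind TSQR can be sketched in a few lines of NumPy, assuming NumPy is installed locally. This is a single-level version written for illustration (the function name tsqr_r is hypothetical), and it is not necessarily how Dask or Wukong schedule the reduction: each row block is factored independently, the small R factors are stacked, and one more QR of the stack yields the final R.

```python
import numpy as np


def tsqr_r(x, block_rows):
    """Return the R factor of a tall-and-skinny matrix via one-level TSQR.

    Hypothetical illustration: factor each row block on its own, then
    factor the stacked R blocks. Only R is returned to keep it short.
    """
    # Step 1: independent QR of each row block (these could run in parallel).
    r_blocks = [
        np.linalg.qr(x[start:start + block_rows])[1]
        for start in range(0, x.shape[0], block_rows)
    ]
    # Step 2: QR of the stacked R factors gives the R factor of the whole matrix.
    return np.linalg.qr(np.vstack(r_blocks))[1]


rng = np.random.default_rng(0)
x = rng.random((1024, 8))
r = tsqr_r(x, block_rows=128)
print(r.shape)  # (8, 8)
```

The R factor is unique only up to the signs of its rows, so a convenient correctness check is that r.T @ r matches x.T @ x rather than comparing against another QR routine entry by entry.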

Cholesky Decomposition

def get_sym(input_size, chunks):
    # Build a symmetric positive-definite matrix as L * L^T, where L is lower-triangular.
    A = da.ones((input_size, input_size), chunks = chunks)
    lA = da.tril(A)
    return lA.dot(lA.T)

input_matrix = get_sym(100, chunks = (25, 25))
X = da.asarray(input_matrix, chunks = (25,25))

# Pass 'True' for the 'lower' parameter to compute the lower-triangular Cholesky factor instead.
chol = da.linalg.cholesky(X, lower = False)
chol.compute(scheduler = client.get)
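The input built by get_sym has a known answer, which makes it handy for checking results. The NumPy sketch below, assuming NumPy is installed locally, reproduces the construction on a small matrix. Entry (i, j) of the product equals min(i, j) + 1, and the lower Cholesky factor is the lower-triangular matrix of ones that the product was built from. The size n = 6 and the variable names are arbitrary choices for the example.

```python
import numpy as np

n = 6  # small stand-in for the 100 x 100 input

# Same construction as get_sym: a lower-triangular matrix of ones times its transpose.
lower_ones = np.tril(np.ones((n, n)))
sym = lower_ones @ lower_ones.T

# Entry (i, j) counts the columns shared by rows i and j: min(i, j) + 1.
expected = np.minimum.outer(np.arange(n), np.arange(n)) + 1

# np.linalg.cholesky returns the lower factor; its transpose is the upper factor.
factor = np.linalg.cholesky(sym)
print(np.allclose(factor, lower_ones))  # True
```

With lower = False, as in the example above, the expected result is the transpose: an upper-triangular matrix of ones.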

General Matrix Multiplication (GEMM)

x = da.random.random((10000, 10000), chunks = (2000, 2000))
y = da.random.random((10000, 10000), chunks = (2000, 2000))

z = da.matmul(x, y)
z.compute(scheduler = client.get)
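The arithmetic behind a chunked matrix product can be sketched with NumPy, assuming NumPy is installed locally. Each output block is the sum of products of matching blocks from the two inputs. The function blocked_matmul is a hypothetical illustration of that arithmetic, not a description of how Dask or Wukong execute the multiplication.

```python
import numpy as np


def blocked_matmul(a, b, block):
    """Multiply two matrices block by block (hypothetical illustration)."""
    n, inner = a.shape
    inner_b, m = b.shape
    if inner != inner_b:
        raise ValueError("inner dimensions do not match")
    out = np.zeros((n, m))
    for i in range(0, n, block):          # row blocks of the output
        for j in range(0, m, block):      # column blocks of the output
            for p in range(0, inner, block):  # blocks along the shared dimension
                out[i:i + block, j:j + block] += (
                    a[i:i + block, p:p + block] @ b[p:p + block, j:j + block]
                )
    return out


rng = np.random.default_rng(0)
a = rng.random((10, 10))
b = rng.random((10, 10))
print(np.allclose(blocked_matmul(a, b, block=4), a @ b))  # True
```

For the 10,000 x 10,000 example with 2,000 x 2,000 chunks, the output has a 5 x 5 grid of blocks, and each output block sums 5 block products.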

Machine Learning

Wukong also supports many machine learning workloads through the use of Dask-ML.

Support Vector Classification (SVC)

import pandas as pd
import seaborn as sns
from collections import defaultdict
import sklearn.datasets
from sklearn.svm import SVC

import dask_ml.datasets
from dask_ml.wrappers import ParallelPostFit

X, y = sklearn.datasets.make_classification(n_samples=1000)
clf = ParallelPostFit(SVC(gamma='scale'))
clf.fit(X, y)

results = defaultdict(list)

X, y = dask_ml.datasets.make_classification(n_samples = 100000,
                                            random_state = 100000,
                                            chunks = 100000 // 20)