Example Code
Initialization Code
Every time you run a job on Wukong, you’ll need to create an instance of the LocalCluster class as well as an instance of the Client class.
import dask.array as da
from wukong import LocalCluster, Client

local_cluster = LocalCluster(
    host="10.0.88.131:8786",
    proxy_address="10.0.88.131",
    proxy_port=8989,
    num_lambda_invokers=4,
    chunk_large_tasks=False,
    n_workers=0,
    use_local_proxy=True,
    local_proxy_path="/home/ec2-user/Wukong/KV Store Proxy/proxy.py",
    redis_endpoints=[("127.0.0.1", 6379)],
    use_fargate=False)
client = Client(local_cluster)
In all of the following examples, the code given assumes you’ve created a local cluster and client object first.
Linear Algebra
Wukong supports many popular linear algebra operations such as Singular Value Decomposition (SVD) and Tall-and-Skinny QR Reduction (TSQR).
Singular Value Decomposition (SVD)
Tall-and-Skinny Matrix
Here, we are computing the SVD of a 200,000 x 100 matrix. In this case, we partition the original matrix into chunks of size 10,000 x 100.
X = da.random.random((200000, 100), chunks=(10000, 100)).persist()
u, s, v = da.linalg.svd(X)
v.compute(scheduler=client.get)
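As a quick sanity check on the chunking described above, the following sketch counts how many blocks a given chunk size produces along each axis. The helper chunk_grid is a hypothetical name introduced for illustration; it is not part of Dask or Wukong.

```python
import math


def chunk_grid(shape, chunks):
    """Return the number of blocks along each axis of a chunked array.

    Hypothetical helper for illustration only.
    """
    return tuple(math.ceil(n / c) for n, c in zip(shape, chunks))


# Tall-and-skinny case from the text: 20 row blocks and a single column block.
tall_grid = chunk_grid((200_000, 100), (10_000, 100))
print(tall_grid)  # (20, 1)
```

On an actual Dask array, the same information should be available through the numblocks attribute (for example, X.numblocks).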
Square Matrix
We can also compute the SVD of a square matrix. In this case, the input matrix is 10,000 x 10,000, and we partition it into chunks of size 2,000 x 2,000. Note that this example uses svd_compressed, which computes an approximate rank-k decomposition (here k=5) rather than the full SVD.
X = da.random.random((10000, 10000), chunks=(2000, 2000)).persist()
u, s, v = da.linalg.svd_compressed(X, k=5)
v.compute(scheduler=client.get)
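The svd_compressed call above asks for only k=5 singular triplets and, per Dask's documentation, returns an approximate result based on random compression. The following NumPy sketch illustrates the general idea of a randomized rank-k SVD on a single machine. It is a simplified illustration under that assumption, not Dask's or Wukong's actual implementation; the function randomized_svd and its oversample and seed parameters are hypothetical names.

```python
import numpy as np


def randomized_svd(a, k, oversample=10, seed=0):
    """Approximate rank-k SVD via random projection (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Project the columns of `a` onto a random subspace of size k + oversample.
    omega = rng.standard_normal((a.shape[1], k + oversample))
    q, _ = np.linalg.qr(a @ omega)
    # Decompose the much smaller projected matrix, then map back.
    b = q.T @ a
    u_b, s, vt = np.linalg.svd(b, full_matrices=False)
    return (q @ u_b)[:, :k], s[:k], vt[:k]


# A matrix of exact rank 5, so a rank-5 approximation can recover it.
rng = np.random.default_rng(1)
low_rank = rng.standard_normal((300, 5)) @ rng.standard_normal((5, 200))

u, s, vt = randomized_svd(low_rank, k=5)
reconstruction_error = (
    np.linalg.norm(low_rank - (u * s) @ vt) / np.linalg.norm(low_rank)
)
```

For matrices that are not close to low rank, the approximation error depends on how quickly the singular values decay, so the result should not be treated as a drop-in replacement for a full SVD.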
QR Reduction
X = da.random.random((128, 128), chunks=(16, 16))
q, r = da.linalg.qr(X)
r.compute(scheduler=client.get)
Tall-and-Skinny QR Reduction (TSQR)
We can also compute the tall-and-skinny QR reduction of matrices using Wukong.
X = da.random.random((262_144, 128), chunks=(8192, 128))
q, r = da.linalg.tsqr(X)
r.compute(scheduler=client.get)
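To make the structure of TSQR concrete, here is a minimal single-machine sketch in NumPy of a one-level variant: factor each row block independently, stack the small R factors, and factor the stack once more. This is an illustration of the general technique under those assumptions, not the code path Wukong or Dask actually executes; tsqr_r and block_rows are hypothetical names.

```python
import numpy as np


def tsqr_r(x, block_rows):
    """Return the R factor of a tall-and-skinny matrix via one-level TSQR."""
    # Step 1: an independent QR on each row block (parallelizable).
    r_blocks = [
        np.linalg.qr(x[start:start + block_rows], mode="r")
        for start in range(0, x.shape[0], block_rows)
    ]
    # Step 2: stack the small R factors and factor the stack once more.
    return np.linalg.qr(np.vstack(r_blocks), mode="r")


rng = np.random.default_rng(0)
x = rng.standard_normal((4096, 16))

r_tsqr = tsqr_r(x, block_rows=512)
r_direct = np.linalg.qr(x, mode="r")

# R is unique only up to the sign of each row, so compare absolute values.
max_diff = np.max(np.abs(np.abs(r_tsqr) - np.abs(r_direct)))
```

Because each block factorization in step 1 is independent of the others, the per-block work maps naturally onto separate tasks, which is what makes the method attractive for a chunked array.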
Cholesky Decomposition
def get_sym(input_size, chunks):
    # Build a symmetric positive-definite matrix as L * L^T,
    # where L is a lower-triangular matrix of ones.
    A = da.ones((input_size, input_size), chunks=chunks)
    lA = da.tril(A)
    return lA.dot(lA.T)

# Cholesky requires square chunks; here we use 25 x 25 blocks.
X = get_sym(100, chunks=(25, 25))

# Pass 'True' for the 'lower' parameter if you wish to compute the lower cholesky decomposition.
chol = da.linalg.cholesky(X, lower=False)
chol.compute(scheduler=client.get)
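The input built by get_sym has a known factorization, which makes it convenient for checking results. The following NumPy sketch reproduces the same construction at a small size (n = 6 is an arbitrary choice for illustration) and verifies the relationship between the lower and upper factors. It runs locally and does not involve Wukong.

```python
import numpy as np

n = 6
lower_ones = np.tril(np.ones((n, n)))
sym = lower_ones @ lower_ones.T  # same construction as get_sym, in NumPy

lower = np.linalg.cholesky(sym)  # NumPy returns the lower factor L, sym = L @ L.T
upper = lower.T                  # the upper factor U, with sym = U.T @ U

# Entry (i, j) of this particular matrix equals min(i, j) + 1.
expected = np.minimum.outer(np.arange(n), np.arange(n)) + 1
```

Since the Cholesky factor with a positive diagonal is unique, the lower factor here is simply the lower-triangular matrix of ones that the input was built from.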
General Matrix Multiplication (GEMM)
x = da.random.random((10000, 10000), chunks=(2000, 2000))
y = da.random.random((10000, 10000), chunks=(2000, 2000))

z = da.matmul(x, y)
z.compute(scheduler=client.get)
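With 2,000 x 2,000 chunks, each 10,000 x 10,000 operand is split into a 5 x 5 grid of blocks, so a straightforward blocked product involves 5 x 5 x 5 = 125 block multiplications. The NumPy sketch below shows this blocked scheme at a scaled-down size. It illustrates the general technique only; how Dask and Wukong actually schedule the block products may differ, and blocked_matmul is a hypothetical name.

```python
import numpy as np


def blocked_matmul(a, b, block):
    """Compute a @ b by accumulating products of block-sized tiles."""
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    out = np.zeros((n, m))
    for i in range(0, n, block):          # row blocks of the output
        for j in range(0, m, block):      # column blocks of the output
            for p in range(0, k, block):  # blocks along the shared dimension
                out[i:i + block, j:j + block] += (
                    a[i:i + block, p:p + block] @ b[p:p + block, j:j + block]
                )
    return out


rng = np.random.default_rng(0)
a = rng.standard_normal((10, 10))
b = rng.standard_normal((10, 10))

# 10 x 10 inputs with 2 x 2 tiles mirror the 5 x 5 block grid in the text.
z_blocked = blocked_matmul(a, b, block=2)
```

Each output tile depends only on one block row of the first operand and one block column of the second, so the tiles can be computed as independent tasks.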
Machine Learning
Wukong also supports many machine learning workloads through the use of Dask-ML.
Support Vector Classification (SVC)
import pandas as pd
import seaborn as sns
from collections import defaultdict
import sklearn.datasets
from sklearn.svm import SVC

import dask_ml.datasets
from dask_ml.wrappers import ParallelPostFit

X, y = sklearn.datasets.make_classification(n_samples=1000)
clf = ParallelPostFit(SVC(gamma='scale'))
clf.fit(X, y)

results = defaultdict(list)

X, y = dask_ml.datasets.make_classification(n_samples=100000,
                                            random_state=100000,
                                            chunks=100000 // 20)