Running Dotnet Spark applications in an Ubuntu container

Let's create the first console application, called HelloSpark

  1. Create a console application

    Bash
    dotnet new console -o HelloSpark
    

  2. Add the Microsoft.Spark NuGet package

    Bash
    cd ~/Projects/HelloSpark
    dotnet add package Microsoft.Spark
    

  3. Open the project in VS Code and add some code to it

    Bash
    code .
    
    and add the following code in Program.cs
    C#
    using Microsoft.Spark.Sql;
    
    namespace HelloSpark
    {
        class Program
        {
            static void Main(string[] args)
            {
                var spark = SparkSession.Builder().GetOrCreate();
                var df = spark.Read().Json("people.json");
                df.Show();
            }
        }
    }
    

  4. Add a people.json file (the same name Program.cs reads)

    JSON
    {"name":"Michael"}
    {"name":"Andy", "age":30}
    {"name":"Justin", "age":19}
    

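Note that the file is newline-delimited JSON: each line is a complete object, which is what Spark's JSON reader expects by default. A quick sanity check of the format (the python3 one-liner is only an illustration, not part of the app):

```shell
# Recreate the sample file; each line must be a standalone JSON object
# because Spark's JSON source reads line-delimited JSON by default.
cat > people.json <<'EOF'
{"name":"Michael"}
{"name":"Andy", "age":30}
{"name":"Justin", "age":19}
EOF
# Sanity check: every line parses as JSON on its own.
python3 -c 'import json; [json.loads(l) for l in open("people.json")]; print("ok")'
```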
  5. Change the .csproj file to copy the json file to the output directory

    XML
    <Project Sdk="Microsoft.NET.Sdk">

      <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>netcoreapp3.1</TargetFramework>
      </PropertyGroup>

      <ItemGroup>
        <None Update="people.json">
          <CopyToOutputDirectory>Always</CopyToOutputDirectory>
        </None>
      </ItemGroup>

    </Project>
  6. Build the application

    Bash
    dotnet build
    

  7. Run the application in the docker container with the network mapped to the host

    Bash
    docker run -d --name dotnet-spark --network host -v "$HOME/Projects/HelloSpark/bin/Debug:/dotnet/Debug" 3rdman/dotnet-spark:latest
The Docker container should be running at this point.

Another option is to run the container with port mapping:

Bash
docker run -d --name dotnet-spark -p 8080:8080 -p 8081:8081 -p 5567:5567 -p 4040:4040 -v "$HOME/Projects/HelloSpark/bin/Debug:/dotnet/Debug" 3rdman/dotnet-spark:latest
where the exposed ports are 8080 (Spark master), 8081 (Spark worker), 5567 (backend debugging), and 4040 (Spark UI).
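With the container running, the app itself is normally launched through spark-submit using the DotnetRunner class shipped in the Microsoft.Spark jar. A sketch of the command, assuming a Spark 3.0 / Scala 2.12 jar name and the /dotnet/Debug mount from the docker run above; it would be executed inside the container, e.g. via docker exec:

```shell
# Compose the spark-submit invocation and save it as a helper script.
# Assumptions: Spark 3.0 / Scala 2.12 jar name pattern; /dotnet/Debug is the
# mount point used in the docker run command above. Not executed here.
APP_DIR=/dotnet/Debug/netcoreapp3.1
printf '%s\n' "spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local $APP_DIR/microsoft-spark-3-0_2.12-*.jar dotnet $APP_DIR/HelloSpark.dll" > run-hellospark.sh
cat run-hellospark.sh
```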

  8. Configure VS Code to debug the application by adding a launch.json
    JSON
    {
        // Use IntelliSense to learn about possible attributes.
        // Hover to view descriptions of existing attributes.
        // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
        "version": "0.2.0",
        "configurations": [
            {
                "name": ".NET Core Launch (console)",
                "type": "coreclr",
                "request": "launch",
                "preLaunchTask": "build",
                // If you have changed target frameworks, make sure to update the program path.
                "program": "${workspaceFolder}/bin/Debug/netcoreapp3.1/HelloSpark.dll",
                "args": [],
                "cwd": "${workspaceFolder}",
                // For more information about the 'console' field, see https://aka.ms/VSCode-CS-LaunchJson-Console
                "console": "internalConsole",
                "stopAtEntry": false,
                "logging": {
                    "moduleLoad": false
                }
            },
            {
                "name": ".NET Core Attach",
                "type": "coreclr",
                "request": "attach",
                "processId": "${command:pickProcess}"
            }
        ]
    }
    
    When we hit debug, it might say the build task is not available and ask to create one; choose the .NET Core VS Code task.
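For debugging, the backend is typically started in debug mode inside the container before hitting F5, so the app launched by VS Code can connect over the backend port (5567, mapped above). A sketch assuming the same Spark 3.0 / Scala 2.12 jar name pattern; the command is written to a file for illustration rather than executed:

```shell
# "debug" makes DotnetRunner wait for a .NET app to connect instead of
# launching one; the default backend port is 5567. Jar path is an assumption.
JAR=/dotnet/Debug/netcoreapp3.1/microsoft-spark-3-0_2.12-*.jar
printf '%s\n' "spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local $JAR debug" > debug-backend.sh
cat debug-backend.sh
```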

Awesome, we can now run and debug the application.

Tip

πŸ‘ Aswesome!!! No hassle of configuring all the spark and stuff locally


Warning

While debugging in VS Code on Ubuntu, use only one VS Code instance; otherwise the debugger won't work.

Warning

TODO: Investigate why the execution looks for the json file at '/dotnet/Debug/netcoreapp3.1/' for the other project (Project URL)

Commands to execute after cloning:

Bash
dotnet restore
dotnet build
docker run -d --name dotnet-spark-helloudf --network host -v "$HOME/Projects/HelloUdf/bin/Debug:/dotnet/HelloUdf/Debug" 3rdman/dotnet-spark:latest
docker exec -it dotnet-spark-helloudf /bin/bash

Failure

Error: somehow the container is looking for Microsoft.Spark.Worker in a host directory: java.io.IOException: Cannot run program "/home/rig/bin/Microsoft.Spark.Worker/Microsoft.Spark.Worker": error=2, No such file or directory
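Microsoft.Spark resolves the worker executable from the DOTNET_WORKER_DIR environment variable, so the error above suggests the host's value leaked into the Spark session. A hedged sketch of the fix; the path below is an assumption for the 3rdman image and should be verified inside the container first:

```shell
# Point DOTNET_WORKER_DIR at the worker install inside the container.
# The path is an assumption; locate the real one first, e.g.:
#   find / -type d -name Microsoft.Spark.Worker 2>/dev/null
export DOTNET_WORKER_DIR=/usr/local/bin/Microsoft.Spark.Worker
echo "DOTNET_WORKER_DIR=$DOTNET_WORKER_DIR"
```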