To start a brand new CI/CD project, three tools are necessary:
Git, a source control tool;
Nexus, a binary repository tool; and
Jenkins, an automation tool.
These are all widely used FLOSS projects, but any equivalent set of tools will do. Whatever the tools, two rules always apply:
there is no room for manual steps anywhere between development and deployment to live environments, which means that every test and every deployment must be automated and scripted; and
every piece of code, configuration and script must be kept in source control.
To launch a Python Spark application in cluster mode, the application code must be shipped to the workers with the --py-files option. I concluded that the best way to do this is to build a fat egg containing the .py files and to extract the entry point Python file from it. The packaged code is then referenced from the Spark application by adding this reference to the entry point file:
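One way to make packaged code importable from an entry point file is to put the archive on sys.path before importing from it. The following is only a sketch under assumptions: the archive name myapp.egg and the module myapp_jobs are hypothetical, and the demo builds a tiny stand-in archive locally instead of a real fat egg. It relies on Python being able to import from any zip-format archive listed on sys.path.

```python
import os
import sys
import tempfile
import zipfile


def add_archive_to_path(archive_path):
    """Put a packaged archive (egg or zip) at the front of sys.path."""
    if archive_path not in sys.path:
        sys.path.insert(0, archive_path)


# Demo only: build a tiny archive that stands in for the fat egg.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "myapp.egg")  # hypothetical archive name
with zipfile.ZipFile(archive, "w") as zf:
    # hypothetical module that would normally hold the application code
    zf.writestr("myapp_jobs.py", "def answer():\n    return 42\n")

add_archive_to_path(archive)

# The import now resolves from inside the archive.
import myapp_jobs

result = myapp_jobs.answer()
print(result)
```

In a real job the archive would be the one passed to --py-files, and the entry point would import the application packages instead of the toy module.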
Simple Python code sample that checks if a Hive database or table exists.
from pyspark.sql import HiveContext


def database_exists(hc, db):
    """
    Function that checks the existence of a Hive database
    :param hc: Hive Context
    :param db: database name
    :return: bool, True if database exists
    """
    # NOTE: the column name returned by SHOW DATABASES varies between Spark versions
    return bool([x.databaseName for x in hc.sql("SHOW DATABASES").collect() if x.databaseName == db])


def table_exists(hc, db, table):
    """
    Function that checks the existence of a Hive table
    :param hc: Hive Context
    :param db: database name
    :param table: table name
    :return: bool, True if table exists
    """
    if database_exists(hc=hc, db=db):
        # Switch to the database so that tableNames() lists its tables
        hc.sql("USE %s" % (db,))
        return table in hc.tableNames()
    else:
        return False
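Helpers like these can be exercised without a cluster, which fits the rule that every test must be automated. The sketch below is an assumption-laden illustration, not part of the original sample: FakeHiveContext and FakeResult are hypothetical stand-ins that mimic only the sql() and tableNames() calls used here, and the block carries compact copies of the two helpers so that it runs on its own.

```python
from collections import namedtuple

# Row shape assumed for SHOW DATABASES in this sketch.
DbRow = namedtuple("DbRow", ["databaseName"])


class FakeResult:
    """Hypothetical stand-in for a query result exposing collect()."""

    def __init__(self, rows):
        self._rows = rows

    def collect(self):
        return self._rows


class FakeHiveContext:
    """Hypothetical stand-in exposing only sql() and tableNames()."""

    def __init__(self, catalog):
        self._catalog = catalog  # dict: database name -> list of table names
        self._current = "default"

    def sql(self, statement):
        if statement == "SHOW DATABASES":
            return FakeResult([DbRow(name) for name in self._catalog])
        if statement.startswith("USE "):
            self._current = statement[len("USE "):]
        return FakeResult([])

    def tableNames(self):
        return self._catalog.get(self._current, [])


def database_exists(hc, db):
    return any(row.databaseName == db for row in hc.sql("SHOW DATABASES").collect())


def table_exists(hc, db, table):
    if database_exists(hc, db):
        hc.sql("USE %s" % (db,))
        return table in hc.tableNames()
    return False


hc = FakeHiveContext({"default": [], "sales": ["orders"]})
print(database_exists(hc, "sales"))
print(table_exists(hc, "sales", "orders"))
print(table_exists(hc, "missing_db", "orders"))
```

A fake like this only checks the control flow; the behaviour against a real Hive metastore still needs an integration test.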
To add an SSH key to GitHub you need to work both on your computer and on the GitHub web page. Let's start with the computer from which you want to commit code to your GitHub repositories. Later we will deal with GitHub.
@ your computer
Start by generating an SSH key. Go to the command line and type:
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/Users/cadoado/.ssh/id_rsa): /Users/cadoado/.ssh/github_rsa
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /Users/cadoado/.ssh/github_rsa.
Your public key has been saved in /Users/cadoado/.ssh/github_rsa.pub.
The key fingerprint is:
SHA256:5egVnuUSe4NKS4l8HU4mnioQk4sacmjI3upq9eQ9Bfk cadoado@Cados-MBP
The key's randomart image is:
+---[RSA 2048]----+
| |
| . |
| + o B . |
|oo + . = & X |
|*o+ o S @ + |
|=o.o . * E o . |
|....= o = |
| .. + o |
|=o . |
+----[SHA256]-----+
The passphrase can be an empty string (just press Enter). Now you can check that the key pair was generated:
The file github_rsa is the private part, which you shall never ever disclose. The file github_rsa.pub is the public part, which you distribute whenever requested.
Assign the key to GitHub by editing – or creating – a config file:
$ vi ~/.ssh/config
Add these lines to the config file (here you must refer to the private part of the key):
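An entry of this kind usually names the host and points IdentityFile at the private key. As a hedged illustration, the Python sketch below builds such an entry as text. Host, HostName, User and IdentityFile are standard ssh config keywords; the helper name github_ssh_config_entry is hypothetical, and the exact lines you need may differ for your setup.

```python
def github_ssh_config_entry(identity_file="~/.ssh/github_rsa"):
    """Return the text of an ssh config entry for github.com.

    identity_file must be the PRIVATE part of the key pair.
    """
    lines = [
        "Host github.com",
        "    HostName github.com",
        "    User git",
        "    IdentityFile " + identity_file,
    ]
    return "\n".join(lines) + "\n"


entry = github_ssh_config_entry()
print(entry)
```

The printed text can be pasted into ~/.ssh/config as is.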
Finally, copy and paste the public part of the previously generated key into GitHub. Give the key a name, because in the future you may end up with several keys (one for each device you want to commit code from).
Final remarks
Now you can communicate conveniently and securely with GitHub, without constantly typing your user name and password.
You may need to reset the project origin on your local machine; for that, refer to the post Changing GitHub project origin.
foldLeft is a curried method: it is applied first to an initial value and then to an operation that combines the accumulated result with each element of the sequence being folded:
def foldLeft[B](z: B)(op: (B, A) ⇒ B): B
scala> val xs: List[Int] = List(1,2,3)
xs: List[Int] = List(1, 2, 3)
scala> xs.foldLeft(0){(acc, x) => acc + x}
res9: Int = 6
scala> xs.foldLeft(0)(_+_)
res10: Int = 6
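For comparison, the same left fold can be sketched in Python with functools.reduce. The helper name fold_left is hypothetical and is not part of any library; the second call uses string building only to make the left-to-right nesting visible.

```python
from functools import reduce


def fold_left(xs, z, op):
    """Left fold: start from z and combine with each element, left to right."""
    return reduce(op, xs, z)


# Same sum as xs.foldLeft(0)(_+_) in the Scala session.
total = fold_left([1, 2, 3], 0, lambda acc, x: acc + x)

# Building a string shows how the accumulator nests to the left.
trace = fold_left([1, 2, 3], "0", lambda acc, x: "(%s + %d)" % (acc, x))

print(total)  # 6
print(trace)  # (((0 + 1) + 2) + 3)
```

The accumulator is always the first argument of the operation, mirroring the (B, A) ⇒ B signature.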