Taken/adapted from: https://ucdavis-bioinformatics-training.github.io/2022-Feb-Introduction-To-Python-For-Bioinformatics/python/python2
Make sure you have VSCode and Python installed. Let’s write our first piece of code.
Open a new file in VSCode.
print("Hello, World!")
Save the file and name it “helloworld.py”. Note the “.py” extension, which marks the file as a Python script. Click the triangle “play” button to run your code in the terminal. Python can also be run interactively: type python into the terminal (python3 on some systems) to start the interpreter, then paste or type the code snippets at the >>> prompt.
Finally, you can also run Python in a Jupyter notebook, whose files use the “.ipynb” extension. Each “chunk” (cell) of code can be run on its own within the notebook.
int: integers, i.e. whole numbers, negative or positive.
n = 42
float: floating-point (real) numbers, stored in double precision.
n_pi = 3.14
str: strings, i.e. text enclosed in quotes.
gene = "TAF1"
print(gene)
print("My gene is called:", gene)
print("My gene is called:" + gene)
Note: print() outputs to your screen. What is the difference in output between the last two print statements?
bool: Booleans, which are either True or False.
control = False
treatment = True
The type() function returns the data type of a variable.
print("The data type of the variable 'n' is:")
print(type(n))
print("The data type of the variable 'gene' is:")
print(type(gene))
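As a quick check, the sketch below reuses one example value for each basic type from this section and prints the name of its type. Reading the short name through type(value).__name__ is just one convenient way to display the result.

```python
# One example value for each basic data type covered above
n = 42            # int
n_pi = 3.14       # float
gene = "TAF1"     # str
control = False   # bool

for value in (n, n_pi, gene, control):
    # type() returns the class; .__name__ is its short name, e.g. "int"
    print(repr(value), "->", type(value).__name__)
```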
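Strings have a long list of built-in methods. Strings cannot be changed in place, so methods such as .upper() and .replace() return a new, modified string and leave the original untouched.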
tmpstr = "Hello my name is X"
allcaps = tmpstr.upper()
print(allcaps)
newstr = tmpstr.replace("X","Y")
print(newstr)
tmpstr2 = "How are you doing?"
print(tmpstr + " " + tmpstr2)
print(tmpstr2.find("you"))         # index where "you" first appears
print(tmpstr2.find("california"))  # -1 means the substring was not found
More here: https://www.w3schools.com/python/python_ref_string.asp
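The same methods are handy for biological sequences. The sketch below uses a made-up DNA sequence (not taken from any real gene) to normalize its case with .upper(), transcribe T to U with .replace(), and estimate GC content with .count(). The original string seq is left unchanged throughout.

```python
# A made-up DNA sequence with mixed case, used only for illustration
seq = "atgGCgtaCGatt"

dna = seq.upper()                  # normalize case: "ATGGCGTACGATT"
rna = dna.replace("T", "U")        # transcribe: every T becomes U
gc_count = dna.count("G") + dna.count("C")
gc_content = gc_count / len(dna)   # fraction of bases that are G or C

print(dna)
print(rna)
print(round(gc_content, 2))
print(dna.startswith("ATG"))       # True: the sequence begins with ATG
print(dna.find("TAC"))             # index of the first "TAC", or -1 if absent
```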
Exercise: create a variable called user that stores your name. Then print out a sentence that reads “This is my NAME codebook”, where NAME is replaced by the text in user.
Casting is converting a value of one data type to another data type. Some examples are listed below.
bool(): converts a value to True or False
tmpstr = "Hello"
tmpbool = bool(tmpstr)
print(tmpbool)
str(): converts a value to a string
n = 42.24
print("The number is: " + str(n))
int(): converts a value to an integer
mystr = "50"
myint = int(mystr) + 1
print(str(myint))
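Not every cast succeeds, and some behave differently from what you might expect. The sketch below collects a few cases using only built-in functions: int() truncates instead of rounding, bool() is False for "empty" values, and a decimal string cannot be cast directly to int.

```python
# Successful casts
print(int("50") + 1)       # 51: string of digits -> int
print(float("4.2"))        # 4.2: string -> float
print(int(4.9))            # 4: int() truncates toward zero, it does not round
print(str(42.24) + "!")    # 42.24!: number -> string for concatenation

# bool() is False for empty or zero values and True otherwise
print(bool(""), bool("Hello"))      # False True
print(bool(0), bool(0.0), bool(7))  # False False True

# A decimal string is not a valid int, so this raises ValueError
try:
    int("4.2")
    cast_ok = True
except ValueError as err:
    cast_ok = False
    print("Cannot cast:", err)
```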
Comparisons are among the most common operations you will perform. Numerical and string comparisons return a Boolean (True or False):
print(1<1)
print(1<2)
print(2>1)
print(1<=1)
print(2>=1)
print(1==1)
print(0==1)
gene = "TAF1"
greeting = "hello"
print(gene == "BRCA2")
print(greeting == "hello")
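Because comparison results are Booleans, they can be stored in variables and combined. The sketch below uses a made-up expression value of 45.0 and an arbitrary threshold of 10; neither number comes from real data.

```python
# A made-up expression value and threshold, for illustration only
expression = 45.0
threshold = 10

is_expressed = expression > threshold
in_range = threshold <= expression < 100   # chained: both comparisons must hold
print(is_expressed, in_range)              # True True

# Boolean results combine with and / or / not
print(expression > 100 or expression == 45.0)   # True
print(not is_expressed)                          # False

# String comparisons are case-sensitive
print("TAF1" == "taf1")            # False
print("TAF1" == "taf1".upper())    # True
```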
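Python also supports the usual arithmetic operators:
a = 42
b = 7
print(a + b)        # addition
print(b - a)        # subtraction
c = 83
c += 5              # same as c = c + 5
print(c)
c -= 10             # same as c = c - 10
print(c)
print(a/b)          # division always returns a float: 6.0
print(4**b)         # exponentiation: 4 to the power of b
# or, equivalently:
expb = pow(4,b)
print(expb)
print(42 % 4)       # modulus, the remainder of 42 / 4: 2
av = abs(24-42)     # absolute value: 18
print(av)
print(round(4.2))   # rounds to the nearest integer: 4
print(int(4.2))     # int() drops the decimal part: 4
import math
print(math.ceil(4.2))   # rounds up: 5
print(math.floor(4.8))  # rounds down: 4
print(int(4.8))         # truncates rather than rounds: 4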
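A few related operators and functions are worth knowing alongside these. The sketch below uses its own illustrative values (dividend and divisor) so it does not overwrite a and b: it shows floor division, how // and % fit together, round() with a number of decimal places, and how int() and math.floor() differ for negative numbers.

```python
import math

# Values chosen only for illustration
dividend = 42
divisor = 4

print(dividend / divisor)          # 10.5: true division returns a float
print(dividend // divisor)         # 10: floor division drops the fractional part
print(dividend % divisor)          # 2: the remainder
print(divmod(dividend, divisor))   # (10, 2): quotient and remainder together

# Floor division and modulus always fit back together
print((dividend // divisor) * divisor + dividend % divisor == dividend)  # True

# round() accepts an optional number of decimal places
print(round(3.14159, 2))           # 3.14

# Exact halves are rounded to the nearest even integer
print(round(2.5), round(3.5))      # 2 4

# For negative numbers, int() truncates toward zero but floor() rounds down
print(int(-4.2), math.floor(-4.2))  # -4 -5
```

The round(value, 2) form is the one to reach for when a result should be rounded to two decimal places.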
Exercise: import the random module (import random). Then, write some code that assigns a random number to each of two variables (x and y) with the random.random() function. Then calculate the average of x and y, and round the result to two decimal places. Assign that result to the variable z, and print it out.
In addition to simple data types, Python groups values into collections called data structures. These include tuples, lists, dictionaries, and sets. Tuples and lists both hold ordered sequences of values; a tuple cannot be changed after it is created, while a list can.
gene_tuple = ("DDX11L1","WASH7P","MIR6859-1","MIR1302-2HG","MIR1302-2","FAM138A")
gene_list = ["DDX11L1","WASH7P","MIR6859-1","MIR1302-2HG","MIR1302-2","FAM138A"]
In Python, lists are 0-indexed: the first element has index 0. Negative indices count backwards from the end of the list, so -1 is the last element and, for this six-element list, -6 is the first:
print(gene_list[0])
print(gene_list[-6])
print(gene_list[5])
print(gene_list[-1])
print(gene_list[-3:]) # last three elements
print(gene_list[1:3]) # second and third elements (index 3 is excluded)
print(gene_list[:3]) # first three elements (indices 0 to 2)
print(gene_list[1:2]) # second element, but returns a list
print(gene_list[1]) # second element, but returns a string
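The rules behind these slices can be summarized in one more sketch. A slice [start:stop] includes start and excludes stop, an optional third value sets the step, and slicing past the end of a list is allowed (unlike indexing past it). The list is redefined here so the block runs on its own.

```python
gene_list = ["DDX11L1", "WASH7P", "MIR6859-1", "MIR1302-2HG", "MIR1302-2", "FAM138A"]

# [start:stop] includes start, excludes stop: stop - start elements
print(gene_list[1:3])             # ['WASH7P', 'MIR6859-1']
print(len(gene_list[1:3]))        # 2

# An optional third value is the step
print(gene_list[::2])             # every other gene, starting at index 0
print(gene_list[::-1])            # a reversed copy; gene_list is unchanged

# Slicing past the end is allowed; gene_list[100] would raise IndexError
print(gene_list[4:100])           # ['MIR1302-2', 'FAM138A']
```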
mystring = "The quick brown fox jumps over the lazy dog"
print(mystring[4:9])
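String slicing uses exactly the same rules as list slicing, which makes it convenient for pulling codons out of a sequence. The sketch below uses a short made-up coding sequence (not a real gene) whose length happens to be a multiple of three.

```python
# A made-up coding sequence, used only for illustration
cds = "ATGGCCATTGTAATGGGCCGCTGA"

start_codon = cds[:3]      # first three bases
second_codon = cds[3:6]    # bases at indices 3, 4 and 5
stop_codon = cds[-3:]      # last three bases
reversed_seq = cds[::-1]   # reversed order only, NOT the reverse complement

print(start_codon, second_codon, stop_codon)   # ATG GCC TGA
print(len(cds) % 3 == 0)                        # True: whole number of codons
```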
print("The length of gene_list is " + str(len(gene_list)))
A list can hold values of different data types:
forty_twos = ["42", 42, "forty-two", 42.0]
val = forty_twos[1]
print(val)
print(type(val))
val = forty_twos[2]
print(val)
print(type(val))
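Mixing types in one list affects comparisons. The sketch below reuses forty_twos to show that the string "42" is not equal to the integer 42, while the integer 42 and the float 42.0 do compare equal.

```python
forty_twos = ["42", 42, "forty-two", 42.0]

print(forty_twos[0] == forty_twos[1])       # False: the string "42" is not the int 42
print(forty_twos[1] == forty_twos[3])       # True: 42 and 42.0 compare equal
print(int(forty_twos[0]) == forty_twos[1])  # True once the string is cast to int

# Because 42 == 42.0, .count(42) matches both numeric entries
print(forty_twos.count(42))                 # 2
```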
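gene_list2 = gene_list          # gene_list2 is a second name for the SAME list
gene_list2[2] = "DMR3"
print(gene_list)                # so gene_list changed too
gene_list2 = gene_list.copy()   # .copy() creates a new, independent list
gene_list2[2] = "DMR5"
print(gene_list)                # unchanged by the edit to the copy
print(gene_list2)
print(gene_list + gene_list2)   # + concatenates two lists into a new list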
print("BRCA2" in gene_list)
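The difference between a second name and a copy can be checked directly with the is operator, which tests whether two names refer to the same object. The sketch below uses small illustrative lists; note that .copy() is shallow, so lists nested inside the copied list are still shared.

```python
original = ["TAF1", "BRCA2"]
alias = original              # a second name for the same list object
duplicate = original.copy()   # a new list with the same elements

print(alias is original)      # True: same object
print(duplicate is original)  # False: different objects
print(duplicate == original)  # True: same contents

alias.append("TP53")
print(original)               # ['TAF1', 'BRCA2', 'TP53']: changed via the alias
print(duplicate)              # ['TAF1', 'BRCA2']: the copy is unaffected

# .copy() is shallow: the inner lists are shared between both outer lists
nested = [["TAF1", 1.5], ["BRCA2", 2.0]]
nested_copy = nested.copy()
nested_copy[0][1] = 99
print(nested[0])              # ['TAF1', 99]
```

When nested data must be fully independent, the standard library's copy.deepcopy() makes a deep copy.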
Some basic list methods that are useful include .append(), .remove(), and .reverse(). More methods can be found here: https://www.tutorialsteacher.com/python/list-methods
gene_list.append("BRCA2")
print(gene_list)
gene_list.remove("WASH7P")
print(gene_list)
gene_list.reverse()
print(gene_list)
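A few details of these methods are easy to trip over. The sketch below, on a small illustrative list, shows that .remove() deletes only the first match, that in-place methods such as .reverse() return None, and that the built-in sorted() returns a new list instead of modifying its input.

```python
genes = ["TAF1", "BRCA2", "TAF1", "TP53"]

genes.remove("TAF1")          # removes only the FIRST matching element
print(genes)                  # ['BRCA2', 'TAF1', 'TP53']

print(genes.index("TP53"))    # 2: position of the first match
print(genes.count("TAF1"))    # 1: number of matches

last = genes.pop()            # removes and returns the last element
print(last, genes)            # TP53 ['BRCA2', 'TAF1']

# In-place methods change the list and return None
result = genes.reverse()
print(result, genes)          # None ['TAF1', 'BRCA2']

# sorted() leaves its input alone and returns a new, sorted list
print(sorted(genes))          # ['BRCA2', 'TAF1']

# .remove() raises ValueError for a missing value, so check first
if "MYC" in genes:
    genes.remove("MYC")
```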
Exercise: copy the gene_list variable from the previous code to a new variable called gene_list_test. Add three random genes, remove the third gene in the list, and then add a gene to the middle of the list.
Dictionaries store key-value pairs. Here, each gene name (the key) maps to an expression value:
gene_exp_dict = {"DDX11L1":43.2,"WASH7P":45,"MIR6859-1":60.1,"MIR1302-2HG":12,"MIR1302-2":0.5,"FAM138A":23}
print(gene_exp_dict["WASH7P"])
gene_exp_dict["WASH7P"] = 39
print(gene_exp_dict["WASH7P"])
gene_exp_dict["BRCA2"] = 100
print(gene_exp_dict)
print(gene_exp_dict.keys())
print(gene_exp_dict.values())
print(gene_exp_dict.items())
print("BRCA2" in gene_exp_dict)
gene_exp_dict_copy = gene_exp_dict.copy()
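Looking up a key that is not in a dictionary with square brackets raises a KeyError. The sketch below, on a smaller illustrative dictionary, shows .get() as a safe alternative, .pop() for removing an entry, and how assigning to an existing key overwrites its value.

```python
gene_exp = {"DDX11L1": 43.2, "WASH7P": 45, "FAM138A": 23}

print(gene_exp.get("WASH7P"))      # 45
print(gene_exp.get("TP53"))        # None: missing key, no error
print(gene_exp.get("TP53", 0.0))   # 0.0: missing key, custom default

removed = gene_exp.pop("FAM138A")  # removes a key and returns its value
print(removed, len(gene_exp))      # 23 2

# Keys are unique: assigning to an existing key overwrites its value
gene_exp["DDX11L1"] = 50.0
print(gene_exp)                    # {'DDX11L1': 50.0, 'WASH7P': 45}
```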
Sets are unordered collections of unique values, and they support set operations such as union:
s1 = {1, 2, 3}
s2 = set([1, 2, 3, 4])
print(f"Set s1: {s1}")
print(f"Set s2: {s2}")
set_union = s1.union(s2)
print(set_union)
set_union = (s1 | s2)
print(set_union)
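Beyond union, sets make it easy to compare gene lists. The sketch below uses two hypothetical sets of gene names (not from any real experiment) to find shared and unique genes. Since sets have no order, the results are passed through sorted() for a predictable display.

```python
# Two hypothetical sets of genes, for illustration only
experiment_a = {"TAF1", "BRCA2", "TP53", "MYC"}
experiment_b = {"TP53", "MYC", "EGFR"}

shared = experiment_a & experiment_b   # intersection, like .intersection()
only_a = experiment_a - experiment_b   # difference, like .difference()
either = experiment_a | experiment_b   # union, like .union()

print(sorted(shared))    # ['MYC', 'TP53']
print(sorted(only_a))    # ['BRCA2', 'TAF1']
print(len(either))       # 5

# Building a set from a list drops duplicates
print(len(set(["TAF1", "TAF1", "BRCA2"])))   # 2
```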
More here: https://www.dataquest.io/blog/data-structures-in-python/
Exercise: create a new version of gene_exp_dict. Populate the dictionary with at least 10 genes for 5 samples. Then, write the code to get the gene expression for the 4th gene, 2nd sample.