Download - CSE 111 Green: Program Design I Lecture 15: Modules, … not come up in CS 111 Green this semester.) Files and real-world data (β): CSV n Structured text! In 2017, often want to communicate

Page 1

CSE 111 Green: Program Design I, Lecture 15:

Modules, plotting, and more

Guest Lecturer: Prof. Joe Hummel

Robert H. Sloan (CS) & Rachel Poretsky (Bio), University of Illinois, Chicago

October 17, 2017

Page 2

PYTHON STANDARD LIBRARY & BEYOND: MODULES

Page 3

Extending Python

- Every modern programming language has a way to extend the language's basic functions with new ones
- Python: importing a module
- module: a Python file with new capabilities defined in it
- Once you import a module, it's as if you had typed it in: you immediately get all the functions, objects, and variables defined in it
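As a quick illustration (using `math` only as an example module), after `import math` both the module's functions and its variables are available, reached through the module name:

```python
import math  # the module's contents become available as math.<name>

# A function defined in the module
root = math.sqrt(16)

# A variable (constant) defined in the module
circle_area = math.pi * 2 ** 2

print(root, round(circle_area, 2))  # 4.0 12.57
```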

Page 4

Python Standard Library

- Python always comes with a big set of modules
- List at https://docs.python.org/3/py-modindex.html
- Examples:

csv        Read/write CSV files
datetime   Basic date & time types
math       Math stuff (e.g., sin(), cos(), sqrt())
os         E.g., list files in your operating system
random     Random number generation
urllib     Open URLs, parse URLs

Page 5

BTW, did we mention

- You will probably need csv for next Monday's lab
- And random for Project 2
- And we will use other modules too


Page 6

Using Modules

- Use import module_name to make the module's functions available
- Style: put all import statements at the top of the file
- After import module_name, access its functions (and variables, etc.) through module_name.function_name
- If module_name is long, you can abbreviate it in the import with as:
  - import module_name as mn
  - mn.function_name
- There are a few long-named common modules where everybody does this
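A small sketch of abbreviating with `as`. The `statistics` module and the alias `st` are used here purely for illustration; they are not among the conventional abbreviations the slide alludes to (the one this lecture uses later is `plt` for `matplotlib.pyplot`).

```python
import statistics as st   # "st" is an abbreviation we chose; any valid name works

scores = [88, 92, 79, 95]
average = st.mean(scores)    # same as statistics.mean(scores)
middle = st.median(scores)   # mean of the two middle values, 88 and 92

print(average, middle)  # 88.5 90.0
```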

Page 7

If you prefer to save typing

- (I mostly do not do this)
- To access function_name without having to type the module_name prefix, use:

    from module_name import function_name
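A minimal sketch of the `from ... import ...` form, again using `math` only as an example. The imported names can be used directly, without the `math.` prefix:

```python
from math import sqrt, pi   # bring just these two names in directly

# No "math." prefix needed now
hypotenuse = sqrt(3 ** 2 + 4 ** 2)
print(hypotenuse, pi > 3)  # 5.0 True
```

One trade-off worth keeping in mind: names imported this way share space with your own names, so a function of your own called `sqrt` would silently replace the imported one. That is one reason to prefer the `module_name.` prefix.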

Page 8

Dot notation remark

- Python makes two different uses of dot notation:
  - methods, where, as we've seen, we call a method as
    - obj_name.method_name()
  - functions in modules
    - module_name.function_name
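Both uses of the dot can be seen side by side in a short sketch (the DNA string is just an example value):

```python
import math

dna = "ACGT"

# 1) Method call: object, dot, method name, parentheses
lower_dna = dna.lower()

# 2) Module function: module name, dot, function name
rounded_down = math.floor(2.7)

print(lower_dna, rounded_down)  # acgt 2
```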

Page 9

Common but not standard

- matplotlib and pandas are 2 (of many) examples of modules that are not among the modules required to come with every Python 3 installation
  - matplotlib: drawing graphs (in the style of MATLAB)
  - We will do some work with it in our next lab
  - pandas: data science tools; we won't use it in this course
- matplotlib is very, very widely used, and pandas is widely used
- Both are among the many modules that come with the Anaconda distribution of Python 3

Page 10

We all can make modules for ourselves

- Modules are used to group functions
  - This applies both to the standard library (or matplotlib) and to modules we write ourselves
  - Very useful for clarity and reuse as overall project sizes get larger
  - Not much need for your own modules in CS 111
- Any file ending in .py can act as a module
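Normally you would simply save a file such as `seqtools.py` next to your main program and write `import seqtools`. To keep this sketch runnable as a single snippet, it first writes that file into a temporary folder. The file name `seqtools.py` and its function `gc_count()` are hypothetical, made up for this example.

```python
import importlib
import os
import sys
import tempfile

# Contents of our own tiny module (hypothetical)
module_code = '''
def gc_count(seq):
    """Return how many G or C characters appear in seq."""
    return seq.count("G") + seq.count("C")
'''

folder = tempfile.mkdtemp()  # stand-in for "your project folder"
with open(os.path.join(folder, "seqtools.py"), "w") as f:
    f.write(module_code)

# Normally unnecessary: Python already searches the running script's own folder
sys.path.insert(0, folder)
importlib.invalidate_caches()  # make sure the import system notices the brand-new file

import seqtools  # any .py file Python can find works as a module

print(seqtools.gc_count("GATTACA"))  # 2
```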

Page 11

CSV FILES AND A BIT MORE ON FILES GENERALLY

Page 12

Files and real-world data (α)

- open() has an optional encoding argument, specifying a character encoding (it is usually given by keyword, e.g., encoding="utf-8", because it is not open()'s third positional parameter)
- Irrelevant most of the time
- But you may need it if you are working with Arabic, Greek, Hebrew, Mandarin, Russian, etc.
- Or purely English materials using oddball symbols like §
- (About 90% of files with α or § use the character encoding that is assumed when no encoding argument is given, but you could get one of the other 10%. Should not come up in CS 111 Green this semester.)
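A minimal sketch of passing the encoding explicitly, assuming UTF-8 (a very common choice, but the right encoding depends on how your file was actually created). The file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "symbols.txt")  # hypothetical file
text = "α-helix, see § 3"

# Write and read with the same explicit encoding so non-ASCII characters survive
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

with open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == text)  # True
```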

Page 13

Files and real-world data (β): CSV

- Structured text! In 2017, we often want to communicate between all sorts of different electronic tools
- CSV (comma-separated values) is a format that Excel can read and write, and it is very common for exchanging large collections of data
- Python has a csv module with csv.writer() and csv.reader() functions that can help you
  - Lab next week will have an ecology/data science flavor, and it will probably have CSV input

Page 14

CSV data: LHS is (real) Excel spreadsheet

Fall Semester    UG Majors   1 Yr % Inc.
2006             215
2007             242         12.6%
2008             252         4.1%
2009             286         13.5%
2010             318         11.2%
2011             328         3.1%
2012             385         17.4%
2013             493         28.1%
2014             594         20.5%
2015             701         18.0%
2016             843         20.3%
2017 est.        970         15.1%
2017 rev. est.   1063        26.1%

- And Excel can save it as CSV:

Fall Semester,UG Majors,1 Yr % Inc.
2006,215,
2007,242,12.6%
2008,252,4.1%
2009,286,13.5%
2010,318,11.2%
2011,328,3.1%
2012,385,17.4%
2013,493,28.1%
2014,594,20.5%
2015,701,18.0%
2016,843,20.3%
2017 est.,970,15.1%
2017 rev. est. ,1063,26.1%
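To see what csv.reader makes of text like this, here is a sketch that parses a few of the lines above. It holds them in a string and wraps it in `io.StringIO` (which behaves like an open file) so no file is needed. Every field comes back as a string, and the missing percentage for 2006 becomes an empty string.

```python
import csv
import io

# A few lines of the CSV above, held in a string for this sketch
csv_text = """Fall Semester,UG Majors,1 Yr % Inc.
2006,215,
2007,242,12.6%
2016,843,20.3%
"""

# io.StringIO lets csv.reader treat the string like an open file
rows = list(csv.reader(io.StringIO(csv_text)))

print(rows[0])  # ['Fall Semester', 'UG Majors', '1 Yr % Inc.']
print(rows[1])  # ['2006', '215', '']
```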

Page 15

For the record CSV format

- The format: a comma separates each value in a row; a newline ends each row
  - You can also specify a separator other than a comma
- But 2017 data science work mostly goes on just knowing that CSV is a format lots of software knows how to handle; we don't have to know the details
- Assuming we use the csv module
- We would have to know them if we just used open() followed by readlines()
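One detail the csv module handles for you: a value that itself contains a comma is wrapped in double quotes. The sketch below (with a made-up line) shows a naive `split(",")` breaking such a line apart, while csv.reader gets it right. It also shows the `delimiter` parameter for files that use another separator, here a semicolon.

```python
import csv
import io

# Hypothetical line whose second field contains a comma, so it is quoted
line = 'UIC,"Chicago, IL",2017\n'

naive = line.strip().split(",")
proper = next(csv.reader(io.StringIO(line)))

print(naive)   # ['UIC', '"Chicago', ' IL"', '2017']
print(proper)  # ['UIC', 'Chicago, IL', '2017']

# A different separator, e.g. semicolons, via the delimiter parameter
semi = next(csv.reader(io.StringIO("a;b;c\n"), delimiter=";"))
print(semi)    # ['a', 'b', 'c']
```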

Page 16

Reading in csv data (coming to a lab or project near you)

- import csv
- Open the file as usual:
  - fileref = open("filename.csv", "r")
- Then create a csv reader object from the fileref:
  - data_reader = csv.reader(fileref)
- Use a for loop to iterate over that; each row → a list of strings

    for row in data_reader:
        # row is a list of strings, 1 per entry in the row
        # process the list with for or with indexing
        pass
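Putting those steps together, a runnable sketch. It first creates a small hypothetical file (`majors.csv`, two numbers borrowed from the table earlier) so the snippet stands on its own. It passes `newline=""`, which the csv module documentation recommends when opening CSV files, and it converts a column to `int` before doing arithmetic, because every entry arrives as a string.

```python
import csv
import os
import tempfile

# Make a small hypothetical CSV file so this sketch runs on its own
path = os.path.join(tempfile.mkdtemp(), "majors.csv")
with open(path, "w", newline="") as f:
    f.write("Fall Semester,UG Majors\n2015,701\n2016,843\n")

fileref = open(path, "r", newline="")   # newline="" as the csv docs recommend
data_reader = csv.reader(fileref)
header = next(data_reader)              # first row holds the column names
total = 0
for row in data_reader:
    # row is a list of strings, e.g. ['2015', '701']
    total += int(row[1])                # convert text to a number before adding
fileref.close()

print(header, total)  # ['Fall Semester', 'UG Majors'] 1544
```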

Page 17

Similarly, csv writer objects

- If you need to write a CSV file, there is an analogous csv.writer function
- and a csv writer object, like
  - wr = csv.writer(fileref)
  - that has methods writerow() and writerows()
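A sketch of the writer side, using a hypothetical output file. `writerow()` takes one list (one row); `writerows()` takes a list of lists. Reading the file back shows that numbers were written as text.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.csv")  # hypothetical output file

with open(path, "w", newline="") as fileref:   # newline="" avoids extra blank lines on Windows
    wr = csv.writer(fileref)
    wr.writerow(["Fall Semester", "UG Majors"])  # one row: a list of values
    wr.writerows([[2015, 701], [2016, 843]])     # several rows: a list of lists

# Read it back to check what was written
with open(path, "r", newline="") as fileref:
    rows = list(csv.reader(fileref))

print(rows)  # [['Fall Semester', 'UG Majors'], ['2015', '701'], ['2016', '843']]
```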

Page 18

Programming: A Superpower

- Why write Python programs and not just use Excel?
  1. We can write a program that computes anything, not just what is built into Excel (all this biology is just one example!)
  2. Excel is not built for big data; Python is
     - Chicago crime data Prof. Sloan used in some security and privacy research has 1,048,576 rows, 18 columns
     - Python: creating a csv.reader and looping over all the rows to count them: about 1 second (Sloan's 2013 laptop)
     - Opening the file in Excel: several minutes
       - Just resizing one column for better viewing: 5-30 sec
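A sketch of the kind of row count described above. This is not Prof. Sloan's code, and the Chicago data is not included: the sketch generates a hypothetical 100,000-row file and times the loop with `time.perf_counter()`. Timings vary by machine, so no expected time is shown.

```python
import csv
import os
import tempfile
import time

def count_rows(filename):
    """Count the rows in a CSV file by looping over a csv.reader."""
    with open(filename, "r", newline="") as fileref:
        count = 0
        for row in csv.reader(fileref):
            count += 1
    return count

# Hypothetical test file with 100,000 short rows (not the real crime data)
path = os.path.join(tempfile.mkdtemp(), "big.csv")
with open(path, "w", newline="") as f:
    writer = csv.writer(f)
    for i in range(100000):
        writer.writerow([i, i * 2, "x"])

start = time.perf_counter()
n = count_rows(path)
elapsed = time.perf_counter() - start
print(n, "rows in", round(elapsed, 3), "seconds")
```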

Page 19

Before leaving files: with … as

- "Oh, sweetie, you left the refrigerator door open—again!"
  - Snarl!
- It is really bad practice to open files without closing them
  - It messes with the computer's file system (more in CS 361)
  - Typically nothing bad happens with the size of programs we write in CS 111, but it could, and it's still a bad practice
- A better Python construct than open/close: with ... as, guaranteeing we never forget to close what we've opened
  - It opens and closes the file; the file is open only inside the block

Page 20

with as syntax example

    with open("proteins.aa", "r") as fileref:
        # use fileref here
        for line in fileref:       # could have been fileref.read(), etc.
            print(line, end="")    # process each line here
    # No close needed

Page 21

with as considered better style than open close

- Because it makes it impossible to forget the close
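A small sketch that checks the file object's `closed` attribute to show the guarantee: the file is open inside the block and closed as soon as the block ends, even when an error interrupts it. The file name and contents are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "proteins.aa")  # hypothetical file
with open(path, "w") as fileref:
    fileref.write("MKTAYIAK\nGSHMLEDP\n")

line_count = 0
with open(path, "r") as fileref:
    for line in fileref:
        line_count += 1
    print(fileref.closed)   # False: still inside the block

print(fileref.closed)       # True: closed automatically when the block ended
print(line_count)           # 2

# The file is closed even if an error happens inside the block
try:
    with open(path, "r") as g:
        raise ValueError("something went wrong mid-read")
except ValueError:
    pass
print(g.closed)             # True
```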

Page 22

Another useful & interesting module: random (Attention: Not in book, in upcoming lab/project)

    >>> import random
    >>> for i in range(5):
    ...     print(random.random())
    ...
    0.12636664029165268
    0.2821272889535512
    0.6160031940187543
    0.28609006981908525
    0.6277074518401735

- Notice: We're using the function named random from the module named random, hence random.random()

Page 23

Commonly used random functions

- random.random()  # takes no input
  - returns a pseudorandom float x with 0.0 <= x < 1.0
- random.uniform(a, b)
  - returns a float pseudorandomly chosen from between a and b
- random.choice(ls)  # gets a list as input
  - returns a pseudorandomly chosen element of the list
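A short sketch calling each of the three functions once. The values differ on every run, so only their ranges are predictable. The list of DNA bases is just an example input.

```python
import random

x = random.random()                         # 0.0 <= x < 1.0
y = random.uniform(5, 10)                   # a float between 5 and 10
base = random.choice(["A", "C", "G", "T"])  # one element of the list

print(x, y, base)  # different every run
```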

Page 24

Randomly choosing words from a list

    >>> for i in range(5):
    ...     print(random.choice(["Here", "is", "a", "list", "of", "words", "in", "random", "order"]))
    ...
    list
    words
    in
    Here
    list

Page 25

Randomly generating language

- Given a list of nouns, verbs that agree in tense and number, and object phrases that all match the verb,
- We can randomly take one from each to make sentences.

Page 26

    import random

    def excuse():
        excuse = ["I didn't know I was in this class", "I thought I already graduated",
                  "I got stuck in a blizzard"]
        bigNum = ["4", "17", "like a billion", "mega", "tons of"]
        lottaWork = ["midterms", "Ph.D. theses", "programs"]
        print("I need an extension because", random.choice(excuse), "and I had",
              random.choice(bigNum), random.choice(lottaWork), "to do.")

Side note: Good example of a function that should have 0 inputs and no return value.

Page 27

Running random sentence generator

    >>> excuse()
    I need an extension because I thought I already graduated and I had like a billion programs to do.
    >>> excuse()
    I need an extension because I got stuck in a blizzard and I had 4 programs to do.
    >>> excuse()
    I need an extension because I got stuck in a blizzard and I had 17 programs to do.
    >>> excuse()
    I need an extension because I thought I already graduated and I had tons of programs to do.
    >>> excuse()
    I need an extension because I didn't know I was in this class and I had 17 Ph.D. theses to do.

Page 28

Choosing randomly from a population

We can sample using the random module's choice here too

    >>> pop_list = ["A", "A", "a", "A", "a"]   # Will be part of Proj 2
    >>> import random
    >>> random.choice(pop_list)
    'A'
    >>> pop_list
    ['A', 'A', 'a', 'A', 'a']   # Didn't change the original list
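A sketch of sampling the same population many times. With 3 "A" out of 5, roughly 60% of draws should be "A". The exact fraction changes from run to run, which is why no exact value is shown. The number of draws (10,000) is an arbitrary choice for illustration.

```python
import random

pop_list = ["A", "A", "a", "A", "a"]   # 3 "A" and 2 "a"

draws = 10000
count_A = 0
for i in range(draws):
    if random.choice(pop_list) == "A":
        count_A += 1

# Expected fraction is about 3/5 = 0.6; the exact value varies from run to run
print(count_A / draws)
print(pop_list)   # unchanged: ['A', 'A', 'a', 'A', 'a']
```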

Page 29

MATPLOTLIB MODULE Drawing graphs

Page 30

A Picture is Worth 1000 Excel cells

Year,Annual anomaly,Lower 95% confidence interval,Upper 95% confidence interval
1880,-0.4700088,-0.672646261,-0.267371339
1881,-0.3568788,-0.560588343,-0.153169257
1882,-0.3726612,-0.575728173,-0.169594227
1883,-0.448443,-0.650803864,-0.246082136
1884,-0.5897538,-0.790478088,-0.389029512
1885,-0.6636546,-0.86307244,-0.46423676
1886,-0.6439392,-0.842606641,-0.445271759
1887,-0.7616232,-0.959851596,-0.563394804
1888,-0.5166342,-0.713950039,-0.319318361
1889,-0.4717926,-0.674798269,-0.268786931
1890,-0.8875836,-1.093269328,-0.681897872
1891,-0.6603264,-0.864035943,-0.456616857
1892,-0.8173098,-1.021385617,-0.613233983
1893,-0.8148276,-1.019876951,-0.609778249
And so on
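Before data like this can be plotted, the string fields from csv.reader have to become numbers. A sketch, assuming we only want the year and annual-anomaly columns. It uses the first three rows in a string; a real program would open the data file instead.

```python
import csv
import io

# First few rows of the data above, as a string (a real program would open the file)
csv_text = """Year,Annual anomaly,Lower 95% confidence interval,Upper 95% confidence interval
1880,-0.4700088,-0.672646261,-0.267371339
1881,-0.3568788,-0.560588343,-0.153169257
1882,-0.3726612,-0.575728173,-0.169594227
"""

years = []
anomalies = []
reader = csv.reader(io.StringIO(csv_text))
next(reader)                        # skip the header row
for row in reader:
    years.append(int(row[0]))       # csv gives strings; convert for plotting
    anomalies.append(float(row[1]))

print(years)      # [1880, 1881, 1882]
print(anomalies)  # [-0.4700088, -0.3568788, -0.3726612]
```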

Page 31

matplotlib: A module for drawing graphs

    import matplotlib.pyplot as plt

- matplotlib is a super commonly used module for 2-D graphics in Python
  - There are others, but matplotlib is the most widely used
  - Style taken from MATLAB

Page 32

About that as in the import statement

- Advice: If a module has a dot in its name, like matplotlib.pyplot, matplotlib.image, urllib.request, etc., always import it as something. Otherwise you must type the full dotted name (e.g., matplotlib.pyplot.plot(...)) every time you use a function inside it. Thus:

    import matplotlib.pyplot as plt

  - And the name plt is what 95% of Pythonistas use; it's a convention
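The same idea with a dotted standard-library module, so the sketch runs without matplotlib. `urllib.parse` is chosen only for illustration. Both imports work; the alias just saves typing the full dotted name each time.

```python
import urllib.parse            # works, but every use needs the full dotted name
import urllib.parse as up      # the same module under a short alias

long_way = urllib.parse.quote("Program Design I")
short_way = up.quote("Program Design I")

print(long_way)                # Program%20Design%20I
print(long_way == short_way)   # True
```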

Page 33

Functions in plt that might be nice to use: plot

- plt.plot(ls) with one list input parameter: makes a line graph, taking the x values to be range(len(ls)), i.e., 0, 1, …, len(ls) - 1
- With 2 lists: x vs. y
- You can set the x- and/or y-axis label, and the title
- Demo or screenshot in a minute
- Important! Exact details of when and where the plot appears depend heavily on whether you are using Spyder, a console, Jupyter, or something else, and on your settings
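A sketch contrasting the one-list and two-list forms. To sidestep the display differences just mentioned, it selects matplotlib's off-screen "Agg" backend and saves the second figure to a (hypothetical) image file instead of opening a window. In Spyder you would normally leave out the `matplotlib.use` line. `plt.plot` returns a list of line objects, and the sketch keeps them so you can inspect the x values matplotlib used.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")          # off-screen backend: no window, so this runs anywhere
import matplotlib.pyplot as plt

ls = [3, 1, 4, 1, 5]

# One list: these are the y values; x becomes 0, 1, ..., len(ls) - 1
plt.figure()
one_list = plt.plot(ls)

# Two lists: first the x values, then the y values (on a fresh figure)
plt.figure()
x = [2012, 2013, 2014, 2015, 2016]
two_lists = plt.plot(x, ls)
plt.xlabel("year")
plt.ylabel("value")
plt.title("plt.plot with two lists")

# Save to a file instead of showing a window (hypothetical file name)
plt.savefig(os.path.join(tempfile.mkdtemp(), "plot_demo.png"))
```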

Page 34

Getting plot in its own window (not in book)

- You want the plot to show up in its own window, not inside the lower-right IPython console
  - Easier to zoom, scroll, and examine the data
  - Easier to save out to other forms for printing, submitting, etc.
  - Easier to convince the system that everything you are doing is more work on the same graph, instead of creating a new empty graph each time (e.g., when you first label the x-axis and then the y-axis, you usually want those on the same graph, not on 2 different graphs!)

Page 35

To always get graphs in their own window

- Spyder preferences (under the Python menu on Mac; on Windows, maybe under Tools?)
  - Then: IPython console → Graphics → Graphics backend → Backend: "automatic"
- You must restart Spyder for this to take effect
- In theory there's also a command you can give at the IPython prompt for this, but Prof. Sloan couldn't figure it out, and it's definitely not what's stated in the official Spyder tutorial (%matplotlib qt), which gives an error

Page 36

Basic demo code

    import matplotlib.pyplot as plt
    import random

    # simple plotting demo of plain line graph
    x = [1, 2, 3, 4, 5]
    yline = []  # y values will go here
    for i in range(len(x)):
        yline.append(random.random())
    plt.ylabel('some 0 to 1 random numbers')
    plt.xlabel('x is 1 to 5')
    plt.title('Line graph of random numbers')
    plt.plot(x, yline)
    plt.show()  # needed when run as a plain script; Spyder/IPython may show the plot without it

Page 37

Aside for graph geeks

- Optional for fun: to change the style of your plot:

    import matplotlib.style  # Must do this! Usually we import only matplotlib.pyplot
    matplotlib.style.use('fivethirtyeight')
    # OR
    matplotlib.style.use('ggplot')  # R style

- Out of the box, it's MATLAB style, which some folks like a lot

Page 38

More specific styling things

- Described in Zybooks 15.2 and 15.3: things like adding legends to describe what the different lines on a multi-line line graph are, and making lines the colors and styles (dotted, dashed, solid) that you choose instead of letting matplotlib choose automagically
  - Nobody in his or her right mind memorizes this stuff unless they are working on graphs as a full-time job; we wouldn't ask midterm questions about it
  - But you do need to know where to find it if a lab asks you to plot something with a green dashed line

Page 39

A few examples

- If the last argument to plot is a string, that's the format
  - 'b-'   Solid blue line (close to matplotlib's default look)
  - 'r--'  Red dashes
  - 'bs'   Blue squares
  - 'g^'   Green triangles
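A sketch using the format strings listed above, plus `'g--'` for the green dashed line a lab might ask for. It also gives each line a `label=` so `plt.legend()` can describe the lines. As before, it assumes the off-screen "Agg" backend and a hypothetical output file so it runs without a display. The data values are arbitrary.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")   # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
plt.figure()                                              # start a fresh graph
red = plt.plot(x, [1, 4, 9, 16], 'r--', label="r--")[0]   # red dashes
blue = plt.plot(x, [2, 3, 5, 7], 'bs', label="bs")[0]     # blue squares, no connecting line
green = plt.plot(x, [4, 3, 2, 1], 'g^', label="g^")[0]    # green triangles
lab = plt.plot(x, [3, 3, 3, 3], 'g--', label="g--")[0]    # a green dashed line
plt.legend()                                              # uses the label= strings

plt.savefig(os.path.join(tempfile.mkdtemp(), "formats.png"))  # hypothetical file name
```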

Page 40

Important: which plot? And how?

- If you make more calls to plt.plot() while a graph is up, those things get added to that graph
  - To start over, close that window
- To plot 3 lines from lists l1, l2, and l3 with the x-axis from list x:

    plt.plot(x, l1)
    plt.plot(x, l2)
    plt.plot(x, l3)
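A sketch confirming that repeated `plt.plot()` calls pile up on one graph, and that closing it starts over. In a script, `plt.close("all")` plays the role of closing the window. The lists are example values (x, x², x³), and the "Agg" backend is assumed again so no window is needed.

```python
import matplotlib
matplotlib.use("Agg")   # off-screen backend for this sketch
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
l1 = [0, 1, 2, 3]
l2 = [0, 1, 4, 9]
l3 = [0, 1, 8, 27]

plt.figure()                     # fresh graph
plt.plot(x, l1, label="l1")
plt.plot(x, l2, label="l2")      # added to the same graph
plt.plot(x, l3, label="l3")      # added to the same graph
plt.legend()
n_lines = len(plt.gca().get_lines())
print(n_lines)                   # 3

plt.close("all")                 # scripted equivalent of closing the window(s)
plt.plot(x, l1)                  # starts a brand-new graph
n_after = len(plt.gca().get_lines())
print(n_after)                   # 1
```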