Wednesday, March 16, 2016

Learning Python 2 - Built-in Data Types

-- Everything is an object

-- Mutable or immutable? That is the question

If the value can change, the object is called mutable, while if the value cannot change, the object is called immutable.

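age = 42
id(age)
age = 43  # the name `age` is rebound to a different int object
id(age)  # a different id: the int itself never changed, it is immutable

class Person:
    def __init__(self, age):
        self.age = age

fab = Person(age=39)
fab.age
id(fab)
fab.age = 29  # the object is modified in place
id(fab)  # the same id: the Person object is mutable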
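
A minimal sketch of the same idea with built-in types, assuming nothing beyond the standard interpreter (the names `numbers`, `alias`, `point`, and `moved` are made up for illustration). A list changed through one name is changed for every name bound to it, while a tuple refuses in-place changes:

```python
# Two names bound to the same list: a change made through one
# name is visible through the other, because the object is mutable.
numbers = [1, 2, 3]
alias = numbers
alias.append(4)
print(numbers)           # [1, 2, 3, 4]
print(alias is numbers)  # True

# A tuple is immutable: item assignment raises TypeError.
point = (1, 2)
try:
    point[0] = 10
except TypeError as error:
    tuple_error = str(error)

# "Changing" a tuple really means building a new object.
moved = point + (3,)
print(point, moved)      # (1, 2) (1, 2, 3)
```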

-- Numbers 
Numbers are immutable objects.

Integers
a = 12
b = 3
a + b  # addition
b - a  # subtraction
a // b  # integer division
a / b  # true division
a * b  # multiplication
b ** a  # power operator
2 ** 1024  # a very big number, Python handles it gracefully

7 / 4  # true division
7 // 4  # integer division, flooring returns 1
-7 / 4  # true division again, result is opposite of previous
-7 // 4  # integer div., result not the opposite of previous


int(1.75)
int(-1.75)

10 % 3  # remainder of the division 10 // 3
10 % 4  # remainder of the division 10 // 4
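
The floor division and remainder results above are tied together by one identity, a == (a // b) * b + (a % b). A small sketch to check it (the helper name `check_division_identity` is hypothetical):

```python
def check_division_identity(a, b):
    """Return (quotient, remainder, identity_holds) for a and b."""
    quotient, remainder = divmod(a, b)  # same as (a // b, a % b)
    return quotient, remainder, a == quotient * b + remainder

print(check_division_identity(7, 4))   # (1, 3, True)
print(check_division_identity(-7, 4))  # (-2, 1, True)

# int() truncates towards zero, while // rounds towards negative infinity.
print(int(-7 / 4), -7 // 4)            # -1 -2
```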

Booleans
int(True)  # True behaves like 1
int(False)  # False behaves like 0
bool(1)  # 1 evaluates to True in a boolean context
bool(-42)  # and so does every non-zero number
bool(0)  # 0 evaluates to False
# quick peek at the operators (and, or, not)
not True
not False
True and True
False or True

1 + True
False + 42
7 - True

Reals

http://en.wikipedia.org/wiki/Double-precision_floating-point_format

pi = 3.1415926536  # how many digits of PI can you remember?
radius = 4.5
area = pi * (radius ** 2)
area

import sys
sys.float_info

3 * 0.1 - 0.3  # 5.551115123125783e-17, not 0.0
Double-precision numbers suffer from approximation issues even with simple values such as 0.1 or 0.3.
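
Because of this, comparing floats with == is fragile. One common workaround, shown here as a sketch rather than the only option, is math.isclose from the standard library (the tolerance 1e-9 below is an arbitrary choice for the example):

```python
import math

difference = 3 * 0.1 - 0.3

# Exact comparison fails because of the tiny representation error.
exactly_zero = difference == 0.0
print(exactly_zero)  # False

# isclose compares with a relative tolerance (1e-09 by default).
close_values = math.isclose(3 * 0.1, 0.3)
print(close_values)  # True

# When comparing against 0.0 an absolute tolerance is needed,
# since a relative tolerance of zero is always zero.
near_zero = math.isclose(difference, 0.0, abs_tol=1e-9)
print(near_zero)     # True
```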

Complex numbers
c = 3.14 + 2.73j
c.real  # real part
c.imag  # imaginary part
c.conjugate()  # conjugate of A + Bj is A - Bj
c * 2  # multiplication is allowed
c ** 2  # power operation as well
d = 1 + 1j  # addition and subtraction as well
c - d

Fractions and decimals

from fractions import Fraction
Fraction(10, 6)  # mad hatter? # notice it's been reduced to lowest terms
Fraction(1, 3) + Fraction(2, 3)  # 1/3 + 2/3 = 3/3 = 1/1
f = Fraction(10, 6)
f.numerator
f.denominator

from decimal import Decimal as D  # rename for brevity
D(3.14)  # pi, from float, so approximation issues
D('3.14')  # pi, from a string, so no approximation issues
D(0.1) * D(3) - D(0.3)  # from float, we still have the issue
D('0.1') * D(3) - D('0.3')  # from string, all perfect

-- Immutable sequence

Strings and bytes

# 4 ways to make a string
str1 = 'This is a string. We built it with single quotes.'
str2 = "This is also a string, but built with double quotes."
str3 = '''This is built using triple quotes,
so it can span multiple lines.'''
str4 = """This too
is a multiline one
built with triple double-quotes."""
>>> str4  #A: the echoed value shows the line breaks as \n
'This too\nis a multiline one\nbuilt with triple double-quotes.'
print(str4)  #B: print renders them as real line breaks
len(str1)

Encoding and decoding strings
s = "This is üŋíc0de"  # unicode string: code points
type(s)
encoded_s = s.encode('utf-8')  # utf-8 encoded version of s
encoded_s
type(encoded_s)  # another way to verify it
encoded_s.decode('utf-8')  # let's revert to the original
bytes_obj = b"A bytes object"  # a bytes object
type(bytes_obj)
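
A short sketch of why the distinction matters: a str counts code points, while a bytes object counts encoded bytes. The sample text is made up, and the escapes \u00ef and \u00e9 stand for ï and é so the example does not depend on the file's own encoding:

```python
text = "na\u00efve caf\u00e9"  # "naïve café"
encoded = text.encode('utf-8')

print(len(text))     # 10 code points
print(len(encoded))  # 12 bytes: ï and é take two bytes each in UTF-8

# Decoding with the same codec gives back the original string.
round_trip = encoded.decode('utf-8')
print(round_trip == text)  # True

# Decoding with a different codec does not fail here, but the text is garbled.
garbled = encoded.decode('latin-1')
print(garbled == text)     # False
```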

Indexing and slicing strings
my_sequence[start:stop:step]

s = "The trouble is you think you have time."
s[0]  # indexing at position 0, which is the first char
s[5]  # indexing at position 5, which is the sixth char
s[:4]  # slicing, we specify only the stop position
s[4:]  # slicing, we specify only the start position
s[2:14]  # slicing, both start and stop positions
s[2:14:3]  # slicing, start, stop and step (every 3 chars)
s[:]  # quick way of making a copy

Tuples
A tuple is a sequence of arbitrary Python objects. In a tuple, items are separated by commas. 
t = ()  # empty tuple
type(t)
one_element_tuple = (42, )  # you need the comma!
three_elements_tuple = (1, 3, 5)
a, b, c = 1, 2, 3  # tuple for multiple assignment
a, b, c  # implicit tuple to print with one instruction
3 in three_elements_tuple  # membership test

one-line swaps
a, b = 1, 2
c = a  # we need three lines and a temporary var c
a = b
b = c
a, b  # a and b have been swapped

a, b = b, a  # this is the Pythonic way to do it
a, b


-- Mutable sequences
There are two mutable sequence types in Python: lists and byte arrays. 

Lists
[]  # empty list
list()
[1,2,3]
[x + 5 for x in [2, 3, 4]]
list((1,3,5,7,9))
list('hello')

The [... for x in ...] form used above is a list comprehension, a very powerful functional feature of Python.
a = [1, 2, 1, 3]
a.append(13)  # we can append anything at the end
a
a.count(1)  # how many `1` are there in the list?
a.extend([5, 7])  # extend the list by another (or sequence)
a
a.index(13)  # position of `13` in the list (0-based indexing)
a.insert(0, 17)  # insert `17` at position 0
a
a.pop()  # pop (remove and return) last element
a.pop(3)  # pop element at position 3
a
a.remove(17)  # remove `17` from the list
a
a.reverse()  # reverse the order of the elements in the list
a
a.sort()  # sort the list
a
a.clear()  # remove all elements from the list
a

a = list('hello')  # makes a list from a string
a
a.append(100)  # append 100, heterogeneous type
a
a.extend((1, 2, 3))  # extend using tuple
a
a.extend('...')  # extend using string
a

a = [1, 3, 5, 7]
min(a)  # minimum value in the list
max(a)  # maximum value in the list
sum(a)  # sum of all values in the list
len(a)  # number of elements in the list
b = [6, 7, 8]
a + b  # `+` with list means concatenation
a * 2  # `*` has also a special meaning
Operator overloading means that operators such as +, -, *, %, and so on may represent different operations according to the context they are used in. Summing two lists makes no sense, so the + sign is used to concatenate them. Similarly, the * sign concatenates the list to itself as many times as the right operand says.
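
Our own classes can take part in this too, through special methods such as __add__ and __mul__. A minimal sketch with a hypothetical Vector2D class (not something from the standard library):

```python
class Vector2D:
    """Hypothetical 2D vector used only to illustrate operator overloading."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # `v + w` adds the components pairwise.
        return Vector2D(self.x + other.x, self.y + other.y)

    def __mul__(self, factor):
        # `v * 3` scales both components by a number.
        return Vector2D(self.x * factor, self.y * factor)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return f"Vector2D({self.x}, {self.y})"


print(Vector2D(1, 2) + Vector2D(3, 4))  # Vector2D(4, 6)
print(Vector2D(1, 2) * 3)               # Vector2D(3, 6)
print([1, 2] + [3], [1, 2] * 2)         # [1, 2, 3] [1, 2, 1, 2]
```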

from operator import itemgetter
a = [(5, 3), (1, 3), (1, 2), (2, -1), (4, 9)]
sorted(a)
sorted(a, key=itemgetter(0))
sorted(a, key=itemgetter(0, 1))
sorted(a, key=itemgetter(1))
sorted(a, key=itemgetter(1), reverse=True)

Byte arrays
bytearray()  # empty bytearray object
bytearray(10)  # zero-filled instance with given length
bytearray(range(5))  # bytearray from iterable of integers
name = bytearray(b'Lina')  # A - bytearray from bytes
name.replace(b'L', b'l')
name.endswith(b'na')
name.upper()
name.count(b'L')

-- Set types
Python also provides two set types, set and frozenset. The set type is mutable, while frozenset is immutable. They are unordered collections of hashable objects (in practice, this usually means immutable ones).
Hashability is a characteristic that allows an object to be used as a set member as well as a key for a dictionary.
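
A quick sketch of what hashability allows and forbids (the names `coordinates` and `teams` are invented for the example). Tuples and frozensets are hashable, lists are not:

```python
# Tuples are hashable, so they can be set members.
coordinates = {(0, 0), (1, 2)}
print((1, 2) in coordinates)  # True

# Lists are mutable and unhashable: using one as a set member fails.
try:
    bad = {[0, 0]}
except TypeError as error:
    message = str(error)
print(message)  # unhashable type: 'list'

# A frozenset is hashable, so it can be a dictionary key; a plain set cannot.
teams = {frozenset({'ann', 'bob'}): 'red'}
print(teams[frozenset({'bob', 'ann'})])  # red
```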

small_primes = set()  # empty set
small_primes.add(2)  # adding one element at a time
small_primes.add(3)
small_primes.add(5)
small_primes
small_primes.add(1)  # Look what I've done, 1 is not a prime!
small_primes
small_primes.remove(1)  # so let's remove it
3 in small_primes  # membership test
4 in small_primes
4 not in small_primes  # negated membership test
small_primes.add(3)  # trying to add 3 again
small_primes
bigger_primes = set([5, 7, 11, 13])  # faster creation
small_primes | bigger_primes  # union operator `|`
small_primes & bigger_primes  # intersection operator `&`
small_primes - bigger_primes  # difference operator `-`

small_primes = {2, 3, 5, 5, 3}
small_primes

The immutable counterpart of the set type is frozenset.
small_primes = frozenset([2, 3, 5, 7])
bigger_primes = frozenset([5, 7, 11])
small_primes.add(11)  # AttributeError: we cannot add to a frozenset
small_primes.remove(2)  # AttributeError: nor can we remove
small_primes & bigger_primes  # intersect, union, etc. allowed

-- Mapping types - dictionaries
A dictionary maps keys to values. Keys need to be hashable objects, while values can be of any arbitrary type. Dictionaries are mutable objects.
a = dict(A=1, Z=-1)
b = {'A': 1, 'Z': -1}
c = dict(zip(['A', 'Z'], [1, -1]))
d = dict([('A', 1), ('Z', -1)])
e = dict({'Z': -1, 'A': 1})
a == b == c == d == e  # are they all the same?


list(zip(['h', 'e', 'l', 'l', 'o'], [1, 2, 3, 4, 5]))
list(zip('hello', range(1, 6)))  # equivalent, more Pythonic

d = {}
d['a'] = 1  # let's set a couple of (key, value) pairs
d['b'] = 2
len(d)  # how many pairs?
d['a']  # what is the value of 'a'?
d  # how does `d` look now?
del d['a']  # let's remove `a`
d
d['c'] = 3  # let's add 'c': 3
'c' in d  # membership is checked against the keys
3 in d  # not the values
'e' in d
d.clear()  # let's clean everything from this dictionary
d

Three special objects called dictionary views: keys, values, and items. 
keys() returns all the keys in the dictionary
values() returns all the values in the dictionary
items() returns all the (key, value) pairs in the dictionary
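
One detail worth a sketch: views are not snapshots. They reflect later changes to the dictionary, and keys views also support set operations (the `inventory` data is made up):

```python
inventory = {'apples': 3, 'pears': 5}
keys_view = inventory.keys()
items_view = inventory.items()

# The views are created before this change, yet they see it.
inventory['plums'] = 7
print(list(keys_view))             # ['apples', 'pears', 'plums']
print(('plums', 7) in items_view)  # True

# Keys views behave like sets for operations such as intersection.
common = keys_view & {'pears', 'kiwis'}
print(common)                      # {'pears'}
```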

d = dict(zip('hello', range(5)))
d
d.keys()
d.values()
d.items()
3 in d.values()
('o', 4) in d.items()

d
d.popitem()  # removes and returns an item: the last inserted one since Python 3.7, an arbitrary one before
d
d.pop('l')  # remove item with key `l`
d.pop('not-a-key')  # remove a key not in dictionary: KeyError
d.pop('not-a-key', 'default-value')  # with a default value?
d.update({'another': 'value'})  # we can update dict this way
d.update(a=13)  # or this way (like a function call)
d
d.get('a')  # same as d['a'] but if key is missing no KeyError
d.get('a', 177)  # default value used if key is missing
d.get('b', 177)  # like in this case
d.get('b')  # key is not there, so None is returned

d = {}
d.setdefault('a', 1)  # 'a' is missing, we get default value
d
d.setdefault('a', 5)  # let's try to override the value
d

d = {}
d.setdefault('a', {}).setdefault('b', []).append(1)
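
The chained call above is dense, so here is the same line unpacked as a sketch. Each setdefault returns the value stored under the key, creating it first if it is missing, which is what lets the calls be chained:

```python
d = {}
d.setdefault('a', {}).setdefault('b', []).append(1)
print(d)  # {'a': {'b': [1]}}

# The second time, both keys exist, so the defaults are ignored
# and the new element lands in the existing list.
d.setdefault('a', {}).setdefault('b', []).append(2)
print(d)  # {'a': {'b': [1, 2]}}

# The same first step written out without chaining.
e = {}
if 'a' not in e:
    e['a'] = {}
if 'b' not in e['a']:
    e['a']['b'] = []
e['a']['b'].append(1)
print(e)  # {'a': {'b': [1]}}
```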

-- The collections module

When Python's general-purpose built-in containers (tuple, list, set, and dict) aren't enough, we can find specialized container data types in the collections module.
These notes cover three of them: namedtuple, defaultdict, and ChainMap.

Named tuples
A namedtuple is a tuple-like object that has fields accessible by attribute lookup as well as being indexable and iterable (it's actually a subclass of tuple).
vision = (9.5, 8.8)
vision
vision[0]  # left eye (implicit positional reference)
vision[1]  # right eye (implicit positional reference)

from collections import namedtuple
Vision = namedtuple('Vision', ['left', 'right'])
vision = Vision(9.5, 8.8)
vision[0]
vision.left  # same as vision[0], but explicit
vision.right  # same as vision[1], but explicit

Vision = namedtuple('Vision', ['left', 'combined', 'right'])
vision = Vision(9.5, 9.2, 8.8)
vision.left  # still perfect
vision.right  # still perfect (though now is vision[2])
vision.combined  # the new vision[1]
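
A few more namedtuple tools, sketched with the same Vision type: _fields, _asdict() and _replace() are part of the namedtuple API, and since a namedtuple is immutable, _replace() returns a new object:

```python
from collections import namedtuple

Vision = namedtuple('Vision', ['left', 'combined', 'right'])
vision = Vision(9.5, 9.2, 8.8)

print(Vision._fields)            # ('left', 'combined', 'right')

# It is still a tuple, so unpacking works as usual.
left, combined, right = vision

# Convert to a dictionary keyed by field name.
as_dict = vision._asdict()
print(as_dict['combined'])       # 9.2

# _replace builds a new Vision; the original is untouched.
improved = vision._replace(right=9.0)
print(vision.right, improved.right)  # 8.8 9.0
```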

Defaultdict
d = {}
d['age'] = d.get('age', 0) + 1  # age not there, we get 0 + 1
d
d = {'age': 39}
d['age'] = d.get('age', 0) + 1  # age is there, we get 40
d

from collections import defaultdict
dd = defaultdict(int)  # int is the default type (0 the value)
dd['age'] += 1  # short for dd['age'] = dd['age'] + 1
dd
dd['age'] = 39
dd['age'] += 1
dd
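
The default factory does not have to be int. A common pattern, sketched here with made-up data, is defaultdict(list) for grouping. Note that merely reading a missing key creates it:

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# Group words by their first letter; a missing key starts as [].
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(by_letter['a'])    # ['apple', 'avocado']
print(by_letter['b'])    # ['banana', 'blueberry']

# Looking up a missing key inserts the default as a side effect.
missing = by_letter['z']
print(missing)           # []
print('z' in by_letter)  # True
```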

ChainMap
from collections import ChainMap
default_connection = {'host': 'localhost', 'port': 4567}
connection = {'port': 5678}
conn = ChainMap(connection, default_connection) # map creation
conn['port']  # port is found in the first dictionary
conn['host']  # host is fetched from the second dictionary
conn.maps  # we can see the mapping objects
conn['host'] = 'packtpub.com'  # let's add host
conn.maps
del conn['port']  # let's remove the port information
conn.maps
conn['port']  # now port is fetched from the second dictionary
dict(conn)  # easy to merge and convert to regular dictionary
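
As the session above suggests, writes and deletions on a ChainMap only touch the first mapping. A sketch that makes this explicit and also shows new_child(), which stacks one more mapping in front (the host name and port values are placeholders):

```python
from collections import ChainMap

defaults = {'host': 'localhost', 'port': 4567}
overrides = {'port': 5678}
conn = ChainMap(overrides, defaults)

# Writing through the ChainMap updates only the first mapping.
conn['host'] = 'example.org'
print(overrides)  # {'port': 5678, 'host': 'example.org'}
print(defaults)   # {'host': 'localhost', 'port': 4567}

# new_child puts a fresh mapping in front of the existing chain.
session = conn.new_child({'port': 9999})
print(session['port'], conn['port'])  # 9999 5678
print(session['host'])                # example.org
```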

-- Final considerations
Small values caching
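a = 1000000
b = 1000000
id(a) == id(b)  # typically False when typed line by line in the interactive shell: two distinct objects

a = 5
b = 5
id(a) == id(b)  # True in CPython: small integers are cached and reused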
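
The practical lesson, as a sketch: object identity for numbers is an implementation detail, so compare values with == and keep `is` for singletons such as None. The int('...') calls are only there to build the numbers at run time:

```python
a = int('1000000')
b = int('1000000')

# Value comparison: always reliable.
equal_values = a == b
print(equal_values)  # True

# Identity comparison: depends on the interpreter's caching,
# so the result should never be relied upon for numbers.
same_object = a is b

# `is` is the right tool for singletons like None.
result = None
print(result is None)  # True
```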

How to choose data structures
# example customer objects
customer1 = {'id': 'abc123', 'full_name': 'Master Yoda'}
customer2 = {'id': 'def456', 'full_name': 'Obi-Wan Kenobi'}
customer3 = {'id': 'ghi789', 'full_name': 'Anakin Skywalker'}
# collect them in a tuple
customers = (customer1, customer2, customer3)
# or collect them in a list
customers = [customer1, customer2, customer3]
# or maybe within a dictionary, they have a unique id after all
customers = {
    'abc123': customer1,
    'def456': customer2,
    'ghi789': customer3,
}
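
One way to weigh the options, assuming the typical operation is "find a customer by id" (the helper `find_in_list` is hypothetical): with a list we have to scan, while a dictionary keyed by id answers directly.

```python
customer1 = {'id': 'abc123', 'full_name': 'Master Yoda'}
customer2 = {'id': 'def456', 'full_name': 'Obi-Wan Kenobi'}
customer3 = {'id': 'ghi789', 'full_name': 'Anakin Skywalker'}
customers_list = [customer1, customer2, customer3]


def find_in_list(customers, customer_id):
    """Linear scan: time grows with the number of customers."""
    for customer in customers:
        if customer['id'] == customer_id:
            return customer
    return None


# Build the dictionary from the list, keyed by the unique id.
customers_by_id = {customer['id']: customer for customer in customers_list}

print(find_in_list(customers_list, 'def456')['full_name'])  # Obi-Wan Kenobi
print(customers_by_id['def456']['full_name'])               # Obi-Wan Kenobi
print(customers_by_id.get('zzz999'))                        # None
```

The list keeps the order and allows duplicates; the dictionary trades that for fast lookup by key.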

About indexing and slicing
Slicing in general applies to a sequence, so tuples, lists, strings, etc.
With lists, slicing can also be used for assignment. 
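Could you slice dictionaries or sets? I hear you scream "Of course not!". Sets are unordered, and dictionaries, although they preserve insertion order since Python 3.7, are accessed by key rather than by position.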
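
A sketch of slice assignment on a list, with made-up data. The replacement does not need to have the same length as the slice it replaces:

```python
letters = list('abcdef')

# Replace two elements with three: the list grows by one.
letters[1:3] = ['X', 'Y', 'Z']
print(letters)  # ['a', 'X', 'Y', 'Z', 'd', 'e', 'f']

# Delete the last two elements through a slice.
del letters[-2:]
print(letters)  # ['a', 'X', 'Y', 'Z', 'd']

# Assigning to an empty slice inserts without replacing anything.
letters[:0] = ['start']
print(letters)  # ['start', 'a', 'X', 'Y', 'Z', 'd']

# Sets do not support indexing or slicing at all.
try:
    {1, 2, 3}[0:2]
except TypeError as error:
    set_error = str(error)
print(set_error)  # 'set' object is not subscriptable
```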

a = list(range(10))  # `a` has 10 elements. Last one is 9.
a
len(a)  # its length is 10 elements
a[len(a) - 1]  # position of last one is len(a) - 1
a[-1]  # but we don't need len(a)! Python rocks!
a[-2]  # equivalent to a[len(a) - 2]
a[-3]  # equivalent to a[len(a) - 3]

This is negative indexing: a negative index counts from the end of the sequence.


About the names
