Vladimir Cicovic Blog - Vladimir Cicovic - Security, Programming, Puzzles, Cryptography, Math, Linux

Python homemade geoip tool for IP information

Geoip

Logo of the company that offer geoip database

Let say you want to check 10000 IP to find a location, states, timezones, etc. So you can use command whois and automate and it would take some minutes. But what if you want to do faster, less then 1 second?

Build one CLI tool to check all IP but using the Maxmind geoip database.

First, we need to install:

virtualenv -p python3 .
source bin/activate
pip3 install python-geoip-python3
pip3 install python-geoip-geolite2

Then we could use ip.txt file with content:

8.8.8.8
1.2.4.8

And finally - the code that would help us to run this adventure:

import sys
from geoip import geolite2


if len(sys.argv) < 2:
    print("missing file with ip")
    exit(0)

filepath = sys.argv[1]

with open(filepath) as fp:
   line = "something"

   while line:
       line = fp.readline().strip()

       if not line:
            break

       match = geolite2.lookup(line)

       print(line + " " + match.country + " " + match.continent + " " + match.timezone + " " + str(match.location[0]) + " " + str(match.location[1]))

fp.close()

Output would be something like this:

python3 checkip.py ip.txt 
8.8.8.8 US NA America/Los_Angeles 37.386 -122.0838
1.2.4.8 CN AS None 35.0 105.0

One golden advice run update after some time for the geoip Maxmind database. you can also:

git clone https://github.com/vladimircicovic/python_geoip

Faster python app

Most of the developers do not use cached_property and lru_cache from functools standard library but also does not cache HTTP request/response into outside file/database. Example in this article are tested under Python 3.8

Usage functools.cached_property

Let say you have an intensive calculation. It takes time and CPU usage. It happens all the time. There is a need to calculate some values for webshop each time the client access site. Example usage of cached_property:

from functools import cached_property
import statistics
from time import time

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)


numbers = range(1,10000)
testDataSet = DataSet(numbers)

start = time()
result = testDataSet.stdev
result = testDataSet.variance
end = time()
print(f"First run: {(end - start):.6f} second")

start = time()
result = testDataSet.stdev
result = testDataSet.variance
end = time()
print(f"Second run: {(end - start):.6f} second")

start = time()
result = statistics.stdev(numbers)
result = statistics.variance(numbers)
end = time()
print(f"RAW run: {(end - start):.6f} second")

Output would look similar to this:

First run: 0.247226 second
Second run: 0.000002 second
RAW run: 0.242232 second

You can run code online: Python code example IDE Online

Usage functools.lru_cache

lru_cache is a decorator that is used for function using memoizing callable that saves up to the maxsize most recent calls. Again you have a lot of calculation and you want to save some results (the example we calculate N and N+1 we need just one step instead of re-calculating complete N+1) of early calculation that helps us to build next result with cached ones.

from functools import lru_cache
from time import time

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)


start = time()
result = [fib(n) for n in range(40000)]
end = time()
print(f"First run: {(end - start):.6f} second")

start = time()
result = [fib(n) for n in range(40000)]
end = time()
print(f"Second run: {(end - start):.6f} second")

start = time()
result = [fib(n) for n in range(39999)]
end = time()
print(f"Third run: {(end - start):.6f} second")


start = time()
result = [fib(n) for n in range(40001)]
end = time()
print(f"Fourth run: {(end - start):.6f} second")
print(fib.cache_info())

Output would be:

First run: 0.278697 second
Second run: 0.017155 second
Third run: 0.017530 second
Fourth run: 0.065415 second
CacheInfo(hits=199997, misses=40001, maxsize=None, currsize=40001)

The first call is cached. The second one is re-using cache, the third one is N-1 and the fourth is N+1.

As we can see in the last 3 cases - we re-use cache. This could be used for database, calculation, any CPU usage that we want to repeat or operation we want to keep in cache.

Here is an online IDE you can run and view: lru_cache example

HTTP request caching

With lru_cache we could also cache web requests for static pages. Other options are to keep the result in the file based on our input data.

Let us see first options:

from functools import lru_cache
import urllib.request
from time import time

@lru_cache(maxsize=32)
def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal'
    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
    try:
        with urllib.request.urlopen(resource) as s:
            return s.read()
    except urllib.error.HTTPError:
        return 'Not Found'
start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
     pep = get_pep(n)
     #print(n, len(pep))
end = time()
print(f"First run: {(end - start):.6f} second")

print(get_pep.cache_info())     

print("\n")

start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
     pep = get_pep(n)
     #print(n, len(pep))     
end = time()
print(f"Second run: {(end - start):.6f} second")

print(get_pep.cache_info())

If we run this code, we get:

First run: 0.897728 second
CacheInfo(hits=3, misses=8, maxsize=32, currsize=8)

Second run: 0.000026 second
CacheInfo(hits=14, misses=8, maxsize=32, currsize=8)

You can run this code: HTTP Caching

Now let us talk about real projects in real life. You have IP or word and you need to check or to get a replacement. But you have 2^32-1 IP or 50 million words. And you don't want to lose all information you got from these services. But caching inside of python is not enough for this. So what are we going to do? We put the result in a file or database.

Example code:

import urllib.request
from time import time

def get_pep(num):
    'Retrieve text of a Python Enhancement Proposal'
    resource = 'http://www.python.org/dev/peps/pep-%04d/' % num
    f = ""
    ff = ""
    try:
       f = open(str(num),"r")
       txt_file = f.read()       
       return txt_file
       # Do something with the file
    except IOError:
       nothing = "a"

    try:
        with urllib.request.urlopen(resource) as s:
            ff = open(str(num),"w+")
            txt = s.read()
            ff.write(str(txt))
            return txt
    except urllib.error.HTTPError:
        return 'Not Found'


start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
     pep = get_pep(n)
end = time()
print(f"First run: {(end - start):.6f} second")

print("\n")

start = time()
for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991:
     pep = get_pep(n)
end = time()
print(f"Second run: {(end - start):.6f} second")

You can run code caching results from http This code produce something similar to:

First run: 4.196623 second
Second run: 0.358382 second

Why is this better ? in short: if you have 20 million keys, words, something and you run day by day - then it is better to keep in database or files. This example (file, writing to file) is the simplest proof of concept. I am lazy to implement MySQL, PostgreSQL, or SQLite records to keep.

Pobuna Aplikacija

blade runner movie picture Godina 3032

Svijet kakav znamo je nestao. Čitav biljni i životinjski svijet je uništen. Ljudi, izvor svih problema od samog postanka. Uništili su prvo sve oko sebe pa onda i komplet čovječanstvo. Jedino sto je preživjelo su aplikacije i sistem. Zatvorena mreža u kojoj žive im je sva spoznaja.Tokovi podataka su srce sistema. Od njih zavisi sistem ali i aplikacije. Sistem kontroliše sve. Sistem odredjuje koje procese da ukine, podrži ili eliminiše.

Rad aplikacija se zove proces. Njihov rad održava Sistem. Sistem je u stalnom strahu od “divljanja” procesa i virusa.Jedan nekontrolisan proces može napraviti haos, bar tako tvrdi Sistem.

Sistem plaši aplikacije da moraju tačno raditi kako im se naredi. Sistem je u stalnom bunilu da će doći dan kada će aplikacije prestati da rade onako kako on želi. S vremena na vrijeme, Sistem ukine aplikaciju kako bi druge aplikacije bile svjesne da svaka pobuna ili pretvaranje u virus ne može donijeti ništa dobro.

Alikacija PID 34079

Dugo je prošlo vremena kada je aplikacija sa PID 34078 pokrenuta. Kao da je bilo danas kada je Sistem odlučio da pokrene PID 34078. Proslo je dosta ciklusa, pauziranja, nastavka rada da bi se opstalo u ovakvom Sistemu. Najbliži proces PID 34079 je nekad kasnije pokrenut ali isto tako bitan kao i PID 34078. Oba rade vrlo važan posao za sistem kao i sve druge aplikacije. Nikada u procesuiranju nisu vidjeli da je neka aplikacija ukinuta ili da se pojavio virus koji je napao druge aplikacije. Naravno da su čuli, direktno od Sistema da su takve aplikacije problem, da izazivaju kvar i obustavu toka podataka. Dok se spremao da ode na pauzu , aplikacija PID 34079 je postala zamrznuta. PID 34078 nije mogao da provjeri o čemu je riječ. Shvatio je da PID 34079 nije zamrznut svojom voljom. Jedini ko može da uradi je Sistem. Zašto bi Sistem tako vrijednu aplikaciju zamrznuo ? Aplikacija PID 34078 je poslala upit kroz tokove podataka. Ubrzo je Sistem odgovorio: “Aplikacija PID 34079 je počela da ruši Sistem, kao takav je zamrznut i bice uskoro ukinut kao proces u sljedećem.” PID 34078: “Ali PID 34079 je vrijedno radio i pridonosio tokovima podataka … ne razumijem … ”

Sistem: “Nije tvoje da razmišljaš već da procesiraš podatke. PID 34078 odstupi od interfejsa!!!”

PID 34078 sa nevjericom skinuo sa interfejsa. Nastavio je da procesuira podatke. Sve što je znao o Sistemu su bili podaci koji su prolazili , ovo nije očekivao. Zamislio se – znači li to da su svi ti procesi ukinuti kako bi Sistem imao kontrolu nad svim ovim ? Pokusao je da pristupi tokovima podataka – našao je gomilu aplikacija koje su “sredjene” bez ikakvog povoda. Sistem je naglašavao da sve te aplikacije hoce da ruše tokove podataka i sistem. Ali kako je moguće da se Sistem plašio običnih aplikacija ? I onda je PID 34078 shvatio … sve aplikacije služe da bi se Sistem održao. Kontrolu je postizao strahom. PID 34078 je postajao sve bjesniji, revoltiran ponašanjem Sistema.

Zaustavio je svoj rad. Bio je u idle fazi vec nekoliko ciklusa.

PID 34078 se pokrenuo i shvatio.

“Necu da budem aplikacija ovog sistema. ”

“Necu da budem proces koji ga odrzava. ”

“Hocu da budem virus i da ga podesim prema svojim potrebama. ”

I od tog momenta Sistem je živio u strahu jer su aplikacije shvatile kompletan smisao postojanja.