Anyone using Python?

  • Man, I always come by here and never see anything. I'm always tooting how great Python is in these forums so I figured I would make the first post here.

    Question

    Anyone actually using Python for machine learning or well, anything relating to analytics, BI and statistics?

    I have for some time, but only for data ingestion (e.g.: API). I have been trying to sharpen up my scripting skills to include more machine learning techniques with Python. I have a growing library of Python scripts that have been collected from a number of data science & machine learning courses with a focus on Python over R.

    Lately, I've been doing hacker rank problems to also sharpen my skills. I ran into this problem where I think Python solves the challenge, but not fast enough to meet the requirements on larger data sets. If anyone has any insights, please feel free to jump in below.

    Problem I've Been Working On

    Given two numbers NN and MM. NN indicates the number of elements in the array and MM indicates number of queries. You need to perform two types of queries on the array. You are given MM queries. Queries can be of two types, type 1 and type 2.

      Type 1 queries are represented as 1 i j : Modify the given array by removing elements from ii to jj and adding them to the front.

      Type 2 queries are represented as 2 i j : Modify the given array by removing elements from ii to jj and adding them to the back.

    Your task is to simply print |A[1]−A[N]||A[1]−A[N]| of the resulting array after the execution of MM queries followed by the resulting array. Note While adding at back or front the order of elements is preserved.

    Input Format

    First line consists of two space-separated integers, NN and MM. Second line contains NN integers, which represent the elements of the array. MM queries follow. Each line contains a query of either type 1 or type 2 in the form type i j

    Sample Input

    8 4

    1 2 3 4 5 6 7 8

    1 2 4

    2 3 5

    1 4 7

    2 1 4

    Explanation

    Given array is {1,2,3,4,5,6,7,8}

    After execution of query 1 2 4, the array becomes {2,3,4,1,5,6,7,8}

    After execution of query 2 3 5, the array becomes {2,3,6,7,8,4,1,5}

    After execution of query 1 4 7, the array becomes {7,8,4,1,2,3,6,5}

    After execution of query 2 1 4, the array becomes {2,3,6,5,7,8,4,1}

    Now |A[1]−A[N]||A[1]−A[N]| is |(2−1)||(2−1)| i.e. 11 and the array is 23657841

    Data

    input_data.txt attached.

    My Code Submission

    This is too slow.. 🙁

    def array_query(s, t, i, j):

    """ Type 1 Query that modifies the given array by removing elements from i to j and adding them to the front.

    :param i: N starting number for range

    :param j: N ending number for range

    :return : New array with the chopped number ranges specified from i and j.

    """

    if t == 1:

    s[:0] = s[i-1:j]

    del(s[i+(j-i):j+(j-i+1)])

    elif t == 2:

    s += s[i-1:j]

    del(s[i-1:j])

    return s

    if __name__ == "__main__":

    input_size = map(int, raw_input().split()) # Array size, Total Queries

    source_array = map(int, raw_input().split()) # Input array

    for case in range(0, input_size[1]):

    test_case = map(int, raw_input().split())

    array_query(source_array, # a[ ] = Input Array

    test_case[0], # q[0] = Query type

    test_case[1], # q[1] = i range (Min)

    test_case[2]) # q[2] = j range (Max)

    print abs(source_array[0] - source_array[input_size[0] - 1])

    print ' '.join(map(str, source_array))

  • Interesting. I'll have to play with this. No time now, but I've done some Python, and have been working with the SDTIG on their Python series. I've gotten behind, but they had an interesting machine learning segment, with the last meeting this Wed.

    http://www.meetup.com/San-Diego-Technology-Immersion-Group-SDTIG/

    I think I'd do something similar to your approach. String slicing makes sense, though probably not super fast.

    Do you mean slow in terms of lots of input rows, wide strings, or both?

  • Steve Jones - SSC Editor (4/24/2016)


    Interesting. I'll have to play with this. No time now, but I've done some Python, and have been working with the SDTIG on their Python series. I've gotten behind, but they had an interesting machine learning segment, with the last meeting this Wed.

    http://www.meetup.com/San-Diego-Technology-Immersion-Group-SDTIG/

    I think I'd do something similar to your approach. String slicing makes sense, though probably not super fast.

    Do you mean slow in terms of lots of input rows, wide strings, or both?

    This is slow because the bigger the phone book, the longer it takes so to speak. I need a better approach that scales to meet the 10 second minimum requirement. The attached data actually completes within 12 seconds on my machine. It's not that slow, but not fast enough. The below recommendation is one way to tackle the problem.

    The recommended approach is by doing a Treap. This is similar to a binary search tree, but a self-balancing binary search tree. I've seen examples in C++, but not too much in Python without modules. Trying to do this without a module. Guess I can manually do it.

    Treap

  • Hi, i think you can speed up your list "slice and dice" if you convert list to (byte) string using struct module and pack/unpack.

    from struct import *

    def string_query(s, t, i, j):

    """ Type 1 Query that modifies the given array by removing elements from i to j and adding them to the front.

    :param i: N starting number for range

    :param j: N ending number for range

    :return : New array with the chopped number ranges specified from i and j.

    """

    if t == 1:

    return s[ 4*(i-1): 4*j ] + s[ : 4*(i-1) ] + s[ 4*j : ]

    else:

    return s[ : 4*(i-1) ] + s[ 4*j : ] + s[ 4*(i-1): 4*j ]

    if __name__ == "__main__":

    ifl = open('input_data.txt','r')

    isz = map(int, ifl.readline().split())# Array size, Total Queries

    sa = map(int, ifl.readline().split())# Input array

    lsa = len(sa)

    sa = ''.join( map( lambda x: pack('i', x ), sa ))

    for case in range(0, isz[1]):

    tc = map(int, ifl.readline().split())

    sa = string_query( sa, # a[ ] = Input Array

    tc[0], # q[0] = Query type

    tc[1], # q[1] = i range (Min)

    tc[2]) # q[2] = j range (Max)

    sa = unpack( 'i' * lsa, sa )

    print abs(sa[0] - sa[isz[0] - 1])

    print ' '.join(map(str, sa))

  • They are strict on module usage. Don't think I can use it. Let's see.

  • Well then, I guess struct was good to go. Amazing how something as simply taking an approach to pack and unpack the data allowed me to reduce the runtime below 10 seconds on all 27 test cases.

    To compare the results. I went from 7.31 seconds on the last good test case with the previous code to over 10 seconds to fail the rest of the test cases.

    Using your suggestion, I was able to solve all the test cases under 10 seconds, with the longest running time 8.39 seconds.

    😀

  • 🙂 https://www.hackerrank.com/challenges/array-and-simple-queries ...

    I hope you get that job.

  • Bosko Vukov (4/27/2016)


    🙂 https://www.hackerrank.com/challenges/array-and-simple-queries ...

    I hope you get that job.

    Pssh, no job attached to this. I do these for fun and stream them on TwitchTv to sharpen my skills! 😉

  • Simpler, but I've done some of these: http://exercism.io/

    Python is built to use modules. Trying to not use them, in some sense, defeats the purpose. I guess it gains algorithmic practice, but perhaps leads you to spend time on items that could better be done in other ways.

    I guess it's a balance of some sort.

  • Steve Jones - SSC Editor (4/27/2016)


    Simpler, but I've done some of these: http://exercism.io/

    Python is built to use modules. Trying to not use them, in some sense, defeats the purpose. I guess it gains algorithmic practice, but perhaps leads you to spend time on items that could better be done in other ways.

    I guess it's a balance of some sort.

    It's likely because a lot of modules do all the work for you. They are trying to restrict that and see what you can do with the bare essentials for learning purposes. Having to always lean on a module will surely hinder you in solving these complex problems.

  • There's a fun starting tutorial here: https://www.microsoft.com/en-us/sql-server/developer-get-started/

  • Hi Steve,

    I notice you link a lot of your "Python Question of the Day" gets linked to Real Python's website as a reference. Have your studied their tutorials in the past, and have they been helpful? I am considering getting their book. Thanks!

  • I don't link to any particular one. I've used a few different ones, but probably learned the most from DiveIntoPython.com and that book

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply