Tech Life On The Way

Posts

Showing posts from July, 2018

Hanoi Tower

July 28, 2018

最近学习到了一个很有趣的problem. 汉诺塔。我认真的想了好几天，终于在今天找IT给我set up machine等着没事干的时候大概想通了。哈哈哈。解决这个问题的方法让我想到了binary tree traverse 里面的in order的遍历，汉诺塔的solution之一的 recursive 的方式跟二叉树in order就是异曲同工，而且in human mind, 汉诺塔should make more sense to understand compared to binary tree. 在网上看到了一个印度小男孩对汉诺塔的分析，果然是tree. Basically 解决汉诺塔的问题就是一个搬家公司应该怎么搬家的问题，from mission completed to start, which is from bottom to top. Let's define Hanoi move function as: HanoiMove(int leftToSit, int from - 0 , int to - 1, int rest - 2) and we have n pieces in total waiting for the move In theory, to approach the last step nth of move, we just need finish move rest of Hanoi pieces (n-1 in total) from start to rest pile,(简单说来就是，清空路障，直捣虎穴。）and then it should be able to move the last piece from start to To pile. so we just do 1) HanoiMove(n-1, from, res, to) ---->>> left as we planned in theory, we move the nth piece move(from, to) after this move, we have n-1 pieces sit in the rest pile, a

C++ insights

July 27, 2018

1. Rule of three Copy assignment and copy constructor https://stackoverflow.com/questions/4172722/what-is-the-rule-of-three/4172724#4172724 2. best resources https://stackoverflow.com/questions/388242/the-definitive-c-book-guide-and-list 3. YouTube Video Cherno's Channal https://www.youtube.com/channel/UCQ-W1KE9EYfdxhL6S4twUNw?&ab_channel=TheChernoProject

从文件流中读取第K大的数

July 17, 2018

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 int getKthLargest (std :: istream & numStream, uint32_t k) { if ( ! numStream.good()) { throw std :: invalid_argument( "numStream" ); } if (k == 0 ) { throw std :: invalid_argument( "k" ); } try { auto initHeap = std :: vector < int > (k, INT_MIN); std :: priority_queue < int , vector < int > , std :: greater < int >> minHeap(initHeap.begin(), initHeap.end()); int num; while ( ! numStream.eof()) { numStream >> num; if (numStream.fail()) { // bad format number continue ; } minHeap.push(num); minHeap.pop(); } // assume input stream doesn't have any number

Dynamic Programming

July 14, 2018

https://www.hiredintech.com/classrooms/algorithm-design/lesson/40 what is dynamic programming? In short, it's a method, which allows you to solve a problem by breaking it down into smaller sub-problems. --> 以小见大方法： 1. Breaking down a problem into sub-problems. 2. Two implementations "bottom-up" where we start from the base cases and compute the values until we reach the desired value. "top-down" in which we recursively compute the answers for smaller problems, on demand, but try to store the computed valued in order not to compute them multiple times. This technique is usually called "memoization". Sometimes, especially for "bottom-up" implementations it is possible to store only one part of the computed values at a time and free the memory for other parts once they have served their job in the computations.

Prime Number相关

July 14, 2018

Prime Number: 只能被1和它本身整除的的数。1不是Prime number. Ugly Number: 1, 2, 3, 5, 以及能被[2, 3, 5] 整除的数。直觉上看，除了prime number以外的所有数都能被ugly number整除。也就是说，两个问题归一了。无论是find kth prime number或者是kth ugly number. 都属于同一问题，如何有效地在一个int streaming中找到满足条件的数。进一步分析，如何找到prime number / ugly number 暴力解法就是从1开始每一个数都check, 然后一直check到第kth满足条件的数。这个方法用来找ugly number应该还不错，原因是从[1. x], 基本上所有的数都属于ugly number. 应该能够在~k 的条件下找到ugly number. 去implement一下看看有没有TLE的问题。 oops. 思路off了一点点。重新回到ugly number的定义。 ugly number是只能被2, 3, 5整除的数。但事实上，从1开始的数中，除了能够被2, 3, 5整除的数之外，还有的数的factor是prime number同时他们也能被2, 3, 5整除，所以在check数的同时要排除其能被遇见过的prime number整除的可能性。果然 TLE了。 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 class Solution : """ @param n: An integer @return: the nth prime number as description. """ def nthUglyNumber ( self , n): # write your code here if n == 1 : return 1 order =

Design: 如何设计一个web crawler

July 12, 2018

对于design的总结，就从我最熟悉的crawl 开始。按照我在工作中的学习和了解实际应用一下。确定design purpose. crawler需要crawl什么，是固定的host, ip还是某一个domain. crawler是为谁服务？ crawl 的scale是多少？几百，几千还是几个B？ crawl后的内容如何使用？ crawl的latency的宽容度是多少？ crawl的seed如何选择？如何filter junk/spam/bad url? 如何保证url的freshness? 如何dedup crawl的次数？在crawl之后的document需要进行process吗？假设我们的crawler是一个private crawler, 只是target某一个特定的host, 比如wikipedia. 有一个seed url list, 然后从parent url开始往下衍生找outlinks.

linked list

July 12, 2018

3 type of linked list Three different kinds of linked lists exist: Singly linked lists have a single pointer pointing to the next element in the list. The last pointer is empty or points to null, signaling the end of the list. Doubly linked lists have two pointers, one pointing to the next element and one pointing to the previous element. The head node's previous pointer points to null and the tail node's next pointer points to null to signal the end of the list. Circular linked lists usually represent buffers. They have no head or tail, and the primary issue is avoiding infinite traversals because of the cycle. These questions rarely come up in interview questions. Arrays are oftentimes a better substitute for a circular linked list, using the modulus operator to wrap around. 小技巧： 1. 固定head 指针 2. 每次加新节点的时候加在head

还不错的技术博客

July 12, 2018

http://dreamrunner.org/

K Closest Points

July 11, 2018

Description Given some points and a point origin in two dimensional space, find k points out of the some points which are nearest to origin . Return these points sorted by distance, if they are same with distance, sorted by x-axis, otherwise sorted by y-axis. Example Given points = [[4,6],[4,7],[4,4],[2,5],[1,1]] , origin = [0, 0] , k = 3 return [[1,1],[2,5],[4,4]] 思考：这个问题还蛮容易出错的。对于距离，使用 Euclidean distance, (x1-x2)**2 + (y1-y2)**2 第一种方法： Bucket sort. 建立一个dictionary with key = distance，value = points with distance. output to sorted list. 这个蠢办法实现的复杂度很高，要以key sort dictionary然后每个bucket的成员还要再继续的sort. 第二种方法：利用已有的函数。heapify. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 """ Definition for a point. class Point: def __init__(self, a=0, b=0): self.x = a self.y = b """ import heapq class Solution : """ @param points: a list

Top k Largest Numbers

July 11, 2018

这道题的错也充分显示了基础知识不扎实导致的。非常值得一记。 544. Top k Largest Numbers Given an integer array, find the top k largest numbers in it. Example Given [3,10,1000,-99,4,100] and k = 3 . Return [1000, 100, 10] . 思考过程： O(N) 一遍nums是免不了的，所以关键就在于top k上。top k会被定义为一个数组。在本题的答案中，其实隐含了要求让所有的K个数都降序排列，但在这点上，题目并没有真正的要求。所以sorting是一个必要的过程。关键是怎么sort, 什么时候sort, 还有谁sort 暴力解法。每次都遍历k数组，然后找到k数组中的最小数的index, 然后replace. 然后再在最后输出答案的时候，list.sort(reverse = True) 毫无疑问。time limit exceeded. 因为这样做的最差的时间复杂度为 O(N^2) 如果 K ~= len(nums) 聪明的解法。 Scenario: 想来，如果我的nums是一个streaming, 是不是可以让它在通过了某一个filter或者component的时候可以自动过滤出满足某一个criteria的值 priority_queue: heapq: heap[k] <= heap[2*k+1] and heap[k] <= heap[2*k+2] for all k , counting elements from zero. 我的答案。complexity O(NLogN) heapq的用法， https://docs.python.org/2/library/heapq.html https://www.geeksforgeeks.org/heap-queue-or-heapq-in-python/ 1 2 3 4 5 6 7 8 9 10 11 12 13 def topk ( self , nums, k):

错误汇总。

July 11, 2018

经常犯的错误 1. 在循环一个list的时候index应该注意使用范围使用的对象是list的成员还是真正的index 不要忘记return 不要忘记用while循环的时候，i += 1

深入了解Python

July 11, 2018

如果你认为你了解python，那么你得首先回答如下问题？ 1. 是否了解动态语言的鸭子模型？ 2. 是否了解可变参数与关键字参数？ 3. 对函数式编程有初步了解。 4. 是否知道列表生成式？ 5. 是否知道lambda/decorator/slots？ 6. 为什么要把缺省参数设为immutable？ 7. 是否知道Mixin？ 8. 是否知道WSGI接口？ 9. 是否知道异步框架如gevent/tornado？ 10. 是否深入了解过Python的GC和GIL？ 11.是否了解python对象的查找机制？

Word Search 2

July 09, 2018

This question is a little bit dumb hard, but it's a standard Google style question and expose quite a lot problems in my code. Thanks to it! Plus, plus, plus.... Given a 2D board and a list of words from the dictionary, find all words in the board. Each word must be constructed from letters of sequentially adjacent cell, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once in a word. Example: Input: words = ["oath","pea","eat","rain"] and board = [ [' o ',' a ','a','n'], ['e',' t ',' a ',' e '], ['i',' h ','k','r'], ['i','f','l','v'] ] Output: ["eat","oath"] Note: You may assume that all inputs are consist of lowercase letters a-z . Solution without Optimization. Using DFS的暴力解法 1

How to define an 2-d matrix?

July 08, 2018

Example: We want to define a 2 dimension array with width = 3, height = 4, and the initial value = False w, h = 3, 4 1. Using the concept of list comprehension. list comprehension: [expression(x) for x in old_list if condition(x)] return: a list of items that satisfy the condition # define the row list row = [False for i in range(w)] matrix = [row for j in range(h)] 2. Using operator "*" for list "+" for list is equally to add item to the list a = [1, 2, 3] b = [4, 5, 6] a + b = [1, 2, 3, 4, 5, 6] "*" for list is equally to duplicate items in the list x times [False, False] * 3 = [False, False, False, False, False, False] # define Matrix [[False] * w] * h Note: 悲剧的初始化错误！！！ https://stackoverflow.com/questions/240178/list-of-lists-changes-reflected-across-sublists-unexpectedly?noredirect=1&lq=1 When you write [x]*3 you get, essentially, the list [x, x, x] . That is, a list with 3 references to the same x . When you the

Sequence Reconstruction

July 05, 2018

遇到了一题想了半天，觉得自己应该会做，感觉应该像排会议一样用，topological sort, 可又不知道如何抽象成graph node. 又觉得可能可以用union set, 但也不知道怎么写。来来去去想了很多遍，发现还是写不出一行code. 然后看了答案，发现，晕晕晕，真的是用topo sort. 只能怪自己道行不够。。。然后恍然大悟，好蠢的题目。做不出题目的我就是好蠢的我。哈哈哈 Description Check whether the original sequence org can be uniquely reconstructed from the sequences in seqs . The org sequence is a permutation of the integers from 1 to n, with 1 ≤ n ≤ 10^4. Reconstruction means building a shortest common supersequence of the sequences in seqs (i.e., a shortest sequence so that all sequences in seqs are subsequences of it). Determine whether there is only one sequence that can be reconstructed from seqs and it is the org sequence. Example Given org = [1,2,3], seqs = [[1,2],[1,3]] Return false Explanation: [1,2,3] is not the only one sequence that can be reconstructed, because [1,3,2] is also a valid sequence that can be reconstructed. Given org = [1,2,3], seqs = [[1,2]] Return false Explanation: The reconstructed sequence