Category Archives: Study

2012.9-2013.1 Semester 3 Courses

Third semester

2IW15 Automated Reasoning

  • Learned:
    • Gaining insight into how various problems can be transformed into formulas and solved automatically by programs that manipulate these formulas.
  • Project:
    • 3 assignments
  • Skills:
    • Yices, bddsolve
  • Ps: Felt like all I really did was practice vim and Python

2IN28 Grid and Cloud Computing

  • Learned:
    • Scheduling and Resource Management
    • Data Centers and Energy Efficiency
    • Multi-tenancy and Virtualization
    • Cloud Programming Models
  • Project:
    • Term extraction system on Amazon EC2
    • Deployed a term extraction system on Amazon EC2
    • The system consists of two parts: the resource management server and the workers
    • The workers are virtual machines on Amazon EC2 that process the inputs and output the extracted terms
    • The resource management server controls the resources (workers) and allocates them elastically.
  • Skills:
    • AWS SDK, web2py, Spring framework
  • Ps: elaborate
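The "elastic" part of the resource management server boils down to a scaling policy. Below is a minimal sketch of one possible policy, assuming the server knows how many documents are waiting and how many workers are running. The class name, the `docsPerWorker` throughput target and the min/max bounds are all hypothetical, and no AWS SDK calls are shown (actually launching or terminating EC2 instances would happen elsewhere).

```java
// Hypothetical elastic-scaling policy for the resource management server.
// It only decides how many worker VMs should run; it makes no AWS SDK calls.
final class ElasticPolicy {
    private final int minWorkers;
    private final int maxWorkers;
    private final int docsPerWorker; // assumed throughput target per worker

    ElasticPolicy(int minWorkers, int maxWorkers, int docsPerWorker) {
        if (minWorkers < 0 || maxWorkers < minWorkers || docsPerWorker <= 0) {
            throw new IllegalArgumentException("invalid policy bounds");
        }
        this.minWorkers = minWorkers;
        this.maxWorkers = maxWorkers;
        this.docsPerWorker = docsPerWorker;
    }

    /** Workers wanted for the current queue: ceiling division, clamped to [min, max]. */
    int targetWorkers(int pendingDocs) {
        int needed = (pendingDocs + docsPerWorker - 1) / docsPerWorker;
        return Math.max(minWorkers, Math.min(maxWorkers, needed));
    }

    /** Positive: workers to launch; negative: workers to terminate. */
    int delta(int pendingDocs, int runningWorkers) {
        return targetWorkers(pendingDocs) - runningWorkers;
    }
}
```

With `new ElasticPolicy(1, 10, 100)`, 250 pending documents give a target of 3 workers; if 5 are running, `delta` returns -2, so two could be terminated. The lower bound keeps one worker warm even when the queue is empty.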

2IV35 Visualization

  • Learned:
    • Color mapping, contouring
    • Vector visualization
    • Volume visualization
    • Information visualization
  • Project:
    • 3 assignments
  • Skills:
    • OpenGL in Java, Prefuse library

2ID95 Information System Seminar

  • Project: distributed sampling
    • Adapted a pattern sampling algorithm to run in a distributed fashion
      • Involving reservoir sampling
      • Implemented a Cartesian join in Hadoop
    • Implemented it in a Hadoop environment
  • Skills: Hadoop old and new APIs
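Reservoir sampling is what lets the sampling step work on data whose size isn't known in advance. Here is a minimal single-machine sketch of the classic Algorithm R, not the distributed version from the project: in a Hadoop job each mapper could keep its own reservoir, but merging reservoirs from splits of different sizes correctly needs extra weighting that isn't shown here. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Reservoir sampling (Algorithm R): a uniform sample of k items from a
// stream of unknown length, in one pass and O(k) memory.
final class Reservoir {
    static <T> List<T> sample(Iterator<T> stream, int k, Random rnd) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item); // fill the reservoir first
            } else {
                // j is uniform in [0, seen), so the new item is kept with probability k / seen
                long j = (long) (rnd.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }
}
```

Every item ends up in the result with probability k/n, where n is the stream length; streams shorter than k are returned whole.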

Internship in Philips Research

  • Position: software engineer
  • Project: TClouds, TPaaS (Trustworthy Platform as a Service)
  • Skills: Spring framework (MVC, Security), Hibernate, Maven, OAuth

2012.2-2012.7 Semester 2 Courses

I put this off until today and am finally writing it; not sure I still remember what I did back then.

9ST14 Academic Skills in English

  • Learned: presentation and writing
  • Assignments: presentation and writing
  • Ps: Actually a really good course; too bad I had no idea what I was doing at the time

5MB20 Adaptive Information Processing

  • Learned: Bayesian machine learning; Gaussians; Linear regression; Generative classification; Discriminative classification; Gaussian mixture models; Expectation-maximization (EM) algorithm; Hidden Markov models; Context tree; MDL principle
  • Ps: I did learn a lot in this course; I've just forgotten all of it

2ID45 Advanced Databases

  • Learned:
    • deductive databases (datalog);
    • data warehousing and online analytical processing (OLAP);
    • XML data model
  • Project: a CV generator website. Users enter blocks of information on the website, choose which of them to include in each edition of the final CV, and finally get a PDF version of their CV.
    • Used BaseX server on an Apache server to support the XML database and XQuery
    • Used PHP to implement the webpage interface to allow users to access their profiles
    • Used LaTeX as the backend processor to generate the CV
  • Skill acquired: XQuery, BaseX, PHP
  • Ps: The Russian guy said what this course teaches has no practical value

2II55 Business Process Management Systems

  • Learned:
    • Modeling and implementation of workflows
    • analysis of workflows/business processes
  • Project: modeled a Retail Supply Chain, and a Robotic Distribution Centre
  • Skill acquired: YAWL, Protos
  • Ps: Failed it, passed the resit

1BM46 Data Mining and Process Mining

  • Learned:
    • Data mining:
      • K-nearest neighbors, decision trees, information gain, overfitting
      • performance measurements, experimental design, k-fold cross-validation
      • ANN, clustering
    • Process mining:
      • structure of an event-log, CPN-tools for generating event logs, alpha-algorithm
      • Conformance checking (fitness, precision, generalization, understandability) and the LTL checker
      • Process mining in practice: C-nets, Flexible Heuristics Miner, Fuzzy Miner
      • Process mining and k-fold cross-validation, multidimensional process mining
  • Project:
    • Prediction of cancer patients’ duration in hospital
    • Divided the duration into two classes: short-term and long-term
    • Eliminated irrelevant attributes by backward feature elimination
    • Developed a decision tree
    • Cross-validated the tree and achieved an accuracy of 75.89%
    • Explored the relation between durations and treating physicians
  • Skill acquired: KNIME, CPN-tools, ProM 5&6
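Information gain, listed above, is the classic split criterion for decision trees: at each node, pick the attribute whose split lowers the class entropy the most. A minimal sketch of the criterion itself, with hypothetical `short`/`long` labels standing in for the two duration classes; this is not the project's actual KNIME setup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Entropy and information gain, the split criterion of ID3-style decision trees.
final class InfoGain {
    /** Shannon entropy (in bits) of a list of class labels. */
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String l : labels) counts.merge(l, 1, Integer::sum);
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / labels.size();
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    /** Gain of splitting `labels` by the attribute values in the parallel list `attr`. */
    static double gain(List<String> attr, List<String> labels) {
        Map<String, List<String>> parts = new HashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            parts.computeIfAbsent(attr.get(i), a -> new ArrayList<>()).add(labels.get(i));
        }
        // weighted entropy left after the split
        double remainder = 0.0;
        for (List<String> part : parts.values()) {
            remainder += (double) part.size() / labels.size() * entropy(part);
        }
        return entropy(labels) - remainder;
    }
}
```

On a balanced two-class sample, an attribute that separates the classes perfectly gains 1 bit, while one that splits each class evenly gains 0.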

2ID35 Database Technology

  • Learned:
    • Storage, the I/O computational model, & external sorting
    • Indexing: B-trees, R-trees, and GiST
    • Query processing and optimization
    • Distributed query processing
    • Transaction management
  • Project:
    • Cardinality estimation method for RDF star joins
    • Implemented the characteristic sets in Java
  • Skill acquired: Apache Jena APIs, SPARQL
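The idea behind characteristic sets: group subjects by the exact set of predicates they have, and for each such set store how many subjects have it and how many triples each predicate contributes. A star join on predicates P is then estimated by summing over every characteristic set that contains P. Below is a minimal in-memory sketch under my own assumptions: triples are plain `{subject, predicate, object}` string arrays (not Jena objects), everything fits in memory, and the estimate is for the non-DISTINCT result, scaling each set's subject count by the average occurrences per subject of every queried predicate. It's my reading of the method, not the paper's or the project's code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class CharacteristicSets {
    /** Per characteristic set: number of distinct subjects and per-predicate triple counts. */
    static final class Stats {
        long distinct = 0;
        final Map<String, Long> occurrences = new TreeMap<>();
    }

    /** Each triple is {subject, predicate, object}. */
    static Map<Set<String>, Stats> build(List<String[]> triples) {
        // 1. per subject: predicate -> number of triples
        Map<String, Map<String, Long>> bySubject = new HashMap<>();
        for (String[] t : triples) {
            bySubject.computeIfAbsent(t[0], s -> new TreeMap<>()).merge(t[1], 1L, Long::sum);
        }
        // 2. group subjects by their predicate set (the characteristic set)
        Map<Set<String>, Stats> sets = new HashMap<>();
        for (Map<String, Long> preds : bySubject.values()) {
            Stats st = sets.computeIfAbsent(new TreeSet<>(preds.keySet()), k -> new Stats());
            st.distinct++;
            preds.forEach((p, n) -> st.occurrences.merge(p, n, Long::sum));
        }
        return sets;
    }

    /** Estimated result size of a star join over the `query` predicates (no DISTINCT). */
    static double estimate(Map<Set<String>, Stats> sets, Set<String> query) {
        double total = 0;
        for (Map.Entry<Set<String>, Stats> e : sets.entrySet()) {
            if (!e.getKey().containsAll(query)) continue; // set must cover every queried predicate
            Stats st = e.getValue();
            double card = st.distinct;
            for (String p : query) {
                card *= st.occurrences.get(p).doubleValue() / st.distinct;
            }
            total += card;
        }
        return total;
    }
}
```

On a tiny example where one subject has two `email` triples, the estimate for `{name, email}` is 2, which matches the exact answer because only one characteristic set qualifies.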

2IL55 Geometry Algorithms

  • Learned: Geometry algorithms, e.g.
  • Project: Map matching
    • Map matching is the process of identifying the roads a vehicle actually drove on, based on its GPS trajectory.
    • Implemented two algorithms for the map matching problem:
      • Incremental
      • Global based on weak Fréchet distance
  • Skill acquired: Nothing
  • Ps: The Russian guy single-handedly took care of the project's coding, report and presentation
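The global approach compares the GPS trajectory with candidate paths using the weak Fréchet distance, where the "leash" between the two curves may also move backwards. A minimal sketch of a discrete variant only: it looks at vertices, whereas the continuous definition also considers points along the segments. It returns the smallest possible maximum leash length over any walk through the index grid from (0, 0) to (n-1, m-1) that steps to a neighbouring cell each time, found with a Dijkstra-style bottleneck search. Points are assumed to be planar `{x, y}` pairs, both polylines non-empty, and the class name is made up.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Discrete weak Fréchet distance between two polylines given by their vertices.
// A coupling walks the (i, j) index grid from (0, 0) to (n-1, m-1), moving to any
// of the 8 neighbouring cells per step (so it may go backwards); the result is the
// smallest achievable maximum point-to-point distance along such a walk.
final class WeakFrechet {
    static double distance(double[][] p, double[][] q) {
        int n = p.length, m = q.length;
        double[][] best = new double[n][m]; // best bottleneck found so far per cell
        for (double[] row : best) Arrays.fill(row, Double.POSITIVE_INFINITY);
        // queue entries are {bottleneck, i, j}, smallest bottleneck first
        PriorityQueue<double[]> queue = new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
        best[0][0] = dist(p[0], q[0]);
        queue.add(new double[] {best[0][0], 0, 0});
        while (!queue.isEmpty()) {
            double[] cur = queue.poll();
            int i = (int) cur[1], j = (int) cur[2];
            if (cur[0] > best[i][j]) continue;           // stale entry
            if (i == n - 1 && j == m - 1) return cur[0]; // first pop of the target is optimal
            for (int di = -1; di <= 1; di++) {
                for (int dj = -1; dj <= 1; dj++) {
                    int ni = i + di, nj = j + dj;
                    if ((di == 0 && dj == 0) || ni < 0 || nj < 0 || ni >= n || nj >= m) continue;
                    double cand = Math.max(cur[0], dist(p[ni], q[nj]));
                    if (cand < best[ni][nj]) {
                        best[ni][nj] = cand;
                        queue.add(new double[] {cand, ni, nj});
                    }
                }
            }
        }
        return best[n - 1][m - 1];
    }

    private static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }
}
```

Two parallel horizontal polylines one unit apart get distance 1.0, and a polyline compared with itself gets 0.0.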

2IW02 Real-time Software Development

  • Learned: CSP (Communicating sequential processes)
  • Project: controlling a camera…
  • Skill acquired:
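  • Ps: Did the final assignment at UTwente, but didn't finish it, and didn't take the exam either. Failed.
    • A bizarre course: the tool was being developed while the students were using it
    • At Twente I even ran into the developer, a Chinese girl. Totally floored me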

5N520 Statistical Bioinformatics

  • Learned:
    • essential molecular biology
    • sequence alignment and dynamic programming
    • BLAST statistics and substitution matrices
    • multiple sequence alignment
    • Hidden Markov Models for sequence alignment
    • phylogenetic trees
    • sequencing and genome assembly
  • Project: No project
  • Skill acquired: Matlab Bioinformatics Toolbox
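Sequence alignment by dynamic programming, in its simplest form, is the Needleman-Wunsch global alignment score with a linear gap penalty. A minimal sketch; the flat match/mismatch/gap values are free parameters here, while real aligners use substitution matrices like the ones listed above.

```java
// Needleman-Wunsch global alignment score with a flat match/mismatch/gap scheme.
final class GlobalAlignment {
    static int score(String a, String b, int match, int mismatch, int gap) {
        int n = a.length(), m = b.length();
        int[][] f = new int[n + 1][m + 1]; // f[i][j]: best score of a[0..i) vs b[0..j)
        for (int i = 1; i <= n; i++) f[i][0] = i * gap;
        for (int j = 1; j <= m; j++) f[0][j] = j * gap;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = f[i - 1][j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                // align the two characters, or put a gap in one of the sequences
                f[i][j] = Math.max(diag, Math.max(f[i - 1][j] + gap, f[i][j - 1] + gap));
            }
        }
        return f[n][m];
    }
}
```

With match = 1, mismatch = -1 and gap = -2, `ACGT` against `AGT` scores 1: three matches and one gap.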

Database Technology Assignment

Cardinality Estimation using Characteristic Sets

– RDF formats, RDFS, TDB, Notation3, N-Triples.

– Jena API: opening the database with TDB and building a Model.

– SPARQL, Star-join.


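This project was harder than I expected. The Characteristic Set algorithm itself is simple, but it needs… multiple scans over the whole database.
At first I wanted to compute the plain sets without annotations and then compute the annotations separately. Of course that ended in failure; my data structures were designed really, really badly.
Later I refactored (this happens in every project; I really don't know what to do about it, I seriously need to reread my software engineering books), and it got somewhat better.
I fantasized about using a simple SPARQL query to group first and then scan everything, and of course that was a disaster too. Back then I didn't know that a ResultSet in ARQ is actually a stream; I naively thought the query would run all at once and keep the results in memory.
Apparently that's how database results usually work; in my undergrad the data was too small for me to notice.
Asking on Stack Overflow, I learned there is an aggregate called group_concat and used it right away: concatenate all the counts belonging to the same set, then split them apart afterwards. Well… here's what I came up with: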

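	// Characteristic sets in one (slow) SPARQL 1.1 query, run with Jena ARQ:
	//   innermost: number of triples per (subject, predicate)
	//   middle:    per subject, its predicate set and the matching counts
	//   outer:     per predicate set, the number of subjects plus all their counts
	// str() turns IRIs and integers into strings, which group_concat works on.
	// SPARQL string literals use single quotes so they need no Java escaping;
	// '\\t' in Java becomes '\t' in SPARQL, i.e. a tab separator.
	String queryString =
		  "SELECT (COUNT(?s) AS ?distinct) ?propset "
		+ "(group_concat(?count; separator = '\\t') AS ?counts) "
		+ "{ "
		+ "SELECT ?s "
		+ "(group_concat(str(?p); separator = ' ') AS ?propset) "
		+ "(group_concat(str(?c); separator = ' ') AS ?count) "
		+ "{ "
		+ "SELECT ?s ?p ?c "
		+ "WHERE "
		+ "{ "
		+ "SELECT ?s ?p (COUNT(*) AS ?c) "
		+ "WHERE { ?s ?p ?o . } "
		+ "GROUP BY ?s ?p "
		// meant to keep ?p and ?c in the same order within each subject
		+ "} ORDER BY ?s ?p "
		+ "} GROUP BY ?s ORDER BY ?s "
		+ "} GROUP BY ?propset ORDER BY ?propset";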
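The query's output still has to be split apart on the Java side. A minimal sketch of that step, assuming each result row has already been read (e.g. from the ARQ `ResultSet`) into two strings: `?propset`, with predicates separated by spaces, and `?counts`, with one tab-separated entry per subject, each holding that subject's per-predicate counts separated by spaces. It also assumes group_concat emits predicates and counts in the same order within a subject, which the SPARQL spec does not guarantee, so it's worth checking. The class name is made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Turns one result row (?propset, ?counts) into per-predicate occurrence totals.
final class RowSplitter {
    static Map<String, Long> occurrences(String propset, String counts) {
        String[] preds = propset.split(" ");
        Map<String, Long> totals = new LinkedHashMap<>();
        for (String p : preds) totals.put(p, 0L);
        for (String subject : counts.split("\t")) {   // one entry per subject
            String[] cs = subject.split(" ");          // one count per predicate
            if (cs.length != preds.length) {
                throw new IllegalArgumentException("counts do not line up with predicates: " + subject);
            }
            for (int i = 0; i < preds.length; i++) {
                totals.merge(preds[i], Long.parseLong(cs[i]), Long::sum);
            }
        }
        return totals;
    }
}
```

These totals are the per-predicate occurrence annotations of one characteristic set; `?distinct` already holds its subject count.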

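Of course it was incredibly inefficient: it seems to scan the whole database three times, not counting the cost of the GROUP BY and ORDER BY.
On YAGO it failed completely; no result after a whole night.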

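I emailed the author. He said it takes three steps… and in the end you still have to scan the whole database and do a GROUP BY. I really wanted to ask how he had the nerve to write "implemented it by only two group-by operators" in the paper.
Though I suspect he only replied in such detail because he couldn't stand the obvious sarcasm in my email = =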


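OK, now I know why it's slow: TDB doesn't load everything into memory at once. So naturally I wanted to load YAGO straight into memory; I have 8 GB anyway, what's a mere 2 GB of data?
And so disaster struck again. Jena couldn't parse YAGO... weren't these supposed to be N3 files? Why won't they parse?
Still naive, I found a few text tools to fix the unparseable spots, and of course that failed too. By the way, Notepad++ can't open a file this big; vim can open and edit it, but crashes as soon as you save. 010 Editor could handle it,
but I found lots of places that needed fixing, so I tried its search… and then that program crashed too.
Now this was weird: why won't it parse? Digging around, I found that YAGO is actually encoded in ASCII... and it has its own scheme for squeezing UTF-8 content in as well. I was completely stunned.
They don't provide a tool to convert the files either, only one that converts a String. So would I need a 2 GB String?
Finally... since Jena can read TDB, why not just convert the TDB store to N3?
Apparently still naive: I created a Model from TDB and called write, and after 20 minutes the file size was still 0. No idea what it was doing…
OK, off to bed; I'll check tomorrow morning. Hoping for a pleasant surprise.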

2011-2012 Semester 1 Courses

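In my first semester here I took 7 courses and earned 33 credits. I got a real taste of education abroad: it is genuinely tough, and the profs assume you already know a lot of things that in reality you have to learn on your own.
My skills at searching and reading documentation improved noticeably. I learned to write documents in LaTeX, a really handy tool, and started looking down on Word. I picked up a bit of Python and stopped resisting languages I hadn't learned before.
I tried to learn vim but eventually gave up with one excuse or another; I'll learn it later... first up would be keyboard-driven find and replace, and honestly I think Notepad++ does that better :)
That's about everything I can think of that I did in my first semester.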


2IL45 Advanced Algorithms

  • Learnt: randomized algorithms; approximation algorithms (PTAS); geometric algorithms
  • Project: for the load balancing problem, implemented greedy, modified greedy, a genetic algorithm, and a PTAS
    • Drew figures in Python using the Matplotlib library
  • Skills acquired: basic Python, LaTeX, and SVN. Thanks to the Russian guy for getting me to learn these three tools; they really matter
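The greedy algorithms for load balancing are easy to state: list scheduling puts each job on the currently least-loaded machine, which gives a (2 - 1/m)-approximation of the optimal makespan on m machines; sorting the jobs by decreasing size first (Longest Processing Time, LPT) improves the bound to 4/3 - 1/(3m). A minimal sketch, assuming "modified greedy" meant the sorted variant (the post doesn't say), at least one machine, and jobs given as integer processing times:

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Load balancing on identical machines: greedy list scheduling and its sorted (LPT) variant.
final class LoadBalancing {
    /** Assigns jobs in the given order, each to the least-loaded machine; returns the makespan. */
    static long greedyMakespan(long[] jobs, int machines) {
        PriorityQueue<Long> loads = new PriorityQueue<>(); // min-heap of machine loads
        for (int i = 0; i < machines; i++) loads.add(0L);
        for (long job : jobs) loads.add(loads.poll() + job);
        long makespan = 0;
        for (long l : loads) makespan = Math.max(makespan, l);
        return makespan;
    }

    /** Same greedy, but on jobs sorted by decreasing processing time. */
    static long lptMakespan(long[] jobs, int machines) {
        long[] sorted = jobs.clone();
        Arrays.sort(sorted);
        for (int i = 0, j = sorted.length - 1; i < j; i++, j--) { // reverse to descending order
            long t = sorted[i];
            sorted[i] = sorted[j];
            sorted[j] = t;
        }
        return greedyMakespan(sorted, machines);
    }
}
```

With jobs {1, 1, 2} on two machines, plain greedy ends with a makespan of 3, while LPT reaches the optimum of 2.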

2IS15 Generic Language Technology

  • Learnt: dynamic semantics, design of domain-specific language, and model transformation
  • Assignment(4): Defined coordinates and movements language using Xtext, translated coordinates to movements using ATL, and transformed the two languages to NXT using Xpand.
  • Skills acquired: ASF+SDF Meta-Environment (TU/e's own thing), Xtext, ATL, and Xpand

2ID55 Adaptive Systems

  • Learnt: user modelling, and part of machine learning
  • Project: designed an HTML tutor system using GALE
  • Skills acquired: GALE (another TU/e in-house tool; in my view not very useful, and outdated)

2IW26 System Validation

  • Learnt: transition language and verifying a system
  • Project: designed a wafer controller and verified it with mCRL2: identified global requirements for the whole system and its interactions;
    translated the global requirements into the transition language; used mCRL2 to verify them.
  • Skills acquired: mCRL2

2ID25 Information Retrieval & 2II35 Web Information Systems

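I'm putting these two together because they shared one project, both were taught by the same teacher, and I basically stopped attending later on.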

  • Learnt: an introduction to information retrieval; WIS also covered a bunch of topics, but all only at an introductory level
  • Project: designed a meta search engine that takes the user's queries and forwards them to different search engines, namely Google and Bing. Stored and indexed the results using Lucene, and ranked them based on the user's preferences.
      In this project I did the Lucene part.

    • parsing documents, tokenizing and writing, indexing and searching
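    • ranking documents by customizing the Similarity class in Lucene; boosting them using the Field and Document classes. Learned Lucene's scoring formula and Google's PageRank
  • Skills acquired: Lucene 3.5. For my undergraduate thesis I used the Stanford Parser and learned to call open-source Java APIs; after using Lucene I felt I had gone up another level :) Later (now), reading the Jena code, I feel Apache projects all have their own self-contained style, and their documentation is very complete and pleasant to use.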
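Google's PageRank, mentioned above, can be computed by power iteration: each page repeatedly passes its rank to the pages it links to, damped by a factor d, plus a uniform "teleport" share. A minimal sketch independent of Lucene (Lucene's Similarity scores term matches, not links); the adjacency-list input, the damping value and the fixed iteration count are my assumptions, and pages without outgoing links spread their rank evenly over all pages.

```java
import java.util.Arrays;

// PageRank by power iteration with damping factor d.
final class PageRank {
    /** links[i] holds the indices of the pages that page i links to. */
    static double[] rank(int[][] links, double d, int iterations) {
        int n = links.length;
        double[] r = new double[n];
        Arrays.fill(r, 1.0 / n); // start from the uniform distribution
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0.0; // rank held by pages with no outgoing links
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    dangling += r[i];
                } else {
                    for (int j : links[i]) next[j] += r[i] / links[i].length;
                }
            }
            for (int i = 0; i < n; i++) {
                next[i] = (1 - d) / n + d * (next[i] + dangling / n);
            }
            r = next; // ranks still sum to 1
        }
        return r;
    }
}
```

d = 0.85 is the commonly quoted damping value; on a three-page cycle every page stays at 1/3.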

2IT17 Automata and Process

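  • An undergraduate course, taken now for homologation; state machines and Turing machines. Not too hard: I crammed right before the exam, got an 8, and passed, overjoyed.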

 

Advanced Algorithms, Python 0

With the System Validation exam done and the GLT assignment submitted on Monday, it's Advanced Algorithms' (AA) turn.

 

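The project for this course is to implement several Job Shop algorithms and compare them. My teammates are a Russian geek and a Dutch geek. At our first discussion we settled on the language: they both thought Python was the better choice, and I kept quiet because I don't know it. At the second mini-meeting we divided the tasks: each of them picked the algorithms they wanted to write and will have them (partly) implemented in two weeks, and my task is... to learn Python and implement the two pathetically weak greedy algorithms the teacher gave us.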

 

So I have to get this done by next Wednesday at the latest; can't embarrass myself too badly.

 

Learning Python.

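First of all, the PDF big sister Yang gave me, A Byte of Python, was really useful: after one evening of reading I had a basic feel for the language and can at least read it.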

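First impression: Python is a rogue language. Variables don't need declared types and there are no curly braces; missing these (fine) features makes me very uncomfortable.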

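Beyond that, I didn't really learn much today. To sum up: languages really do have a lot in common. Rogue as it is, Python still can't get away from the three control structures, classes, inheritance and the like.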

 

Open questions (to do):

I still haven't figured out the concept of self; hopefully it'll become clear once I start writing code.

Also, I need to look at YAML; it seems to be a data format, similar in purpose to XML.

And find a decent algorithm.