Academic Positions

  • 2013 - Present

    Assistant Professor

    Computer Sciences Department, COMSATS Institute of Information Technology, Lahore, Pakistan.

  • 2011 - 2013

    Lecturer

    Computer Sciences Department, COMSATS Institute of Information Technology, Lahore, Pakistan.

  • 2010 - 2011

    Lecturer

    Computer Sciences Department, University of Lahore, Lahore, Pakistan.

Education

  • Ph.D. In Progress

    Doctor of Philosophy in Computer Science

    Lancaster University, Lancaster, UK.

  • M.Sc 2009

    Master of Science in Computer Science

    Swinburne University of Technology, Melbourne, Australia.

  • B.C.S 2006

    Bachelor of Science in Computer Science

    University of Peshawar, Pakistan.

  • H.S.S.C 2001

    F.Sc (Pre-Engineering)

    P.E.F Model Degree College for Boys, Peshawar, Pakistan.

  • S.S.C 1999

    Matric (Science)

    F.G. Boys High School for Boys, Peshawar, Pakistan.

Honors, Awards and Grants

  • RPA 2013
    CIIT Research Productivity Award

    Since its foundation, one of the chief aims of CIIT has been to promote quality research. It does so by engaging its faculty, students and researchers to challenge existing ideas and by providing a research-friendly environment. To encourage its faculty and promote quality research, CIIT presents the Research Productivity Awards annually. The CIIT RPA recognises faculty, staff and students for research papers published in a calendar year in impact-factor and ISI-indexed journals; the researchers are awarded a certificate and a cash prize.

  • ACS 2009
    Australian Computer Society Member

    The Australian Computer Society is the professional association for Australia’s Information and Communication Technology (ICT) sector. ACS is about recognising professionalism, developing ICT skills and building a community with a true sense of belonging. It helps members realise their professional ambitions in the global economy, making the most of an era of extraordinary possibility.

Great Personnel

Dr. Paul Clough

Research Mentor


Dr. Mark Stevenson

Research Mentor


Dr. Rao Muhammad Adeel Nawab

PhD Supervisor


Dr. Paul Rayson

PhD Supervisor


Dr. Alberto Barrón-Cedeño

Research Mentor


Jawad Shafi Mian

Postdoctoral fellow


Touseef Tahir

Postdoctoral fellow


The list above shows my supervisors, research mentors in the field of NLP, and a few colleagues who inspire and motivate me to work hard every day.

Research Projects

  • COUNTER

    Corpus Of Urdu News TExt Reuse

    COUNTER - Corpus Of Urdu News TExt Reuse is an Urdu text reuse corpus developed at CIIT Lahore in partnership with Lancaster University. The corpus is released with the intention of fostering research in monolingual text reuse detection systems, specifically for the Urdu language. The corpus has 600 source and 600 derived (suspicious) documents. In total it contains 275,387 words (tokens), 21,426 unique words and 10,841 sentences. It has been manually annotated at document level with three levels of reuse: wholly derived (135), partially derived (288) and non-derived (177).

    Click here for details

  • TRUE

    Text Reuse Urdu English

    TRUE - Text Reuse Urdu English is a research project between NLPT at CIIT Lahore, Pakistan and UCREL at Lancaster University, Lancaster, UK. It aims to develop cross-script, cross-language corpora and methods to detect text reuse at both document and sentence level. An initial corpus containing 2,500 source-derived document pairs is under development. The source and derived documents are from the field of journalism and contain real examples of text reuse.

    Click here for details

  • UPlag

    Urdu Plagiarism

    UPlag is a project that aims to contribute a benchmark Urdu plagiarism corpus with simulated as well as artificial examples of plagiarism. The project has a secondary focus on developing (or modifying) state-of-the-art techniques for Urdu plagiarism detection.

  • UPPC

    Urdu Paraphrase Plagiarism Corpus

    UPPC is a corpus that contains 160 documents (20 source documents and 140 suspicious ones). The source documents are original Wikipedia articles on 20 personalities, while the suspicious documents are either manually paraphrased versions produced by applying different rewriting techniques or independently written (non-plagiarised) documents. The resource is the first of its kind developed for the Urdu language and we believe that it will be a valuable contribution to the evaluation of paraphrase plagiarism detection systems. The corpus can be used for: (1) the development, analysis and evaluation of automated paraphrase plagiarism detection systems for the Urdu language, (2) identifying which types of obfuscation (paraphrase strategies) are easy or difficult to detect and (3) the Urdu paraphrase identification task.


AAA

Muhammad Sharjeel, Paul Rayson, Rao Muhammad Adeel Nawab
Journal Paper Submitted

Abstract

COUNTER - COrpus of Urdu News TExt Reuse

Muhammad Sharjeel, Rao Muhammad Adeel Nawab, Paul Rayson
Journal Paper Submitted to Language Resources and Evaluation (LRE) IF: 0.9 (waiting for reviews)

Abstract

Text reuse is the process of creating new texts using existing ones. Freely available and easily accessible large online repositories are not only making reuse of text more common in society but also harder to detect programmatically. A major hindrance in the development and evaluation of existing monolingual text reuse detection methods, especially for South Asian languages, is the unavailability of standardized benchmark corpora. Amongst other things, a gold standard corpus enables researchers to directly compare with existing state-of-the-art methods. In our study, we address this gap by developing a benchmark corpus for one of the widely spoken but under-resourced languages, i.e., Urdu. The COUNTER corpus contains 1,200 documents with real examples of text reuse from the field of journalism. It has been manually annotated at document level with three levels of reuse: wholly derived, partially derived and non-derived. In this paper, we also apply two simple similarity estimation methods (n-gram overlap and longest common subsequence) to our corpus to show how it can be used in the evaluation of text reuse detection systems. The corpus is a vital resource for the development and evaluation of text reuse detection systems in general and specifically for the Urdu language.
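
For readers unfamiliar with the two similarity measures named in the abstract, here is a minimal Python sketch of how they might be computed for one source-derived pair. It is an illustration only, not the implementation used in the paper: the whitespace tokenisation, the containment-style normalisation by the derived document's length, the function names and the English example sentences are all assumptions made for this sketch.

```python
from typing import List, Set, Tuple


def word_ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Return the set of word n-grams found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_containment(source: List[str], derived: List[str], n: int = 1) -> float:
    """Fraction of the derived text's n-grams that also occur in the source.

    Assumption: normalising by the derived text is one common choice;
    other overlap coefficients (Jaccard, Dice) would work similarly.
    """
    derived_grams = word_ngrams(derived, n)
    if not derived_grams:
        return 0.0
    return len(derived_grams & word_ngrams(source, n)) / len(derived_grams)


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(source: List[str], derived: List[str]) -> float:
    """LCS length normalised by the length of the derived text (an assumption)."""
    if not derived:
        return 0.0
    return lcs_length(source, derived) / len(derived)


# Hypothetical example pair (English stand-in for Urdu news text).
source = "the minister announced a new education policy on monday".split()
derived = "on monday the minister unveiled a new policy".split()

print("unigram containment:", ngram_containment(source, derived, n=1))
print("bigram containment: ", ngram_containment(source, derived, n=2))
print("LCS similarity:     ", lcs_similarity(source, derived))
```

Higher scores suggest more reuse. In an evaluation setting, scores like these could be thresholded or fed to a classifier to predict the wholly derived, partially derived and non-derived labels, although the exact setup would depend on the experiment.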

UPPC - Urdu Paraphrase Plagiarism Corpus

Muhammad Sharjeel, Paul Rayson, Rao Muhammad Adeel Nawab
Conference Paper, Language Resources and Evaluation Conference (LREC) 2016

Abstract

Paraphrase plagiarism is a significant and widespread problem and research shows that it is hard to detect. Several methods and automatic systems have been proposed to deal with it. However, evaluation and comparison of such solutions is not possible because of the unavailability of benchmark corpora with manual examples of paraphrase plagiarism. To deal with this issue, we present the novel development of a paraphrase plagiarism corpus containing simulated (manually created) examples in the Urdu language - a language widely spoken around the world. This resource is the first of its kind developed for the Urdu language and we believe that it will be a valuable contribution to the evaluation of paraphrase plagiarism detection systems.

Current Teaching

  • Fall 2015

    CSC101 - Introduction to Computing

Teaching History

COMSATS Institute of Information Technology, Lahore

  • Fall 2014

    CSC101 - Introduction to Computing

    CSC332 - Network Security

  • Spring 2014

    CSC101 - Introduction to Computing

    CSC332 - Network Security

  • Fall 2013

    CSC101 - Introduction to Computing

    CSC344 - Wireless and Mobile Computing

  • Spring 2013

    CSC101 - Introduction to Computing

    CSC344 - Wireless and Mobile Computing

  • Fall 2012

    CSC401 - Computing for Management

    CSC344 - Wireless and Mobile Computing

  • Spring 2012

    CSC401 - Computing for Management

    CSC141 - Introduction to Computer Programming

  • Fall 2011

    CSC401 - Computing for Management

    CSC101 - Introduction to Computing

  • Spring 2011

    CSC101 - Introduction to Computing

    CSC112 - Algorithms and Data Structures


University of Lahore

  • Fall 2010

    CSC1012 - Programming Fundamentals

    CSC3535 - Computer Networks

    ECE3323 - Data Communications

    CS522 - Network Security and Cryptography

  • Winter 2010

    CSC1011 - Introduction to Computing

    CSC3535 - Computer Networks

    ECE3323 - Data Communications

    CS521 - Advanced Computer Networks

At Office (Lahore, Pakistan)

You can find me at my office in the Mathematics Department, cabin # 1 (right, upstairs).

I am at my office on working days from 8:30 am until 6:30 pm (apart from my scheduled lecture slots), but please consider calling or sending an email (preferred) to fix an appointment.

At Lab (Lancaster, United Kingdom)

You can find me at InfoLab21, Room # C30.

I am there on weekdays from 9:00 am until 8:00 pm.