E-government research insights: Text mining analysis


Journal Pre-proofs

E-Government Research Insights: Text Mining Analysis

Emad Abu-Shanab, Yousra Harb

PII: S1567-4223(19)30069-9
DOI: https://doi.org/10.1016/j.elerap.2019.100892
Reference: ELERAP 100892

To appear in: Electronic Commerce Research and Applications

Received Date: 20 November 2018
Revised Date: 15 September 2019
Accepted Date: 16 September 2019

Please cite this article as: E. Abu-Shanab, Y. Harb, E-Government Research Insights: Text Mining Analysis, Electronic Commerce Research and Applications (2019), doi: https://doi.org/10.1016/j.elerap.2019.100892

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Elsevier B.V. All rights reserved.

E-Government Research Insights: Text Mining Analysis Emad Abu-Shanab* Accounting & Information Systems Dept., CBE, Qatar University, Qatar, [email protected] *Corresponding author

Yousra Harb Management Information Systems Dept., IT college, Yarmouk University, Jordan [email protected]

Abstract

Research on e-government is attracting more attention from scholars and growing exponentially. A wide spectrum of studies has explored a variety of topics to lay the foundations of this research area. The aim of this study was to extend our knowledge of e-government and investigate dominant and future research directions. To that end, this paper attempted to explore published research in the e-government discipline using the reported keywords for a large selection of journals. This study provided a text mining-based review of 2,018 articles collected from 11 journals using 12,692 keywords. The results indicated that new topics like open government, smart cities, and analytics have recently been attracting more research. Researchers continue to pursue areas like governance, e-government adoption, e-participation, e-democracy, administration and procurement topics. Finally, researchers’ quest for a theory or framework that guides the study of e-government has faded. More details are reported in this study.

Keywords: E-government, Research Agenda, Research directions, Typology, Keywords Analysis, Text Mining

1. INTRODUCTION

The topic of e-government is becoming more important with the increased use of information and communication technology (ICT) and its applications in public administration and the political arena. Governments are keen on reaching their citizens and opening channels with businesses to improve their operations and better serve their societies. As a consequence, research related to e-government is growing exponentially. Researchers are exploring a variety of topics to extend our knowledge in this domain and build a theoretical foundation for this area.

E-government is defined in its simplest form as the use of ICT tools and applications to provide better services to citizens and businesses (Yildiz, 2007; Abu-Shanab and Al-Azzam, 2012). Many definitions indicate the importance of sharing information and collaboration between the government and its external and internal partners (Bhatnagar, 2004). Basu (2004), in a review of a range of e-government definitions, asserted that e-government is the utilization of technology to promote governance and enhance the accessibility, delivery, and transparency of government services. Another source defined e-government as a means of improving the transparency of government practices and the vital participation of citizens through the proposition of connected governance (UNDESA, 2008). Still, the e-government concept is expanding to more diverse concepts and applications.

The major directions in e-government are related to providing e-services to citizens and improving existing services by utilizing the capabilities of ICT and the Internet (Alshehri et al., 2012; AlNaimat et al., 2011). Governments are encouraged to follow the steps of the private sector in employing information systems and technologies (IS/IT) to improve their operations and performance (Cook et al., 2002; Bhatnagar, 2004). The movement of e-government towards reaching and including citizens in the public and political administration process falls under many terms and initiative titles such as e-participation and e-inclusion. E-participation includes many initiatives such as informing citizens, involving them, collaborating with them, engaging them in the governance process, and empowering them (Tambouris et al., 2007; Gatautis, 2008; Al-Dalou’ and Abu-Shanab, 2013).
To make such initiatives/processes successful, research on the digital divide started to raise issues of efficiency of communication and collaboration as well as accessibility (Bansode and Patil, 2011; Helbig et al., 2009). E-government projects can be considered a tool for bridging the digital divide (Abu-Shanab and Al-Jamal, 2015), while, conversely, the wide spread of ICT infrastructure and knowledge within societies can also improve the chances of e-government success. The amount of e-government research is expanding with the increase in the number of journals, conferences and magazines dedicated to e-government. E-government is a cross-disciplinary topic that intersects with social sciences, public administration, computer science and political science. Previously, several studies reviewed the e-government discipline and tried to achieve a foundational understanding of this domain. The focus of these studies covers a variety of topics ranging from identifying the main research streams on e-government, popular research methods, and the most cited papers to a mixture of these issues. The major research topics identified in the literature revolved around testing e-government websites’ usability, exploring citizen adoption, building secure systems, or investigating the interactions of political and administrative sciences with ICT and the Internet. Examples of the different research methods applied include empirical research, website testing, field research, archival research, and experimental studies. The objective of this study is to understand the prevailing directions of e-government research by analyzing keywords included in e-government publications and a group of journals. Our analysis focused on keywords and themes associated with papers in the e-government discipline. To the best of our knowledge, this study is the first documented attempt to analyze selected e-government publications using the text mining approach. 
The text mining method estimates its results based on the actual existence of a research term, which can be stronger than other methods that base their conclusions on simple contextual data or the authors’ opinions. Text mining also accounts for the dominance of terms/trends in data and is suitable for use with a large number of keywords. These features make the method more valid and generalizable than other methods.

This study contributes to research and practice by identifying the main research themes or trends in e-government and shedding light on past and prospective research priorities. The results of this study reveal the research interests and character of the e-government discipline. The following section introduces e-government research directions and the research methods used in the literature. The third section describes the research method and questions of this paper. Section four reports the analyses and discusses the results. Finally, conclusions, contributions, and limitations are reported in the fifth section.

2. E-GOVERNMENT RESEARCH

The following two sections review the e-government research dimensions and the methods utilized in the reported research. It is essential to explore such areas to see how text mining can contribute to new, valid directions not reported in existing research.

2.1 E-government research dimensions

When reviewing e-government definitions, we uncovered two major aspects. The first is the disagreement over the definition of the concept of e-government (Yildiz, 2007; Halchin, 2004), which included issues and concepts imported from diverse domains. The second aspect is the fragmentation of the discipline, which has not formulated a major theory or typology to organize research efforts. Such fragmentation is typical given the differing research resources and objectives, especially when the supporting domains are stretched between computer sciences, social sciences and political sciences (Abu-Shanab, 2013). Yildiz (2007) highlighted how marketing language and research directions shape the area of e-government, especially when such research is supported by governments. This emphasis caused a biased perspective. Yildiz commented that the type of research dominating this area is descriptive research focusing on criticizing some government practices and websites. We can conclude that focusing on the definition of e-government was one of the major research directions in the literature.

Another area of research in e-government is related to the need to build frameworks that integrate the views from multiple disciplines (Hardy and Williams, 2011). The authors based their findings on a review of e-procurement research. They concluded that multiple frames of reference are required and that research findings need to be understandable, usable, accessible, and time-based. Moreover, the previous discussion demonstrates the expansion of the discipline of e-government and the vital role this discipline plays in influencing society as a whole.
E-government research is defined as research that combines a focus on government practices and information technology (Fountain, 2003). Other researchers define e-government research as activities that support government opportunities to build a strategic vision with respect to information technology utilization. It is important to focus research on e-government as it carries the governance perspective and government responsibilities towards society, businesses, and individuals.

A third area of research focused on e-government projects, reporting issues that portrayed the area as challenging because of the wide range of the e-government perspective with respect to size and scope (Abu-Shanab and Bataineh, 2014). A simple public website that displays information is considered an e-government initiative, as is a complicated interactive website that offers many diverse services. Such a wide scope requires focusing. Imitating the business discipline, we can borrow the small and medium-size business classification to group e-government projects into categories with common characteristics such as size, focus, number of services, and area of coverage.

Along the same line, Rodríguez et al. (2010) asserted the need for a framework to improve the implementation process of e-government projects. The authors reviewed 321 articles published between 2000 and 2009 by the Institute for Scientific Information and concluded that a set of knowledge gaps prevail in the domain. They focused on titles, keywords, and abstracts to categorize the prevailing themes into two major areas: public administration and information systems. The authors then reported research opportunities in the reviewed articles that would require changes in research methods. Such a trend supports the development of a theoretical framework that would in the future contribute to improving public sector management. This result aligns with the findings of Bryson et al. (2010) mentioned previously. Garson (1999) suggested four major dimensions of e-government research: information technology capabilities in supporting democracy, normative versus dystopian directions of research, the interaction of social systems with information technology and the institutional environment, and, finally, integration of e-government within the global environment. In a study by Heeks and Bailur (2007), the authors analyzed the content and research methods in 84 articles published in two e-government journals and given at a conference. They proposed to organize the research according to the department of the researcher, source of the literature, research method, and knowledge framework implemented. The last two criteria will be investigated more in the next section. Results based on literature source showed that e-government was the source for 33 papers, and information systems the source for 19 papers. The academic departments where the researchers worked were business administration (11 papers), public administration (10 papers), and political science and computer science (8 papers each) (Heeks and Bailur, 2007). 
The study might be limited by the selection of the two journals and conference. The results might be totally different when using other sources, but we can still confirm our view regarding the diversity of disciplines and research sources and the fragmentation of e-government research among many research outlets and academic departments.

Madsen et al. (2014) conducted a study focused on the most-cited papers in the e-government discipline in each year from 2001 to 2010. The authors applied some restrictions (e.g., that the articles used needed to have “e-government” in the title) and listed certain structural aspects of the paper (e-government purpose, definition, research method, scope, impact and other aspects). The authors found that some initial attempts to build a research philosophy were reported, but most papers were still positivistic. Extending from Heeks and Bailur’s conclusions, the authors also concluded that more research on social constructivism and levels of services provided to citizens and more use of primary data are needed. The authors (Madsen et al., 2014) utilized template analysis, where text phrases are analyzed through multiple iterations of coding to reach their conclusions.

In a workshop organized in the USA, more than 30 researchers asserted that e-government research is primarily technical and focuses on utilizing ICT capabilities to reach government objectives. This argument supports our previous observation regarding the focus of e-government research on website analysis. Other researchers asserted that the technical emphasis of e-government research results from the importance of this factor for the success of e-government projects (Abdelghaffar et al., 2005). With the advent of social media, more e-government research is expected to be directed towards social, institutional and organizational aspects.
Wimmer (2007) conducted a set of workshops and reflected on 141 responses from participants to propose a scenario for future development of e-government research. Thirteen research themes emerged, among them data privacy, trust in e-government, citizen engagement, governance issues, assessing e-government value, and its direction and performance. Another study that utilized 110 peer-reviewed articles in the field of e-government concluded that 34% of papers focused on issues related to e-service, 25% on e-democracy, 24% on conceptualization, and 18% on technology adoption (Andersen and Henriksen, 2005).

A study by Yildiz (2012) proposed a set of questions to help direct e-government research and explore its dimensions. These questions involve how the research relates to public administration, how to make such research more multidisciplinary and comparative, how to evaluate the performance of e-government projects, and how to generate more usable models and theories. The author utilized a study of major keywords and proposed these issues as part of what he labeled “big questions”. The author concluded by emphasizing the vital role of e-government research in improving government performance and enhancing democracy.

Belanger and Carter (2012) investigated e-government and IS research and made recommendations for future research that revolved around five dimensions:

1- Publication outlets: Focus more on interactions with stakeholders and participatory governance, publish practical aspects, and try to reach non-English-language research.
2- Theoretical foundations and contributions: Create a theory of e-government, conduct a meta-analysis of e-government qualitative research, focus on design and action efforts, and extend the theory base of e-government.
3- Methodological approach: Continue qualitative and quantitative research especially at the government level, carry out more longitudinal research, and investigate more unit-of-analysis types.
4- Sampling: Try to diversify samples, validate the sample choice, and sample at different levels of government.
5- Topic areas: Try to broaden the research questions explored in e-government research and focus on new dependent variables.

Meta-analyses of research focused on one dimension of e-government are not as popular as those on e-government in general.
A study by Medaglia (2012) analyzed 122 articles published from 2006 to 2011 that included keywords related to e-participation and its activities, participant effects, evaluation and methods. The results highlighted the dynamism of the domain and a shift in a more counterintuitive direction. The author recommended moving away from technological research and focusing on other categories of stakeholders, emphasizing the need to make this research direction more participatory in nature. Another study utilized a multimethod approach in which 24 e-government experts took part in a Delphi study and results were reviewed from a meta-analysis of literature collected from 12 journal publications from 2000 to 2009 (Niehaves, 2011). The author’s conclusions regarding the aging phenomenon of societies highlighted 6 action fields and 12 measures for future research. The six action fields are the following: new perspectives on aging, managing access, age-aware e-government service design, ubiquitous e-government for the elderly, the aging workforce in government organizations, and changing the way we do research.

In this study, we aim to extend the previous work identifying the main research topics in the e-government discipline. To achieve this objective, we focus on analyzing the keywords, concepts, and themes associated with the articles published over the past 15 years. Bentley (2008) argued that keyword frequencies can be used to track scientific evolution. Similarly, Wang et al. (2012) analyzed the words in paper titles, keywords, and word clusters in the academic field of anaerobic digestion for methane. This study focuses on the e-government field and utilizes a text mining approach to perform frequency and clustering analysis of concepts.

2.2 Research methods used in e-government

Based on the diversity and multidisciplinary nature of the concept, the e-government domain is a rich environment for the use of many types of research methods. The same studies that evaluated e-government research noted the different types of research methods used and reported their results. One study found that the bulk of e-government research is either conceptual or case studies (Dwivedi, 2009). The previously mentioned work by Heeks and Bailur (2007) indicated that the research methods employed in their sample articles were the following: hunt and pick (19 papers), questionnaires (15 papers), interviews (14 papers), and document analysis (14 papers), with 20 papers that did not indicate any method. Also, results indicated that 29 papers were model based, 22 papers category based, 10 papers framework based, and 10 papers non-framework based.

Yildiz and Saylam (2013) conducted an analysis of media articles (newspapers in Turkey) and concluded that research should be directed more towards discourse analysis and media influence on the e-government domain. They recommended that further research be conducted to compare countries and the influence of different discourses on public stakeholders and, finally, explore the link between vendors and the results of the research method applied. Along the same line, in the e-participation study, Medaglia (2012) investigated the methods used, which were the following: case studies, surveys, content and discourse analysis, action research, experiments, and focus groups.

Recommendations for certain methods in e-government research were proposed by many researchers based on their belief that the e-government domain is fragmented and needs a framework to govern its research. Joseph (2010) recommended using mixed methods (quantitative and qualitative) for reasons such as their flexibility to accommodate the richness of the domain, diverse theory base, and research streams. Similarly, Rodríguez et al. (2010) found that the existing status of e-government research displays a greater application of quantitative methods, which signals a need for more qualitative research.
In summary, based on the previous work reported, we can see that research in the area of e-government has not delivered a theory that sums up and unifies the domain. The previous studies exploring research directions also displayed diverse methodologies and a wide-ranging topic selection. These studies also indicate that no single study has extensively explored the research studies and summarized this area. The current study is thus one of the first documented attempts to explore the domain based on text analysis using keywords from e-government research. The following section is a description of the research methodology and data analysis.

3. RESEARCH METHODOLOGY

This study pursued three research questions that will contribute to enhancing our knowledge of the e-government discipline and building its research community. The following are the research questions:

- RQ1: What are the key e-government research topics witnessed in the dataset?
- RQ2: What are the e-government research changes witnessed over the sample period (from 2002 to 2016)?
- RQ3: What main research topics are likely to flourish in future periods?

The research methodology utilized a large set of research keywords associated with articles related to the e-government topic. The data used included the year of publication of the article and the set of keywords posted with the abstract. The list of journals adopted in this study included the most popular in the field according to the research team’s judgement and as ranked by indexing websites. The major criterion for selecting the journals was the relation of the e-government topic to ICT or the Internet. Based on this criterion, political science and public administration journals were excluded. Such a journal sample was deemed appropriate for the purpose of this type of study.

Based on the previous step, we utilized the research published in the journals listed in Table 1 during the years shown for each journal. The data ranged from 2002 to 2016 (15 years). The study included 11 journals, 2,018 articles, and 12,692 keywords in total. The collected data also included the research title, authors, publication year, journal name, and volume and issue number. The research team collected the data from the journals’ websites, where the keywords were posted under the article abstracts. The collection was done manually and did not utilize any software (engine) or application. A few articles (such as editorial introductions and book reviews) included no keywords. Finally, this study adopted the keywords as published by the journals, separated by a comma or semicolon. For the analysis, the research data were entered into an Excel sheet, a format that can be read from the Python language.
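The keyword handling described above can be illustrated with a minimal sketch. It assumes each article's keywords arrive as one string delimited by commas or semicolons; the function name `split_keywords` and the sample records are hypothetical and stand in for rows of the spreadsheet, not for the authors' actual data or code.

```python
import re
from collections import Counter


def split_keywords(raw):
    """Split one article's keyword string on commas or semicolons."""
    return [part.strip() for part in re.split(r"[;,]", raw) if part.strip()]


# Hypothetical records standing in for rows of the spreadsheet.
records = [
    {"year": 2015, "keywords": "E-government; Open data, Transparency"},
    {"year": 2016, "keywords": "Open data; Smart cities;"},
]

# Count how often each keyword occurs across all articles.
frequency = Counter(
    keyword
    for record in records
    for keyword in split_keywords(record["keywords"])
)
print(frequency.most_common(2))
```

Empty fragments (for example, after a trailing semicolon) are dropped so that they do not appear as keywords in the counts.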

4. DATA ANALYSIS AND DISCUSSION

The following two sections describe the two directions of analysis we followed for answering the research questions. The first is to utilize descriptive analysis and relate that to the literature review. Such an analysis was deemed necessary to better understand the domain. The second direction is text mining techniques.

4.1 Descriptive analysis

The research data entered into Excel were used to conduct a cluster and frequency analysis. First, we estimated the distribution of articles per year, with the resulting data shown in Figure 1. Research in e-government boomed in the last few years, which seems natural based on the popularity of the topic and technology advancement. It is also reasonable to state that a few of the selected set of journals were founded between 2007 and 2009, which caused the substantial increase in keywords (see Figure 1 on the last two observations).

Table 1: Journal list and summary of articles and keywords

Code  Journal Name                                                     Years                   Total Articles  Total keywords
1     Government Information Quarterly                                 2008-2016               511             2784
2     Electronic Journal of e-Government                               2012-2014               44              236
3     Electronic Commerce Research and Applications                    2002-2005 + 2012-2016
4     Electronic Government: An International Journal                  2008-2012
5     Electronic Journal of e-Government                               2002-2011 + 2015-2016
6     Information Polity                                               2002-2016
7     International Journal of Electronic Governance                   2007-2016
8     International Journal of eBusiness and eGovernment Studies       2009-2011 + 2015-2016
9     International Journal of Electronic Government Research          2010-2016               155             910
10    Transforming Government: People, Process and Policy              2007-2016               237             1279
11    Journal of e-Government Studies and Best Practices               2015-2016               6               36
      Total                                                            2002-2016               2018            12692

Values for journals 3 to 8, in the order they appear in the source (row assignment unclear): Total Articles 55, 294, 154, 184, 199, 179; Total keywords 1626, 1680, 987, 1043, 1886, 225.

Figure 1: Publication distribution per year
(Articles per year: 2002: 8; 2003: 28; 2004: 52; 2005: 51; 2006: 22; 2007: 63; 2008: 70; 2009: 182; 2010: 174; 2011: 199; 2012: 246; 2013: 219; 2014: 236; 2015: 203; 2016: 265)

A second step in the analysis was to generate a tentative keyword distribution, where the frequency of popular keywords was estimated. The results are shown in Tables 2 and 3. Naturally, e-government appears as the top keyword, with the highest frequency among all terms. Such a result is logical based on our selection of journals. Still, Table 2 provides insights into other terms that saturate the domain. Terms such as open government started to appear in the literature more often and attract more research in the last few years. Also, we can see that terms such as e-participation, e-democracy, e-governance, transparency, and digital divide draw more attention.

Table 2: Frequency of sample keywords related to e-government topic

Keyword                 Freq.   Keyword                    Freq.   Keyword                Freq.
E-government            557     eGovernment                38      WeGov/We-government    4
Electronic Government   163     Democracy                  35      E-gov                  4
E-participation         84      Citizens participation     33      eGov                   4
Governance              76      Participation              30      Connected government   3
E-democracy             74      Digital government         29      Gov 2.0                3
Government              69      E-voting                   24
E-governance            68      Government 2.0             23
Transparency            56      Electronic participation   21
Open government         46      Electronic democracy       21
Digital divide          43      Accountability             21

Table 3: Frequency of sample technology and management related keywords

Keyword                 Freq.   Keyword     Freq.
Social media            93      Security    32
ICT                     52      Trust       31
Open data               36      Privacy     29
Public administration   42      e-service   25
Adoption                34

In Table 3 we focused more on technology-related terms and other administrative and managerial terms. It is important to see how other related domains interact with e-government research. The latest trends and development of social media made that term popular in research. However, it might not be accurate to say that all such instances are jointly researched under e-government research, as we see the Electronic Commerce Research and Applications journal listed among the titles we selected for this study. Another journal that might publish articles outside the e-government focus is the International Journal of eBusiness and eGovernment Studies. Such instances may influence the data, but not to such a degree as to influence our conclusion; social media is indeed gaining popularity among researchers. Other terms that attracted more research are open data, adoption, trust, security, privacy and e-service. ICT is a term used frequently by researchers to introduce the e-government umbrella, where it covers communication (Internet and web areas) and computing (computing and systems). Such terms are used by researchers to cover the infrastructure supporting e-government (regardless of its status: offline or online).

4.2 Text mining analysis (clustering)

In this research, we applied a text mining model (document clustering) to answer the research questions. In particular, we used the k-means clustering algorithm, through which keywords are classified into groups called clusters. K-means is one of the most widely used clustering algorithms due to its simplicity, efficiency, and empirical success (Jain, 2010). In this unsupervised process, there are no predefined classes or labels. The clusters are formed based on keyword similarity. Clustering algorithms provide corpus summarization in the form of word clusters, which can be used to offer insights into the content of the underlying corpus (Aggarwal and Zhai, 2012; Bekkerman et al., 2001).
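As an illustration of how k-means groups items without predefined labels, the following is a minimal pure-Python sketch. It is not the authors' implementation (the study reports using scikit-learn); the function names, the seed, and the toy two-dimensional points are assumptions made for the example. The loop alternates between assigning each point to its nearest center and recomputing each center as the mean of its members, stopping when assignments no longer change.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Pick k data points at random as the initial cluster centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = mean(members)
    return labels, centers


# Toy data: two well-separated groups of points (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_means(points, k=2)
print(labels)
```

In a text setting, the tuples would be replaced by document vectors such as TF-IDF weights, and the number of clusters k has to be chosen by the analyst.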
In a k-means clustering algorithm, “k” represents the predetermined number of clusters. The algorithm generates k random points as initial cluster centers. Then, it assigns each point to the closest cluster center. The cluster centers are then recomputed, and the assignment and update steps are repeated until some convergence criterion is met, usually when no more changes occur (Jain, 2010).

There are several tools that can be used to conduct text mining analysis. For this paper, the text mining technique was performed utilizing Python, since it is a popular and widely used open source language for text processing and provides high flexibility through the use of its packages (Bird et al., 2009). In particular, the following three Python packages were used to perform k-means clustering: pandas, NumPy, and scikit-learn. Also, we used NLTK (the Natural Language Toolkit package) to perform preprocessing tasks on the data such as converting the text to lower case and removing numbers and punctuation (Cielen et al., 2016).

To classify the keywords into clusters using the k-means clustering algorithm, we integrated all publications’ keywords into one big data file and treated each publication’s keywords as one document. The result of this step is a file with 2,018 documents. Then, before importing the data into Python, we separated the whole dataset into three different corpora based on their associated publication year (variable). The first corpus contained data from 2002-2007, and the second and third corpora contained data from 2008-2012 and 2013-2016, respectively. This separation was necessary to answer the research questions and analyze the e-government research trends for each corpus.

4.2.1 Clustering results

The clustering procedures started by performing some important transformations on each corpus. First, we converted all the keywords to lower case to achieve consistency in the results. Second, stop words, numbers, and punctuation were removed from the text, as they do not add value to the analysis. Third, we wrote some regular expression tokenizers to keep words with hyphens and treat compound concepts, such as online government, as a single concept to enhance the semantic interpretability of the analysis.

Rather than using a simple term-document frequency by keywords, in this research, the TF-IDF vector is calculated for each document. TF represents term frequency and IDF represents the inverse document frequency (Salton et al., 1975). TF-IDF is a well-known weighting scheme that is widely used in text mining research (Haddi et al., 2013). Term frequency counts the frequency of a particular word in a document. The higher the value, the more important the word is in the document. Inverse document frequency is calculated by dividing the total number of documents (N) by the number of documents containing the term (DF), and then taking the logarithm of that quotient, log(N/DF). The IDF value reflects how rare a certain word is across a collection of documents: the value increases when few documents contain the word. As a result, each cluster is represented by highly frequent words.

Next, k-means clustering was used. In the k-means clustering algorithm, it is necessary for users to define the number of clusters (k). Based on a trial-and-error approach (Pham et al., 2005), we specified the best number of clusters. This was achieved by comparing the clustering results obtained for different values of k and deciding the most appropriate value of k for the given dataset; hence, we set the number of clusters to 10.
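The preprocessing and weighting steps above can be sketched in a few lines of standard-library Python. This is an illustrative sketch under stated assumptions, not the authors' code: the stop-word set is a small hypothetical subset, the regular expression is one possible way to keep hyphenated words intact, and the IDF follows the plain log(N/DF) form given in the text (scikit-learn's `TfidfVectorizer` applies a smoothed variant by default, so its numbers would differ). Merging compound concepts such as "online government" into one token is not shown.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word subset (an assumption, not the NLTK list).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}

# Letters with optional internal hyphens, so "e-government" stays one token;
# numbers and punctuation never match and are therefore dropped.
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def tokenize(text):
    """Lowercase the text, keep hyphenated words, drop stop words."""
    return [t for t in TOKEN.findall(text.lower()) if t not in STOP_WORDS]


def tf_idf(documents):
    """Return one {term: tf * log(N / df)} dictionary per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


# Hypothetical keyword documents, one per publication.
documents = [
    "E-government, adoption, trust",
    "E-government; open data 2.0",
    "E-voting and e-democracy",
]
vectors = tf_idf(documents)
print(vectors[0])
```

With this form of IDF, a term that occurs in every document receives a weight of zero, while a term found in a single document receives the highest weight for a given term frequency.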
The following diagram represents the overall distribution of publications per time category set up for this research. The figure shows that 11% of the publications are in the time span from 2002 to 2007, 43% of the publications are in the period from 2008 to 2012, and 46% of the publications are in the period from 2013 to 2016.

Figure 2: Research articles distribution according to the three time spans


The following figure shows the total number of tokens/keywords in each time span. The total number of keywords associated with all collected articles is 12,692. Nine percent (1,147) of the keywords fall in the time period 2002 to 2007, 43% (5,479) in the period 2008 to 2012, and 48% (6,066) in the period 2013 to 2016.

Figure 3: Research keywords distribution according to the three time spans

For the same periods, we decomposed the frequencies of a sample of words related to this study (as shown in Table 4). The keyword distribution indicates the depth of the e-government topic in each period: the first period accounted for a smaller share of e-government-related keywords than its share of all keywords (less than 9%). This study is limited in its coverage of the first period for the list of titles selected; nonetheless, we can still draw a reasonable conclusion by comparing the percentage of these words with the percentage of total words listed in the three periods.

Table 4: Sample word distribution by the three time periods

Term                    2002-2007   2008-2012   2013-2016   Total
e-Government                    6         276         275     557
electronic government           6         113          44     163
open government                 0          16          41      57
eGovernment                     0          27          11      38
digital government              2          10          17      29
Government 2.0                  0          12          11      23
eGov                            1           0           3       4
E-gov                           0           4           0       4
Gov 2.0                         0           1           2       3
Online government               0           2           1       3
We-government                   0           1           2       3
connected government            0           1           2       3
WeGov                           0           1           0       1
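Because the three periods differ greatly in size, the raw counts in Table 4 are easier to compare once they are related to each period's keyword total. The following sketch shows one way to do that, using the totals reported in the text (1,147, 5,479, and 6,066 keywords) and three rows of Table 4. Normalizing to occurrences per 1,000 keywords is an assumption made for illustration, not a step reported by the authors, and the name per_thousand is hypothetical.

```python
# Keyword totals per period, as reported in the text.
period_totals = {"2002-2007": 1147, "2008-2012": 5479, "2013-2016": 6066}

# Three rows of Table 4 (raw counts per period).
table4 = {
    "e-Government": [6, 276, 275],
    "electronic government": [6, 113, 44],
    "open government": [0, 16, 41],
}

def per_thousand(counts, totals):
    """Occurrences per 1,000 keywords in each period (illustrative normalization)."""
    return [round(1000 * c / t, 1) for c, t in zip(counts, totals)]

totals = list(period_totals.values())
for term, counts in table4.items():
    print(term, per_thousand(counts, totals))
```

Read this way, "electronic government" drops from about 20.6 to about 7.3 occurrences per 1,000 keywords between the second and third periods, while "open government" rises from about 2.9 to about 6.8.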


This section presents the main research results related to the clustering model. As mentioned above, we limited the number of clusters in each time period to 10 and the number of concepts in each cluster to 10. We assigned a name to each cluster based on its unique association and the set of words within the cluster. This step provides meaningful and concise descriptions of the underlying concepts/keywords and distinguishes each cluster from the others. The results offer a clue to the concepts’ focus per cluster in each time span and thus help in assessing future research directions. The set of clusters included some firm labels based on the uniqueness of the words associated with them. Still, the task required extra effort and careful inspection because of the existence of general words like government, management, electronic, etc. Such general terms serve the purpose of clustering, but also make the boundaries between clusters blurrier. The following three figures show the clusters estimated for the three time periods (Figures 4-6). The overlap between clusters represents conceptual variance that even the best tool would miss; still, we based our logical discussion and conclusion on a solid list of words collected from the top known e-government journals.

Figure 4: The ten clusters for the period from 2002-2007


Figure 5: The ten clusters for the period from 2008-2012

Figure 6: The ten clusters for the period from 2013-2016
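To make the clustering step behind these figures more concrete, the sketch below runs a plain k-means loop on toy vectors and then lists the highest-weighted terms of each cluster center, mirroring the idea of keeping a fixed number of concepts per cluster. It is an illustration under stated assumptions, not the authors' implementation (they used scikit-learn): the vocabulary and vectors are hypothetical, k is 2 instead of 10, and the initial centers are sampled from the data points rather than generated as arbitrary random points.

```python
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means: random initial centers, assign, recompute, stop when stable."""
    rng = random.Random(seed)
    # Assumption: initial centers are k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the closest center (squared Euclidean).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed: converged
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def top_terms(center, vocabulary, n=10):
    """Terms with the largest positive weights in a cluster center."""
    ranked = sorted(zip(vocabulary, center), key=lambda pair: pair[1], reverse=True)
    return [term for term, weight in ranked[:n] if weight > 0]

# Hypothetical vocabulary and toy document vectors (not the study's data).
vocabulary = ["e-voting", "democracy", "trust", "adoption"]
points = [
    [1.0, 0.8, 0.0, 0.0],
    [0.9, 0.6, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.6, 0.8],
]
labels, centers = kmeans(points, k=2)
for c, center in enumerate(centers):
    print(c, top_terms(center, vocabulary, n=2))
```

The top terms of each center are what an analyst would inspect when naming a cluster. Which cluster receives which index depends on the random initialization, so only the grouping itself is meaningful.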

The results of the previous clustering step revealed a few continuous research directions, where the same terms continued to appear in the three periods of time. The following research directions existed in the three periods:

Public administration-related research: such work is related to service provision, ICT utilization, the influence of e-government on bureaucracy, and other public sector managerial issues.

Governance issues: With the growing interest in e-government and the global calls for adopting such practices, international governance was a hot topic in the first period and continued in the second and third period with a greater focus on national and public sector constituents.

Democratic and political issues: It is important to associate the web with a better image of political processes, where democracy, participation and voting were researched extensively in the three periods. This important research direction existed in 5 clusters in the three periods.

Adoption of e-government: one of the popular topics in e-government is the focus on its adoption and diffusion and the many factors influencing the area. Research related to technology acceptance model (TAM), trust, usability, and many other factors are all indicators of the importance of this research direction.

On the other hand, research topics that started to be popular in later periods, particularly the last one (based on word frequencies), are related to open government, social media, smart cities, and analysis. Finally, it is important to acknowledge the subjectivity of analyzing such clusters, where looking into the keyword distribution opens doors for more interpretations. Another issue that might be considered here is the authors’ choices of keywords, which might not reflect the papers’ actual content.

It is important to pay attention to the limitation of words at this stage. We built our argument on three periods in time and reported our results based on those. To avoid limitations from a small sample size, we can focus on the last two periods, although this will deprive our analysis of depth over time. On the other hand, splitting the data range into two periods only would cause the same limitation (based on the data shown in Figure 1, where a range from 2002-2010 would result in a much smaller total number of articles than that of the other period).

4.2.2 Word frequency distribution

Based on the previous clustering results, a word frequency distribution was conducted per period and a manual clustering technique was applied. The manual clustering process was based on the previous work of Abu-Shanab and Abu-Baker (2014), where frequencies of all popular words within the clusters were estimated and then summated into dimensions (major concepts). Their work focused on mobile phone use and purchases, deploying mixed methods and a new methodology that can aid our purpose for this research. Their estimations were also based on three periods. Appendix A lists all keywords, and Table 4 below shows all summated dimensions. The data shown in Appendix A was generated from the clustering tool. The bars shown in the cells represent the magnitude allocated to the frequency based on the total size of frequencies and compared to other terms. Such visualization helps the reader quickly recognize the popular terms in the data.

We took the list of terms generated from the clustering process and manually summated them into a set of logical dimensions (as shown in Table 4, left column). We further summed the new set of dimensions into ten major dimensions. The word frequency distribution is a method proposed in previous research and depends on a small sample. This research clustered words and computed the frequencies based on text analysis. The next step summed such keywords into more general dimensions. Research directions that are flourishing and interest researchers are better drawn this time with a line diagram across periods.

Based on such analysis we can see the following trends. The first is a flourishing direction for some dimensions like transparency, open government, social media, accountability, privacy issues, trust, adoption, satisfaction, UTAUT, and TAM (including perceived usefulness + perceived ease of use). On the other hand, there is a declining interest in the diffusion of innovation theory, which might be surprising given its close behavioral relation to the previous topics. Finally, the rest of the topics listed in Table 4 indicated an initial interest, but leveled or faded in the last period. Such topics (in the last category) might not have lost their importance, but compared to the period from 2008-2012, they showed a lower interest from researchers. It is important to see the logic behind our classification system, where terms like accountability might drive some debate: is it a governance-related issue or a public service-related issue (administration)? Similarly, does voting belong only to election/democracy/participation, or might it be utilized in a decision-making domain (public sector performance or analytics)? Such debates enrich the topic and build some momentum toward a theory of e-government.
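The manual summation of term frequencies into dimensions can be sketched as follows. The frequencies are copied from Appendix A, but the term-to-dimension mapping is an assumption: the authors do not list it, and it is inferred here only because it reproduces the "Security Issues" and "Privacy Issues" rows of the dimensions table. The trend rule, which compares the last two periods, is likewise a hypothetical simplification of the trend reading described above.

```python
# Frequencies copied from Appendix A, ordered (2002-2007, 2008-2012, 2013-2017).
term_freq = {
    "security": (4, 16, 12),
    "information security": (0, 3, 1),
    "data protection": (1, 1, 1),
    "fraud detection": (0, 2, 2),
    "privacy": (2, 13, 15),
    "privacy protection": (0, 4, 3),
    "information privacy": (0, 0, 2),
}

# Assumed mapping (inferred from the reported totals, not stated by the authors).
dimension_of = {
    "security": "Security Issues",
    "information security": "Security Issues",
    "data protection": "Security Issues",
    "fraud detection": "Security Issues",
    "privacy": "Privacy Issues",
    "privacy protection": "Privacy Issues",
    "information privacy": "Privacy Issues",
}

def summate(term_freq, dimension_of):
    """Add up term frequencies period by period within each dimension."""
    totals = {}
    for term, freqs in term_freq.items():
        dim = dimension_of[term]
        previous = totals.get(dim, (0, 0, 0))
        totals[dim] = tuple(a + b for a, b in zip(previous, freqs))
    return totals

def trend(freqs):
    """Hypothetical rule: compare the last period with the one before it."""
    if freqs[-1] > freqs[-2]:
        return "rising"
    if freqs[-1] < freqs[-2]:
        return "declining"
    return "level"

dimensions = summate(term_freq, dimension_of)
for name, freqs in dimensions.items():
    print(name, freqs, trend(freqs))
```

With this mapping, the sketch yields (5, 22, 16) for security and (2, 17, 20) for privacy, matching the reported table. A different grouping of borderline terms would change the totals, which is the subjectivity the authors acknowledge.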

5. CONCLUSIONS

The major objective of this study was to investigate the research directions related to the e-government topic. This study selected the top e-government journals and collected keywords from each article to be analyzed. The total number of journals surveyed was 11, the total number of articles inspected was 2018, and the total number of words collected was 12,692. This rich sample of keywords was analyzed to draw conclusions on the major directions in e-government research.

Table 4: Major dimensions and their corresponding frequencies.

Dimension                      2002-2007   2008-2012   2013-2016
Security Issues                        5          22          16
Privacy Issues                         2          17          20
Trust                                  3           1          27
Adoption                               3          16          41
Satisfaction                           0           9          11
UTAUT                                  0           6           9
TAM (and PU + PEOU)                    2          18          19
DoI                                    6           5           3
Usability Issues                       6          39          27
Public Admin. + Procurement            7          31          25
Service Issues                         6          30          17
Accessibility & DD                     4          48          28
E-readiness                            3          18          10
KM and information flows               4          27          24
E-voting Issues                       17          48          15
E-democracy                           19          79          41
E-participation                        8         117          91
E-Governance Issues                   21         106          76
Accountability                         2           5          14
M-government issues                    0          20          13
Transparency                           2          17          39
Open Government Issues                 0          10         120
Social media Issues                    0          33          91

[Trend Lines column: per-dimension trend line graphics across the three periods]

General Dimensions: Security and Privacy Issues; Technology Adoption Issues (Trust, usability, and satisfaction); Public Service and Procurement; Digital divide and Accessibility; Knowledge Management; E-democracy, voting and participation; E-governance and Accountability; M-government; Open Government; Social Media

Results indicated that new trends in e-government are attracting researchers and journal editorial boards. Topics like open government, smart cities, and analytics have recently been attracting more research. On the other hand, researchers continued to pursue areas like governance, e-government adoption, e-participation, e-democracy, administration and procurement topics. Finally, researchers’ quest for a theory or framework to guide the e-government discipline faded after the first period. The e-government area is guided by IT theories (for technical issues), public administration theories (for managerial and administrative issues), political science theories (for democracy and governance issues), and behavioral and psychology theories (for adoption research).

A word distribution was conducted on the data and a trend analysis was done (refer to Table 4). The results pointed to 10 dimensions that govern research in e-government: security and privacy issues; technology adoption issues (trust, usability, and satisfaction); public service and procurement; the digital divide and accessibility; knowledge management; e-democracy, voting and participation; e-governance and accountability; m-government; open government; and social media.

5.1 Contributions

This study utilized one of the largest samples of articles in the area of e-government. Previous research was limited by sample size based on its qualitative direction and piece-by-piece inspection. This study utilized 12,692 terms published in e-government journals. The trends reported in this study describe the research history in e-government for 15 years and bring our attention to what areas are ascendant and what areas are fading. This study also conducted a quantitative analysis on the existing data, and is the first to do so (to the authors’ knowledge) in the e-government area. This analysis avoids researchers’ bias when evaluating research and how they perceive the data collected (as qualitative methods and small samples are commonly used in research trends analysis). The data collected in this study are also available for researchers to inspect and cluster (which can be done manually with the data available in Appendix A). This open offering aims to allow researchers to build their own perspective of the e-government domain. Finally, this is a first attempt to inspect research directions in e-government using text mining. This type of study provides researchers with insights into new trends prevailing in the domain and areas that are not attracting much acceptance by journals.

5.2 Limitations

This research is limited by its choice of journals for data collection. E-government research is not exclusively published within the list of journals used in this study. In addition, the availability of data within each time period is another limitation, as the first period included fewer observations (fewer articles) compared to the second and third periods. This study admits such a limitation but contributes an initial perspective of the topic. E-government research is also published in books and at conferences. This limitation calls for more comprehensive research or other research projects to cover such aspects. The second limitation of this study is our judgement on the dimensions built (manual clustering). This study depended on the researchers’ experience in e-government research and the literature review conducted. The dimensions built are important and set the stage for future theory in e-government. Abu-Shanab (2013) claims that the e-government domain is defined by four dimensions: service, public performance, democracy and participation, and inclusion and the digital divide. This classification might fall short when looking at the results of this study. We thus conclude that the fragmented nature of the e-government topic between more than one discipline and the collective topics that constitute it (ICT, political science, and public administration) prevent theory formulation. E-government will continue to follow other disciplines (that we mentioned previously) and benefit from their existing theories.

References

Abdelghaffar, H., Bakry, W., Duquenoy, P., 2005. E-Government: A New vision or success. In: Proceedings European and Mediterranean Conference on Information Systems (EMCIS 2005), Cairo, Egypt, 1-8.
Abu-Shanab, E., 2013. Electronic Government, a Tool for Good Governance and Better Service. first ed., Dar Al-Ketab, Irbid, Jordan.
Abu-Shanab, E., Bataineh, L., 2014. Challenges facing e-government projects: How to avoid failure? Int. J. Emerg. Sci., 4(4), 207-217.
Abu-Shanab, E., Abu-Baker, A., 2014. Using and buying mobile phones in Jordan: Implications for future research and the development of new methodology. Technol. Soc. 38, 103-110.
Abu-Shanab, E., Al-Azzam, A., 2012. Trust dimensions and the adoption of e-government in Jordan. Int. J. Inf. Commun. Technol. and Hum. Dev. 4(1), 39-51.
Abu-Shanab, E., Al-Jamal, N., 2015. Exploring the gender digital divide in Jordan. Gend., Technol. and Dev. 19(1), 91-113.
Aggarwal, C. C., Zhai, C., 2012. A Survey of Text Clustering Algorithms. In Aggarwal, C., Mining Text Data. first ed., Springer, Boston, USA.
Al-Dalou’, R., Abu-Shanab, E., 2013. E-Participation levels and technologies. The 6th International Conference on Information Technology (ICIT 2013), Amman, Jordan, 1-8.
Al-Naimat, A., Abdullah, M., Osman, W., Ahmad, F., 2011. E-government implementation problems in developing countries. 2nd World Conference on Information Technology (WCIT2011), Antalya, Turkey, 1-6.
Alshehri, M., Drew, S., Alfarraj, O., 2012. A Comprehensive analysis of e-government services adoption in Saudi Arabia: Obstacles and Challenges. Int. J. Adv. Comput. Sci. and Appl. 3(2), 2012, 1-6.
Andersen, K. V., Henriksen, H., 2005. The first leg of e-government research: Domains and application areas 1998-2003. Int. J. Electron. Gov. Res. 1(14), 26-44.
Bansode, S., Patil, S., 2011. Bridging digital divide in India: Some initiatives. Asia Pac. J. Libr. and Inf. Sci. 1(1), 58-68.
Basu, S., 2004. E-government and developing countries: An overview. Int. Rev. Law, Comput. & Technol. 18(1), 109-132.
Bekkerman, R., El-Yaniv, R., Tishby, N., Winter, Y., 2001. On feature distributional clustering for text categorization. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, USA, 146-153.
Belanger, F., Carter, L., 2012. Digitizing government interactions with constituents: An historical review of e-government research in information systems. J. Assoc. for Inf. Syst. 13(3), 363-394.
Bentley, R. A., 2008. Random drift versus selection in academic vocabulary: An evolutionary analysis of published keywords. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2518107/ (accessed 13 October 2018).
Bhatnager, S., 2004. E-government From Vision to Implementation, first ed. Sage Publications, Delhi, India.
Bird, S., Klein, E., Loper, E., 2009. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit, first ed. O'Reilly Media, Inc. USA.
Bryson, J., Berry, F., Yang, K., 2010. The state of public strategic management research: A selective literature review and set of future directions. The Am. Rev. Public Adm. 40(5), 495-521.
Cielen, D., Meysman, A., Ali, M., 2016. Introducing Data science: Big Data, Machine Learning, and More, using Python Tools, first ed. Manning Publications Co., USA.
Cook, M., LaVigne, M., Pagano, C., Dawes, S., Pardo, T., 2002. Making a case for local e-government. Publication of the Center for Technology in Government. http://www.ctg.albany.edu/publications/guides/making_a_case/making_a_case.pdf (accessed 11 September 2018).
Datar, M., Panikar, A., Farooqi, J., 2009. Emerging trends in e-government. Part of Critical Thinking in E-governance. https://www.csi-sigegov.org/critical_pdf/4_37-46.pdf (accessed 13 September 2018).
Dwivedi, Y. K., 2009. An analysis of e-government research published in Transforming Government: People, Process and Policy (TGPPP). Transforming Gov. People, Process and Policy 3(1), 7-15.
Fountain, J. E., 2001. Building the virtual state: Information technology and institutional change. Washington, DC: Brookings Institution Press.
Fountain, J., 2003. Information, institutions and governance: Advancing a basic social science research program for digital government. National Center for Digital Government, University of Massachusetts Amherst, Massachusetts, USA.
Garson, G. D., 1999. Information systems, politics, and government: Leading theoretical perspectives. In G. D. Garson (Ed.), Handbook of Public Information Systems, first ed. New York, USA.
Gatautis, R., 2010. Creating public value through e-participation: Wave project. Econ. and Manag. 15, 483-490.
Haddi, E., Liu, X., Shi, Y., 2013. The role of text pre-processing in sentiment analysis. Procedia Comput. Sci. 17(2013), 26-32.
Halchin, L. E., 2004. Electronic government: Government capability and terrorist resource. Gov. Inf. Q. 21(4), 406-419.
Hardy, C., Williams, S., 2011. Assembling e-government research designs: A transdisciplinary view and interactive approach. Public Adm. Rev. 73(3), 405-413.
Heeks, R., Bailur, S., 2007. Analyzing e-government research: Perspectives, philosophies, theories, methods, and practice. Gov. Inf. Q. 24(2), 243-265.
Helbig, N., Gil-García, R., Ferro, E., 2009. Understanding the complexity of electronic government: Implications from the digital divide literature. Gov. Inf. Q. 26(1), 89-97.
Jain, A. K., 2010. Data clustering: 50 years beyond K-means. Pattern Recognit. Lett. 31(8), 651-666.
Janssen, derDuin, Wagenear, Dawes, Bicking, Wimmer, and Petrauskas, 2007. Scenario building for e-government in 2020: Consolidating the results from regional workshops, Proceedings of the 40th Hawaii International Conference on System Sciences, Hawaii, USA. 109-115.
Joseph, R., 2010. Perspectives on research methods used in e-government. 2010 Northeast Decision Sciences Institute Conference, Alexandria, USA, 622-627.
Madsen, C., Berger, J., Phythian, M., 2014. The development in leading e-government articles 2001-2010: Definitions, perspectives, scope, research philosophies, methods and recommendations: An update of Heeks and Bailur. 13th IFIP WG 8.5 International Conference, EGOV 2014, Dublin, Ireland, 17-34.
Medaglia, R., 2012. eParticipation research: Moving characterization forward (2006–2011). Gov. Inf. Q. 29(3), 346-360.
Niehaves, B., 2011. Iceberg ahead: On electronic government research and societal aging. Gov. Inf. Q. 28(3), 310-319.
Pham, D. T., Dimov, S. S., Nguyen, C.D., 2005. Selection of K in K-means clustering. Proceedings of the Institution of Mechanical Engineers, Part C. J. Mech. Eng. Sci. 219(1), 103-119.
PITAC, 1999. President's information technology advisory committee. Report to the President, February 24, 1999. http://www.ccic.gov/ac/report/ (accessed 13 October 2018).
Rodríguez, M., Alcaide, L., López, A., 2010. Trends of e-government research. Contextualization and Research Opportunities. Int. J. of Digit. Account. Res. 10(4), 87-111.
Salton, G., Wong, A., Yang, C. S., 1975. A vector space model for automatic indexing. Commun. of ACM. 18(11), 613-620.
Tambouris, E., Liotas, N., Tarabanis, K., 2007. A framework for assessing e-participation projects and tools. 40th Hawaii International Conference on System Sciences, Hawaii, USA, 90-96.
UNDESA, 2008. UN e-government survey 2008, from e-government to connected governance. Publication of the United Nations Department for Social & Economic Affairs. https://publicadministration.un.org/egovkb/portals/egovkb/Documents/un/2008Survey/unpan028607.pdf (accessed 1 September 2018).
Wang, L. H., Wang, Q., Zhang, X., Cai, W., Sun, X., 2013. A bibliometric analysis of anaerobic digestion for methane research during the period 1994–2011. J. Mater. Cycles Waste Manag. 15(1), 1-8.
Wimmer, M., 2007. Reflections on the Egovrtd2020 roadmap for e-Government research, Proceedings of ICEGOV 2007, Macao, 417-426.
Yildiz, M., 2007. E-government research: Reviewing the literature, limitations, and ways forward, Gov. Inf. Q. 24(3), 646-665.
Yildiz, M., 2012. Big questions of e-government research. Inf. Polity 17(3/4), 343-355.
Yildiz, M., Saylam, A., 2013. E-government discourses: An inductive analysis. Gov. Inf. Q. 30(2), 141-153.

Appendix A: Frequency of terms in the three time periods

Term                              2002-2007   2008-2012   2013-2017
Security                                  4          16          12
privacy                                   2          13          15
Information security                      0           3           1
privacy protection                        0           4           3
Data protection                           1           1           1
Fraud detection                           0           2           2
information privacy                       0           0           2
Trust                                     3           1          27
Adoption                                  3           8          23
e-Government adoption                     0           3          10
Technology adoption                       0           5           8
TAM                                       0           4           5
perceived ease of use                     0           3           3
perceived usefulness                      0           3           3
Diffusion                                 4           3           1
Diffusion of innovations                  2           2           2
Technology acceptance model               2           8           8
user satisfaction                         0           3           5
UTAUT                                     0           6           9
Customer Satisfaction                     0           4           3
Citizen satisfaction                      0           2           3
Usability                                 1          11           5
Design                                    0           4           4
website evaluation                        0           6           0
evaluation                                2          10          11
interoperability                          3           8           7
public administration                     3          23          16
administration                            1           0           1
e-procurement                             3           8           8
e-Services                                2          18           5
Public Services                           1           1           5
Electronic services                       1           1           2
government services                       0           3           4
e-government services                     1           6           0
Electronic government services            1           1           1
Accessibility                             1          10           3
Access to information                     0           3           2
Internet access                           1           3           3
Web accessibility                         0           1           5
Access                                    1           3           1
Digital divide                            1          28          14
e-readiness                               2           8           4
readiness                                 1           2           2
e-government readiness                    0           6           1
e-Government readiness index              0           2           3
mobile government                         0           6           5
m-government                              0           7           7
mobile services                           0           2           1
mobile phones                             0           5           0
social networks                           0           6          10
social networking                         0           3           4
social media                              0          21          72
social network analysis                   0           3           5
Knowledge Management                      2           5           3
information sharing                       0           8           9
knowledge sharing                         0           3           0
Information retrieval                     1           2           2
information quality                       0           3           8
information exchange                      0           3           2
information visualization                 0           3           0
knowledge representation                  1           0           0
online voting                             1           7           2
e-voting                                  7          15           2
voting                                    0           5           5
elections                                 4           6           2
internet voting                           1           3           2
electronic voting                         3          12           2
distance voting                           1           0           0
e-Democracy                              16          35          24
Electronic democracy                      1          16           4
democracy                                 2          23          11
Digital Democracy                         0           5           2
e-Participation                           2          38          45
online participation                      0          12           6
Electronic participation                  0          15           6
eparticipation                            3           6           8
citizen participation                     2          23           9
Participation                             1          17          12
Public participation                      0           6           5
Governance                               11          34          31
e-governance                              3          34          31
electronic governance                     7          38          14
Accountability                            2           5          14
transparency                              2          17          39
Open data                                 0           1          41
data mining                               0           6           3
Big data                                  0           0          11
cloud computing                           0           2           8
Open government data                      0           1          13
open government                           0           6          44

Highlights:

Explored research status in e-government area and tried to investigate research direction.
Used the reported keywords for a large selected set of journals.
Used text mining based review of 2018 articles collected from 11 journals, and 12692 keywords.
Results indicated that new topics like open government, smart cities, and analytics are recently attracting more research.
Researchers continued to pursue areas like governance, e-government adoption, e-participation, e-democracy, administration and procurement topics.