NeoDataset - A Dataset with User Stories and Story Points

Authors

DOI:

https://doi.org/10.51359/2317-0115.2024.265431

Keywords:

user story, story points, dataset, agile, natural language

Abstract

Teams generally use management tools to track User Stories, control their source code, and record effort estimates and assignees. These tools record data that can support a wide range of research. Finding such data is nevertheless challenging, because private companies are reluctant to share it. This article presents a dataset containing raw data from 33 open-source Agile software projects, mined from GitLab, totaling 122,627 Story Points and 20,474 User Stories. We make the data publicly available to facilitate its use by the scientific community. We believe this dataset can be used in several lines of software engineering research, including text classification and vectorization, machine learning, effort estimation, and task prioritization.

Published

2025-03-20