Abstract
Fibonacci numbers appear in numerous engineering and computing applications, including population growth models, software engineering, task management, and data structure analysis. This calls for a computationally efficient way of generating long sequences of successive Fibonacci integers. With the advent of GPU computing and the associated specialized tools, this task is greatly facilitated by harnessing the potential of parallel computing. This work presents two alternative parallel Fibonacci generators implemented in TensorFlow, one based on the well-known recurrence equation generating the Fibonacci sequence and one based on inherent linear algebraic properties of Fibonacci numbers. Additionally, the question of whether, in a parallel context, the powers of known quantities should be computed directly or retrieved from a lookup table in conjunction with spline interpolation is explored. Although both parallel generators outperform the baseline serial implementation in terms of wallclock time and FLOPS, there is no clear winner between them, as the results depend on the number of integers generated. Additionally, replacing computations with a lookup table degrades performance, which can be attributed to the frequent accesses to the shared memory.
Keywords
- Fibonacci sequence
- Linear recurrence equations
- Finite differences
- Google TensorFlow
- GPU computing
- Parallel computing
1 Introduction
The sequence of Fibonacci integers \({\left\langle {f_k}\right\rangle }\) appears often in a broad spectrum of engineering applications, including coding theory, cryptography, simulation, and software management. Additionally, Fibonacci numbers are very closely tied to the golden ratio \(\varphi \), which is frequently encountered in nature, for instance in population growth models and in botany. Moreover, in architecture \(\varphi \) is almost considered synonymous with harmony. Thus, Fibonacci integers are arguably among the most significant sequences. Although their defining linear recurrence equation is simple, serially generating a long sequence of consecutive Fibonacci integers is by no means a trivial task.
However, with the advent of GPU computing, the efficient parallel generation of \({\left\langle {f_k}\right\rangle }\) has been rendered feasible. Indeed, by exploiting known properties of the Fibonacci integers it is possible to build parallel generators which exploit the underlying hardware potential to a great extent, achieving low response times. This requires specialized scientific software such as TensorFlow which not only contains very efficient libraries, but also facilitates the development of high quality custom source code.
The primary research contribution of this work is twofold. First, it lays the groundwork for two parallel Fibonacci integer generators, one based on a closed form for each number in the sequence and one based on certain linear algebraic properties of Fibonacci integer pairs. The performance of the two proposed generators, developed in TensorFlow for Python, is evaluated in terms of both total turnaround time and FLOPS against a serial implementation built with the same software tools. Second, it explores whether it is worthwhile to substitute the computation of known quantities with a lookup table.
This conference paper is structured as follows. Section 2 reviews current scientific literature regarding the computational aspects of Fibonacci numbers. Their fundamental properties are described in Sect. 3. The TensorFlow implementation is presented in Sect. 4, whereas future research directions are outlined in Sect. 5. Table 1 summarizes the notation of this work. Concerning notation, matrices and vectors are depicted with boldface uppercase and boldface lowercase respectively, whereas roman lowercase is reserved for scalars.
2 Previous Work
The Fibonacci sequence of integers, examined among others in [6] and [30], has perhaps the most applications not only in computer science and in engineering but in science as a whole. Closed forms for the spectral norms of circulant matrices whose entries are either Fibonacci or Lucas integers are derived in [21]. In data structure analysis the Fibonacci heap [19] and the associated pairing heap [18] have efficient search and insertion operations with numerous applications such as network optimization. A pair of successive Fibonacci numbers is known to be the worst case input for the Euclidean integer division algorithm, as shown in [9] as well as in [29]. Fibonacci numbers play a central role in estimating task duration and, consequently, task difficulty in scrum-based software engineering methodologies [27, 28], including inaccurate estimation discovery [25] and using agile methodologies to predict student progress [26]. Moreover, Fibonacci numbers are very closely linked to the golden ratio \(\varphi \)Footnote 1 as well as to the symmetry of many geometric shapes, the latter having important implications in group theory [16]. Many identities in combinatorics regarding Fibonacci numbers can be found in [31] as well as in the more recent works [5] and [23]. Finally, the Lucas sequence \({\left\langle {\ell _k}\right\rangle }\) is closely associated with the Fibonacci sequence \({\left\langle {f_k}\right\rangle }\) since the two integer sequences constitute a Lucas complementary pair and share similar properties such as growth rate and generation mechanism [3, 4].
TensorFlow, originally developed by Google for massive brain circuit simulation, is an open source computational framework whose algorithmic cornerstone is the dataflow paradigm, as described in [1, 2], or [20]. A library for generating Gaussian processes in TensorFlow is presented in [24]. For a genetic algorithm implementation in TensorFlow for heuristically discovering community structure, a problem examined in [10], in large multilayer graphs, such as those presented in [12] and in [11], see [15]. In [14] the ways insurance and digital health markets can benefit from blockchain and GPU computing are explored. For a path and triangle based graph resilience metric in TensorFlow see [13]. A very popular front end for the low level TensorFlow is keras, which allows the easy manipulation of neural network layers, including connectivity patterns and activation functions [8]. Model training and prediction generation are also done easily in keras in four stages [22]. Convolutional kernels whose lengths depend on their relative location inside the neural network architecture, implemented in keras for computational vision purposes, are introduced in [7].
3 Fibonacci Numbers
The n-th integer in the Fibonacci sequence \({\left\langle {f_n}\right\rangle }\) is defined as:
\[ f_n \,=\, f_{n-1} + f_{n-2}, \qquad n \ge 2, \qquad f_0 \,=\, 0, \quad f_1 \,=\, 1 \tag{1} \]
Theorem 1
The n-th Fibonacci number \(f_n\) has the closed form:
\[ f_n \,=\, \frac{1}{\sqrt{5}}\left[ \left(\frac{1+\sqrt{5}}{2}\right)^{\!n} - \left(\frac{1-\sqrt{5}}{2}\right)^{\!n} \right] \tag{2} \]
Proof
The characteristic polynomial of (1) is:
\[ p(x) \,=\, x^2 - x - 1 \]
And its two real and distinct roots are:
\[ x_1 \,=\, \frac{1+\sqrt{5}}{2}, \qquad x_2 \,=\, \frac{1-\sqrt{5}}{2} \]
Therefore, it follows that:
\[ f_n \,=\, \alpha_0\, x_1^{\,n} + \alpha_1\, x_2^{\,n} \]
The constants \(\alpha _0\) and \(\alpha _1\) are computed using the initial conditions derived from the first two Fibonacci numbers as follows:
\[ \alpha_0 + \alpha_1 \,=\, f_0 \,=\, 0, \qquad \alpha_0\, x_1 + \alpha_1\, x_2 \,=\, f_1 \,=\, 1 \]
The above conditions yield:
\[ \alpha_0 \,=\, \frac{1}{\sqrt{5}}, \qquad \alpha_1 \,=\, -\frac{1}{\sqrt{5}} \]
\(\square \)
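As a small illustration (not part of the original paper), the closed form (2) can be evaluated directly; the sketch below uses plain Python and rounds the floating point result, which compensates for the irrational intermediate quantities.

```python
import math

def fib_closed_form(n: int) -> int:
    """Evaluate the closed form (2) and round to the nearest integer."""
    sqrt5 = math.sqrt(5.0)
    phi = (1.0 + sqrt5) / 2.0
    psi = (1.0 - sqrt5) / 2.0
    return round((phi ** n - psi ** n) / sqrt5)

# In double precision the rounded result is exact only up to roughly n = 70.
print([fib_closed_form(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```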
Another way to prove Theorem 1 is the following:
Proof
Another way to directly compute the n-th Fibonacci number \(f_n\) is to rewrite the Fibonacci definition of Eq. (1) and the identity \(f_n \,=\, f_n\), combined in matrix-vector format, as follows:
\[ \mathbf{f}_n \,\triangleq\, \begin{bmatrix} f_{n+1} \\ f_n \end{bmatrix} \,=\, \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} f_n \\ f_{n-1} \end{bmatrix} \,=\, \mathbf{A}\,\mathbf{f}_{n-1}, \qquad \mathbf{f}_0 \,=\, \begin{bmatrix} f_1 \\ f_0 \end{bmatrix} \,=\, \begin{bmatrix} 1 \\ 0 \end{bmatrix} \]
so that \(\mathbf{f}_{n-1} \,=\, \mathbf{A}^{n-1}\,\mathbf{f}_0\).
The eigenvalues \(\lambda _1\) and \(\lambda _2\) (recall that \({{\,\mathrm{det}\,}}\left( {{\mathbf {A}}}\right) \,=\, \lambda _1\lambda _2\) and that \({{\,\mathrm{tr}\,}}\left( {{\mathbf {A}}}\right) \,=\, \lambda _1 + \lambda _2\)) and the corresponding eigenvectors \({\mathbf {e}}_1\) and \({\mathbf {e}}_2\) are:
\[ \lambda_1 \,=\, \frac{1+\sqrt{5}}{2}, \quad \lambda_2 \,=\, \frac{1-\sqrt{5}}{2}, \qquad \mathbf{e}_1 \,=\, \frac{1}{\sqrt{1+\lambda_1^2}} \begin{bmatrix} \lambda_1 \\ 1 \end{bmatrix}, \quad \mathbf{e}_2 \,=\, \frac{1}{\sqrt{1+\lambda_2^2}} \begin{bmatrix} \lambda_2 \\ 1 \end{bmatrix} \]
Notice that \(\left\| {{\mathbf {e}}_1}\right\| _2 \,=\, 1\) and \(\left\| {{\mathbf {e}}_2}\right\| _2 \,=\, 1\) and, additionally, \({\mathbf {e}}^T_1{\mathbf {e}}_2 \,=\, 0\). From the spectral decomposition of \({\mathbf {A}}\) it follows that:
\[ \mathbf{A} \,=\, \lambda_1\,\mathbf{e}_1\mathbf{e}_1^T + \lambda_2\,\mathbf{e}_2\mathbf{e}_2^T \quad\Longrightarrow\quad \mathbf{A}^{\!n} \,=\, \lambda_1^{\,n}\,\mathbf{e}_1\mathbf{e}_1^T + \lambda_2^{\,n}\,\mathbf{e}_2\mathbf{e}_2^T \]
Observe that the first element of vector \({\mathbf {f}}_{n-1}\) is \(f_n\) and it is equal to:
\[ f_n \,=\, \lambda_1^{\,n-1}\,\frac{\lambda_1^2}{1+\lambda_1^2} + \lambda_2^{\,n-1}\,\frac{\lambda_2^2}{1+\lambda_2^2} \]
Finally, the structure of \({\mathbf {A}}^{\!n}\) can be shown by induction to be:
\[ \mathbf{A}^{\!n} \,=\, \begin{bmatrix} f_{n+1} & f_n \\ f_n & f_{n-1} \end{bmatrix} \]
\(\square \)
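As an illustration of the matrix-vector route (again, a sketch rather than the paper's TensorFlow code), \(f_n\) can be read off as the (0, 1) entry of \(\mathbf{A}^{\!n}\), computed here with NumPy integer matrix powers; with 64-bit integers the result remains exact up to \(f_{92}\).

```python
import numpy as np

# Companion matrix A from the matrix-vector formulation above.
A = np.array([[1, 1], [1, 0]], dtype=np.int64)

def fib_matrix(n: int) -> int:
    """Return f_n as the (0, 1) entry of A^n."""
    return int(np.linalg.matrix_power(A, n)[0, 1])

print([fib_matrix(n) for n in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```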
Despite the irrational quantities appearing in form (5), the Fibonacci numbers are integers, as is evident from the initial conditions and their generation mechanism. Another way to see this is the following theorem:
Theorem 2
Fibonacci numbers are integers.
Proof
Applying the Newton binomial theorem to (5) yields:
\[ f_n \,=\, \frac{1}{5 \cdot 2^{\,n}} \sum_{k=0}^{n} \binom{n}{k}\, \gamma_k, \qquad \gamma_k \,\triangleq\, \sqrt{5}\left[ \left(\sqrt{5}\right)^{k} - \left(-\sqrt{5}\right)^{k} \right] \]
When \(k \,\equiv \, 0 \pmod {2}\) then \(\gamma _k\) is zero, whereas if \(k \,\equiv \, 1 \pmod {2}\) then \(\gamma _k\) equals \(2 \cdot 5^{\frac{k+1}{2}}\). Therefore:
\[ f_n \,=\, \frac{1}{2^{\,n-1}} \sum_{j=0}^{\left\lfloor \frac{n-1}{2} \right\rfloor} \binom{n}{2j+1}\, 5^{\,j} \]
\(\square \)
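As a brief numerical check of the last identity (added here for illustration, not part of the original proof), take \(n \,=\, 5\):
\[ f_5 \,=\, \frac{1}{2^{4}} \left[ \binom{5}{1} 5^{0} + \binom{5}{3} 5^{1} + \binom{5}{5} 5^{2} \right] \,=\, \frac{5 + 50 + 25}{16} \,=\, 5 \]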
4 TensorFlow Implementation
As stated earlier, TensorFlow relies on the dataflow algorithmic paradigm, which essentially utilizes a potentially large operations graph in order to break down the desired computational task into smaller, manageable components. Dataflow graphs have the following properties (a minimal example follows the list):
- Vertices represent a wide array of mathematical operations, including advanced ones such as eigenvector computation, singular value decomposition, and least squares fitting.
- Edges describe the directed data flow between operation results.
- The operands exchanged between the various graph operations are tensors of arbitrary dimensions, as long as these dimensions are compatible.
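As a minimal illustration of these properties, the following snippet, written against the TensorFlow 1.x graph-mode API used in this work, builds a tiny dataflow graph with two constant vertices and one matrix multiplication vertex and evaluates it in a session; it is an illustrative sketch rather than part of the actual generators described below.

```python
import tensorflow as tf  # TensorFlow 1.x graph-mode API

# Vertices: two constant tensors and one matrix multiplication operation.
a = tf.constant([[1.0, 1.0], [1.0, 0.0]], name="A")
v = tf.constant([[1.0], [0.0]], name="f0")
product = tf.matmul(a, v, name="Af0")  # edge: the outputs of A and f0 flow into matmul

with tf.Session() as sess:
    print(sess.run(product))  # [[1.], [1.]]
```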
TensorFlow r1.12 was installed on Ubuntu 18.04 LTS for Python 3.6 using the pip package installer. An NVIDIA Titan Xp GPU based on the Pascal architecture was available in the system and was successfully discovered by TensorFlow as gpu0.
Three Fibonacci generators were implemented in total. Each such generator yields a batch consisting of the first n consecutive Fibonacci integers. In the experiments, n ranged from 2 to 1024 with an exponentially increasing distance between two successive batch sizes. Since the particular GPU has 3840 CUDA cores, the parallelism potential is high. The generators, a simplified sketch of which is given after the following list, are:
- A serial implementation which consists of a single loop which adds one new Fibonacci number with each pass.
- A parallel implementation which directly computes the k-th element of the sequence \({\left\langle {f_k}\right\rangle }\) based on Eq. (11).
- A second parallel implementation which relies on the slightly simpler closed expression of Eq. (2).
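The listing below is a simplified sketch of the serial baseline and of a vectorized generator based on the closed form of Eq. (2), written for the TensorFlow 1.x API; it is not the authors' implementation, the eigendecomposition-based generator of Eq. (11) is omitted, and the function names are illustrative only.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph-mode API


def fibonacci_serial(n):
    """Serial baseline: one loop pass adds one new Fibonacci number."""
    fibs = [tf.constant(0.0, dtype=tf.float64), tf.constant(1.0, dtype=tf.float64)]
    for _ in range(n - 2):
        fibs.append(fibs[-1] + fibs[-2])
    return tf.stack(fibs[:n])


def fibonacci_binet(n):
    """Vectorized generator: evaluate the closed form (2) for all indices at once.
    Since the magnitude of the second term of (2) stays below 1/2 for every
    k >= 0, rounding phi^k / sqrt(5) already gives the exact integer (in double
    precision this holds up to roughly k = 70)."""
    k = tf.range(0, n, dtype=tf.float64)
    sqrt5 = tf.constant(np.sqrt(5.0), dtype=tf.float64)
    phi = (1.0 + sqrt5) / 2.0
    return tf.round(tf.pow(phi, k) / sqrt5)


with tf.Session() as sess:
    serial, vectorized = sess.run([fibonacci_serial(10), fibonacci_binet(10)])
    print(serial)      # [ 0.  1.  1.  2.  3.  5.  8. 13. 21. 34.]
    print(vectorized)  # same values, computed element-wise in parallel
```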
Figure 1 shows the total number of floating point operations which were required for each batch size. The values are the arithmetic mean of ten executions for each batch size. Preceding each such execution there was a trial run, not taken into consideration in the final results, which served the single purpose of loading the data into the system cache. Since the serial implementation is a loop, the number of additions is linear in terms of n. On the contrary, the parallel implementations require a number of auxiliary floating point operations, most notably the exponentiation of certain parameters. Thus, they require more operations, whose number is a polynomial function, approximately quadratic, of the batch size, with the second generator clearly always being more expensive in terms of operations.
However, the fact that a generator requires more floating point operations does not necessarily make it slower in terms of total execution time. Instead, the results shown in Fig. 2 indicate that both parallel generators achieve considerably lower wallclock execution time in milliseconds. Since the computation of \({\left\langle {f_k}\right\rangle }\) is a GPU-bound process and the design of both parallel generators entails very low communication across the memory hierarchy, ordinary wallclock time in this case consists almost entirely of time spent on actual computations.
Notice that, unlike the previous figure, no parallel implementation appears to be ideal for every batch size. Specifically, the second implementation is better for smaller batches, whereas the first one becomes preferable as the batch size grows, despite its more complex formula. This can be attributed to the fact that the second generator achieves more locality, as certain parameters are common across each batch.
This difference in performance can also be seen by dividing the number of floating point operations by the wallclock time, yielding an approximation of the FLOPS for each generator, as seen in Fig. 3. Although by no means a single absolute benchmark, especially within a parallel computation context, FLOPS are in this case indicative, since the algorithmic core is purely computational. The difference from the serial implementation is now obvious, as is also the fact that the second generator performs better for large batches.
Notice that in both parallel implementations many consecutive powers of known quantities appear. In order to save floating point operations, a lookup table could have been used according to the design principles found, for instance, in [17]. In order to evaluate the impact of relying on a lookup table on the total FLOPS of the two parallel generators, two variants of each were also implemented. The first variant used locally half of the known quantities required, whereas the second only one quarter of them. In both cases, and in order to achieve comparable result accuracy for fairness reasons, spline interpolation was used. As can be seen from Fig. 4, the introduction of a lookup table lowered the FLOPS of both generators. This can be explained by the facts that TensorFlow has an efficient multiplication algorithm, especially for large numbers, and that frequently accessing the shared GPU memory eventually slowed computations down.
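The snippet below sketches, under the TensorFlow 1.x API, the difference between computing the required powers directly and fetching them from a precomputed table with tf.gather; it is a simplified illustration only, since the actual variants additionally involve spline interpolation over a reduced table, which is not reproduced here.

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x graph-mode API

n = 1024
phi = (1.0 + np.sqrt(5.0)) / 2.0

# Direct computation: each power is evaluated on the device.
k = tf.range(0, n, dtype=tf.float64)
powers_direct = tf.pow(tf.constant(phi, dtype=tf.float64), k)

# Lookup-table variant (simplified, full table, no interpolation): the powers
# are precomputed on the host, stored as a constant tensor, and fetched by index.
table = tf.constant(phi ** np.arange(n, dtype=np.float64))
powers_lookup = tf.gather(table, tf.range(0, n, dtype=tf.int32))

with tf.Session() as sess:
    direct, lookup = sess.run([powers_direct, powers_lookup])
    print(np.allclose(direct, lookup))  # True
```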
5 Conclusions and Future Work
This conference paper presented two parallel Fibonacci integer generators for TensorFlow running over Python 3.6 and an NVIDIA Titan Xp GPU and discussed certain implementation aspects. Both generators yield batches of n integers, with n ranging from 2 to 1024. The maximum batch size is smaller than the number of cores in the GPU, thus increasing the parallelism potential. The baseline was a serial implementation for TensorFlow consisting of a single loop generating one new Fibonacci integer in each pass.
The primary finding of the experiments is the superior performance of parallel generators. Although requiring more floating point operations in total, both parallel implementations outperform the baseline in terms of wallclock time and of FLOPS. This is attributed to the efficient use of parallelism. A secondary finding was that replacing actual computations with frequent accesses to a shared lookup table led to lower FLOPS values. This can be explained by the latency caused by a large number of threads asking for the same information.
Concerning future research directions, there are a number of options which can be followed. Experiments with larger batch sizes should be conducted, especially with sizes which exceed the number of GPU cores. Additionally, more algorithmic schemes should be tested, such as those constructing Fibonacci integers bitwise, as they may lead to generators with higher parallelism. Finally, the performance of the proposed algorithms on other parallel architectures can be evaluated, in order to understand whether a given hardware architecture is more appropriate for particular algorithmic principles.
Notes
1. OEIS sequence A001622.
References
Abadi, M.: TensorFlow: learning functions at scale. ACM SIGPLAN Not. 51(9) (2016)
Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. In: OSDI, vol. 16, pp. 265–283 (2016)
Akbary, A., Wang, Q.: A generalized Lucas sequence and permutation binomials. Proc. Am. Math. Soc. 134(1), 15–22 (2006)
Bilgici, G.: Two generalizations of Lucas sequence. Appl. Math. Comput. 245, 526–538 (2014)
Bolat, C., Köse, H.: On the properties of k-Fibonacci numbers. Int. J. Contemp. Math. Sci. 5(22), 1097–1105 (2010)
Capocelli, R.M., Cerbone, G., Cull, P., Holloway, J.L.: Fibonacci facts and formulas. In: Capocelli, R.M. (ed.) Sequences, pp. 123–137. Springer, New York (1990). https://doi.org/10.1007/978-1-4612-3352-7_9
Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: CVPR, pp. 1251–1258 (2017)
Chollet, F., et al.: Keras: deep learning library for theano and TensorFlow (2015)
Dixon, J.D.: The number of steps in the Euclidean algorithm. J. Number Theory 2(4), 414–422 (1970)
Drakopoulos, G., Gourgaris, P., Kanavos, A.: Graph communities in Neo4j: four algorithms at work. Evolving Syst. (2018). https://doi.org/10.1007/s12530-018-9244-x
Drakopoulos, G., Kanavos, A., Karydis, I., Sioutas, S., Vrahatis, A.G.: Tensor-based semantically-aware topic clustering of biomedical documents. Computation 5(3) (2017)
Drakopoulos, G., Kanavos, A., Tsolis, D., Mylonas, P., Sioutas, S.: Towards a framework for tensor ontologies over Neo4j: representations and operations. In: IISA, August 2017
Drakopoulos, G., Liapakis, X., Tzimas, G., Mylonas, P.: A graph resilience metric based on paths: higher order analytics with GPU. In: ICTAI. IEEE, November 2018
Drakopoulos, G., Marountas, M., Liapakis, X., Tzimas, G., Mylonas, P., Sioutas, S.: Blockchain for mobile health applications: acceleration with GPU computing. In: Vlamos, P. (ed.) GeNeDis 2018. Springer, Heidelberg (2018)
Drakopoulos, G., et al.: A genetic algorithm for spatiosocial tensor clustering: exploiting TensorFlow potential. Evolving Syst. (2019). https://doi.org/10.1007/s12530-019-09274-9
Dunlap, R.A.: The Golden Ratio and Fibonacci Numbers. World Scientific (1997)
Fateman, R.J.: Lookup tables, recurrences and complexity. In: International Symposium on Symbolic and Algebraic Computation, pp. 68–73. ACM (1989)
Fredman, M.L., Sedgewick, R., Sleator, D.D., Tarjan, R.E.: The pairing heap: a new form of self-adjusting heap. Algorithmica 1(1–4), 111–129 (1986)
Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34(3), 596–615 (1987)
Goldsborough, P.: A tour of TensorFlow. arXiv preprint arXiv:1610.01178 (2016)
İpek, A.: On the spectral norms of circulant matrices with classical Fibonacci and Lucas numbers entries. Appl. Math. Comput. 217(12), 6011–6012 (2011)
Ketkar, N.: Introduction to keras. In: Ketkar, N. (ed.) Deep Learning with Python, pp. 97–111. Springer, Heidelberg (2017)
Koshy, T.: Fibonacci and Lucas Numbers with Applications. Wiley, Hoboken (2019)
Matthews, D.G., et al.: GPflow: a Gaussian process library using TensorFlow. J. Mach. Learn. Res. 18(1), 1299–1304 (2017)
Raith, F., Richter, I., Lindermeier, R., Klinker, G.: Identification of inaccurate effort estimates in agile software development. In: APSEC, vol. 2, pp. 67–72. IEEE (2013)
Rodríguez, G., Soria, Á., Campo, M.: Measuring the impact of agile coaching on students’ performance. Trans. Educ. 59(3), 202–209 (2016)
Rubin, K.S.: Essential Scrum: A Practical Guide to the Most Popular Agile Process. Addison-Wesley, Boston (2012)
Schwaber, K., Sutherland, J.: The scrum guide. Scrum Alliance 21 (2011)
Sorenson, J.: An analysis of Lehmer’s Euclidean GCD algorithm. In: International Symposium on Symbolic and Algebraic Computation, pp. 254–258. ACM (1995)
Takahashi, D.: A fast algorithm for computing large Fibonacci numbers. Inf. Process. Lett. 75(6), 243–246 (2000)
Zhang, W.: Some identities involving the Fibonacci numbers. Fibonacci Q. 35, 225–228 (1997)
Acknowledgements
This research was conducted within the project “Development of technologies and methods for cultural inventory data (ANTIKLIA)” under the EPAnEK 2014–2020 Operational Programme Competitiveness, Entrepreneurship, Innovation.
Additionally, this conference paper is part of Project 451, a long term research initiative whose primary objective is the development of novel, scalable, numerically stable, and interpretable tensor analytics.
Moreover, the authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
© 2019 IFIP International Federation for Information Processing