1 Introduction

Every year, over 130,000 people in the UK have a stroke, and many live with moderate to severe disabilities as a result (The Stroke Association, 2006). Economic pressures within the National Health Service, and from health insurers, have forced such people to return to their homes immediately after initial rehabilitation in hospital [13]. Patient treatment generally stops 6–9 months after release from hospital; however, motor function may take years to recover [7–10, 12, 14–16]. It is therefore more cost-effective for patients to perform home-based rehabilitation using some form of assistive technology [3]. Telerehabilitation enables a hospital-based clinician to better understand a patient’s home environment [18]. It also enables clinical consultation, research, evaluation, professional education, and health management, by overcoming geographical, transportation and socio-economic barriers to treatment (Occupational Therapy and Veterans, 2007).

Several telerehabilitation studies have shown the effects of in-home rehabilitation, as well as shortcomings of their own. Chang and Lee [2] developed a telemedicine and tele-consultation system, and applied it to two clinical medicine programs: child rehabilitation in earthquake-struck areas, and severe acute respiratory syndrome (SARS) treatment in quarantined clients. Its videoconferencing facilities were web-based, and used the TCP/IP protocol. Unfortunately, this set-up may cause delays, image distortions, or blackouts in video and voice transmission during tele-consultation.

Another example of telerehabilitation can be found in [11], where a home-based haptic telerehabilitation system is described. The system focuses on a virtual driving environment and a series of exercises, which serve as an integrated diagnostic and therapeutic tool for upper limb motor rehabilitation. It facilitates effective visualisation and quantification of the patients’ motions and associated pathologies, during and following a prescribed series of exercises. Therapists can use remotely collected data to replay a patient’s exercise session on a digital human model. However, it does not provide real-time telecommunication between a patient and a therapist.

The work in [1] presented a single-unit staircase that makes it possible to assess upper- and lower-limb function simultaneously during stair gait. However, it is not suitable for use in a home environment, since it needs special equipment and is expensive.

Using our system, a patient can perform prescribed exercises at home. The motion trajectory is recorded and relayed to hospital-based physiotherapists. After reviewing real-time video and asynchronous 3D animation, a physiotherapist can advise further therapy plans. This enables timely discussion or demonstration of rehabilitation movements, thereby improving patient compliance and providing better patient care. The system is easy to set up, and uses free software and low-cost hardware.

In this paper we present a telerehabilitation prototype. We first introduce the system structure and the algorithms employed for motion detection and on-line data transmission. Experimental results for monitoring and detecting the daily activities of a healthy person are then evaluated subjectively. More specifically: Sect. 2 describes the proposed system architecture and introduces the techniques used in our approach; Sect. 3 presents experimental results that show the feasibility of the proposed system; and Sect. 4 provides a brief discussion and conclusion.

2 Design and methods

This section presents an interactive internet-based system for tracking upper limb motion in home-based rehabilitation. It consists of three components: a motion tracking system, 3D animation, and telecommunication. The system architecture is shown in Fig. 1.

Fig. 1 System configuration

In order to reduce cost, our system uses a peer-to-peer (P2P) network architecture, without the need for a server. Internet protocol (IP) multicast is used to reduce network traffic, and the real-time transport protocol (RTP) is used to transmit video and audio over the internet in real time. In a P2P network, a number of peers are connected directly, and all peers provide resources, including storage space and computing power. Therefore, as the number of nodes in the system increases, the total capacity of the system also increases. A P2P network has the following advantages over a client/server network (All about Peer-to-Peer, 2005):

  • No extra investment in server hardware and software is required.

  • It is easy to set up, and has high reliability.

  • Users are able to control resource sharing.

  • It enhances real-time communication and collaborative computing.

2.1 P2P network with group caching

Peer-to-peer networks allow video and audio streams to be used by a large group of users (e.g. videoconferencing). The strength of a videoconference system relies on the effective aggregation of the communication bandwidth and disk space contributed by its participating peers. A group is defined to be a cluster of peers which together can supply a complete video. In other words, the peers of a group jointly provide a group cache that holds a whole video, which can improve caching performance. Each group is created dynamically and managed individually. There are two advantages to using a group caching technique. First, a peer requesting a video can locate a complete set of video pieces starting from its nearest peer (which caches some part of the video), so the search scope is reduced significantly. Second, cached video data can be coordinated at group level to balance data redundancy in each group. While such coordination maximally protects the video integrity of a group, it causes only minimal communication overhead, because the scheme splits a group whenever possible to limit its size.
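The definition of a group, a cluster of peers that together can supply a complete video, can be illustrated as a simple coverage check. The following is a minimal sketch only: the class name `GroupCache`, the numbering of video segments from 0, and the helper `range` are assumptions made for this example and are not taken from the implemented system.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch: do the peers of a group jointly cache every segment of a video? */
final class GroupCache {

    /**
     * @param segmentCount number of segments in the video (assumed to be numbered 0..segmentCount-1)
     * @param peerCaches   one BitSet per peer; bit k is set if that peer caches segment k
     * @return true if the union of all peer caches covers the whole video
     */
    static boolean isComplete(int segmentCount, List<BitSet> peerCaches) {
        BitSet union = new BitSet(segmentCount);
        for (BitSet cache : peerCaches) {
            union.or(cache);
        }
        // The first missing segment must lie beyond the end of the video.
        return union.nextClearBit(0) >= segmentCount;
    }

    /** Illustrative helper: a cache holding segments from (inclusive) to (exclusive). */
    static BitSet range(int from, int to) {
        BitSet cache = new BitSet();
        cache.set(from, to);
        return cache;
    }

    public static void main(String[] args) {
        List<BitSet> group = Arrays.asList(range(0, 4), range(3, 8), range(8, 10));
        System.out.println("group supplies complete video: " + isComplete(10, group));
    }
}
```

In this sketch, overlapping ranges model the data redundancy within a group; a group that fails the check would need another peer before it could serve a whole video.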

2.2 P2P network with IP multicast

In a P2P network, the ability of each user to join the network can lead to rapid growth in the network and in distributed information repositories [4]. Since no server is used in a P2P network, a sender has to send the same data to a vast number of receivers, which generates heavy traffic. This highly variable characteristic makes IP multicast the key technique in a P2P-based videoconference (when attempting to achieve broadcast services over a network). IP multicast can help to reduce network traffic, because multicast routers make copies of incoming data and distribute them along a multicast tree. Over a multicast network service, data is sent from a source only once; the network determines an optimal route to all receivers, duplicating a data packet only when necessary (as shown in Fig. 2). Multicast is therefore an efficient method for group communication.

Fig. 2 Data flow in multicast
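The traffic argument for multicast can be made concrete with a small calculation. The sketch below is illustrative only: the class `SenderLoad` and the example stream rate of 400 kbit/s are assumptions, not measurements from our system. It compares what the sender must transmit when it sends one copy per receiver with what it transmits when the network duplicates packets.

```java
/** Hypothetical sketch: sender-side traffic with and without IP multicast. */
final class SenderLoad {

    /** Without multicast, the sender transmits one copy of the stream per receiver. */
    static double unicastKbps(double streamKbps, int receivers) {
        return streamKbps * receivers;
    }

    /** With multicast, the sender transmits the stream once; routers duplicate it downstream. */
    static double multicastKbps(double streamKbps, int receivers) {
        return receivers > 0 ? streamKbps : 0.0;
    }

    public static void main(String[] args) {
        double streamKbps = 400.0; // assumed example rate, not a measured value
        int receivers = 3;         // e.g. the three remote peers in a four-party session
        System.out.println("unicast sender load (kbit/s):   " + unicastKbps(streamKbps, receivers));
        System.out.println("multicast sender load (kbit/s): " + multicastKbps(streamKbps, receivers));
    }
}
```

The sender load grows linearly with the number of receivers in the unicast case, which matters most on the limited upload bandwidth of a home connection.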

2.3 Data-driven 3D animation

Physiotherapists can continuously observe and chat with patients using the telecommunication system. However, this requires physiotherapists and patients to be online at the same time, and busy therapists may find it difficult to monitor patients all the time. Animation therefore becomes necessary. A 3D animation enables a therapist to display collected arm motion data off-line (when they cannot monitor patients in real time). In addition, therapists and patients can compare a data-driven 3D animation of a patient’s movements with historical case studies of normal and abnormal motions, and then evaluate the efficiency of rehabilitation schemes. Since the 3D animation is driven by data that simulates human arm movement, patients and therapists only need to load different text data files to make historical comparisons. This scheme therefore requires less data storage and transmission than other options, such as video recording.

We aim to reconstruct arm movements using a digital human model with a data-driven 3D animation, where the data file comes from inertial sensors fixed to a patient’s arm. We implemented the animation system using Java and Java3D. Our system can animate arm motions from two different directions: one facing the front, the other facing the right-hand side. A therapist can therefore see a patient’s arm motion from different angles. Figure 3 shows the main interface for the 3D animation.

Fig. 3 The 3D animation interface. a Front view. b Side view

Before describing the animation, we need to look at coordinate alignment. That is to say, the coordinate system used by the animation package must match that of the sensor systems. We use the following quantities:

  1. Wrist position (WP) represents the wrist movement.

    $$ {\text{WP}} = (W_{x} ,W_{y} ,W_{z} ) $$

  2. Elbow position (EP) and forearm rotation (FR) represent the forearm movement.

    $$ {\text{EP}} = (E_{x} ,E_{y} ,E_{z} ) \cup {\text{FR}} = (\varphi _{{\rm fr}} ,\theta _{{\rm fr}} ,\psi _{{\rm fr}} )$$

  3. Shoulder position (SP) and upper arm rotation (UR) represent the upper arm movement.

    $$ {\text{SP}} = (S_{x} ,S_{y} ,S_{z} ) \cup {\text{UR}} = (\varphi _{{\rm ur}} ,\theta _{{\rm ur}} ,\psi _{{\rm ur}} ) $$

Each of the three joint positions is represented using three elements (single-precision floating-point x, y, and z coordinates in metres). Each rotation is represented using three Euler angles (in radians). If it is assumed that:

figure a

To convert data measured by the sensors into the value of Java3D coordinates, we use the following formula:

$$ \begin{aligned}{} {\text{Position}} & = V_{{{\text{measured}}}} + {\text{offset}} \\ {\text{Rotation}} & = {\text{gain}} \times V_{{{\text{measured}}}} + {\text{offset}} \\ \end{aligned} $$

The offset and gain values are different for each axis, and are determined using the data model and sensor calibration. For example, the computed elbow position and forearm rotation can be converted into Java3D coordinates as follows:

$$ \begin{aligned}{} &{\text{ElbowP}} = {\text{EP}} + {\text{offset}} \\ &{\text{ForearmR}} = {\text{gain}} \times {\text{FR}} + {\text{offset}} \\ \end{aligned} $$

In the equations above, ElbowP and ForearmR are the position and rotation values of the elbow in Java3D coordinates, and EP and FR are the corresponding position and rotation values measured in sensor coordinates.
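The conversion from sensor values to Java3D coordinates can be sketched for a single axis as follows. This is a minimal illustration of the two formulas, not the code of our system: the class `AxisCalibration`, its field names, and the numeric values in `main` are assumptions, and real offsets and gains would come from the data model and sensor calibration.

```java
/** Hypothetical sketch of the sensor-to-Java3D conversion for one axis. */
final class AxisCalibration {
    private final double positionOffset; // metres, from calibration (assumed name)
    private final double rotationGain;   // dimensionless, from calibration (assumed name)
    private final double rotationOffset; // radians, from calibration (assumed name)

    AxisCalibration(double positionOffset, double rotationGain, double rotationOffset) {
        this.positionOffset = positionOffset;
        this.rotationGain = rotationGain;
        this.rotationOffset = rotationOffset;
    }

    /** Position = V_measured + offset */
    double toJava3dPosition(double measuredMetres) {
        return measuredMetres + positionOffset;
    }

    /** Rotation = gain x V_measured + offset */
    double toJava3dRotation(double measuredRadians) {
        return rotationGain * measuredRadians + rotationOffset;
    }

    public static void main(String[] args) {
        // Example calibration values chosen for illustration only.
        AxisCalibration x = new AxisCalibration(0.5, 2.0, 0.25);
        System.out.println("ElbowP.x   = " + x.toJava3dPosition(0.25));
        System.out.println("ForearmR.x = " + x.toJava3dRotation(1.0));
    }
}
```

One such object per axis keeps the per-axis offsets and gains separate, matching the statement that they differ for each axis.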

In addition, Java3D represents a rotation as a four-element axis-angle, which consists of three floating-point coordinates (x, y, and z) and an angle of rotation in radians. An axis-angle is a rotation of angle radians about the vector (x, y, z) (the Java3D API Specification, 2002). Let EbAA.X, EbAA.Y, and EbAA.Z be the x, y, and z components of an axis-angle EbAA (Elbow Axis-Angle). Then we have:

$$ \begin{aligned} {\text{EbAA}}_{i} &= {\text{EbAA.X}}_{i} \times {\text{EbAA.Y}}_{i} \times {\text{EbAA.Z}}_{i} \\ {\text{EbAA.X}}_{i} &= ({\text{EbP}}(x_{i} ),\;{\text{ForearmR}}(x_{i} )) \\ &= (Ex_{i} ,\;Ey_{(i - 1)} ,\;Ez_{(i - 1)} ,\;{\text{gain}}(x) \times \varphi _{{\rm fr},i} + \Delta x) \\ {\text{EbAA.Y}}_{i} &= ({\text{EbP}}(y_{i} ),\;{\text{ForearmR}}(y_{i} )) \\ &= (Ex_{(i - 1)} ,\;Ey_{i} ,\;Ez_{(i - 1)} ,\;{\text{gain}}(y) \times \theta _{{\rm fr},i} + \Delta y) \\ {\text{EbAA.Z}}_{i} &= ({\text{EbP}}(z_{i} ),\;{\text{ForearmR}}(z_{i} )) \\ &= (Ex_{(i - 1)} ,\;Ey_{(i - 1)} ,\;Ez_{i} ,\;{\text{gain}}(z) \times \psi _{{\rm fr},i} + \Delta z) \\ \end{aligned} $$

where the values of gain (x), gain (y), and gain (z), as well as Δx, Δy, and Δz, are adjustable constants. In addition, i is the frame index.
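The per-frame tuples in the equation above can be sketched in plain Java. This is an illustration under stated assumptions: the class `ElbowAxisAngle`, the array-based layout of the recorded data, and the method names are hypothetical, and the sketch builds only the three four-element tuples; how they are composed into EbAA by the animation package is not shown.

```java
/** Hypothetical sketch: build the four-element tuples EbAA.X, EbAA.Y, EbAA.Z for frame i (i >= 1). */
final class ElbowAxisAngle {

    /** (Ex_i, Ey_(i-1), Ez_(i-1), gain(x) * phi_i + deltaX) */
    static double[] xComponent(double[] ex, double[] ey, double[] ez, double[] phi,
                               int i, double gainX, double deltaX) {
        return new double[] { ex[i], ey[i - 1], ez[i - 1], gainX * phi[i] + deltaX };
    }

    /** (Ex_(i-1), Ey_i, Ez_(i-1), gain(y) * theta_i + deltaY) */
    static double[] yComponent(double[] ex, double[] ey, double[] ez, double[] theta,
                               int i, double gainY, double deltaY) {
        return new double[] { ex[i - 1], ey[i], ez[i - 1], gainY * theta[i] + deltaY };
    }

    /** (Ex_(i-1), Ey_(i-1), Ez_i, gain(z) * psi_i + deltaZ) */
    static double[] zComponent(double[] ex, double[] ey, double[] ez, double[] psi,
                               int i, double gainZ, double deltaZ) {
        return new double[] { ex[i - 1], ey[i - 1], ez[i], gainZ * psi[i] + deltaZ };
    }

    public static void main(String[] args) {
        // Two frames of made-up elbow data, for illustration only.
        double[] ex = { 0.0, 1.0 }, ey = { 2.0, 3.0 }, ez = { 4.0, 5.0 };
        double[] phi = { 0.0, 0.5 };
        double[] x1 = xComponent(ex, ey, ez, phi, 1, 2.0, 0.25);
        System.out.println(java.util.Arrays.toString(x1));
    }
}
```

Each tuple updates one coordinate to the current frame while holding the other two at the previous frame, which is what the indices i and i − 1 in the equation express.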

2.4 Telecommunication

The telecommunication software is written in Java (JDK 1.5.0 and JMF 2.1.1). Video and audio data are captured and transmitted using the Java media framework (JMF) and the real-time transport protocol (RTP). The JMF is a powerful toolkit for processing time-based media, such as video and audio. It provides a unified architecture and messaging protocol for managing the acquisition, processing, and delivery of time-based media data.

To send or receive a live media broadcast, or to conduct a video consultation over the internet or an intranet, we need to be able to receive and transmit media streams in real time. This requires high network throughput. The HTTP and FTP protocols are based on the transmission control protocol (TCP), which is a transport-layer protocol designed for reliable data communication over low-bandwidth, high-error-rate networks. TCP suits static data but does not work well for streaming media, because retransmitting lost packets adds delay. The user datagram protocol (UDP) is a general transport-layer protocol that is unreliable: it guarantees neither delivery nor ordering. The real-time transport protocol (RTP) is the internet standard for transporting real-time data such as audio and video. RTP consists of a data part and a control part, the RTP control protocol (RTCP). RTP is usually carried over UDP.
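To illustrate what the data part of RTP carries, the sketch below builds the 12-byte fixed RTP header defined in RFC 3550 (version, marker bit, payload type, sequence number, timestamp, and synchronisation source identifier). In our system the JMF constructs these headers internally; the class `RtpHeader` and the example field values are assumptions made purely for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch: the 12-byte fixed RTP header of RFC 3550 (no CSRC list, no extension). */
final class RtpHeader {
    static final int VERSION = 2;
    static final int LENGTH = 12;

    static byte[] build(int payloadType, boolean marker, int sequence, long timestamp, long ssrc) {
        ByteBuffer buf = ByteBuffer.allocate(LENGTH); // ByteBuffer is big-endian (network order) by default
        buf.put((byte) (VERSION << 6));                                // V=2, P=0, X=0, CC=0
        buf.put((byte) ((marker ? 0x80 : 0) | (payloadType & 0x7F))); // M bit and 7-bit payload type
        buf.putShort((short) sequence);                                // 16-bit sequence number
        buf.putInt((int) timestamp);                                   // 32-bit media timestamp
        buf.putInt((int) ssrc);                                        // 32-bit synchronisation source
        return buf.array();
    }

    static int versionOf(byte[] header) {
        return (header[0] & 0xFF) >>> 6;
    }

    static int payloadTypeOf(byte[] header) {
        return header[1] & 0x7F;
    }

    static int sequenceOf(byte[] header) {
        return ByteBuffer.wrap(header).getShort(2) & 0xFFFF;
    }

    public static void main(String[] args) {
        // Payload type 26 is JPEG video in RFC 3551; used here only as an example value.
        byte[] header = build(26, false, 1, 90000L, 0x12345678L);
        System.out.println("header bytes: " + header.length + ", sequence: " + sequenceOf(header));
    }
}
```

The sequence number and timestamp are what allow a receiver to detect lost or late packets, which RTCP then reports back to the sender.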

Figure 4 shows a flowchart of the interactive telecommunication part of our system. Initially the system detects webcams and microphones. If the detection result is “No”, a grey main window appears on the screen. The system is then unable to show local video, and disables the transmission function in the main window. However, the user can still receive remote video and audio streams, and use the whiteboard to communicate with other participants. If the detection result is “Yes”, the system creates two DataSource objects, one for video (webcam) and one for audio (microphone), and then merges them into a single DataSource which can easily be transmitted over the internet.

Fig. 4 The flowchart of interactive telecommunication

Several actions can be taken from one DataSource: showing video and audio on the local screen, transmitting data over the internet, or recording data to a movie file. To carry out these actions concurrently, it is necessary to create DataSource clones. A user only needs to click different buttons in the main window to achieve the different tasks, as shown in Fig. 4. For example, a user can transmit and receive video and audio streams over the internet. Note that a receiving session can be started many times for different transmissions at the same time, each shown in its own window. Therefore, the system can implement one-to-one, one-to-many, or many-to-many bilateral real-time transmissions (for video and audio streams). In addition, users can save video and audio to their hard disk in a movie file format.

In order to provide a flexible choice for users, our system enables users to obtain a multicast IP address and port number by clicking the Transmit or Receive button. With this information, a participant joins a multicast group in which RTP and RTCP are used. User video and audio data streams are encapsulated in RTP packets. Session control information is encapsulated in RTCP packets, which are in turn put in IP packets and sent out using multicast. On the receiving side, the multicast IP address and port number must be the same as those used by the sender. In order to connect correctly, a hospital should have a fixed multicast IP address and port number, and should assign a fixed multicast IP address and range of port numbers to patients.
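A session address of this kind can be checked before it is used. The sketch below is illustrative and not part of our implementation: the class `MulticastSession`, the example address 224.2.2.2, and the example port are assumptions. It relies on the standard `InetAddress.isMulticastAddress()` call and on the RFC 3550 convention that RTP uses an even port and RTCP the next higher odd port.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical sketch: validate a multicast session address and derive the RTCP port. */
final class MulticastSession {

    /** Returns true if the IP literal is a multicast address (224.0.0.0 to 239.255.255.255 for IPv4). */
    static boolean isMulticast(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral).isMulticastAddress();
        } catch (UnknownHostException e) {
            return false; // not a parsable address
        }
    }

    /** RFC 3550 convention: RTP on an even port, RTCP on the next higher (odd) port. */
    static int rtcpPortFor(int rtpPort) {
        if (rtpPort < 1024 || rtpPort > 65534 || rtpPort % 2 != 0) {
            throw new IllegalArgumentException("RTP port should be even and in 1024..65534: " + rtpPort);
        }
        return rtpPort + 1;
    }

    public static void main(String[] args) {
        String address = "224.2.2.2"; // assumed example address
        int rtpPort = 22224;          // assumed example port
        System.out.println(address + " multicast? " + isMulticast(address));
        System.out.println("RTP port " + rtpPort + ", RTCP port " + rtcpPortFor(rtpPort));
    }
}
```

Assigning each patient a fixed even RTP port from a known range, as suggested above, keeps the derived RTCP ports from colliding.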

3 Results

In order to evaluate the proposed system, we designed three different experiments to demonstrate the performance of the system in motion detection, 3D animation, and telecommunication. Our system only needs a minimum hardware set-up: two MTx sensors produced by Xsens Dynamics Technologies, Netherlands, are placed next to the wrist (inwards), and elbow (outward) joints, respectively. In addition, a webcam and microphone need to be installed at both the patient’s and therapist’s site for real-time audio-visual communication.

3.1 Motion detection

The motion tracking system is used to track the arm movements of a patient, and to record the position and rotation of the wrist, elbow, and shoulder joints to a data file, based on two MTx inertial sensors. The system employs sensor fusion and optimisation techniques to support post-stroke rehabilitation programs. It is implemented in Visual C++. A Bluetooth wireless connection links the computer and the two MTx sensors via a small box worn on the subject’s waist (as shown in the motion tracking sensors of Fig. 1). The wireless connection allows a subject to carry out motion exercises freely [20]. Our design consists of minimal hardware, and is therefore suitable for home use.

In order to determine the position of an upper limb, the two inertial sensors are placed 1 cm away from the wrist and elbow joints respectively, facing away from the body, and each sensor box is mounted on a PVC holder using Velcro straps. Each sensor consists of a tri-axial accelerometer, a tri-axial gyroscope, and a tri-axial magnetic sensor. It measures drift-free 3D orientation, as well as kinematic data: 3D acceleration; 3D rate of turn (rate gyro); and the 3D earth-magnetic field data of a patient’s arm (from which real-time 3D arm movements can be reconstructed for a 3D demonstration). The motion tracking strategy applied in this system is illustrated in Fig. 5.

Fig. 5 Motion tracking algorithm

In order to estimate the position of a patient’s arm, we use a real-time sensor fusion algorithm to accurately calculate the absolute orientation in 3D from the miniature rate-of-turn sensors (gyroscopes), accelerometers, and magnetometers. We use a Kalman filter to compute the position of the shoulder, elbow, and wrist joints (based on kinematic models and the inclination of the upper arm). In our model, the shoulder and elbow joints each have six degrees of freedom (6 DOFs).
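The predict and update cycle of a Kalman filter can be shown with a one-dimensional example. The sketch below is not the filter used in our system, whose state vector and noise parameters are not given here; it is a minimal scalar filter with a constant-state model, and the class `ScalarKalman` and all numeric values are assumptions chosen for illustration.

```java
/** Hypothetical sketch: a scalar Kalman filter smoothing one coordinate of a joint position. */
final class ScalarKalman {
    private double estimate;               // current state estimate
    private double variance;               // uncertainty of the estimate
    private final double processNoise;     // assumed variance added per step
    private final double measurementNoise; // assumed variance of each measurement

    ScalarKalman(double initialEstimate, double initialVariance,
                 double processNoise, double measurementNoise) {
        this.estimate = initialEstimate;
        this.variance = initialVariance;
        this.processNoise = processNoise;
        this.measurementNoise = measurementNoise;
    }

    /** Feeds one measurement and returns the updated estimate. */
    double update(double measurement) {
        // Predict: the state is modelled as constant, so only the uncertainty grows.
        variance += processNoise;
        // Update: blend prediction and measurement according to their uncertainties.
        double gain = variance / (variance + measurementNoise);
        estimate += gain * (measurement - estimate);
        variance *= (1.0 - gain);
        return estimate;
    }

    public static void main(String[] args) {
        ScalarKalman filter = new ScalarKalman(0.0, 1.0, 1e-4, 1e-2);
        double[] noisyZ = { 0.11, 0.09, 0.10, 0.12, 0.08 }; // made-up samples in metres
        for (double z : noisyZ) {
            System.out.println(filter.update(z));
        }
    }
}
```

A full implementation would replace the scalar state with a vector of joint positions and orientations and use the kinematic model in the prediction step.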

Here we show three groups of tests, i.e. reach, drink, and reach-flexion, and their corresponding results. These ambulatory movements are frequently observed in a home-based environment. Before each test started, the left forearm was placed on a desk and kept in a horizontal posture. In our experiments, each measurement lasted 20 s and was repeated five times in each session (with three repetitions for each action); the acquisition rate was 25 Hz. A flowchart of arm motion tracking is shown in Fig. 6.

Fig. 6 The process of arm motion tracking

As an example, Fig. 7 illustrates the data analysis of a reach test. More specifically, Fig. 7a shows that the wrist joint position moves from −0.05 to 0.2 m along the z-axis, while Fig. 7b shows that the elbow joint position moves from −0.02 to 0.08 m along the z-axis. Both figures show that the wrist and elbow joints each go through five periods of movement during an arm reach test. These results correspond well to the real movement (we prescribed the elbow joint to move about 0.09 m along the z-axis in the reach test, repeated five times). This shows that our motion tracking algorithm is feasible and accurate in these movement circumstances.

Fig. 7 The reach test data: estimation of joint position. a Wrist position. b Elbow position
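The number of movement periods in a recorded trajectory can also be counted automatically rather than read from a plot. The sketch below is one simple way to do so and is not the analysis used for Fig. 7: the class `PeriodCounter`, the threshold of 0.1 m, and the synthetic cosine trajectory (20 s at 25 Hz, five repetitions between −0.05 and 0.2 m) are assumptions that only mimic the shape of the wrist data.

```java
/** Hypothetical sketch: count movement periods as upward crossings of a threshold. */
final class PeriodCounter {

    /** Counts how often the signal rises from below the threshold to at or above it. */
    static int countUpwardCrossings(double[] z, double threshold) {
        int count = 0;
        for (int i = 1; i < z.length; i++) {
            if (z[i - 1] < threshold && z[i] >= threshold) {
                count++;
            }
        }
        return count;
    }

    /** Synthetic stand-in for a reach trajectory: starts at min and oscillates between min and max. */
    static double[] syntheticReach(int samples, double rateHz, double periodSeconds,
                                   double min, double max) {
        double mid = (min + max) / 2.0;
        double amplitude = (max - min) / 2.0;
        double[] z = new double[samples];
        for (int i = 0; i < samples; i++) {
            double t = i / rateHz;
            z[i] = mid - amplitude * Math.cos(2.0 * Math.PI * t / periodSeconds);
        }
        return z;
    }

    public static void main(String[] args) {
        double[] z = syntheticReach(500, 25.0, 4.0, -0.05, 0.2); // 20 s at 25 Hz
        System.out.println("periods detected: " + countUpwardCrossings(z, 0.1));
    }
}
```

On real sensor data a small hysteresis band around the threshold would help to avoid counting noise as extra crossings.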

3.2 Representing 3D animation

To render received motion data in 3D space for both patients and therapists, three translations and three Euler angles were used for animation. These six-DOF data sets are received via P2P communication. The accuracy of our 3D animation depends on two issues: the alignment between the sensor and Java3D coordinate systems, and the real-time rendering of the six DOFs from a text file (sent by the sender via the internet).

In order to evaluate arm movement imitation, we had to measure how close the 3D animation was to real human arm movement. For this purpose, it was useful to compare motion trajectories, number of periods, and the motion speed of the animation and real human movements. We performed three kinds of motion trajectory test, which were: reach, drink, and reach-flexion. We undertook tests of 20 s with five iterations. Their starting, middle, and ending positions are shown in Fig. 8. They have the same starting position. These three experiments indicated that the animation had very similar trajectories (as shown in Fig. 9, the animation and real human arm movement have similar trajectories of the drink motion), same periods (five times), and same motion times (20 s), to those of a real human arm motion.

Fig. 8 The positions for three different motions. a Reach movement. b Drink movement. c Reach-flexion movement

Fig. 9 Comparing the trajectory of drink motion between the 3D animation and real human movement in three positions

Figure 10 shows the results of 3D animation for several different motions. Figure 10a and b show the reach action from different viewing angles, i.e. facing the front and facing the right-hand side. They show the arm being extended and flexed. Figure 10c shows the arm performing the drink test, in which the forearm is stretched and then flexed to touch the mouth. The experimental results indicate that the proposed tracking and 3D animation systems perform well for different motions.

Fig. 10 Some motion sequences: reaching and drinking. a A sequence of reach movements with the patient facing the front. b A sequence of reach movements with the patient facing the side. c A sequence of drink movements with the patient facing the front

3.3 Real-time telecommunication

We tested two telecommunication models (one-to-one and many-to-many) to estimate the videoconference quality in the P2P network architecture. To demonstrate the one-to-one model, we first set up a telecommunication session between a patient and a therapist. The patient performs rehabilitation exercises at home, while being monitored and instructed remotely by the therapist. In this experiment, both the patient and the therapist just need to click two buttons: Transmit and Receive. Two video windows (local and remote) are shown on their computer screens when they are online at the same time (as shown in Fig. 4). For a group visual discussion, we designed a many-to-many videoconferencing interface for four patients. In this experiment, each participant needs to click the Transmit button once and the Receive button three times, so that four video windows (one local and three remote) are shown on every participant’s computer screen. The videoconferencing quality is maintained regardless of how many participants take part in the transmission (normally fewer than 10 in the many-to-many model).

During the videoconference, the video and audio streams were transmitted as separate RTP sessions over IP multicast network services. RTCP packets are transmitted for each medium, using two different pairs of UDP ports and multicast IP addresses. One motivation for this separation was to allow some participants in the session to receive only one chosen medium. In our two experiments, the network connection was an ADSL (Asymmetric Digital Subscriber Line) connection in a home environment. The ADSL speed was 2 Mbps/512 kbps (2 Mbps download and 512 kbps upload). In order to reduce network traffic, we set the image capture rate to 15 frames/s. The transmission rate was approximately 172.50 packets/s, and averaged 927.45 bytes/packet. In other words, 150.95 KB of video and audio data could be streamed per second. Figure 11 shows the transmitted packets per second. This graph indicates that the transmission was usually stable and consistent. High-definition video and audio signals were obtained in both videoconference models. Figure 11 also shows that, in some cases, transmission seemed to be lost. This was due to packet verification performed at the receiver end, and to the network speed possibly being unstable, but it did not affect the videoconference quality in our experiments. Information loss is caused by signal elements being dropped or arriving late. In a videoconference, lost audio causes distortions such as noise or crackling sounds, while lost video may cause image blurring, distortions, or blackouts (QoS solutions Configuration Guide, 2006). Therefore, video delivery requires guaranteed network consistency. Besides latency, bandwidth and data loss should also be considered. In our system, a management scheme has been employed to guarantee quality of service (QoS). It addresses these problems using two techniques, group caching and IP multicast, which were described in Sects. 2.1 and 2.2. Our experiments showed that a videoconference runs normally if the bandwidth is greater than 1.50 Mbps; if not, latency or blackouts occur.
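The reported packet rate and mean packet size can be combined into a rough bit rate as a plausibility check. The sketch below uses only the two figures quoted above; the class `StreamRate` is hypothetical, and the calculation ignores how the averages were taken. The product is about 160,000 bytes/s, or roughly 1.28 Mbit/s, which is close to (though not identical with) the per-second volume quoted above and lies below the 1.50 Mbps bandwidth at which the videoconference ran normally.

```java
/** Hypothetical sketch: combine packet rate and mean packet size into a stream rate. */
final class StreamRate {

    static double bytesPerSecond(double packetsPerSecond, double bytesPerPacket) {
        return packetsPerSecond * bytesPerPacket;
    }

    static double megabitsPerSecond(double bytesPerSecond) {
        return bytesPerSecond * 8.0 / 1_000_000.0;
    }

    public static void main(String[] args) {
        double bytes = bytesPerSecond(172.50, 927.45); // figures reported in the text
        System.out.printf("%.1f bytes/s = %.2f Mbit/s%n", bytes, megabitsPerSecond(bytes));
    }
}
```

Because every additional unicast receiver would multiply this rate at the sender, the calculation also shows why multicast is attractive on a home connection.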

Fig. 11 Packets per second for video transmission

4 Discussion and conclusion

The proposed telerehabilitation system integrates multiple functions for estimating the motion of the human upper limb: motion tracking, data-driven 3D animation, and telecommunication. The motion tracking system employs two inertial sensors, together with sensor fusion and optimisation techniques, to collect orientation and location data for the whole arm movement, and records them to a file while patients perform rehabilitation training (e.g. reach, drink, and reach-flexion). It has been reported that inertial sensors (including accelerometers and gyroscopes) outperform other sensors such as pedometers, goniometers and pressure sensors in terms of flexibility and accuracy [19]; inertial sensors are therefore suitable for a home-based rehabilitation system like ours. The work in [15, 21] also shows that, when the measurements from accelerometers and gyroscopes are integrated, a sensor fusion strategy can successfully handle the drift problem. However, these established systems can recover only part of the arm parameters, e.g. the lower arm’s position.

The data-driven 3D animation system enables both therapists and patients to watch the arm movement offline on a digital human model, using the arm motion data collected by the motion tracking system. The telecommunication system allows patients to observe therapy instructions from therapists, or to hold a group videoconference with several patients and therapists. Because the data file that controls the 3D animation is small, historical rehabilitation data can easily be stored locally and transmitted remotely via the internet. To our knowledge, many studies of 3D animation have been developed for virtual rehabilitation games (such as the work in [5, 11, 13]), but little research has been carried out on data-driven 3D animation for rehabilitation exercises. This is a novel area open for investigation.

The videoconference system uses the P2P network architecture, IP multicast, and RTP to transmit video and audio streams in real time. It reduces both cost (no server is needed) and network traffic. It enables the convergence of high-speed data with high-definition video and audio. In particular, telecommunication can be held in a many-to-many model. For example, several patients could, if they desired, chat together by video over the internet, which might encourage them to engage in rehabilitation.

A number of telerehabilitation systems have been developed recently. Java Therapy is a system for guiding repeated movement practice [17]. This system is affordable, accessible, and readily adaptable to different input devices, e.g. a force-feedback joystick. In their report, Reinkensmeyer et al. suggest using a camera to take pictures so that a remote clinician can observe the user’s performance. Our system, in spite of its lack of force detection, is capable of supporting this remote observation function, whilst allowing communication between the user and the clinicians. We believe that this function may encourage extensive use of the proposed system, as the interaction between the user and the clinicians is maintained during a real clinical trial.

Rutgers Ankle is a new orthopedic rehabilitation system for home use, which allows remote monitoring by therapists [5]. Patients can recover as they perform diagnostic functions, move their ankles, and practise force exertion and coordination. The attached host computer records patient progress and sends it away for remote evaluation. Although this system can accomplish the necessary functions requested by clinicians, it has a significant weakness: users cannot move freely indoors and outdoors, because of the connection between the sensor units and the main PC. Our system overcomes this weakness in a generic way. In other words, our system does not constrain users while they are conducting a clinical trial. This feature is important because users can choose to practise wherever they want, which may increase their interest in performing home-based rehabilitation. A similar system to ours is “Rutgers Arm” [12]. Nevertheless, “Rutgers Arm” can only be used for forearm recovery, owing to the lack of sensors attached to the upper arm. In addition, although the system allows remote monitoring, it does not allow a visual discussion between more than two participants.

A videoconferencing system was recently developed by Hoenig et al. [6]. This system uses a wireless video camera so that a therapist can observe a patient while the patient conducts indoor activities. However, no three-dimensional information is available for further clinical assessment.

In conclusion, we have developed a new multi-function system for home-based rehabilitation. The arm motion tracking method is general and robust in different movement circumstances. In order to analyse motion data and evaluate the efficiency of rehabilitation training, both patients and therapists can save rehabilitation history data files (e.g. for 2 days or a week). Following a period of rehabilitation, users can choose different data files to control the animation, and compare rehabilitation efficiency. The proposed 3D animation can therefore show the arm motion of patients repeatedly, and allows helpful data analysis, comparisons, and feedback. Because the patient performs rehabilitation exercises at home while being monitored and instructed remotely by a therapist, the system can improve patient compliance and recovery. Home-based rehabilitation is convenient for patients and their families, and offers a large cost saving for national health services.